Soatok's Matrix Disclosure, 2024-08-14

Disclosure Timeline

2024-05-15: I took a quick look at the Matrix source code. I identified two issues and emailed them to their security@ email address.In my email, I specify that I plan to disclose my findings publicly in 90 days (i.e. on August 14), in adherence with industry best practices for coordinated disclosure, unless they request an extension in writing.
2024-05-16: I checked something else on a whim and find a third issue, which I also email to their security@ email address.
2024-05-17: Matrix security team confirms receipt of my reports.
2024-05-17: I follow up with a suspected fourth finding–the most critical of them all. They point out that it is not actually an issue, because I overlooked an important detail in how the code is architected. Mea culpa!
2024-05-18: A friend discloses a separate finding with Matrix’s protocol. (Will update this to link to their write-up once it’s public.)They instructed the Matrix developers to consult with me if they needed cryptography guidance. I never heard from them on this externally reported issue.
2024-07-12: I shared this blog post draft with the Matrix security team while reminding them of the public disclosure date.
2024-07-31: Matrix pushes a commit that announces that libolm is deprecated.
2024-07-31: I email the Matrix security team asking if they plan to fix the reported issues (and if not, if there’s any other reason I should withhold publication).
2024-07-31: Matrix confirms they will not fix these issues (due to its now deprecated status), but ask that I withhold publication until the 14th as originally discussed.
2024-08-14: This blog post is publicly disclosed to the Internet.
2024-08-14: The lead Matrix dev claims they already knew about these issues, and, in fact, knowingly shipped cryptography code that was vulnerable to side-channel attacks for years. See Addendum.

Vulnerabilities in Olm

I identified the following issues with Olm through a quick skim of their source code on Gitlab:

This is sorted by the order in which they were discovered, rather than severity.

AES implementation is vulnerable to cache-timing attacks

Olm ships a pure-software implementation of AES, rather than leveraging hardware acceleration.

// Substitutes a word using the AES S-Box.
WORD SubWord(WORD word)
{
    unsigned int result;
 
    result = (int)aes_sbox[(word >> 4) & 0x0000000F][word & 0x0000000F];
    result += (int)aes_sbox[(word >> 12) & 0x0000000F][(word >> 8) & 0x0000000F] << 8;
    result += (int)aes_sbox[(word >> 20) & 0x0000000F][(word >> 16) & 0x0000000F] << 16;
    result += (int)aes_sbox[(word >> 28) & 0x0000000F][(word >> 24) & 0x0000000F] << 24;
    return(result);
}

The code in question is called from this code, which is in turn used to actually encrypt messages.

Software implementations of AES that use a look-up table for the SubWord step of the algorithm are famously susceptible to cache-timing attacks.

This kind of vulnerability in software AES was previously used to extract a secret key from OpenSSL and dm-crypt in about 65 milliseconds. Both papers were published in 2005.

A general rule in cryptography is, “attacks only get better; they never get worse“.

As of 2009, you could remotely detect a timing difference of about 15 microseconds over the Internet with under 50,000 samples. Side-channel exploits are generally statistical in nature, so such a sample size is generally not a significant mitigation.

How is this code actually vulnerable?

In the above code snippet, the vulnerability occurs in aes\_sbox\[/\* ... \*/\]\[/\* ... \*/\].

Due to the details of how the AES block cipher works, the input variable (word) is a sensitive value.

Software written this way allows attackers to detect whether or not a specific value was present in one of the processor’s caches.

To state the obvious: Cache hits are faster than cache misses. This creates an observable timing difference.

Such a timing leak allows the attacker to learn the value that was actually stored in said cache. You can directly learn this from other processes on the same hardware, but it’s also observable over the Internet (with some jitter) through the normal operation of vulnerable software.

How to mitigate this cryptographic side-channel

The correct way to solve this problem is to use hardware accelerated AES, which uses distinct processor features to implement the AES round function and side-steps any cache-timing shenanigans with the S-box.

Not only is this more secure, but it’s faster and uses less energy too!

If you’re also targeting devices that don’t have hardware acceleration available, you should first use hardware acceleration where possible, but then fallback to a bitsliced implementation such as the one in Thomas Pornin’s BearSSL.

Ed25519 signatures are malleable

Ed25519 libraries come in various levels of quality regarding signature validation criteria; much to the chagrin of cryptography engineers everywhere. One of those validation criteria involves signature malleability.

Signature malleability usually isn’t a big deal for most protocols, until suddenly you discover a use case where it is. If it matters, that usually that means you’re doing something with cryptocurrency.

Briefly, if your signatures are malleable, that means you can take an existing valid signature for a given message and public key, and generate a second valid signature for the same message. The utility of this flexibility is limited, and the impact depends a lot on how you’re using signatures and what properties you hope to get out of them.

For ECDSA, this means that for a given signature , a second signature is also possible (where is the order of the elliptic curve group you’re working with).

Matrix uses Ed25519, whose malleability is demonstrated between and .

This is trivially possible because S is implicitly reduced modulo the order of the curve, , which is a 253-bit number (0x1000000000000000000000000000000014def9dea2f79cd65812631a5cf5d3ed) and S is encoded as a 256-bit number.

The Ed25519 library used within Olm does not ensure that , thus signatures are malleable. You can verify this yourself by looking at the Ed25519 verification code.

int ed25519_verify(const unsigned char *signature, const unsigned char *message, size_t message_len, const unsigned char *public_key) {
    unsigned char h[64];
    unsigned char checker[32];
    sha512_context hash;
    ge_p3 A;
    ge_p2 R;
 
    if (signature[63] & 224) {
        return 0;
    }
 
    if (ge_frombytes_negate_vartime(&A, public_key) != 0) {
        return 0;
    }
 
    sha512_init(&hash);
    sha512_update(&hash, signature, 32);
    sha512_update(&hash, public_key, 32);
    sha512_update(&hash, message, message_len);
    sha512_final(&hash, h);
     
    sc_reduce(h);
    ge_double_scalarmult_vartime(&R, h, &A, signature + 32);
    ge_tobytes(checker, &R);
 
    if (!consttime_equal(checker, signature)) {
        return 0;
    }
 
    return 1;
}

This is almost certainly a no-impact finding (or low-impact at worst), but still an annoying one to see in 2024.

If you’d like to learn more, this page is a fun demo of Ed25519 malleability.

To mitigate this, I recommend implementing these checks from libsodium.

Timing leakage in base64 decoding of private key material

If you weren’t already tired of cache-timing attacks based on table look-ups from AES, the Matrix base64 code is also susceptible to the same implementation flaw.

while (pos != end) {
    unsigned value = DECODE_BASE64[pos[0] & 0x7F];
    value <<= 6; value |= DECODE_BASE64[pos[1] & 0x7F];
    value <<= 6; value |= DECODE_BASE64[pos[2] & 0x7F];
    value <<= 6; value |= DECODE_BASE64[pos[3] & 0x7F];
    pos += 4;
    output[2] = value;
    value >>= 8; output[1] = value;
    value >>= 8; output[0] = value;
    output += 3;
}

The base64 decoding function in question is used to load the group session key, which means the attack published in this paper almost certainly applies.

How would you mitigate this leakage?

Steve Thomas (one of the judges of the Password Hashing Competition, among other noteworthy contributions) wrote some open source code a while back that implements base64 encoding routines in constant-time.

The real interesting part is how it avoids a table look-up by using arithmetic (from this file):

// Base64 character set:
// [A-Z]      [a-z]      [0-9]      +     /
// 0x41-0x5a, 0x61-0x7a, 0x30-0x39, 0x2b, 0x2f
 
inline int base64Decode6Bits(char src)
{
    int ch  = (unsigned char) src;
    int ret = -1;
 
    // if (ch > 0x40 && ch < 0x5b) ret += ch - 0x41 + 1; // -64
    ret += (((0x40 - ch) & (ch - 0x5b)) >> 8) & (ch - 64);
 
    // if (ch > 0x60 && ch < 0x7b) ret += ch - 0x61 + 26 + 1; // -70
    ret += (((0x60 - ch) & (ch - 0x7b)) >> 8) & (ch - 70);
 
    // if (ch > 0x2f && ch < 0x3a) ret += ch - 0x30 + 52 + 1; // 5
    ret += (((0x2f - ch) & (ch - 0x3a)) >> 8) & (ch + 5);
 
    // if (ch == 0x2b) ret += 62 + 1;
    ret += (((0x2a - ch) & (ch - 0x2c)) >> 8) & 63;
 
    // if (ch == 0x2f) ret += 63 + 1;
    ret += (((0x2e - ch) & (ch - 0x30)) >> 8) & 64;
 
    return ret;
}

Any C library that handles base64 codecs for private key material should use a similar implementation. It’s fine to have a faster base64 implementation for non-secret data.

Worth noting: Libsodium also provides a reasonable Base64 codec.

Matrix Client	Cryptography Backend
https://github.com/tulir/gomuks	libolm (1, 2)
https://github.com/niochat/nio	libolm (1, 2)
https://github.com/ulyssa/iamb	vodozemac (1, 2)
https://github.com/mirukana/mirage	libolm (1)
https://github.com/Pony-House/Client	libolm (1)
https://github.com/MTRNord/cetirizine	vodozemac (1)
https://github.com/nadams/go-matrixcli	none
https://github.com/mustang-im/mustang	libolm (1))
https://github.com/marekvospel/libretrix	libolm (1)
https://github.com/yusdacra/icy_matrix	none
https://github.com/ierho/element	libolm (through the python SDK)
https://github.com/mtorials/cordless	none
https://github.com/hwipl/nuqql-matrixd	libolm (through the python SDK
https://github.com/maxkratz/element-web	vodozemac (1, 2, 3, 4)
https://github.com/asozialesnetzwerk/riot	libolm (wasm file)
https://github.com/NotAlexNoyle/Versi	libolm (1, 2)

soatok/Disclosure-Timeline.md