Whoo! This was a fun challenge with loads to learn. :)
The given code level11.c
checks whether the two inputs (as argv[1]
and argv[2]
) both MD5 hash to the same value or not. If they do, it uses both inputs as brainfuck code, and executes them. Then it checks if the outputs differ. Upon differing outputs, they are checked against the strings "io.sts Rules!"
and "io.sts Sucks!"
. If prog1's output is the first, and prog2's output is the second, we are granted shell.
MD5 hashing is vulnerable to very fast generation of collisions. Note that this is not equivalent to a pre-image attack, or even a second pre-image attack (which are harder). Instead, a generation of a collision is merely finding x
and y
such that MD5(x) == MD5(y)
. Lots of literature and code exists to do the same. We will use this idea and the next in our exploit.
The other vulnerability of MD5 is that of a variant of hash length extension, that comes about as a consequence of the fact that the whole internal state is exposed as the hash. This means that if we have some x
and y
, such that MD5(x) == MD5(y)
, then we can concatenate (||
) a message m
to both, and the resultant hashes will be the same as each other (not same as before, but mutually equal). That is, we will have MD5(x || m) == MD5(y || m)
.
The above two properties can be used to pass 2 different programs into the code and have them execute differently. However, we need to understand what limitations such a method has, as well as how the brainfuck part of the code is vulnerable.
The brainfuck part of the code ignores all non-brainfuck characters. Additionally, it even ignores the ,
(which would have made the challenge dead simple). This means that we can have any non-brainfuck characters in the two parts x
and y
and it will skip past those immediately.
There is also a minor flaw in the [
and ]
matching that might be abused, though I did not really use that flaw.
Since we have noticed that the code does an MD5 check before it does a running of the code, I split the code into 2 parts bf_exec.c
and md5_check.c
. Now, it is possible to work on both separately without nasty trouble. I also added some extra statements to help with debugging if need be.
Now, we need to generate the aforementioned x
and y
. For this, we can use Marc Steven's fastcoll tool. It generates collisions in mere seconds (I compiled it with -O4 -march=native
to optimize it as much as possible). Not all such colliding blocks are useful though.
I define useful colliding blocks x
and y
(in code below, mentioned as d1
and d2
) as the following: x
and y
are considered useful iff they differ by exactly one of either +
or -
or [
. Note that the blocks must not have any nulls (\x00
) either, otherwise they stop the strlen()
and prevent the whole block from being MD5'd.
This is coded by the python script find_useful.py
I wrote, which executes fastcoll
until useful blocks are found.
This script was kept running for quite a while. In fact it was taking too long so I started to modify fastcoll
and come up with different seeding for multiple instances, and then kept 4 instances of the above python script running parallelly to speed up finding useful collisions. The fastest one to complete, reached 101 rounds in 180 seconds, and found a useful collision. The first one that I had kept running ran for over 2000 seconds without finding any useful blocks, so luck matters a whole lot for this.
Using these initial blocks (prefix1.bin
and prefix2.bin
), we can now extend the blocks with a common suffix that abused this initial difference in code. The difference is that prefix2.bin
has an extra -
that prefix1.bin
does not. We can test this using the ./md5_check
and ./bf_exec
:
jay@Jay-Ubuntu ~/i/i/level11> X='+++++++++++++++++++++++++++++++++++++++++++++++++.'
jay@Jay-Ubuntu ~/i/i/level11> ./bf_exec $(cat prefix1.bin)$X $(cat prefix2.bin)$X
output prog1 (len= 1): 1
output prog2 (len= 1): 0
jay@Jay-Ubuntu ~/i/i/level11> ./md5_check $(cat prefix1.bin)$X $(cat prefix2.bin)$X
Len of prog1 = 178
Len of prog2 = 178
They are the same
Now, all that remains to be done is to come up with a $X
(equivalent to the concatenation of m
in the vulnerability discussion above) that abuses this difference.
A good resource for writing your own brainfuck code is to use the Brainfuck Algorithms page on Esolangs. Using the if-then-else construct there, I wrote a generic python script gen_differences.py
that would let me pick 2 arbitrary outputs for prog1
and prog2
and output them.
By adding this suffix to the two prefixes, we generate attack1.bin
and attack2.bin
, which satisfy the MD5 constraints, as well as output the required outputs.
After testing them out on my ./bf_exec
and ./md5_check
programs just to be sure, I uploaded them onto a temporary folder on the server, and access to level12 was granted. :)