Challenge: Compress and attack
Author: Shotokhan
Description: Leveraging compression properties to exploit a secure stream cipher
CTF: picoCTF 2021
Category: Crypto

Writeup

We connect to a TCP service and our goal is to find the flag. Source code is provided:

#!/usr/bin/python3 -u

import zlib
from random import randint
import os
from Crypto.Cipher import Salsa20

# flag = open("./flag").read()
flag = "picoCTF{Th1s_is_0nly_a_t3St}"


def compress(text):
    return zlib.compress(bytes(text.encode("utf-8")))

def encrypt(plaintext):
    secret = os.urandom(32)
    cipher = Salsa20.new(key=secret)
    return cipher.nonce + cipher.encrypt(plaintext)

def main():
    while True:
        usr_input = input("Enter your text to be encrypted: ")
        compressed_text = compress(flag + usr_input)
        encrypted = encrypt(compressed_text)
        
        nonce = encrypted[:8]
        encrypted_text =  encrypted[8:]
        print(nonce)
        print(encrypted_text)
        print(len(encrypted_text))

if __name__ == '__main__':
    main() 


I added the test flag to make some tests offline.
We can see that Salsa20 stream cipher is used; it is a cipher for which there don’t exist known vulnerabilites.
The standard way to use that cipher is to concatenate a nonce with the encrypted text, and this is what it’s done here.
We have a (partially) chosen plaintext scenario because our input is concatenated to the flag, then it is compressed, and the result of the compression is encrypted.
Now, we couldn’t do much if the cipher was a block cipher.
But, since Salsa20 is a stream cipher, ciphertext and plaintext have the same length.

Who doesn’t have a background about information theory can still search how zlib works, and he or she would see that zlib is based on entropy encoding.
Zlib uses an entropy encoding of order > 0, it means that it doesn’t computes the entropy on single characters, but on subsequences.
It then creates a variable-length encoding, with short symbols associated to more probable subsequences, and it does something similar to jpeg zig-zag scanning to optimize encoding of repeated symbols.

With that in mind, we can make some tests to validate what we said. We know that flag format is picoCTF{...}, so we know something about the part of the plaintext we can’t control.
The idea is the following:

"flag" + "uyop" -> compress with zlib -> encrypt with stream cipher -> has a certain length
"flag" + "flag" -> compress with zlib -> encrypt with stream cipher -> is shorter than the previous one

Let’s test it locally:

testing
Okay, it works, even if only one character changes.
It could still occur that, when only one character changes, the length is the same for two or more different new characters; we will need to handle this situation in the script we’re going to implement.
We first tested the script locally, obtaining very good results.
It starts like that:

testing
Then it goes on, doing a brute-force character-by-character, and then it ends:

testing
It’s very fast locally, but remotely we needed to handle the problem that after a certain timeout the server would close the connection.
The solution is simple: a try-except block with a reconnection logic; the script can then behave like the server never closes the connection.
Here is a snapshot of the script running, handling that situation:

testing
And here is the final script:

from pwn import *
import string


def get_min_args(zlib_oracle):
    srtd_oracle = sorted(zlib_oracle, key=lambda i: zlib_oracle[i])
    min_value = zlib_oracle[srtd_oracle[0]]
    min_args = []
    for arg in srtd_oracle:
        if zlib_oracle[arg] == min_value:
            min_args.append(arg)
        else:
            break
    return min_args


if __name__ == "__main__":
    # r = process(argv=["python", "compress_and_attack.py"])
    r = remote("mercury.picoctf.net", 50899)
    alphabet = string.ascii_letters + string.digits + "_}"
    base = ["picoCTF{"]
    found = False    
    while not found:
        zlib_oracle = {}
        for partial in base:
            for char in alphabet:
                try:
                    print(r.recvuntil("encrypted: ").decode(), end="")
                    payload = partial + char
                    r.sendline(payload)
                    print(payload)
                    r.recvline()
                    r.recvline()
                    val = int(r.recvline().decode()[:-1])
                    zlib_oracle[payload] = val
                except:
                    # server closes the connection after some time
                    r = remote("mercury.picoctf.net", 50899)
        base = get_min_args(zlib_oracle)
        if len(base) == 1 and base[0][-1] == '}':
            found = True
            r.close()
    print("Flag found: {}".format(base[0]))

The script should be self-explainatory, zlib_oracle is a dictionary with user input as key, and len(compress(flag + user_input)) as value.

picoCTF{sheriff_you_solved_the_crime}