ASIS 2020

General Comments

I was bribed with pizza to rejoin this CTF after two weeks of being largely offline. I didn’t play on Friday, as I was lost in the braindance of Cyberpunk 2077, but was able to quit and reboot into Linux on Saturday.

But oof, this CTF had some actual hiccups, not like the fake critiques I’ve unfairly levied against other CTFs above. A bunch of things seemed to have slipped through Q&A, there were a couple of broken problems and questionable code in the crypto/ppc sections; poiko also indicated that the rev challenges were really uninspired, everything boiling down to obfuscated x86 with guessy elements(?).

However, looking at CTFTime now, especially the voting section, it seems almost like there’s reverse group-think going on, like suddenly it’s Socially Acceptable to be Critical or to “offset” high votes with exaggeratedly low ones, so everyone’s letting off steam. Does this CTF really only deserve 20 points? That’s goddamn vicious. Was it really Ekoparty or e-Jornadas level? No way.

Their heart was clearly in the right place. The difficulty level was alright, and the intent seemed good. Plenty of the tasks were interesting, but sure, there were Problems and a couple of tasks with bad “quality assurance.” I still feel this CTF was leagues better than OSINT-riddled pub-quiz or escape-room CTFs. It tasted like at least 50-60 points to me, even given the fuckups for a few of the tasks.

Chloe

(Aside: both this and Coffeehouse were coded in a certain style that I dislike. It seems by a Python 2 programmer that’s afraid of using anything but the trusty old str datatype, and insists on using them as if they were bytes. This programmer also enjoys converting X-bit numbers to text strings of “0” and “1” to do binary operations as if working with actual numbers is a bit scary.)

So Chloe involved a fair bit of “reversing” the bad code to get data into types that are actually sane.

What you get is cipher + byte(r)*16 (48 bytes) xored with a 16-byte key. r is a random byte.

So modulo an 8-bit brute force, key (originally from os.urandom) is already given in the last 16 bytes of the output.

Now for cipher which is calculated in some Feistel-like manner with an added permutation at the end:

L,R = flag[:8],flag[8:]

for i in range(16):
  L,R = R, L ^ R ^ roundKey[i] ^ (2**64-1 if i%2 else 0)

cipher = permute(L + R)

The Feistel function is just a simple xor and predictable negation, so the whole “Feistel network” is easily collapsible, as you can simply check the xor keys that end up being used in left and right side:

# precalculate: (the magic numbers are a[i] = a[i-1]^a[i-2]^2**(i-1) starting
# from [0,1])
xorL = reduce(op.xor, [roundKey[i] for i in bits_by_idx(14043)])
xorR = reduce(op.xor, [roundKey[i] for i in bits_by_idx(28086)])

# and the above Feistel-loop simply collapses to:
cipher = permute(flag ^ (xorL << 64) ^ xorR)

The roundKey array is given by the key found earlier.

The last permute() step here performs a fixed random permutation on the bits in each byte. There’s only $8!$ such permutations so finding the reverse one is also easily brutable (~15-16 bits). To brute force the permutation I just did it the simple thing by turning each permutation into a 256-byte translation table, so I could simply do cipher.translate(Pinv[i]) to test a permutation.

So: cycle through the bytes, reverse permutation, and do a simple xor and check if it gives flag data.

Coffeehouse (cafe house)

A 32-bit block-cipher which uses a xor, add, and shift in a loop for diffusion. I.e. it takes the flag and encrypts 4-byte blocks by looping over the block 32 times and doing various add-and-xor-shift with the key and a counter.

I really wish the cipher wasn’t 32-bits because it’s just too tempting to brute force…

You only get the encrypted flag as data, so I guess that’s clue #1 that at least a partial brute-force was expected. You guess that no utf-8 is used in the flag so there’s N bits that you know were 0 in the plaintext, and from there you can filter keys by writing the trivial decrypt() function which was missing. Technically this dumb way ends up being around 37-bit(?) brute force.

This is what I did and I’m not particularly proud of it. My program (C++) found the solution in seconds, but a numpy equivalent with parallel keys would probably not be much slower.

My guess is that maybe the task author intended you to find some way to strength-reduce the loop? Hmmm. Like, what if it was 64-bit or 128-bit instead? Or the inner loop had several orders of magnitude more rounds? Then one would have to solve it by finding invariants… And suddenly it’s a much cooler task—given that there is a solution to these strengthened problems. I don’t really know, I didn’t look into it, I just did the bad thing and moved on. But my intuition tells me that the diffusion step is really weak and there’s plenty of invariants to be found, so I wouldn’t have complained if I was forced to look into it more, forced to not be lazy. (Again, given that there actually is a solution?)

congruence

I spent the better half of Saturday evening looking at this task. Spoiler: I didn’t solve it. (I don’t think anyone did? I think many people were in the same boat as me, spent a lot of time on it, and maybe ended up frustrated?)

It seemed magical, a promise of real beauty! So simple, so clean, pure mathematics, yet I had no idea how to solve it. That’s the best kind of feeling, isn’t it? When you feel stupid because of math. It drew me in immediately.

Secrets: $f l a g$ , $p$ (a fixed prime modulus of length ~512 bits), and $s$ (presumably a very small non-negative integer).

You’re given five different pairs of numbers $⌊ a_{i} /256 ⌋$ and $c_{i} = a_{i}^{e} (mod p)$ . Each of the numbers $a_{i}$ are constructed from ostensibly random alphanumeric bytestring of the same length as flag. You’re also given $f l a g^{e} (mod p)$ . So far so good. The coup-de-grâce is the exponent: $e = 7 \cdot 37 \cdot 191 \cdot 337 \cdot 31337 \cdot 2^{s}$ .

Hmmm…? The usual Euler hack of finding $p$ from $p \cdot k = g c d (a_{i}^{e} - c_{i}, a_{j}^{e} - c_{j})$ (for some small $k$ ) doesn’t work because $e$ is too large, even if $s = 0$ . The numbers involved would be hundreds of gigabytes. Oh my!

The fact that the $a_{i}$ numbers get their last byte censored seems to indicate that a solution has to be easily calculable, because the task will necessarily(?) involve a small brute-force step to test a given solution.

So, some ideas I went over:

is there some way to calculate remainder(a^e, b^e - c) iteratively, without fully reifying the numbers involved?
sub-question: is there some way to calculate the remainder or gcd directly from vectors of residues modulo sane primes? Prime-factor FFT?
can continued fractions be used somehow, as continued fractions and gcd are very similar algorithms.
can some clever algebra be used to relieve $e$ of some of its odd exponents? In particular: has $p$ been carefully picked to be revealed in one of the low-term expansions of some $a^{e} - b^{e}$ expression?
is $p$ of a special form? E.g. does $e$ divide into $p - 1$ giving a power-of-two quotient? (I can’t recall if I even tested these last two, because it would have made the task ugly and idiotic; I didn’t want it to be idiotic, I really wanted to preserve the task’s beauty.)
in the relations $a_{i}^{e} - c_{i} = k_{i} \cdot p$ , we can recover $k_{i} \cdot p (mod q)$ for saner numbers $q$ (in particular we can find small factors of $k_{i}$ ), but how does that help? All the $k_{i}$ integers are too big to actually be used or reconstructed.
is the special form of the $a_{i}$ numbers relevant?
the $a_{i}$ plaintext numbers are ostensibly random, but are they really? Could a seed have been fixed such that it can be used to reveal $p$ or $f l a g$ in a manner ordinarily impossible? Hmm, unlikely, right?
does it help to find vectors (e.g. LLL) combining the given terms to 0?

Most of these avenues I never really tested. What kept stopping me was the brute force step on the $a_{i}$ terms: I felt I had to be damn sure about an implementation lest I waste a lot of time, something I could actually verify before brute-forcing.

So I ended up really disliking the brute force step added to this task actually. (Unless it wasn’t intended as a brute-force step, but some kind of magic fairy-land lattice problem? I don’t see how that’s possible, though?) It seemed unnecessary. I also don’t know why $s$ was hidden as well, that just seemed overly sadistic.

But in the end I never really became sure of anything. If there is a solution, I would be very happy if someone could enlighten me! So far my best intuition is in the direction of some novel FFT application?

Trio couleurs

Spoiler: I didn’t solve this one either, but noting my thoughts.

I think this task was released during the day on Saturday. The server instantly disconnected you at first, so I ignored it for a while, and only came back to it when I finally gave up on congruence above. By then it was late and I found that this was a really work-intensive task…

(Again this style of converting numbers to text strings of “0” and “1”, so I take it it’s the same task author as Chloe and Coffeehouse, who does seem to have a particular phobia.)

Code is given which looks like DES, used in some broken 3DES-ish manner. It didn’t encrypt the same as a reference DES implementation, so then I have to waste time to figure out the difference. Turns out it was regular DES reduced to 8 rounds. (Unless this kind of debugging is relevant for the task, I wish the task made these things clearer, and simply stated sincerely that it was “DES reduced to 8 rounds” or similar.)

But OK, so it seemed like a linear cryptoanalysis task. The 3DES part of the task is broken since it trivially leaks the penultimate ciphertext.

It doesn’t actually seem like a bad task, it’s “true” crypto, though I believe it requires some prior experience to be doable in hours as opposed to days. Key discovery from 8-round DES is, as far as I know, not trivial, and require a lot of plaintexts. I haven’t done it before, nor do I have any linear analysis code, so this contributed to my decision to skip the task, as it would potentially be a big time sink, I needed sleep, and not many hours remained of the CTF.

It’s something I could have fun solving off-line though.

Baby MD5

I’m embarrassed to admit I also wasted a long time staring at this stupid task doing exactly what the task encouraged me not to do: overthink it. I wrote a script to solve ASIS’ proof-of-work using all available cores to make reconnects fast, and I tried to sample the parameters to see if it would sometimes give $n = m$ in which case you can make an MD5 collision and… Well, I suppose this was the “trap” set up to trick idiots like me.

While I was doing this, poiko came along and remarked “hm, isn’t ‘dead’ hex, though? Couldn’t you—” and I instantly cringed and facepalmed pretty hard. Yeah, indeed… Dumb, dumb, dumb.

Connect, assert that $m > n$ and then find a string X such that iterating the function lambda x: md5(x.encode()).hexdigest() starting with prefix + X produces a hex string that starts with dead after $m - n$ iterations; that hex string is the second input. Could also do this trick for $m < n$ if the random prefix turns out to be (lowercase) hex.

גל התקפה (Attack Wave)

I solved this one while taking a break from congruence above.

The boring part: 5.1 WAV file, two channels was a synthetic voice saying what sounded like “RGBA” and the other 4 channels contained random noise. The noise data were all in [0,255] so I extracted that, treated is as image data, and that’s where the fun¹ begins.

It’s what I like to call a “numpy massaging” task. In my mind these tasks exists in a grey area between the guessy and the mathematical². You don’t have the source, you don’t know what the task authors did or their reasons for doing it, so in that sense it’s guessy, but what you do have is pure, cold data and with data you can “use the force, Luke.” There are certain intuitions to follow in seeking low entropy, seeking order in the chaos. As long as you’re given some breadcrumb trail of patterns to start with, it’s usually fine.

For example, displaying the byte data as images of whatever dimension shows semi-repeated patterns on the top and bottom (low entropy), whereas the middle of the image appeared much more random (high entropy), which makes it “feel” like an image with maybe uniform coloring around the edge with text in the center, like this:

img "issing?

Now we seek to lower this entropy with various basic operations. Xoring all the channels together direct only increases the entropy, but I noticed that the three “RGB” channels were very similar (whereas the “A” channel was more chaotic). Looking at the differential of where certain numbers or bits occur is a good way to get a feel for the repeated patterns. They start out with this very cyclic pattern, but each is offset a little. So, rotating the arrays to by small offsets aligns them to produce very low-entropy patterns, differing in very few bits, so you know you’re closer. I combined this trick with the more mysterious “A” layer as well, and combining “A” with any of the other channels (or all three) quickly produced an image with the flag clearly readable:

img missing?

Unfortunately I have no idea what the author’s intent was. From the periodic artifacts still present in the above image, my guess is that there was indeed some “intended” way where you get an actual clean image, possibly with the letters all black and clearly defined. If I were to guess further it might have something to do with the fact that the data size makes it feel like there’s a row/column missing from the image, that the data given is actually just differentials between the pixels… But I didn’t investigate further after I got the flag. It was a fairly quick task.

Election & Galiver

I don’t exactly know when these tasks were released, but I didn’t see them until I joined up on Sunday, with like 4-5 hours left of the CTF. Oof.

Why, oh why, do most CTFs insist on these staggered releases? Every damn time. Is it intended to “keep things exciting” in the competitive sense until the very end? Because I don’t think it’s working³.

Galiver looked cool, seemed to be DH with gaussian integers, but I made the terrible choice of focusing on the PPC task Election instead because it “seemed easy.”

Then I invested too much into Election and I was sunk by my own sunk-cost fallacy. The cherry on top was that this task was actually bugged server-side.

Election was a PPC where you are given a string of base-10 digits and have to give some binary arithmetic expression like X op Y (where op can be one of {+,-,*,/,**}) in which these digits can be found as a substring. The expression you give cannot be more than 8 characters long. You can also just enter a single number so for levels up to strings of length 8, you can just feed it back the string given granted it doesn’t start with a 0.

Each level the base-10 string you get has length +1. It has an unknown number of levels, which is the first guessy part, because at first you don’t really know what you’re up against. Making a small in-memory database of the shorter expressions and their digits gets you to length 12+ strings and you realize you’ll need to make a full database of all strings producible by these arithmetic expressions. I dumped these to external files and just used grep in my solve-script. These files were several gigabytes each.

The addition, subtraction, and multiplications are trivial and can be ignored.

All the guessy elements stem from the server-side code being hidden. That’s always a bad sign. First you have to assume (pray) that it actually picks valid substrings. OK, fine, I’ll allow it, but it doesn’t feel great to trust the author like this, especially when knowing that the author is merely human, and the code might not have been tested by others.

The exponentiation expressions are the simplest to dump, but most time-consuming in CPU time (gmpy2 and not Python’s slow bigints), so I started with those. Constructing the database of digit expansions from the divisions are the more brain-intensive and tricky if you want to do it “properly.” From some pen and paper and Wikipedia I re-figured the repetend in the expansion of, for example, a/b is of length $k$ where $k$ is the least number such that $1 0^{k} = 1 (m o d b)$ . Thus for prime $b$ it will be up to $b - 1$ digits — so I figured this part of the database migth require a magnitude more disk space, because the numbers involved are larger (D**DDDDD vs D/DDDDDD), and thus wanted to take some extra care to only store what is necessary. I knew that for the cyclic numbers the numerator can be fixed to 1, but it was unclear to me whether $a / b$ for composite numbers could give new “unique” decimal sequences. I don’t have that much SSD disk space…

But bottom line: expressions on the form D/DDDDDD can give up to $999999 + ϵ$ digits before repeating. And this is where the server-side bug comes in. As my database kept filling up I started encountering this:

[DEBUG] Sent 0x9 bytes: b'1/102451\n'
[DEBUG] Received 0x60 bytes: b'sorry, the result of your expression does not contain the given substr: 07718811919\n'

Yet:

>>> getcontext().prec = 200000
>>> '07718811919' in str(Decimal(1)/102451)
True

Hmmm. I kept getting more and more of these errors, barring me from further levels. I didn’t really know what was going wrong. I thought maybe I had a bug somewhere making it skip certain “easy” expressions and giving the server fractions it didn’t expect (like if the server erroneously expected some fixed low-term expression for the later levels). It felt really bad to be in the dark. Finally I went on IRC and tried to contact admin. It turned out the server actually only calculates fractions to 99999 digits:

[...]
15:56 <factoreal> see this:
15:56 <factoreal> if len(formula) <= 8:
15:56 <factoreal>     if re.match(regex, formula):
15:56 <factoreal>       try:
15:56 <factoreal>         if '/' in formula:
15:56 <factoreal>           a, b = formula.split('/')
15:56 <factoreal>           res = mpfr(Fraction(int(a), int(b)), 99999)
15:56 <factoreal>         else:
15:56 <factoreal>           res = eval(formula)
15:56 <factoreal>         if substr in str(res):
15:56 <factoreal>           return True
15:57 <franksh> hm, yeah, so bugged. :/ 99999 isn't enough to cover all repetends/decimal expansions of some fractions.
15:58 <franksh> but i didn't find a way to get or see server source?
15:59 <factoreal> the cycle of 1/p where p is prime is p-1
15:59 <factoreal> franksh I will share it after CTF, ok?
16:00 <franksh> yeah, but 6-digit numbers are valid in d/dddddd so need an order of magnitude more to cover all
16:00 <franksh> and sure
16:01 <factoreal> you are right

The classic off-by-one-order-of-magnitude bug. But by then, as is shown by the timestamps above (CTF ended 16:00 for me), it was too late to do anything. That felt pretty shitty, and a sour note to end it on.

From IRC, I think the one solver of this task said something about only searching fractions 1/p with (fixed) 30000 precision(?), which seemed like pure luck (or they’re just a very good guesser). That also gives a clue to a third problem with the task, in that it didn’t select the base-10 strings randomly at all. You could “guess” that exponentiation was irrelevant, and only search 1/p up to 100000 in an on-line fashion? Urk.

Pretty disappointing. Should probably have done Galiver instead. Oh well, offline I guess.

for some definition of fun. I must admit I sort of enjoy data/numpy massaging, especially when it’s precise binary data (as opposed to analog data where you spend most of the time on cleanup or noise filtering).↩
“True” guessy challenges for me are more like OSINT riddles or web challs where you have no data or feedback—you simply have to “get” the core idea, and it’s a binary yes/no. “Numpy massaging” tasks become a sort-of subgame where you play with combining arrays or matrices while trying to make them “fit,” and there is some indication or feeling whether you’re “closer” to a solution or not, e.g. by seeing what patterns you’re able to produce. Xoared from BalCCon2k CTF was another example like this, where you do simple transforms of some blob of data, seeking lower and lower entropy, and eventually it suddenly becomes a readable image.↩
Besides, online CTFs are completely broken in terms of the “competitive element,” since forming super-teams ad infinitum and rolling up with a 20-man group is all fair play, given the right communication tools and politics. The only “competitive” element that exists is between antisocial dissidents that have refused to merge into super-teams (yet)?↩

a soft spiracle