# ASIS 2020

## General Comments

I was bribed with pizza to rejoin this CTF after two weeks of being
largely offline. I didn’t play on Friday, as I was lost in the
braindance of *Cyberpunk 2077*, but was able to quit and reboot into
Linux on Saturday.

But oof, this CTF had some *actual* hiccups, not like the fake critiques
I’ve unfairly levied against other CTFs above. A bunch of things seemed
to have slipped through QA: there were a couple of broken problems and
questionable code in the crypto/ppc sections; `poiko` also indicated
that the rev challenges were really uninspired, everything boiling down
to obfuscated x86 with guessy elements(?).

However, looking at CTFTime now, especially the voting section, it seems almost like there’s reverse group-think going on, like suddenly it’s Socially Acceptable to be Critical or to “offset” high votes with exaggeratedly low ones, so everyone’s letting off steam. Does this CTF really only deserve 20 points? That’s goddamn vicious. Was it really Ekoparty or e-Jornadas level? No way.

Their heart was clearly in the right place. The difficulty level was
alright, and the *intent* seemed good. Plenty of the tasks were
*interesting*, but sure, there were Problems and a
couple of tasks with bad “quality assurance.” I still feel this CTF
was leagues better than OSINT-riddled pub-quiz or escape-room CTFs. It
tasted like at least 50-60 points to me, even given the fuckups for a
few of the tasks.

## Chloe

(Aside: both this and *Coffeehouse* were coded in a certain style that I
dislike. It seems to be by a Python 2 programmer who’s afraid of using
anything but the trusty old `str` datatype, and insists on using it as
if it were `bytes`. This programmer also enjoys converting X-bit
numbers to text strings of “0” and “1” to do binary operations, as if
working with actual numbers is a bit scary.)

So *Chloe* involved a fair bit of “reversing” the bad code to get data
into types that are actually sane.

What you get is `cipher + byte(r)*16` (48 bytes) xored with a 16-byte
`key`. `r` is a random byte.

So modulo an 8-bit brute force, `key` (originally from `os.urandom`) is
already given in the last 16 bytes of the output.
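In code, that key recovery might look like this (a sketch with my own names, assuming the 16-byte key is simply repeated across all 48 bytes):

```python
def key_candidates(output):
    # output = (cipher + bytes([r])*16) xored with the repeated 16-byte key;
    # the last 16 bytes are then bytes([r])*16 ^ key, so each guess of the
    # random byte r yields exactly one key candidate
    assert len(output) == 48
    tail = output[32:]
    for r in range(256):
        yield r, bytes(t ^ r for t in tail)
```

Each of the 256 candidates then gets tested downstream against the rest of the output.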

Now for `cipher`, which is calculated in some Feistel-like manner with an
added permutation at the end:

```
L, R = flag[:8], flag[8:]
for i in range(16):
    L, R = R, L ^ R ^ roundKey[i] ^ (2**64-1 if i % 2 else 0)
cipher = permute(L + R)
```

The Feistel function is just a simple xor and predictable negation, so the whole “Feistel network” easily collapses: you can simply track which xor keys end up applied to the left and right halves:

```
from functools import reduce
import operator as op

# precalculate: (the magic numbers are a[i] = a[i-1]^a[i-2]^2**(i-1),
# starting from [0,1])
xorL = reduce(op.xor, [roundKey[i] for i in bits_by_idx(14043)])
xorR = reduce(op.xor, [roundKey[i] for i in bits_by_idx(28086)])
# and the above Feistel loop simply collapses to:
cipher = permute(flag ^ (xorL << 64) ^ xorR)
```

The `roundKey` array is given by the `key` found earlier.

The last `permute()` step here performs a fixed random permutation on
the bits in each byte. There are only $8!$ such permutations, so finding
the inverse one is also easily brutable (~15-16 bits). To brute force
the permutation I did the simple thing of turning each candidate
permutation into a 256-byte translation table, so I could simply do
`cipher.translate(Pinv[i])` to test a permutation.

So: cycle through the candidate bytes and permutations, reverse the permutation, do a simple xor, and check whether it yields flag data.
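A sketch of the translation-table idea (the bit-order convention here is my guess; the real permutation may index the bits the other way):

```python
from itertools import permutations

def table(perm):
    # perm[j] = index of the source bit that lands in output bit j
    # (one plausible convention; the task's own may be reversed)
    return bytes(
        sum(((b >> perm[j]) & 1) << j for j in range(8))
        for b in range(256)
    )

# Pinv = [table(p) for p in permutations(range(8))]  # all 8! = 40320 tables
# candidate = cipher.translate(Pinv[i])              # one cheap test per table
```

Building all 40320 tables up front means each permutation test is a single `bytes.translate()` call.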

## Coffeehouse (cafe house)

A 32-bit block cipher which uses xor, add, and shift in a loop for diffusion. I.e. it takes the flag and encrypts 4-byte blocks by looping over each block 32 times, doing various add-xor-shift operations with the key and a counter.

I really wish the cipher wasn’t 32-bits because it’s just too tempting to brute force…

You only get the encrypted flag as data, so I guess that’s clue #1 that
at least a *partial* brute force was expected. You guess that no UTF-8
is used in the flag, so there are N bits that you know were 0 in the
plaintext, and from there you can filter keys by writing the trivial
`decrypt()` function which was missing. Technically this dumb way ends
up being around a 37-bit(?) brute force.

This is what I did and I’m not particularly proud of it. My program (C++) found the solution in seconds, but a numpy equivalent with parallel keys would probably not be much slower.
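To illustrate the filtering idea with a toy stand-in (this is *not* the challenge’s actual round function, just an invertible add-rotate-xor loop of my own invention; only the “keep keys whose plaintext looks like ASCII” filter carries over):

```python
def encrypt_block(v, key):
    # toy 32-round add/xor/shift loop on a 32-bit block
    for i in range(32):
        v = (v + (key ^ i)) & 0xFFFFFFFF          # mix in key material
        v = ((v << 5) | (v >> 27)) & 0xFFFFFFFF   # rotate for diffusion
        v ^= 0x9E3779B9                           # xor a constant
    return v

def decrypt_block(v, key):
    # exact inverse of the loop above, rounds in reverse
    for i in reversed(range(32)):
        v ^= 0x9E3779B9
        v = ((v >> 5) | (v << 27)) & 0xFFFFFFFF
        v = (v - (key ^ i)) & 0xFFFFFFFF
    return v

def surviving_keys(ct_blocks, candidates):
    # keep keys whose decryption has every byte's top bit clear (ASCII-ish)
    def ascii_ok(v):
        return all((v >> s) & 0x80 == 0 for s in (0, 8, 16, 24))
    return [k for k in candidates
            if all(ascii_ok(decrypt_block(c, k)) for c in ct_blocks)]
```

With four blocks of known-ASCII plaintext you already get ~16 bits of filtering per candidate key, which is why the 32-bit key space falls in seconds to a C++ (or batched numpy) search.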

My guess is that maybe the task author *intended* you to find some way
to strength-reduce the loop? Hmmm. Like, what if it was 64-bit or
128-bit instead? Or the inner loop had several orders of magnitude more
rounds? Then one would have to solve it by finding invariants… And
suddenly it’s a much cooler task—given that there *is* a solution to
these strengthened problems. I don’t really know, I didn’t look into
it, I just did the bad thing and moved on. But my intuition tells me
that the diffusion step is really weak and there’s plenty of invariants
to be found, so I wouldn’t have complained if I was forced to look into
it more, forced to not be lazy. (Again, given that there actually is a
solution?)

## congruence

I spent the better half of Saturday evening looking at this task. Spoiler: I didn’t solve it. (I don’t think anyone did? I think many people were in the same boat as me, spent a lot of time on it, and maybe ended up frustrated?)

It seemed magical, a promise of real beauty! So simple, so clean, pure
mathematics, yet I had no idea how to solve it. That’s the best kind of
feeling, isn’t it? When you feel stupid because of *math*. It drew me
in immediately.

Secrets: $flag$, $p$ (a fixed prime modulus of length ~512 bits), and $s$ (presumably a very small non-negative integer).

You’re given five different pairs of numbers $\lfloor a_i/256 \rfloor$ and $c_i = a_i^e \pmod{p}$. Each of the numbers $a_i$ is constructed from an ostensibly random alphanumeric bytestring of the same length as the flag. You’re also given $flag^e \pmod{p}$. So far so good. The coup de grâce is the exponent: $e = 7 \cdot 37 \cdot 191 \cdot 337 \cdot 31337 \cdot 2^s$.

Hmmm…? The usual Euler hack of finding $p$ from $p \cdot k = \gcd(a_i^e - c_i, a_j^e - c_j)$ (for some small $k$) doesn’t work because $e$ is too large, even if $s=0$. The numbers involved would be hundreds of gigabytes. Oh my!
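For reference, the standard trick when $e$ is small looks something like this (obviously infeasible here, where $a_i^e$ would never fit in memory):

```python
from math import gcd

def recover_p(a1, c1, a2, c2, e):
    # p divides a**e - c for every pair, so the gcd of two such values
    # is p times a (hopefully small) cofactor, which we strip off
    g = gcd(a1**e - c1, a2**e - c2)
    for q in range(2, 10000):   # p itself is huge, so this can't remove it
        while g % q == 0:
            g //= q
    return g
```

The whole task is that with $e \approx 5 \cdot 10^{11}$ you cannot reify $a_i^e$ at all, so some indirect route is needed.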

The fact that the $a_{i}$ numbers get their last byte censored seems to indicate that a solution has to be easily calculable, because the task will necessarily(?) involve a small brute-force step to test a given solution.

So, some ideas I went over:

- is there some way to calculate `remainder(a^e, b^e - c)` iteratively,
  without fully reifying the numbers involved?
- sub-question: is there some way to calculate the remainder or gcd
  directly from vectors of residues modulo sane primes? Prime-factor FFT?
- can continued fractions be used somehow, as continued fractions and gcd
  are very similar algorithms?
- can some clever algebra be used to relieve $e$ of some of its odd
  factors? In particular: has $p$ been carefully picked to be revealed in
  one of the low-term expansions of some $a^e - b^e$ expression?
- is $p$ of a special form? E.g. does $e$ divide into $p-1$ giving a
  power-of-two quotient? (I can’t recall if I even tested these last two,
  because it would have made the task ugly and idiotic; I didn’t want it
  to be idiotic, I really wanted to preserve the task’s beauty.)
- in the relations $a_i^e - c_i = k_i \cdot p$, we *can* recover
  $k_i \cdot p \pmod{q}$ for saner numbers $q$ (in particular we can find
  small factors of $k_i$), but how does that help? All the $k_i$ integers
  are too big to actually be used or reconstructed.
- is the special form of the $a_i$ numbers relevant?
- the $a_i$ plaintext numbers are *ostensibly* random, but are they
  really? Could a seed have been fixed such that it can be used to reveal
  $p$ or $flag$ in a manner ordinarily impossible? Hmm, unlikely, right?
- does it help to find vectors (e.g. LLL) combining the given terms to 0?

Most of these avenues I never really tested. What kept stopping me was the brute-force step on the $a_i$ terms: I felt I had to be damn sure about an implementation, something I could actually verify, before brute-forcing, lest I waste a lot of time.

So in the end I really disliked the brute-force step added to this task.
(Unless it wasn’t intended as a brute-force step, but some kind of magic
fairy-land lattice problem? I don’t see how that’s possible, though?) It
seemed unnecessary. I also don’t know why $s$ was hidden as well; that
just seemed *overly* sadistic.

But in the end I never really became sure of anything. If there *is* a
solution, I would be very happy if someone could enlighten me! So far my
best intuition is in the direction of some novel FFT application?

## Trio couleurs

Spoiler: I didn’t solve this one either, but noting my thoughts.

I think this task was released during the day on Saturday. The server
instantly disconnected you at first, so I ignored it for a while, and
only came back to it when I finally gave up on *congruence* above. By
then it was late and I found that this was a really work-intensive
task…

(Again this style of converting numbers to text strings of “0” and
“1”, so I take it it’s the same task author as *Chloe* and
*Coffeehouse*, who does seem to have a particular phobia.)

Code is given which looks like DES, used in some broken 3DES-ish manner. It didn’t encrypt the same as a reference DES implementation, so I had to waste time figuring out the difference. Turns out it was regular DES reduced to 8 rounds. (Unless this kind of debugging is relevant to the task, I wish these things were made clearer, with the task simply stating sincerely that it was “DES reduced to 8 rounds” or similar.)

But OK, so it seemed like a linear cryptanalysis task. The 3DES part of the task is broken, since it trivially leaks the penultimate ciphertext.

It doesn’t actually seem like a bad task, it’s “true” crypto, though I believe it requires some prior experience to be doable in hours as opposed to days. Key discovery from 8-round DES is, as far as I know, not trivial, and requires a lot of plaintexts. I haven’t done it before, nor do I have any linear-analysis code, so this contributed to my decision to skip the task, as it would potentially be a big time sink, I needed sleep, and not many hours remained of the CTF.

It’s something I could have fun solving off-line though.

## Baby MD5

I’m embarrassed to admit I also wasted a long time staring at this stupid task doing exactly what the task encouraged me not to do: overthink it. I wrote a script to solve ASIS’ proof-of-work using all available cores to make reconnects fast, and I tried to sample the parameters to see if it would sometimes give $n=m$ in which case you can make an MD5 collision and… Well, I suppose this was the “trap” set up to trick idiots like me.

While I was doing this, `poiko` came along and remarked “hm, isn’t
‘dead’ hex, though? Couldn’t you—” and I instantly cringed and
facepalmed pretty hard. Yeah, indeed… Dumb, dumb, dumb.

Connect, assert that $m>n$, and then find a string `X` such that
iterating the function `lambda x: md5(x.encode()).hexdigest()` starting
with `prefix + X` produces a hex string that starts with `dead` after
$m-n$ iterations; that hex string is the second input. You could also do
this trick for $m<n$ if the random prefix turns out to be (lowercase)
hex.
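A minimal sketch of the trick (function and parameter names are mine; the challenge’s actual protocol details differ):

```python
import hashlib
from itertools import count

def md5_iter(s, n):
    # apply hex-digest md5 n times to a string
    for _ in range(n):
        s = hashlib.md5(s.encode()).hexdigest()
    return s

def find_second_input(prefix, m, n, target="dead"):
    # assumes m > n: search a suffix X so that md5 iterated (m - n) times
    # over prefix+X starts with the target hex string; that digest is the
    # second input, since iterating it n more times equals md5^m(prefix+X)
    assert m > n
    for i in count():
        h = md5_iter(prefix + str(i), m - n)
        if h.startswith(target):
            return prefix + str(i), h
```

Since the digest is lowercase hex, a string starting with `dead` is itself a valid “looks like hex” second input, which was the whole point `poiko` spotted.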

## גל התקפה (Attack Wave)

I solved this one while taking a break from *congruence* above.

The boring part: a 5.1 WAV file. Two channels were a synthetic voice
saying what sounded like “RGBA,” and the other four channels contained
random noise. The noise data were all in `[0,255]`, so I extracted
that, treated it as image data, and that’s where the fun^{1} begins.

It’s what I like to call a “numpy massaging” task. In my mind these
tasks exist in a grey area between the guessy and the
mathematical^{2}. You don’t have the source, you don’t know what the
task authors did or their reasons for doing it, so in that sense it’s
guessy, but what you do have is pure, cold data, and with data you can
“use the force, Luke.” There are certain intuitions to follow in
seeking low entropy, seeking order in the chaos. As long as you’re
given some breadcrumb trail of patterns to start with, it’s usually
fine.

For example, displaying the byte data as images of various dimensions shows semi-repeated patterns at the top and bottom (low entropy), whereas the middle of the image appears much more random (high entropy), which makes it “feel” like an image with maybe uniform coloring around the edges and text in the center.

Now we seek to lower this entropy with various basic operations. Xoring
all the channels together directly only increases the entropy, but I
noticed that the three “RGB” channels were very similar (whereas the
“A” channel was more chaotic). Looking at the differentials of where
certain numbers or bits occur is a good way to get a feel for the
repeated patterns. They start out with this very cyclic pattern, but
each is offset a little. So rotating the arrays by small offsets
aligns them to produce *very* low-entropy patterns, differing in very
few bits, so you know you’re closer. I combined this trick with the
more mysterious “A” layer as well, and combining “A” with any of the
other channels (or all three) quickly produced an image with the flag
clearly readable.
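The alignment search can be sketched like so (illustration only; the actual offsets and channel handling were more ad hoc):

```python
import numpy as np

def best_alignment(a, b, max_shift=64):
    # find the small cyclic offset of channel b that best aligns it with
    # channel a, by minimizing the number of differing bits (a crude
    # proxy for "lower entropy")
    return min(
        range(-max_shift, max_shift + 1),
        key=lambda s: int(np.unpackbits(a ^ np.roll(b, s)).sum()),
    )
```

Once the channels are aligned, xoring them pairwise leaves only the few bits that actually differ, which is where the flag pixels lived.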

Unfortunately I have no idea what the author’s intent was. From the periodic artifacts still present in the above image, my guess is that there was indeed some “intended” way where you get an actual clean image, possibly with the letters all black and clearly defined. If I were to guess further it might have something to do with the fact that the data size makes it feel like there’s a row/column missing from the image, that the data given is actually just differentials between the pixels… But I didn’t investigate further after I got the flag. It was a fairly quick task.

## Election & Galiver

I don’t exactly know when these tasks were released, but I didn’t see them until I joined up on Sunday, with like 4-5 hours left of the CTF. Oof.

Why, oh why, do most CTFs insist on these staggered releases? Every damn
time. Is it intended to “keep things exciting” in the competitive
sense until the very end? Because I don’t think it’s working^{3}.

*Galiver* looked cool, seemed to be DH with gaussian integers, but I
made the terrible choice of focusing on the PPC task *Election* instead
because it “seemed easy.”

Then I invested too much into *Election* and I was sunk by my own
sunk-cost fallacy. The cherry on top was that this task was actually
bugged server-side.

*Election* was a PPC task where you are given a string of base-10 digits
and have to respond with some binary arithmetic expression `X op Y`
(where `op` is one of `{+,-,*,/,**}`) in which these digits appear as a
substring. The expression you give cannot be more than 8 characters
long. You can also just enter a single number, so for levels up to
strings of length 8 you can simply feed back the string you were given,
provided it doesn’t start with a 0.

Each level, the base-10 string you get grows by one digit. There’s an
unknown number of levels, which is the first guessy part, because at
first you don’t really know what you’re up against. Making a small
in-memory database of the shorter expressions and their digits gets you
to length-12+ strings, at which point you realize you’ll need a full
database of all strings producible by these arithmetic expressions. I
dumped these to external files and just used `grep` in my solve script.
These files were several gigabytes each.
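A toy version of the offline-database idea (scaled way down; the real thing used `gmpy2` and multi-gigabyte dumps searched with `grep`, and these helper names are made up):

```python
def power_table(max_base=9, max_exp=99):
    # map each short D**DD-style expression to its full digit string
    table = {}
    for a in range(2, max_base + 1):
        for b in range(2, max_exp + 1):
            expr = f"{a}**{b}"
            if len(expr) <= 8:          # the server's length limit
                table[expr] = str(a ** b)
    return table

def lookup(table, substr):
    # return some expression whose digit expansion contains substr
    return next((e for e, d in table.items() if substr in d), None)
```

The same shape of table, with files instead of a dict, is what the `grep`-based solve script queried each level.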

The addition, subtraction, and multiplication expressions are trivial and can be ignored.

All the guessy elements stem from the *server-side code being hidden*.
That’s *always* a bad sign. First you have to assume (pray) that it
actually picks *valid* substrings. OK, fine, I’ll allow it, but it
doesn’t feel *great* to trust the author like this, especially when
knowing that the author is merely human, and the code might not have
been tested by others.

The exponentiation expressions are the simplest to dump, but the most
time-consuming in CPU time (`gmpy2`, not Python’s slow bigints), so I
started with those. Constructing the database of digit expansions from
the divisions is more brain-intensive and tricky if you want to do it
“properly.” With some pen and paper and Wikipedia I re-derived that the
repetend in the expansion of, for example, `a/b` has length $k$, where
$k$ is the least number such that $10^k \equiv 1 \pmod{b}$. Thus for
prime $b$ it can be up to $b-1$ digits long — so I figured this part of
the database might require a magnitude more disk space, because the
numbers involved are larger (`D**DDDDD` vs `D/DDDDDD`), and thus wanted
to take some extra care to only store what is necessary. I knew that for
the cyclic numbers the numerator can be fixed to 1, but it was unclear
to me whether $a/b$ for composite numbers could give new “unique”
decimal sequences. I don’t have that much SSD disk space…
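The repetend-length computation itself is short (a sketch; it strips the factors of 2 and 5 first, since those only contribute a non-repeating prefix):

```python
def repetend_length(b):
    # length of the repeating part of 1/b in base 10:
    # the least k with 10**k == 1 (mod b), after removing factors of 2 and 5
    for f in (2, 5):
        while b % f == 0:
            b //= f
    if b == 1:
        return 0        # the decimal expansion terminates
    k, r = 1, 10 % b
    while r != 1:
        r = (r * 10) % b
        k += 1
    return k
```

For prime $b$ this is a divisor of $b-1$, and hits $b-1$ exactly when 10 is a primitive root mod $b$, which is why 6-digit denominators can have repetends hundreds of thousands of digits long.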

But bottom line: expressions of the form `D/DDDDDD` can give *up to*
$999999+\epsilon$ digits before repeating. And this is where the
server-side bug comes in. As my database kept filling up I started
encountering this:

```
[DEBUG] Sent 0x9 bytes: b'1/102451\n'
[DEBUG] Received 0x60 bytes: b'sorry, the result of your expression does not contain the given substr: 07718811919\n'
```

Yet:

```
>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 200000
>>> '07718811919' in str(Decimal(1)/102451)
True
```

Hmmm. I kept getting more and more of these errors, barring me from
further levels. I didn’t really know what was going wrong. I thought
maybe I had a bug somewhere making it skip certain “easy” expressions
and giving the server fractions it didn’t expect (like if the server
erroneously expected some fixed low-term expression for the later
levels). It felt really bad to be in the dark. Finally I went on IRC and
tried to contact an admin. It turned out the server actually only
calculates fractions to `99999` digits:

```
[...]
15:56 <factoreal> see this:
15:56 <factoreal> if len(formula) <= 8:
15:56 <factoreal> if re.match(regex, formula):
15:56 <factoreal> try:
15:56 <factoreal> if '/' in formula:
15:56 <factoreal> a, b = formula.split('/')
15:56 <factoreal> res = mpfr(Fraction(int(a), int(b)), 99999)
15:56 <factoreal> else:
15:56 <factoreal> res = eval(formula)
15:56 <factoreal> if substr in str(res):
15:56 <factoreal> return True
15:57 <franksh> hm, yeah, so bugged. :/ 99999 isn't enough to cover all repetends/decimal expansions of some fractions.
15:58 <franksh> but i didn't find a way to get or see server source?
15:59 <factoreal> the cycle of 1/p where p is prime is p-1
15:59 <factoreal> franksh I will share it after CTF, ok?
16:00 <franksh> yeah, but 6-digit numbers are valid in d/dddddd so need an order of magnitude more to cover all
16:00 <franksh> and sure
16:01 <factoreal> you are right
```

The classic off-by-one-order-of-magnitude bug. But by then, as shown by the timestamps above (the CTF ended at 16:00 for me), it was too late to do anything. That felt pretty shitty, and it was a sour note to end on.

From IRC, I think the one solver of this task said something about only
searching fractions `1/p` with (fixed) 30000 precision(?), which seems
like pure luck (or they’re just a very good guesser). That also gives a
clue to a *third* problem with the task: it didn’t select the base-10
strings randomly at all. You could “guess” that exponentiation was
irrelevant, and only search `1/p` up to 100000 in an on-line fashion?
Urk.

Pretty disappointing. Should probably have done *Galiver* instead. Oh
well, offline I guess.

For some definition of fun. I must admit I sort of enjoy data/numpy massaging, especially when it’s precise binary data (as opposed to analog data where you spend most of the time on cleanup or noise filtering).↩

“True” guessy challenges for me are more like OSINT riddles or web challs where you have no data or feedback—you simply have to “get” the core idea, and it’s a binary yes/no. “Numpy massaging” tasks become a sort-of subgame where you play with combining arrays or matrices while trying to make them “fit,” and there *is* some indication or feeling of whether you’re “closer” to a solution or not, e.g. by seeing what patterns you’re able to produce. *Xoared* from BalCCon2k CTF was another example like this, where you do simple transforms of some blob of data, seeking lower and lower entropy, and eventually it suddenly becomes a readable image.↩

Besides, online CTFs are completely broken in terms of the “competitive element,” since forming super-teams ad infinitum and rolling up with a 20-man group is all fair play, given the right communication tools and politics. The only “competitive” element that exists is between antisocial dissidents that have refused to merge into super-teams (yet)?↩