# DragonCTF 2020

## General Comments {#dragon2020}

My expectations were a bit high; DragonCTF has put on some of the best CTFs in the past. Yet I couldn't shake the thought that even last year's 24-hour "teaser" CTF was better? So, mixed feelings. I'm not saying this wasn't good, but…? It's hard not to vote with one's heart.

The difficulty was alright, but it felt like very few tasks in total. I'm thinking partly of my poor teammate `poiko` this time, who seemed to get only a single pure rev task, and an easy one at that. I know he was hyped and had looked forward to fiendish VM-in-VMs and unknown architectures. We kept waiting for more to be released, but alas.

For my own sake, *Frying in motion* was a cool idea, though a bit draining. *Bit Flip 1* was easy. *Bit Flip 2 & 3* were puzzling, but not the good kind of puzzling (which motivates me to write code or learn stuff outside of the CTF); more "I feel I'm just missing some (probably) trivial trick here, oh well, next time I guess"-puzzling.

Sunday I ended up watching *Gilmore Girls* while `poiko` was reduced to doing Linux stuff and actual hacking. Imagine having to do actual hacking. What's that about? Sheesh.

Anyway, we placed like dumpster tier. Wasn't our weekend, I guess.

## Frying in motion

I put off this task while looking at the *Bit Flip* tasks because I kind of expected implementing it would be an unfun hellhole of tiny bugs. It's one of those tasks where the idea is simple and/or obvious, but it's a pain in the ass to implement^{1}. Not a fan of that.

But so: the task involves reversing a `strfry()` given the output of another `strfry()`. Fine, sounds like an RNG task.

`strfry()` does this:

```
char *
strfry (char *string)
{
  static int init;
  static struct random_data rdata;

  if (!init)
    {
      static char state[32];
      rdata.state = NULL;
      __initstate_r (random_bits (),
                     state, sizeof (state), &rdata);
      init = 1;
    }

  size_t len = strlen (string);
  if (len > 0)
    for (size_t i = 0; i < len - 1; ++i)
      {
        int32_t j;
        __random_r (&rdata, &j);
        j = j % (len - i) + i;

        char c = string[i];
        string[i] = string[j];
        string[j] = c;
      }

  return string;
}
```

(Aside: is it just me or is the GNU style one of the ugliest code styles in existence?)

So `strfry()` just uses glibc's `random_r()` stuff, with a 32-byte state buffer (of which only 28 bytes are used), initialized with a 32-bit seed based on some CPU clock counter. (`random_bits()` actually gives fewer than 32 bits of real entropy, but let's ignore that.)

It's clearly a "brute the seed" sort of task, because even partially reversing the internal state directly (let's say if it had been initialized with 28 bytes of random data from `urandom`) given `strfry()` output alone is very hard unless many such calls can be queried in a series. (And even then it would be finicky as all hell.)

Another thing the task does is call `strfry()` a lot of times on a temporary string before it does the challenge-response step. The number of resulting `random_r()` calls is predictable, but you don't get any output. So, it's "brute the seed" and then "seek in the RNG stream."

Seeking in glibc's `random_r()` stream turns out to be easy, as it seems to be just a linear recurrence. It does varying things depending on how many bytes of state you give it, but all of them are very simple. It's sort of like a more flexible Mersenne Twister without the output scrambling. For a 32-byte state it will use $a_{n+7} = a_{n+4} + a_n \pmod{2^{32}}$, which can be modelled most simply (to me, anyway) with matrix exponentiation:

```
A = ZZ[2**32].matrix([[0, 0, 1, 0, 0, 0, 1],
                      [1, 0, 0, 0, 0, 0, 0],
                      [0, 1, 0, 0, 0, 0, 0],
                      [0, 0, 1, 0, 0, 0, 0],
                      [0, 0, 0, 1, 0, 0, 0],
                      [0, 0, 0, 0, 1, 0, 0],
                      [0, 0, 0, 0, 0, 1, 0]])
# now you can model state after n steps as A**n * state
# modulo indexing errors.
```

This is the same trick for seeking in plenty of RNGs whose internals are linear or "affine," like xoshiro and MT. Note that the state holds the last 7 outputs from the RNG, which will come in handy. Note also that the RNG discards the lowest bit when giving actual output. (But due to the lack of scrambling, the low-order bits will still be pretty bad by most RNG standards.)
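As a concrete illustration (my own sketch, not code from the solve script), here's the recurrence stepped both one output at a time and via a companion-matrix power, using plain Python ints to dodge overflow issues:

```
# Sketch: step the glibc TYPE_1 recurrence a[n+7] = a[n+4] + a[n] (mod 2^32)
# either naively, or n outputs at once via a matrix power.
M = 2**32

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % M for j in range(n)]
            for i in range(n)]

def matpow(A, e):
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    while e:
        if e & 1:
            R = matmul(R, A)
        A = matmul(A, A)
        e >>= 1
    return R

# companion matrix; state vector is [a[n+6], a[n+5], ..., a[n]], newest first
C = [[0, 0, 1, 0, 0, 0, 1],
     [1, 0, 0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0, 0, 0],
     [0, 0, 1, 0, 0, 0, 0],
     [0, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 1, 0]]

def step(s):
    # one naive RNG step: new word is a[n+4] + a[n]
    return [(s[2] + s[6]) % M] + s[:6]

def seek(state, n):
    # state after n steps, in O(log n) matrix multiplies
    Cn = matpow(C, n)
    return [sum(Cn[i][k] * state[k] for k in range(7)) % M for i in range(7)]
```

The same skeleton works for any linear RNG once you have its companion matrix.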

Now, about the seed. The state is initialized by `initstate_r()` with $state_i = seed \cdot 16807^i \pmod{2^{31}-1}$.
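In code, the seeding looks something like this (a sketch using the closed form above, not glibc's exact Schrage-style implementation):

```
# Sketch: initstate_r-style seeding. state_i = seed * 16807^i mod (2^31 - 1),
# i.e. a walk of the classic minstd LCG starting from the seed.
MOD31 = 2**31 - 1

def init_state(seed, degree=7):
    return [seed * pow(16807, i, MOD31) % MOD31 for i in range(degree)]
```

With `seed = 1` this is just the familiar minstd sequence 1, 16807, 282475249, …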

There are plenty of sources of bugs here. One is that it actually starts outputting at $state_3$ or something like that, and then iterates in reverse order from how I set up my vectors, which I should probably have fixed, so that's on me. It also discards the first 70 (in our case) outputs from `random_r()`, so these have to be taken into account. And of course the mixed moduli need to be kept separate. Notably, care must be taken so numpy doesn't overflow the seed modulus.

Now to match the `strfry()` output. For example, given `out = strfry(in)` with unique characters, I know that the first random call (`i=0`) satisfies `random_r() % (len(in) - i) == in.index(out[i])`. Then re-do the swap, get a similar constraint from the next character, and so forth. I did this for the first 7 characters of a 90+ character string to make sure I had enough information to uniquely determine any seed from a given state. (A potentially more clever idea would be to use a longer string and only consider even indices, taking only 1 bit from each character, to get constraints that are all mod 2? I didn't think of that at the time.)
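The constraint extraction can be sketched like this (my reconstruction; `strfry_constraints` is a hypothetical helper, not from the solve script):

```
# Sketch: given out = strfry(inp) with all-unique characters, recover
# rand_i % (len - i) for the first few swaps. At step i, strfry swaps
# s[i] with s[j] where j = rand_i % (len - i) + i, and position i is
# never touched again afterwards.
def strfry_constraints(inp, out, nsteps=7):
    s = list(inp)
    cons = []
    for i in range(nsteps):
        j = s.index(out[i])                # where out[i] currently sits
        cons.append((len(s) - i, j - i))   # rand_i % (len - i) == j - i
        s[i], s[j] = s[j], s[i]            # re-do the swap, move on
    return cons
```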

I think in the end I had some idiotic thing like:

$\left(A^{70+n+7}\,\left[\, seed \cdot 16807^i \bmod (2^{31}-1) \,\right]_{i=0\ldots6} \bmod 2^{32}\right)_{7-j} \gg 1 \equiv index_j \pmod{length-j} \pmod{\text{random indexing errors}}$

I mean… it's not pretty. I knew it wouldn't be. Doing this sort of thing, 90% of my time is spent fixing off-by-one errors or putting indices in the wrong order or something like that. For my test code I had something like:

```
# precomputed stuff
start = np.array([282475249, 16807, 1, 470211272, 1144108930, 984943658, 1622650073], np.uint32)
A = np.array([
    [0xe9373f80, 0xdc7dfb8c, 0x9a732dfd, 0x8e41234d, 0x45478090, 0xe4134087, 0x78b11946],
    [0x78b11946, 0xe9373f80, 0xdc7dfb8c, 0x21c214b7, 0x8e41234d, 0x45478090, 0xe4134087],
    [0xe4134087, 0x78b11946, 0xe9373f80, 0xf86abb05, 0x21c214b7, 0x8e41234d, 0x45478090],
    [0x45478090, 0xe4134087, 0x78b11946, 0xa3efbef0, 0xf86abb05, 0x21c214b7, 0x8e41234d],
    [0x8e41234d, 0x45478090, 0xe4134087, 0xea6ff5f9, 0xa3efbef0, 0xf86abb05, 0x21c214b7],
    [0x21c214b7, 0x8e41234d, 0x45478090, 0xc2512bd0, 0xea6ff5f9, 0xa3efbef0, 0xf86abb05],
    [0xf86abb05, 0x21c214b7, 0x8e41234d, 0x4cdcc58b, 0xc2512bd0, 0xea6ff5f9, 0xa3efbef0],
], np.uint32)

@numba.jit(nopython=True)
def find_seed(A, start, m):
    seedvec = start.copy()
    for i in range(1, 2**32):
        out = (A * seedvec).sum(1).astype(np.uint32)
        out //= 2
        out %= np.array(range(89, 96), np.uint32)
        if np.all(out == m):
            return i
        seedvec += start
        seedvec %= 2147483647
```

However, doing a full brute like this wasn't fast enough against the live server. It was really close, but it required me to get lucky, and DragonCTF has this super annoying PoW blocker on all their tasks… I probably should have picked apart `random_bits()`, but at this point I just wanted to move on with my life. I *think* it might also be possible to thread the various moduli and simply *calculate the seed directly* for each index-modulus and then CRT it (esp. if using the idea noted above), but Lord almighty, the paperwork involved. Yeah, brute forcing is dirty, it doesn't feel rewarding or clean, but it's a dirty real-worldsy task, so I didn't feel too bad about it.

Because this CTF had like one rev task, `poiko` was free to help out, so he wrote a utility in big-boy code to do the above for me while I fixed the remaining indexical bugs, and I could just use:

```
print("FINDING SEED VROOM VROOM")
s = int(subprocess.getoutput('./BLAS ' + ' '.join(str(x) for x in M)))
# s = find_seed(M)
```

Out comes the seed, quick as mercury rain. Then seek the stream as above, then seek the stream again because you forgot to add +1, then do the final `strfry()` on the given string, then realize you're an idiot and need to *reverse* the final `strfry()` because you misremembered the code, then finally get the flag.
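For completeness, un-frying itself is simple once the random stream is known: recompute each $j$ and undo the swaps newest-first. A sketch (mine, with a made-up explicit-stream variant of `strfry` for testing):

```
def strfry_with(s, rands):
    # forward strfry with an explicit random stream
    s = list(s)
    for i in range(len(s) - 1):
        j = rands[i] % (len(s) - i) + i
        s[i], s[j] = s[j], s[i]
    return ''.join(s)

def unstrfry(out, rands):
    # invert it: same j indices, swaps undone in reverse order
    s = list(out)
    n = len(s)
    for i in reversed(range(n - 1)):
        j = rands[i] % (n - i) + i
        s[i], s[j] = s[j], s[i]
    return ''.join(s)
```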

## Bit Flip 1

Diffie-Hellman between Alice and Bob to get a shared key which is used as an AES key.

The stream of numbers $2^{256} \cdot sha256(x) + sha256(x+1)$ for $x = r, r+2, r+4, \ldots$ is scanned to find a prime. When a prime is found at $x$, the lower 64 bits of $sha256(x+1)$ become Alice's secret. The numbers here are big-endian strings of 32 bytes.
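My reconstruction of that search, sketched from the description (the `is_prime` callback stands in for the task's `gmpy2.is_prime`; the names are mine):

```
import hashlib

def H(x):
    # sha256 over the 32-byte big-endian encoding of x, as an integer
    return int.from_bytes(hashlib.sha256(x.to_bytes(32, 'big')).digest(), 'big')

def find_prime(r, is_prime):
    # scan x = r, r+2, r+4, ... until 2^256 * sha256(x) + sha256(x+1) is prime
    x, steps = r, 0
    while True:
        cand = (H(x) << 256) | H(x + 1)
        if is_prime(cand):
            alice_secret = H(x + 1) & ((1 << 64) - 1)  # lower 64 bits
            return x, cand, alice_secret, steps
        x += 2
        steps += 1
```

The returned `steps` is exactly the count the task leaks.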

$r$ is `os.urandom(16)` and unknown, but you get to flip arbitrary bits in it, so once it's known you can set it to whatever you want. You're told how many steps it took in the above sequence to find a prime (simulating a timing attack), so finding the original $r$ bit-by-bit is easy, something like:

```
def find_next_bit(oracle, num_known, bits_known):
    next_bit = 0
    while True:
        next_bit = 2 * next_bit + (1 << num_known)
        x = oracle(next_bit - bits_known - 2)  # fetch ABC1111X
        if x != 0:  # if unlucky, retry
            break
    y = oracle(next_bit + bits_known)  # fetch ABx0000X
    # if C was 0, then oracle(y) fetched at offset +2 closer to the prime
    # if C was 1, there's a tiny chance for false positive
    return y != x - 1
```

(Could be made a lot more efficient by making use of past ranges. Probably only $1+\epsilon$ queries are needed for most bits, instead of the fixed 2 I'm doing here?)

But anyway, this basically solves the task, because now you get the prime, Alice's secret, and Bob's public key is given.

## babykok

I'm a closeted Haskell advocate, I've played with other dependent-type things like Agda 2, and there's some community overlap, so I recognized it immediately. However.

I've never used Coq before, but I now know that I hate it. I hate its syntax, its IDE, its logo; I hate all the people who think making puns on its name is smirk-worthy, and the people who pretend the name doesn't lend itself to puns. I hate how all the blogs make proving stupid toy theorems about Peano numbers seem like a fun adventure. I hate myself.

I have renewed understanding for beginners when they do stuff like:

```
if i == 0:
    s = "a"
elif i == 2:
    s = "b"
elif i == 3:
    s = "c"
elif ...
```

or

```
log("initializing")
sum = i = 0
for _ in range(100000):
    log("increasing the index")
    i += 1
    log("accumulating sum...")
    sum += av[i]**2 + av[2*i + av[i]%2]
    log("solving langlands")
    solve_langlands()
log("printing sum")
print(sum)
```

I get it. That‘s me in Coq.

I solved the trivial ones easily. The `math_theorem` one I just pulled from some tutorial thing. (Super unrewarding way to solve it, but there you go.) I was stuck on the last problem for a long time, reading tutorials and blog posts, looking at other proofs, and so on. I also infected `poiko` and made him waste time on it. Both of us wasted several combined hours trying to learn Coq's syntax and usage for this dumb, pseudo-trivial theorem.

In the end we both came up with different proofs for the final theorem at nearly the same time. Mine was:

```
intro n. intro l. revert n. induction l as [|? ? IHl]; intro n; destruct n; simpl.
- intro. apply PeanoNat.Nat.nlt_0_r in H. contradiction.
- intro. apply PeanoNat.Nat.nlt_0_r in H. contradiction.
- destruct 1. exists a. auto. exists a. auto.
- auto with arith.
```

It probably makes no sense and looks ridiculous to anyone who knows anything about Coq.

`poiko`'s solution might make more sense (I couldn't say):

```
unfold lt; intro n; induction n as [| n hn]; intro l.
- destruct l; simpl. inversion 1. inversion 1. exists a. auto. exists a. auto.
- destruct l. simpl. * inversion 1. * intros. apply hn. auto with arith.
```

I'm having a hard time deciding whether this is a good task or not. I mean, it made me despise Coq, and that's not good; I'm usually all for ivory-tower type theory stuff. On the one hand, motivating CTF players to learn some higher-order type theory and theorem proving sounds great; on the other hand, the way this task was put together feels very arbitrary and polarizing, like putting up a poem in Finnish that tells you how to make the flag through puns: free points for anyone who speaks the language, a frustrating Google-hammer experience for the rest.

## Bit Flip 2 & 3

Similar to *Bit Flip 1*, but Bob's key isn't printed. This ruins the easy solution. You can still recover $r$ as above, so you get a little bit of control over what the prime becomes, as well as Alice's secret, both of which are of course known, but there's not much else to do.

These tasks had me super puzzled. Here are just various notes I made to myself mentally.

The brute-forcing options included:

- Brute force Bob's 64-bit secret. Nope, not gonna happen.
- Brute force hashes to make Alice's secret 0, like a 64-bit PoW task. Also not gonna happen.
- Search the SHA256 stream for primes under which 5 (the generator used) has a very low order, for example by making $p-1$ smooth. Given that *Bit Flip 3* is just like *Bit Flip 2* but with the added constraint of strong primes, this seemed to heavily suggest smoothness was the path. But finding smooth numbers of this size without sieving you control is near impossible. So this was also not gonna happen.

I wondered about the `(p % 5 == 4)` check in *Bit Flip 3*: are there any other constraints for *random* primes under which a given number such as 5 might have low order in the associated field?

The task also specifically imports `is_prime()` from `gmpy2`, even though it uses `Crypto.Util.number` (which has `isPrime()`). It's the only thing it uses from `gmpy2`, too. That was curious, but I thought it was just some author idiosyncrasy. Besides, you can't really *construct* the primes being generated, you can only pick one from some set of random ones, so again it's not like you can even begin to think about fooling a standard prime check.

All the tasks also xor the shared secret with 1337 for seemingly no reason. No idea why. The shared secret will in "almost all" cases be a 500+ bit number, of which the 128 most significant bits are used, so it doesn't matter. (Addendum: I was wrong here; I totally missed the "bug" in *Bit Flip 3*.)

The IV isn't reused, there doesn't seem to be a meet-in-the-middle, and data collection probably doesn't help.

I failed to even find the *problem* I was supposed to work on, so I just
put it out of my mind.

*Post-CTF Addendum:* after the CTF ended I found out that *Bit Flip 3* can be solved trivially, because `long_to_bytes(x, 16)` does something quite unexpected:

```
>>> long_to_bytes(random.getrandbits(513), 16)[:16]
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01'
```

This was actually really clever! Ever since I started joining CTFs I've had a low-key distaste for `long_to_bytes()` and how pervasive it is. It feels so Python 2-y. It's in every crypto task under the sun, so its usage didn't stand out to me at all, and I never bothered to check up on it in this case. I'm sorry I missed this a-ha moment. Deceptive!

However, it seems that several people solved *Bit Flip 2* by "brute force the hashes modulo OSINT." Namely: the Bitcoin blockchain contains a lot of such hashes, so you can find ones there that match the requirements. Although that's a clever idea, it greatly lowers the task in my estimation if that is the intended solution. It's just a really unfun way to solve problems. It feels a lot like "enter the MD5 hash in Google" or "look up the factors on factordb.com," but version 2.0. Big-boy OSINT is still OSINT. If anyone has a programmatic or mathematical solution to *Bit Flip 2* that doesn't look up dirty, money-laundered, power-guzzling hashes from other sources, I'm very interested to see what I missed?

Not that anyone reads these notes, but hey.

There's only two hard problems in computer science: naming things, cache invalidation, and off-by-one errors. (Recycled Twitter joke.)↩