DragonCTF 2020

General Comments {#dragon2020}

My expectations were a bit high; DragonCTF has put on some of the best CTFs in the past. Yet I had the thought that even last year's 24-hour "teaser" CTF was better? So, a bit of mixed feelings. I'm not saying this wasn't good, but…? It's hard not to vote with one's heart.

The difficulty was alright, but it felt like very few tasks in total. Thinking partly of my poor teammate poiko this time, who seemed to get only a single pure rev task, and an easy one at that. I know he was hyped and had looked forward to fiendish VM-in-VMs and unknown architectures. We kept waiting for more to be released, but alas.

For my own part, Frying in motion was a cool idea, tho a bit draining. Bit Flip 1 was easy. Bit Flip 2 & 3 were puzzling, but not the good kind of puzzling (which motivates me to write code or learn stuff outside of the CTF); more the "I feel I'm just missing some (probably) trivial trick here, oh well, next time I guess" kind of puzzling.

Sunday I ended up watching Gilmore Girls and poiko was reduced to doing Linux stuff and actual hacking. Imagine having to do actual hacking. What's that about? Sheesh.

Anyway, we placed like dumpster tier. Wasn't our weekend, I guess.

Frying in motion

I put off this task while looking at the Bit Flip tasks because I kind of expected implementing it would be an unfun hellhole of tiny bugs. It's one of those tasks where the idea is simple and/or obvious but it's a pain in the ass to implement[1]. Not a fan of that.

But so: the task involves reversing a strfry() given the output of another strfry(). Fine, sounds like an RNG task.

strfry() does this:

char *
strfry (char *string)
{
  static int init;
  static struct random_data rdata;

  if (!init)
    {
      static char state[32];
      rdata.state = NULL;
      __initstate_r (random_bits (),
                     state, sizeof (state), &rdata);
      init = 1;
    }

  size_t len = strlen (string);
  if (len > 0)
    for (size_t i = 0; i < len - 1; ++i)
      {
        int32_t j;
        __random_r (&rdata, &j);
        j = j % (len - i) + i;

        char c = string[i];
        string[i] = string[j];
        string[j] = c;
      }

  return string;
}

(Aside: is it just me or is the GNU style one of the ugliest code styles in existence?)

So strfry() just uses glibc's random_r() stuff, with a 32-byte state buffer (of which only 28 bytes are used), initialized with a 32-bit seed based on some CPU clock counter. (random_bits() actually gives fewer than 32 bits of actual randomness, but let's ignore that.)

It's clearly a "brute the seed" sort of task, because even partially reversing the internal state directly (let's say if it had been initialized with 28 bytes of random data from urandom) given strfry() output alone is very hard unless many such calls can be queried in a series. (And even then it would be finicky as all hell.)

Another thing the task does is call strfry() a lot of times on a temporary string before it does the challenge-response step. The number of resulting random_r() calls is predictable, but you don't get any output. So it's "brute the seed" and then "seek in the RNG stream."

Seeking in glibc's random_r() stream turns out to be easy, as it seems to be just a linear recurrence. It does varying things depending on how many bytes of state you give it, but all of them are very simple. It's sort of like a more flexible Mersenne Twister without the output scrambling. For a 32-byte state it uses the recurrence r[i] = r[i-3] + r[i-7] (mod 2^32), which can be modelled most simply (to me, anyway) with matrix exponentiation:

A = ZZ[2**32].matrix([[0, 0, 1, 0, 0, 0, 1],
                      [1, 0, 0, 0, 0, 0, 0],
                      [0, 1, 0, 0, 0, 0, 0],
                      [0, 0, 1, 0, 0, 0, 0],
                      [0, 0, 0, 1, 0, 0, 0],
                      [0, 0, 0, 0, 1, 0, 0],
                      [0, 0, 0, 0, 0, 1, 0]])

# now you can model state after n steps as A**n * state
# modulo indexing errors.

This is the same for seeking in plenty of RNGs whose internals are linear or "affine", like xoshiro and MT. Note that the state holds the last 7 outputs from the RNG, which will come in handy. Note also that the RNG discards the lowest bit when giving actual output. (But due to the lack of scrambling, the low-order bits will still be pretty bad by most RNG standards.)
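
For the record, the seek itself looks roughly like this in plain Python, assuming the r[i] = r[i-3] + r[i-7] recurrence above (the helper names are mine, and the state ordering has to match however you set up A):

M = 2**32

def mat_mul(A, B):
  # 7x7 matrix product over Z/2^32
  return [[sum(A[i][k] * B[k][j] for k in range(7)) % M for j in range(7)]
          for i in range(7)]

def mat_pow(A, n):
  # square-and-multiply
  R = [[int(i == j) for j in range(7)] for i in range(7)]
  while n:
    if n & 1:
      R = mat_mul(R, A)
    A = mat_mul(A, A)
    n >>= 1
  return R

def seek(state, n, A):
  # state: the last 7 raw outputs (before the >> 1); returns the state n steps later
  An = mat_pow(A, n)
  return [sum(An[i][j] * state[j] for j in range(7)) % M for i in range(7)]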

Now about the seed. The state is initialized by initstate_r() with r[0] = seed and r[i] = 16807 * r[i-1] mod (2^31 - 1).

There are plenty of sources of bugs here. One is that it actually starts outputting at an offset into the state or something like that, and then iterates in reverse order from how I set up my vectors, which I should probably have fixed, so that's on me. It also discards the first 70 (in our case) outputs from random_r(), so these have to be taken into account. And of course the mixed moduli need to be kept separate. Notably, care must be taken so numpy doesn't overflow the seed modulus.
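
To spell that seeding out, it is something like this (a rough Python model of glibc's srandom_r() path from my notes; the 16807 / 2^31-1 constants are glibc's, the function name is mine):

def seed_state(seed):
  # TYPE_1 state: degree 7, separation 3. glibc maps seed 0 to 1, sets r[0] = seed,
  # then r[i] = 16807 * r[i-1] mod (2^31 - 1) (computed Schrage-style internally).
  r = [seed or 1]
  for _ in range(6):
    r.append((16807 * r[-1]) % 2147483647)
  return r

# After this, random_r() is pumped 10 * degree = 70 times before any caller
# (i.e. strfry()) ever sees a value, which is the discard mentioned above.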

Now to match the strfry() output. For example, given out = strfry(in) with unique characters, I know that the first random call (i=0) satisfies random_r() % (len(in) - i) == in.index(out[i]). Then re-do the swap, get a similar constraint from the next character, and so forth. I did this for the first 7 characters of a 90+ character string to make sure I had enough information to uniquely determine any seed from a given state. (A potentially more clever idea would be to use a longer string and only consider even indices, taking only 1 bit from each character, to get constraints that are all mod 2? I didn't think of that at the time.)
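
Pulling those constraints out of a single strfry() output looks roughly like this (a sketch with made-up names, assuming all characters of the input are unique):

def strfry_residues(inp, out, k=7):
  # For swap i, out[i] came from position j >= i of the partially-shuffled string,
  # and the random_r() output r for that swap satisfied r % (len(inp) - i) == j - i.
  s = list(inp)
  residues = []
  for i in range(k):
    j = s.index(out[i], i)    # where swap i must have pulled from
    residues.append(j - i)
    s[i], s[j] = s[j], s[i]   # redo the swap and move on
  return residues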

I think in the end I had some idiotic pile of per-index congruences chained together.

I mean… It's not pretty. I knew it wouldn't be. Doing this sort of thing, 90% of my time is spent fixing off-by-one errors or putting indices in the wrong order or something like that. For my test code I had something like:

import numpy as np
import numba

# precomputed stuff
start = np.array([282475249, 16807, 1, 470211272, 1144108930, 984943658, 1622650073], np.uint32)
A = np.array([
  [0xe9373f80, 0xdc7dfb8c, 0x9a732dfd, 0x8e41234d, 0x45478090, 0xe4134087, 0x78b11946],
  [0x78b11946, 0xe9373f80, 0xdc7dfb8c, 0x21c214b7, 0x8e41234d, 0x45478090, 0xe4134087],
  [0xe4134087, 0x78b11946, 0xe9373f80, 0xf86abb05, 0x21c214b7, 0x8e41234d, 0x45478090],
  [0x45478090, 0xe4134087, 0x78b11946, 0xa3efbef0, 0xf86abb05, 0x21c214b7, 0x8e41234d],
  [0x8e41234d, 0x45478090, 0xe4134087, 0xea6ff5f9, 0xa3efbef0, 0xf86abb05, 0x21c214b7],
  [0x21c214b7, 0x8e41234d, 0x45478090, 0xc2512bd0, 0xea6ff5f9, 0xa3efbef0, 0xf86abb05],
  [0xf86abb05, 0x21c214b7, 0x8e41234d, 0x4cdcc58b, 0xc2512bd0, 0xea6ff5f9, 0xa3efbef0],
], np.uint32)

@numba.jit(nopython=True)
def find_seed(A, start, m):
  # The seeded state is just seed * start (elementwise, mod 2**31 - 1), and
  # everything after seeding is linear mod 2**32, so apply the precomputed A
  # and compare against the residues m extracted from the strfry() output.
  seedvec = start.copy()
  for i in range(1,2**32):
    out = (A*seedvec).sum(1).astype(np.uint32)   # matrix-vector product mod 2**32
    out //= 2                                    # random_r() drops the low bit
    out %= np.array(range(89,96), np.uint32)     # per-index strfry moduli
    if np.all(out == m):
      return i
    seedvec += start                             # next candidate seed: (i+1) * start
    seedvec %= 2147483647                        # stay within the seed modulus

However, doing a full brute force like this wasn't fast enough against the live server. It was really close, but it required me to get lucky, and DragonCTF has this super annoying PoW blocker on all their tasks… I probably should have picked apart random_bits(), but at this point I just wanted to move on with my life. I think it might also be possible to thread the various moduli and simply calculate the seed directly for each index-modulus and then CRT it (esp. if using the idea noted above), but Lord almighty, the paperwork involved. Yeah, brute forcing is dirty, it doesn't feel rewarding or clean, but it's a dirty real-worldsy task, so I didn't feel too bad about it.

Because this CTF had like 1 rev task, poiko was free to help out, so he wrote a utility in big boy code to do the above for me while I fixed the remaining indexical bugs, and I could just use:

print("FINDING SEED VROOM VROOM")
s = int(subprocess.getoutput('./BLAS ' + ' '.join(str(x) for x in M)))
# s = find_seed(M)

Out comes the seed quick as mercury rain. Then seek the stream as above, then seek the stream again because you forgot to add +1, then do the final strfry() on the given string, then realize you're an idiot and reverse the final strfry() because you misremembered the code, then finally get the flag.

Bit Flip 1

Diffie-Hellman between Alice and Bob to get a shared key which is used as an AES key.

The stream of numbers sha256(seed + i), i = 0, 1, 2, …, is scanned to find a prime. When a prime is found at step i, the lower 64 bits of the next number in the stream become Alice's secret. The numbers here are big-endian strings of 32 bytes.
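
In other words, something like this (my own reconstruction from memory, not the task's actual code; the function name is mine):

import hashlib
from Crypto.Util.number import long_to_bytes, bytes_to_long

def stream(seed, i):
  # i-th 256-bit number: sha256 of (seed + i) encoded as 32 big-endian bytes
  return bytes_to_long(hashlib.sha256(long_to_bytes(bytes_to_long(seed) + i, 32)).digest())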

The seed is os.urandom(16) and unknown, but you get to flip arbitrary bits in it, so once it's known you can set it to whatever you want. You're told how many steps it took in the above sequence to find a prime (simulating a timing attack), so recovering the original seed bit-by-bit is easy, something like:

def find_next_bit(oracle, num_known, bits_known):
  next_bit = 0
  while True:
    next_bit = 2 * next_bit + (1 << num_known)
    x = oracle(next_bit - bits_known - 2)  # fetch ABC1111X
    # if unlucky
    if x != 0:
      break
  y = oracle(next_bit + bits_known)        # fetch ABx0000X

  # if C was 0, then oracle(y) fetched at offset +2 closer to the prime
  # if C was 1, there's a tiny chance for false positive
  return y != x - 1

(Could be made a lot more efficient by making use of past ranges. Probably fewer queries per bit are needed than the fixed 2 I'm doing here?)

But anyway, this basically solves the task, because now you get the prime and Alice's secret, and Bob's public key is given.
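
The finish is then just the straight DH computation. A sketch, with parameter names of my choosing; the xor with 1337 and the long_to_bytes key derivation are the task's (see the notes on Bit Flip 2 & 3 below), and AES-CBC is what I remember the task using:

from Crypto.Cipher import AES
from Crypto.Util.number import long_to_bytes

def recover_flag(prime, alice_secret, bob_pub, iv, enc_flag):
  # shared DH secret, xored with 1337 like the task does
  shared = pow(bob_pub, alice_secret, prime) ^ 1337
  key = long_to_bytes(shared, 16)[:16]
  return AES.new(key, AES.MODE_CBC, iv).decrypt(enc_flag)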

babykok

I'm a closeted Haskell advocate, I've played with other dependent type things like Agda 2, and there's some community overlap, so I recognized it immediately. However.

I've never used Coq before, but I now know that I hate it. I hate its syntax, its IDE, its logo; I hate all the people who think making puns on its name is smirk-worthy, and the people who pretend the name doesn't lend itself to puns. I hate how all the blogs make proving stupid toy theorems about Peano numbers seem like a fun adventure. I hate myself.

I have renewed understanding for beginners when they do stuff like:

if i == 0:
  s = "a"
elif i == 2:
  s = "b"
elif i == 3:
  s = "c"
elif ...

or

log("initializing")
sum = i = 0
for _ in range(100000):
  log("increasing the index")
  i += 1
  log("accumulating sum...")
  sum += av[i]**2 + av[2*i + av[i]%2]

log("solving langlands")
solve_langlands()

log("printing sum")
print(sum)

I get it. That's me in Coq.

I solved the trivial ones easily. The math_theorem one I just pulled from some tutorial thing. (Super unrewarding way to solve it, but there you go.) I was stuck on the last problem for a long time, reading tutorials, blog posts, looking at other proofs, and so on. I also infected poiko and made him waste time on it. Both of us wasted several combined hours trying to learn Coq's syntax and usage for this dumb, pseudo-trivial theorem.

In the end we both came up with different proofs for the final theorem at nearly the same time. Mine was:

intro n. intro l. revert n. induction l as [|? ? IHl]; intro n; destruct n; simpl.
- intro. apply PeanoNat.Nat.nlt_0_r in H. contradiction.
- intro. apply PeanoNat.Nat.nlt_0_r in H. contradiction.
- destruct 1. exists a. auto. exists a. auto.
- auto with arith.

It probably makes no sense and looks ridiculous to anyone who knows anything about Coq.

poiko's solution might make more sense (I couldn't say):

unfold lt; intro n; induction n as [| n hn]; intro l.
- destruct l; simpl. inversion 1. inversion 1. exists a. auto. exists a. auto.
- destruct l. simpl. * inversion 1. * intros. apply hn. auto with arith.

I'm having a hard time deciding whether this is a good task or not. I mean, it made me despise Coq, and that's not good; I'm usually all for ivory tower type theory stuff. On the one hand, motivating CTF players to learn some higher-order type theory and theorem proving sounds great, but on the other hand the way this task was put together feels very arbitrary and polarizing, like putting up a poem in Finnish that tells you how to make the flag through puns: free points for anyone who speaks the language, a frustrating Google hammer experience for the rest.

Bit Flip 2 & 3

Similar to Bit Flip 1, but Bob's key isn't printed. This ruins the easy solution. You can still reverse the seed as above, so you get a little bit of control over what the prime becomes as well as Alice's secret, both of which are of course known, but there's not much else to do.

These tasks had me super puzzled. Here are just various mental notes I made to myself.

The brute forcing options included:

  • Brute force Bob's 64-bit secret. Nope, not gonna happen.
  • Brute force hashes to make Alice's secret 0, like a 64-bit PoW task. Also not gonna happen.
  • Search the SHA256-stream for primes p under which 5 (the generator used) has a very low order, for example by making p - 1 smooth (see the sketch right after this list). Given that Bit Flip 3 is just like Bit Flip 2 but with the added constraint of strong primes, it seemed to heavily suggest this was the path. But finding smooth numbers of this size without a sieve you control is near impossible. So this was also not gonna happen.
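
(That third option would look something like this: check the order of 5 modulo a candidate p by factoring p - 1, which is of course only feasible when p - 1 is smooth to begin with. sympy is my choice here, nothing from the task.)

from sympy import factorint

def order_of_5(p):
  # multiplicative order of 5 mod p; requires factoring p - 1, which is
  # the whole problem when p is a random 512-bit prime
  order = p - 1
  for q in factorint(p - 1):
    while order % q == 0 and pow(5, order // q, p) == 1:
      order //= q
  return order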

I wondered about the (p % 5 == 4) check in Bit Flip 3: are there any other constraints for random primes where a given number such as 5 might have low order in the associated field?

The task also specifically imports is_prime() from gmpy2, even though it uses Crypto.Util.number (which has isPrime()). It's the only thing it uses from gmpy2, too. That was curious, but I thought it was just some author idiosyncrasy. Besides, you can't really construct the primes being generated, you can only pick one from some set of random ones, so again it's not like you can even begin to think about fooling a standard prime check.

All the tasks also xor the shared secret with 1337 for seemingly no reason. No idea why. The shared secret will in "almost all" cases be a 500+ bit number, of which the 128 most significant bits are used, so it doesn't matter. (Addendum: I was wrong here, I totally missed the "bug" in Bit Flip 3.)

The IV isn't reused, there doesn't seem to be a meet-in-the-middle, and data collection probably doesn't help.

I failed to even find the problem I was supposed to work on, so I just put it out of my mind.

Post-CTF Addendum: after the CTF ended I found out that Bit Flip 3 can be solved trivially because long_to_bytes(x, 16) does something quite unexpected:

>>> long_to_bytes(random.getrandbits(513), 16)[:16]
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01'

This was actually really clever! Ever since I started joining CTFs I've had a low-key distaste for long_to_bytes() and how pervasive it is. It feels so Python 2-y. It's in every crypto task under the sun, so its usage didn't stand out to me at all, and I never bothered to check up on it in this case. I'm sorry I missed this a-ha moment. Deceptive!
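
Concretely: whenever the shared secret ends up longer than 512 bits (which, as I understand it, the strong-prime construction in Bit Flip 3 makes likely), the AES key degenerates to fifteen zero bytes plus one tiny top byte, so the ciphertext can just be brute forced over at most 256 keys. A sketch, with the names and the flag check being mine, and AES-CBC assumed as in the other tasks:

from Crypto.Cipher import AES

def brute_degenerate_key(iv, enc_flag):
  # long_to_bytes(shared, 16)[:16] collapses to b"\x00"*15 + top_byte for a
  # 513+ bit shared secret, so the keyspace is at most 256 keys.
  for top in range(256):
    key = bytes(15) + bytes([top])
    pt = AES.new(key, AES.MODE_CBC, iv).decrypt(enc_flag)
    if b"{" in pt:   # crude, hypothetical flag-format check
      return pt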

However, it seems that several people solved Bit Flip 2 by "brute force the hashes modulo OSINT." Namely: the Bitcoin blockchain contains a lot of such hashes, so you can find hashes there that match the requirements. Although that's a clever idea, it greatly lowers the task in my estimation if that is the intended solution. It's just a really unfun way to solve problems. It feels a lot like "enter the MD5 hash in Google" or "look up the factors on factordb.com," but version 2.0. Big boy OSINT is still OSINT. If anyone has a programmatic or mathematical solution to Bit Flip 2 that doesn't look up dirty money-laundered power-guzzling hashes from other sources, I'm very interested to see what I missed?

Not that anyone reads these notes, but hey.

  1. There's only two hard problems in computer science: naming things, cache invalidation, and off-by-one errors. (Recycled Twitter joke.)