Misc/Zh3r0 2021

General Comments {#misc2021}

poiko has been sneaking a few problems to me on the side from various CTFs. I didn't take many notes and don't remember most of them, but here are some.

Boring MT blah (Zh3r0)

Basic problem was as follows:

import os
import numpy.random as random
from Crypto.Cipher import AES

def rand_32():
    # a fresh 32-bit seed from the OS
    return int.from_bytes(os.urandom(4),'big')

for _ in range(2):
    # hate to do it twice, but i dont want people bruteforcing it
    random.seed(rand_32())
    iv,key = random.bytes(16), random.bytes(16)
    cipher = AES.new(key,iv=iv,mode=AES.MODE_CBC)
    flag = iv+cipher.encrypt(flag)

So a 32-bit seed is used with numpy.random.seed(), but we have to get through two of them.

When I first looked at the code I was somewhat excited, because newer numpy uses PCG64, which I think is a really cool RNG and would be quite interesting to reverse. However, the seed() stuff just uses the legacy implementation, which is a Mersenne Twister, basically equivalent to Python's (though with some differences in initialization).

Mersenne Twister is one of the more boring RNGs (in my opinion), but I'm also biased by the fact that it has a lot of finickiness with implementation and it's so easy to get stuck on off-by-one bugs or stuff like that. Basically, its implementation is more of a hassle than the RNG is worth. It also has, in my opinion, a greatly disproportionate state size for its quality.

Anyway, there are two approaches to this problem. One, the "trve" way, is to actually directly reverse the 32-bit seed from the 16 bytes of output you get for free. Now, I know this can be done, but it is also not trivial, because the MT implementations use a super-convoluted initialization, and you have to copy all this code and make sure you get all the indexing right; it's just a royal pain in the ass with all the potential bugs.

The second approach, the "I don't want to deal with this shit" approach, is to note that you can get as many of these outputs as you want. So you can build a database of output-bytes -> seed or something like that, and then just get enough outputs from the server until it's very likely you get two matches. I think the probability calculation goes something like $1 - \left(1 - (n/2^{32})^2\right)^m$, where $n$ is the number of entries you put in the database and $m$ is the number of times you hit up the server (each connection needs both of its seeds to be in the database). So calculating ~200 million seeds (not too bad, time-wise) while nc-ing the server 400 times (also not too bad, because there's no PoW) gives you a fair chance of being able to decrypt both ciphers.

I went for the second approach, which was simple and braindead.
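
In code, the braindead approach looks something like this (a minimal sketch; sizes are illustrative, and a real run would want something more memory-friendly than a giant Python dict):

import numpy.random as random
from Crypto.Cipher import AES

def build_db(n):
    # map the first 16 bytes of output after seeding -> the 32-bit seed
    db = {}
    for s in range(n):
        random.seed(s)
        db[random.bytes(16)] = s
    return db

def try_decrypt(blob, db):
    # blob = iv2 + E_k2(iv1 + E_k1(flag)); the visible iv is always the first
    # 16 bytes of output after seeding, so it doubles as the lookup key
    for _ in range(2):
        seed = db.get(blob[:16])
        if seed is None:
            return None   # this layer's seed isn't in the table, try another connection
        random.seed(seed)
        iv, key = random.bytes(16), random.bytes(16)
        blob = AES.new(key, AES.MODE_CBC, iv=iv).decrypt(blob[16:])
    return blob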

A~52~ group (unknown CTF)

I remember one problem he described where you are to find two elements of the alternating group such that, given any deck shuffle (of 52 cards), you can give a sequence of those two elements which produces that shuffle. I don't know the exact problem statement, or what category it was in (guessing rev or misc), but it was an interesting little problem for being something I hadn't seen before.

Basically what I came up with is to have one element (the big loop) cycle the first 51 cards like (0 1 2 … 49 50) and then another (the little loop) which cycles the last three cards like (49 50 51). And then you can think of it as a really bad sorting algorithm, where you incrementally start to sort the cards in the big loop (0..50) by using the 51st place as a register or temporary variable. You shift the card you want to move into position 51, then align the rest of the cards properly, then shift it out again, rinse repeat until all the cards are sorted (or in whatever order you want). There's a special case when the card you want to move is attached to the front or back of the sequence of already-sorted cards (and you want to move it to the other side) in the "big loop," but it's easy to take care of.
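
As a quick sanity check of the setup (my indexing and cycle direction here are arbitrary choices), both generators are even permutations, so they really do live in A~52~:

def is_even(p):
    # permutation parity via cycle decomposition (a cycle of even length is an odd permutation)
    seen, even = [False] * len(p), True
    for i in range(len(p)):
        length = 0
        while not seen[i]:
            seen[i], i, length = True, p[i], length + 1
        if length and length % 2 == 0:
            even = not even
    return even

big = [(i + 1) % 51 for i in range(51)] + [51]   # (0 1 2 ... 49 50), fixes position 51
little = list(range(52))
little[49], little[50], little[51] = 50, 51, 49  # (49 50 51)

assert is_even(big) and is_even(little)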

Unicode troll (unknown CTF)

There was another problem that appeared to be an RNG reverse, but was just a troll task.

You give a seed as hex, various conditions are applied to it, then the digits in your seed are scrambled, and you are given two bytes of output from an xorshift64 stream and asked to predict the value. The relevant part was:

import collections
import random

# SEEDS (the required seed length) is defined elsewhere in the challenge
seed = input("give me the seed: ")
seed = seed.strip()

if(len(seed)) != SEEDS:
    print("seed should be "+str(SEEDS)+" bytes long!")
    exit()

seed = list(seed)
random.shuffle(seed)

counts = collections.Counter(seed)
if counts.most_common()[0][1] > 3:
    print ("You can't use the same number more than 3 times!")
    exit()

int16 = lambda x: int(x,16)
seed = list(map(int16,seed))

The suspicious part is where it does the Counter() stuff to check for repeated digits. I got a hunch and yes, sure enough:

>>> int('1٠۰߀०০੦૦୦௦౦೦൦෦๐໐༠၀႐០᠐᥆᧐᪀᪐᭐᮰᱀᱐꘠꣐꤀꧐꧰꩐꯰0𐒠𐴰𑁦𑃰𑄶𑇐𑋰𑑐𑓐𑙐𑛀𑜰𑣠𑱐')
100000000000000000000000000000000000000000000000000

So you can literally just feed it 0 as the seed and xorshift64 is trivially all zeros.
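
To spell out why: xorshift is nothing but XORs and shifts with no additive constant, so the all-zero state is a fixed point regardless of the exact shift constants. A tiny illustration with one common variant (not necessarily the task's constants):

def xorshift64(x):
    # one step of a Marsaglia-style xorshift64; 13/7/17 is a common shift triple
    x ^= (x << 13) & 0xffffffffffffffff
    x ^= x >> 7
    x ^= (x << 17) & 0xffffffffffffffff
    return x

assert xorshift64(0) == 0   # zero in, zero out, forever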

You can find these weird characters as follows:

import unicodedata

''.join([chr(x) for x in range(256,0x110000) if unicodedata.category(chr(x)) == 'Nd' and unicodedata.digit(chr(x)) == 0])

Apparently the task (in whatever CTF it was) was marked with crypto as well, which is why I call it a troll.

homebrew hash function (Zh3r0)

There was a hash function like

# ROTL/ROTR (32-bit rotations) and pad() (pad to a multiple of 16 bytes) are defined elsewhere in the challenge
def hash(text:bytes):
    text = pad(text)
    text = [int.from_bytes(text[i:i+4],'big') for i in range(0,len(text),4)]
    M = 0xffff
    x,y,z,u = 0x0124fdce, 0x89ab57ea, 0xba89370a, 0xfedc45ef
    A,B,C,D = 0x401ab257, 0xb7cd34e1, 0x76b3a27c, 0xf13c3adf
    RV1,RV2,RV3,RV4 = 0xe12f23cd, 0xc5ab6789, 0xf1234567, 0x9a8bc7ef
    for i in range(0,len(text),4):
        X,Y,Z,U = text[i]^x,text[i+1]^y,text[i+2]^z,text[i+3]^u
        RV1 ^= (x := (X&0xffff)*(M - (Y>>16)) ^ ROTL(Z,1) ^ ROTR(U,1) ^ A)
        RV2 ^= (y := (Y&0xffff)*(M - (Z>>16)) ^ ROTL(U,2) ^ ROTR(X,2) ^ B)
        RV3 ^= (z := (Z&0xffff)*(M - (U>>16)) ^ ROTL(X,3) ^ ROTR(Y,3) ^ C)
        RV4 ^= (u := (U&0xffff)*(M - (X>>16)) ^ ROTL(Y,4) ^ ROTR(Z,4) ^ D)
    for i in range(4):
        RV1 ^= (x := (X&0xffff)*(M - (Y>>16)) ^ ROTL(Z,1) ^ ROTR(U,1) ^ A)
        RV2 ^= (y := (Y&0xffff)*(M - (Z>>16)) ^ ROTL(U,2) ^ ROTR(X,2) ^ B)
        RV3 ^= (z := (Z&0xffff)*(M - (U>>16)) ^ ROTL(X,3) ^ ROTR(Y,3) ^ C)
        RV4 ^= (u := (U&0xffff)*(M - (X>>16)) ^ ROTL(Y,4) ^ ROTR(Z,4) ^ D)
    return int.to_bytes( (RV1<<96)|(RV2<<64)|(RV3<<32)|RV4 ,16,'big')

And the goal was to find a collision. Stripping away all the trivially reversible fluff, what we want to attack is:

RV1 ^= (X&0xffff)*(M - (Y>>16)) ^ ROTL(Z,1) ^ ROTR(U,1)
RV2 ^= (Y&0xffff)*(M - (Z>>16)) ^ ROTL(U,2) ^ ROTR(X,2)
RV3 ^= (Z&0xffff)*(M - (U>>16)) ^ ROTL(X,3) ^ ROTR(Y,3)
RV4 ^= (U&0xffff)*(M - (X>>16)) ^ ROTL(Y,4) ^ ROTR(Z,4)

Specifically, the only thing that is nonlinear here is the multiplication. I think you could approach this in several ways, like a sort of backward search on each block, or trying to find out how two blocks can cancel each other out, but as a preliminary I put it into z3 first. I had a suspicion that you can solve the above for several instances of X, Y, Z, U to get a collision in the very first 16 bytes, or at least I couldn't see why it shouldn't be possible. And sure enough, z3 found such an instance pretty quickly (fixing one variable to be anything to ground it) and it was just done. Whatever else is in the data doesn't matter.
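
The query was along these lines (a minimal sketch, not my exact script; note that z3's >> is an arithmetic shift, so LShR is needed to match the Python code above):

from z3 import BitVec, LShR, RotateLeft, RotateRight, Solver, sat

M = 0xffff

def core(X, Y, Z, U):
    # the nonlinear core from above, on 32-bit bitvectors
    v1 = (X & 0xffff) * (M - LShR(Y, 16)) ^ RotateLeft(Z, 1) ^ RotateRight(U, 1)
    v2 = (Y & 0xffff) * (M - LShR(Z, 16)) ^ RotateLeft(U, 2) ^ RotateRight(X, 2)
    v3 = (Z & 0xffff) * (M - LShR(U, 16)) ^ RotateLeft(X, 3) ^ RotateRight(Y, 3)
    v4 = (U & 0xffff) * (M - LShR(X, 16)) ^ RotateLeft(Y, 4) ^ RotateRight(Z, 4)
    return v1, v2, v3, v4

A = [BitVec(f'a{i}', 32) for i in range(4)]
B = [BitVec(f'b{i}', 32) for i in range(4)]

s = Solver()
for va, vb in zip(core(*A), core(*B)):
    s.add(va == vb)           # both blocks contribute the same values to RV1..RV4
s.add(A[0] == 0x41414141)     # ground one variable to anything
s.add(A[0] != B[0])           # but force the two blocks to differ

assert s.check() == sat
m = s.model()
print([hex(m[v].as_long()) for v in A + B])
# XOR these back with the initial x,y,z,u state to get the actual message words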

almost-combinadics (Zh3r0)

The problem was:

def nk2n(nk):
    l = len(nk)
    if l == 1:
        return nk[0]
    elif l == 2:
        i,j = nk
        return ((i+j) * (i+j+1)) // 2 + j
    return nk2n([nk2n(nk[:l-l//2]), nk2n(nk[l-l//2:])])

print(nk2n(flag))
#2597749519984520018193538914972744028780767067373210633843441892910830749749277631182596420937027368405416666234869030284255514216592219508067528406889067888675964979055810441575553504341722797908073355991646423732420612775191216409926513346494355434293682149298585

Right off the bat there was a strong "smell" of some sort of "exotic base conversion" here, and I mentioned the combinatorial number system to poiko. He pointed out that indeed, the case of l==2 above is choose(i+j+1,2)+j. That made it click for me. The whole combinatorial angle isn't really needed or relevant, though; the idea is just that the l==2 step is easily reversible:

from math import comb
from gmpy2 import iroot

def rev(q):
  # q == comb(i+j+1,2) + j
  r = int(iroot(2*q,2)[0])
  if comb(r+1,2) <= q: r += 1   # <= so that the j == 0 case rounds r up correctly
  # r == i+j+1
  j = q - comb(r,2)
  i = r - j - 1
  return (i,j)

So you can simply keep calling rev(v) on all the numbers >255 until all you have is bytes, and the flag comes out.
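
Concretely, something like this (using rev() from above; it assumes no intermediate value dips below 256, which holds for a printable flag):

vals = [2597749519984520018193538914972744028780767067373210633843441892910830749749277631182596420937027368405416666234869030284255514216592219508067528406889067888675964979055810441575553504341722797908073355991646423732420612775191216409926513346494355434293682149298585]
while any(v > 255 for v in vals):
    # expand every non-byte value back into the (left, right) pair it encodes
    vals = [x for v in vals for x in (rev(v) if v > 255 else (v,))]
print(bytes(vals))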

approximate mastermind (Zh3r0)

This was a PPC task of solving Mastermind when the server sometimes (with a low probability) lies in its responses. Full disclosure: I didn't solve this one until after the CTF, so unfortunately no points for mode13h, but it was interesting enough.

So my first idea was to do the classical solution (because the number of pegs and colors was low in the challenge games), which is to enumerate all possible solutions and weed out the ones that don't match the constraints, and, if this leads to 0 possible solutions because the server lied at some point, to sort of backtrack and remove one of the constraints until it is consistent again. This obviously won't work if the server lies too much, but the hope was that I could just retry the game until the server lied few enough times, probabilistically.

I did this with numpy to make it somewhat efficient, something like

import numpy as np

def c_bulls(a,b):
  # exact-position matches between each candidate row of a and the guess b
  return (a == b).sum(1, dtype=np.int8)

def c_cows(C,a,b):
  # per-color matches (includes the bulls); C = number of colors
  b = tuple(b)
  c = np.zeros((a.shape[0],), a.dtype)
  for k in range(C):
    bc = b.count(k)
    if bc == 0:
      continue
    x = (a==k).sum(1, dtype=np.int8)
    c += np.minimum(x, bc)
  return c

(Here a is a k×P matrix of "potential solutions." Note also that c_cows() includes the bull count, which I find more intuitive, whereas the server essentially uses cows = c_cows() - c_bulls().)

To select the next guess I sampled some of the potential solutions and found ones that would lead to the most other candidates being eliminated no matter the response (min-max). Anyway, it wasn't the fastest, and I ran into some weird (hidden) timeout issues and it kept failing. I don't know what the timeout on the server actually was, because it wasn't explicitly stated, which was annoying.
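
The guess selection was roughly along these lines (a reconstruction rather than the original code; it uses the c_bulls/c_cows helpers above, with cands as the k×P matrix of surviving candidates and C as the number of colors):

import numpy as np

def choose_guess(cands, C, sample=64):
    # min-max: among a sample of candidates, pick the guess whose worst-case
    # response still leaves the fewest candidates alive
    idx = np.random.choice(len(cands), min(sample, len(cands)), replace=False)
    best, best_worst = None, None
    for g in cands[idx]:
        bulls = c_bulls(cands, g).astype(np.int32)
        cows = c_cows(C, cands, g).astype(np.int32) - bulls
        # bucket the candidates by the (bulls, cows) response they would produce
        _, counts = np.unique(bulls * 64 + cows, return_counts=True)
        if best_worst is None or counts.max() < best_worst:
            best, best_worst = g, counts.max()
    return best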

I then changed it to use a different approach, where I would generate a small set of "best candidate" solutions (initially random) and just keep iteratively randomizing them until reaching some local maximum where they would all give approximately the same answers as the ones I had gotten from the server, adding more random candidates (which again get mutated down to "good candidates") whenever previous ones end up identical. For each guess I just used the best candidate I had so far.

# methods on the solver object: self.P = number of pegs, self.N = number of colors,
# self.history holds (guess, bulls, whites) tuples from the server,
# and self.bull_weight weights bull mismatches in the score
def local_mutate(self, ch):
  # try a random value at each position (in random order), keeping changes that lower the score
  tmp = list(ch)
  f = self.score(ch)
  order = list(range(self.P))
  random.shuffle(order)
  for i in order:
    p, tmp[i] = tmp[i], random.randrange(self.N)
    nf = self.score(tmp)
    if nf < f:
      f = nf
    else:
      tmp[i] = p
  return f, tuple(tmp)

def score(self, cand):
  "Score a guess based on previous guesses and responses"
  f = 0.0
  cntc = Counter(cand)
  for g, r, w in self.history:
    r_ = sum(x==y for x,y in zip(g,cand))
    w_ = sum((Counter(g) & cntc).values())
    f += self.bull_weight * abs(r_-r)
    f += abs(w_-w)
  return f

This is more of a genetic algorithm approach, kind of similar to something I've done when finding a factorization where the primes had very specific bit patterns applied to them. It turned out this approach actually worked surprisingly well, and immediately gave the flag:

b'level passed, good job\n'
b'you earned it : zh3r0{wh3n_3asy_g4m3s_b3come_unnecessarily_challenging}\n'

It was also much faster than the full "brute force" numpy solution, even though here I did everything in slow Python code.