# Misc/Zh3r0 2021

## General Comments {#misc2021}

`poiko` has been sneaking a few problems to me on the side from various
CTFs. I didn't take many notes, don't remember most of them, but here
are some.

## Boring MT blah (Zh3r0)

Basic problem was as follows:

```
import os
import numpy.random as random
from Crypto.Cipher import AES

def rand_32():
    return int.from_bytes(os.urandom(4), 'big')

for _ in range(2):
    # hate to do it twice, but i dont want people bruteforcing it
    random.seed(rand_32())
    iv, key = random.bytes(16), random.bytes(16)
    cipher = AES.new(key, iv=iv, mode=AES.MODE_CBC)
    flag = iv + cipher.encrypt(flag)
```

So a 32-bit seed is used with `numpy.random.seed()`, but we have to get
through two of them.

When I first looked at the code I was somewhat excited because newer
`numpy` uses PCG64, which I think is a really cool RNG and would be
quite interesting to reverse. However, the `seed()` stuff just uses the
legacy implementation, which is a Mersenne Twister, basically equivalent
to Python's (though with some differences in initialization).
Mersenne Twister is one of the more boring RNGs (in my opinion), but I'm also biased by the fact that its implementation is finicky and it's so easy to get stuck on off-by-one bugs and the like. Basically, the implementation is more of a hassle than the RNG is worth. It also has, in my opinion, a state size greatly disproportionate to its quality.

Anyway, there are two approaches to this problem. One, the "trve" way, is to actually directly reverse the 32-bit seed from the 16 bytes of output you get for free. Now, I know this can be done, but it is also not trivial, because the MT implementations use a super-convoluted initialization, and you have to copy all this code and make sure you get all the indexing right and it's just a royal pain in the ass with all the potential bugs.

The second approach, the "I don't want to deal with this shit"
approach, is to note that you can get as many of these outputs as you
want. So you can build a database of `output-bytes -> seed` or something
like that, and then just get enough outputs from the server until it's
very likely you get two matches. I think the probability calculation
might go like $1-(1-d/2^{32})^{h}$ (?) where $d$ is the number of
entries you put in the database and $h$ is the number of times you hit
up the server. So calculating ~200 million seeds (not too bad,
time-wise) while `nc`-ing the server 400 times (also not too bad,
because there's no PoW) gives you a fair chance of being able to
decrypt both ciphers.

I went for the second approach, which was simple and braindead.
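A minimal sketch of the database idea, scaled way down (a toy seed range instead of the full 32 bits, and the lookup keyed on the full 16 IV bytes the server leaks):

```python
import numpy.random as random

def leak(seed):
    # regenerate what the server's first 16 output bytes (the IV) would be
    random.seed(seed)
    return random.bytes(16)

# the output-bytes -> seed database (toy range here; the real attack
# would cover a few hundred million 32-bit seeds)
db = {leak(s): s for s in range(10_000)}

# an "observed" IV from some unknown seed inside our range
observed_iv = leak(4321)

# recover the seed and re-derive iv and key exactly as the server did
seed = db[observed_iv]
random.seed(seed)
iv, key = random.bytes(16), random.bytes(16)
```

The real thing just repeats this with a much bigger table, matched against the IVs from a few hundred connections.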

## A~52~ group (unknown CTF)

I remember one problem he described where you are to find two elements
of the alternating group, $a, b \in A_{52}$, such that given any deck
shuffle (of 52 cards) you can give a sequence like
$a b^{2} a b^{13} a^{5} \cdots$ which produces that shuffle. I don't know
the exact problem statement, or what category it was in (guessing `rev`
or `misc`), but it was an interesting little problem, being something
I haven't seen before.

Basically what I came up with is to have one element (the big loop)
cycle the first 51 cards like `(0 1 2 … 49 50)` and then another (the
little loop) which cycles the last three cards like `(49 50 51)`. And
then you can think of it as a really bad sorting algorithm, where you
incrementally start to sort the cards in the big loop (`0..50`) by using
the 51st place as a register or temporary variable. You shift the card
you want to move into position 51, then align the rest of the cards
properly, then shift it out again; rinse and repeat until all the cards are
sorted (or in whatever order you want). There's a special case when the
card you want to move is attached to the front or back of the sequence
of already-sorted cards (and you want to move it to the other side) in
the "big loop," but it's easy to take care of.
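As a quick sanity check that these two generators really lie in $A_{52}$ (hypothetical helper code, since the original problem statement is unknown): a 51-cycle and a 3-cycle are both even permutations.

```python
def cycle_perm(n, elems):
    # permutation on n points that cyclically shifts the listed elements
    p = list(range(n))
    for i, e in enumerate(elems):
        p[e] = elems[(i + 1) % len(elems)]
    return p

def is_even(p):
    # a permutation is even iff (n - number of cycles) is even
    seen = [False] * len(p)
    cycles = 0
    for i in range(len(p)):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = p[j]
    return (len(p) - cycles) % 2 == 0

big = cycle_perm(52, list(range(51)))   # (0 1 2 ... 50)
little = cycle_perm(52, [49, 50, 51])   # (49 50 51)
```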

## Unicode troll (unknown CTF)

There was another problem that appeared to be an RNG reverse, but was just a troll task.

You give a seed as hex, various conditions are applied to it, then the digits in your seed are scrambled, and you are given two bytes of output from an xorshift64 stream and asked to predict the value. The relevant part was:

```
seed = input("give me the seed: ")
seed = seed.strip()
if len(seed) != SEEDS:
    print("seed should be " + str(SEEDS) + " bytes long!")
    exit()
seed = list(seed)
random.shuffle(seed)
counts = collections.Counter(seed)
if counts.most_common()[0][1] > 3:
    print("You can't use the same number more than 3 times!")
    exit()
int16 = lambda x: int(x, 16)
seed = list(map(int16, seed))
```

The suspicious part is where it does the `Counter()` stuff to check for
repeated digits. I got a hunch and yes, sure enough:

```
>>> int('1٠۰߀०০੦૦୦௦౦೦൦෦๐໐༠၀႐០᠐᥆᧐᪀᪐᭐᮰᱀᱐꘠꣐꤀꧐꧰꩐꯰０𐒠𐴰𑁦𑃰𑄶𑇐𑋰𑑐𑓐𑙐𑛀𑜰𑣠𑱐')
100000000000000000000000000000000000000000000000000
```

So the repeat check counts *characters*, not digit values: you can feed it zeros from fifty different scripts, the parsed seed is literally all zeros, and the xorshift64 stream is trivially all zeros.
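To spell out why zero works (the challenge's exact shift constants are unknown; this uses a common 13/7/17 xorshift64 variant): shifts and xors preserve an all-zero state, so zero is a fixed point of any xorshift.

```python
def xorshift64(s):
    # one step of a common xorshift64 variant; the exact constants
    # don't matter, since shifting and xoring zero always yields zero
    s ^= (s << 13) & 0xffffffffffffffff
    s ^= s >> 7
    s ^= (s << 17) & 0xffffffffffffffff
    return s
```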

You can find these weird characters as follows:

```
import unicodedata
''.join(chr(x) for x in range(256, 0x110000)
        if unicodedata.category(chr(x)) == 'Nd' and unicodedata.digit(chr(x)) == 0)
```

Apparently the task (in whatever CTF it was) was tagged `crypto` as
well, which is why I call it a troll.

## homebrew hash function (Zh3r0)

There was a hash function like

```
from Crypto.Util.Padding import pad  # the challenge's exact padding may differ

# 32-bit rotates (not given in the original snippet)
ROTL = lambda v, n: ((v << n) | (v >> (32 - n))) & 0xffffffff
ROTR = lambda v, n: ((v >> n) | (v << (32 - n))) & 0xffffffff

def hash(text: bytes):
    text = pad(text, 16)
    text = [int.from_bytes(text[i:i+4], 'big') for i in range(0, len(text), 4)]
    M = 0xffff
    x, y, z, u = 0x0124fdce, 0x89ab57ea, 0xba89370a, 0xfedc45ef
    A, B, C, D = 0x401ab257, 0xb7cd34e1, 0x76b3a27c, 0xf13c3adf
    RV1, RV2, RV3, RV4 = 0xe12f23cd, 0xc5ab6789, 0xf1234567, 0x9a8bc7ef
    for i in range(0, len(text), 4):
        X, Y, Z, U = text[i]^x, text[i+1]^y, text[i+2]^z, text[i+3]^u
        RV1 ^= (x := (X&0xffff)*(M - (Y>>16)) ^ ROTL(Z,1) ^ ROTR(U,1) ^ A)
        RV2 ^= (y := (Y&0xffff)*(M - (Z>>16)) ^ ROTL(U,2) ^ ROTR(X,2) ^ B)
        RV3 ^= (z := (Z&0xffff)*(M - (U>>16)) ^ ROTL(X,3) ^ ROTR(Y,3) ^ C)
        RV4 ^= (u := (U&0xffff)*(M - (X>>16)) ^ ROTL(Y,4) ^ ROTR(Z,4) ^ D)
    for i in range(4):
        RV1 ^= (x := (X&0xffff)*(M - (Y>>16)) ^ ROTL(Z,1) ^ ROTR(U,1) ^ A)
        RV2 ^= (y := (Y&0xffff)*(M - (Z>>16)) ^ ROTL(U,2) ^ ROTR(X,2) ^ B)
        RV3 ^= (z := (Z&0xffff)*(M - (U>>16)) ^ ROTL(X,3) ^ ROTR(Y,3) ^ C)
        RV4 ^= (u := (U&0xffff)*(M - (X>>16)) ^ ROTL(Y,4) ^ ROTR(Z,4) ^ D)
    return int.to_bytes((RV1<<96)|(RV2<<64)|(RV3<<32)|RV4, 16, 'big')
```

And the goal was to find a collision. Stripping away all the trivially reversible fluff, what we want to attack is:

```
RV1 ^= (X&0xffff)*(M - (Y>>16)) ^ ROTL(Z,1) ^ ROTR(U,1)
RV2 ^= (Y&0xffff)*(M - (Z>>16)) ^ ROTL(U,2) ^ ROTR(X,2)
RV3 ^= (Z&0xffff)*(M - (U>>16)) ^ ROTL(X,3) ^ ROTR(Y,3)
RV4 ^= (U&0xffff)*(M - (X>>16)) ^ ROTL(Y,4) ^ ROTR(Z,4)
```
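Concretely, the function whose outputs we want two different `(X, Y, Z, U)` inputs to agree on looks like this in plain Python (assuming 32-bit rotates, as the constants suggest):

```python
M = 0xffff

def ROTL(v, n):
    return ((v << n) | (v >> (32 - n))) & 0xffffffff

def ROTR(v, n):
    return ((v >> n) | (v << (32 - n))) & 0xffffffff

def round_terms(X, Y, Z, U):
    # the four values XORed into RV1..RV4 for one block, constants dropped;
    # each 16x16-bit product stays below 2**32
    return ((X & 0xffff) * (M - (Y >> 16)) ^ ROTL(Z, 1) ^ ROTR(U, 1),
            (Y & 0xffff) * (M - (Z >> 16)) ^ ROTL(U, 2) ^ ROTR(X, 2),
            (Z & 0xffff) * (M - (U >> 16)) ^ ROTL(X, 3) ^ ROTR(Y, 3),
            (U & 0xffff) * (M - (X >> 16)) ^ ROTL(Y, 4) ^ ROTR(Z, 4))
```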

Specifically, the only thing that is nonlinear here is the
multiplication. I think you could approach this in several ways, like a
sort of backward search on each block, or trying to find out how two blocks
can cancel each other out, but as a preliminary I put it into z3 first.
I had a suspicion that you could solve the above for several instances of
`X, Y, Z, U` to get a collision in the very first 16 bytes, or at least
I couldn't see why it shouldn't be possible. And sure enough, z3 found
such an instance pretty quickly (fixing one variable to anything to
ground it) and it was just done. Whatever else is in the data doesn't
matter.

## almost-combinadics (Zh3r0)

The problem was:

```
def nk2n(nk):
    l = len(nk)
    if l == 1:
        return nk[0]
    elif l == 2:
        i, j = nk
        return ((i+j) * (i+j+1)) // 2 + j
    return nk2n([nk2n(nk[:l-l//2]), nk2n(nk[l-l//2:])])

print(nk2n(flag))
# 2597749519984520018193538914972744028780767067373210633843441892910830749749277631182596420937027368405416666234869030284255514216592219508067528406889067888675964979055810441575553504341722797908073355991646423732420612775191216409926513346494355434293682149298585
```

Right off the bat there was a strong "smell" of some sort of "exotic
base conversion" here, and I mentioned the combinatorial number system to
`poiko`. He pointed out that indeed, the `l == 2` case above is
`choose(i+j+1,2)+j`. That made it click for me, even though the whole
combinatorial angle isn't really needed or relevant; the idea is just
that the `l == 2` step is easily reversible:

```
from math import comb, isqrt

def rev(q):
    # q == comb(i+j+1,2) + j
    r = isqrt(2*q)
    if comb(r+1,2) <= q:  # note <=; with <, the j == 0 case comes out wrong
        r += 1
    # r == i+j+1
    j = q - comb(r,2)
    i = r - j - 1
    return (i, j)
```

So you can simply keep calling `rev(v)` on all the numbers `>255` until
all you have left is bytes, and the flag comes out.
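The full decode loop can be sketched like this, round-tripping a made-up flag. (The splitting is unambiguous for printable flags: any pair of bytes ≥ 32 encodes to a value well above 255, so anything ≤ 255 must be a plain byte.)

```python
from math import comb, isqrt

def nk2n(nk):
    # the challenge's encoder, for the round-trip check
    l = len(nk)
    if l == 1:
        return nk[0]
    elif l == 2:
        i, j = nk
        return ((i+j) * (i+j+1)) // 2 + j
    return nk2n([nk2n(nk[:l-l//2]), nk2n(nk[l-l//2:])])

def rev(q):
    # invert q == comb(i+j+1,2) + j
    r = isqrt(2*q)
    if comb(r+1, 2) <= q:
        r += 1              # r == i+j+1
    j = q - comb(r, 2)
    return (r - j - 1, j)

def unflatten(n):
    # keep splitting anything > 255 until only byte values remain
    vals = [n]
    while any(v > 255 for v in vals):
        vals = [b for v in vals for b in (rev(v) if v > 255 else (v,))]
    return bytes(vals)

flag = b'zh3r0{demo}'
assert unflatten(nk2n(list(flag))) == flag
```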

## approximate mastermind (Zh3r0)

This was a PPC task of solving Mastermind when the server sometimes
(with a low probability) lies in its responses. Full disclosure: I
didn't solve this one until after the CTF, so unfortunately no points
for `mode13h`, but it was interesting enough.

So my first idea was to do the classical solution (the number of pegs and colors in the challenge games was low): enumerate all possible solutions and weed out the ones that don't match the constraints, and, if this leads to 0 possible solutions because the server lied at some point, sort of backtrack and remove one of the constraints until it is consistent again. This obviously won't work if the server lies too much, but the hope was that I could just retry the game until the server lied few enough times, probabilistically.
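A minimal sketch of that classical filter, ignoring the lying for a moment (the peg/color counts here are hypothetical; the challenge's real sizes may differ):

```python
from itertools import product
from collections import Counter

P, C = 4, 6  # pegs, colors (made-up sizes)

def feedback(guess, secret):
    # standard Mastermind response: (bulls, cows)
    bulls = sum(g == s for g, s in zip(guess, secret))
    common = sum((Counter(guess) & Counter(secret)).values())
    return bulls, common - bulls

def consistent(history):
    # keep only candidates matching every (guess, response) seen so far
    cands = list(product(range(C), repeat=P))
    for guess, resp in history:
        cands = [c for c in cands if feedback(guess, c) == resp]
    return cands
```

When a lie makes `consistent()` come back empty, the backtracking step just drops one entry of `history` and refilters.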

I did this with `numpy` to make it somewhat efficient, something like:

```
import numpy as np

def c_bulls(a, b):
    return (a == b).sum(1, dtype=np.int8)

def c_cows(C, a, b):
    b = tuple(b)
    c = np.zeros((a.shape[0],), a.dtype)
    for k in range(C):
        bc = b.count(k)
        if bc == 0:
            continue
        x = (a == k).sum(1, dtype=np.int8)
        c += np.minimum(x, bc)  # np.minimum, not np.min
    return c
```

(Here `a` is a `k×P` matrix of "potential solutions." Note also that
`c_cows()` includes the bull count, which I find more intuitive, whereas
the server essentially uses `cows = c_cows() - c_bulls()`.)

To select a next guess I sampled some of the potential solutions and found ones that would lead to the most other candidates being eliminated no matter the response (min-max). Anyway, it wasn't the fastest, and I ran into some weird (hidden) timeout issues and it kept failing. I don't know what the timeout on the server actually was, because it wasn't explicitly stated, which was annoying.

I then changed it to use a different approach: generate a small set of "best candidate" solutions (initially random) and just keep iteratively randomizing them until reaching some local maximum where they all give approximately the same answers as the ones I had gotten from the server, adding more random candidates that again get mutated down to "good candidates" when previous ones end up identical. For each guess I just used the best candidate I had so far.

```
def local_mutate(self, ch):
    tmp = list(ch)
    f = self.score(ch)
    order = list(range(self.P))
    random.shuffle(order)
    for i in order:
        p, tmp[i] = tmp[i], random.randrange(self.N)
        nf = self.score(tmp)
        if nf < f:
            f = nf
        else:
            tmp[i] = p
    return f, tuple(tmp)

def score(self, cand):
    "Score a guess based on previous guesses and responses"
    f = 0.0
    cntc = Counter(cand)
    for g, r, w in self.history:
        r_ = sum(x == y for x, y in zip(g, cand))
        w_ = sum((Counter(g) & cntc).values())
        f += self.bull_weight * abs(r_ - r)
        f += abs(w_ - w)
    return f
```

This is more of a genetic algorithm approach, kind of similar to something I've done when finding a factorization where the primes had very specific bit patterns applied to them. It turned out this approach worked surprisingly well actually, and immediately gave the flag:

```
b'level passed, good job\n'
b'you earned it : zh3r0{wh3n_3asy_g4m3s_b3come_unnecessarily_challenging}\n'
```

It was also much faster than the full "brute force" numpy solution, even though here I did everything in slow Python code.