# Introduction

Hi.

I’m Frank.

This is me using `mdbook`^{1} as a personal web page, after I gave up on the alternatives. Much of it is exported from Org-mode files.

It’s divided into the sections you see on the left.

- *Narcissus*: “Everyone generalizes from one example. At least, I do…”
- *Harpocrates*: CTF competitions, cryptography, and the like.
- *Hephaestus*: describing or documenting personal projects.
- *Calliope*: STEM’s not all.
- *Lyssa*: (hidden) rants and passive-aggressive notes.

Some stuff is still incomplete or missing exports. For example, the CTF notes haven’t been properly converted; they’re just `pandoc` output from my Org file, so there might be some formatting errors and dead links.

# Self-doxxing

Like I said, my name is Frank. I go by `fsh`, `franksh`, `tope`, or `metope`. My old IRC nick used to be `lamer`.

I am friendly to the point of being flirtatious, so feel free to message me:

- Discord: `tope#9134@Discord`
- Matrix: `@franksh:matrix.org`

Social media links:

- GitLab: my git home^{1}.
- GitHub: secondary, but it’s set to mirror some of my repositories.
- Goodreads: not updated in a long time.
- CryptoHack: new challenges when?

“Why GitLab? Everybody uses GitHub!” Because I signed up in the olden days when GitHub didn’t offer free private repositories.↩

Sometimes I play CTFs on the small two-man team `mode13h`. (The other member being `poiko`, a god of all things reverse/arch/VM.)

These are my notes.

**Audience:** I am writing these notes for some imagined fellow CTFer who
looked at the same tasks that I did, but missed some critical step for
the solve, or perhaps someone who did solve them and just wants to
compare notes. I’ll rarely re-post the full problem as given in the
CTF, nor give very detailed expositions. For the time being this isn’t
intended as a pedagogical resource. The target audience (“you”) is
not complete beginners; *some* facility with math is assumed.

**Motivation:** Even if I don’t participate in them competitively, I’ll sometimes
solve the tasks offline.

I am fascinated by the generalities behind puzzles—i.e. their mathematical expression—and I love solving problems with programming. Fascination gives off a glow, and this glow-energy is what I’d like to call motivation. The competitive element of CTFs has little bearing, and I tend to avoid CTFs with uninteresting problems.

**Content:** My favorite tasks are of the “puzzle”^{1} variety. In
CTFs these would normally be found under misc and crypto. My safe space
is the intersection between programming and mathematics.

^{1} By “puzzles” I mean problems that have a *natural* translation into mathematics and reveal there a certain simplicity and elegance in generalized form. It stands in marked contrast to “riddles”^{2}, which tend to resist mathematical patterns, and are usually dependent on culture, language, or specific trivia.↩

^{2} …full moon to reveal a bunch of points that turn out to align with certain zodiac signs in the sky, and then after translating the zodiacs into numbers using the provided plastic spy-codex, you get a number which turns out to be the Dewey decimal index where you can find the book that was used for the book cipher. (See example.) That kind of Sherlock Holmes stuff is completely orthogonal to the kind of puzzles I enjoy.↩

# N1CTF2020

## General Comments

An **A+** CTF! An *incredible* amount of tasks, from mid-tier to
top-tier difficulty level. Objectively the task set alone deserves a
~100 rating on CTFTime^{1}.

I did the crypto tasks and will outline solves for those below. I also glanced at some of the “easier” tasks in misc and web, but even the entry-level tasks in those categories were beyond me.

*Personal downside #1*: very biased toward web, rev, and pwn, which I am
pretty clueless about, though those are the “true” staples of CTF. I
realize this is more like a pure upside for most people.

At the end of the first day only three crypto tasks had been released, yet more than fifteen(!?) web, pwn, and rev tasks were out. I sat around twiddling my thumbs a bit, feeling pretty useless.

The only misc-like task I saw involved PHP, which is more of a mental
plague than a programming language^{2}. I played around with it a bit,
but finally just went to bed feeling a little bit dumber and somewhat
frustrated because I thought all tasks had been released and that I’d
be useless for the rest of the CTF.

*Personal downside #2*: Two more crypto were released just after I’d
gone to sleep on day two, so I only had a few hours for them when I woke
up. Although the new tasks were a nice surprise, the timing was very
unfortunate for me.

*Plea for organizers in general: please consider having all tasks out by
the mid-way point of the CTF.* If not, then communicate your planned
task release schedule so it’s possible to better manage our time?

We ended up placing 13th^{3}.

## VSS

You’re given some code which generates a QR-code image of the flag and
uses that image to create two new randomized images which, when
combined, would reconstruct the original (albeit transposed? I’m
guessing, I didn’t actually run the code). You also receive *one* of
the generated images. It struck me as a little cryptic.

I ignored the whole visual threshold part of the task and noted it uses Python’s random module to generate the pixels in the output images. That’s a lot of random data, and the QR-code image it’s based on has a lot of known fixed output. After double-checking that the QR-image had enough known pixels (you get to reverse one bit of state per pixel) and where it was (it would be hell if it wasn’t contiguous), it reduces to a “standard” reverse-the-RNG-state task.

For reversing Python’s MT the *easy* way you need $32 \cdot 624$
contiguous bits of output. That is, if you don’t have to worry about
missing state. You need to undo the tempering that MT applies to its output
words:

```
def untemper(x):
    # invert MT19937's output tempering, steps in reverse order
    x ^= (x >> 18)
    x ^= (x << 15) & 0xefc60000
    x ^= ((x << 7) & 0x9d2c5680) ^ ((x << 14) & 0x94284000) ^ ((x << 21) & 0x14200000) ^ ((x << 28) & 0x10000000)
    x ^= (x >> 11) ^ (x >> 22)
    return x
```

You combine words `i`, `i+1`, and `i+M` to get word `i+N`:

```
tmp = w[i][31] ^ w[i+1][:31]  # using slice notation to select bits
w[i+N] = tmp >> 1 ^ w[i+M] ^ (A if tmp[0] else 0)
```

`M`, `N`, and `A` are constants from `_randommodule.c` in the CPython
source code. Then you reapply the tempering (see the aforementioned `.c`
source) and it should match Python’s output. That’s the basic idea.
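As a sanity check, the whole pipeline fits in a few lines: untemper 624 consecutive 32-bit outputs, install them as a fresh generator’s state with `random.setstate`, and the clone then predicts the stream. This is my own sketch, not the code from the solve:

```python
import random

def untemper(x):
    # invert MT19937's output tempering, steps in reverse order
    x ^= (x >> 18)
    x ^= (x << 15) & 0xefc60000
    x ^= ((x << 7) & 0x9d2c5680) ^ ((x << 14) & 0x94284000) \
         ^ ((x << 21) & 0x14200000) ^ ((x << 28) & 0x10000000)
    x ^= (x >> 11) ^ (x >> 22)
    return x

# any 624 consecutive 32-bit outputs form a valid state window
outputs = [random.getrandbits(32) for _ in range(624)]
state = tuple(untemper(o) for o in outputs) + (624,)  # 624 = "block used up"
clone = random.Random()
clone.setstate((3, state, None))
assert all(clone.getrandbits(32) == random.getrandbits(32) for _ in range(1000))
```

This works even when the captured window straddles one of MT’s internal block regenerations, since the recurrence is position-independent.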

It was a task where the solution is very easy to see but rather
painful to implement. I struggled with endless little indexing bugs.
The headache I got from directly indexing into
`bin(random.getrandbits(...))` was not worth it. (Bits within words run
from bit $2^{31}$ down to $1$, but the words are in little-endian order.) I
even had bugs in the final solution, as some randomness was leaking
through, but fortunately QR-codes are highly redundant, so I didn’t
care. Then again I probably made things needlessly complicated by
reverse-generating the original QR-code instead of simply generating the
“companion” image to the one shared.

Apart from the headaches it gave me with my own bugs, it’s actually a fairly clever task, because there aren’t any obvious “hints” that point you in the right direction, so it might take a while to notice. I was pretty lucky.

## FlagBot

Several random 256-bit elliptic curves are generated and used to do a standard key exchange to AES-encrypt the flag. The curves are all asserted to be non-smooth and resistant to MOV attack. You get the public output of the exchanges as well as the encrypted messages. It’s a very nice and clean task with a likewise straightforward implementation. It was a pure blessing after the indexical spaghetti that was VSS.

The key realization is that the secret is reused (for both client and server). The generated random curves have a lot of small random subgroups, so you can solve the discrete log in those subgroups (multiply the generator and public keys by $order/q$ to land in the subgroup of size $q$), get constraints like $secret \equiv x_i \pmod{q_i}$, and then do a Chinese Remainder reconstruction when you have enough. A partial Pohlig-Hellman, if you will.
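The recombination step is simple enough to sketch on its own. Here I fake the subgroup dlogs by reducing the secret modulo each small prime (as if each residue came out of a baby-step giant-step solve in a subgroup of that size); only the CRT logic is real:

```python
from math import prod

def crt(residues, moduli):
    # classic Chinese Remainder reconstruction for pairwise-coprime moduli
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)
    return total % M

secret = 31337
qs = [101, 103, 107, 109]        # small subgroup sizes
rs = [secret % q for q in qs]    # pretend: dlogs solved per subgroup
assert prod(qs) > secret         # enough constraints to pin the secret
assert crt(rs, qs) == secret
```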

I think I did most of this task in REPL, but the pseudo-code sketch would be:

```
crt_r, crt_s = [], []
for ec, pk_r, pk_s in data:
    order = ec.order()
    for f in small_factors(order):
        crt_r.append((babystepgiantstep(f, pk_r*(order/f), g*(order/f)), f))
        crt_s.append((babystepgiantstep(f, pk_s*(order/f), g*(order/f)), f))
r, s = crt(*crt_r)[0], crt(*crt_s)[0]
```

## curve

In this task (against a live server) you’re asked for elliptic curve parameters $(p,a,b)$ and two points $(G_{1},G_{2})$ and then have to solve 30 rounds of distinguishing $G_{1}⋅r⋅s$ from $G_{1}⋅x$ when given $G_{1}⋅r$ and $G_{2}⋅s$ (for random secret integers $r,s,x$). It will throw an exception if you give it a singular curve or points not on the curve.

At first glance I thought this was too similar to `FlagBot`, because
there are no checks against the points being in a small subgroup. I knew
you could also use trace 1 curves on which the much-copy-pasted Smart
attack works, but I only have that code in some Sage docker; I wanted to
use my own comfortable code and homemade Python libraries. Besides, *I
got this*: I thought it would just be a matter of trivially putting them
into small subgroups and solving the CRT again. A bit too easy, maybe…

Yeah, after a while I realized my oops: it required a very *large* prime
modulus $p$ and calls `E.order()` on the curve, *and* there’s a timer
on the program. The `E.order()` call takes well over a minute, sometimes
several minutes, so there’s no time to do the loop. I wasted some
time trying to find random curves for which `E.order()` was smooth or
took less time but…

Finally I relented and tested `E.order()` on a curve with $|E_p| = p$.
It was instant, of course, so… *sigh* Copy-pasted Sage code it is,
then.

Now the problem was to generate curves with trace 1, which I didn’t
know how to do, but `poiko` talked of a DEFCON task which had involved
generating such curves and gave me a paper:
http://www.monnerat.info/publications/anomalous.pdf

From the paper I figured I wanted curves over primes
$p = 11m(m+1)+3$ with $(a,b)$ giving a j-invariant equal to
$-2^{15}$, and I quickly found plenty of such curves. Then copy-paste
`Smart_Attack()` and blah, blah, the usual stuff. (I really despise
copy-pasting.) A bit unrewarding due to my stubbornness, but I have to
admit it was a good task in the end, even if the “hidden” constraint
that the given curve must also have a fast `E.order()` was a bit
devious^{4}.

## easy RSA?

A fun task which had two stages.

It generates an RSA modulus $n = p \cdot q$ and encrypts a large vector of numbers. These numbers are the (dot) products between the flag and random vectors, offset by some small error, and then taken modulo a small prime. You are given the random numbers used for free, but the errors are secret.

The primes are generated in a truly bizarre way:

```
mark = 3**66
def get_random_prime():
    total = 0
    for i in range(5):
        total += mark**i * getRandomNBitInteger(32)
    fac = str(factor(total)).split(" * ")
    return int(fac[-1])
```

It generates a number that has “small” digits in base $3^{66}$ and returns the largest prime in the factorization of this number. That’s one of the oddest ways to generate primes I’ve seen.

But (the “a-ha” moment) it means that $n \cdot x$ also has “small” digits in base $3^{66}$ for some $x$ which is itself also “small” (compared to $n$).
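A toy check of that claim, with a small base $B$ standing in for $3^{66}$ (the numbers and names here are mine): the digits of a product of two small-digit numbers are short convolution sums, so no carries occur and they stay small:

```python
import random

B = 10**6  # toy stand-in for 3**66

def digits(v):
    # base-B digits, least significant first
    out = []
    while v:
        out.append(v % B)
        v //= B
    return out

# two 5-digit numbers with digits < 100 (like the task's < 2**32 vs 3**66)
T1 = sum(B**i * random.randrange(1, 100) for i in range(5))
T2 = sum(B**i * random.randrange(1, 100) for i in range(5))
# each product digit is a sum of at most 5 terms, each < 100 * 100
assert max(digits(T1 * T2)) < 5 * 100 * 100
```

In the task, $p \mid T_1$ and $q \mid T_2$, so $n = pq$ divides the small-digit product $T_1 T_2$, giving the small multiplier $x$.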

I used a lattice like

```
[ n * HH, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[ 1 * HH, HL, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[B**1 * HH, 0, HL, 0, 0, 0, 0, 0, 0, 0, 0],
[B**2 * HH, 0, 0, HL, 0, 0, 0, 0, 0, 0, 0],
[B**3 * HH, 0, 0, 0, HL, 0, 0, 0, 0, 0, 0],
[B**4 * HH, 0, 0, 0, 0, HL, 0, 0, 0, 0, 0],
[B**5 * HH, 0, 0, 0, 0, 0, HL, 0, 0, 0, 0],
[B**6 * HH, 0, 0, 0, 0, 0, 0, HL, 0, 0, 0],
[B**7 * HH, 0, 0, 0, 0, 0, 0, 0, HL, 0, 0],
[B**8 * HH, 0, 0, 0, 0, 0, 0, 0, 0, HL, 0],
```

to find $x$. There might be better ways, but it worked.

Now to factor. Trivially you can find the first and last digits of the primes. There’s probably some more clever way to do the whole thing, but I just did the dumb thing and used the base-$B$ digits of $n \cdot x$ as coefficients for a polynomial over $Z_p$ for some large $p$ and factored that. That gave me two numbers back (when multiplied up so the least/most significant digits match), each of which shared one of the big primes with $n$. This was all a bit rushed because it took me too long to discover the “trick” of finding $n \cdot x$, and I just did everything in the REPL at this point, using my own utility libraries and NTL wrapper.

So now I had $n = p \cdot q$ and could decrypt the vector to get a big system of linear relations with “small errors” modulo some small prime. The lattice is life, the lattice is love. Maybe there’s a more clever way to solve this as well, but did I mention I was a little pressed for time at this point? The system in its entirety was too big for my poor, slow LLL, but a small random subset of equations worked just fine (every equation involved the entire flag).

## babyProof

I “solved” this task but was a little too late. I had already realized the solution by the last hour or two, but underestimated how many datasets I needed and wasted some time with self-doubt, double-checking, and massaging uncooperative lattices. When the competition ended I went to eat and by the time I got back there was enough data and I got the (bittersweet) flag easily.

Like `FlagBot`, it’s a deceptively clean and simple task, where you
don’t realize the problem until that sweet “a-ha!” moment. The server
uses generated DSA constants (a (huge) prime $p$, a generator $g$ for a
large prime-order ($q$) subgroup of $Z_p^*$) to prove that it knows $x$
in $y = g^x \pmod p$ by giving you $y$, $t = y^v \pmod p$ (for some secret
ephemeral key $v$) and $r = v - c \cdot x \pmod q$, with $c$ being a SHA256 constant the recipient can derive from
$(g,y,t)$. It looks respectable and above board at first glance…

But basically you need to forget everything I just said and ignore the whole business with $p$ and discrete logs. With $r$ you’re given a system of inequalities like $(c_i \cdot x + r_i \bmod q) < x$, because the ephemeral key is erroneously generated to be less than $x$, which was asserted to be 247 bits in the code (while $q$ is 256 bits), so each equation reveals a little bit more about $x$. It is clearly another lattice problem.

I’m not sure if there are any better or more clever ways to set up the lattice, but what worked for me in the end was the most obvious and straightforward:

```
[[1,     0, c0, c1, ..., cn],
 [0, 2^248, r0, r1, ..., rn],
 [0,     0, q0,  0, ...,  0],
 [0,     0,  0, q1, ...,  0],
 [0,     0,  0,  0, ..., qn]]
```

`b'n1ctf{S0me_kn0wl3dg3_is_leak3d}'`

Since I can no longer enter it on
the site, I’ll leave it here to be sad and alone. :(

Given appropriate adjustment of over-rated CTFs that feature stego and guess-tasks.↩

I firmly believe the inventors of PHP ought to be retroactively punished for having made the world a worse place. Every time I had to read a PHP doc page and look at the comments by PHP users I could feel myself losing IQ points as if breathing in neurotoxic mold.↩

We’re only a two-man team, so naturally we don’t stand a chance for a competitive placement, but it was a lot of fun with good, challenging tasks.↩

Maybe I’m only critical because I’m embarrassed it fooled me!↩

# Hack.lu 2020

## General Comments

An unfortunate step down from last weekend’s N1CTF2020, even though this one has a much higher rating on `CTFTime`.

**DISCLAIMER:** I’m judging with very biased goggles here; I’m sure
web/rev/pwn were all kinds of excellent and top notch woo-hoo, but misc
and crypto were very disappointing and demotivating for this rating
bracket, and I tapped out early. It felt too easy and/or uninspired
and/or unoriginal.

I am also going to sound *incredibly* negative. It wasn’t that bad, but
it definitely wasn’t a 100-point CTF, and that’s the stance I’m
critiquing from. The tasks I looked at felt like they belonged in the
50-70 range.

Pwnhub Collection, which was (unnecessarily) gated
behind rev, would probably be my pick for “best” crypto task, because
it actively combines two related techniques in a sensible way.
Conquering Premium Access
had a copy-pastable solution
through Google, which was not fun to discover.
Bad Primes was somewhat trivial^{1}.
BabyJS was cute, but
that’s about it. More info about them below. (As usual I will just
outline the solves; these are just notes that should allow anyone who
attempted or failed the problems to solve them.)

## Bad Primes

In this problem^{2} you’re given $c = flag^e \pmod{p \cdot q}$ with all numbers known except
$flag$. The caveat is that $e \mid p-1$, so there’s no
$d = e^{-1} \pmod{p-1}$ for recovering the flag directly. (Finding the
flag $\pmod q$ works trivially, so we ignore that part.)

So finding the *e*-th root of $c \pmod p$ is the interesting part. You
can do it in two different ways:

The I-don’t-want-to-learn-anything Way (also known as
I’m-a-professional-I-don’t-have-time-for-this): press the
`[Skip This Step]` button and do `GF(p)(c).nth_root(e)` in Sage.

The Problem Child Way:

So we want a number $x$ such that $x^e = c \pmod p$, with $p-1 = e \cdot s$ and $\gcd(s,e) = 1$.

The a-ha comes after exploring the hunch that $s$ is important here. Look at $j = e^{-1} \pmod s$. This means $j \cdot e = 1 + k \cdot s$ for some $k$. In fact, it means that $j \cdot e \cdot e = e + k \cdot (p-1)$. A-ha! So now $(c^j)^e = c^{e \cdot j} = c^{k \cdot s + 1} = x^{e \cdot (k \cdot s + 1)} = x^{k \cdot (p-1) + e} = x^e \pmod p$. Voila! $c^j$ is a root.

Other reasoning lines probably exist too, but this was the one I took.

In the next step we iteratively multiply by any *e*-th root of unity
($= r^{(p-1)/e}$ for any primitive element $r$) to cycle through the rest
of the roots, reconstructing the number under mod $p \cdot q$ with CRT
to check if it is the flag.
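A toy run of the whole procedure, with tiny numbers of my own choosing ($p = 13$, $e = 3$):

```python
p, e = 13, 3              # p - 1 = e * s with s = 4, gcd(s, e) = 1
s = (p - 1) // e
x = 2                     # the "flag" residue we pretend not to know
c = pow(x, e, p)          # what we're given

j = pow(e, -1, s)         # j*e = 1 + k*s
root = pow(c, j, p)       # c^j is an e-th root of c
assert pow(root, e, p) == c

r = 2                     # a primitive element mod 13
w = pow(r, (p - 1) // e, p)   # an e-th root of unity
roots = {root * pow(w, i, p) % p for i in range(e)}
assert x in roots         # cycling recovers all candidates, incl. the flag
```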

## BabyJS

A misc task that was misclassified as web(?). It’s an exposition on the
various ways JavaScript tries very hard to be an awful programming
language^{3}. It’s still not as awful as PHP, but we can’t all be the
champ.

```
is(a, 'number');
is(b, 'number');
assert('1.1', a === b);
assert('1.2', 1337 / a !== 1337 / b);
```

`[0.0, -0.0]` works because the division gives `[+Infinity, -Infinity]`.
A pimple, no big deal. Floats are tricky anyway.

```
isnt(c, 'undefined');
isnt(d, 'undefined');
const cast = (f, ...a) => a.map(f);
[c, d] = cast(Number, c, d);
assert('2.1', c !== d);
[c, d] = cast(String, c, d);
assert('2.2', c === d);
```

Probably a billion solutions here. I did `[{}, "[object Object]"]`.
What’s that weird rash…

```
let { e } = json;
is(e, 'number');
const isCorrect = e++<e--&&!++e<!--e&&--e>e++;
assert('3', isCorrect);
```

Up to boils now. Actual boils. `<!--` is a comment. $e = -1$ works.

```
const { f } = json;
isnt(f, 'undefined');
assert('4', f == !f);
```

`f = []` works; I don’t know why and I don’t want to know. I fear that
knowing certain things actually makes you less knowledgeable.

```
const { g } = json;
isnt(g, 'undefined');
// what you see:
function check(x) {
return {
value: x * x
};
}
// what the tokenizer sees:
function
check
(
x
)
{
return
{
value
:
x
*
x
}
;
}
assert('5', g == check(g));
```

This one was a little cute, like a magician’s clever misdirection. The
original `check()` is replaced by this `check(x) { return; ... }`
function. So `null` works. Why does it return undefined? Haha!
https://news.ycombinator.com/item?id=3842713

Blood-shot eyes, trichotillomania, psychosis.

```
const { h } = json;
is(h, 'number');
try {
JSON.parse(String(h));
no('6');
} catch(e){}
passed('6');
```

Something like `1e1000` is converted to the string `"Infinity"`. Makes
sense, n’est-ce pas?

```
const { i } = json;
isnt(i, 'undefined');
assert('7', i in [,,,...'"',,,Symbol.for("'"),,,]);
```

This unending dream. `3` is in the array because array index 3 is
defined? I don’t know the logic. Any language which tries to pretend
lists and maps are somehow isomorphic data structures or algebras is
insane. These languages were invented by insane people.

```
const js = eval(`({clean})`);
assert('8', Object.keys(json).length !== Object.keys(js).length);
```

Put `__proto__` in the object and it will get hidden by the eval,
because `:jazzhands-OO:`.

```
const { y, z } = json;
isnt(y, 'undefined');
isnt(z, 'undefined');
y[y][y](z)()(FLAG);
```

When I looked over the task I figured I could pass all the other checks,
but this one seemed a bit need-to-actually-stop-and-think, “huh?” It
had me puzzled for a while; I don’t really know JavaScript all that
well. (I had to ask Google if JavaScript has apply-overriding and stuff
like that.) I also didn’t realize that *everything* has a
`.constructor`. But eventually I discovered it just by playing around in
`nodejs`, and from that a-ha the pieces fell into place.

`y = "constructor"`

so `y[y]`

becomes a function, i.e. `y.constructor`

,
and `y[y][y]`

becomes the function constructor, which takes its body as
a string argument (?!) to be eval’d. So `z = "return console.log;"`

for
example.

## Conquering Premium Access

AES is used to encrypt the flag. You’re handed the ciphertext and “power traces” (voltage measurement?) of some hardware using AES to encrypt 10,000 known 16-byte plaintexts with the same key/IV as it did the flag.

The task hints about “T-tables”, “aligned” and that masking isn’t employed. But the task also said “thank you to professor so-and-so for the data” and links to a university course, which is the biggest clue in my eyes. From that I figured it’s probably very “textbook” and intended to be solved with least effort.

So, textbook reading material: https://www.tandfonline.com/doi/full/10.1080/23742917.2016.1231523

Indeed: almost disappointingly so. Finding a correlation based on the
1-bits-equals-more-power assumption turned out to work directly. You can
ignore the rounds, ignore the hints^{4}, ignore everything, because
there’s so much data and it’s so artificially clean. Find the `key`
(and `iv`) such that the weight of `sbox[key^iv^plaintext]` has the
highest correlation with (overall) power consumption.
`sbox[key^iv^plaintext]` is what the state gets set to in the first
round (in CBC mode), before `ShiftRows` etc. (Note that `ShiftRows`
doesn’t change the weight.) Technically I ignored the IV too because I
simply forgot about it, but that was fine as well. You can simply use the
full traces, and you don’t have to target any point in time at all.
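The core of the method can be demonstrated end-to-end on synthetic data. Everything below is made up by me: a random fixed permutation stands in for the AES S-box, and the fake “traces” are just the Hamming weight of the S-box output plus Gaussian noise; the key byte with the highest Pearson correlation wins:

```python
import random

# synthetic setup: random permutation as S-box, noisy Hamming-weight "traces"
random.seed(1)
SBOX = list(range(256))
random.shuffle(SBOX)
KEY = 0x42
hw = lambda v: bin(v).count("1")

pts = [random.randrange(256) for _ in range(1500)]
traces = [hw(SBOX[KEY ^ p]) + random.gauss(0, 1) for p in pts]

def corr(xs, ys):
    # Pearson correlation coefficient
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# the correct key guess predicts the traces far better than any wrong one
best = max(range(256), key=lambda k: corr([hw(SBOX[k ^ p]) for p in pts], traces))
assert best == KEY
```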

See also: https://teamrocketist.github.io/2018/11/14/Crypto-SquareCtf-2018-C4-leaky-power/ which I also came across and seems to copy a lot of text/material from the above link, but is a much more thorough write-up than anything I can produce.

Notice how it’s also a verrrry similar problem? Yep: copy-pasting that code should work out of the box here too, though it is awful and slow, so I ended up rewriting it and doing most of my playing in the REPL to pretend I wasn’t a fraud, trying to learn something in spite of everything.

And yeah, it’s AES ECB to decrypt, so no IV was used, which, as stated, I implicitly assumed. Can’t recall if the task hinted at that or not; maybe it did?

## P*rn Protocol

A PDF described a very simple data protocol. Implement the protocol, request login, log in with the given username and password, and get the flag.

Uhh?

This felt more like a “socket programming for beginners” tutorial than anything else. Why was this even part of the task set?

## Pwnhub Collection

Labelled as hard crypto but really just easy-ish crypto gated behind
ASAN rev. `poiko` reversed it for me because I’m a newbie, so I can’t
really say much about the rev part.

From what `poiko` told me, the server does something roughly
equivalent to the following pseudocode:

```
# coll = list of pairs (category, link)
coll = [t.strip().split(' ') for t in open('collection')]
coll.append( (input(), input()) )
txt = ' '.join(x + '{' + y + '}' for x,y in sorted(coll, key=lambda x:x[0]))
# NB: sorted only on category element
print(aes_cbc(fixed_key, fixed_iv, txt).hexdigest())
```

So there’s a string like `"blah{bleh} foo{bar} qux{crux}"` where we can
add an element that gets sorted in.

Finding the “categories” (the strings outside the curly braces) can be
done very efficiently with a binary search. Start with a string like
`byte(127)*n` that gets sorted last, and observe the ciphertext output. Then
for each byte keep a `high,low` pair that you bisect, observing whether you were
flipped to another position in the string (an earlier ciphertext block
will change). This finds all the categories very quickly.

They turned out to be `crypto flag misc` etc. (Which was something
`poiko` had already guessed from doing the rev part, but I just
double-checked.) The next step is discovering the stuff inside the curly
braces. Because I was being dumb, it took me longer than it should have
to realize it’s the even-more-textbook method of byte-by-byte brute
forcing.

Input a category that gets sorted before `flag` with a long arbitrary
link that aligns things like so:

```
# |---aes block---||---aes block---||---aes block---|
# xxxx f{aaaaaaaaaaaaaaaaaaaa} flag{xxxxxxxxxxxxxxxxx
```

Now pop off one `a` so the first `x` gets shifted in, and note what that
block becomes in the output. Then discover the byte by cycling through
candidates, checking the output of encrypting with
`link = "aaaa...aaaa} flag{" + b` for unknown byte `b`. I.e. the string
that’s encrypted becomes:

```
# |---aes block---||---aes block---||---aes block---|
# xxxx f{aaaaaaaaaaaaaaaaaaa} flag{*} flag{xxxxxxxxxx
# with '*' being the variable byte.
```

Once there’s a match you add the byte to what you know is there already
(`"} flag{"`) and repeat for the next byte, until the entire link has been
discovered. Print it, ship it, done.
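The attack is easy to simulate without the real server. The oracle below is entirely my own mock-up: it fakes CBC with a sha256 chain, which preserves the only property the attack uses (equal aligned plaintext prefixes give equal ciphertext blocks), and `SECRET` plays the flag’s link:

```python
import hashlib

SECRET = "s3cr3t_l1nk"  # the unknown link we want to extract

def oracle(category, link):
    # mock server: sort entries by category, format, "encrypt" block-wise
    coll = [("crypto", "stuff"), ("flag", SECRET), ("misc", "things"),
            (category, link)]
    txt = " ".join(f"{c}{{{l}}}" for c, l in sorted(coll, key=lambda t: t[0]))
    data = txt.encode().ljust((len(txt) + 15) // 16 * 16)
    blocks, prev = [], b"\x00" * 16
    for i in range(0, len(data), 16):
        prev = hashlib.sha256(prev + data[i:i + 16]).digest()[:16]
        blocks.append(prev)
    return blocks

def recover(nbytes):
    # category "f" sorts between "crypto" and "flag", and the prefix
    # "crypto{stuff} f{" is exactly one block, so our link is aligned
    rec = ""
    for i in range(nbytes):
        pad = (-8 - i) % 16            # puts the target byte at a block end
        blk = (pad + 8 + i) // 16      # index of the block holding it
        ref = oracle("f", "a" * pad)[blk]
        for g in map(chr, range(32, 127)):
            if oracle("f", "a" * pad + "} flag{" + rec + g)[blk] == ref:
                rec += g
                break
    return rec

assert recover(len(SECRET)) == SECRET
```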

which is fine, but it could have been a sub-problem in another task for example. One could also say it tries to teach you some math—but it’s the sort of stuff with a trivial Sage/Google escape hatch. The trademark of low-effort tasks that get ground up by the “point mill” into bland paste, rather than offering any fun/engaging problem solving.↩

Also, this was my first impression: `#!/usr/bin/env python2`. `:shudder:` This has nothing to do with the task itself, but it always makes my heart sink and I lose a bit of faith in the problem author and the world in general.↩

The question “why *is* JavaScript?” is ostensibly answered with “because fuck you, because we’re engineers and we get the job done, that’s why.” But the question “why is JavaScript *the way it is*?” can only be answered with a shrug and an embarrassed silence. Indeed, why is any scatological piece of art the way it is.↩

OK, actually the fact that it uses T-tables probably helps, as high-weight input will likely lead to high-weight output from the lookups there? I don’t know, I’ve never done any power-analysis before.↩

# CyberSecurityRumble 2020

## General Comments

Overall a pretty good CTF. Decent number of tasks, variety, difficulty level and so forth. The stuff I looked at seemed alright, though I got entangled in some web business toward the end that soured things, but that was my own fault.

We placed a cool 10th tho (much thanks to `poiko` spamming flags left and right the last day).

There wasn’t an *abundance* of crypto or mathematical misc tasks, but
then I don’t really come to expect that much; it felt like there was
*enough*. I kinda think of crypto/math as the awkward little sister
who’s there only because the organizers’ mother forced them to bring
her along. The cool kids are talking about real world stuff, hacking the
Gibson, phreaking the phone lines, doing bug bounties, whatever it is
that real hackermen do, while she’s just sitting there arranging her
fries to count out the Stirling numbers. Most of the time I’m just glad
she’s invited at all; if CTFs were to reflect real world industry, I
suspect it’d be like 30 web tasks and 2 revs.

I solved most of the easy crypto on the first evening, then did
dtls the following
morning, but the two remaining tasks were labelled web. Ugh. I looked at
the task named `blow` and just thought “fuck no.” Some kind of hellish
JWS/JWT JavaScript task they kept adding hints to because it got no
solves. From the hints alone I surmised that it has some trivial
solution in crypto terms, but it was probably a pain in the ass (aka the
“web” part) to even get started or progress to where it’s relevant.
So I worked on Yanote
instead, but failed that task miserably and eventually went off to nurse
my imposter syndrome in the corner.

## Pady McPadface

A live server that knows the flag traverses it bit by bit and gives $(r^2 + bit)^e \pmod n$ for a fixed RSA modulus $n$ using the standard $e = 65537$. $r$ is an ephemeral random number a few bits shorter than $\log_2 \sqrt{n}$, so $r^2 < n$.

So basically I wanted to discover whether the ciphertexts were quadratic residues or not under the given (composite) modulus. I knew what to look up, and it turned out to be pretty easy, but I was actually surprised to learn the Jacobi symbol is easily calculable without knowing the factorization of $n$. Huh! (It’s also surprising that this has never come up before.) I’m glad to have learnt it; it does seem like a neat trick.

```
def trailing_zeros(a):
    return (a ^ a - 1).bit_length() - 1

def jacobi(a, n):
    assert n & 1  # n must be odd
    sign = 1
    a = a % n
    while a > 0:
        r = trailing_zeros(a)
        # 2 divides an odd number of times into a: flip the sign if n is
        # not of the form 8k±1
        if n % 8 in (3, 5) and r & 1:
            sign = -sign
        a, n = n, a >> r
        if a % 4 == 3 and n % 4 == 3:
            sign = -sign
        a = a % n
    if n != 1:
        # a divides into n
        return 0
    return sign
```

So then it was a matter of connecting to the server, getting several ciphertext instances, and calculating their Jacobi symbols $\left(\frac{c}{n}\right)$. If you find an instance where the symbol is $-1$, then you know the number is not a quadratic residue, so $bit$ must be 1 in that position. After collecting some data you set the others to 0 and get the flag.
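A sketch of the distinguisher using sympy’s `jacobi_symbol` for brevity, with toy 32-bit primes of my own choosing rather than the task’s real parameters:

```python
import random
from sympy import jacobi_symbol, randprime

p = randprime(2**31, 2**32)
q = randprime(2**31, 2**32)
n, e = p * q, 65537

def leak(bit):
    # the server's per-bit ciphertext: (r^2 + bit)^e mod n, with r^2 < n
    r = random.randrange(2, 2**15)
    return pow(r * r + bit, e, n)

# e is odd, so Jacobi(c^e, n) == Jacobi(c, n).
# bit = 0: c = r^2 is a square, so the symbol is always +1
assert all(jacobi_symbol(leak(0), n) == 1 for _ in range(50))
# bit = 1: r^2 + 1 is a non-residue about half the time, so -1 shows up
assert any(jacobi_symbol(leak(1), n) == -1 for _ in range(200))
```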

## dlog

The task implements some basic code for working with the secp256k1 elliptic curve ($y^2 = x^3 + 7$). The ECC implementation is “home-made” with code mostly taken from Wikipedia. It asks for a point, multiplies it by a random secret number, and asks you to recover this secret. No check is done on whether the point you give is actually on the secp256k1 curve though, so it’s a classic invalid curve attack.

Basically you can set the $b$ parameter in $y^2 = x^3 + b$ to whatever you want. (If you give the point $(s,t)$ it will use the curve $y^2 = x^3 + (t^2 - s^3)$.) I’ve solved some task before using a singular curve but couldn’t remember how the mapping worked off the top of my head, and didn’t find my old code for it, so instead I idly looked for other $b$ parameters where I could solve the discrete log with Pohlig-Hellman, as I had the tools for that more readily available. I think there are 7 (?) different Frobenius traces among these curves, corresponding to different orders of the resulting elliptic curve, but each had a very large factor (133-135 bits), so this turned out to be a dead end.

I went back to pen and paper to see if I could manually work out the
singular mapping, partly as self-punishment for forgetting. Given one of
the simplest non-trivial points, $(1,1)$, the server will use the
singular non-elliptic curve $y^2 = x^3$. The point-doubling formula then
does the following: $x \mapsto \left(\frac{3x^2}{2y}\right)^2 - 2x = \frac{1}{4}x$ and similarly
$y \mapsto \frac{1}{8}y$. Hm! That gave me the needed deja vu: indeed it
seems that $\frac{(n \ast (1,1))_x}{(n \ast (1,1))_y} = n$, so it worked out nicely, and I didn’t
have to wade through the full algebra. So yeah, just feed it `(1,1)` and
then compute $\frac{x}{y} \pmod p$ to get the secret.

It’s probably supposed to be a pretty trivial task, but I made it overly complicated by investigating all the curve stuff.
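The trick can be verified with a few self-contained lines. On the singular cubic $y^2 = x^3$ over $GF(p)$, the map $(x,y) \mapsto x/y$ is a homomorphism from the smooth points to $(GF(p), +)$, so the “discrete log” of $n \ast (1,1)$ is just a division. This is my own toy reimplementation of the server’s textbook point arithmetic, over an arbitrary prime:

```python
p = 2**255 - 19  # any large prime works

def add(P, Q):
    # textbook affine addition; None is the point at infinity
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, p) % p   # tangent slope (a = 0)
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p    # chord slope
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mul(n, P):
    # double-and-add scalar multiplication
    R = None
    while n:
        if n & 1:
            R = add(R, P)
        P = add(P, P)
        n >>= 1
    return R

secret = 123456789
X, Y = mul(secret, (1, 1))
recovered = X * pow(Y, -1, p) % p
assert recovered == secret
```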

## ezdsa

A server is signing messages using the Python `ecdsa` library. I don’t
remember the exact details but I think you were supposed to forge a
signature. Immediately there was a glaring red flag with an
`entropy=sony_rand` parameter, where `sony_rand` was a function that
returned random bytes using Python’s `random` module. `ecdsa` uses this
for generating the ephemeral $k$ in its DSA signature generation.

At first I thought this was going to be a very challenging task, because
even though the randomness of `getrandbits` isn’t of cryptographic
quality, it’s very hard to actually mirror it unless given at least
some white (fully revealed) output. I know it’s seeded in Python from
the PID and two different clock functions, so it might be bruteable, but
it’s hard to predict how far into the stream the server is, and so on
and so forth; it seemed very daunting to brute all that. I wondered if
you could get the startup time or restart the server, or maybe the Mersenne
Twister output had some cool linear relationship I wasn’t aware of…

I was just being dumb. I didn’t notice it was a forking TCP server… It will clone the state and basically just re-use the same ephemeral $k$ every single time. This becomes immediately obvious when you set out to actually collect some signatures and see the $r$ being repeated. I got pretty lucky that it was so obvious once you start collecting data; if not I’d have been screwed.

So from two signatures $(r, \frac{H_1 + rx}{k})$ and
$(r, \frac{H_2 + rx}{k})$, where you can calculate the hashes
$H_1, H_2$, recovering the private multiplier $x$ is trivial and you can
sign anything. I think I just set `sk.privkey.secret_multiplier = x` and
used `ecdsa` to do it in the REPL.
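The recovery algebra can be checked with toy numbers standing in for real curve parameters. Everything below is made up: $q$ is just a stand-in prime for the group order, and $r$ is fixed because the reused nonce $k$ gives the same point every time. Subtracting the two signing equations eliminates $x$ and yields $k$, after which either equation gives $x$:

```
q = 2**127 - 1                               # stand-in prime "group order"
x, k, r = 0xDEADBEEF, 0xC0FFEE, 0x1234567    # hypothetical key, nonce, shared r
H1, H2 = 1111, 2222                          # message hashes

kinv = pow(k, -1, q)
s1 = (H1 + r * x) * kinv % q                 # s = (H + r*x) / k mod q
s2 = (H2 + r * x) * kinv % q

k_rec = (H1 - H2) * pow(s1 - s2, -1, q) % q  # x cancels in the difference
x_rec = (s1 * k_rec - H1) * pow(r, -1, q) % q
assert (k_rec, x_rec) == (k, x)
```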

## misc:Pylindrome

A server will run any Python code that’s a valid palindrome
(`x == x[::-1]`) in a Python subprocess (as a user that can’t read the
flag file), but then `exec`s the `stdout` from that process (in the
process that can read the flag file). There’s a limited attempt at
preventing the task from becoming trivial: it will remove any of the
substrings `"#", '"""', "'''"` from the input. The removal happens only
once, *in order*. So, the a-ha: you can actually use `"""` by giving
something like `"'''""`. The server will remove the single quotes and
keep the double ones.
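My reconstruction of that filter logic (not the task’s exact code) shows why `"'''""` survives as `"""`: the input contains no `#` and no three consecutive double quotes, so only the `'''` removal fires:

```
def strip_once(s):
    # each banned substring is removed at most once, in this fixed order
    for bad in ['#', '"""', "'''"]:
        s = s.replace(bad, '', 1)
    return s

payload = '"' + "'''" + '""'       # i.e. the string  "'''""
assert strip_once(payload) == '"""'
```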

Unfortunately I don’t have the actual solution at hand since I can’t
find it in my REPL history, but the basic idea was to just use a simple
payload like `print("__import__('os').system('sh')")` and then figure
out how to make it into a palindrome. I think I settled on something
like `""";";""";<code>;""";<reverse of code>;""";";"""`. Notice how
`""";";"""` is interpreted as the string `';";'` when evaluated
normally, but also provides an end for the triple-quote:
`..."""; ";" ""`.

(Another idea would be to use some trickery involving the escape
character `r"\"`, which could potentially escape/unescape strings
depending on order, but I didn’t think of that at the time.)

## Hashfun

Trivial welcome-task I had missed until `poiko` reminded me. It
basically does `print(flag ^ flag[4:])` (if one were able to do that in
Python). You know `flag` starts with `CSR{`, which gives you the next
four bytes, and those give the next four, and so on.
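The unrolling takes a few lines; the flag below is obviously made up:

```
def unshift(out, prefix=b"CSR{"):
    # out[i] = flag[i] ^ flag[i+4], so each output byte reveals the byte
    # four positions later, seeded by the known prefix
    flag = bytearray(prefix)
    for i, b in enumerate(out):
        flag.append(b ^ flag[i])
    return bytes(flag)

flag = b"CSR{just_a_made_up_flag}"                    # hypothetical flag
out = bytes(a ^ b for a, b in zip(flag, flag[4:]))    # what the task prints
assert unshift(out) == flag
```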

## dtls

The task gives you a pcap file and this shell script:

```
# /usr/lib/libc.so.6|head -n 1
# GNU C Library (GNU libc) stable release version 2.31.
# clone
git clone https://github.com/eclipse/tinydtls.git dtls
# build
cd dtls && autoconf && autoheader && ./configure && make && cd ..
# listen
tcpdump -i lo port 31337 -w dump.pcap &
# Wait for tcpdump...
sleep 2
# serve
dtls/tests/dtls-server -p 31337 &
# send flag
cat flag.txt|dtls/tests/dtls-client 127.0.0.1 -p 31337
```

I got the repo and looked around. I know it implements basic TLS over
UDP, but I ignored all that. The first thing I searched for was the RNG. It
doesn’t use a cryptographic source of random numbers but instead just
uses `srand()` and `rand()` from `libc` (as hinted at in the script
above). That’s a problem in and of itself.

But it also turns out that the code is buggy: for example it initializes
the RNG with `dtls_prng_init((unsigned long)*buf)`, where `buf` was read
from `/dev/urandom`. It’s probably *intending* to use a `long` for
the seed, but guess what? `buf` is a `char` array, so it’s actually
just using a single byte for the seed to generate all those important
crypto bytes.

Now, how to actually attack it. I knew I didn’t want to actually interact with TLS, because TLS is web and web is a hell run by committees, so instead I did the following:

- Make it print out the random bytes it generates. Run the script and compare it with the given pcap to figure out where those numbers go and what they should actually be.
- Trivially brute the seed.
- Modify the library to load its seed from an environment variable so I could run the client and server with fixed seeds.
- Force its internal time function to return 0, as I can see from the pcap that’s what it used there. (It actually tries to use integer seconds since the start of the program, thus 0, since the transfer happens immediately.)

I ran the script again and compared my pcap with the one given. It’s generating and using the same numbers, uses the same cookie, etc. So then I simply copied the final encrypted packet from the pcap file and put it into the source code with an ugly hack like this:

```
static char bufzzz[] = "...stuff...";

static int
decrypt_verify(dtls_peer_t *peer, uint8 *packet, size_t length,
               uint8 **cleartext)
{
  if (packet[0] == 0x17) {
    printf("pulling the old switcharoo\n");
    packet = (uint8 *) bufzzz;
    length = 84;
  }
  // ...
```

For some reason (that I don’t care about) this causes TinyDTLS’ own debug printing of the message to get messed up, so I also had to print the cleartext myself at the end of the function. Out comes the flag. No TLS needed (thank God).

This was a pretty cool task. It was practical and “real-world” but without feeling contrived and without there being JavaScript or any web frameworks involved. It was possibly the “cleanest” real-world-y task I’ve seen. Well done to the task author.

## Yanote

I didn’t solve this one. I failed it. Not only did I fail it
technically, but I failed it *mentally*, which is probably worse. But I
worked on it for a while so I’m still going to write about it. It
triggered all sorts of feelings of inadequacy and failure that you
wouldn’t believe.

All because of this “web” tag.

The “web” tag usually means “use educated guesses for any blackbox behavior,” but to me it usually reads as “realize that nobody agrees with what you think is reasonable.” It reads “smell that? Take a deep whiff, that’s the smell of learned helplessness, baby.”

Clue: the server says “invalid CRC-HMAC” if you give it an invalid cookie.

Ignore the pedantic fact that it would normally be called `HMAC-<name>`, and “CRC” is very non-specific, kind of like just saying “HASH.”

However CRCs are more fun than hashes in that they’re easier to play
around with computationally. So for a second I dared to dream about some
sort of semi-fun task of trying to figure out the modulus and key and
all the other constants of some unknown
`HMAC(key, ipad, opad, H=CRC(modulus, a, b))` given an
`oracle(x) = HMAC(key, A+x+B)`. That would have been pretty nice, actually.

So my approach was all wrong. Because I started to hear this “looOOooOOoOool” coming from a dark corner of my mind: the “web” tag coming back to haunt me. So maybe none of that cool stuff. Maybe it’s just… straight up CRC-32…? Well, yes and no.

See, I hate this. Instead of “figuring something out” you have to
guess what *they* mean about what that something is. Since it’s “web,”
and in web they tend to not be too particular about things, HMAC might
mean something else. Potentially it could be a homegrown custom HMAC;
potentially it could be something else entirely. It’s not even given
that it does `HMAC(<key>, <pickle-data>)`. For all you know it could do
`F(<key>, <username> + ":" + <permission>)` or whatever. Hell, maybe it’s
not even a CRC! Maybe it’s just `adler32(<data> + <secret>)`…

Okay, relax. Start testing some basic stuff.

I figured out it had normal CRC behavior at least. It was, as they say,
affine-ish. Working with polynomials over $\mathbb{Z}_2$,
$CRC(x+y) = CRC(x) + CRC(y) + K_b$ where $K_b$ is a constant that depends on the
*byte-lengths* of `x` and `y` (technically on their xor difference?).
You’d expect this to work for any HMAC based on a CRC too. This allows
you to (easily) log in as admin (but on the admin page I just found
people doing “test post pls ignore” and being generally “????????”).
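The same affine identity is easy to demo with plain CRC-32, where $K_b$ turns out to be just the CRC of an all-zero string of the right byte-length (the usernames here are made up):

```
import zlib

x, y = b"admin:0", b"guest:1"                    # equal-length inputs
xor = bytes(a ^ b for a, b in zip(x, y))
K = zlib.crc32(b"\x00" * len(x))                 # the length-dependent constant
# CRC(x ^ y) == CRC(x) ^ CRC(y) ^ K, since CRC is affine over GF(2)
assert zlib.crc32(xor) == zlib.crc32(x) ^ zlib.crc32(y) ^ K
```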

But when I actually tried to “solve” the blackbox function completely,
I kept getting wrong constants everywhere. And I didn’t know what was
wrong; didn’t know if it was because I had bugs in the old code I was
using (some `GF2[X]` implementation from way back), or some wrong
assumption.

After a while I suspected there was no HMAC and maybe just a straight-up unknown CRC, which frustrated me even more, because if so, then this should be easy, no? What if something else is different? I spammed random stuff trying to figure out the modulus for the CRC from the constants I was getting out, double-checked my math… I even wondered if you were supposed to birthday-hammer user-create to find a duplicate checksum, and maybe that would finally reveal what was going on; got completely lost…

Getting full control over the cookie *is* trickier, but in my more
clear-headed state I think it’s possible without solving the black-box
function as I was trying to do. I believe all you need is in the
relationship. You can get all the constant $K$ terms you need by making
three users that xor to zero for any given length. For an unknown CRC,
for three constants $K_i$, $K_{i+1}$ and $K_{i+2}$ (that you’d get from
`("0","q","A")` for example, since they xor to 0), you can find the
modulus as a factor of $x^8 K_{i+2} + (x^8+1) K_{i+1} + K_i$ (does
that work? Now I’m not sure, I’m copying a note…), then the
relationship between the various $K_i$ as
$K_n = K_0\,\frac{x^{n-1}(x^n - 1)}{x - 1} \pmod{M}$ (TODO: math might be
wrong), and you should have everything you need.

Then you make usernames that xor together to be your pickle payload (the
server accepted non-printables in usernames) and change the byte for the
username length (this last part is a pen-and-paper mess, so it’s not
something I’d actually try unless I was sure). The stuff after the byte
`0x2e (".")` in the pickle data is ignored, so it’s fine if the cookie
ends with garbage. This was probably the intended solution, though it’s
a bug-prone approach, maybe a bit painful to implement.

However I basically just got so frustrated with not getting the math to
work out (I mean, I couldn’t even figure out the damn *modulus*![^10])
and debugging my own code that I gave up. I have an issue with
committing to guesses unless I have a reasonable idea it will work.
*Failure-sensitive*. Besides, engaging with guessy stuff feels dirty
unless you vibe some level of mental harmony with the task author (which
I didn’t).

Dirty confession: I even debased myself and tried to brute-force a 32-bit key against HMAC-CRC32 using the usual constants, but it didn’t work. That was the last straw. After that I signed off and went for a long walk to hang out with the ducks by the river (not a metaphor—there are actual ducks).

*Addendum:* it was plain CRC-32, and the “HMAC” was just a salt
added to the data bytestring. Prefix or postfix salt? Length of salt?
Well, it’s web, so the answer could be to just guess and try it, and if
that doesn’t work, guess something else.

# BalsnCTF 2020

## General Comments {#balsn2020}

Another great CTF!

But yet again I have some Debbie Downer comments.

Disclaimer: this is clearly a top-tier CTF, close to ~100 points, so my main criticisms below are more subjective and personal than objective. I will be playing the part of Wolfgang, so I might seem a lot more negative than what is deserved.

The difficulty of *certain* tasks seemed to come from them simply being
the composition of several sub-tasks that *in some cases* weren’t even
related. I am not a fan of that at all. It’s exhausting and it feels
like a really cheap way to increase difficulty.
Happy Farm is the obvious
example^{1}. Instead of having tasks T1, T2, T3 that are all fairly
independent problems, they’re composed or chained in some way to make a
much more work-intensive task. This joined task can even appear more
difficult than the most difficult task in the set^{2}, yet *it won’t feel
that rewarding*.

That’s just a strong personal preference; perhaps other people think it’s cool and the bee’s knees. I preferred N1CTF2020 for example, which had a similar difficulty level, but where none of the tasks seemed “artificially” difficult by composition or layering.

```
import random
random.seed(int(input()))
assert \
b"""As an extreme example, what if every category was just a single chained
monster-task? No flag until you do them all. I personally am /much/ more
comfortable on the other end of this spectrum, where it's a series of smaller
bite-sized puzzles. Composing small but hard-ish puzzles isn't /that/ difficult.
Here's one, for example.""" == random.getrandbits(8*328).to_bytes(328, 'big')
```

Another consequence is that one ends up with fewer tasks in total, so there’s less to choose from when picking what order to do things in.

A lot of the frustration behind the above rant comes from all the time I
wasted on `Patience I`, which I only got a partial solution to, and the
exhaustion that set in from solving the three-in-one task Happy Farm.
As it were, I felt really burned out and didn’t do anything at all
productive on Sunday, except to assist `poiko` with one of his solves.
Well. I also discovered several mice in my apartment this weekend, which
was… not fun.

Anyway, `aeshash` and `IEAIE` were both really cool, and I look forward
to actually doing them outside of the CTF.

We ended up placing 12th, so we just missed the mark for the coveted 13th position.

## Happy Farm

A layered crypto task. It’s simply the concatenation of three totally different crypto tasks. Unlike some other such problems (I seem to remember some chained RSA problem from either a Dragon CTF or a DefCon qualifier?) the systems here aren’t even related. In my opinion it really should have been given as three separate tasks, even if they were capped at a third of the points.

But as it were, this took too much time and exhausted most of my energy/hype.

Also, the task had the added *fun flavour* of the code being about seeds
and fertilizers to grow onions, drawing numbers as ASCII-art bulbs and
so forth. It stopped being fun after the first subtask.

### Task 1: AES CBC

After spending some time reading the code (which is a hot mess) this task becomes pretty simple. All you need to know is how CBC mode works with AES.

You have an oracle you can query twice. You give it
`(n, iv, data)` with `n=0..8999` and it gives back
$aescbc_{k,iv}(data)$ for an unknown key $k$. That is: AES is
initialized in CBC mode with the unknown `k` and the given `iv`; it then
iteratively encrypts `data` `n` times. The goal is to find
$aescbc_{k,siv}(sdata)$ for some known $(siv, sdata)$.
You may not give it $sdata$ to encrypt directly.

So first you give it something like
`(8999, siv[0]^sdata[0] || siv[1:], b'0' || sdata[1:])`. The xor is
just to make the data not match `sdata`, but it will be undone by the
CBC mode. Then you give it `(1, response[-16:], response)` (the last
block is the IV for the next block) and the result is the answer.
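The first-block identity behind the xor trick is easy to check with a toy CBC. The “block cipher” below is a hash-based stand-in (not AES) and all values are made up; the identity only depends on the CBC structure, where the first ciphertext block sees only `iv XOR data[0:16]`:

```
import hashlib

def E(key, block):
    # toy 16-byte "block cipher" -- a hash stand-in, NOT AES
    return hashlib.sha256(key + block).digest()[:16]

def cbc(key, iv, data):
    # plain CBC encryption: c_i = E(p_i xor c_{i-1}), with c_{-1} = iv
    out, prev = b"", iv
    for i in range(0, len(data), 16):
        prev = E(key, bytes(a ^ b for a, b in zip(prev, data[i:i+16])))
        out += prev
    return out

key = b"k" * 16
siv, sdata = b"\x00" * 16, b"forbidden_input!" * 2   # made-up target pair
# xor the same delta into both iv[0] and data[0]: the ciphertext is unchanged,
# but the submitted plaintext no longer equals the forbidden sdata
delta = sdata[0] ^ ord("0")
iv2 = bytes([siv[0] ^ delta]) + siv[1:]
data2 = b"0" + sdata[1:]
assert data2 != sdata
assert cbc(key, iv2, data2) == cbc(key, siv, sdata)
```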

### Task 2: RSA and Euler hacks

The server sets $e=3$, $s = (2^{1023})^e \pmod{n}$, and $d = 1/3 \pmod{\phi(n)}$. $n$ is composed of two 512-bit primes. Note that the modulus $n$ is initially unknown. Note also that $x^d \pmod{n}$ is “sort-of” like $\sqrt[3]{x} \pmod{n}$.

The server gives you $s$ and asks for an integer $k \in [0, 8999]$. It then gives you $r = s^{d^k}$, which is “sort-of” like taking the cube root $k$ times.

Finally you can feed it a full pair $(x, j)$ and it will calculate $y = x^{d^j} \pmod{n}$ but with some of the low-order bits hidden. This whole thing was a bit convoluted. There’s ASCII drawings of onions that you use to figure out where the missing bits are and blah blah. You basically get $y$ minus some “small” random number on the order of $2^{332}$-ish (if I recall correctly). And a final caveat: the $j$ you give in this step has to be less than or equal to the $k$ you gave initially.

The goal is to find $s^{d^{9000}} \pmod{n}$.

First, to do anything, we need to find $n$, which we can do with what I
call Euler hacks. We know that $n' = 2^{3 \cdot 1023} - s$ contains $n$
as a factor. This number is too big to try to factor out $n$ directly,
but we have another number that also has $n$ as a factor. Because $r$ is
“sort-of” like iteratively taking the cube root of $s$ (which was
itself a cube, because $e=3$), if we repeatedly cube it back up and
subtract the known remainder we’ll get a multiple of $n$, for example
$r^{3^{k-1}} - 2^{1023}$. This last number can’t be calculated directly
if $k$ is large, but we can do it $\pmod{n'}$, which will
preserve the factor of $n$. I suspect the caveat mentioned above with
$j \le k$ is simply there to force you to give a large initial $k$ to
make this step a little bit harder? Finally we take the `gcd` of these
numbers and get `n` (possibly multiplied by some small factor).

```
nk = 2**(3*3*11*31) - s
nj = pow(r, 3**8997, nk) - 2**(11*31) # I gave j=8999
n = gcd(nk, nj)
# The exponents are factored because I thought maybe the fact that 1023
# contained a factor of 3 would be relevant, but it wasn't.
```

Anyway, with $n$ known this reduces to a lattice/Coppersmith-type problem. In the second step I gave $k=8999$ and got $r = s^{d^{8999}} \pmod{n}$. In the last step I gave $j=8999$ again but now using $x = 2^{1023}$ as the base number (i.e. a cube root of $s$), which gives me back $y$ as an “approximation” of the target number $s^{d^{9000}} \pmod{n}$.

The mask for the unknown bits in this approximation is

```
mask = 0xf00000000ffff0000ff00ffff00ffff00fffffffffffff000ffffffffffffffff00fffffffffffff0ff
```

Because $y^3 = r \pmod{n}$ I can use this equation and Sage’s
`.small_roots()`. I didn’t really think about doing anything else; I
just wanted it to end. However, the missing bits are not contiguous: if
treated as a contiguous block then it’s maybe a little bit over what a
standard out-of-the-box Howgrave–Coppersmith implementation can handle.
I wanted to avoid introducing more variables for the bit pattern and I
also didn’t want to sit there and have to fiddle with magic parameters
hoping to get lucky. But “brute forcing” the first hex character
seemed to be enough. E.g. in `sage`:

```
print("small rootsin': ")
for i in range(16):
    print(i, end=' ', flush=True)
    rewt = ((c + i*2^328 + x)^3 - b).small_roots(X=2**300, epsilon=0.05)
    if rewt:
        print('thank god... You may now have a snack')
        break
```

So `y + i*2^328 + rewt[0]` is the solution.

### Task 3: Borked RC4

Thankfully, this last layer was very easy. By now I was really sick of this whole task.

Long story short, you can query an oracle four times, giving it a `k`
each time (`k` has the usual constraints). It then does something like
this:

```
def blah(secret, k):
    L, R = secret[:len(secret)//2], secret[len(secret)//2:]
    for _ in range(k):
        L, R = R, bytes_xor(L, rc4_encrypt(R))
    return L + R
```

And gives you the result. The goal is to find the result of
`blah(secret, 9000**3)`.

`rc4_encrypt` is *supposed* to be a standard RC4 implementation
initialized with an unknown key. (I.e. it generates `len(plaintext)`
bytes of stream cipher output and xors it into the plaintext.) But if
you’re lucky like me, you notice this little nugget right away, before
you’ve even parsed what the task is about:

```
def swap(self, a, b):
    a, b = b, a
```

I think I actually laughed when I saw this innocent little method. It’s trying so hard not to be noticed. Aww, it’s so cute.

And but so. The RC4 implementation never really swaps elements around. Which means it will have a really short order. Let’s look at it:

```
i = (i + 1) % 256
j = (j + s[i]) % 256
self.swap(s[i], s[j]) # lol!
output.append(s[(s[i] + s[j]) % 256])
```

Every 256 iterations, `i` will be 0 and `j` will be
`sum(range(256)) % 256 == 128`. Meaning that every *512* iterations
we’ll have $j = i = 0$ and it will repeat.
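A sketch of the crippled keystream generator, checking that 512-step period. The key schedule is omitted because its swaps are no-ops too, leaving `s` as the identity permutation regardless of key:

```
def broken_keystream(nbytes):
    s = list(range(256))   # KSA swaps do nothing, so s stays the identity
    i = j = 0
    out = []
    for _ in range(nbytes):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        # self.swap(s[i], s[j])  -- the do-nothing swap would go here
        out.append(s[(s[i] + s[j]) % 256])
    return out

ks = broken_keystream(2048)
# i returns to 0 every 256 steps; j gains 2*sum(range(256)) = 0 mod 256
# every 512 steps, so the whole state (and keystream) has period 512
assert ks[:512] == ks[512:1024] == ks[1024:1536]
```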

OK. So. It’s trivial to find `secret`: just give it `k=0` and it won’t
encrypt it. I think `secret` was 240 bytes long, so to find the order of
`blah` we just need to align things… lcm with the order… hm, divide by
2… blah blah boring details. Long story short,
`blah(secret, 9000**3) == secret`. It’s just what came out of the first
oracle query. Almost a bit of an anti-climax, really.

## The Last Bitcoin

Proof-of-work Python script that asks you to find a string `X` such that
`sha256(random_token + X)` starts with 200 zero bits.

That part is straightforward and clearly impossible.

Then you notice that if you give it such a string it exits with error
code 1, indicating success. Python also exits with 1 if it gets an
uncaught exception. And it will give such an exception if you give it
non-ascii input. So I simply typed `æ` and got the flag.

## The Danger of Google‘s Omnipotence

This is really `poiko`’s solve; he did the reverse, provided all the
data, and so on. I just helped with the last part in figuring out how to
reduce the magic operation to something tractable.

The problem, at a bird’s-eye view, is (I was told) something like this:

```
k = 0xfad9c53c828be5dafc765d4a52a54168442b6f57569db5f320a45d0d0e39d92d04284087fe2e36da1375d55e6e4e9f746cf9d9916c791e0467dc0aedf77581d7e1ab342f99e49f4c684fd7424e806cc2fb1dd54c614487b6a3909dc469f76eb8df050f3928d4c371d8aace5c81fbb1e467b987ec5ae1f5ecd0b8ffe69369edc9
flag = <some 2^16 array> # secret
A = <some 2^16 array> # known
B = flag
for _ in range(k):
B = strange_matmul(B, A)
# B is output here
```

Reversing `strange_matmul` and figuring out what it does is the main
problem, which `poiko` already did. But as a last step we’d need to
strength-reduce `strange_matmul` to some actual math, like a real matrix
multiplication for example, and then we have `B = flag * A^k`, which can
hopefully be inverted.

`poiko` gave me sort-of pseudocode for `strange_matmul`:

```
def strange_matmul(m, n):
    out = []
    for i in range(256):
        for j in range(256):
            val = 0
            for k in range(256):
                v1 = m[i*256+k]
                v2 = n[k*256+j]
                if v1 == 0 or v2 == 0:
                    continue
                val = strange_add(val, a_inv[(a[v1] + a[v2]) % 7**3])
            out.append(val)
    return out
```

Where `strange_add` adds the numbers as if they were in base 7 but
without using carry, and `a` and `a_inv` were some lookup tables he also
shared. I don’t remember if they were in the code or something he
constructed.
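My understanding of `strange_add`, sketched from that description: digit-wise base-7 addition with no carries, which is exactly coefficient-wise addition of polynomials over $\mathbb{F}_7$ (this is what first suggested the field interpretation below):

```
def strange_add(x, y):
    # add base-7 digit-wise, dropping carries: each digit pair is summed
    # mod 7 independently, like adding coefficient vectors over F_7
    out, place = 0, 1
    while x or y:
        out += ((x % 7 + y % 7) % 7) * place
        x, y, place = x // 7, y // 7, place * 7
    return out

assert strange_add(6, 1) == 0          # lowest digit wraps instead of carrying
assert strange_add(6 + 7, 1) == 7      # digits are independent
assert strange_add(100, 200) == strange_add(200, 100)   # commutative
```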

Anyway, so it looks like a normal matrix multiplication, but the
element-wise operations aren’t carried out using normal arithmetic. So
we both played around a little with `a` and `a_inv` and there were
definite group-like properties. I even wrote a little utility class so I
could do arithmetic with such numbers directly and play with them in the
REPL. At first it “felt” like a commutative additive group, with `0`
almost being the identity except for `0 + 0 = 1`, which meant it wasn’t
really associative either, because $(x+0)+0 \neq x+(0+0)$. `a` was a
permutation, but `a_inv` was not, as `a_inv[0] == 1` when it seemed like
it should have been 0. This makes sense from the above code where 0 is
avoided. Anyway, from the `strange_add` I strongly suspected it was a
mapping into the åker^{3} $\mathbb{F}_{7^3}$ where the numbers
represented the coefficients of a quadratic polynomial over
$\mathbb{F}_7$. If so, what’s the modulus?

$x^3$ in $\mathbb{F}_7[X]/(x^3 + P)$ should become $-P$, where $\deg P < 3$.

```
>>> -fux(7)**3
[6 0 4]
```

`fux` was my utility class, and `fux(7)` should represent `x`, so the
modulus is $x^3 + 6x^2 + 4$. (It turns out to also be the default one
that `sage` gives when constructing `GF(7^3)`, so that’s probably its
origin.) With this intuition behind it the math seems to work out.

So, we had 256×256 matrices over $\mathbb{F}_{7^3}$, and calculating the
inverse is straightforward but boring `sage` stuff. It would likely have
taken many minutes though, but one thing I lucked out^{4} on was that I
did `.multiplicative_order()` on `A^-1`, which `sage` found quickly as
342. So I could shave several minutes off the calculation by doing
`flagimg = B*(A^-1)^(k % 342)`.

Anyway, `flagimg` was a 256×256 noisy image showing a readable flag.

## aeshash & IEAIE & Patience I

I feel these deserve an entry even though I failed all of them, but I
did spend time on them. I probably didn’t praise the first two tasks in
this list enough. They are what I consider actually *good* crypto tasks
of high difficulty (unlike Happy Farm). Especially `aeshash`!
`aeshash`’s task author deserves some real kudos.

For either task I had no prior experience and tooling, so I knew it’d
be a lot of work for me. I made `poiko` waste some of his rev-time to
make sure I got `aeshash` into Python, which I felt sort of bad
about… because I only really started to scratch the surface of them.
And then for most of Sunday I didn’t really have any energy left to
continue. However, they’re the kind of tasks that make me want to solve
them in my leisure time, which is a real treat.

`IEAIE` was some image encryption scheme using a logistic map and
permutations. It was very sensitive to differential analysis. I did a few
things, was able to reveal a bunch of key-state, but ran into some
floating point rounding headaches in my offline tests. I hate floating
point numbers, so I took a sanity break from it and never really got
back. For once the relevant papers are given in the task description, so
it’s not really a Google-the-Paper challenge.

`aeshash` involved leaking state in a three-round pseudo-AES (each round
using the initial state as the round key) and then targeting a specific
output. Literally: an AES hash. I have no idea about this one, which is
all the more exciting. Despite me usually solving the crypto challenges,
I’ve never worked with crypto professionally or academically, so I lack
a ton of basic knowledge and experience, such as breaking weakened AES,
which is probably like Crypto 101. But I figured it will be a good place
to start learning.

I also sunk a ton of time into `Patience I`, doing proper OpenCV for the
first time in my life… God, this task sapped me. The composed text,
which I suspect is one third of the flag(?), was easy. I also found this
strange Morse-ish sequence but was unable to guess what it really was;
noted the spikes in the lighting but had no idea what they meant either.
In the end I even started to fear the minute little camera shakes were
also part of it. Please God. Maybe all of these things played together
in some beautiful and harmonious way in the end, but really it just felt
like three concatenated subtasks that all had to do with video?

*Addendum:* having seen spoilers on how `Patience I` is solved (but not
the others), I can at least say I don’t feel bad anymore for not
figuring out this task. It’s a bad one. Though I was wrong about the
“subtasks” being separate.

The first string you find, the easy part, overlaying all the LEDs, was
`_q!fitlboEc`. I mistakenly thought it might be `flag[x::3]` or
something like that. (Thus why I thought the three parts were separate.)
But apparently it’s an alphabet of sorts, which you index using… I
don’t know, it was unclear exactly what. And the timing pattern *was*
Morse, but the international extended version with punctuation, making
`P=B_=_Q=D`, which is apparently supposed to mean “replace ‘b’ with
‘p’, and ‘d’ with ‘q’”?… The logic or motivation behind any of
these things totally escapes me.

I even said to `poiko` while I was working on this task, things like
“hm, there’s a timing pattern here, it feels a lot like Morse, but
I’m actually going to be a little bit disappointed if that turns out to
be true…” and “it can’t be too escape room-ish, right? It’s Balsn,
they’re good, they wouldn’t make something like that,” and so on.
Disappointed!

## babyrev

Again, `poiko` solved this one, but I helped out a tiny little bit at
the end, so I feel comfortable writing what it was about.

Apparently a `scala` program that seemed to construct an infinite stream
by doing what I would write in Haskell as:

```
w = [[0,1,2,3], [0], [0], [0]]
stream = [0] : map (concat . map (w !!)) stream
```

It makes an infinite stream of lists like
`[0] : [0,1,2,3] : [0,1,2,3,0,0,0] : [0,1,2,3,0,0,0,0,1,2,3,0,1,2,3,0,1,2,3] : ...`.
It sums all the sublists in this stream, extracts index 60107, and takes
its remainder mod $2^{62}$. It converts the resulting number to 4 bytes,
which becomes an xor key for the flag data.

Immediately I smelled a recurrence relation for the sum. Tried and true
OEIS tells the truth: $a_{n+2} = a_{n+1} + 3a_n$^{5}, and so,
starting from $a_0 = 0$ and $a_1 = 6$ (for the sums of these lists)
gives all that’s needed. It can be calculated linearly in a list, or
logarithmically by exponentiation of a 2×2 matrix.

^{1}: I suspect `Show your Patience and Intelligence I` is another one, but since I only partially solved it, I couldn’t really say outright. See also https://github.com/BookGin/my-ctf-challenges/tree/master/balsn-ctf-2019/images-and-words from last year’s BalsnCTF, which—don’t get me wrong—is a really cool and mind-blowing task, but you need three fairly independent key insights to solve it. Although I guess in web this kind of stuff is more like the standard way of doing things?↩

^{2}: Because the domain knowledge required for the entire thing grows linearly with unrelated sub-tasks added. This is another reason why I don’t like mixed challenges. Crypto behind web, misc behind rev, etc. It feels like a sort of “miserly” way of forcing the solve count to stay low even though the subtasks are themselves easy or unoriginal. Generalists and people with “broad” CTF-knowledge are already greatly rewarded and have a (fair!) advantage over more specialized noobs (like me), but this doubles down on that advantage at the task level.↩

^{3}: Norwegian for ‘field’.↩

^{4}: I say I lucked out because my `sage` seems to hit a bug when I do `A.multiplicative_order()` directly. It just hangs and churns CPU for over a minute, so most likely I would have given up on this avenue. Who knows why `sage` does what it does sometimes. `:its_a_mystery:`↩

^{5}: Looking it up on OEIS felt like cheating, but it saved a couple of minutes. It’s clear in hindsight, as the list goes: A, B, BAAA, BAAABBB, BAAABBBBAAABAAABAAA and so on.↩

# DragonCTF 2020

## General Comments {#dragon2020}

My expectations were a bit high; DragonCTF has had some of the best CTFs in the past. Yet I had the thought that even last year’s 24-hour “teaser” CTF was better? So, a bit of mixed feelings. I’m not saying this wasn’t good, but…? It’s hard not to vote with one’s heart.

The difficulty was alright, but it felt like very few tasks in total.
Thinking partly of my poor teammate `poiko` this time, who seemed to get
only a single pure rev task, and an easy one at that. I know he was
hyped and had looked forward to fiendish VM-in-VMs and unknown
architectures. We kept waiting for more to be released, but alas.

For my own sake, Frying in motion was a cool idea, though a bit draining. Bit Flip 1 was easy. Bit Flip 2 & 3 were puzzling, but not the good kind of puzzling (which motivates me to write code or learn stuff outside of the CTF); more “I feel I’m just missing some (probably) trivial trick here, oh well, next time I guess”-puzzling.

Sunday I ended up watching *Gilmore Girls* and `poiko` was reduced to
doing Linux stuff and actual hacking. Imagine having to do actual
hacking. What’s that about? Sheesh.

Anyway, we placed like dumpster tier anyway. Wasn‘t our weekend, I guess.

## Frying in motion

I put off this task while looking at the *Bit Flip* tasks because I kind
of expected implementing it would be an unfun hellhole of tiny bugs.
It’s one of those tasks where the idea is simple and/or obvious but
it’s a pain in the ass to implement^{1}. Not a fan of that.

But so: the task involves reversing a `strfry()` given the output of another `strfry()`. Fine, sounds like an RNG task.

`strfry()` does this:

```
char *
strfry (char *string)
{
  static int init;
  static struct random_data rdata;

  if (!init)
    {
      static char state[32];
      rdata.state = NULL;
      __initstate_r (random_bits (),
                     state, sizeof (state), &rdata);
      init = 1;
    }

  size_t len = strlen (string);
  if (len > 0)
    for (size_t i = 0; i < len - 1; ++i)
      {
        int32_t j;
        __random_r (&rdata, &j);
        j = j % (len - i) + i;

        char c = string[i];
        string[i] = string[j];
        string[j] = c;
      }

  return string;
}
```

(Aside: is it just me or is the GNU style one of the ugliest code styles in existence?)

So `strfry()` just uses glibc’s `random_r()` stuff, using a 32-byte state buffer (of which only 28 bytes are used), initialized with a 32-bit seed based on some CPU clock counter. (`random_bits()` actually gives fewer than 32 bits of actual randomness, but let’s ignore that.)

It’s clearly a “brute the seed” sort of task, because even partially reversing the internal state directly (say, if it had been initialized with 28 bytes of random data from `urandom`) given `strfry()` output alone is very hard unless many such calls can be queried in a series. (And even then it would be finicky as all hell.)

Another thing the task does is call `strfry()` a lot of times on a temporary string before it does the challenge-response step. The number of resulting `random_r()` calls is predictable, but you don’t get any output. So, it’s “brute the seed” and then “seek in the RNG stream.”

Seeking in glibc’s `random_r()` stream turns out to be easy, as it seems to be just a linear recurrence. It does varying things depending on how many bytes of state you give it, but all of them are very simple. It’s sort of like a more flexible Mersenne Twister without the output scrambling. For 32-byte state it will use $a_{n+7} = a_{n+4} + a_{n}$, which can be modelled most simply (to me, anyway) with matrix exponentiation:

```
A = matrix(Zmod(2**32), [[0, 0, 1, 0, 0, 0, 1],
                         [1, 0, 0, 0, 0, 0, 0],
                         [0, 1, 0, 0, 0, 0, 0],
                         [0, 0, 1, 0, 0, 0, 0],
                         [0, 0, 0, 1, 0, 0, 0],
                         [0, 0, 0, 0, 1, 0, 0],
                         [0, 0, 0, 0, 0, 1, 0]])
# now you can model the state after n steps as A**n * state
# modulo indexing errors.
```

This is the same for seeking in plenty of RNGs whose internals are linear or “affine,” like xoshiro and MT. Note that the state holds the last 7 outputs from the RNG, which will come in handy. Note also that the RNG discards the lowest bit when giving actual output. (But due to the lack of scrambling, the low-order bits will still be pretty bad by most RNG standards.)
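As a sanity check, the fast-forward idea can be sketched in plain Python. This is a hedged sketch: the state layout and matrix follow the block above (newest word first), with `mat_pow` standing in for Sage’s `A**n`:

```python
M = 2**32  # random_r state words are 32-bit

def mat_mul(A, B):
    # matrix product mod 2**32
    return [[sum(a * b for a, b in zip(row, col)) % M
             for col in zip(*B)] for row in A]

def mat_pow(A, n):
    # square-and-multiply matrix exponentiation
    R = [[int(i == j) for j in range(len(A))] for i in range(len(A))]
    while n:
        if n & 1:
            R = mat_mul(R, A)
        A = mat_mul(A, A)
        n >>= 1
    return R

A = [[0, 0, 1, 0, 0, 0, 1],
     [1, 0, 0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0, 0, 0],
     [0, 0, 1, 0, 0, 0, 0],
     [0, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 1, 0]]

def step(state):
    # one naive iteration of a_{n+7} = a_{n+4} + a_n, newest word first
    return [(state[2] + state[6]) % M] + state[:6]

# fast-forwarding 1000 steps with A**1000 matches 1000 naive steps
state = [1, 2, 3, 4, 5, 6, 7]
naive = state
for _ in range(1000):
    naive = step(naive)
fast = [sum(a * s for a, s in zip(row, state)) % M
        for row in mat_pow(A, 1000)]
assert fast == naive
```

Seeking a billion steps ahead is then just ~30 matrix squarings instead of a billion RNG calls.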

Now about the seed. The state is initialized by `initstate_r()` with $state_{i} = seed \cdot 16807^{i} \pmod{2^{31}-1}$.

There’s plenty of sources of bugs here. One is that it actually starts outputting at $state_{3}$ or something like that, and then iterates in reverse order from how I set up my vectors, which I should probably have fixed—so that’s on me. It also discards the first 70 (in our case) outputs from `random_r()`, so these have to be taken into account. And of course the mixed moduli need to be kept separate. Notably, care must be taken so numpy doesn’t overflow the seed modulus.
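A hedged sketch of that seed expansion (ignoring glibc’s special-casing of seed 0 and its sign fixup):

```python
M31 = 2**31 - 1  # the seed modulus used by initstate_r

def seed_state(seed):
    # state word i is seed * 16807**i mod (2**31 - 1), per the formula above
    return [seed * pow(16807, i, M31) % M31 for i in range(7)]

print(seed_state(1)[:3])  # → [1, 16807, 282475249]
```

Note that the `start` vector in the brute-force code further down holds these first powers in reverse order, consistent with the index-reversal gripe above.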

Now to match the `strfry()` output. For example, given `out = strfry(in)` with unique characters, I know that the first random call (`i=0`) satisfies `random_r() % (len(in) - i) == in.index(out[i])`. Then re-do the swap, get a similar constraint from the next character, and so forth. I did this for the first 7 characters of a 90+ character string to make sure I had enough information to uniquely determine any seed from a given state. (A potentially more clever idea would be to use a longer string and only consider even indices, taking only 1 bit from each character, to get constraints that are all mod 2? I didn’t think of that at the time.)
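The constraint extraction can be sketched like this (`recover_mods` is a hypothetical helper; it assumes the input string has unique characters, as described above):

```python
def recover_mods(inp, out, k=7):
    # replay strfry's swaps to recover random_r() % (len(inp) - i)
    # for the first k calls, given the input and the shuffled output
    s = list(inp)
    mods = []
    for i in range(k):
        j = s.index(out[i])   # position strfry must have swapped from
        mods.append(j - i)    # = the random value reduced mod (len(inp) - i)
        s[i], s[j] = s[j], s[i]
    return mods

# quick check against a simulated strfry with known "random" values
inp = ''.join(chr(65 + i) for i in range(20))
rs = [7, 123456, 3, 999, 42, 5, 31337, 2, 8, 1] + [0] * 9
s = list(inp)
for i in range(len(inp) - 1):
    j = rs[i] % (len(inp) - i) + i
    s[i], s[j] = s[j], s[i]
out = ''.join(s)
assert recover_mods(inp, out) == [rs[i] % (20 - i) for i in range(7)]
```

Each recovered residue is one congruence on the corresponding RNG output, which is what the seed brute-force below filters on.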

I think in the end I had some idiotic thing like:

$\left(A^{70+n+7}\,\left[seed \cdot 16807^{i} \pmod{2^{31}-1}\right]_{i=0\ldots 6} \pmod{2^{32}}\right)_{7-j} \gg 1 = index_{j} \pmod{length-j} \pmod{random\_indexing\_errors}$

I mean… It’s not pretty. I knew it wouldn’t be. Doing this sort of thing, 90% of my time is spent fixing off-by-one errors or putting indices in the wrong order or something like that. For my test code I had something like:

```
# precomputed stuff
start = np.array([282475249, 16807, 1, 470211272, 1144108930, 984943658, 1622650073], np.uint32)
A = np.array([
    [0xe9373f80, 0xdc7dfb8c, 0x9a732dfd, 0x8e41234d, 0x45478090, 0xe4134087, 0x78b11946],
    [0x78b11946, 0xe9373f80, 0xdc7dfb8c, 0x21c214b7, 0x8e41234d, 0x45478090, 0xe4134087],
    [0xe4134087, 0x78b11946, 0xe9373f80, 0xf86abb05, 0x21c214b7, 0x8e41234d, 0x45478090],
    [0x45478090, 0xe4134087, 0x78b11946, 0xa3efbef0, 0xf86abb05, 0x21c214b7, 0x8e41234d],
    [0x8e41234d, 0x45478090, 0xe4134087, 0xea6ff5f9, 0xa3efbef0, 0xf86abb05, 0x21c214b7],
    [0x21c214b7, 0x8e41234d, 0x45478090, 0xc2512bd0, 0xea6ff5f9, 0xa3efbef0, 0xf86abb05],
    [0xf86abb05, 0x21c214b7, 0x8e41234d, 0x4cdcc58b, 0xc2512bd0, 0xea6ff5f9, 0xa3efbef0],
], np.uint32)

@numba.jit(nopython=True)
def find_seed(A, start, m):
    seedvec = start.copy()
    for i in range(1, 2**32):
        out = (A * seedvec).sum(1).astype(np.uint32)
        out //= 2
        out %= np.array(range(89, 96), np.uint32)
        if np.all(out == m):
            return i
        seedvec += start
        seedvec %= 2147483647
```

However, doing a full brute like this wasn’t fast enough against the live server. It was really close, but it required me to get lucky, and DragonCTF has this super annoying PoW blocker on all their tasks… I probably should have picked apart `random_bits()` but at this point I just wanted to move on with my life. I *think* it might also be possible to thread the various moduli and simply *calculate the seed directly* for each index-modulus and then CRT it (esp. if using the idea noted above), but Lord almighty, the paperwork involved. Yeah, brute forcing is dirty, it doesn’t feel rewarding or clean, but it’s a dirty real-worldsy task, so I didn’t feel too bad about it.

Because this CTF had like one rev task, `poiko` was free to help out, so he wrote a utility in big boy code to do the above for me while I fixed the remaining indexical bugs, and I could just use:

```
print("FINDING SEED VROOM VROOM")
s = int(subprocess.getoutput('./BLAS ' + ' '.join(str(x) for x in M)))
# s = find_seed(M)
```

Out comes the seed, quick as mercury rain. Then seek the stream as above, then seek the stream again because you forgot to add +1, then do the final `strfry()` on the given string, then realize you’re an idiot and *reverse* the final `strfry()` because you misremembered the code, then finally get the flag.

## Bit Flip 1

Diffie-Hellman between Alice and Bob to get a shared key which is used as an AES key.

The stream of numbers $2^{256} \cdot sha256(x) + sha256(x+1)$ for $x = r, r+2, r+4, \ldots$ is scanned to find a prime. When a prime is found at $x$, the lower 64 bits of $sha256(x+1)$ become Alice’s secret. The numbers here are big-endian strings of 32 bytes.

$r$ is `os.urandom(16)` and unknown, but you get to flip arbitrary bits in it, so once it’s known you can set it to whatever you want. You’re told how many steps it took in the above sequence to find a prime (simulating a timing attack), so finding the original $r$ bit by bit is easy, something like:

```
def find_next_bit(oracle, num_known, bits_known):
    next_bit = 0
    while True:
        next_bit = 2 * next_bit + (1 << num_known)
        x = oracle(next_bit - bits_known - 2)  # fetch ABC1111X
        if x != 0:  # if unlucky, retry
            break
    y = oracle(next_bit + bits_known)  # fetch ABx0000X
    # if C was 0, then oracle(y) fetched at offset +2 closer to the prime
    # if C was 1, there's a tiny chance for false positive
    return y != x - 1
```

(Could be made a lot more efficient by making use of past ranges. Probably only $1+\epsilon$ queries are needed for most bits instead of the fixed 2 I’m doing here?)

But anyway, this basically solves the task, because now you have the prime and Alice’s secret, and Bob’s public key is given.
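For reference, a hedged sketch of the number stream described above (assuming a 16-byte big-endian encoding of $x$; the task’s exact encoding may differ):

```python
import hashlib

def h(n):
    # sha256 over a 16-byte big-endian encoding of n
    return hashlib.sha256(n.to_bytes(16, 'big')).digest()

def stream_number(x):
    # 2**256 * sha256(x) + sha256(x+1): the two 32-byte digests concatenated
    return int.from_bytes(h(x) + h(x + 1), 'big')
```

The candidate at step $k$ is then `stream_number(r + 2*k)`, and Alice’s secret is the low 64 bits of `h(x + 1)` once `stream_number(x)` is prime.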

## babykok

I’m a closeted Haskell advocate, I’ve played with other dependent-type things like Agda 2, and there’s some community overlap, so I recognized it immediately. However.

I’ve never used Coq before, but I now know that I hate it. I hate its syntax, its IDE, its logo; I hate all the people who think making puns on its name is smirk-worthy, and the people who pretend the name doesn’t lend itself to puns. I hate how all the blogs make proving stupid toy theorems about Peano numbers seem like a fun adventure. I hate myself.

I have renewed understanding for beginners when they do stuff like:

```
if i == 0:
    s = "a"
elif i == 2:
    s = "b"
elif i == 3:
    s = "c"
elif ...
```

or

```
log("initializing")
sum = i = 0
for _ in range(100000):
    log("increasing the index")
    i += 1
    log("accumulating sum...")
    sum += av[i]**2 + av[2*i + av[i]%2]
    log("solving langlands")
    solve_langlands()
log("printing sum")
print(sum)
```

I get it. That’s me in Coq.

I solved the trivial ones easily. The `math_theorem` one I just pulled from some tutorial thing. (Super unrewarding way to solve it, but there you go.) I was stuck on the last problem for a long time, reading tutorials, blog posts, looking at other proofs, and so on. I also infected `poiko` and made him waste time on it. Both of us wasted several combined hours trying to learn Coq’s syntax and usage for this dumb, pseudo-trivial theorem.

In the end we both came up with different proofs for the final theorem at nearly the same time. Mine was:

```
intro n. intro l. revert n. induction l as [|? ? IHl]; intro n; destruct n; simpl.
- intro. apply PeanoNat.Nat.nlt_0_r in H. contradiction.
- intro. apply PeanoNat.Nat.nlt_0_r in H. contradiction.
- destruct 1. exists a. auto. exists a. auto.
- auto with arith.
```

It probably makes no sense and looks ridiculous to anyone who knows anything about Coq.

`poiko`’s solution might make more sense (I couldn’t say):

```
unfold lt; intro n; induction n as [| n hn]; intro l.
- destruct l; simpl. inversion 1. inversion 1. exists a. auto. exists a. auto.
- destruct l. simpl. * inversion 1. * intros. apply hn. auto with arith.
```

I’m having a hard time deciding whether this is a good task or not. I mean, it made me despise Coq, and that’s not good; I’m usually all for ivory-tower type theory stuff. On the one hand, motivating CTF players to learn some higher-order type theory and theorem proving sounds great; on the other hand, the way this task was put together feels very arbitrary and polarizing, like putting up a poem in Finnish that tells you how to make the flag through puns—free points for anyone who speaks the language, a frustrating Google-hammer experience for the rest.

## Bit Flip 2 & 3

Similar to *Bit Flip 1* but Bob’s key isn’t printed. This ruins the easy solution. You can still reverse $r$ as above, so you get a little bit of control over what the prime becomes as well as Alice’s secret, both of which are of course known, but there’s not much else to do.

These tasks had me super puzzled. Here are just various notes I made to myself mentally.

The brute-forcing options included:

- Brute force Bob’s 64-bit secret. Nope, not gonna happen.
- Brute force hashes to make Alice’s secret 0, like a 64-bit PoW task. Also not gonna happen.
- Search the SHA256 stream for primes under which 5 (the generator used) has a very low order, for example by making $p-1$ smooth. Given that *Bit Flip 3* is just like *Bit Flip 2* but with the added constraint of strong primes, it seemed to heavily suggest this was the path. But finding smooth numbers of this size without sieving you control is near impossible. So this was also not gonna happen.

I wondered about the `(p % 5 == 4)` check in *Bit Flip 3*—like, are there any other constraints for *random* primes where a given number such as 5 might have low order in the associated field?

The task also specifically imports `is_prime()` from `gmpy2`, even though it uses `Crypto.Util.numbers` (which has `isPrime()`). It’s the only thing it uses from `gmpy2` too. That was curious, but I thought it was just some author idiosyncrasy. Besides, you can’t really *construct* the primes being generated, you can only pick one from some set of random ones, so again it’s not like you can even begin to think about fooling a standard prime check.

All the tasks also xor the shared secret with 1337 for seemingly no reason. No idea why. The shared secret will in “almost all” cases be a 500+ bit number, of which the 128 most significant bits are used, so it doesn’t matter. (Addendum: I was wrong here; I totally missed the “bug” in *Bit Flip 3*.)

The IV isn’t reused, there doesn’t seem to be a meet-in-the-middle, and data collection probably doesn’t help.

I failed to even find the *problem* I was supposed to work on, so I just put it out of my mind.

*Post-CTF Addendum:* after the CTF ended I found out that *Bit Flip 3* can be solved trivially because `long_to_bytes(x, 16)` does something quite unexpected:

```
>>> long_to_bytes(random.getrandbits(513), 16)[:16]
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01'
```
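The surprise is the `blocksize` argument: the output is zero-padded on the left to a multiple of `blocksize` bytes. A minimal re-implementation (a sketch, not PyCryptodome’s actual code) makes the effect clear:

```python
def long_to_bytes_blk(n, blocksize):
    # big-endian encoding, left-padded with zeros to a multiple of blocksize
    b = n.to_bytes((n.bit_length() + 7) // 8 or 1, 'big')
    return b'\x00' * ((-len(b)) % blocksize) + b

# a 513-bit number needs 65 bytes, so it gets padded to 80 bytes: the first
# 16 bytes are 15 zeros followed by the top byte (always 1 for 513 bits)
assert long_to_bytes_blk(2**512, 16)[:16] == b'\x00' * 15 + b'\x01'
```

So taking `[:16]` of the padded output yields an almost entirely known AES key whenever the secret happens to be 513 bits long.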

This was actually really clever! Ever since I started joining CTFs I’ve had a low-key distaste for `long_to_bytes()` and how pervasive it is. It feels so Python 2-y. It’s in every crypto task under the sun, so its usage didn’t stand out to me at all, and I never bothered to check up on it in this case. I’m sorry I missed this a-ha moment. Deceptive!

However, it seems that several people solved *Bit Flip 2* by “brute force the hashes modulo OSINT.” Namely: the Bitcoin blockchain has a lot of such hashes, so you can find hashes there that match the requirements. Although that’s a clever idea, it greatly lowers the task in my estimation if that is the intended solution. It’s just a really unfun way to solve problems. It feels a lot like “enter the MD5 hash in Google” or “look up the factors on factordb.com,” but version 2.0. Big boy OSINT is still OSINT. If anyone has a programmatic or mathematical solution to *Bit Flip 2* that doesn’t look up dirty money-laundered power-guzzling hashes from other sources, I’m very interested to see what I missed.

Not that anyone reads these notes, but hey.

There’s only two hard problems in computer science: naming things, cache invalidation, and off-by-one errors. (Recycled Twitter joke.)↩

# ASIS 2020

## General Comments

I was bribed with pizza to rejoin this CTF after two weeks of being
largely offline. I didn’t play on Friday, as I was lost in the
braindance of *Cyberpunk 2077*, but was able to quit and reboot into
Linux on Saturday.

But oof, this CTF had some *actual* hiccups, not like the fake critiques
I’ve unfairly levied against other CTFs above. A bunch of things seemed
to have slipped through Q&A, there were a couple of broken problems and
questionable code in the crypto/ppc sections; `poiko`

also indicated
that the rev challenges were really uninspired, everything boiling down
to obfuscated x86 with guessy elements(?).

However, looking at CTFTime now, especially the voting section, it seems almost like there’s reverse group-think going on, like suddenly it’s Socially Acceptable to be Critical or to “offset” high votes with exaggeratedly low ones, so everyone’s letting off steam. Does this CTF really only deserve 20 points? That’s goddamn vicious. Was it really Ekoparty or e-Jornadas level? No way.

Their heart was clearly in the right place. The difficulty level was
alright, and the *intent* seemed good. Plenty of the tasks were
*interesting*, but sure, there were Problems and a
couple of tasks with bad “quality assurance.” I still feel this CTF
was leagues better than OSINT-riddled pub-quiz or escape-room CTFs. It
tasted like at least 50-60 points to me, even given the fuckups for a
few of the tasks.

## Chloe

(Aside: both this and *Coffeehouse* were coded in a certain style that I dislike. It seems to be by a Python 2 programmer who’s afraid of using anything but the trusty old `str` datatype, and insists on using them as if they were `bytes`. This programmer also enjoys converting X-bit numbers to text strings of “0” and “1” to do binary operations, as if working with actual numbers is a bit scary.)

So *Chloe* involved a fair bit of “reversing” the bad code to get data
into types that are actually sane.

What you get is `cipher + byte(r)*16` (48 bytes) xored with a 16-byte `key`. `r` is a random byte.

So, modulo an 8-bit brute force, `key` (originally from `os.urandom`) is already given in the last 16 bytes of the output.

Now for `cipher`, which is calculated in some Feistel-like manner with an added permutation at the end:

```
L, R = flag[:8], flag[8:]
for i in range(16):
    L, R = R, L ^ R ^ roundKey[i] ^ (2**64-1 if i%2 else 0)
cipher = permute(L + R)
```

The Feistel function is just a simple xor and predictable negation, so the whole “Feistel network” is easily collapsible, as you can simply track the xor keys that end up being used on the left and right sides:

```
# precalculate: (the magic numbers are a[i] = a[i-1]^a[i-2]^2**(i-1) starting
# from [0,1])
xorL = reduce(op.xor, [roundKey[i] for i in bits_by_idx(14043)])
xorR = reduce(op.xor, [roundKey[i] for i in bits_by_idx(28086)])
# and the above Feistel-loop simply collapses to:
cipher = permute(flag ^ (xorL << 64) ^ xorR)
```

The `roundKey` array is given by the `key` found earlier.

The last `permute()` step performs a fixed random permutation on the bits in each byte. There are only $8!$ such permutations, so finding the inverse one is also easily brutable (~15-16 bits). To brute force the permutation I just did the simple thing of turning each permutation into a 256-byte translation table, so I could simply do `cipher.translate(Pinv[i])` to test a permutation.
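That table-building step can be sketched like so. A hedged sketch: `perm` here maps source bit to destination bit, an assumption about orientation:

```python
from itertools import permutations

def perm_tables():
    # one 256-byte translation table per bit-permutation of a byte,
    # suitable for bytes.translate()
    for perm in permutations(range(8)):
        yield bytes(
            sum(((b >> src) & 1) << dst for dst, src in enumerate(perm))
            for b in range(256))

first = next(perm_tables())
assert first == bytes(range(256))  # the identity permutation comes first
```

With all 40320 tables precomputed, testing a candidate inverse permutation is a single `bytes.translate()` call.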

So: cycle through the bytes, reverse permutation, and do a simple xor and check if it gives flag data.

## Coffeehouse (cafe house)

A 32-bit block-cipher which uses a xor, add, and shift in a loop for diffusion. I.e. it takes the flag and encrypts 4-byte blocks by looping over the block 32 times and doing various add-and-xor-shift with the key and a counter.

I really wish the cipher wasn’t 32-bits because it’s just too tempting to brute force…

You only get the encrypted flag as data, so I guess that’s clue #1 that at least a *partial* brute force was expected. You guess that no UTF-8 is used in the flag, so there are N bits that you know were 0 in the plaintext, and from there you can filter keys by writing the trivial `decrypt()` function which was missing. Technically this dumb way ends up being around a 37-bit(?) brute force.

This is what I did and I’m not particularly proud of it. My program (C++) found the solution in seconds, but a numpy equivalent with parallel keys would probably not be much slower.

My guess is that maybe the task author *intended* you to find some way
to strength-reduce the loop? Hmmm. Like, what if it was 64-bit or
128-bit instead? Or the inner loop had several orders of magnitude more
rounds? Then one would have to solve it by finding invariants… And
suddenly it’s a much cooler task—given that there *is* a solution to
these strengthened problems. I don’t really know, I didn’t look into
it, I just did the bad thing and moved on. But my intuition tells me
that the diffusion step is really weak and there’s plenty of invariants
to be found, so I wouldn’t have complained if I was forced to look into
it more, forced to not be lazy. (Again, given that there actually is a
solution?)

## congruence

I spent the better half of Saturday evening looking at this task. Spoiler: I didn’t solve it. (I don’t think anyone did? I think many people were in the same boat as me, spent a lot of time on it, and maybe ended up frustrated?)

It seemed magical, a promise of real beauty! So simple, so clean, pure
mathematics, yet I had no idea how to solve it. That’s the best kind of
feeling, isn’t it? When you feel stupid because of *math*. It drew me
in immediately.

Secrets: $flag$, $p$ (a fixed prime modulus of length ~512 bits), and $s$ (presumably a very small non-negative integer).

You’re given five different pairs of numbers $\lfloor a_{i}/256 \rfloor$ and $c_{i} = a_{i}^{e} \pmod{p}$. Each of the numbers $a_{i}$ is constructed from an ostensibly random alphanumeric bytestring of the same length as the flag. You’re also given $flag^{e} \pmod{p}$. So far so good. The coup de grâce is the exponent: $e = 7 \cdot 37 \cdot 191 \cdot 337 \cdot 31337 \cdot 2^{s}$.

Hmmm…? The usual Euler hack of finding $p$ from $p \cdot k = \gcd(a_{i}^{e} - c_{i},\; a_{j}^{e} - c_{j})$ (for some small $k$) doesn’t work because $e$ is too large, even if $s = 0$. The numbers involved would be hundreds of gigabytes. Oh my!

The fact that the $a_{i}$ numbers get their last byte censored seems to indicate that a solution has to be easily calculable, because the task will necessarily(?) involve a small brute-force step to test a given solution.

So, some ideas I went over:

- is there some way to calculate `remainder(a^e, b^e - c)` iteratively, without fully reifying the numbers involved?
- sub-question: is there some way to calculate the remainder or gcd directly from vectors of residues modulo sane primes? Prime-factor FFT?
- can continued fractions be used somehow, as continued fractions and gcd are very similar algorithms?
- can some clever algebra be used to relieve $e$ of some of its odd exponents? In particular: has $p$ been carefully picked to be revealed in one of the low-term expansions of some $a^{e} - b^{e}$ expression?
- is $p$ of a special form? E.g. does $e$ divide into $p-1$, giving a power-of-two quotient? (I can’t recall if I even tested these last two, because it would have made the task ugly and idiotic; I didn’t want it to be idiotic, I really wanted to preserve the task’s beauty.)
- in the relations $a_{i}^{e} - c_{i} = k_{i} \cdot p$, we *can* recover $k_{i} \cdot p \pmod{q}$ for saner numbers $q$ (in particular we can find small factors of $k_{i}$), but how does that help? All the $k_{i}$ integers are too big to actually be used or reconstructed.
- is the special form of the $a_{i}$ numbers relevant?
- the $a_{i}$ plaintext numbers are *ostensibly* random, but are they really? Could a seed have been fixed such that it can be used to reveal $p$ or $flag$ in a manner ordinarily impossible? Hmm, unlikely, right?
- does it help to find vectors (e.g. LLL) combining the given terms to 0?

Most of these avenues I never really tested. What kept stopping me was the brute force step on the $a_{i}$ terms: I felt I had to be damn sure about an implementation lest I waste a lot of time, something I could actually verify before brute-forcing.

So I ended up really disliking the brute force step added to this task
actually. (Unless it wasn’t intended as a brute-force step, but some
kind of magic fairy-land lattice problem? I don’t see how that’s
possible, though?) It seemed unnecessary. I also don’t know why $s$ was
hidden as well, that just seemed *overly* sadistic.

But in the end I never really became sure of anything. If there *is* a
solution, I would be very happy if someone could enlighten me! So far my
best intuition is in the direction of some novel FFT application?

## Trio couleurs

Spoiler: I didn’t solve this one either, but noting my thoughts.

I think this task was released during the day on Saturday. The server
instantly disconnected you at first, so I ignored it for a while, and
only came back to it when I finally gave up on *congruence* above. By
then it was late and I found that this was a really work-intensive
task…

(Again this style of converting numbers to text strings of “0” and
“1”, so I take it it’s the same task author as *Chloe* and
*Coffeehouse*, who does seem to have a particular phobia.)

Code is given which looks like DES, used in some broken 3DES-ish manner. It didn’t encrypt the same as a reference DES implementation, so then I have to waste time to figure out the difference. Turns out it was regular DES reduced to 8 rounds. (Unless this kind of debugging is relevant for the task, I wish the task made these things clearer, and simply stated sincerely that it was “DES reduced to 8 rounds” or similar.)

But OK, so it seemed like a linear cryptanalysis task. The 3DES part of the task is broken since it trivially leaks the penultimate ciphertext.

It doesn’t actually seem like a bad task, it’s “true” crypto, though I believe it requires some prior experience to be doable in hours as opposed to days. Key discovery from 8-round DES is, as far as I know, not trivial, and requires a lot of plaintexts. I haven’t done it before, nor do I have any linear analysis code, so this contributed to my decision to skip the task, as it would potentially be a big time sink; I needed sleep, and not many hours remained of the CTF.

It’s something I could have fun solving off-line though.

## Baby MD5

I’m embarrassed to admit I also wasted a long time staring at this stupid task doing exactly what the task encouraged me not to do: overthink it. I wrote a script to solve ASIS’ proof-of-work using all available cores to make reconnects fast, and I tried to sample the parameters to see if it would sometimes give $n=m$ in which case you can make an MD5 collision and… Well, I suppose this was the “trap” set up to trick idiots like me.

While I was doing this, `poiko` came along and remarked “hm, isn’t ‘dead’ hex, though? Couldn’t you—” and I instantly cringed and facepalmed pretty hard. Yeah, indeed… Dumb, dumb, dumb.

Connect, assert that $m>n$, and then find a string `X` such that iterating the function `lambda x: md5(x.encode()).hexdigest()` starting with `prefix + X` produces a hex string that starts with `dead` after $m-n$ iterations; that hex string is the second input. Could also do this trick for $m<n$ if the random prefix turns out to be (lowercase) hex.
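A hedged sketch of that search (`find_suffix` is a hypothetical helper; the real task’s prefix handling may differ):

```python
import hashlib
from itertools import count

def md5hex(s):
    return hashlib.md5(s.encode()).hexdigest()

def find_suffix(prefix, steps):
    # find prefix+X whose `steps`-fold iterated hex-md5 starts with 'dead';
    # 'dead' is 16 bits, so this takes ~65536 tries on average
    for i in count():
        x = prefix + str(i)
        h = x
        for _ in range(steps):
            h = md5hex(h)
        if h.startswith('dead'):
            return x

x = find_suffix('somerandomprefix', 2)
assert md5hex(md5hex(x)).startswith('dead')
```

The pair `(x, md5hex(x))` then satisfies the challenge: hashing the first $m$ times and the second $n$ times meet at the same `dead`-prefixed digest.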

## גל התקפה (Attack Wave)

I solved this one while taking a break from *congruence* above.

The boring part: a 5.1 WAV file; two channels were a synthetic voice saying what sounded like “RGBA,” and the other 4 channels contained random noise. The noise data were all in `[0,255]`, so I extracted that, treated it as image data, and that’s where the fun^{1} begins.

It’s what I like to call a “numpy massaging” task. In my mind these
tasks exist in a grey area between the guessy and the
mathematical^{2}. You don’t have the source, you don’t know what the
task authors did or their reasons for doing it, so in that sense it’s
guessy, but what you do have is pure, cold data and with data you can
“use the force, Luke.” There are certain intuitions to follow in
seeking low entropy, seeking order in the chaos. As long as you’re
given some breadcrumb trail of patterns to start with, it’s usually
fine.

For example, displaying the byte data as images of whatever dimension shows semi-repeated patterns on the top and bottom (low entropy), whereas the middle of the image appeared much more random (high entropy), which makes it “feel” like an image with maybe uniform coloring around the edge with text in the center, like this:

Now we seek to lower this entropy with various basic operations. Xoring all the channels together directly only increases the entropy, but I noticed that the three “RGB” channels were very similar (whereas the “A” channel was more chaotic). Looking at the differential of where certain numbers or bits occur is a good way to get a feel for the repeated patterns. They start out with this very cyclic pattern, but each is offset a little. So, rotating the arrays by small offsets aligns them to produce *very* low-entropy patterns, differing in very few bits, so you know you’re closer. I combined this trick with the more mysterious “A” layer as well, and combining “A” with any of the other channels (or all three) quickly produced an image with the flag clearly readable:

Unfortunately I have no idea what the author’s intent was. From the periodic artifacts still present in the above image, my guess is that there was indeed some “intended” way where you get an actual clean image, possibly with the letters all black and clearly defined. If I were to guess further it might have something to do with the fact that the data size makes it feel like there’s a row/column missing from the image, that the data given is actually just differentials between the pixels… But I didn’t investigate further after I got the flag. It was a fairly quick task.

## Election & Galiver

I don’t exactly know when these tasks were released, but I didn’t see them until I joined up on Sunday, with like 4-5 hours left of the CTF. Oof.

Why, oh why, do most CTFs insist on these staggered releases? Every damn
time. Is it intended to “keep things exciting” in the competitive
sense until the very end? Because I don’t think it’s working^{3}.

*Galiver* looked cool, seemed to be DH with gaussian integers, but I
made the terrible choice of focusing on the PPC task *Election* instead
because it “seemed easy.”

Then I invested too much into *Election* and I was sunk by my own
sunk-cost fallacy. The cherry on top was that this task was actually
bugged server-side.

*Election* was a PPC where you are given a string of base-10 digits and have to give some binary arithmetic expression like `X op Y` (where `op` can be one of `{+,-,*,/,**}`) in which these digits can be found as a substring. The expression you give cannot be more than 8 characters long. You can also just enter a single number, so for levels up to strings of length 8 you can just feed back the string given, granted it doesn’t start with a 0.

Each level, the base-10 string you get has length +1. It has an unknown number of levels, which is the first guessy part, because at first you don’t really know what you’re up against. Making a small in-memory database of the shorter expressions and their digits gets you to length-12+ strings, and you realize you’ll need to make a full database of all strings producible by these arithmetic expressions. I dumped these to external files and just used `grep` in my solve script. These files were several gigabytes each.

The addition, subtraction, and multiplications are trivial and can be ignored.

All the guessy elements stem from the *server-side code being hidden*.
That’s *always* a bad sign. First you have to assume (pray) that it
actually picks *valid* substrings. OK, fine, I’ll allow it, but it
doesn’t feel *great* to trust the author like this, especially when
knowing that the author is merely human, and the code might not have
been tested by others.

The exponentiation expressions are the simplest to dump, but the most time-consuming in CPU time (`gmpy2` and not Python’s slow bigints), so I started with those. Constructing the database of digit expansions from the divisions is more brain-intensive and tricky if you want to do it “properly.” From some pen and paper and Wikipedia I re-figured that the repetend in the expansion of, for example, `a/b` is of length $k$, where $k$ is the least number such that $10^{k} = 1 \pmod{b}$. Thus for prime $b$ it will be up to $b-1$ digits — so I figured this part of the database might require a magnitude more disk space, because the numbers involved are larger (`D**DDDDD` vs `D/DDDDDD`), and thus wanted to take some extra care to only store what is necessary. I knew that for the cyclic numbers the numerator can be fixed to 1, but it was unclear to me whether $a/b$ for composite numbers could give new “unique” decimal sequences. I don’t have that much SSD disk space…

But bottom line: expressions of the form `D/DDDDDD` can give *up to* $999999+\epsilon$ digits before repeating. And this is where the server-side bug comes in. As my database kept filling up I started encountering this:

```
[DEBUG] Sent 0x9 bytes: b'1/102451\n'
[DEBUG] Received 0x60 bytes: b'sorry, the result of your expression does not contain the given substr: 07718811919\n'
```

Yet:

```
>>> getcontext().prec = 200000
>>> '07718811919' in str(Decimal(1)/102451)
True
```

Hmmm. I kept getting more and more of these errors, barring me from
further levels. I didn’t really know what was going wrong. I thought
maybe I had a bug somewhere making it skip certain “easy” expressions
and giving the server fractions it didn’t expect (like if the server
erroneously expected some fixed low-term expression for the later
levels). It felt really bad to be in the dark. Finally I went on IRC and
tried to contact an admin. It turned out the server actually only
calculates fractions to `99999` digits:

```
[...]
15:56 <factoreal> see this:
15:56 <factoreal> if len(formula) <= 8:
15:56 <factoreal> if re.match(regex, formula):
15:56 <factoreal> try:
15:56 <factoreal> if '/' in formula:
15:56 <factoreal> a, b = formula.split('/')
15:56 <factoreal> res = mpfr(Fraction(int(a), int(b)), 99999)
15:56 <factoreal> else:
15:56 <factoreal> res = eval(formula)
15:56 <factoreal> if substr in str(res):
15:56 <factoreal> return True
15:57 <franksh> hm, yeah, so bugged. :/ 99999 isn't enough to cover all repetends/decimal expansions of some fractions.
15:58 <franksh> but i didn't find a way to get or see server source?
15:59 <factoreal> the cycle of 1/p where p is prime is p-1
15:59 <factoreal> franksh I will share it after CTF, ok?
16:00 <franksh> yeah, but 6-digit numbers are valid in d/dddddd so need an order of magnitude more to cover all
16:00 <franksh> and sure
16:01 <factoreal> you are right
```

The classic off-by-one-order-of-magnitude bug. But by then, as is shown by the timestamps above (CTF ended 16:00 for me), it was too late to do anything. That felt pretty shitty, and a sour note to end it on.

From IRC, I think the one solver of this task said something about only
searching fractions `1/p` with (fixed) 30000 precision(?), which seemed
like pure luck (or they’re just a very good guesser). That also gives a
clue to a *third* problem with the task, in that it didn’t select the
base-10 strings randomly at all. You could “guess” that exponentiation
was irrelevant, and only search `1/p` up to 100000 in an on-line
fashion? Urk.

Pretty disappointing. Should probably have done *Galiver* instead. Oh
well, offline I guess.

for some definition of fun. I must admit I sort of enjoy data/numpy massaging, especially when it’s precise binary data (as opposed to analog data where you spend most of the time on cleanup or noise filtering).↩

“True” guessy challenges for me are more like OSINT riddles or web challs where you have no data or feedback—you simply have to “get” the core idea, and it’s a binary yes/no. “Numpy massaging” tasks become a sort-of subgame where you play with combining arrays or matrices while trying to make them “fit,” and there *is* some indication or feeling whether you’re “closer” to a solution or not, e.g. by seeing what patterns you’re able to produce. *Xoared* from BalCCon2k CTF was another example like this, where you do simple transforms of some blob of data, seeking lower and lower entropy, and eventually it suddenly becomes a readable image.↩

Besides, online CTFs are completely broken in terms of the “competitive element,” since forming super-teams ad infinitum and rolling up with a 20-man group is all fair play, given the right communication tools and politics. The only “competitive” element that exists is between antisocial dissidents that have refused to merge into super-teams (yet)?↩

# JustCTF 2020

## General Comments {#justctf2020}

Back for another CTF after Christmas depression.

(CTF apparently called 2020 because it was rescheduled even though it took place in 2021.)

Overall seemed like an okay mid-tier CTF. “Decent, but not great.” The tasks I looked at were a bit on the easy side, there were only two crypto, one of them very meh, and pretty much all of the misc was meh-adjacent.

I only actively played the first day (Saturday) so can’t really say
that much about the tasks released late^{1}.

Also, one complaint `poiko` had that I fully agree with and can forward:
all flags outputted by tasks should be *consistent*. (C.f. `reklest`.)
It’s such a basic thing it should be considered a cardinal sin to
violate it. If the nature of a task *requires* a different format, there
should be a big note describing it in mathematically precise English.
(C.f. the convoluted description on `ABNF`.)

## 25519

Trivial crypto task where you get the private key. The hardest part is simply to mentally parse the script to see what it expects.

The goal is to pass this check with 8 different parameter sets:

```
def verify(signature, P, m):
    I, e, s = signature
    return e == hashs(m, s*G + e*P, s*hashp(P) + e*I)
```

You provide `I, e, s` and you’re given everything else: `P, m, G, x`,
where $x$ is the private key in $P=xG$. `hashp(P)` can be treated as
a new random point on the curve.

Thus, to generate however many solutions you want:

```
x = ... # the given private key, P=xG
Q = hashp(P) # treat hashp(P) as a novel point
# random nonces
t = randbelow(ec_order)  # any random nonce, e.g. secrets.randbelow
h = randbelow(ec_order)
# the target hash that we will construct
e = hashs(m, t*G, h*Q)
# set s such that t*G == s*G + e*P
s = (t - e*x) % ec_order
# set I such that h*Q == s*Q + e*I
I = (h - s) * pow(e, -1, ec_order) * Q
# e, s, I is a solution
```

Do this 8 times and get the flag.
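The forgery algebra can be checked in a toy model: integers mod the (prime) ed25519 group order stand in for curve points, and a stand-in hash replaces `hashs`. All names here are mine; only the algebra mirrors the task.

```python
# Toy check: s = t - e*x forces s*G + e*P == t*G, and
# I = (h - s)/e * Q forces s*Q + e*I == h*Q, so the "hash" e verifies.
from hashlib import sha256
from random import randrange

q = 2**252 + 27742317777372353535851937790883648493  # ed25519 group order (prime)
G = 1  # "generator" of the toy additive group Z_q

def hashs(*args):
    return int.from_bytes(sha256(repr(args).encode()).digest(), 'big') % q

x = randrange(1, q)
P = x * G % q                              # public key
Q = randrange(1, q)                        # stand-in for hashp(P)
m = b'message'
t, h = randrange(1, q), randrange(1, q)    # random nonces
e = hashs(m, t * G % q, h * Q % q)
s = (t - e * x) % q                        # makes s*G + e*P == t*G
I = (h - s) * pow(e, -1, q) * Q % q        # makes s*Q + e*I == h*Q
assert e == hashs(m, (s * G + e * P) % q, (s * Q + e * I) % q)
```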

## Oracles

Web-ish crypto where you’re given the code for a nodejs server. (Whose instances are gated behind a 28-bit PoW.)

It vaguely involved a quadfecta of things I dislike (web, javascript, remote, docker), so I dragged my feet a lot before I was bribed with pizza to properly look at it.

But so: the instance gives you three ciphertexts, two being of known
plaintexts, the third being the flag. Each is encrypted with `RSA_OAEP`
from `asmcrypto.js`.

(Note: the first iteration of this problem, and the one I wasted the
most time on, had the public key redacted, so you didn’t know the
modulus (the normal exponent of 65537 was implied). Which I felt was pretty
sadistic and painful. I realize you could technically recover it from
the given ciphertexts, doing GCD on 64MB numbers, but that’s pretty
painful… (Either reimplement OAEP or use nodejs, probably need to use
a good bigint library like GMP and not Python’s slow bigints, etc.) They
later updated the task to provide the pubkey, but they only noted this
on the “news” page and not in the task description itself, so I totally
missed it until `poiko` pointed it out to me. I feel that’s sort of bad
communication.)

On the server there’s an oracle query that does this:

```
const oracle = new asmCrypto.RSA_OAEP(privkey, new asmCrypto.Sha256(), fs.readFileSync('./oracles_stuff/' + oracleName));
var response = "I won't answer.";
try {
    const result = oracle.decrypt(question);
    asmCrypto.bytes_to_string(result);
} catch(err) {
    //
}
res.render('index', {response: response});
```

That is, you give it some data, it attempts to decrypt it and simply discards everything, ignoring errors.

It was easy to guess that it would be a timing attack problem, but because you have to actually do non-trivial stuff (i.e. web) to test your guess I was still loath to commit.

However, two factors severely strengthened the timing-attack guess.

First, inspecting the `asmcrypto.js` source:

```
// ...
this.rsa.decrypt(new BigNumber(data));
const z = this.rsa.result[0];
const seed = this.rsa.result.subarray(1, hash_size + 1);
const data_block = this.rsa.result.subarray(hash_size + 1);
if (z !== 0) throw new SecurityError('decryption failed');
const seed_mask = this.RSA_MGF1_generate(data_block, seed.length);
for (let i = 0; i < seed.length; i++) seed[i] ^= seed_mask[i];
const data_block_mask = this.RSA_MGF1_generate(seed, data_block.length);
for (let i = 0; i < data_block.length; i++) data_block[i] ^= data_block_mask[i];
const lhash = this.hash
    .reset()
    .process(this.label || new Uint8Array(0))
    .finish().result as Uint8Array;
for (let i = 0; i < hash_size; i++) {
    if (lhash[i] !== data_block[i]) throw new SecurityError('decryption failed');
}
// ...
```

I.e. it checks immediately if the first byte is 0 and fails early if
not. Only after this check does it hash the label data. And note in the
oracle source you can provide whatever file you want for the label. (I
found the largest file on the docker, which was something like
`../../opt/yarn-v1.22.5/lib/cli.js`, and used that.)

Second and final hint that it’s a straightforward timing attack: the server explicitly gives its response-time in the HTTP headers so you don’t actually have to time it yourself.

It turns out the difference between a valid decryption and an invalid one was something like ~7ms vs ~40ms, so yes, now you have a decryption oracle (on the first byte).

From there, finally!, there’s actual math. The basic idea is that you can multiply the message $m$ by a number $k$ (because $(m \cdot k)^{e} = c \cdot k^{e}$) and have the decrypt method tell you whether the first byte (in big endian) is 0 or not, meaning that $l \cdot n \le k \cdot m < l \cdot n + 2^{1024-8}$ for some $l$. I use this as an invariant.

This was all napkin math I did at the time but hopefully correct.
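The multiplicative property itself is easy to sanity-check with toy numbers (these are mine, not the task’s 1024-bit parameters):

```python
# Multiplying a ciphertext by k^e mod n multiplies the underlying
# plaintext by k; this is the malleability the oracle attack rides on.
n, e = 61 * 53, 17        # toy RSA modulus and public exponent
m, k = 123, 7
c = pow(m, e, n)
c_k = c * pow(k, e, n) % n
assert c_k == pow(m * k % n, e, n)
```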

Set $B = 2^{1024-8}$ because we’re checking on the high 8 bits. First we need to discover a $k$ to start with, i.e. where $l=1$ in the above case. This is $k = \lceil n/m \rceil$. I used $\lceil n/B \rceil$ as an estimate and brute-forced from there; there are better ways, but this is simpler.

The idea is to keep multiplying the values by 2 and readjusting based on
what the oracle tells us. We always want $k$ (named `a` in the code
below) to be the least possible so that $k \cdot m$ “barely” overflows
on the modulus. When we multiply all values by 2, we then have to figure
out if we’re in $[0,B)$ or $[B,2B)$ based on what the oracle tells us.
Code like so: (comments and cleanup added after-the-fact.)

```
def discover(n, B, oracle):
    # First discover a = ceil(n//T)
    # Use a more efficient technique here if (n//B)*T < B
    assert not oracle(n//B)
    a = n//B + 1
    while not oracle(a):
        a += 1
    # assert a == n//T + 1
    for i in count(0):
        # assert 2**i*n <= a*T < 2**i*n + B
        if B//a == 0:
            return (2**i * n + B)//a
        a = 2*a - 1
        if not oracle(a):
            a += 1
```

It failed the first time I tried it against the live server, but that
might have been a timing glitch. I didn’t have any error detection
stuff in my code, for when the ranges have become invalid. It might be
possible to do something more clever and robust here, but *eh, c’est la
vie de CTF*.

So now I have $m$ and need to decrypt it with this OAEP nonsense. I
just copy-pasted the bytes of $m$ into an array and used `nodejs` so I
could apply `asmcrypto.js` directly by copy-pasting the latter half of
the decrypt function. I didn’t really feel like reimplementing anything
or looking up more APIs.

All in all I have to admit it’s a pretty well-made crypto task from a certain perspective. If you’re the sort of person who cares about practical applicability and the real world then it’s A+. Unfortunately I’m not and kept whining over the fact that I had to actually read js code.

## That’s not crypto

Listed as reverse so I only discovered it by accident. Python byte code
that contains a list of big numbers. These numbers are used as
coefficients in a polynomial $P(x)$ and there’s a check that does
something like `all(c*P(d*x) == c*c for x in accumulate(flag))` for some
fixed literals `c` and `d`. In other words, the cumulative sums of the
flag’s bytes are the roots of $P(x)-c$ divided by $d$. Instead of
trying to factor this polynomial, I just did this in the REPL:

```
>>> n = lambda l: [i for i in range(1,256) if p(d*(l + i)) == c][0]
>>> r = [n(0)]
>>> while True: r.append(n(sum(r)))
```

It will stop on an exception and then `print(bytes(r))` gives the flag.
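The same loop can be checked end-to-end by building a toy polynomial from a known flag (all constants here are my own, not the task’s):

```python
# Build a toy P(x) whose roots, divided by d, are the cumulative sums of
# the flag bytes, then recover the flag exactly as in the REPL snippet.
from itertools import accumulate

flag = b'flag{toy}'
c, d = 5, 3                               # arbitrary small constants
roots = [d * s for s in accumulate(flag)]

def p(x):
    v = 1
    for root in roots:
        v *= (x - root)
    return v + c                          # p(x) == c exactly at the roots

n = lambda l: [i for i in range(1, 256) if p(d * (l + i)) == c][0]
r = [n(0)]
try:
    while True:
        r.append(n(sum(r)))
except IndexError:                        # no next byte found: flag exhausted
    pass
assert bytes(r) == flag
```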

## PDF is broken, and so is this file

Put off this task till late because I vibed from the description that it
would be some stupid rabbit-hole thing. `poiko` did some preliminary
analysis and told me I was right.

The “PDF” file you get can be interpreted as a Ruby script; and as
such a script it starts a web server on localhost where you can download
the file itself but now with a .zip extension. I.e. the zip file is
identical to the original file. I have to admit that already had me
going “wtf.” (It’s weird that `unzip` doesn’t even complain, even
though it obviously needs to skip large parts of the file to reach the
zip data stream.) There was also some false(?) hint like
`"readelf -p .note might be useful later"` that never became relevant
for me.

In the zip file there’s a meme link to a YouTube video talking about
how stupid these “broken PDF” tasks are and asking CTFs to please stop
running them, together with a `mutool` binary for repairing broken PDFs,
and a script to run said command in a docker (should you need it).
Running the command gives a PNG that’s full of conspiracy theory
spoofs, references to Toynbee tiles, Frank Chu, and the like, and other
stuff that treads the line between tragic schizophrenia and memetic
mystery. Oh, and also a reference to the “it’s a trap” meme and
something blah blah about PDF bombs.

That is when I sort of just bailed out and took a more brute force approach:

```
def streams(dat):
    j = 0
    while True:
        try:
            i = dat.index(b'\x06stream\x0a', j)
        except ValueError:
            return
        j = dat.index(b'\x0aendstream', i)
        yield dat[i+8:j]
```

I.e. I gave up on trying to make `challenge.pdf` play nice with any
other program and simply extracted the streams in a “dumb” way, saving
them all as files, deflating those that needed deflating.

Here there are various images (the ones rendered in the conspiracy PNG
above), font data, and random stuff like another false(?) hint about
`pip install polyfile` and so on.

But one of the streams has a lot of hex codes listed as text data. Converting it to binary (it had the JPEG magic bytes) and opening it gives the flag.

I’m lowkey impressed by the work that probably went into creating the task, and of course the memes, but it’s still a “thanks, I hate it” sort of thing.

of which only `Steganography 2.0` was relevant for me anyway. Tho regarding that task: a pickled 5GB numpy array? Bitch, please, does it look like I’m made of RAM?↩

# zer0pts CTF 2021

## General Comments {#zer0pts2021}

I don’t really have any comments regarding this CTF because I didn’t really play much. I hardly logged in, I was “busy” and not very motivated this weekend, but just noting it here in my CTF “diary.”

I know `poiko` enjoyed it though, so it was probably good.

The motivation I *did* have was mostly to help out because I’ve been a
shit team player for a while now, not really finding any will to
`li^Wplay`. `poiko` kept posting tasks to me on
Discord and I gave comments when able.

I only looked at the easy tasks. I skipped the two elliptic curve tasks completely because they weren’t immediately obvious to me—I imagine they would involve googling stuff and/or Thinking, which I wanted to avoid.

## Easy Pseudo Random

An RNG^{1} is set up as $s_{i} = s_{i-1}^{2} + b \pmod{p}$.

You’re given the high bits of two consecutive states and the goal is to reconstruct the stream. So you have $(H_{0} \cdot 2^{85} + u)^{2} + b = H_{1} \cdot 2^{85} + v$ with $u,v$ being two unknown “small” (85-bit) integers.

With anything involving high bits I immediately think of lattices as the obvious candidate, so that’s what I tried here for a quick solve. I just ignored the terms $u^{2}$ and $v$ completely, trying to minimize the simple linear expression $2^{86} H_{0} u + b - 2^{85} H_{1} \pmod{p}$ with a small $u$. I can 100% guarantee that there’s a cleaner and more robust way to attack the full polynomial, but I’m dumb and would have to Think or google for it which, like I said, I wanted to avoid.

The error of the linear equation is expected to be around 170 bits, so I tried to find short vectors from a matrix like so:

```
[[ p+b-(H1<<85), 0,     1<<K0 ],  # constant term
 [ H0<<86,       1<<K1, 0     ],  # linear factor of u
 [ p,            0,     0     ]]  # modulus
```

Starting with parameters `K1 ~ 86` and `K0 ~ 171` that I massaged a
little until the correct solution for $u$ tumbled out.
It was mostly a quick hack. The rest is trivial.
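A quick stdlib sanity check (toy 255-bit prime and random states, all parameters mine) that dropping $u^2$ and $v$ really leaves only a ~170-bit residual, which is why a lattice can pick out $u$:

```python
# With s1 = s0^2 + b (mod p), s0 = H0*2^85 + u, s1 = H1*2^85 + v,
# the expression below equals v - u^2 mod p: about 170 bits, far
# smaller than the 255-bit modulus.
from random import Random

rng = Random(0)
p = 2**255 - 19                      # a convenient known prime
b = rng.randrange(p)
s0 = rng.randrange(p)
s1 = (s0 * s0 + b) % p
H0, u = divmod(s0, 2**85)
H1, v = divmod(s1, 2**85)
r = (H0 * H0 * 2**170 + H0 * 2**86 * u + b - H1 * 2**85) % p
r = r if r <= p // 2 else r - p      # center around 0
assert r == v - u * u
assert abs(r) < 2**170
```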

## wa(rsa)mup

`poiko` solved this one because he thought I wasn’t going to play. I
was offline most of Friday and early Saturday.

I didn’t know he had solved it and so I double-solved it. But no matter, it was a quick/standard problem.

Standard RSA setup. There’s some code noise around padding but it’s
irrelevant. The immediately suspect thing is that you’re given two
ciphertexts: $c_{0} = m^{e} \pmod{n}$ and $c_{1} = \lfloor m/2 \rfloor^{e} \pmod{n}$. Because the message $m$ (probably) ends with
`}` it is (probably) odd, which is relevant because now we
have a non-trivial relation $m_{0} = 2m_{1} + 1$. (If the relation had been
$m_{0} = 2m_{1}$ instead, it wouldn’t provide us with any more information.)

Plugging this relation into the ciphertext equations and treating the message as an “unknown” we have two polynomial expressions which both have the message as a root, so we can take their gcd:

```
>>> g = (x**e-c1).gcd(((x-1)/2)**e - c2)
>>> g
[113128245622357901252722513691018597529212818374857225068412230803117273431764336733611386199949429353010088688478215740193848150958821139378543874939689746528140403143114943900235798243884022251713648885768664407134358754271963457290992686093387882808160942022485994772070150575070443505280922344644888038580 1]
>>> (-g[0]).lift().bytes()
b'\x02\x81\xae\xed \xdd\x07\x12;\x99\xc7d:\x99\x1a8\x16\xfe\xe6<\x18\x1dw\xea&\xfb\xfc\x8a\xa7\xa8\xba\xfa\xd8\xbe\xdf\x01\x13\xcb\xd3\x99\x9c\xf3_\x18qw\xb99}\'Q\xd7~\x03&^\xcd\x9aw\xf0\xef\xb5\x04\x1b\xb7\n\xe1\xcd"\x95ff]\x0c(H\x99\xb5\xed\xc3\x82\x9dl\xe4\x8c\xddx\xfd\x00zer0pts{y0u_g07_47_13457_0v3r_1_p0in7}'
```

(Here using some number theory wrappers (using NTL) that I wrote for regular Python so I didn’t have to use Sage so much. A leap for a man, an insignificant stumble for mankind.)
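The gcd step can be reproduced with plain Python over a toy prime modulus (everything below is my own toy setup; the real attack does the same arithmetic mod the RSA modulus, where any non-invertible element encountered would factor it):

```python
# Toy related-message gcd: m0 = 2*m1 + 1, c0 = m0^e, c1 = m1^e (mod p).
# Both x^e - c0 and ((x-1)/2)^e - c1 have m0 as a root, so their gcd in
# Z_p[x] is (x - m0) up to a scalar. p = 1019 is prime with p % 3 == 2,
# which makes cubing a bijection, so the gcd is exactly linear.
p, e = 1019, 3
m0 = 99                      # odd toy "message"
m1 = m0 // 2                 # m0 == 2*m1 + 1
c0, c1 = pow(m0, e, p), pow(m1, e, p)

def polymod(a, b):           # remainder of a by b in Z_p[x], low-degree-first
    a = a[:]
    while len(a) >= len(b):
        q = a[-1] * pow(b[-1], -1, p) % p
        for i in range(len(b)):
            a[len(a) - len(b) + i] = (a[len(a) - len(b) + i] - q * b[i]) % p
        while a and a[-1] == 0:
            a.pop()
        if not a:
            break
    return a

def polygcd(a, b):
    while b:
        a, b = b, polymod(a, b)
    return a

i2 = pow(2, -1, p)
i8 = pow(i2, 3, p)
f = [(-c0) % p, 0, 0, 1]                              # x^3 - c0
g = [(-i8 - c1) % p, 3 * i8 % p, (-3 * i8) % p, i8]   # ((x-1)/2)^3 - c1
h = polygcd(f, g)                                     # linear, root m0
assert (-h[0]) * pow(h[1], -1, p) % p == m0
```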

## janken vs. yoshiking

Didn’t really solve it so much as I just told `poiko` my guess for the
solution after seeing it. He wrote the actual solution.

What I wrote on Discord + added corrections:

```
# assumption: secret key x is odd.
# re-run program until x is odd
# idea: match quadratic "residue-ness" of numbers to rule out
# possibilities.
# put the numbers in GF(p) etc.
c0sq = c0.is_square()
canbe = 7  # 1|2|4 == all
if c0sq != c1.is_square():
    canbe ^= 1  # can't be rock
if c0sq != (c1/2).is_square():
    canbe ^= 2  # can't be scissors
if c0sq != (c1/3).is_square():
    canbe ^= 4  # can't be paper
choice = ['invalid',
          3,  # beat rock
          1,  # beat scissors
          1,  # 50-50 to win/draw
          2,  # beat paper
          3,  # 50-50
          2,  # 50-50
          randint(1,3),  # ???
          ][canbe]
```

Which apparently worked.

This was probably the best (as in “non-boring”) task of the ones I looked at.
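The residue check underneath is just Euler’s criterion; a stdlib sketch with a toy prime (numbers mine):

```python
# Euler's criterion: c is a quadratic residue mod an odd prime p iff
# c^((p-1)/2) == 1. The solve matches the "residue-ness" of c0 against
# c1, c1/2, c1/3 to rule out moves.
p = 1000003  # a toy prime

def is_square(c):
    return pow(c, (p - 1) // 2, p) == 1

# squares are always residues:
assert is_square(12345 * 12345 % p)
# residue-ness is multiplicative: multiplying by a square doesn't flip it,
# multiplying by a non-residue does.
non_residue = next(c for c in range(2, p) if not is_square(c))
assert not is_square(non_residue * 4 % p)
```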

## ot or not ot

This one too, I didn’t actually write the code or interact with the server; I just wrote on Discord what I thought the solution would be:

```
# reasoning:
# X = bit0 ^ ( a^r * c^s )
# Y = bit1 ^ ( b^r * c^s )
# Z = ( d^r * t^s )
# a = 2 or whatever
# b = a^-1
# c = -1
# X*Y = (-1)^(2*s) = 1
# send:
# a = 2 or whatever
# b = pow(a, -1, p)
# c = -1 % p
# d = ??? doesn't matter?
# then get x,y (z doesn't matter???) and this should work:
def get_two_bits(X,Y):
    for bit0, bit1 in product((0,1), (0,1)):
        if (X^bit0)*(Y^bit1)%p == 1:
            return bit0 + 2*bit1
```

Which it did. I don’t really understand what the `d`, `t`, and `z`
variables were doing in the program; they seemed completely superfluous.
Perhaps the problem author had overlooked the possibility of giving `-1`
for `c` so that $c^{s}$ becomes a trivial factor?

## Triple AES

This was the last task I did because it took much longer than the others (hours vs minutes), and I felt I was getting “sucked in.”

Setup: you can decrypt/encrypt arbitrary messages with some cipher that chains three AES instances (different keys) working in different block modes (ECB, CBC, CFB). You don’t control the IVs generated for encryptions. You’re disconnected immediately when retrieving the encrypted flag.

Not very hard, but very fiddly and ugh. These are my notes from the comments (I tend to “work out” my thinking in comments):

```
# encryption:
# PLAIN: m0 || m1
# ECB: E0(m0) || E0(m1)
# CBC: E1(iv0 + E0(m0)) || E1( E1(iv0 + E0(m0)) + E0(m1) )
# CFB: E2(iv1) + E1(iv0 + E0(m0)) || E2(E2(iv1) + E1(iv0 + E0(m0))) + E1( E1(iv0 + E0(m0)) + E0(m1) )
# decryption:
# PLAIN: c0 || c1
# CFB: E2(iv1) + c0 || E2(c0) + c1
# CBC: D1(E2(iv1) + c0) + iv0 || D1(E2(c0) + c1) + E2(iv1) + c0
# ECB: D0(D1(E2(iv1) + c0) + iv0) || D0(D1(E2(c0) + c1) + E2(iv1) + c0)
# dec(0) : D0( D1(E2(0)) ) || D0( D1(E2(0)) + E2(0) )
# : D0(D1( E20 )) || D0( D1(E20) + E20 )
# enc(0) : E2(R1) + E1(R0 + E0(0)) || E2(E2(R1) + E1(R0 + E0(0))) + E1(E1(R0 + E0(0)) + E1(0))
# : E2(R1) + E1(R0 + E00) || E2( E2(R1) + E1(R0 + E00) ) + E1( E1(R0 + E00) + E10 )
# dec0(enc0,iv1=R1,iv0=0) :
# D0( R0 + E0(0) )
```

The intuition was that the special values of `enc(0)`,
`dec(0,iv1=0,iv2=0)`, or `dec(0,iv1=X,iv2=0)` etc. would somehow be
instrumental in unwrapping some of these constructs and finding
“primitives” (like `aes_ecb_encrypt(key,0)`) that can be
looked up in precomputed tables. (The keyspace for each of the three
keys was only 24 bits so each on its own is easily brute-forceable.)

The last part of the comment above is me discovering the starting point
for recovering `key0`, after which everything becomes easy.

The server has a very generous timeout, which seems to indicate that some of this brute-forcing could take place to dynamically construct messages for the server to encrypt/decrypt. Thus there’s probably a better way of doing this, but I didn’t look very hard. The only benefit to how I did it is that no computation is needed while connected to the server: just fetch a few key values and disconnect. Code:

```
from flagmining.all import *

z = byte(0)*16
n_keys = 2**24
ecb = lambda k: AES.new(k, mode=AES.MODE_ECB)

print("generating keys...")
keys = [md5(bytes(b)).digest() for b in tqdm(product(range(256), repeat=3), total=n_keys)]

print('making lookup table...')
enc_to_key = {ecb(k).encrypt(z): k
              for k in tqdm(keys, total=n_keys)}

r = remote('crypto.ctf.zer0pts.com', 10929)
# [..] wrapping server communication in enc(), dec(), get_flag()

Ez, R0, R1 = enc(z)
Pz = dec(Ez, z, R1)   # = D0(R0 ^ E0(0))
Dz = dec(z+z, z, z)   # = D0(D1(E2(0))) || D0(D1(E2(0)) ^ E2(0))
flag3, flag_iv0, flag_iv1 = get_flag()
# Done with server.

# Find the key0 which makes D(R0 ^ E(0)) == Pz.
k0 = None
for x,k in tqdm(enc_to_key.items(), total=n_keys):
    if ecb(k).decrypt(xor_bytes(x, R0)) == Pz:
        k0 = k
        break
assert k0
print("found key0:", k0)

# Now unwrap Dz and just look up key2 from table.
ciph0 = ecb(k0)
k2 = enc_to_key.get(xor_bytes(
    ciph0.encrypt(Dz[:16]),
    ciph0.encrypt(Dz[16:])), None)
assert k2
print("found key2:", k2)

flag2 = AES.new(k2, mode=AES.MODE_CFB, iv=flag_iv1, segment_size=8*16).decrypt(flag3)

# Now find the middle key that gives the flag.
print("finding k1...")
for k1 in tqdm(keys):
    flag0 = ciph0.decrypt( AES.new(k1, mode=AES.MODE_CBC, iv=flag_iv0).decrypt(flag2) )
    if b'zer0' in flag0:
        print("found k1:", k1)
        print("found flag:", flag0)
        break
```

## Kantan Calc

No, I forgot one, but it kinda doesn’t count. Sunday I went for my
usual walk across the ice, and feeling refreshed I opened this one.
`poiko` had mentioned it briefly before, but I misunderstood the task
slightly and hadn’t seen the source, so my suggestions at the time were
dead ends.

But now (with source) I got the right idea (or at least *a* right idea):

```
// 012345678901234567890123456789/* flag */})()
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
// });(function x() {return x
// });(()=>()=>{
// });u=(s)=>()=>[...''+s];u(()=>{
// });((s)=>()=>[...''+s])(()=>{
```

Again I’m sort of “thinking” in comments.

The first two return the string of the function but are blocked by the
app because the substring `zer0pts` is blacklisted. That was a bit
annoying to circumvent because I know next to nothing about JS. With the
last two the idea is to convert the string to an array so it gets
`t,r,a,n,s,f,o,r,m,e,d` and bypasses the blacklist. The last
one is below the length limit and works.

Should have also checked the main site tho, because it turns out the CTF was already over when I did this one.

fun fact: Blum-Blum-Shub was the first “strong” RNG I was properly exposed to, through the excellent `calc` command-line program.↩

# Misc/Zh3r0 2021

## General Comments {#misc2021}

`poiko` has been sneaking a few problems to me on the side from various
CTFs. I didn’t take many notes and don’t remember most of them, but here
are some.

## Boring MT blah (Zh3r0)

Basic problem was as follows:

```
import numpy.random as random

def rand_32():
    return int.from_bytes(os.urandom(4),'big')

for _ in range(2):
    # hate to do it twice, but i dont want people bruteforcing it
    random.seed(rand_32())
    iv,key = random.bytes(16), random.bytes(16)
    cipher = AES.new(key,iv=iv,mode=AES.MODE_CBC)
    flag = iv+cipher.encrypt(flag)
```

So a 32-bit seed is used with `numpy.random.seed()`, but we have to get
through two of them.

When I first looked at the code I was somewhat excited because newer
`numpy` uses PCG64, which I think is a really cool RNG and would be
quite interesting to reverse; however, the `seed()` stuff just uses the
legacy implementation, which is a Mersenne Twister, basically equivalent
to Python’s (though with some differences in initialization).

Mersenne Twister is one of the more boring RNGs (in my opinion), but I’m also biased by the fact that it has a lot of finickiness in implementation and it’s so easy to get stuck on off-by-one bugs or the like. Basically, its implementation is more of a hassle than the RNG is worth. It also has, in my opinion, a greatly disproportionate state size for its quality.

Anyway, there are two approaches to this problem. One, the “trve” way, is to actually directly reverse the 32-bit seed from the 16 bytes of output you get for free. Now, I know that this can be done, but it is also not trivial, because the MT implementations use a super-convoluted initialization, and you have to copy all this code and make sure you get all the indexing right, and it’s just a royal pain in the ass with all the potential bugs.

The second approach, the “I don’t want to deal with this shit”
approach, is to note that you can get as many of these outputs as you
want. So you can build a database of `output-bytes -> seed` or something
like that, and then just get enough outputs from the server until it’s
very likely you get two matches. I think the probability calculation
goes like $1 - (1 - d/2^{32})^{h}$ (?) where $d$ is the number of
entries you put in the database and $h$ is the number of times you hit
up the server. So calculating ~200 million seeds (not too bad,
time-wise) while `nc`-ing the server 400 times (also not too
bad, because there’s no PoW) gives you a fair chance of being able to
decrypt both ciphers.

I went for the second approach, which was simple and braindead.
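Plugging the numbers above into the back-of-envelope formula (my reading of the paragraph: a success needs *both* seeds of one connection in the table) gives roughly a coin flip per run:

```python
# Probability that at least one of h connections has both of its two
# 32-bit seeds in a d-entry precomputed table.
d, h = 200_000_000, 400
p_one = d / 2**32            # one seed is in the table
p_both = p_one ** 2          # both seeds of a single connection
success = 1 - (1 - p_both) ** h
# success is roughly 0.58, i.e. "a fair chance"
```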

## A~52~ group (unknown CTF)

I remember one problem he described where you are to find two elements
in the alternating group $a, b \in A_{52}$ such that given any deck
shuffle (of 52 cards) you can give a sequence like
$a b^{2} a b^{13} a^{5} \cdots$ which produces that shuffle. I don’t know
the exact problem statement, or what category it was in (guessing `rev`
or `misc`), but it was an interesting little problem, being something
I hadn’t seen before.

Basically what I came up with is to have one element (the big loop)
cycle the first 51 cards like `(0 1 2 … 49 50)` and another (the
little loop) which cycles the last three cards like `(49 50 51)`. Then
you can think of it as a really bad sorting algorithm, where you
incrementally start to sort the cards in the big loop (`0..50`) by using
the 51st place as a register or temporary variable. You shift the card
you want to move into position 51, then align the rest of the cards
properly, then shift it out again, rinse and repeat until all the cards are
sorted (or in whatever order you want). There’s a special case when the
card you want to move is attached to the front or back of the sequence
of already-sorted cards (and you want to move it to the other side) in
the “big loop,” but it’s easy to take care of.
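As a sanity check, both generators really are even permutations (a sketch with my own helpers):

```python
# The "big loop" is a 51-cycle and the "little loop" a 3-cycle; both have
# even parity, so they are legitimate elements of A_52.
def cycle_perm(n, cyc):
    p = list(range(n))
    for i, j in zip(cyc, cyc[1:] + cyc[:1]):
        p[i] = j                    # i maps to the next element of the cycle
    return p

def parity(p):                      # +1 for even permutations, -1 for odd
    seen, sign = [False] * len(p), 1
    for i in range(len(p)):
        L, j = 0, i
        while not seen[j]:
            seen[j] = True
            j = p[j]
            L += 1
        if L:
            sign *= (-1) ** (L - 1)  # an L-cycle is L-1 transpositions
    return sign

a = cycle_perm(52, list(range(51)))  # (0 1 2 ... 50)
b = cycle_perm(52, [49, 50, 51])     # (49 50 51)
assert parity(a) == parity(b) == 1
```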

## Unicode troll (unknown CTF)

There was another problem that appeared to be a RNG reverse, but was just a troll task.

You give a seed as hex, various conditions are applied to it, then the digits in your seed are scrambled, and you are given two bytes of output in an xorshift64 stream and asked to predict the value. The relevant part was:

```
seed = input("give me the seed: ")
seed = seed.strip()
if(len(seed)) != SEEDS:
    print("seed should be "+str(SEEDS)+" bytes long!")
    exit()
seed = list(seed)
random.shuffle(seed)
counts = collections.Counter(seed)
if counts.most_common()[0][1] > 3:
    print ("You can't use the same number more than 3 times!")
    exit()
int16 = lambda x: int(x,16)
seed = list(map(int16,seed))
```

The suspicious part is where it does the `Counter()` stuff to check for
repeated digits. I got a hunch and yes, sure enough:

```
>>> int('1٠۰߀०০੦૦୦௦౦೦൦෦๐໐༠၀႐០᠐᥆᧐᪀᪐᭐᮰᱀᱐꘠꣐꤀꧐꧰꩐꯰０𐒠𐴰𑁦𑃰𑄶𑇐𑋰𑑐𑓐𑙐𑛀𑜰𑣠𑱐')
100000000000000000000000000000000000000000000000000
```

So you can literally just feed it 0 as the seed and xorshift64 is trivially all zeros.

You can find these weird characters as follows:

```
''.join([chr(x) for x in range(256,0x110000) if unicodedata.category(chr(x)) == 'Nd' and unicodedata.digit(chr(x)) == 0])
```

Apparently the task (in whatever CTF it was) was marked with `crypto` as
well, which is why I call it a troll.

## homebrew hash function (Zh3r0)

There was a hash function like

```
def hash(text: bytes):
    text = pad(text)
    text = [int.from_bytes(text[i:i+4],'big') for i in range(0,len(text),4)]
    M = 0xffff
    x,y,z,u = 0x0124fdce, 0x89ab57ea, 0xba89370a, 0xfedc45ef
    A,B,C,D = 0x401ab257, 0xb7cd34e1, 0x76b3a27c, 0xf13c3adf
    RV1,RV2,RV3,RV4 = 0xe12f23cd, 0xc5ab6789, 0xf1234567, 0x9a8bc7ef
    for i in range(0,len(text),4):
        X,Y,Z,U = text[i]^x,text[i+1]^y,text[i+2]^z,text[i+3]^u
        RV1 ^= (x := (X&0xffff)*(M - (Y>>16)) ^ ROTL(Z,1) ^ ROTR(U,1) ^ A)
        RV2 ^= (y := (Y&0xffff)*(M - (Z>>16)) ^ ROTL(U,2) ^ ROTR(X,2) ^ B)
        RV3 ^= (z := (Z&0xffff)*(M - (U>>16)) ^ ROTL(X,3) ^ ROTR(Y,3) ^ C)
        RV4 ^= (u := (U&0xffff)*(M - (X>>16)) ^ ROTL(Y,4) ^ ROTR(Z,4) ^ D)
    for i in range(4):
        RV1 ^= (x := (X&0xffff)*(M - (Y>>16)) ^ ROTL(Z,1) ^ ROTR(U,1) ^ A)
        RV2 ^= (y := (Y&0xffff)*(M - (Z>>16)) ^ ROTL(U,2) ^ ROTR(X,2) ^ B)
        RV3 ^= (z := (Z&0xffff)*(M - (U>>16)) ^ ROTL(X,3) ^ ROTR(Y,3) ^ C)
        RV4 ^= (u := (U&0xffff)*(M - (X>>16)) ^ ROTL(Y,4) ^ ROTR(Z,4) ^ D)
    return int.to_bytes( (RV1<<96)|(RV2<<64)|(RV3<<32)|RV4, 16, 'big')
```

And the goal was to find a collision. Stripping away all the trivially reversible fluff, what we want to attack is:

```
RV1 ^= (X&0xffff)*(M - (Y>>16)) ^ ROTL(Z,1) ^ ROTR(U,1)
RV2 ^= (Y&0xffff)*(M - (Z>>16)) ^ ROTL(U,2) ^ ROTR(X,2)
RV3 ^= (Z&0xffff)*(M - (U>>16)) ^ ROTL(X,3) ^ ROTR(Y,3)
RV4 ^= (U&0xffff)*(M - (X>>16)) ^ ROTL(Y,4) ^ ROTR(Z,4)
```

Specifically, the only thing that is nonlinear here is the
multiplication. I think you could approach this in several ways, like a
sort of backward search on each block, or trying to find out how two blocks
can cancel each other out, but as a preliminary I put it into z3 first.
I had a suspicion that you can solve the above for several instances of
`X, Y, Z, U` to get a collision in the very first 16 bytes, or at least
I couldn’t see why it shouldn’t be possible. And sure enough, z3 found
such an instance pretty quickly (fixing one variable to be anything to
ground it) and it was just done. Whatever else is in the data doesn’t
matter.

## almost-combinadics (Zh3r0)

The problem was:

```
def nk2n(nk):
l = len(nk)
if l == 1:
return nk[0]
elif l == 2:
i,j = nk
return ((i+j) * (i+j+1)) // 2 + j
return nk2n([nk2n(nk[:l-l//2]), nk2n(nk[l-l//2:])])
print(nk2n(flag))
#2597749519984520018193538914972744028780767067373210633843441892910830749749277631182596420937027368405416666234869030284255514216592219508067528406889067888675964979055810441575553504341722797908073355991646423732420612775191216409926513346494355434293682149298585
```

Right off the bat there was a strong “smell” of some sort of “exotic
base conversion” here, and I mentioned the combinatorial number system to
`poiko`. He pointed out that indeed, the case of `l==2` above is
`choose(i+j+1,2)+j`. That made it click for me even though the whole
combinatorial angle isn’t really needed or relevant; the idea is just
that the `l==2` step is easily reversible:

```
def rev(q):
# q == comb(i+j+1,2) + j
r = iroot(2*q,2)[0]
if comb(r+1,2) < q: r += 1
# r == i+j+1
j = q - comb(r,2)
i = r - j - 1
return (i,j)
```

So you can simply keep calling `rev(v)` on all the numbers `>255` until
all you have is bytes, and the flag comes out.
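As a sanity check, the pairing and its inverse round-trip cleanly. A self-contained sketch, using stdlib `math.isqrt` in place of `iroot`:

```python
from math import comb, isqrt

def pair(i, j):
    # the l == 2 case of nk2n: equals comb(i+j+1, 2) + j
    return (i + j) * (i + j + 1) // 2 + j

def unpair(q):
    # invert the pairing: first recover r = i+j+1, then j, then i
    r = isqrt(2 * q)
    if comb(r + 1, 2) <= q:
        r += 1
    j = q - comb(r, 2)
    return (r - j - 1, j)
```

Repeatedly applying `unpair` to every value above 255 unfolds the tree back into bytes.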

## approximate mastermind (Zh3r0)

This was a PPC task of solving Mastermind when the server sometimes
(with a low probability) lies in its responses. Full disclosure: I
didn’t solve this one until after the CTF, so unfortunately no points
for `mode13h`, but it was interesting enough.

So my first idea was to do the classical solution (because the number of pegs and colors was low in the challenge games), which is to enumerate all possible solutions and weed out the ones that didn’t match the constraints, and, if this leads to 0 possible solutions because the server lied at some point, to sort of backtrack and remove one of the constraints until it is consistent again. This obviously won’t work if the server lies too much, but the hope was that I could just retry the game until the server lied few enough times, probabilistically.
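A toy sketch of that filtering step in plain Python (hypothetical small peg/color counts; `grade` returns `(bulls, cows)` the usual way):

```python
from collections import Counter
from itertools import product

P, C = 4, 6  # pegs, colors (hypothetical small game)

def grade(secret, guess):
    # bulls: right color, right position; cows: right color, wrong position
    bulls = sum(s == g for s, g in zip(secret, guess))
    both = sum((Counter(secret) & Counter(guess)).values())
    return bulls, both - bulls

def consistent(history):
    # every candidate solution matching all (guess, response) pairs so far
    return [cand for cand in product(range(C), repeat=P)
            if all(grade(cand, g) == resp for g, resp in history)]
```

If `consistent()` ever comes back empty, the server must have lied somewhere, and you drop constraints until it isn’t.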

I did this with `numpy` to make it somewhat efficient, something like

```
def c_bulls(a, b):
    return (a == b).sum(1, dtype=np.int8)

def c_cows(C, a, b):
    b = tuple(b)
    c = np.zeros((a.shape[0],), a.dtype)
    for k in range(C):
        bc = b.count(k)
        if bc == 0:
            continue
        x = (a == k).sum(1, dtype=np.int8)
        c += np.minimum(x, bc)
    return c
```

(Here `a` is a `k×P` matrix of “potential solutions.” Note also that
`c_cows()` includes the bull count, which I find more intuitive, whereas
the server essentially uses `cows = c_cows() - c_bulls()`.)

To select the next guess I sampled some of the potential solutions and picked ones that would lead to the most other candidates being eliminated no matter the response (min-max). Anyway, it wasn’t the fastest, and I ran into some weird (hidden) timeout issues and it kept failing. I don’t know what the timeout on the server actually was, because it wasn’t explicitly stated, which was annoying.

I then changed it to use a different approach, where I would generate a small set of “best candidate” solutions (initially random) and just keep iteratively randomizing them until reaching some local maximum where they would all give approximately the same answers as the ones I had gotten from the server, adding more random candidates that again get mutated down to “good candidates” when previous ones end up identical. For each guess I just used the best candidate I had so far.

```
def local_mutate(self, ch):
    tmp = list(ch)
    f = self.score(ch)
    order = list(range(self.P))
    random.shuffle(order)
    for i in order:
        p, tmp[i] = tmp[i], random.randrange(self.N)
        nf = self.score(tmp)
        if nf < f:
            f = nf
        else:
            tmp[i] = p
    return f, tuple(tmp)

def score(self, cand):
    "Score a guess based on previous guesses and responses"
    f = 0.0
    cntc = Counter(cand)
    for g, r, w in self.history:
        r_ = sum(x == y for x, y in zip(g, cand))
        w_ = sum((Counter(g) & cntc).values())
        f += self.bull_weight * abs(r_ - r)
        f += abs(w_ - w)
    return f
```

This is more of a genetic algorithm approach, kind of similar to something I’ve done when finding a factorization where the primes had very specific bit patterns applied to them. It turned out this approach actually worked surprisingly well, and immediately gave the flag:

```
b'level passed, good job\n'
b'you earned it : zh3r0{wh3n_3asy_g4m3s_b3come_unnecessarily_challenging}\n'
```

It was also much faster than the full “brute force” numpy solution, even though here I did everything in slow Python code.

# perfectblue CTF 2021

## General Comments {#pbctf2021}

I didn’t play it, but I solved all the crypto tasks. Two of them I
solved for `poiko` on Sunday while the competition was running.

They were pretty cool overall. I didn’t look at any of the misc tasks;
the zip `poiko` gave me just had these problems.

## Alkaloid Stream

`poiko` had already solved this one during the actual CTF; I just
re-solved it out of my own curiosity. Even “easy” problems can be quite
enjoyable when made by good problem authors (which I know `rbtree` to
be).

Anyway, the task: I don’t recall the name used for this kind of system, but it’s a discrimination problem, where we’re trying to distinguish values that are part of some set from fake ones. Here, the true values are linearly independent vectors over $F_{2}$ and the fakes are linear combinations from this basis. In a random order you get two numbers $(X,Y)$ where one is true and one is false.

In this specific problem none of this is important though, because it is trivially bugged. The fake values are generated from contiguous sums but the loops are bugged so the last fake value generated is always 0, allowing us to distinguish some true value $t_{i}$. The penultimate fake value will be $t_{i}$ itself, allowing us to find another $t_{j}$ and so forth.

The solve was about half an hour(?) from reading to flag.

## Steroid Stream

As above, but now fake value $i$ is generated from a random subset (of size $\lfloor n/3 \rfloor$) of $t_{i},t_{i+1},t_{i+2},\cdots$. However it is still bugged, because the last $\lfloor n/3 \rfloor$ fake values are not generated at all, so they just default to 0. So immediately we have the tail end of the original true values. From there it’s just a matter of iterating over the other pairs $(x,y)$ and checking if the rank of the matrix increases when we add one of these numbers but not the other (pretending the numbers represent a matrix over $F_{2}$).

```
# Code extract for solve. Much, much slower than it needs to be
# (at least O(n^4) in Python-ops, but I believe O(n^3) should be possible?)
# but fast enough for the task (couple of minutes).
mat = []
KEY = [0] * len(pks)
for i,xy in enumerate(pks):
x,y = sorted(xy)
if x != 0:
continue
mat.append(y)
KEY[i] = xy[0] == 0
r = bit_rank(mat)
while r < len(pks):
for i,xy in enumerate(pks):
if 0 in xy:
continue
n0 = bit_rank(mat + [xy[0]])
n1 = bit_rank(mat + [xy[1]])
if n0 == n1:
continue
which = n0 < n1
mat.append(xy[which])
xy[1-which] = 0
KEY[i] = which
r += 1
```

`bit_rank()` above is just naive elimination from my `flagmining` library:

```
lowbit = lambda x: x & -x
# ...
def bit_rank(ns):
    """Pretend the numbers represent a matrix over GF2 and calculate its rank."""
    mat = list(ns)
    r = 0
    while mat:
        if (pivot := mat.pop()) == 0:
            continue
        r += 1
        lb = lowbit(pivot)
        for i, row in enumerate(mat):
            if row & lb:
                mat[i] ^= pivot
    return r
```

Task was much simpler than I expected, and took just under an hour from reading code to flag, according to my timestamps.

## GoodHash

A server accepts input strings that must strictly conform to being
printable ASCII, valid JSON, and the JSON must contain a property
like `.admin==True`. This string is hashed using the
output from `AES_GCM(key=<fixed>, data=bytes(32), nonce=<input>, auth_data=None)`
(both tag and ciphertext), and the hash
is compared against a known target value. The goal is to generate a
collision while adhering to the constrained input format.

The path to the solution was easy enough to “see” after refreshing my memory on GCM, but I put off actually implementing it for a long time (until I felt guilty enough). I dislike GCM because there’s always some non-trivial amount of fiddly frustration and off-by-1 bugs and the like, due to the unintuitive LSB/MSB schizophrenia it suffers from in actual implementation. So this one took several (too many) hours for that reason alone.

Basically, when the nonce is not 96 bits exactly, and it’s the only variable factor, the “point of collision” reduces to this part of GCM:

```
h = AES_ECB(key, bytes(16))
cnt0 = GHASH(h, null_pad(IV) + (8*len(IV)).to_bytes(16, 'big'))
# cnt0 is then used for the CTR keystream, to mask the auth data, and so forth.
```

And `GHASH(h, B)` works by evaluating the polynomial
$h^{n}B_{1}+h^{n-1}B_{2}+\cdots+hB_{n}$ where $B_{i}$ is 16-byte block $i$ of the byte stream $B$,
and all byte-blocks are interpreted as values in $F_{2^{128}}$
under whatever random bit order who the hell knows.
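For concreteness, a sketch of that multiply using the textbook right-shift algorithm from NIST SP 800-38D, on GCM’s bit-reflected integer convention (where `1 << 127` represents the field element 1), plus Horner-style GHASH:

```python
GHASH_ONE = 1 << 127  # the multiplicative identity in GCM's bit order

def gf128_mul(x, y):
    # multiply in GF(2^128) mod x^128 + x^7 + x^2 + x + 1,
    # with both operands in the bit-reflected representation
    R = 0xE1000000000000000000000000000000
    z, v = 0, x
    for i in range(128):
        if (y >> (127 - i)) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z

def ghash(h, blocks):
    # evaluate h^n*B1 + h^(n-1)*B2 + ... + h*Bn by Horner's rule
    y = 0
    for b in blocks:
        y = gf128_mul(y ^ b, h)
    return y
```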

I took the most naive way forward (there might be better ways): a nonce
like `{"admin":1,"x":"////BLOCK[A]////....BLOCK[B]...."}`
gives two completely free blocks I can customize. If $a,b,c,d,e$ are the
blocks generated in the input to `GHASH()` from this nonce
(including the length block), then the task is to find $b,c$ of a
suitable form such that $bh+c=t=(ah^{5}+dh^{2}+eh)h^{-3}$ in
GCM’s finite field.

To find these suitable $b,c$ I again took the dumb approach, since I figured the complexity was small enough and I just wanted it over with: generate random $b$ looking for values where either $bh$ or $bh+t$ has no bits in common with $128\lfloor 2^{128}/255 \rfloor$ (i.e. no high bits set in any byte). This finds candidates where the output is valid ASCII with a 16-bit brute force. Then do a small local search combining these sets to produce a valid form $b$ and $c=bh+t$ where that ASCII also conforms to the allowable alphabet. Note that this first null-set is quote-unquote constant-time because it can be generated as a table offline.

## Seed Me

A server running Java asks for a seed and requires that the
`(2048*k)`-th `float` output value for $k∈[1,16]$ is larger than
`~0.980155`. It smelled like a lattice problem.

Upon inspection (`poiko` looked it up for me while I finished up
`GoodHash`), Java uses a simple 48-bit LCG and floats are made
from the high 24 bits of the state. LCGs were my first love among computer
science topics and so have a special place in my heart.
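Concretely, the generator looks like this (a sketch from the well-known `java.util.Random` constants; note Java scrambles the user seed by XORing with the multiplier first):

```python
MULT, ADD, MASK = 0x5DEECE66D, 0xB, (1 << 48) - 1

def java_lcg(seed):
    # yields the float stream of java.util.Random(seed)
    s = (seed ^ MULT) & MASK
    while True:
        s = (s * MULT + ADD) & MASK
        yield (s >> 24) / (1 << 24)  # nextFloat(): top 24 bits of state
```

The lattice below then models the (known, affine) relation between the states at the sixteen constrained positions of this stream.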

I set up:

```
# [ -T0 -T1 -T2 ... -T15 B 0 ]
# [ a0 a1 a2 ... a15 0 1 ]
# [ M 0 0 .... 0 0 0 ]
# [ 0 M 0 .... 0 0 0 ]
# [ 0 0 M .... 0 0 0 ]
# [ . . . ]
```

… where $B \gg 2^{48}$, $M=2^{48}$, and the first 16 columns are also scaled by some similarly large constant.

The target values $T$ are constructed by seeking a sum close to the
midpoint between the lowest and highest acceptable value in the LCG range, i.e.
`(2**48 + int(2**48*7.331*0.1337))//2` offset by the added constant at
that point in the LCG stream.

This finds several good candidates, seeds that pass 14 or 15 of the given checks, but it doesn’t find one that passes all 16. At this point I quietly complimented the problem author for constructing a task that wasn’t defeated by the most naive cookie-cutter LLL approach. But I noted also that the entropy of the constraints is extremely low, far exceeding the seed space, so:

a) we’re likely looking for a very specific seed that the problem author has discovered or found somewhere, b) indicating also that there are alternative solutions involving pure integer programming, or analyzing how Java’s specific multiplier behaves in 2048-dimensional space, and c) in particular, it’s likely that the hyperplanes of the LCG are near-orthogonal to a specific axis in this space, and so lots of fun could be had there.

But still, it’s usually not that hard to “convince” LLL to find the answer you want if you know it’s there. I saw two obvious ways forward: adjust the target to be slightly off-center, and/or use a non-uniform scale for the columns, applying uneven weight. I applied these techniques more or less randomly and will admit to not really knowing much theory here (the “I have no idea what I’m doing” dog meme comes to mind); I just followed some basic instincts. But randomizing the weights instantly nudged LLL in the right direction and found a (the only?) seed that works.

Problem description to flag took around 1.5 hours, so a pretty fast problem.

## Yet Another PRNG

This one was the most fun and enjoyable problem (in my view). It seemed
simple enough, it smelled of nuts (as in brain teasers), and I love
Euler hacking^{1}. It was decently challenging and rewarding. I am far
from certain I found the intended or most optimal solution.

The problem rephrased in conciser numpy code (numpy built into core CPython when?) is as follows:

```
# These are justified as nothing-up-my-sleeve numbers.
A = np.array([[4256, 307568, 162667],
[593111, 526598, 630723],
[383732, 73391, 955684]], object)
# OBS 1: these are NOT justified...
moduli = np.array([2**32 - 107, 2**32 - 5, 2**32 - 209], dtype=object)
M = 2**64 - 59
def gen(s):
# We maintain three separate streams (as rows in `s`) and for each iteration
# the next stream value is some linear combination of the three previous
# values with coefficients from rows in A.
#
# The streams are reduced with different moduli, and finally summed under a
# fourth modulus.
while True:
out = s[:,0] * [2*moduli[0], -moduli[2], -moduli[1]] # OBS 2: the order.
yield sum(out) % M # summed under a fourth modulus
n = (s*A).sum(1) % moduli # each stream gets its own modulus
s = np.c_[s[:,1:], n] # cycle state: shift out the most recent numbers, add new ones
# 9x 32-bit unknowns.
U = np.array([[random.getrandbits(32) for _ in range(3)] for _ in range(3)], dtype=object)
g = gen(U)
for _ in range(12):
print(next(g)) # output "hints"
# use next 12 outputs to encode flag.
```

I spent several wasteful hours going down misguided rabbit holes chasing
clever Euler hacks, which was unfruitful, but I had a lot of fun doing
it. I don’t have a proper time estimate because I started preliminary
analysis before bed and then continued in the morning, but I would say I
used *at least* 5 hours on this problem, which I don’t regret.

The final modulus does not seem like the difficult part, as it will only
be relevant half the time anyway, so I decided early on to ignore it,
figuring that *if all else fails* it’s a 12-bit brute force. The
problem lies in reasoning about these three values that have already
been reduced by different moduli by the time they’re added using
further arithmetic…

My initial idea was to think of the original values in the unknown state
as faux-$Q$ values in order to unify the moduli calculation to
something under $(\bmod\ m_0 m_1 m_2)$ or similar, but I was probably just
being delusional. However during this I made two important observations:
the moduli were indeed very suspicious. For example $2m_0 = m_1 + m_2$,
which I assume is relevant, but I didn’t find a *direct* way to exploit
it. I got the feeling there’s some clever insight I missed here, like
“oh! but that means we can just think of the streams as blah blah in
linear algebra” but my brain didn’t deliver on the blah blah part.

Anyway, the second observation is how the multipliers switch order when forming the final sum, mimicking CRT. The output is (ignoring the $M$ modulus) $2m_0(x \bmod m_0) - m_2(y \bmod m_1) - m_1(z \bmod m_2)$ (to be fair: this is much clearer in the actual task than I made it seem in the numpy golf above), which is equivalent to $2m_0(x \bmod m_0) - ((m_2 y + m_1 z) \bmod m_1 m_2)$, so that at least reduces it to two moduli.
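The collapse of the two negative terms into a single residue rests on $m_2(y \bmod m_1) + m_1(z \bmod m_2) \equiv m_2 y + m_1 z \pmod{m_1 m_2}$; that, and the suspicious relations between the moduli, are easy to sanity-check numerically:

```python
import random

m0, m1, m2 = 2**32 - 107, 2**32 - 5, 2**32 - 209

assert 2*m0 == m1 + m2          # the odd relation noted above
assert m0**2 - m1*m2 == 102**2  # the products are very close too

for _ in range(100):
    y, z = random.getrandbits(64), random.getrandbits(64)
    lhs = (m2 * (y % m1) + m1 * (z % m2)) % (m1 * m2)
    assert lhs == (m2 * y + m1 * z) % (m1 * m2)
```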

And, but, aha! Now the moduli are twice the bit length, meaning that they’re “applied” less. Expanding, we have something like $2m_0((c_0 x_0 + c_1 x_1 + c_2 x_2) \bmod m_0)$ for the first independent stream, where all the inner values are 32-bit, so the $k$ in $(\cdots) - r = m_0 k$ will also be around 32 bits.

$m_0^2$ and $m_1 m_2$ are very close; their difference is only $102^2$. So then the idea is as follows: pretend the modulus is some $m'$ between these two values and then we have ourselves a good old lattice problem. The inaccuracy introduced in each constraint will be $\approx 32 + \log_2 102^2 - 1$ bits, but the modulus is ~64 bits, so we’re still getting a handful of good bits of state per equation.

And a final note is that the first 3 outputs are just combinations of
the original random values, meaning they are known to be accurate up to
1 overflow, so they give a ton of extra information. Likewise the
fourth output has higher accuracy than the rest due to the values in
`A` being only 20 bits.

Now, factoring in the modulus $M$ I ignored earlier: I didn’t find a very elegant way to do this, but brute forced it for the first four values, and simply ignored it for the remaining constraints, since the error it introduces there is less than the error introduced by the pseudo-modulus $m'$.

In the end I had some big, ugly lattice that is too ugly to reproduce here, but it succeeded in finding all of the original 9 seed values given the correct factor of $M$ for the first four output values (so 16 lattices).

```
b'pbctf{Wow_how_did_you_solve_this?_I_thought_this_is_super_secure._Thank_you_for_solving_this!!!}'
```

## Yet Another RSA

When I first glanced at this problem I thought it was some weird elliptic curve thing and that the title was a troll. I immediately became very suspicious that it would be a Google-the-paper for one of those weird cryptosystems that academics and grad students pump out. “Diffie-Hellman over non-uniform smooth Titmann-Buttworth groups of semi-regular order” and then the incestuous follow-ups with “Analysis of …” and “Attack against …”. (If I sound bitter it’s only because I’m jealous.)

So, OK, the problem… is to find the logarithm of a number under some scarily complicated group operation. All arithmetic is performed modulo some RSA number $n=pq$. My initial glance quickly proved wrong; it definitely wasn’t an elliptic curve. The group has apparent order $(p^2+p+1)(q^2+q+1)$ and elements are represented as two numbers in $Z_n \cup \{None\}$ (or something like that, anyway). The primes used have the special form $a^2+3b^2$, the private exponent is suspiciously low, and so on. Tons and tons of red flags screaming “Google me.”

But the first thing I did was to simplify. I looked at the case where the modulus was a single prime and tried (in vain) to reason about what the hell the group operation “did” to points geometrically or visually by looking at easy stuff like $(1,2)*(2,1)$, $(-1,2)*(1,2)$, $(-2,1)*(1,-2)$, etc., and expressing the results in human-readable $p$-fractions (e.g. showing each coordinate as $\frac{n}{d} \pmod p$ when such $n,d$ of small absolute value can be found easily). It wasn’t particularly enlightening.

I tried to Google-the-paper at this point but didn’t find anything promising, so I just started a Wikipedia-hole instead. I came across the projective linear group and duh, the rather obvious $p^2+p+1 = \frac{p^3-1}{p-1}$ finally hit me. Thus I figured it was modelling some operation over the projective plane (thus all the “fractions”), and from the clue of $p^3-1$ I carefully re-examined the group operation while thinking about $F_{p^3}$ and yes indeed, it was modelling multiplication over $Z_{pq}[X]/(X^3-2)$! (Where $projective \approx monic$, to really abuse math terms.)
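One quick way to convince yourself of this (a sketch, under the reading that a triple $(x,y,z)$ stands for the polynomial $x + yt + zt^2$ with $t^3 = r$): the task’s addition formula is exactly schoolbook polynomial multiplication reduced by $t^3 = r$.

```python
def pt_add(p, q, r, N):
    # the task's group operation on coordinate triples
    px, py, pz = p
    qx, qy, qz = q
    return ((px*qx + (py*qz + pz*qy)*r) % N,
            (px*qy + py*qx + r*pz*qz) % N,
            (py*qy + px*qz + pz*qx) % N)

def poly_mul(p, q, r, N):
    # multiply (x + y*t + z*t^2) polynomials, then fold t^3 -> r, t^4 -> r*t
    prod = [0] * 5
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            prod[i + j] += a * b
    return ((prod[0] + r * prod[3]) % N,
            (prod[1] + r * prod[4]) % N,
            prod[2] % N)
```

Expanding both by hand shows they agree coefficient for coefficient.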

I also wrote a quick utility class for playing with this group properly. (Here modified with stuff I discovered below.)

```
def mk_point(N, r):
display = lambda x: min(x-N, x, key=abs)
@dataclass(eq=False, unsafe_hash=True)
class _pt:
x: int
y: int
z: int = 1
def __iter__(self):
return iter((self.x, self.y, self.z))
def __eq__(self, other):
return tuple(self @ other.z) == tuple(other @ self.z)
def __add__(self, other):
px,py,pz = self
qx,qy,qz = other
return _pt((px*qx + (py*qz + pz*qy)*r) % N,
(px*qy + py*qx + r*pz*qz) % N,
(py*qy + px*qz + pz*qx) % N)
def __rmul__(self, n):
return self * n
def __mul__(self, n):
return generic_pow(_pt.__add__, _pt(1,0,0), self, n)
def pell(self):
x,y,z = self
return (x**3 + r*y**3 + r**2*z**3 - 3*r*x*y*z) % N # == 1
def __neg__(self):
return NotImplemented
def __matmul__(self, k):
return _pt(self.x*k % N, self.y*k % N, self.z*k % N)
def __repr__(self):
if self.z == 0:
if self.y == 0:
return '<INFPOINT>'
return f'<INFLINE {display(mod_inv(self.y, N)*self.x%N)}>'
iz = mod_inv(self.z, N)
return f'<{display(self.x*iz%N)} : {display(self.y*iz%N)}>'
return _pt
```

With this information I was able to Google-the-paper much better. I
spent a lot of distracted time on an interesting paper called *A Group
Law on the Projective Plane with Applications in Public Key
Cryptography* (2020), but it didn’t go anywhere toward a solution on
this problem. But thinking about the special form of the primes, and
Pell’s equation, I found *A novel RSA-like cryptosystem based on a
generalization of Redei rational functions* (2017) using cubic Pell.
Yup, there it was: everything.

Oh yeah, back there I was also trying to look for invariants to find the
curve in $A$ it was following, as I figured there would be
one(?). I checked all sorts of quadratic forms, some cubics, but never
found it. No wonder, because, as per the paper above, the curve (cubic Pell)
for this particular instance turns out to be: $x^3 + 2y^3 - 6xy + 3$.
Jesus. (To be fair, that does mean it’s easy to find a point, namely
`(1,1)`!)

```
Pt = mk_point(900397, 2)
P = Pt(1,1)
Q = P * 1337
assert P.pell() == Q.pell()
```

I mean it’s cool… but for nothing?

This paper also makes bold claims about how resistant it is to various
classical attacks et cetera, but then the citations immediately lead to
another paper (*Classical Attacks on a Variant of the RSA Cryptosystem*)
with a rather dry counter:

> They claimed that the classical small private attacks on RSA such as Wiener’s continued fraction attack do not apply to their scheme. In this paper, we show that, on the contrary, Wiener’s method as well as the small inverse problem technique of Boneh and Durfee can be applied to attack their scheme.

In the end it was super anticlimactic because the whole thing was a bit ugh. The small inverse attack of course turns out to be just simple algebra. Duh. I might have figured it out on my own, but due to all the rabbit holes above, all the wishful thinking about how there was something cool about the group, the mental fatigue set in and I didn’t even bother looking at the plain small-$d$ algebra.

I’m still convinced the special form of the primes leads to a backdoor in this cubic Pell group, though. I mean, it has to, right? Like, why else? Why?

This task took the longest, like God knows how many hours, a day’s amount of “work.” But in the end it didn’t feel worth it.

```
b'pbctf{I_love_to_read_crypto_papers_\x04+\x81\xf4-Th)Gj2m\x95\xc7\xd5\xe9\x8cZ\xaa\xcei\xc8u\xb3\xc3\x95\x9f\xdep\xae4\xcb\x10\xbdo\xd5\x83\x9c\xca\x1b3\xdee\xef\x89y\x07w"^\x1ez\x96\xb1\x1a\xd2\x9d\xc6\xfd\x1b\x8e\x1fz\x97\xba \x00\xf7l\xd4Yv\xb0\xd8\xb8\x0e\xf4\x93\xa4\x9fB\x97\xab\xd3eD\xa8\xc9\xa7x\x90r'
b"and_implement_the_attacks_from_them}\xfb\x03\\\xdd\x9ch\x14\x89\x1d]\xfdf\xa8R\x81s\xf0\xbf\xfb\xa0\xe1\x90\xcfd\x82\xb4\xa5\x0b\x02\xc4r\x00wb|^\xd3\xf4\xb0N\xec\xf52\xe1\xb7\x9bF\x8dzW\xcbQ\xf3\xb7\xe7\x81N\x1e\\\xfb\x1c:\xbb'\x11\xadQ.\x8e [,\xdee\xd7\x86\x95\x1ff\x18\x16u\xe4\x95jPcn{\x9f"
```

Ehh.

(Edit/addendum: OK, after spoiling myself and reading other writeups
etc., it’s possible the small-$d$ stuff was the intention. I have a
theory: the problem author probably just came across the
attack-paper above and thought it would be cool in and by itself, some
classics against something novel, but didn’t consider that the solvers
would be so taken in by the cool group stuff, i.e. the novelty, that
then coming back down to *blah blah… Coppersmith… blah blah* in the
end would be a disappointment?)

What I call exploratory mathematics, or experimental mathematics, especially when involving classical number theory.↩

# ASIS 2021

Again I didn’t play, but here are the tasks `poiko` gave me:

- Pinhole: easy due to the task being drunk.
- LagLeg: pretty cool to figure out, medium difficulty.
- Family: I just did some coding for fun on the final reconstruction.
- Spiritual: EC point counting puzzle, medium difficulty (or easy if you’re good at Googling the paper).
- DamaS: I couldn’t solve it :(, but it was interesting.

## Pinhole

An easy crypto. My friend `poiko` asked me to have a look after his solve
failed. He spoiled me^{1} by outlining how he tried to solve it though… I
ended up doing it the exact same way^{2}, so it was a super quick solve, like
half an hour-ish. The trickiest part was loading Sage’s `print()`-ed output
back into Sage.

Upon opening the file you’re presented with the following:

```
def random_poly(degree):
R.<x> = ZZ[]
while True:
f = x**degree
for i in range(1, degree):
f += randint(-3, 3) * x ** (degree - i)
if f.degree() == degree:
return f
def genkey(a,d,r,s):
M, N = [SL2Z.random_element() for _ in '01']
A = N * matrix(ZZ, [[0, -1], [1, 1]]) * N**(-1)
B = N * matrix(ZZ, [[0, -1], [1, 0]]) * N**(-1)
# r, s = [randint(5, 14) for _ in '01']
U, V = (B * A) ** r, (B * A**2) ** s
F = []
for j in range(2):
Ux = [random_poly(d[j]) for _ in range(4)]
Ux = [Ux[i] - Ux[i](a) + U[i // 2][i % 2] for i in range(4)]
Ux = matrix([[Ux[0], Ux[1]], [Ux[2], Ux[3]]])
F.append(Ux)
X, Y = M * F[0] * M ** (-1), M * F[1] * M ** (-1)
pubkey, privkey = (X, Y), (M, a)
return pubkey, privkey
```

`genkey()` is a royal mess. It almost looks like the author started with some
interesting idea, two non-similar matrices in $SL_2(Z)$, doing some
random base changes… but then he got drunk or had a panic attack or *something*,
because the `Ux` stuff happening in the loop just looks like garbled
nonsense brayed by an idiot’s ass. He also forgot that `V` ever existed.

So how was it easy? Well, it’s because of this:

```
def encrypt(msg, pubkey):
X, Y = pubkey
C = Y
for b in msg:
C *= X ** (int(b) + 1) * Y
return C
```

Meaning you can just ignore the drunk pseudo-math above and test which of $CY^{-1}X^{-2}$ and $CY^{-1}X^{-1}$ is integral for each bit…

Total code for my solve:

```
import re
# The hardest part.
monomial = re.compile(r'( - |-|)(\d+)(?:\*(x)(?:\^(\d+))?)?')
def extract_poly(R,s):
x = R.gen()
cur = R(0)
u = R(1)
for (pre,c,isx,e) in monomial.findall(s):
if e == '':
e = '1' if isx else '0'
cur += R(c) * (u << int(e)) * (-1 if '-' in pre else 1)
if e == '0':
yield cur
cur = R(0)
out = open('Pinhole/output.txt').read()
Px = PolynomialRing(ZZ,'x')
V = list(extract_poly(Px,out))
X = Matrix([V[:2],V[2:4]])
Y = Matrix([V[4:6],V[6:8]])
V = Matrix([V[8:10],V[10:12]])
def xy_bits(V,X,Y):
while V != Y:
V *= Y^-1
assert V.denominator() == 1
if (V * X^-2).denominator() == 1:
yield 1
V *= X^-2
else:
yield 0
V *= X^-1
assert V.denominator() == 1
print(bytes.fromhex(sum(2**i*x for i,x in enumerate(xy_bits(V,X,Y))).hex()))
```

## LagLeg

RSA-ish system with a custom exponentiation function.

The first thing to notice is probably that the prime generation is obviously very weak:

```
r = getRandomNBitInteger(nbit >> 1)
s = getRandomNBitInteger(nbit >> 3)
p, q = r**5 + s, s + r
if isPrime(p) and isPrime(q):
...
```

`nbit` is originally 512, so `r` becomes 256-bit and `s` only 64-bit. But uhh,
checking the actual provided numbers in the given `output.txt` did not match; it
would have to be a 128-bit `r` and 64-bit `s`. Ehh, OK, so the output wasn’t
generated by the script, nice.

But anyway, assuming the generation was still the same, it’s easy to factor.

For this you could use whatever stock Coppersmith implementation off the
shelf, but where’s the fun in that? The numbers are actually weak enough that
you can factor them *by hand*, which is way more fun! We don’t even need to brute
force.

Pretend $s=0$ and get the first approximation of $r' \approx r$. This is really the upper bound of $r$. We could technically pretend $s=2^{31}+2^{30}$ but it’s not necessary. Let’s just keep it simple.

```
ri = iroot(n,6)[0]
```

Observe that this $r'$ is also an approximation of $q$ (here a lower bound).

Observe what decreasing $r'$ (i.e. approaching the true $r$) does to our approximate $q' = \lfloor n/r'^5 \rfloor$:

```
sage: print( [n//(ri-k)**5 - n//(ri-k+1)**5 for k in range(10)] )
[5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
```

What does it all mean, Clarice? What does it mean!?

Observe that $n/r'^5 \approx r' + 2$. I.e. the fraction is really close to this integer.

Observe what happens when we ignore the *least significant* $s$-terms: $n = (r^5+s)(r+s) \approx r^6 + sr^5$

Observe! $(r+k)^6 = r^6 + 6kr^5 + \cdots$

I said observe!! $r^6 + sr^5 \approx (r + s/6)^6$

Phew! The error in our $r'$ is actually related to the hidden $s$ by $r' - r = \lfloor s/6 \rfloor$. So let’s get real snug and tight to it.

```
sage: max(((ri-y)^5 * ((ri-y) + 2 + 6*y) - n).roots(RealField(600),False))
5.69553654999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999992804449695479085801566269134729154132243555279318347481e8
```

(The $2$ here is from the observation earlier, where the integer bound happens, as $s \equiv 2 \pmod 6$.)

So trying $r = r' - 569553655$ directly leads to factoring, for example: $q = \lfloor n/(r' - 569553655)^5 \rfloor$.

Who needs stupid Sage scripts.
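The key relation $r' - r = \lfloor s/6 \rfloor$ is easy to verify on synthetic numbers (a sketch; primality is irrelevant to the root estimate, so it’s skipped here):

```python
import random

def iroot6(n):
    # floor of the integer 6th root, by Newton iteration on plain ints
    x = 1 << ((n.bit_length() + 5) // 6)
    while True:
        y = (5 * x + n // x**5) // 6
        if y >= x:
            return x
        x = y

for _ in range(5):
    r = random.getrandbits(128) | (1 << 127)  # 128-bit r, as in the real data
    s = random.getrandbits(64) | (1 << 63)    # 64-bit s
    n = (r**5 + s) * (r + s)                  # p*q, ignoring primality
    assert abs(iroot6(n) - r - s // 6) <= 2   # r' lands floor(s/6) above r
```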

Anyway, the task, yeah. Factoring was the easy job, I just made it unnecessarily
difficult, and factoring doesn’t solve the task yet^{1} because the exponentiation
operation performed looked like this:

```
def lag(k, a, n):
s, t = 2, a
if k == 0:
return 2
r = 0
while k % 2 == 0:
r += 1
k //= 2
B = bin(k)[2:]
for b in B:
if b == '0':
t = (s * t - a) % n
s = (s **2 - 2) % n
else:
s = (s * t - a) % n
t = (t** 2 - 2) % n
for _ in range(r):
s = (s ** 2 - 2) % n
return s
```

And then the encryption is something like:

```
r = getRandomRange(2, n)
enc = (lag(r, a, n), (lag(r, y, n) + lag(e, FLAG, n)) % n)
```

Where `a`, `e`, and `y` are known.

Uh huh. I’m embarrassed to say it took me a good two or three hours of playing
with this operation to figure out that `lag()` calculates Lucas sequences of the
second kind, $V_k(a,1) \pmod n$. The $2$ should have tipped me off, but instead I
found out by reading about Chebyshev polynomials when I finally cowered before
OEIS. I really need to get better at this kind of math rev.
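In other words, `lag(k, a, n)` agrees with the plain two-term recurrence $V_0 = 2$, $V_1 = a$, $V_k = aV_{k-1} - V_{k-2}$. A sketch comparing the task’s ladder against the naive recurrence (re-stating `lag()` so the block is self-contained):

```python
def lag(k, a, n):
    # the task's ladder; it uses the doubling identities
    # V_{2m} = V_m^2 - 2 and V_{2m+1} = V_m * V_{m+1} - a (valid for Q = 1)
    s, t = 2, a
    if k == 0:
        return 2
    r = 0
    while k % 2 == 0:
        r += 1
        k //= 2
    for b in bin(k)[2:]:
        if b == '0':
            t = (s * t - a) % n
            s = (s * s - 2) % n
        else:
            s = (s * t - a) % n
            t = (t * t - 2) % n
    for _ in range(r):
        s = (s * s - 2) % n
    return s

def lucas_v(k, a, n):
    # naive V_k(a, 1) mod n
    v0, v1 = 2, a % n
    for _ in range(k):
        v0, v1 = v1, (a * v1 - v0) % n
    return v0
```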

However the *most* embarrassing thing was that the task calculated “some” `d`
as $1/e \bmod (p^2-1)(q^2-1)$, deriving `y` from it… which I somehow blocked
from my mind. At one point I even calculated this “mysterious” `d` myself, just to
derive my own `y` to double-check the given `y`. It wasn’t until *after* I’d
gone to the trouble of identifying the sequence that I tried plugging this `d` into
`lag()` and found that it did indeed work just like a regular RSA inverse. Jesus
Christ. I don’t even.

Pretty cool overall, even if yet again the “cool” operation turns out not to really matter, but I learned some new tricks about Lucas sequences at least.

Well, it actually does, but I didn’t know that.↩

## Family

This will be more of a programming write-up than a task write-up. I don’t even know what the task involved, since I only got to look at the very last step, which involved a broken PNG file with some bytes switched at seemingly random (but known) positions.

That in itself is not really a hard problem but it was a nice programming exercise.

`poiko` did the actual task but he asked me to look at the last step, which was
more up my alley. This was right before the CTF ended, just in case I already
had some code written for this kind of thing. I do, or could have sworn I did,
but didn’t find it at the time, so all I could try when he asked was to naively
decode the IDAT with `zlib` and then do some simple backtracking based on its
error, which was basically what `poiko` had done semi-manually already.

But this is an incredibly painful process, and would probably take well over an
hour even with mindless concentration. But why do something slowly in an hour,
when you can take even more time than that to write code that will *eventually*
do it quickly?

Next day I went back to find my old code, so I could incorporate it into flagmining in a less ad-hoc form.

So what I had was a stateless bit reader for DEFLATE streams, and on top of that I built a backtracking mechanism:

```
# Basic idea:
#         v--------------v------------v--------------- uncertain bytes
#     v---v-----v-------v--------------------v-------- block/symbol frames in lz
# Stream |---|-XX--|-------|--XX-----------XX-F-|------ ...
#        ^------------------------------- we started reading here
#                                  ^------------ we found a fault here
#  ----- 2 ----------- 1 ----------------------------- we backtrack to 1, then 2, etc.
# We want to rewind to discrete frames in the LZ stream. Whenever we step over
# some uncertain bytes we try to change one of them (starting at highest
# position going down) to its next value. If we've exhausted the possibilities
# for that byte, reset it and forget about it, continuing down to the previous
# uncertain byte and/or LZ frame.
class zlib_fixer(deflate_decoder):
    def __init__(self, cp, cd, *args, **kwargs):
        """This class specifically is for the Family task of ASIS 2021 where the
        zlib stream contains 2-byte uncertainties.

        `cp` is a list of byte-positions where stream bytes are uncertain. `cd` is a
        corresponding list of choices for those bytes. They're assumed to be sorted
        in order of position.
        """
        super().__init__(*args, **kwargs)
        self.stack = []
        self.change_pos = cp
        self.change_data = cd
        self.zlib_header()

    def recurse_fix(self):
        self.stack.append(framestate(self.tell(), len(self.output),
                                     (self.lit_table, self.dist_table, self.final)))
        try:
            return self.read_and_decode()
        except ValueError:
            pass
        self.backtrack()
        return self.recurse_fix()

    def backtrack(self):
        prev_pos = self.tell()
        log.info(f"fault found at {prev_pos//8}:{prev_pos%8}")
        b = bisect.bisect_left(self.change_pos, prev_pos//8)
        while self.stack and b > 0:
            cur_pos = prev_pos
            prev_pos, out_len, tables = self.stack.pop()
            if prev_pos > 8 * self.change_pos[b-1]:
                log.debug(f"backtracking past {prev_pos//8}:{prev_pos%8} (next change at {self.change_pos[b-1]}+2)")
                continue
            b -= 1
            if not self.bump_change(b):
                log.debug(f"backtracking past {prev_pos//8}:{prev_pos%8} (change at {self.change_pos[b]}+2 was exhausted)")
                continue
            self.seek(prev_pos)
            self.output = self.output[:out_len]
            self.lit_table, self.dist_table, self.final = tables
            log.info(f"backtracked to position {prev_pos//8}:{prev_pos%8}")
            return
        self.error("backtrack beyond start! can't recover stream!")

    def bump_change(self, i):
        alts = self.change_data[i]
        self._fobj.seek(self.change_pos[i])
        current = self._fobj.read(2)
        assert current in alts
        self._fobj.seek(self.change_pos[i])
        j = alts.index(current) + 1
        if j >= len(alts):
            self._fobj.write(alts[0])
            return False
        else:
            self._fobj.write(alts[j])
            return True

    def read_and_decode(self):
        ret = super().read_and_decode()
        if not self.output:
            return ret
        line_start = (len(self.output) - 1) // 2251 * 2251
        if self.output[line_start] not in [0, 1, 2, 3, 4]:
            self.error('invalid PNG filter')
        if self.output[line_start] == 0 and set(self.output[line_start+1:]) & {3, 5, 7}:
            self.error('a few random disallowed colors')
        return ret
```

This turned out to work extremely well. It instantly finds many valid zlib streams (valid except for the CRC, of course).

There are about 150 positions in the file where the 2-byte 16-bit value at that position is one of two given options. (Apparently from some non-bijective encryption or encoding; I don’t know the details.) That’s a lot of potential streams overall.

So I added extra safeguards, making it backtrack also when the filter byte
starting each line of the PNG image was invalid. This produces perfectly valid
PNGs (again, save the CRC) but they’re *still* garbled. Noting that only the
first and final colors in the palette were being used, I added even more
assumptions (these a bit more random and ad hoc, like banning 3 colors
entirely). That finally added enough constraints for it to reconstruct the image
perfectly.
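Restating those two checks standalone (the stride 2251 = 1 filter byte + 2250 bytes of row data is specific to this image, and the banned indices {3, 5, 7} were my ad-hoc guess described above):

```python
STRIDE = 2251  # 1 PNG filter byte + row bytes; specific to this task's image

def row_ok(output):
    """Check the scanline currently being written in the decompressed output."""
    start = (len(output) - 1) // STRIDE * STRIDE
    if output[start] not in (0, 1, 2, 3, 4):   # PNG filter byte must be 0..4
        return False
    # ad-hoc: rows may not contain the banned palette indices
    if output[start] == 0 and set(output[start + 1:]) & {3, 5, 7}:
        return False
    return True
```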

# Spiritual

I don’t have the exact text of the task, but you connect to a service and are told $|E(Z_p)|$ for some given $p$. The curve itself is unknown (i.e. you don’t get the parameters $a, b$). The goal is to find out how many points are on the corresponding curve over $Z_{p^k}$ for a given $k$.

I played around in Sage with generating some curves and then looking at the
point count over $Z_{p^k}$ for $k = 1, 2, 3, 4, \ldots$, relating it back
to the base curve $k = 1$. I found that there was definitely a pattern, but I
couldn’t quite pinpoint it^{1}. At this point I turned to Google and found some
paper about point counting that mentioned Lucas sequences and slapped myself. It was
something like^{2}:

$|E(F_{p^k})| = p^k + 1 - V_k(p + 1 - |E(F_p)|,\, p)$
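Since I’m writing this from memory, a brute-force sanity check of the formula on a toy curve ($p = 5$, $y^2 = x^3 + x + 1$, $k = 2$, where $V_2(t, p) = t^2 - 2p$):

```python
from itertools import product

P, A, B = 5, 1, 1                    # toy curve y^2 = x^3 + x + 1 over F_5

def count_Fp():
    # affine points plus the point at infinity
    return 1 + sum((y*y - (x**3 + A*x + B)) % P == 0
                   for x, y in product(range(P), repeat=2))

# F_25 = F_5[w]/(w^2 - 2): elements are pairs (u, v) meaning u + v*w
def mul(s, t):
    (a, b), (c, d) = s, t
    return ((a*c + 2*b*d) % P, (a*d + b*c) % P)

def count_Fp2():
    pts = 1                          # point at infinity
    for xu, xv, yu, yv in product(range(P), repeat=4):
        x, y = (xu, xv), (yu, yv)
        rhs = mul(mul(x, x), x)                          # x^3
        rhs = ((rhs[0] + A*xu + B) % P, (rhs[1] + A*xv) % P)  # + A*x + B
        if mul(y, y) == rhs:
            pts += 1
    return pts

t = P + 1 - count_Fp()               # trace of Frobenius
assert count_Fp2() == P**2 + 1 - (t*t - 2*P)   # V_2(t, p) = t^2 - 2p
```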

Yeah, yeah, even though I solved LagLeg prior to this and *was* able to figure it out there. In one ear, out the other; I don’t have much to say in my defense. Plus I confused myself by looking at direct polynomial relationships and patterns in their factors, without taking away the first and last terms.↩

Modulo any mistakes; I’m on the train without internet, so writing from memory currently.↩

# DamaS

I failed this task, though it looked the most interesting.

For reference the task was set up like so:

```
# unknown
p, q = <random n-bit primes>
A = random_matrix(Zmod(p*q), 11)
f = rand_poly(11, p*q)
# known
N = p*q
e = <random>
B = f(A)
Q = B**pow(e, -1, (p-1)*(q-1))
```

Then `pow(flag, e, N)` is encoded in some weird way into a matrix `S` and you
are given `S*Q^r` and `B^r` for some unknown `r`. I figured the only way forward
was to find `d` or a factorization (equivalent).

I worked up all sorts of delusions involving the characteristic polynomials of the given matrices, thinking that if I could just find the right values to feed into the right polynomials, some kind of magical algebra would take place and out would fall the factorization.

There’s one eigenvalue of $B$ in the base ring $Z_m$ (find it by doing $g(B, Q^e)$), but I was lost as to how I might use it. I wasn’t able to find the corresponding eigenvalue of $Q = B^d$. And the others are likely roots of irreducible polynomials that I couldn’t find either, so I was also lost as to how to approach something like the Jordan normal form.

So many delusions… I desperately hoped there was a way to compose (or
otherwise combine) the characteristic polynomials^{1} for some magical fairy
math where I could factor the polynomials in $Z$. Tried to play around
with the companion matrices (Hermite form) and base changes. Or what if I could
find an `e`-th root of 1, if I just— But maybe— What if—

I gave up when `poiko` finally told me I shouldn’t waste my time and that the
solution was stupid (he was spoiled by looking at others discussing it).
*Apparently* the `e` wasn’t random as outlined in the script, but rather a low
`d` was picked so you could Coppersmith it…? What’s the point…

Edit: I double-checked the above and indeed,

```
d = 1839320038472006359578228121964872958248984913931534334417556559320978533688828921
```

… which would be ≈impossible if the numbers were actually generated by
the given script. Thus I feel this is a repeat of last year’s `congurence` task.
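For reference, the simplest form of the small-`d` idea is plain Wiener via continued fractions (a toy sketch with made-up primes; the actual `d` above is far too large for this and would need Boneh–Durfee/Coppersmith):

```python
from math import isqrt

def convergents(num, den):
    # yields the convergents h/k of the continued fraction of num/den
    h0, h1, k0, k1 = 0, 1, 1, 0
    while den:
        a, (num, den) = num // den, (den, num % den)
        h0, h1 = h1, a*h1 + h0
        k0, k1 = k1, a*k1 + k0
        yield h1, k1

def wiener(e, n):
    # if d < n^(1/4)/3 then k/d shows up among the convergents of e/n,
    # where e*d = 1 + k*phi
    for k, d in convergents(e, n):
        if k == 0 or (e*d - 1) % k:
            continue
        phi = (e*d - 1) // k
        s = n - phi + 1                  # s = p + q if phi is right
        disc = s*s - 4*n
        if disc >= 0 and isqrt(disc)**2 == disc:
            return d
    return None

# toy example with a deliberately tiny d
p, q, d = 1009, 1013, 5
n, phi = p*q, (p-1)*(q-1)
e = pow(d, -1, phi)
assert wiener(e, n) == d
```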

All kinds of stupid party-tricks like $x^{-e} c_B(B)$ and similar, where you’ll suddenly stumble across some surprise that gives a jolt of *I’m on to something here* until the inevitable *never mind, I’m just stupid*.↩

# HITCON 2021

Did all the non-pwn crypto.

One of the few CTFs I could be said to have actually participated in lately. That is, I did the tasks as the CTF was running and inputted the flags like a good boy.

The tasks (crypto) were a bit… I don’t know^{1}, like a five-minute video
stretched to the ten-minute mark, like your fourth-favorite noodle brand, like
showing up at your friend’s house because he said you could have his bookshelf,
but it turns out he didn’t mean the *black* one. I mean it wasn’t bad, just not
great, y’know. That center band where you feel like an asshole if you complain
about it. As far as I remember HITCON focuses more on *trve h4ck3r* stuff, as
evidenced by the fact that several of the crypto tasks were locked behind `pwn`
tasks. I’m grateful for any pure `crypto`, really.

The `misc` tasks were also tied up with *trve hacker* stuff or binary images, and the
only “free” misc looked like web/forensics, so.

Two of the tasks (the *magic* ones) were nearly identical; my solution worked for both, and I’m not sure what the “easier” solution was supposed to be.↩

# a little easy rsa

RSA, but `d = p`. `d` is also small, so you’re seemingly supposed to use some
kind of Coppersmith thing (the flag text indicates this as well), but I think
the author just messed up.
Full buffer for my solve, including the notes I wrote in comments, as I tend to “think by typing” while working out the algebra or Euler hacks (two dead-end thoughts and then the correct one):

```
n = 73105772487291349396254686006336120330504972930577005514215080357374112681944087577351379895224746578654018931799727417401425288595445982938270373091627341969888521509691373957711162258987876168607165960620322264335724067238920761042033944418867358083783317156429326797580005138985469248465425537931352359757
e = 4537482391838140758438394964043410950504913123892269886065999941390882950665896428937682918187777255481111874006714423664290939580653075257588603498124366669194458116324464062487897262881136123858890202346251370203490050314565294751740805575602781718282190046613532413038947173662685728922451632009556797931
c = 14558936777299241791239306943800914301296723857812043136710252309211457210786844069103093229876701608756952780774067174377636161903673229776614350695222134040119114881027349864098519027057618922872932074441000483969146246381640236171500856974180238934543370727793393492372475990330143750179123498797867932379
# e*p = k*phi + 1
# e*p = k*(n-p-q+1) + 1
# k*(q-1) - 1 = 0 (mod p)
# k == 1/(q-1) (mod p)
#
# e*p - 1 = k*phi
# e/n = k/p - k/n - k/p^2 + (k+1)/np
# e/n = (k*p-k)/p^2 - k/n + (k+1)/np
# e*p - 1 == k*phi
# e*n == q (mod phi)
# n == p+q-1 (mod phi)
# n-e*n == p-1 (mod phi)
print(bytes.fromhex(hex(pow(c,n-e*n+1,n))[2:]))
```

Solved in minutes.
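A toy check of the exponent identity from the comments ($n - en + 1 \equiv p \pmod{\phi}$, and $ep \equiv 1 \pmod{\phi}$ since $d = p$), with tiny made-up primes; Python ≥3.8 handles the negative exponent:

```python
# d = p, so e = p^{-1} mod phi; then the exponent n - e*n + 1 == p (mod phi)
# decrypts directly, exactly as in the one-liner above.
p, q = 13, 7
n, phi = p*q, (p-1)*(q-1)
e = pow(p, -1, phi)
m = 40                                  # any message with gcd(m, n) = 1
c = pow(m, e, n)
assert pow(c, n - e*n + 1, n) == m      # negative exponents need Python 3.8+
```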

# so easy rsa

RSA, but $p, q$ are taken as two numbers close together in an LCG sequence $x_i = a x_{i-1} + b \pmod m$. You get $a, b, m$. Generous indeed.

First I thought it would be Coppersmith whatever, since you get some easy relations there, but as I was working out the algebra (again thinking in comments) I found you could find $p$ directly $\pmod{m}$:

```
# p * (a^k p + (a^(k-1)+...+1) b) == n (mod m)
# aa p^2 + bb p == n
# p^2 + bb/aa p == n/aa
# p^2 + 2*Q*p + Q^2 = ... + Q^2
# (p+Q)^2 = ...
```

I.e. an LCG is given directly by $x_i = a^{i+1} s + \frac{a^{i+1}-1}{a-1} b \pmod m$. We may set $s = p$ and pretend that’s the start of the sequence. Then we get $q = a^{k+1} p + \frac{a^{k+1}-1}{a-1} b \pmod m$ for some (very) small $k$. (Since both factors are generated by this LCG, we know $p, q < m$, so it’s fine to work $\pmod m$.)

Now, $n = p \left( a^{k+1} p + \frac{a^{k+1}-1}{a-1} b \right) \pmod m$. Naming these cumbersome coefficients $A$ and $B$ we get $n = A p^2 + B p \pmod m$, and we do some algebra to complete the square: $\frac{n}{A} + \left(\frac{B}{2A}\right)^2 = \left(p + \frac{B}{2A}\right)^2 \pmod m$. Testing a few values of $k$ and using standard modular square roots leads to the factors.

Took about an hour or so.
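The whole thing as a toy run with $k$ fixed to one LCG step (so $q = ap + b \bmod m$) and a made-up prime modulus $m \equiv 3 \pmod 4$, so the modular square root is a single `pow()`:

```python
# Toy run of the derivation above. All numbers are made up for illustration.
m = 1000003                      # prime, m % 4 == 3
a, b = 12345, 6789
p = 4567                         # pretend this is the secret factor
q = (a*p + b) % m                # one LCG step
n = (p * q) % m                  # the task gives n = p*q with p, q < m

A, B = a, b                      # n == A*p^2 + B*p  (mod m)
inv2A = pow(2*A, -1, m)
rhs = (n * pow(A, -1, m) + (B * inv2A)**2) % m    # == (p + B/(2A))^2
r = pow(rhs, (m + 1) // 4, m)    # modular sqrt, valid since m == 3 (mod 4)
cands = {(r - B*inv2A) % m, (-r - B*inv2A) % m}
assert p in cands
```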

# so easy but not rsa

This was my favorite problem, the only one I really liked, even though it was painful to implement. I’ll post the full task code because I thought it was a little clever.

```
from Crypto.Util.number import bytes_to_long as b2l
from Crypto.Util.number import long_to_bytes as l2b
import random

load('secretkey.sage')
Zx.<x> = ZZ[]

def convolution(f,g):
    return (f * g) % (x^n-1)

def balancedmod(f,q):
    g = list(((f[i] + q//2) % q) - q//2 for i in range(n))
    return Zx(g) % (x^n-1)

def randomdpoly(d1, d2):
    result = d1*[1]+d2*[-1]+(n-d1-d2)*[0]
    random.shuffle(result)
    return Zx(result)

def invertmodprime(f,p):
    T = Zx.change_ring(Integers(p)).quotient(x^n-1)
    return Zx(lift(1 / T(f)))

def invertmodpowerof2(f,q):
    assert q.is_power_of(2)
    g = invertmodprime(f,2)
    while True:
        r = balancedmod(convolution(g,f),q)
        if r == 1: return g
        g = balancedmod(convolution(g,2 - r),q)

def keypair():
    while True:
        try:
            f = randomdpoly(61, 60)
            f3 = invertmodprime(f,3)
            fq = invertmodpowerof2(f,q)
            break
        except Exception as e:
            pass
    g = randomdpoly(20, 20)
    publickey = balancedmod(3 * convolution(fq,g),q)
    secretkey = f
    return publickey, secretkey, g

def encode(val):
    poly = 0
    for i in range(n):
        poly += ((val%3)-1) * (x^i)
        val //= 3
    return poly

def encrypt(message, publickey):
    r = randomdpoly(18, 18)
    return balancedmod(convolution(publickey,r) + encode(message), q)

n, q = 263, 128
publickey, _, _ = key  # key = keypair()

flag = b2l(open('flag', 'rb').read())
print(encrypt(flag, publickey))
print(publickey)

# generate a lots of random data
data = [ random.getrandbits(240) for _ in range(200)]
for random_msg in data:
    print(l2b(random_msg).hex(), encrypt(random_msg, publickey))
```

Gawd, it’s *painfully obvious* to me *now*, but it took me a good two hours or
more before I realized…

For context it’s probably important to note that I didn’t know the cryptosystem (NTRU) as I haven’t been exposed to it before, just heard of it. I didn’t actually learn about NTRU until the next day. At first I had my hands full just exploring (in the REPL) what these strange polynomials were and how they behaved to get a feel for it.

Then I moved on to trying to think of ways for recovering the secret key, some
kind of incremental brute force or lattice…? Finally I wanted to incorporate
all this extra data we get, the random encryptions. Only then did it dawn on me
they were absolutely useless as they didn’t involve the secret key, they’d be
equivalent to any data I could generate myself. (If the author wanted to be
*really* sadistic, he’d involve the secret key here somehow, but without any
path to recovery, and I’d probably be stuck for way longer.)

Of course. All the polynomials and whatever is irrelevant, it’s just Mersenne untwisting.

So the attack is to recover the MT state from the 240-bit random numbers you get directly from its output. (Each one discards a 16-bit halfword but it’s easily recovered from future numbers.) Once I knew this I actually put it on hold and did other tasks, because I’d have to go digging to recover bits and pieces of code I’ve written before for MT reversing, as it tends to be very bug-prone.
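The state-recovery step in its generic form (standard MT19937 untempering plus cloning via `setstate`; the task’s extra wrinkle of reconstructing the dropped 16-bit halfwords from the 240-bit outputs isn’t shown):

```python
import random

def _undo_right(y, shift):
    # invert y ^= y >> shift by fixed-point iteration, top bits first
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x

def _undo_left(y, shift, mask):
    # invert y ^= (y << shift) & mask, low bits first
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x & 0xffffffff

def untemper(y):
    # invert MT19937's output tempering, last step first
    y = _undo_right(y, 18)
    y = _undo_left(y, 15, 0xefc60000)
    y = _undo_left(y, 7, 0x9d2c5680)
    return _undo_right(y, 11)

# clone a generator from 624 consecutive 32-bit outputs
rng = random.Random(1234)
state = [untemper(rng.getrandbits(32)) for _ in range(624)]
clone = random.Random()
clone.setstate((3, tuple(state) + (624,), None))
assert [clone.getrandbits(32) for _ in range(10)] == \
       [rng.getrandbits(32) for _ in range(10)]
```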

Coming back to it, though, there’s the interesting concept of reversing the
shuffle in `randomdpoly()`. I’ve considered making problems involving
`random.shuffle()` before, because it is not injective and discards a
nondeterministic amount of information. However I found it was a lot easier than
I had previously thought, especially for just a single large shuffle like this (I
wanted to make a task that involved reversing a lot of small shuffles), since you
can sort of bisect the range of numbers it will consume, and the correct shuffle
was found easily.
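Concretely, what “bisecting the range of numbers it will consume” amounts to: `random.shuffle` burns a data-dependent number of 32-bit words (its `_randbelow` rejection-samples), so with a known input list and a known generator you can just scan candidate word offsets until a replay matches. A toy sketch:

```python
import random

def replay_shuffle(seed, skip, items):
    # re-run the PRNG from a known state, burn `skip` 32-bit words, shuffle
    r = random.Random(seed)
    for _ in range(skip):
        r.getrandbits(32)
    out = list(items)
    r.shuffle(out)
    return out

# simulate: some unknown number of words were consumed before the shuffle
base = list(range(40))
master = random.Random(7)
for _ in range(13):
    master.getrandbits(32)
observed = list(base)
master.shuffle(observed)

# recover the offset by scanning candidate skips
found = [s for s in range(50) if replay_shuffle(7, s, base) == observed]
assert 13 in found
```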

I liked this one, cause it almost had me fooled but I still figured it out, so I got that trace amount of dopamine.

# still not rsa

Same-ish problem as in so easy but not rsa but now it’s time to deal with the actual cryptosystem. See that task for code, though the $n$ parameter is smaller, and now the goal is to decrypt a ciphertext where the AES key is the secret polynomial.

Unfortunately it also had a ton of problems and was updated like three times. It seemed unsolvable when I first looked at it (you didn’t even get the IV for the AES cipher), so I left it for last.

Like I mentioned in that task, I didn’t really know it was NTRU, and I spent
several hours figuring out the basics of how the system worked on my own. It was
the last problem and I didn’t really have anything to do after; I had all the
time in the world, so this was just me playing around, and I’m usually not a fan
of Googling. I found you could overflow the “outer” $\pmod q$ when decrypting,
and the resulting polynomial gave really short polynomials when multiplied with
the secret key. Cue a lot of pen&paper work, and I figured this is related to
monomials that the public key (or really its `g` component) shares with the
secret key: it happens when the coefficient goes outside the $(-q/2, q/2)$
range when two such monomials are summed, but stays within it when this doesn’t
happen.

Although short, these polynomials were still too long to brute force (10-20 terms?). At this
point I also tried several lattice approaches. Large lattices seemed untenable,
although with “narrow” lattices I had some success in recovering parts of `g` if
I could guess some of the zeroed-out coefficients. It seemed very error-prone and
I didn’t feel confident it would work efficiently for the full key. So this
problem took up most of my time and I probably spent a full day on it due to
this stubbornness.

Finally I did end up Googling, and that’s when I found out it was NTRU, and I learned various attacks, such as the one I sort of stumbled into above, but worked out to a much more sophisticated degree. I.e. you can provide a polynomial like $3c(x^{i_a} + x^{i_b} + \cdots + hx^{i_x} + hx^{i_y} + \cdots)$ for decryption and they will all give short polynomials, and it seems the more you layer this the smaller the overlap area — $c$ will likewise have to be adjusted so we’re only “targetting” an area where more monomials overlap — so you end up getting back very short polynomials indeed: monomials or binomials. For this you can work in $Z_p[X]/(X^n - 1)$ and try dividing mono- or binomials by the returned polynomial in a brute-force fashion, and sometimes (rather often) the secret key pops out.

This is what I did.

```
n, q = 167, 128
R, iv, flag_enc, pk = connect()

Zx.<x> = ZZ[]
Zq.<z> = Zmod(q)[]
Z3.<t> = Zmod(3)[]
ZR.<r> = Z3.quotient(t^n - 1)

C = convolution
bq = lambda s: balancedmod(C(s,1),q)
b3 = lambda s: balancedmod(C(s,1),3)

h = bq(pk*43)
poly_targets = [ZR(sum(x^i for i in es)) for es in combos(range(n), [1,2])]

def test(k, send):
    ct = decode( 24*(h*x^k + h + 1) )
    m = send(ct)
    zm = ZR(m)
    if not zm.is_unit():
        print(k, 'skipping non-unit')
        return None
    zm = 1/zm
    for p in poly_targets:
        _f = test_secret(p*zm)
        if _f is not None:
            return _f

def test_secret(ss):
    cc = Counter(ss)
    if cc[1] == 60 and cc[2] == 61:
        ss = -ss
    elif cc[1] != 61 or cc[2] != 60:
        return None
    f = b3(Zx(ss.lift()))
    cc = Counter( balancedmod(C(pk, f), q) )
    if cc[3] != 15 or cc[-3] != 15:
        print("rejected candidate!", cc)
        return None
    return f

for k in range(2, n):
    print(k, "new round")
    if (_sk := test(k, lambda ss: exchange(R, ss))) is not None:
        print(k, "secret key found: ", _sk)
        break
```

# magic rsa

Almost identical to magic dlog, and I only solved *that*
problem. The solution just works on this one too, for free, so it’s like two
flags, one problem. So see that problem.

This problem is supposedly easier(?), but I’m not sure what the easier solution is. Maybe some clever trick I missed, but it just seemed even more of a hassle to do the discrete log under a composite modulus than under a prime with smooth phi.

# magic dlog

A server asks you to input a modulus $p$ (here required to be prime, but relaxed to any number in magic rsa), an exponent $e$, and then a number $x$ such that $x^e = \mathrm{sha384}(x) \pmod p$.

So the obvious approach is to make $p - 1$ smooth so that we can construct $e$ after fixing $x$ (and consequently $\mathrm{sha384}(x)$).

There’s one more caveat, in that the upper bits of $p$ have to match some given
(random) ~136-bit pattern $m$. I’m not sure of any direct construction of smooth
numbers of a form like $2^{248} m + k$, but since $m$ is “small” we can hope
that it itself^{1} is smooth-ish and just multiply it with smooth numbers that are
“almost” $2^{248}$. I chose to find acceptably smooth numbers of the form
$2^k - 1$ (which was easy as these “Mersenne numbers” tend to be rich in
divisors, especially for composite exponents) and then just pick numbers from
this list with exponents that summed to 248. The $m$-factor will need to be
adjusted upward to make the upper bits match. In the end we have a lot of
candidates $p = 1 + (m + \epsilon)(2^{k_0} - 1)(2^{k_1} - 1)\cdots$ and once
we find such a prime we’re all set.

Pick some primitive root $x$, compute $sha384(x)$ and do a discrete log using all the factors of $p−1$.
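That last step, sketched generically (Pohlig–Hellman with brute-force subgroup logs and CRT; toy numbers, and the prime-power factorization of $p - 1$ assumed given):

```python
def dlog_small(g, h, p, bound):
    # brute force within a tiny subgroup
    x, cur = 0, 1
    while x < bound:
        if cur == h:
            return x
        cur = cur * g % p
        x += 1
    raise ValueError("no log found")

def pohlig_hellman(g, h, p, factors):
    # factors = prime-power factorization of p-1, e.g. [(2,2),(3,2),(5,2),(7,1)]
    n = p - 1
    x, M = 0, 1
    for q, e in factors:
        pe = q**e
        xi = dlog_small(pow(g, n//pe, p), pow(h, n//pe, p), p, pe)
        # CRT-combine x == xi (mod pe) into the running solution
        x += M * (((xi - x) * pow(M, -1, pe)) % pe)
        M *= pe
    return x % M

p = 6301                              # prime with smooth p-1 = 2^2 * 3^2 * 5^2 * 7
g, secret = 2, 1234
h = pow(g, secret, p)
x = pohlig_hellman(g, h, p, [(2, 2), (3, 2), (5, 2), (7, 1)])
assert pow(g, x, p) == h
```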

A pretty good Euler hack problem overall, I thought.

Or really the adjusted $m + \epsilon$ that we’ll end up actually using.↩

# SECCON 2021

Seemed like a really good 24-hour CTF. But only about half the problems listed
here were actually solved *during* the CTF.

Unfortunately it had already been running for 12 hours when `poiko` and I signed
up on Saturday evening. Which was unfortunate because there was a ton of the
kind of goodies I like (i.e. misc/crypto/puzzles). So much so that I continued
to solve the remaining ones I had downloaded even after the CTF ended (6am
Sunday morning for me).

- ppp
- ooo
- CCC
- XXX
- qchecker
- cerberus
- hitchhike
- Sign Wars
- s/<script>//gi
- sed programming
- case-insensitive `(FAILED)`

# ppp

Trivial welcome crypto.

```
m = [[p,p,p,p], [0,m1,m1,m1], [0,0,m2,m2], [0,0,0,1]]
# add padding
for i in range(4):
    for j in range(4):
        m[i][j] *= getPrime(768)
m = matrix(Zmod(p*q), m)
c = m^e
print("n =", n)
print("e =", e)
print("c =", list(c))
```

The determinant of the resulting matrix is $(p \cdot m_1 \cdot m_2)^e \pmod{n}$, so `gcd` factors `n`, and then the multiplicative
order of 4×4 matrices over $Z_p$ I think always divides $\prod_i (p^{i+1} - 1)$ (TODO: look it up), and so with that the approach is the same as if it was
regular RSA.
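The gcd step in miniature (made-up tiny parameters, and a 2×2 upper-triangular matrix instead of 4×4, since only $\det(M^e) = \det(M)^e$ matters):

```python
from math import gcd

# The padded matrix stays triangular, so its determinant keeps a factor p;
# raising to e preserves that mod n, and gcd(det, n) spits out p.
p, q, e = 10007, 10009, 65537
n = p * q

def matmul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(2)) % n for j in range(2)]
            for i in range(2)]

def matpow(A, k):
    R = [[1, 0], [0, 1]]
    while k:
        if k & 1:
            R = matmul(R, A)
        A = matmul(A, A)
        k >>= 1
    return R

M = [[p*12345 % n, 6789],
     [0,           424242]]
C = matpow(M, e)
det = (C[0][0]*C[1][1] - C[0][1]*C[1][0]) % n
assert gcd(det, n) == p
```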

# Sign Wars

The most painful task unless you have a lot of ready-made code for it. And in a 24-hour CTF this packed with tasks, it’s either a tooling-check or a number-of-team-members-check.

Task is basically:

```
from secret import msg1, msg2, flag

# P-384 curve
G = ...
order = ...

flag = pad(flag, 96)
flag1 = flag[:48]
flag2 = flag[48:]

for b in msg1:
    assert b >= 0x20 and b <= 0x7f
z1 = bytes_to_long(msg1)
assert z1 < 2^128

for b in msg2:
    assert b >= 0x20 and b <= 0x7f
z2 = bytes_to_long(msg2)
assert z2 < 2^384

# prequel trilogy
def sign_prequel():
    d = bytes_to_long(flag1)
    sigs = []
    for _ in range(80):
        # normal ECDSA. all bits of k are unknown.
        k1 = random.getrandbits(128)
        k2 = z1
        k3 = random.getrandbits(128)
        k = (k3 << 256) + (k2 << 128) + k1
        kG = k*G
        r, _ = kG.xy()
        r = Z_n(r)
        k = Z_n(k)
        s = (z1 + r*d) / k
        sigs.append((r,s))
    return sigs

# original trilogy
def sign_original():
    d = bytes_to_long(flag2)
    sigs = []
    for _ in range(3):
        # normal ECDSA
        k = random.getrandbits(384)
        kG = k*G
        r, _ = kG.xy()
        r = Z_n(r)
        k = Z_n(k)
        s = (z2 + r*d) / k
        sigs.append((r,s))
    return sigs

sigs1 = sign_prequel()
print(sigs1)
sigs2 = sign_original()
print(sigs2)
```

We get 80 signatures with some fuckery going on with the nonce, which smells of lattices already.

I did actually attempt this problem when I woke up in the wee morning before the
CTF ended. I “woke up with the solution,” as sometimes happens, but it turns out
this is far from infallible… The dreams had corrupted the problem in my mind,
because I fully believed we got 80 samples of both types of signatures, and that
the problem was some sort of Mersenne Twister thing where we could align a lot
of samples just-so to induce a bias in the bits of the `k`s (approximating
$Z$-linearity with something $F_2[X]$-linear, somehow, magically), and
“there’s our lattice…” Unfortunately the dream didn’t include the details,
just the idea. “But man,” I thought, “this seems like a really tough problem,
doesn’t MT’s output tempering kind of fuck us? How on Earth… Is it
Google-the-paper?” But once I realized my mistake, that everything I thought
was wrong and I had wasted good and precious time on being an idiot, I
was like ah fuck me, and panic-switched to other, less time-intensive problems.

Ahh. But *now*, dear hearts, without the time pressure, we can remain calm and
collected and *pretend to be as clever as we want*. We can use words like “trivial”
and “easy” to mask our crippling insecurity and self-doubt, because the only
thing that is certain is our own stupidity and failings^{1}.

Anyway, back to this *easy* problem. *So trivial*, pah, hardly worth the words I
waste on it.

There’s two twists on the standard partially-revealed-nonce setup:

1. we don’t know the “hash” (`z1`). This is fine though, because it’s given to be small, so we can just set it up as an extra row in our lattice.
2. it’s not known-prefix nor known-suffix, but *known-middle-bits* (kinda), which, from my experience, is a real pain in the ass for lattice solves, because now we kind of want to do some Coppersmith polynomial shit instead, and ugh.

*However* — here’s the clever trick — because the order of the group is close
to a power of two, we can *overflow* the nonce and still end up with a “small”
number.

```
>>> (random.getrandbits(128)*2**384 % order).bit_length()
318
```

Now, *there’s* our lattice.

I ended up with something like this:

```
def matrix_for(sigs, scale=(1,1,1)):
    B = Z_n(2)^128
    N = matrix.identity(len(sigs)) * order
    S = matrix(ZZ, [
        [B*(1-B*s)/Z_n(s),
         B*r/Z_n(s),]
        for r,s in sigs]).T
    V = matrix.diagonal([2^scale[2], 2^scale[1]])
    Z = matrix.zero(len(sigs), 2)
    return matrix.block([
        [S*2^scale[0], V],
        [N*2^scale[0], Z],
    ], subdivide=False)

scale_d = 0
scale_k = 384-318
scale_z = 384-128
M = matrix_for(sp, (scale_k, scale_d, scale_z))
```

And it works and we get…

```
>>> any_to_bytes(v[-1])
b'SECCON{New_STARWARS_Spin-Off_The_Book_Of_Boba_Fe'
```

… *half* the flag? Ah fuck, right, I forgot.

So *now* it’s actually an MT reverse problem, but it’s an easy one and just a
tooling-check, because we recover the `k`s from the first signature set, which is
2 more than needed for Python’s MT’s full state. There are no tricks or gotchas,
just plug-and-play:

```
b'tt_Will_Premiere_On_December_29-107c360aab}\x05\x05\x05\x05\x05'
```

Half of you, dear non-existent reader, will have no idea what I’m talking about; you also have no idea how much I envy you.↩

# ooo

Cute:

```
message = b""
for _ in range(128):
    message += b"o" if random.getrandbits(1) == 1 else b"O"
M = getPrime(len(message) * 5)
S = bytes_to_long(message) % M
print("M =", M)
print('S =', S)
print('MESSAGE =', message.upper().decode("utf-8"))
```

The goal is to recover the message with original case, which is a subset-sum problem $\pmod M$ that I solved using a lattice, though there are likely other ways to do it that are less sledgehammery.

```
# S from server
S -= any_to_int(b'O'*128)
C = matrix([[-S % M] + [-1]*128])
S = matrix([[32 * 2^(i*8) % M] for i in range(128)])
MOD = matrix([[M] + [0]*128])
M = C.stack(S.augment(2 * matrix.identity(ZZ, 128))).stack(MOD).LLL()
# find row vector with `v[0] == 0 and set(v[1:]) == {-1,1}`
```

Something like that; I’m translating it into Sage from memory, because I solved it in the REPL using my own matrix-ergonomics stuff.
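One of those less sledgehammery ways, on a toy 16-character version: lowercasing byte $i$ adds exactly `0x20 * 256**i`, so it’s a subset-sum you can meet-in-the-middle (modulus made up, chosen big enough that nothing wraps):

```python
from itertools import product

# Toy version: S = bytes_to_long(msg) % M, each byte 'O' or 'o' ('O' | 0x20).
M = 2**127 - 1                       # made-up modulus, large enough to not wrap
n_ch = 16
w = [0x20 * 256**i % M for i in range(n_ch)]      # weight of byte i (from the end)
base = int.from_bytes(b'O' * n_ch, 'big') % M

secret = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]   # 1 = lowercase
S = (base + sum(b*wi for b, wi in zip(secret, w))) % M

# meet in the middle: precompute all left-half sums, scan right halves
half = n_ch // 2
left = {sum(b*wi for b, wi in zip(bits, w[:half])) % M: bits
        for bits in product((0, 1), repeat=half)}
rec = None
for bits in product((0, 1), repeat=half):
    t = (S - base - sum(b*wi for b, wi in zip(bits, w[half:]))) % M
    if t in left:
        rec = list(left[t]) + list(bits)
        break
assert rec == secret
```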

# ccc

Yet another problem of RSA with weirdly generated primes.

```
def create_prime(p_bit_len, add_bit_len, a):
    p = getPrime(p_bit_len)
    p_bit_len2 = 2*p_bit_len // 3 + add_bit_len
    while True:
        b = getRandomInteger(p_bit_len2)
        _p = a * p
        q = _p**2 + 3*_p*b + 3*b**2
        if isPrime(q):
            return p, q
```

Here, `p_bit_len=1024`, `add_bit_len=9`, and `a=23` (so `p_bit_len2 = 691`),
though I think these parameters are pretty arbitrary.

This problem I solved pretty much just rambling algebra as comments in the “solve script” (that ended up just empty save for the comments since again I solved it in REPL).

```
# q = _p**2 + 3*_p*b + 3*b**2
# P = 23p
# P*q = P^3 + 3P^2 b + 3P b^2
# P*q + b^3 = (P+b)^3
# 23n + b^3 == (P+b)^3
```

So, we have $23n + b^3 = (23p + b)^3$, and, by intuition, because $b \sim 2^{691} \ll 2^{1024} \sim p$, we see $\sqrt[3]{23n}$ will be *extremely* close
to $23p + b$. How close? Well, if we set

$\sqrt[3]{x^3 + b^3} = x + \epsilon$

$x^3 + b^3 = x^3 + 3x^2\epsilon + 3x\epsilon^2 + \epsilon^3$

$b^3 = 3x^2\epsilon + 3x\epsilon^2 + \epsilon^3$

Under the assumption $\epsilon^2 \ll x$,

$b^3 \approx 3x^2\epsilon + O(x)$

$\frac{b^3}{3x^2} \approx \epsilon + O(1/x)$

and from what we know about the parameters, $\epsilon$ is expected to be less than $3 \cdot 691 - 2 \cdot (1024 + \log_2 23) \lesssim 16$ bits, plus/minus whatever errors I’ve made in this napkin math. But either way: small.

So we find the first cube, `first_st(lambda k: is_cube((est+k)**3 - 23*n))`, which gives us $b$, and then we factor and done.
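The search in full, on made-up toy numbers with the same shape (my own `icbrt` and cube check standing in for `first_st`/`is_cube`):

```python
def icbrt(n):
    # integer cube root (floor) via Newton iteration plus a final adjust
    x = 1 << ((n.bit_length() + 2) // 3)
    while True:
        y = (2*x + n // (x*x)) // 3
        if y >= x:
            break
        x = y
    while x**3 > n: x -= 1
    while (x+1)**3 <= n: x += 1
    return x

# toy numbers with the task's shape (a = 23, b small relative to p)
a, p, b = 23, 1009, 57
q = (a*p)**2 + 3*a*p*b + 3*b*b
n = p * q                            # so a*n + b^3 == (a*p + b)^3

est = icbrt(a * n)                   # extremely close to a*p + b
for k in range(100):
    c = (est + k)**3 - a*n
    if c > 0 and icbrt(c)**3 == c:   # found the cube: c == b^3
        b_rec = icbrt(c)
        p_rec = (est + k - b_rec) // a   # since est + k == a*p + b
        break
assert (b_rec, p_rec) == (b, p)
```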

# xxx

We get a prime `p` with bit-length approximately 2.5 times that of the flag, and 6
$(a_i, b_i)$ pairs such that $(flag, y_i)$ is a point on the
elliptic curve $y^2 = x^3 + a_i x + b_i \pmod p$.

The crux and main thing to notice is that the $y_i$ coordinates for the flag-point are suspiciously generated as random numbers less than the flag itself, and thus “small” compared to the modulus.

So, probably a lattice problem. Subtracting pairs of the elliptic curve equations, we get a bunch of equations like

$y_i^2 - y_j^2 = (a_i - a_j)\,\mathrm{flag} + (b_i - b_j) \pmod p$

or,

$(a_i - a_j)\,\mathrm{flag} + (b_i - b_j) = Y_{ij} \pmod p$

where $Y_{ij} < p^{4/5}$, but we don’t worry our pretty little heads about bounds; we just plug it in and pray that it’s enough to recover the flag, and indeed it is, and we’re done.

Padding-oracle problem for, uhh, AES in “propagating cipher block chaining” (PCBC) mode. (Who comes up with these anyway?)

Server provides us with encrypted flag:

```
# The flag is padded with 16 bytes prefix
# flag = padding (16 bytes) + "SECCON{..."
ref_iv, ref_c = encrypt(flag)
print("I teach you a spell! repeat after me!")
print(base64.b64encode(ref_iv + ref_c).decode("utf-8"))
```

We can attempt to decrypt stuff of the form `ref_c || <arbitrary>` and we’ll get
a message saying whether the padding is correct or not. The server uses a random key and we
don’t get any other encrypted samples.

Playing around with this (unfamiliar) cipher, we observe what the Wikipedia page mentions, namely that swapping the order of two blocks doesn’t affect the subsequent blocks. But actually it’s more than that:

```
>>> dec(en[:3] + en[:3][::-1] + en[:3]) # we can reverse
[b'firstfirstfirstf', b'secondsecondseco', b'thirdthirdthirdt', b'/\xf7\x1cu\x8b\x90\xefi\xde\xac\x91\xa2\x9c\x85\xaa\xa9', b'x>\xcb,\x80\xf4\xa6\x0b\xbb\x8d\x8e\xcbO\xce\tW', b'\x95K!\xce-,\xc0\xc6h\xf7R\x96\xfcN\xdc\x91', b'firstfirstfirstf', b'secondsecondseco', b'thirdthirdthirdt']
>>> dec(en[:5] + rot(en[:5], 2) + en[:5]) # we can rotate...
[b'firstfirstfirstf', b'secondsecondseco', b'thirdthirdthirdt', b'carrotscarrotsca', b'applesapplesappl', b'\x91[\xe1\xf4\xaf#\x060\xf8\xb9iu\x1d#\xfc\x8e', b'\x86R\xfa\xf4\xa4#\x1d:\xeb\xafor\x00"\xfb\x9b', b'\x84C\xf8\xea\xae\x0f)\xfa\xb1xn\x15!\xe8\x96', b'\x9e\x10\x89\x8d\xc3\xbc\x15\xa8\xb0\x15\xb29\xc0\xe5\xb6\xa9', b'\x8b\x1c\x98\x91\xd9\xbe\x0f\xbf\xa0\x0e\xba4\xc1\xf3\xa1\xa0', b'firstfirstfirstf', b'secondsecondseco', b'thirdthirdthirdt', b'carrotscarrotsca', b'applesapplesappl']
>>> dec(en + shuffle(en) + en) # oh, any shuffle
[b'firstfirstfirstf', b'secondsecondseco', b'thirdthirdthirdt', b'carrotscarrotsca', b'applesapplesappl', b'orangesorangesor', b'\x8fS\x9f(L\xc68\x15\xf2X\xd2\xd9\x8eo\x16\xb5', b"l8\x93f'\xe5\xf3(\x017\x03x\x97\x80\x05\x91", b'\xd0\x85l\xf9\tQ\x08b6<\xec\xb3\x03%@\xbb', b'\xc8\xdb\xe8.\xaa#\xbf\x1fO\x9e7\x10m\x8e\xd4u', b'\xac\xb46e\x00:\x07\x06Nq\r\xa0.\xc0\x91', b'\x95K!\xce-,\xc0\xc6h\xf7R\x96\xfcN\xdc\x91', b'firstfirstfirstf', b'secondsecondseco', b'thirdthirdthirdt', b'carrotscarrotsca', b'applesapplesappl', b'orangesorangesor']
```
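To see why any permutation of a ciphertext prefix leaves the rest intact: the chaining value after block $i$ is $P_i \oplus C_i = D(C_i) \oplus C_i \oplus \mathrm{prev}$, so a run of blocks contributes $\bigoplus_i (D(C_i) \oplus C_i)$, which is order-independent. A toy replication with an affine map standing in for AES (all key material made up):

```python
# PCBC chaining with a toy invertible block "cipher" (affine map on 128-bit
# ints), just to replicate the shuffle observation above.
BS = 16
MASK = (1 << 128) - 1
A, B = 0x1337_5555_aaaa_0001, 0x42       # made-up key material, A odd
A_INV = pow(A, -1, 1 << 128)

def E(b): return ((int.from_bytes(b, 'big') * A + B) & MASK).to_bytes(BS, 'big')
def D(b): return (((int.from_bytes(b, 'big') - B) * A_INV) & MASK).to_bytes(BS, 'big')
def x(a, b): return bytes(i ^ j for i, j in zip(a, b))

def pcbc_enc(blocks, iv):
    out, prev = [], iv                   # prev = P_{i-1} xor C_{i-1}
    for p in blocks:
        c = E(x(p, prev))
        out.append(c)
        prev = x(p, c)
    return out

def pcbc_dec(blocks, iv):
    out, prev = [], iv
    for c in blocks:
        p = x(D(c), prev)
        out.append(p)
        prev = x(p, c)
    return out

iv = bytes(16)
pt = [bytes([i]) * BS for i in range(6)]
ct = pcbc_enc(pt, iv)
# swap ciphertext blocks 1 and 2: the chaining value after them is unchanged
garbled = [ct[0], ct[2], ct[1]] + ct[3:]
assert pcbc_dec(garbled, iv)[3:] == pt[3:]
```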

With this task I tried a new DSL-like approach rather than confusing myself with endless xor-algebra on paper. Doing block cipher algebra on paper always breaks my brain. I was tired and this way I don’t have to think, alright…

```
>>> rev_dec(en + shuffle(en[:3]) + en[:2])
'pp(0,) pp(1,) pp(2,) pp(3,) pp(4,) pp(5,) pp(0, 5)^en(5,) pp(1, 5)^en(5,) pp(2, 5)^en(5,) pp(0, 2, 5)^en(2, 5) pp(1, 2, 5)^en(2, 5)'
>>> rev_dec(en + en)
'pp(0,) pp(1,) pp(2,) pp(3,) pp(4,) pp(5,) pp(0, 5)^en(5,) pp(1, 5)^en(5,) pp(2, 5)^en(5,) pp(3, 5)^en(5,) pp(4, 5)^en(5,) en(5,)'
```
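The `rev_dec` helper isn’t shown here, but the core trick can be sketched like this (an assumed reconstruction, not the original code): represent each unknown block as a set of symbols and implement XOR as symmetric difference, so equal terms cancel automatically and the algebra does itself.

```python
# Sketch of symbolic XOR algebra: values are frozensets of symbol names,
# XOR is symmetric difference, so x ^ x vanishes with no thinking required.
def sym(*names):
    return frozenset(names)

def xor(a, b):
    return a ^ b  # symmetric difference: shared symbols cancel in pairs

pp0, en5 = sym('pp(0,)'), sym('en(5,)')
# Suppose decrypting block 0 in some position yields pp(0, 5) ^ en(5,):
mixed = xor(sym('pp(0, 5)'), en5)
# XORing en(5,) back in cancels it, leaving the plaintext symbol alone.
assert xor(mixed, en5) == sym('pp(0, 5)')
assert xor(pp0, pp0) == frozenset()
```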

And with this we find a way to set things up so we can use the oracle:

```
>>> rev_dec(en + [byte(1)*16, byte(1)*16])
'pp(0,) pp(1,) pp(2,) pp(3,) pp(4,) pp(5,) ??? 1^pp(5,)^en(5,)'
>>> rev_dec(en + en + en[:4] + [xor_bytes(en[3], byte(1)*16), xor_bytes(en[3], byte(1)*16)])
'pp(0,) pp(1,) pp(2,) pp(3,) pp(4,) pp(5,) pp(0, 5)^en(5,) pp(1, 5)^en(5,) pp(2, 5)^en(5,) pp(3, 5)^en(5,) pp(4, 5)^en(5,) en(5,) pp(0,) pp(1,) pp(2,) pp(3,) ??? 1^pp(3,)'
```

So we can do it byte-by-byte and select arbitrary blocks. That’s a lot of requests, and maybe there’s a better way to do it, but eh…

# s/<script>//gi

We get a big file like

```
SE<<s<SCRIpT>crIPT>S<Sc<SCRIPT>rIpt>Cr<S<SCriPt<sCRipt>>C<ScRIpt>RiPT>Ipt<scrI<<
SCrIpt>scriPt<ScRi<sCriPt>pt><SCripT>>p<script>T>>CCON{sani<sCRIpT><<ScrIP<s<scR
I<SCripT>pT>cri<ScRIpT>pt>T><ScRip<<SCrIpT><sCriPt>Sc<ScRiPT><Script>ri<ScRIpT>P
t><s<sCrIPT>CrIpT>T>sCRIPt><Sc<ScrIpt>r<sCriPt>Ipt<sCRipt>><Scr<s<ScRiPt>CrIpt<S
<scrIPT<sCRiPt>><SCrIPt>c<sC<<SCRipt>ScriPt>RIPt>ri<ScRIpT>pt<SCRIpT>>><ScRipT<s
CriPT><Sc<scR<scRIpt>IPT><scRIpT><ScRIpt><SCript>RI<ScrI<SCrIP<ScrIpT>T>Pt>Pt>>I
<ScRIPt>PT<ScRi<sCriPt><SCrIpT>pt>><s<scRipt>CriPt>tiz<S<sCRip<S<sCRIpT><ScrIp<s
CrIPt>T>CrI<sCRiPT>Pt><scRipt>t><sc<<scRipt>sC<s<ScrIpT>cRIpT><sCRipt>r<ScRIpT>I
<Scr<ScRIPt>i<SCRIpt><ScR<s<SCRipt>CrIpt>iPT>P<scRIpT>t>pT<sC<SCripT>RiPT>><scri
pT>RipT<sCRI<scriPt>pT>><<scRIPt<SCRIPT>>ScR<scrIPT>iP<s<ScRiPT><Sc<sC<ScriPT><s
...
```

and after replacing however many millions of `<script>` tags, only the flag should remain.

Did this one under severe time pressure, which of course means you try the worst solutions first. Tried a regex which replaced two layers of nested strings at the same time. It starts out great, but quickly falls into quicksand because it seems there are around 16 or so (?) huge “towers” of nested strings (like a degenerate tree where each node only has one child).

So from there I moved on to the *second worst* solution, where I took the resulting string with these “towers” and sliced them off at the base simply by string-scanning forward over `<>` characters, not caring about anything else, reasoning that surely they could not be part of the flag.

It *worked* and got the flag, but it was a shitty ad-hoc solution.

The *right* solution (I realize now) would have been to just write a simple, straightforward loop that keeps a stack of strings at each level. I had discarded this idea from the get-go, because I misread the size of the file (~64MB) as ~640MB, assuming that a Python loop iterating per character was totally out of the question.
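A sketch of that loop (not my contest code): keep an output buffer and pop eight characters whenever the tail spells a tag; characters freed up by a deletion automatically get a chance to form a new tag on the next iteration.

```python
def strip_scripts(s: str) -> str:
    """Repeatedly delete case-insensitive '<script>' tags in one O(n) pass."""
    out = []
    for c in s:
        out.append(c)
        # If the last 8 characters form a tag, drop them; whatever was
        # underneath may now combine with later input to form a new tag.
        if c == '>' and ''.join(out[-8:]).lower() == '<script>':
            del out[-8:]
    return ''.join(out)

# A nested "tower" collapses from the inside out:
assert strip_scripts('<scr<scr<script>ipt>ipt>X') == 'X'
```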

# hitchhike

A really cute Python escape:

```
#!/usr/bin/env python3.9
import os

def f(x):
    print(f'value 1: {repr(x)}')
    v = input('value 2: ')
    if len(v) > 8: return
    return eval(f'{x} * {v}', {}, {})

if __name__ == '__main__':
    print("+---------------------------------------------------+")
    print("|   The Answer to the Ultimate Question of Life,    |")
    print("|     the Universe, and Everything is 42            |")
    print("+---------------------------------------------------+")
    for x in [6, 6.6, '666', [6666], {b'6':6666}]:
        if f(x) != 42:
            print("Something is fundamentally wrong with your universe.")
            exit(1)
        else:
            print("Correct!")
    print("Congrats! Here is your flag:")
    print(os.getenv("FLAG", "FAKECON{try it on remote}"))
```

Of course they’re all trivial (e.g. `"0 or 42"`) except the last one, because what can you really put in so that `dict() * <anything>` doesn’t raise an exception?
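For the record, example payloads for the four easy cases (a sketch with my own guesses; any expression of at most 8 characters that forces the result to 42 works):

```python
# Hypothetical payloads for the easy cases; the server evals f'{x} * {v}'.
cases = [
    (6, '7'),            # 6 * 7
    (6.6, '0+42'),       # 6.6 * 0 + 42 == 42.0 == 42
    ('666', '0 or 42'),  # note f'{x}' drops the quotes: 666 * 0 or 42
    ([6666], '0 or 42'), # [6666] * 0 is [], falsy, so `or` yields 42
]
for x, v in cases:
    assert len(v) <= 8
    assert eval(f'{x} * {v}', {}, {}) == 42
```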

I thought: hmmmmmmm.

I tried some combos, thinking maybe there was a type that had an `__rmul__` which handled `dict()`s. `Counter()`? (Then again, how would I import it?) Sets, the (obscure) complex `j`, etc. Nope.

Again I thought: hmmmmmmm. It has to be something syntactical, or something that skips the evaluation entirely.

`0 if 1 else 42` is too long!

`eval(input())` is too long!

`__impor...`, nah, forget it!

I stared at the available functions in `__builtins__` and thought: hmmmmmmm.

I know `exit` is an object, does it provide some kind of trick that— no. Hey, what about the other non-function function, `help()`, does it… Oh!

Indeed, the `help()` mini-shell invokes the external programs `less` or `more`, which can then be used to execute shell commands with `!`…

# qchecker

We’re given a fun Ruby quine program that self-modifies based on its argument:^{1}

```
[franksh@moso qchecker]$ ruby qchecker.rb
eval$uate=%w(a=%(eval$uate=%w(#{$uate})*"");Bftjarzs=b=->a{a.split(?+).map{|b|b.to_i(36)}};c=b["awyiv4fjfkuu2pkv+awyiv4f
v ut 71 6g 3j +a x
c e5e4pxrogszr3+5i0o mfd5dm9xf9q7+axce5 e4khrz21ypr+5htqqi 9iasvmjri7+axcc76i 03zrn7gu7+cbt4 m8 xybr3cb27+1ge6 s
n jex10w3si9+1k8vdb4 fzcys2yo0"];d,e,f, g,h,i=b["0+0+zeexa xq012eg+k2htkr1ola j6+3cbp5mnkzll t3 +2qpvamo605t7j "
] ;(j=eval(?A<<82<<7 1<<86)[0])&&d==0&& (e+=1;k=2**64;l=-> (a,b){(a-j.ord)*25 6.pow(b-2,b)%b }; f=l[f,k+13];g= l
[ g, k+ 37];h=l[h,k+51];i= l[i,k+81];j==?}&&( d=e==32&&f+g+h +i ==0?2:1);a.sub !
(/"0.*?"/,'"0'+[d ,e ,f,g,h,i].map{|x|x .to_s(36)}*?+<<34) );srand(f);k=b["7a cw+jsjm+46d84" ]; l=d==2?7:6;m=[ ?
#*(l*20)<<10]*11* "" ;l.times{|a|b=d==0 &&e!=0?rand(4):0;9 .times{|e|9.times{ |f|(c[k[d]/10* *a %10]>>(e*9+f)& 1
)!=0&&(g=f;h=e;b. ti mes{g,h=h,8-g};t=( h*l+l+a)*20+h+g*2+ 2;m[t]=m[t+1]=""<< 32)}}};a.sub!( /B .*?=/,"B=");n= m
. co un t( ?# )- a.length;a.sub !
("B=","B#{(1..n).map{(rand(26)+97).chr}*""}=");o=0;m.length.times{|b|m[b]==?#&&o<a.length&&(m[b]=a[o];o+=1)};puts(m))*""
[franksh@moso qchecker]$
```

```
[franksh@moso qchecker]$ ruby qchecker.rb > out.rb; for c in `echo "SECCON{AAAAA}" | fold -w1`; do ruby - $c < out.rb > tmp.rb; mv tmp.rb out.rb; done; cat out.rb
eval$uate=%w(a=%(eval$uate=%w(#{$uate})*"");Bygzwgnlnjkmwzhugrpnmdvlcwpmqlebkawvjklvmkmkc=b=->a{a.split(?+).map{|b|b.to_
i (36)}} ;c=b[" aw yi v4 fj fkuu2pkv+awyiv4fvut71
6 g3j+ax ce5e4p xr ogszr3+5i0omfd 5d m9xf9q7+axce5e 4k hrz21ypr+5htqq i9 iasvmjri7+axcc76i03zrn7gu7+cbt4m8xybr
3 cb27+1 ge6snj ex 10w3si9+1k8vdb 4f zcys2yo0"];d,e ,f ,g,h,i=b["01+d +m 6177zx5cmtf+1mdtba3ieal9d+2v6gou7jwyt
c+2 uf h2 68 232n wq"];(j=eval(? A< <82<<71<<86)[0 ]) &&d==0&&(e+=1; k= 2**64;l=->(a,b){(a-j.ord)*256.pow(b-2
,b) %b }; f= l[f, k+ 13];g=l[g,k+37 ]; h=l[h,k+51];i= l[ i,k+81]; j==?}&&(d=e==32&&f+g+
h+i == 0? 2: 1);a .sub!(/" 0.*?"/,' "0'+[d,e,f,g,h ,i ].map{|x|x.to_ s( 36)}*?+<<34)); srand(f);k=b["7acw+js
jm+46 d84"]; l=d==2 ?7:6;m=[?# *(l*20 )<<10]*11*"";l .t imes{|a|b=d==0 && e!=0?rand(4):0 ;9.times{|e|9.times{|
f|(c[ k[d]/1 0**a%1 0]>>(e*9+f)& 1)!= 0&&(g=f;h=e;b. ti mes{g,h=h,8-g} ;t =(h*l+l+a)*20+ h+ g*2+ 2;m[ t]=m[
t+1]= ""<<32 )}}};a .sub!(/B.*?=/, "B =" );n=m.count(?# )- a. leng th;a .sub!
("B=","B#{(1..n).map{(rand(26)+97).chr}*""}=");o=0;m.length.times{|b|m[b]==?#&&o<a.length&&(m[b]=a[o];o+=1)};puts(m))*""
[franksh@moso qchecker]$
```

Cool!

Caveat: I’ve never written even a hello world in Ruby^{2}, and I solved this task simply by deconstructing the program “naively,” so this is going to be a bit sketchy.

My emacs buffer ended up looking like this:

```
# a=%(eval$uate=%w(#{$uate})*"");
# Bftjarzs=b=->a{a.split(?+).map{|b|b.to_i(36)}};
```

Apparently the Ruby syntax for lambdas is `->args{code}`, *and* apparently you don’t call lambdas with `v(x)` but rather `v.call(x)`, but, again apparently, `v[x]` is a short-hand trick, even though `[]` is usually used for lookups like in Python. So there’s this `b` lambda that splits a string on `+` and then interprets the fragments as base-36. Useful.
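In Python terms (my translation), `b` is just:

```python
# Python equivalent of the Ruby helper b = ->a{a.split(?+).map{|b|b.to_i(36)}}
def b(s):
    return [int(x, 36) for x in s.split('+')]

assert b('0+a') == [0, 10]   # base-36 digits: '0' -> 0, 'a' -> 10
```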

`%(...)` is some weird Ruby syntax for literal strings (with `%w(...)` for word arrays?), `?<char>` is an escaped literal char, and `a*""` joins an array of strings (perhaps `* :: [T] -> T -> T` for some monoid `T`, working like intersperse?). I mean, I could definitely get into Ruby, this is good stuff.

```
# c=b["awyiv4fjfkuu2pkv+awyiv4fvut716g3j+axce5e4pxrogszr3+5i0omfd5dm9xf9q7+axce5e4khrz21ypr+5htqqi9iasvmjri7+axcc76i03zrn7gu7+cbt4m8xybr3cb27+1ge6snjex10w3si9+1k8vdb4fzcys2yo0"];
# 2413138514168077294502911 0x1ff0080402010080403ff b'\x01\xff\x00\x80@ \x10\x08\x04\x03\xff'
# 2413138514203124227638271 0x1ff0080403ff0080403ff b'\x01\xff\x00\x80@?\xf0\x08\x04\x03\xff'
# 2415504318135715148071935 0x1ff80c0603e10080403ff b'\x01\xff\x80\xc0`>\x10\x08\x04\x03\xff'
# 1216023231471466527982591 0x10180c06030180c0603ff b'\x01\x01\x80\xc0`0\x18\x0c\x06\x03\xff'
# 2415504318120356412261375 0x1ff80c06030180c0603ff b'\x01\xff\x80\xc0`0\x18\x0c\x06\x03\xff'
# 1214839173222390695330815 0x1014090443ff80c0603ff b'\x01\x01@\x90D?\xf8\x0c\x06\x03\xff'
# 2415495076716156996027391 0x1ff8040201ff0080403ff b'\x01\xff\x80@ \x1f\xf0\x08\x04\x03\xff'
# 75705726472931768279551 0x100804020100804021ff b'\x10\x08\x04\x02\x01\x00\x80@!\xff'
# 321749341105789098926865 0x442211154aa554462311 b'D"\x11\x15J\xa5TF#\x11'
# 345406059408174499233792 0x49248000000000000000 b'I$\x80\x00\x00\x00\x00\x00\x00\x00'
```

Here I found a big such “array” of raw data, which I pretty-printed; it looks to me like maybe it’s used for constructing the ASCII-art output…

```
# d,e,f,g,h,i=b["0+0+zeexaxq012eg+k2htkr1olaj6+3cbp5mnkzllt3+2qpvamo605t7j"];
# d = 0 0x0 b''
# e = 0 0x0 b''
# f = 4659461645708163688 0x40a9bbae0cfdfa68 b'h\xfa\xfd\x0c\xae\xbb\xa9@'
# g = 2641556351334323346 0x24a8b18187759492 b'\x92\x94u\x87\x81\xb1\xa8$'
# h = 15837377083725718695 0xdbc9aa8c316224a7 b'\xa7$b1\x8c\xaa\xc9\xdb'
# i = 12993509283917003551 0xb45237d9e59cfb1f b'\x1f\xfb\x9c\xe5\xd97R\xb4'
(j=eval(?A<<82<<71<<86)[0])
&& d==0 && (
e+=1;
k=2**64;
l=->(a,b){(a-j.ord)*256.pow(b-2,b)%b};
f=l[f,k+13];
g=l[g,k+37];
h=l[h,k+51];
i=l[i,k+81];
j==?} && (d=e==32&&f+g+h+i==0?2:1);
a.sub!(/"0.*?"/,'"0'+[d,e,f,g,h,i].map{|x|x.to_s(36)}*?+<<34)
);
```

Now wait just a second, this is way more suspicious. `pow`, `%`-mod, `2**64+13` is a prime I recognize, etc. So yeah, this is likely to be the logic.

But following the calculations through, we have a bunch of steps $a_{i,j+1} = (a_{i,j} - c_j) \cdot 256^{-1} \pmod{p_i}$, with $i = 1 \ldots 4$ and $c_j$ being the ordinal value of our input characters. And the goal seems to be $\sum_i a_{i,n} = 0$, and since they’re non-negative (Ruby’s `%` operator is *sane*) it means each sequence should end at 0.
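Concretely, one step and its inverse (a sketch with made-up values, using the one modulus the code makes clear is prime; the Ruby `256.pow(b-2, b)` is the Fermat inverse of 256):

```python
p = 2**64 + 13                 # prime modulus from the Ruby code
inv256 = pow(256, p - 2, p)    # Fermat inverse of 256 mod p

def fwd(a, c):
    # One step of the checker: a' = (a - c) / 256 (mod p)
    return (a - c) * inv256 % p

def back(a, c):
    # Undoing a step: a = 256*a' + c (mod p)
    return (256 * a + c) % p

a, c = 123456789, ord('S')     # made-up state and input character
assert back(fwd(a, c), c) == a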

From this it’s reasonable to assume that the starting values are the flag itself, and indeed the combined modulus is big enough to reconstruct it with CRT:

```
>>> crt(vals, [2**64 + v for v in [13,37,51,81]]).bytes()
b'}!!!3n1uQ_ru0y_3t1rw_5t3L{NOCCE'
```

And there we go.
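`crt` here is from my own library; for reference, a hand-rolled sketch of the same reconstruction (assuming the four moduli are pairwise coprime):

```python
from math import prod

def crt(residues, moduli):
    """Chinese Remainder Theorem, assuming pairwise-coprime moduli."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # pow(., -1, m): modular inverse (3.8+)
    return x % M

# Sanity check on small numbers: recover 23 from its residues.
assert crt([23 % 5, 23 % 7, 23 % 9], [5, 7, 9]) == 23
```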

The rest isn’t that interesting, but I guess it constructs the ASCII-art output:

```
srand(f);k=b["7acw+jsjm+46d84"];
l=d==2?7:6;
m=[?#*(l*20)<<10]*11*"";
l.times{
|a|
b=d==0
&& e!=0?rand(4):0;
9.times{|e|9.times{|f|(c[k[d]/10**a%10]>>(e*9+f)&1)!=0&&(g=f;h=e;b.times{g,h=h,8-g};t=(h*l+l+a)*20+h+g*2+2;m[t]=m[t+1]=""<<32)}}
};
a.sub!(/B.*?=/,"B=");
n=m.count(?#)-a.length;
a.sub!("B=","B#{(1..n).map{(rand(26)+97).chr}*""}=");
o=0;
m.length.times{
|b|m[b]==?#
&&
o<a.length&& (m[b]=a[o]; o+=1)
};
puts(m)
```

# sed programming

There’s a big sed script that consists of a list of regular expressions (actually just simple string replacements) that are tried in order. When one succeeds, it loops back and starts over. There are two stop-states (replacements that exit the loop).

It looks like this:

```
#!/bin/sed -f
# Check flag format
# Some characters are used internally
/^SECCON{[02-9A-HJ-Z_a-km-z]*}/!{
cINVALID FORMAT
b
}
:t
s/1Illl11IlIl1/1IlIl11Illl1/;tt
s/1Illl11III1/1III11Illl1/;tt
s/1Ill11IlIl1/1IlIl11Ill1/;tt
s/1Illl11l1/1l11Illl1/;tt
s/1Ill11IIII1/1IIII11Ill1/;tt
s/1Ill11III1/1III11Ill1/;tt
s/1Ill11IIll1/1IIll11Ill1/;tt
s/1Illl11IIll1/1IIll11Illl1/;tt
s/1Illl11IIII1/1IIII11Illl1/;tt
s/1Ill11l1/1l11Ill1/;tt
s/G1II1/GR1II1/;tt
s/1II11IIll11IIll11IIll11IIll11IlIl11IlIl11IlIl11II11IIll1/1II11IIll11IIll11IIll11IIll11IlIl11IlIl11IIll11II1/;tt
s/1II11IIll11IlIl11IlIl11IIll11IlIl11II11IlIl1/1IIIl1/;tt
s/1II11IIll11IIll11IIll11IlIl11IIll11IlIl11IIll11II11IlIl1/1IIIl1/;tt
s/1II11IIll11IlIl11IlIl11IlIl11IlIl11IlIl11IIll11II11IIll1/1II11IIll11IlIl11IlIl11IlIl11IlIl11IIll11IlIl11II1/;tt
s/1II11IIll11IlIl11IIll11IIll11IlIl11IlIl11IlIl11IlIl11II11IlIl1/1II11IIll11IlIl11IIll11IIll11IlIl11IlIl11IlIl11IIll11II1/;tt
s/1II11IIll11IlIl11IIll11IlIl11IlIl11II11IIll1/1II11IIll11IlIl11IIll11IlIl11IIll11II1/;tt
s/1II11IIll11IlIl11IlIl11IlIl11IIll11IIll11IIll11IIll11II11IIll1/1IIIl1/;tt
# ... + tons more ...
```

As with qchecker I did this one naively and manually, even though it clearly invites the construction of state machines that can be more easily manipulated. There have been a number of similar CTF tasks before, too, and I’ve written bits and pieces, but it’s such an ad-hoc mess that there’s little hope of salvaging anything. It would probably take me as much time to collect it as it would to just “brute force” it like it was a puzzle. Besides, it’s fun to work “with my hands,” as it were.

First thing is of course to clean it up so we have something to look at. I’m simply replacing string-chunks with my best guess for their role in the underlying state machine. For example it looks like `1` is a separator of some kind, repeating at a regular interval in many strings. From the stop-states I found a string that likely represents an error-state, and there are tons of places where two symbol-blocks seem to go hand-in-hand like they are binary…

```
|Illl|:0: /:0:|Illl|
|Illl||III| /|III||Illl|
|Ill|:0: /:0:|Ill|
|Illl||l| /|l||Illl|
|Ill||IIII| /|IIII||Ill|
|Ill||III| /|III||Ill|
|Ill|:1: /:1:|Ill|
|Illl|:1: /:1:|Illl|
|Illl||IIII| /|IIII||Illl|
|Ill||l| /|l||Ill|
G+ / GR+
+:1::1::1::1::0::0::0:+:1: / +:1::1::1::1::0::0::1:+
+:1::0::0::1::0:+:0: / <NO>
+:1::1::1::0::1::0::1:+:0: / <NO>
+:1::0::0::0::0::0::1:+:1: / +:1::0::0::0::0::1::0:+
+:1::0::1::1::0::0::0::0:+:0:/ +:1::0::1::1::0::0::0::1:+
+:1::0::1::0::0:+:1: / +:1::0::1::0::1:+
+:1::0::0::0::1::1::1::1:+:1:/ <NO>
+:1::0::1::1::1::1::1::1:+:0:/ <NO>
+:1::0::1::1::1::1::0:+:0: / <NO>
+:1::0::0::1::1:+:1: / <NO>
+:1::0::1::1::1::0::0:+:0: / <NO>
W<NO> /WR<NO>
+:1::0::0::1::1::1::1:+:1: / +:1::0::1::0::0::0::0:+
+:1::1::0::0::1::0::0:+:1: / <NO>
+:1::0::1::1::0::0::1::0:+:0:/ <NO>
+:1::0::1::0::0::0::1::1:+:0:/ <NO>
+:1::1::0::1::0::1::0:+:1: / <NO>
+:1::0::1::1::0::0::1:+:0: / <NO>
+:1::1::1::1::1::0:+:1: / <NO>
+:1::0::0::1::0::1::0::0:+:0:/ <NO>
+:1::0::0::1::1::0::0::1:+:1:/ <NO>
+:1::1::1::0::1:+:1: / +:1::1::1::1::0:+
+:1::0::1::0::0::0::0:+:0: / <NO>
+:1::1::0::1::1::0:+:0: / +:1::1::0::1::1::1:+
+:1::0::1:+:0: / <NO>
+:1::0::1::1::0::1::1::1:+:0:/ +:1::0::1::1::1::0::0::0:+
+:1::0::1::1::0::1::0::0:+:1:/ <NO>
+:1::1::0::1::1::1:+:1: / +:1::1::1::0::0::0:+
+:1::0::1::1::1::0:+:0: / +:1::0::1::1::1::1:+
+:1::1::1::0::0::0::0:+:1: / +:1::1::1::0::0::0::1:+
+:1::1::1::1:+:1: / <NO>
+:1::0::0::1::1::1:+:1: / <NO>
# ...<snip>...
...
```

This (the start of the script) seems to be some count-and-check logic, i.e. it counts the binary characters while breaking off with an error if it encounters a given binary digit at a given position. Basically it compares some binary string to a solution?

Further down there are also *two other* sets of blocks that looked to me like binary digits, though here with just transfer-logic between them.

I wrote a quick Python script to simply dump the state while replacing certain blocks with symbols to watch what was happening visually.
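The core of such a simulator just mirrors sed’s semantics: try each replacement in order, and on the first success branch back to the top. A minimal sketch (the rules here are hypothetical `(pattern, replacement)` pairs, not the real parsed script):

```python
def run_sed(rules, state, max_steps=100_000):
    """Simulate a sed `:t ... s/a/b/;tt` loop: the first successful
    substitution restarts the scan from the top label."""
    for _ in range(max_steps):
        for pat, rep in rules:
            new = state.replace(pat, rep, 1)  # s/// replaces first match
            if new != state:
                state = new
                break  # tt: branch back to :t
        else:
            return state  # no rule matched: halt
    raise RuntimeError('did not halt')

# Toy example: bubble all the a's past the b's.
assert run_sed([('ab', 'ba')], 'aabb') == 'bbaa'
```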

First, apparently the input is translated to binary, e.g. `45678` eventually ends up as its binary string with some initial state:
```
step 37: §§∅§:I::IIlI::IllI:∅∅§∅∅∅∅∅§∅§∅∅∅∅§∅∅§∅∅∅∅§§∅§∅∅∅§∅∅:l:
```

Then the “main loop” seems to start. Each iteration seems to transfer the starting string bit by bit to the other side of the `:l:` separator symbol, but manipulated in some way, and now rendered in new symbols:
```
step 372: §§∅∅:IIlI::IllI:§∅∅∅∅§∅∅§∅∅∅∅§§∅§∅:Illl:∅∅§∅∅:l:¶¶¶°°°¶¶°
step 373: §§∅∅:IIlI::IllI:§∅∅∅∅§∅∅§∅∅∅∅§§∅§∅∅:Illl:∅§∅∅:l:¶¶¶°°°¶¶°
step 374: §§∅∅:IIlI::IllI:§∅∅∅∅§∅∅§∅∅∅∅§§∅§∅∅∅:Illl:§∅∅:l:¶¶¶°°°¶¶°
step 375: §§∅∅:IIlI::IllI:§∅∅∅∅§∅∅§∅∅∅∅§§∅§∅∅∅§:Illl:∅∅:l:¶¶¶°°°¶¶°
step 376: §§∅∅:IIlI::IllI:§∅∅∅∅§∅∅§∅∅∅∅§§∅§∅∅∅§∅:Illl:∅:l:¶¶¶°°°¶¶°
step 377: §§∅∅:IIlI::IllI:§∅∅∅∅§∅∅§∅∅∅∅§§∅§∅∅∅§∅∅:Illl::l:¶¶¶°°°¶¶°
step 378: §§∅∅:IIlI::IllI:§∅∅∅∅§∅∅§∅∅∅∅§§∅§∅∅∅§∅∅:l::Illl:¶¶¶°°°¶¶°
step 379: §§∅∅:IIlI::IllI:§∅∅∅∅§∅∅§∅∅∅∅§§∅§∅∅∅§∅∅:l:¶:Illl:¶¶°°°¶¶°
step 380: §§∅∅:IIlI::IllI:§∅∅∅∅§∅∅§∅∅∅∅§§∅§∅∅∅§∅∅:l:¶¶:Illl:¶°°°¶¶°
step 381: §§∅∅:IIlI::IllI:§∅∅∅∅§∅∅§∅∅∅∅§§∅§∅∅∅§∅∅:l:¶¶¶:Illl:°°°¶¶°
step 382: §§∅∅:IIlI::IllI:§∅∅∅∅§∅∅§∅∅∅∅§§∅§∅∅∅§∅∅:l:¶¶¶°:Illl:°°¶¶°
step 383: §§∅∅:IIlI::IllI:§∅∅∅∅§∅∅§∅∅∅∅§§∅§∅∅∅§∅∅:l:¶¶¶°°:Illl:°¶¶°
step 384: §§∅∅:IIlI::IllI:§∅∅∅∅§∅∅§∅∅∅∅§§∅§∅∅∅§∅∅:l:¶¶¶°°°:Illl:¶¶°
step 385: §§∅∅:IIlI::IllI:§∅∅∅∅§∅∅§∅∅∅∅§§∅§∅∅∅§∅∅:l:¶¶¶°°°¶:Illl:¶°
step 386: §§∅∅:IIlI::IllI:§∅∅∅∅§∅∅§∅∅∅∅§§∅§∅∅∅§∅∅:l:¶¶¶°°°¶¶:Illl:°
step 387: §§∅∅:IIlI::IllI:§∅∅∅∅§∅∅§∅∅∅∅§§∅§∅∅∅§∅∅:l:¶¶¶°°°¶¶°:Illl:
step 388: §§∅∅:IIlI::IllI:§∅∅∅∅§∅∅§∅∅∅∅§§∅§∅∅∅§∅∅:l:¶¶¶°°°¶¶°¶
step 389: §§∅∅:IIlI::IllI::Illl:∅∅∅∅§∅∅§∅∅∅∅§§∅§∅∅∅§∅∅:l:¶¶¶°°°¶¶°¶
step 390: §§∅∅:IIlI::IllI:∅:Illl:∅∅∅§∅∅§∅∅∅∅§§∅§∅∅∅§∅∅:l:¶¶¶°°°¶¶°¶
step 391: §§∅∅:IIlI::IllI:∅∅:Illl:∅∅§∅∅§∅∅∅∅§§∅§∅∅∅§∅∅:l:¶¶¶°°°¶¶°¶
step 392: §§∅∅:IIlI::IllI:∅∅∅:Illl:∅§∅∅§∅∅∅∅§§∅§∅∅∅§∅∅:l:¶¶¶°°°¶¶°¶
```

(Notice also that the counter on the far left has decreased by 1 now; it does this every full cycle of the “main loop,” though its symbols change to something weird at the end. The original count is accurate though: 13 iterations.) Once the original binary string is gone, it translates the new binary string into the alphabet of the old one:

```
step 1157: §§∅∅:IIlI::IlI:¶¶¶°°°¶¶°¶¶°°¶¶¶¶¶¶°°¶°°°¶¶°¶¶¶°
step 1158: §§∅∅:IIlI:§:IlI:¶¶°°°¶¶°¶¶°°¶¶¶¶¶¶°°¶°°°¶¶°¶¶¶°
step 1159: §§∅∅:IIlI:§§:IlI:¶°°°¶¶°¶¶°°¶¶¶¶¶¶°°¶°°°¶¶°¶¶¶°
step 1160: §§∅∅:IIlI:§§§:IlI:°°°¶¶°¶¶°°¶¶¶¶¶¶°°¶°°°¶¶°¶¶¶°
step 1161: §§∅∅:IIlI:§§§∅:IlI:°°¶¶°¶¶°°¶¶¶¶¶¶°°¶°°°¶¶°¶¶¶°
step 1162: §§∅∅:IIlI:§§§∅∅:IlI:°¶¶°¶¶°°¶¶¶¶¶¶°°¶°°°¶¶°¶¶¶°
step 1163: §§∅∅:IIlI:§§§∅∅∅:IlI:¶¶°¶¶°°¶¶¶¶¶¶°°¶°°°¶¶°¶¶¶°
step 1164: §§∅∅:IIlI:§§§∅∅∅§:IlI:¶°¶¶°°¶¶¶¶¶¶°°¶°°°¶¶°¶¶¶°
step 1165: §§∅∅:IIlI:§§§∅∅∅§§:IlI:°¶¶°°¶¶¶¶¶¶°°¶°°°¶¶°¶¶¶°
step 1166: §§∅∅:IIlI:§§§∅∅∅§§∅:IlI:¶¶°°¶¶¶¶¶¶°°¶°°°¶¶°¶¶¶°
step 1167: §§∅∅:IIlI:§§§∅∅∅§§∅§:IlI:¶°°¶¶¶¶¶¶°°¶°°°¶¶°¶¶¶°
step 1168: §§∅∅:IIlI:§§§∅∅∅§§∅§§:IlI:°°¶¶¶¶¶¶°°¶°°°¶¶°¶¶¶°
step 1169: §§∅∅:IIlI:§§§∅∅∅§§∅§§∅:IlI:°¶¶¶¶¶¶°°¶°°°¶¶°¶¶¶°
step 1170: §§∅∅:IIlI:§§§∅∅∅§§∅§§∅∅:IlI:¶¶¶¶¶¶°°¶°°°¶¶°¶¶¶°
step 1171: §§∅∅:IIlI:§§§∅∅∅§§∅§§∅∅§:IlI:¶¶¶¶¶°°¶°°°¶¶°¶¶¶°
```

Then the loop repeats, doing the same sort of translation. The final string after these iterations is the binary string that is compared in the “error checker” we saw in the beginning.

I also save and print out the values of the binary strings after each “pass” over the strings:

```
1094861636: 0 :: 41424344: 0 :: b'ABCD':b'' []
1094861636: 0 :: 41424344: 0 :: b'ABCD':b'' []
1094861636: 0 :: 41424344: 0 :: b'ABCD':b'' [12]
1094861636: 0 :: 41424344: 0 :: b'ABCD':b'' [12]
21119812: 0 :: 1424344: 0 :: b'\x01BCD':b'' [12]
...
4: 29983161 :: 4: 1c981b9 :: b'\x04':b'\x01\xc9\x81\xb9' [12]
4: 59966322 :: 4: 3930372 :: b'\x04':b'\x03\x93\x03r' [12]
4: 119932644 :: 4: 72606e4 :: b'\x04':b'\x07&\x06\xe4' [12]
0: 239865288 :: 0: e4c0dc8 :: b'':b'\x0eL\r\xc8' [12]
3815236718: 0 :: e367e46e: 0 :: b'\xe3g\xe4n':b'' []
3815236718: 1 :: e367e46e: 1 :: b'\xe3g\xe4n':b'\x01' [12]
1667753070: 2 :: 6367e46e: 2 :: b'cg\xe4n':b'\x02' [12]
1667753070: 2 :: 6367e46e: 2 :: b'cg\xe4n':b'\x02' [6, 1]
...
```

Knowing the values must have a pretty simple relation, I just stare at them really hard until some unknown neurons finally get it and inform me the loop does `x[i+1] = (x[i] << 1 ^ x[i] ^ x[i] >> 1) % 256**L`. This repeats 13 times, then the result is checked by the count-and-check thing I described above, so, reversing, we get `SECCON{mARkOV_4Lg0Ri7hM}`, which, looking it up, is the name for this kind of construction I guess?
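As a sanity check of that relation (a sketch with made-up values; `L` is the length in bytes), here’s the forward map plus a dumb brute-force inversion for a tiny `L`:

```python
def step(x, L):
    # One full pass of the machine: x' = (x << 1) ^ x ^ (x >> 1) mod 256**L
    return (x << 1 ^ x ^ x >> 1) % 256**L

def unstep(y, L):
    # Brute-force inverse; only feasible for tiny L, but enough to show
    # the map is invertible. The real reversal needs to be smarter.
    return next(x for x in range(256**L) if step(x, L) == y)

x0 = 0x4142  # two made-up bytes
y = x0
for _ in range(13):
    y = step(y, 2)
for _ in range(13):
    y = unstep(y, 2)
assert y == x0  # 13 rounds forward, 13 rounds back
```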

# case-insensitive

This was the one problem I downloaded but failed, but it’s probably worth mentioning:

```
from flag import flag
import signal
import bcrypt

def check_and_upper(message):
    if len(message) > 24:
        return None
    message = message.upper()
    for c in message:
        c = ord(c)
        if ord("A") > c or c > ord("Z"):
            return None
    return message

signal.alarm(600)

while True:
    mode = input(
        """1. sign
2. verify
mode: """
    ).strip()
    ## sign mode ##
    if mode == "1":
        message = check_and_upper(input("message: "))  # case insensitive
        if message == None:
            print("invalid")
            continue
        salt = bcrypt.gensalt(5)
        print("mac:", bcrypt.hashpw((message + flag).encode(), salt).decode("utf-8"))
    ## verify mode ##
    else:
        mac = input("mac: ")
        message = check_and_upper(input("message: "))  # case insensitive
        if message is None:
            print("invalid")
            continue
        print("result:", bcrypt.checkpw((message + flag).encode(), mac.encode()))
```

Some kind of library bug? It doesn’t seem to support `$2$` or anything like that. Some unicode fuckery with the `mac` we give? No, it bans `\x00`s, it bans `>127` characters… At least the package I installed does. (I tried to do this one offline, so I don’t know what was used on the server, and I saw there are a lot of other less-popular `bcrypt` implementations, but I didn’t have the energy to check all of them, and I doubt that’s the solution?)

Surely it can’t be a timing thing? (We can set the rounds, but how does that help?) I can’t help but think the code is *really* suspicious because it’s “childishly” simple (the `check_and_upper()`), like it’s trying really hard to look innocuous. But I just ended up scratching my head and I’m out of energy. I’d appreciate it if anyone gives me a hint for some later attempt.

Edit: `poiko` got the solution and gave me a pretty heavy hint about looking into the length of `message.upper()`, and indeed:

```
>>> [(ord(c), c, len(c.upper()), mapl(ord, c.upper())) for c in map(chr, range(0x110000)) if len(c.upper()) > 2]
[(912, 'ΐ', 3, [921, 776, 769]), (944, 'ΰ', 3, [933, 776, 769]), (8018, 'ὒ', 3, [933, 787, 768]), (8020, 'ὔ', 3, [933, 787, 769]), (8022, 'ὖ', 3, [933, 787, 834]), (8119, 'ᾷ', 3, [913, 834, 921]), (8135, 'ῇ', 3, [919, 834, 921]), (8146, 'ῒ', 3, [921, 776, 768]), (8147, 'ΐ', 3, [921, 776, 769]), (8151, 'ῗ', 3, [921, 776, 834]), (8162, 'ῢ', 3, [933, 776, 768]), (8163, 'ΰ', 3, [933, 776, 769]), (8167, 'ῧ', 3, [933, 776, 834]), (8183, 'ῷ', 3, [937, 834, 921]), (64259, 'ﬃ', 3, [70, 70, 73]), (64260, 'ﬄ', 3, [70, 70, 76])]
```

This solves the task because `bcrypt` does a quiet `string = string[:72]` truncation on its side, so the flag can be solved byte-by-byte.
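The length change is easy to verify; the ligatures are the exploitable ones here, since their uppercase forms survive the A–Z filter:

```python
# Characters whose upper() is longer than the original.
assert len('ß'.upper()) == 2     # 'SS'
assert 'ﬃ'.upper() == 'FFI'      # U+FB03 expands to three A-Z letters
assert 'ﬄ'.upper() == 'FFL'      # U+FB04 likewise
assert len('ΐ'.upper()) == 3     # expands too, but to non-A-Z code points
# So 24 input characters can become up to 72 after .upper(), pushing
# flag bytes past bcrypt's 72-byte cutoff one at a time.
```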

I had no idea about `upper()`/`lower()` changing the length (it only happens with *one* character for lowercase), though 20-20 hindsight says I’m dumb for not checking it, because:

- I was *already* suspicious of the “style” of the code, i.e. it strongly hints there’s some subtle trickery going on, like in this case the `.upper()` happening after the length check, and
- I did see the truncation in `bcrypt`, so I knew that *if* the string could be made longer the task would be solved…

# [TODO] flagmining

`flagmining` is the name of a *batteries-not-only-included-but-shoved-up-your-nose* personal library I’ve developed for working on the CTF tasks I enjoy. In particular it focuses on the kind of problems that would usually be tagged with `crypto`, `misc`, `math`, or `ppc`.

To give an idea, all my ad-hoc scripts tend to start with `from flagmining.all import *`, which translates into the following (at the time of writing):

```
# Common stuff from the Python standard library that I use all the time. Why not
# just become Julia with a billion names in global namespace? At least we don't
# have 1-indexing...
from pathlib import Path
from functools import reduce, singledispatch, lru_cache
from itertools import count, chain, count, starmap, product, zip_longest, combinations
from dataclasses import dataclass
from secrets import token_bytes, token_hex, token_urlsafe
import secrets
from base64 import b64decode, b64encode
from collections import Counter, defaultdict, deque, namedtuple, abc
import logging
from logging import warning, info, debug, error
import math
from math import prod # Not in gmpy2.
import time
import datetime as dt
import re
import os
from os import urandom
import sys
import ast
import zlib
import random
from random import getrandbits, randrange
import operator as op
import hashlib
import json
import pickle
from timeit import timeit
# NUMPY (or something similar) SHOULD 100% BE IN THE PYTHON STANDARD LIBRARY.
import numpy as np
# I hate languages rolling their own bigints. Stop, please. GMP is just so far
# ahead of everything else.
import gmpy2
# Some other useful third-party libraries that could be in the kitchensink-like
# Python standard library.
import requests
from PIL import Image
from tqdm import tqdm, trange # So useful.
# import Crypto
# Sane numpy defaults.
np.set_printoptions(suppress=True, edgeitems=30, threshold=5000, linewidth=400)
# XXX: how to deal with this.
np.seterr(over='ignore')
from .monkey_patch import * # Black arts.
from .utils import * # Très important.
from .jsdict import * # The only compliment I'll ever pay JavaScript.
# from .bytes import * # now in utils.
# from .iterators import * # now in utils.
# from .compression import * # MISSING.
# from .automata import * # TODO: isolate from ~/misc
from .bits import * # Bits are the atoms of our universe.
from .state import * # Simple state for standalone scripts.
# The real juice.
from .euler import *
from .primes import *
# Various.
from .subst import * # substitution boxes and permutations.
from .xor import *
from .digits import *
from .sbox import *
from .oracles import *
from .text import *
from .groups import *
from .code import * # TODO: move to utils?
from .time import * # Time utilities.
from .numpy import * # Numpy extras.
from .lcg import * # Linear congruential generator.
from .aes import *
from .chacha import *
from .rc4 import *
from .rsa import *
# from .ec import * # TODO: move from sage scripts after flagrs
from .pbyte import * # Probabilistic bytes.
# Import the following to automatically run pdb debugger on exception:
#
# import flagmining.debug
```

In the beginning, there was chaos. Everything was ad-hoc — ugly scrawlings on the walls of insane asylums — one-off scripts left behind in some random directory, code lost in the REPL history. There was `~/misc`, before I hardly knew what CTFs were, when `poiko` was just feeding me odd problems over chat, later to become `~/misc/ctf`. It grew to hundreds of files — tasks, problem data, `.tgz`s, `solve22.py`, `solveeee.py`, `solve222221.py`, `solqwfqwf.py` — before I even started making directories. Then there was `~/tmp/ctf` and `~/ctf`, depending on my involvement with actually playing myself. I had to `rg` every time I saw a problem I knew I had solved before, praying that I’d named the functions something sane (unlikely). Half the time I just solve tasks directly in the Python REPL anyway, so I had to also make sure to search that history…

I’m not the most organized. But I am chaotically trying to get better. This is one such effort, attempting to extract common functionality, ergonomics, utilities, and so forth from my problem solving efforts.

I’m far from done, but here I plan to document parts of it, so that it might be useful as open source one day.

# Programming Puzzles & Competitions

My main source of mental exercise these days.

Simple & introductory problems are at Advent of Code 2021.

Completed all the CryptoHack problems.

Project Euler was mostly used during my Haskell days.

# Advent of Code 2021

I’ll be solving these in Python.

Update after day 3: seems I grossly overestimated the difficulty of these problems. I expected perhaps some algorithms or thinking, as I believe I recall such problems from the past AoCs. But these problems so far are just “problems of expression.” I.e. they’re more exercises in finding the correct language idioms to use or something like that. Perhaps better for people just starting out to learn the basics of programming?

Update after day 8: OK, I take that back. The problems do get a little bit more involved, there’s a little math here and there, and day 7 I (sort of) failed the first time I attempted it, so I have no reason to act cocky.

Some common prelude code I put in `aoc-common.py` to allow me to download the website data remotely. I opted out of having it also post the answer back.

```
import requests
from pathlib import Path

class aoc:
    def __init__(self, cookie_file='cookie.txt'):
        self.sess = requests.Session()
        for line in (Path(__file__).parent / cookie_file).open('rt'):
            if '=' in line:
                name, val = line.strip().split('=', 1)
                self.sess.cookies[name] = val
        self._cache = {}

    def input_text(self, day, year=2021):
        url = f'https://adventofcode.com/{year}/day/{day}/input'
        if url not in self._cache:
            self._cache[url] = self.sess.get(url)
        return self._cache[url].text
```

# Sonar Sweep

## Part 1

Simplified problem statement:

- Given a sequence of integers, count how many times there’s an increase in value between two adjacent numbers.

Basically count the number of times `a[i+1] > a[i]`.

The basic idea is to just iterate over pairs (the adjacent values) and filter or count the ones that satisfy the requirement. I’ll resist the usual temptation of trivializing everything with `numpy` tricks.

Pairs of values in Python can be given by `zip(lst, lst[1:])`, because `zip()` stops at the shortest of the iterators. One could also use `tee()` or similar to avoid making a list copy.
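The `tee()` variant, for the record (this is essentially the classic `pairwise` recipe, available as `itertools.pairwise` in Python 3.10+):

```python
from itertools import tee

def pairwise(it):
    # Two copies of the iterator, the second advanced by one step.
    a, b = tee(it)
    next(b, None)
    return zip(a, b)

assert list(pairwise([1, 3, 2])) == [(1, 3), (3, 2)]
```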

For the filter I prefer a comprehension like `sum(x < y for x,y in pairs)` or `sum(1 for x,y in pairs if x < y)`. A more militantly functional way to apply the operator across the pairs would be `starmap(op.lt, pairs)`.

```
aoc = __import__('aoc-common').aoc()
depths = [int(x) for x in aoc.input_text(1).split()]
increases = lambda lst: sum([x < y for x,y in zip(lst, lst[1:])])
aoc.print_answer(1, 1, increases(depths))
```

## Part 2

- Given a sequence of integers `a[i]`, consider the transformed sequence `b[i] = a[i]+a[i+1]+a[i+2]` (sum of every three contiguous elements), and count the increases of this new sequence as in *Part 1*.

A generalized solution that would be purely `O(n)` no matter the length of the sums would be something like this:

```
from collections import deque

def contiguous_sums(k, it):
    history = deque()
    val = 0
    for a in it:
        val += a
        history.append(a)
        if len(history) > k:
            val -= history.popleft()
        if len(history) == k:
            yield val

sums_of_L3 = contiguous_sums(3, depths)
```

But since `k=3` is fixed, the `O(n*k)` solution where we just hardcode the window works fine, too:

```
aoc.print_answer(1, 2, increases([x+y+z for x,y,z in zip(depths, depths[1:], depths[2:])]))
```

# Dive

This problem doesn’t lend itself very well to a one-line description, but basically:

- Compute some simple values (e.g. sums) from a sequence of text lines of the form `<command> <int>`.

Both part 1 and part 2 are like this; I don’t really see a difference.

The simplest approach, I think, is either to do `cmd, n = line.split(' ', 1)` over the lines, or to use a regular expression to iterate over matches directly. I went for the latter because it’s more flexible, and probably just in general a better approach for this kind of problem in Python.

```
import re

inp = __import__('aoc-common').aoc().input_text(2)
expr = re.compile(r"^(?P<dir>forward|down|up) (?P<n>\d+)$", re.M)

h, d = 0, 0
for m in expr.finditer(inp):
    n = int(m['n'])
    if m['dir'] == 'forward':
        h += n
    else:
        d += -n if m['dir'] == 'up' else n
print(f'answer for part 1: {h*d}')

aim, h, d = 0, 0, 0
for m in expr.finditer(inp):
    n = int(m['n'])
    if m['dir'] == 'forward':
        h += n
        d += aim * n
    else:
        aim += -n if m['dir'] == 'up' else n
print(f'answer for part 2: {h*d}')
```

# Binary Diagnostic

## Part 1

… is just a simple idiom/expression exercise: count the number of `'1'`s per column in a series of rows (lines) of text.

This task inspired the rant Python Is Slow.

```
aoc = __import__('aoc-common').aoc()
inp = aoc.input_text(3)
from collections import Counter
lines = inp.split()
ncolumns = len(lines[0])
one_counts = [r.count('1') for r in zip(*lines)]
gam = int(''.join(['01'[x >= len(lines)//2] for x in one_counts]), 2)
eps = int(''.join(['01'[x <= len(lines)//2] for x in one_counts]), 2)
print(f'part 1 answer: {gam*eps}')
```

## Part 2

… is *verging* on being algorithmic. Winnow down the lines iteratively
based on some criteria of this column count until one remains.

Standard operating procedure, the tricky part is figuring out the best way
to generalize the two operations (selecting min/max by column) into a single
action. This kind of stuff is often a pain point in imperative-like languages:
abstracting over some kind of “direction.” Often you end up with ugly hacks like
`step = -1 or +1` or even explicitly with enums like `FORWARD`, `NORTH`, and so
on. Or, in the case below, this ugly `which` argument.

Note also there’s no guarantee in the input that this process will end before we run out of columns, but I’ll pass that off as a problem error.

```
from enum import Enum

class Dir(Enum):
    BIGGER = 0
    SMALLER = 1

nums = [int(x, 2) for x in inp.split()]

def recursive_divide(lst, which, idx=ncolumns):
    assert lst and idx >= 0
    if len(lst) <= 1:
        return lst[0]
    sep = [[], []]
    for x in lst:
        sep[1 & x >> idx - 1].append(x)
    return recursive_divide(
        sep[which.value] if len(sep[0]) <= len(sep[1]) else sep[1^which.value],
        which,
        idx - 1)

oxy = recursive_divide(nums, Dir.BIGGER)
co2 = recursive_divide(nums, Dir.SMALLER)
print(f'part 2 answer: {oxy*co2}')
```

# Bingo

## Part 1: Find first bingo winner.

Given 5x5 bingo boards and a sequence of numbers, find the first winner. It’s simplified to only count straight horizontal/vertical bingos.

This is perfect for Numpy, as we can do all the updates and checks with simple expressions. Again, see Python Is Slow for why this is good.

```
aoc = __import__('aoc-common').aoc()
import numpy as np

text = aoc.input_text(4)
_numbers, _boards = text.split(maxsplit=1)
numbers = np.array(_numbers.split(','), 'i4')
boards = np.array(_boards.split(), 'i4').reshape(-1,5,5)

from enum import Enum

class Orientation(Enum):
    COLS = 2
    ROWS = 1

def mark_bingos(boards, dir):
    """Translate an array of boards into an array that is nonzero if the board has a
    bingo in the given direction.
    """
    return (boards.sum(dir.value) == -boards.shape[dir.value]).sum(1)

def score_board(board, n):
    board[board == -1] = 0  # reset the temporary sentinel markers.
    return board.sum() * n

def find_winner(B, numbers):
    """Iteratively marks off the numbers and returns the first winner. Modifies the boards."""
    for n in numbers:
        B[B == n] = -1  # uses -1 because apparently 0 is a valid bingo number.
        if len(wins := mark_bingos(B, Orientation.COLS).nonzero()[0]):
            return score_board(B[wins[0]], n)
        if len(wins := mark_bingos(B, Orientation.ROWS).nonzero()[0]):
            return score_board(B[wins[0]], n)

score = find_winner(boards, numbers)
print(f'answer to part 1: {score}')
```

Note also that again the question of “orientation” or “direction” is somewhat awkward to generalize over in Python (as any imperative language) and invariably leads to pseudo-repetitive code (unless we resort to ugly hacks).

## Part 2: Find the losing board.

```
def find_loser(B, numbers):
    """Iteratively marks off the numbers and returns the last winner. Modifies the boards."""
    for i,n in enumerate(numbers):
        if len(B) <= 1:
            return find_winner(B, numbers[i:])
        B[B == n] = -1
        nonwins = (mark_bingos(B, Orientation.COLS) == 0).nonzero()[0]
        B = B[nonwins]
        nonwins = (mark_bingos(B, Orientation.ROWS) == 0).nonzero()[0]
        B = B[nonwins]

score = find_loser(boards, numbers)
print(f'answer to part 2: {score}')
```

# Hydrothermal Venture

Some actual algorithms.

But I’m getting annoyed with the way AoC doesn’t give any info as to what part 2 contains. This just seems to encourage doing a dirty ad-hoc solution for part 1 just to reveal part 2.

Here part 1 deals with orthogonal lines on a discrete 2D grid.

And part 2 could go in several different directions. It might expand to talk about (orthogonal) rectangles (in which case you definitely want to go with something like BSPs), it might just loosen the restriction and allow non-orthogonal lines, or it could even expand it to talking about orthogonal lines in more dimensions (e.g. lines appearing at different discrete points in time). The lines in part 2 might be connected, and the calculation different. And so on. I suppose the only hint is that the data input format cannot change.

But all would entail different trade-offs and what kind of data structures you want to go for.

## Prelude

Will be using Numpy as usual.

```
aoc = __import__('aoc-common').aoc()
import numpy as np
import re
text = aoc.input_text(5)
expr = re.compile(r'^(\d+),(\d+) -> (\d+),(\d+)$', re.M)
lines = np.array(expr.findall(text), np.int64).reshape(-1, 2, 2)
# Now we have an array like (#, 2, 2) corresponding to (num_of_lines, start/end, x/y).
#
# I.e. lines[2][1][0] is the x-coordinate (0) of the end-point (1) of the 3rd line (2).
# Reposition the lines so we're working in an area starting with (0,0).
lines -= lines.min((0,1))
# Figure out the size of the area in question.
x_max, y_max = lines.max((0,1))
# Assert that area is "small."
assert x_max * y_max < 1_000_000
```

## Part 1: Querying ranges.

So we get a list of points that represent orthogonal lines, as said. They’re not really “lines,” per se, but ranges in some grid. We are to count the number of cells in the grid included in at least two such ranges.

There’s many different approaches and trade-offs to be made for such a scenario. There’s also a huge “problem space” that is interesting to think about, if you just ask different questions. It entirely changes the approach you want to take. You’re encouraged to think about these things, even though it’s probably not interesting to most people.

- The problem would be very different if we instead expected queries on the form “is there a line at (x,y)?”
- The problem would again be very different if the question was “what are the lines not overlapping or crossing with any other lines?”
- Or indeed, “what is the longest line within the bounding box that can be drawn that does not intersect with any of these lines?”

But to the problem at hand, the absolute simplest (in my mind) is to just have some 2D array and mark off all the ranges:

```
# The area we'll "draw" in.
arena = np.zeros((x_max+1, y_max+1), 'i4')
# Identify orthogonal lines.
orthogonal = lines[np.any(lines[:,0] == lines[:,1], 1)]
# Sort all the points so that the first point has the minimum of the x and y
# coordinates, and the second the maximum. Note that if the lines were
# non-orthogonal, this might mirror it vertically or horizontally.
orthogonal = np.stack([orthogonal.min(1), orthogonal.max(1)], 1)
# Moving this loop to numpy is non-trivial. Doing slicing with a numpy array on
# another array on the numpy side can be done but it is always pretty
# complicated.
for l,u in orthogonal:
    arena[l[0]:u[0]+1, l[1]:u[1]+1] += 1
print(f'part 1 solution: {np.count_nonzero(arena >= 2)}')
```

If $n$ is the number of lines, $w$ is the maximum span of the coordinates, then this is $O(w(w+n))$, i.e. it’s quadratic with respect to the lengths we’re talking about.

It’s a good solution if the grid size is very small, and it’s *linear* with
respect to the number of lines. (Also if we are to query the grid area and do a
lot of calculations on it, this would also be a good solution because querying
is constant-time.)

But it’s a bad solution if the lines are long or if the bounding grid is very big.
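A middle-ground sketch (my own, not the solution used here), for when the bounding box is huge but the total covered length is modest: mark only the covered cells in a `Counter`, so memory follows the drawn cells rather than the grid size. Assumes lines given as coordinate pairs.

```python
from collections import Counter

def count_overlaps(lines):
    """Count grid cells covered by at least two lines. `lines` holds
    ((x1, y1), (x2, y2)) pairs, each horizontal, vertical, or 45-degree
    diagonal."""
    cover = Counter()
    for (x1, y1), (x2, y2) in lines:
        dx = (x2 > x1) - (x2 < x1)  # step direction: -1, 0 or +1
        dy = (y2 > y1) - (y2 < y1)
        for i in range(max(abs(x2 - x1), abs(y2 - y1)) + 1):
            cover[x1 + i*dx, y1 + i*dy] += 1
    return sum(c >= 2 for c in cover.values())

count_overlaps([((0, 0), (0, 4)), ((0, 2), (4, 2))])  # the lines cross at (0,2) -> 1
```

Still `O(total line length)` in time, but the grid never materializes.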

The second general approach is to fixate only on the lines themselves and
calculate a set of overlaps as we go. The complexity here will depend on how
many lines and how many sets of overlaps there are. This would be the only
possible choice if the lines were unbounded. However, for this kind of problem,
it would be an ugly monstrosity, since we’d want to specialize on the four
different line orientations (vertical, horizontal, same-sign diagonal,
opposite-sign diagonal) to do efficient line-lookup, and so it would lead to a
lot of code that looks copy-pasted, which I hate. So I will *not* be doing that.

## Part 2

And so part 2 is just a simple continuation of part 1.

```
# Under the assumption that all non-orthogonal lines are exactly diagonal.
diagonal = lines[np.all(lines[:,0] != lines[:,1], 1)]
def from_to(x,y):
    if x <= y:
        return np.arange(x, y+1)
    else:
        return np.arange(x, y-1, -1)

# Same caveat as in part 1.
for l,u in diagonal:
    arena[from_to(l[0],u[0]), from_to(l[1],u[1])] += 1
print(f'part 2 solution: {np.count_nonzero(arena >= 2)}')
```

# Lanternfish

Oh boy. This task will separate the mathematical thinkers from the… not-so.

The setup is:

You have some list of numbers. For each *step*, the numbers all decrease by 1.
Upon reaching 0, the number is removed and the two numbers 6 and 8 are added to
the list. Given such a list, after `n` steps, how many numbers are there in the
list?

Q: is this:

- a dynamic programming problem?
- a memoization problem?
- a problem involving matrices?
- a problem involving polynomials?
- a recurrence problem?

A: it’s any of the above you feel comfortable with.

See the day 14 problem Extended Polymerization for a more convoluted or complex version of this problem where we’ll use the same techniques.
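For completeness, the plain counting take that most of the above collapse into: keep a bucket of counts per timer value and rotate it once per day. (A sketch with the well-known sample input, not my actual solution below.)

```python
def fishies_after(nums, days):
    """Count fish after `days` days, given the starting timers `nums`."""
    buckets = [0] * 9
    for n in nums:
        buckets[n] += 1
    for _ in range(days):
        spawning = buckets.pop(0)  # timers that hit 0...
        buckets[6] += spawning     # ...reset to 6,
        buckets.append(spawning)   # ...and each spawns a new fish at 8.
    return sum(buckets)

fishies_after([3, 4, 3, 1, 2], 80)  # the AoC sample: 5934
```

This is `O(days)` with constant-size state, which is plenty for 256 days.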

## Prelude

As usual:

```
aoc = __import__('aoc-common').aoc()
import numpy as np
import re
text = aoc.input_text(6)
# Intentionally avoiding numpy because we might want bigints now.
nums = [int(x) for x in text.strip().split(',') if x.isdigit()]
```

## Math Is Terrifying

To the unmathematical mind it’s a problem of memoization.

```
import functools

@functools.cache
def num_fishies(day):
    """From a single fishie starting at 0, how many fishies are there after `day`
    days?
    """
    if day <= 0:
        return 1
    return num_fishies(day-7) + num_fishies(day-9)

def tot_fishies(nums, day):
    return sum(num_fishies(day - x) for x in nums)

print(f'answer to part 1: {tot_fishies(nums, 80)}')
print(f'answer to part 2: {tot_fishies(nums, 256)}')
```

This is a linear-ish $O(days)$ solution.

## Math Is Awesome

With just a tiny bit of practical math (matrices) we can get an $O(\log d)$ solution.

```
M = np.asmatrix(np.eye(9, 9, 1, dtype=np.uint64))
M[(6,8), 0] = 1
v = np.bincount(nums, minlength=9).astype(np.uint64)
print(f'(better) answer to part 2: {(M**256 @ v).sum()}')
print(f'or after 100 000 days: {str((M.astype(object)**100_000 @ v.astype(object)).sum())[:100]}...<plenty more digits>')
```

In *practice* this is only helpful when dealing with modular numbers or
something in a finite field. If we’re dealing with integers directly, they will
grow at an exponential rate, meaning *their bit length* increase linearly,
meaning the cost of each arithmetic operation actually goes up and becomes the
dominant factor in our complexity analysis. The problem in calculating the
number of fishes after, say, $2^{64}=18446744073709551616$ days, isn’t the
number of “matrix multiplications” themselves, it’s actually the numbers
themselves that grow too big to handle. But we can easily calculate the
number of fishes after $10^{100}$ days modulo $2^{64}$, say:

```
>>> (M**(10**100) @ v).sum()
10473238934407972448
```
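The same square-and-multiply trick works for any modulus, not just the $2^{64}$ that `uint64` silently wraps at. A hedged sketch with plain Python lists (the lanternfish update matrix is rebuilt here as `L` rather than reusing `M` above):

```python
def mat_mul(A, B, mod):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % mod
             for j in range(n)] for i in range(n)]

def mat_pow(A, e, mod):
    # Square-and-multiply: O(log e) matrix products.
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    while e:
        if e & 1:
            R = mat_mul(R, A, mod)
        A = mat_mul(A, A, mod)
        e >>= 1
    return R

# The lanternfish step: timer i+1 -> i, and a 0-timer both resets to 6
# and spawns a new fish at 8.
L = [[0]*9 for _ in range(9)]
for i in range(8):
    L[i][i+1] = 1
L[6][0] += 1
L[8][0] += 1
```

Applying `mat_pow(L, days, mod)` to the timer-count vector then gives the fish count modulo any `mod` you like.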

## Pshaw, That’s Hardly Math

Well, yes, sure. See Wikipedia for further study.

# Treachery of Whales

## Part 1

Given a list of integers $a_1, a_2, a_3, \ldots, a_n$, find an integer $x \in \mathbb{Z}$ which minimizes $|a_1 - x| + |a_2 - x| + \cdots$.

I reasoned as follows:

if we have found such an $x$, then either $x+1$ must also be minimal or it
increases the cost. If it is also minimal, meaning the cost didn’t go up it
means we have half the values on one side and the other half on the other, so
we’re “in between” the two medians. If it does increase the cost there must be
an imbalance now — more values on one side, as each step (that isn’t on a
point) increases the cost by the number of values we move away from and
decreases it by the ones we move toward. $x+1$ is now beyond the median point,
so it *was* at the median exactly. The same goes for $x-1$. So we find that the
value minimizing the cost is either the direct median value itself (if there is
one), or it’s the entire band between (and including) the two shared median
points.

I could probably have explained that better, but whatever.
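A quick brute-force sanity check of that argument (my own aside, assuming nothing beyond the problem statement): on random inputs, both the lower and upper median always attain the minimum cost.

```python
import random

def cost(a, x):
    # Total distance from x to every element of a.
    return sum(abs(v - x) for v in a)

random.seed(1)
for _ in range(200):
    a = sorted(random.randrange(100) for _ in range(random.randrange(1, 20)))
    best = min(cost(a, x) for x in range(-5, 105))
    lo, hi = a[(len(a) - 1)//2], a[len(a)//2]  # lower and upper median
    assert cost(a, lo) == best == cost(a, hi)
```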

```
aoc = __import__('aoc-common').aoc()
import numpy as np
text = aoc.input_text(7)
nums = np.array(text.strip().split(','), np.int64)
pseudomed = np.sort(nums)[len(nums)//2] # or indeed, int(np.median(nums))
answer1 = np.abs(nums - pseudomed).sum()
print(f'answer to part 1: {answer1}')
```

Given the nature of AoC, I suspected the problem data would be crafted such that there is only one possible answer though.

## Part 2

Given a list of integers $a_1, a_2, a_3, \ldots, a_n$, find an integer $x \in \mathbb{Z}$ which minimizes $c(|a_1 - x|) + c(|a_2 - x|) + \cdots$ where $c(n) = 1 + 2 + \cdots + n = \frac{n(n+1)}{2}$.

This was a lot harder to reason about and nothing came to me immediately, other than what I already “know” — I had some vague idea that “the mean minimizes the square error,” so I tried to plug that in and it worked. It makes some intuitive sense but I’m not sure I could prove why it is so on the spot, just looking at it here while writing these notes. Hmm, I feel it’s something I ought to be able to show though.

Now the “cost” (error function) is not exactly $|a_i - x|^2$ here, but rather $|a_i - x|^2 + |a_i - x|$. And this will shift the minimum a little when the numbers are not balanced.

OK, here is where the proper imposter syndrome is starting to set in and I absolutely feel like I’ve failed this problem now, because I’m at a loss for how to prove (even informally) the limit of this bound.

Experimentally it seems that $|\mu - \nu| \le \frac{1}{2}$ where $\mu$ is the arithmetic mean and $\nu \in \mathbb{R}$ is the true minimum. Experimentally, and even more curiously, the difference $|\mu - \nu|$ seems to be expressed as some sum of fractions of the form $\frac{1}{n(n+1)}$ (??), based on how many numbers are less than the mean versus how many are greater than the mean.

It’s not obvious to me why or how, and have now already spent over two hours thinking about this problem and typing this out, so I’m giving up and I’ll just settle for being too stupid to figure it out. I’ve never been good at dealing with bounds.

```
cost = lambda x: x*(x+1)//2
est = nums.mean()
center = min(cost(np.abs(nums - x)).sum() for x in [int(est)-1, int(est), int(est)+1]) # bounds???
print(f'answer to part 2: {center}')
```

Edit: no, fuck, fuck AoC, it can’t be that hard, I will figure it out.
*Otherwise I am worthless.*

OK so. Say. Say we’re in $R$. We’re a little number on the $R$ number line.
We have the other given numbers here, to the left and right. We’re not *on* any
of the given numbers, we’re not an edge case, we’re snugly in between somewhere,
so we can consider some neighborhood… Now. Now we can get rid off all the
stupid absolute values that confused me, because it’s fixed which are negative
and which are positive. So we have a regular polynomials—

Oh God, I am so stupid. Of course. This is basic high school math. Stupid, stupid, stupid. We take the derivative to find the minimum…

$$\left[\sum_{i=1}^{n} (a_i - x)^2\right]' = \sum_{i=1}^{n} 2x - 2a_i = 2n(x - \mu)$$

So, yes, the regular square error is minimized when $x=μ$.

Now for our expression,

$$\left[\sum_{i=1}^{n} (a_i - x)^2 + s_i(a_i - x)\right]' = \sum_{i=1}^{n} 2x - 2a_i + s_i = 2n(x - \mu) + \sum_{i=1}^{n} s_i$$

Where $s_i$ is $\pm 1$ depending on where $x$ is in relation to $a_i$. The sum can be at most $n-1$ in magnitude. So…

$$2n(x - \mu) - n < 2n(x - \mu) + \sum_{i=1}^{n} s_i < 2n(x - \mu) + n$$

$$-\frac{1}{2} < x - \mu < \frac{1}{2}$$

Good. I’m just gonna handwave whatever edge cases might exist; if we’re “on” a given number, I guess the $s_i$ will just be 0 at that point, so it doesn’t matter; and I’m gonna ignore the details of the broken derivative at those points. I’m satisfied, at least. And I’m sick to death of this stupid problem.

```
cost = lambda x: x*(x+1)//2
est = nums.mean()
center = min(cost(np.abs(nums - x)).sum() for x in [int(est), int(est)+1])
print(f'answer to part 2: {center}')
```

# Seven Segment Search

Not much commentary on this one because it’s pretty straightforward.

The description for this task was one of the most convoluted and confusing ones yet. Well done on the obfuscation part.

## Prelude

```
aoc = __import__('aoc-common').aoc()
import numpy as np
import re
from collections import Counter
from functools import reduce
text = aoc.input_text(8)
```

## Part 1

Count number of strings of certain lengths. One of the simpler tasks.

```
obs, nums = zip(*(l.split(' | ') for l in text.split('\n') if l))
counts = Counter(map(len, ' '.join(nums).split()))
# Doing the above (joining the strings and re-splitting) is better than doing
# updates on each sub-string. (See my "rant" linked on AoC day 3.)
answer = sum(counts[x] for x in [2,4,3,7])
print(f'answer to part 1: {answer}')
```

## Part 2

Task: reason about a seven-segment display to figure out what numbers are displayed on a broken display.

I chose to make a utility class here, even though it’s not really necessary. But
this is one of the great strengths of Python and other languages that allow
customizing things like numbers, iterables, maps, and so on — you can
approximate domain specific languages and make things ten times more ergonomic
and, well, *pleasant* to work with.

```
class segments(int):
    """Utility class because I'm a huge fan of DSLs.

    >>> abd = segments.from_str('abd')
    >>> acdg = segments.from_str('agdc')
    >>> abd - acdg
    b
    >>> abd + acdg
    abcdg
    >>> ~abd
    cefg
    >>> ~abd & acdg
    cg
    >>> list(abd)
    [a, b, d]
    """
    def __repr__(self):
        return ''.join(chr(97 + i) for i in range(7) if self & 1 << i)
    def __str__(self):
        return repr(self)
    @staticmethod
    def from_str(s):
        return segments(reduce(lambda x,y: x|y, (1 << ord(x) - 97 for x in s.lower())))
    __add__ = lambda x, y: x|y
    __sub__ = lambda x, y: x^(x&y)
    __neg__ = lambda x: ~x
    __not__ = lambda x: ~x
    def __or__(x,y): return segments(int.__or__(x, y))
    def __and__(x,y): return segments(int.__and__(x, y))
    def __xor__(x,y): return segments(int.__xor__(x, y))
    def __invert__(x): return x^127
    def __iter__(self):
        return (segments(self & 1 << i) for i in range(7) if self & 1 << i)
```

Making these classes is nearly automatic and by instinct now, after so many
CTFs. You *always* want to have something you can play around with in the
REPL when investigating some problem or group:

```
>>> a,b,c,d,e,f,g,all = map(segments, (1,2,4,8,16,32,64,127))
>>> a+b+c
abc
>>> all&(b+c+d) - d
bc
>>> [a + c for x in all]
[ac, ac, ac, ac, ac, ac, ac]
```

Anyway, the actual reasoning I mapped out in a comment and then the implementation is just straightforward:

```
# Some notes while I work this out.
#
#  aaaa
# b    c
# b    c
#  dddd
# e    f
# e    f
#  gggg
#
# L2 = cf (1)
# L3 = acf (7)
# L4 = bcdf (4)
# L7 = <all> (8)
#
# L5 = acdeg acdfg abdfg (235)
# L6 = abcefg abdefg abcdfg (069)
#
# cf = L2
# a = L3 - L2
# bd = L4 - L2
# dg = (only L5 & cf) - cf - a
# d = dg & bd
# g = dg - d
# b = bd - d
#
# 0 is only L6 without d, 2 is only L5 without b

class resolver:
    """Fairly useless to pack this in a class, but unfortunately it's the best way
    in Python to get some isolated code with state.
    """
    def __init__(self, obs):
        self._by_len = dict()
        for x in obs:
            self._by_len.setdefault(len(x), []).append(segments.from_str(x))
        self._digits = dict()
        self._analyze()
    def __call__(self, num):
        return self._digits[segments.from_str(num)]
    def _analyze(self):
        cf = self._discover(1, 2)
        a = self._discover(7, 3) - cf
        bd = self._discover(4, 4) - cf
        self._discover(8, 7)
        dg = self._discover(3, 5, has=cf) - cf - a
        d = dg & bd
        self._discover(0, 6, hasnt=d)
        self._discover(9, 6, has=cf)
        self._discover(6, 6)  # only choice left
        b = bd - d
        self._discover(2, 5, hasnt=b)
        self._discover(5, 5)  # only choice left.
    def _discover(self, d, l, has=None, hasnt=None):
        sel = [x for x in self._by_len[l] if (has is None or has & x == has) and (hasnt is None or ~x & hasnt == hasnt)]
        assert len(sel) == 1
        self._by_len[l].remove(sel[0])
        self._digits[sel[0]] = d
        return sel[0]

total = 0
for obs_txt, num_txt in zip(obs, nums):
    r = resolver(obs_txt.split())
    total += sum(r(n) * 10**i for i,n in enumerate(num_txt.split()[::-1]))
print(f'answer to part 2: {total}')
```

# Smoke Basin

So using NumPy is almost starting to feel like cheating… However, it *is* the
best way in Python, and I do maintain (*strongly*) that it should be considered
for the standard library. NumPy is the bedrock upon which Python’s foothold in
numerical computation rests. If it was banned, we might as well be using Julia.

Alright, so technically here I’m using SciPy methods, but they’re built on NumPy.

## Prelude

Standard fetch & setup.

```
aoc = __import__('aoc-common').aoc()
import numpy as np
text = aoc.input_text(9)
# Packed single-digit text matrix to numpy.
lines = text.split()
rows = len(lines)
heights = np.frombuffer(''.join(lines).encode(), 'u1').reshape(rows, -1) - ord('0')
```

## Part 1

Here’s another task where it’d be much better if both problems were revealed at
once. I started out writing something a bit less elegant here, like adding a
border and doing `(h[1:-1,:] < h[2:,:]) + (h[1:-1,:] > h[:-2,:]) + ...`.

But then part 2 revealed they obviously want me to use `ndimage.label`, so
alright, and I went back and fixed part 1 since, if I’m using `ndimage.label`
anyway, it can be simplified greatly.

```
import scipy.ndimage as ndi
# Cycle the digits so 9 becomes 0.
#
# Besides, part 1 asks for +1 on minima, so it's like it was designed for this.
heights = (heights + 1) % 10
# It turns out there's a guarantee of the input that make it so all 'regions'
# will be separated by 9 and only 9 (conveniently 0 in our offset image).
lbls,nlbls = ndi.label(heights)
lix = np.arange(1, nlbls+1)
answer1 = ndi.minimum(heights, lbls, lix).astype('i8').sum()
print(f'answer to part 1: {answer1}')
```

## Part 2

I don’t really think there’s much point in doing a conventional breadth-first
search, even for illustrative purposes. I’d rather someone learn `ndimage.label`
than write a BFS in pure Python for something like this.

```
# Unfortunately sum_labels seem to only return floats, which we don't want.
basin_sizes = ndi.labeled_comprehension(heights, lbls, lix, np.count_nonzero, np.int64, -1)
answer2 = np.prod(np.sort(basin_sizes)[-3:])
print(f'answer to part 2: {answer2}')
```

# Syntax Scoring

What’s this, a non-NumPy task? They do exist.

Restated problem: match `"<>()[]{}"` delimiters in a string to find delimiter imbalance.

Fun fact: it would seem I learned nothing from this task because I fucked it up on a recent CTF task just a couple of days later. Picard’s forehead.

Note to self: think before coding.

## Parser

First we might mention a trick that isn’t really a parser, but it works in *this
specific case*:

```
def collapse(s):
    reps = '[] () {} <>'.split()
    sʻ = None
    while s != sʻ:
        sʻ = s
        for r in reps:
            s = s.replace(r, '')
    return s
```

Like I said, not really a parser, but it’s worth mentioning because this will be a lot more efficient than a character-by-character parser for simple cases that are not deeply nested.
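A couple of sanity checks on that trick (the function is restated here so the snippet runs on its own): balanced substrings vanish, and whatever survives is exactly the evidence of imbalance.

```python
def collapse(s):
    # Repeatedly delete adjacent matched pairs until nothing changes.
    reps = '[] () {} <>'.split()
    prev = None
    while s != prev:
        prev = s
        for r in reps:
            s = s.replace(r, '')
    return s

collapse('([]<>)')  # -> '' (fully balanced)
collapse('{([(<')   # -> '{([(<' (unclosed openers survive)
collapse('[<>(])')  # -> '[(])' (the corruption is left exposed)
```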

As for the actual parsing, we can consider the underlying language that of
matched parenthesis, and it is
likely the simplest form of a structured language you can have. Yet it’s not as
simple as regular languages, so *normal* regular expressions can’t match
them^{1}. It is a context-free langauge, so some form of state is required to
parse them. The simplest is to just use a stack, either in the form of recursion
(top-down parsers) or maintain some list, such as:

```
def simple_parser(s):
    table = dict(zip('{[(<', '}])>'))
    stack = []
    for c in s:
        if c in table:
            stack.append(table[c])
        elif stack and c == stack[-1]:
            stack.pop()
        else:
            return c
    return stack
```

In a more general setting (if there were text between the parenthesis for
example), it would in general be better to search forward to the next
point-of-interest rather than going character-by-character. Here regular
expressions could come in handy, as there is no `str.findany()`

.

And what we’re seeking is proof that the given strings are *not* part
of the language, in the form of imbalanced or unclosed brackets.

## Part 1

Focuses on finding unopened delimiters, i.e. find the first closing delimiter that was never opened.

There’s some extra noise about scoring them, which I find totally unnecessary, but alright.

So, including both the code snippets mentioned above:

```
aoc = __import__('aoc-common').aoc()
import numpy as np
from itertools import repeat

# test = [
#     '{([(<{}[<>[]}>{[]{[(<()>',
#     '[[<[([]))<([[{}[[()]]]',
#     '[{[{({}]{}}([{[{{{}}([]',
#     '[<(<(<(<{}))><([]([]()',
#     '<{([([[(<>()){}]>(<<{{',
# ]
text = aoc.input_text(10)
lines = text.splitlines()

# The hack.
def collapse(s):
    # Python's lack of do-while...
    reps = '[] () {} <>'.split()
    sʻ = None
    while s != sʻ:
        sʻ = s
        for r in reps:
            s = s.replace(r, '')
    return s

# The proper.
def simple_parser(s):
    table = dict(zip('{[(<', '}])>'))
    stack = []
    for c in s:
        if c in table:
            stack.append(table[c])
        elif stack and c == stack[-1]:
            stack.pop()
        else:
            return c
    return stack

# collapsed = [collapse(l) for l in lines]
collapsed = [simple_parser(l) for l in lines]
corrupted = [x for x in collapsed if isinstance(x, str)]

scores1 = {
    ')': 3,
    ']': 57,
    '}': 1197,
    '>': 25137,
}
answer1 = sum(map(scores1.get, corrupted))
print(f'answer to part 1: {answer1}')
```

## Part 2

Here we are to find the unclosed delimiters. I.e. what set of delimiters would it take to balance the string.

Again there’s a silly scoring system^{2}. It seems like this additional red tape
is to verify the correctness of our answer, but I personally feel the most
“honest” way to do it would be to require you to input `sha3( <answer> ).hexdigest()`
or something on the website instead of inventing some arbitrary
“scoring method” for each task.

```
from statistics import median  # did you even know `statistics` existed; be honest.

incomplete = [x for x in collapsed if not isinstance(x, str)]
scores2 = {
    ')': 1,
    ']': 2,
    '}': 3,
    '>': 4,
}
# Basically treat the delimiters as (non-zero) digits in a base-5 number.
answer2 = median(sum(5**i * scores2[c] for i,c in enumerate(st)) for st in incomplete)
print(f'answer to part 2: {answer2}')
```

# Dumbo Octopus

We’re back to NumPy problems, it seems.

We have some grid of values and we (mostly) want to manipulate these values as a whole (in parallel).

Let’s do the mandatory prelude:

```
aoc = __import__('aoc-common').aoc()
import numpy as np
import scipy.ndimage as ndi
from itertools import count
# test = '''5483143223
# 2745854711
# 5264556173
# 6141336146
# 6357385478
# 4167524645
# 2176841721
# 6882881134
# 4846848554
# 5283751526
# '''
text = aoc.input_text(11)
lines = text.splitlines()
octi = np.array([np.frombuffer(s.encode(), 'u1') for s in lines], 'i4')
octi -= ord('0')
```

Now we have some 2-dimensional array `octi` with numbers in `[0..9]`. This is our state.

## Advancing the State

The core logic is how we advance from one state to the next for a given array:

1. Let *inactive* be an empty set of indices.
2. Increase all values in the array by 1.
3. For each index *not* in the *inactive* set whose value is 10 or higher:
   1. add this index to the *inactive* set,
   2. increase the values of this index’s direct neighbors (including diagonals) by 1 (called a “flash” in the problem statement — the value spreading out to neighboring values),
   3. return to step 3 above.
4. Reset all values with indices in the *inactive* set to value 0.
As it’s outlined in the problem text, the numbers represent octopodes that accumulate energy and then “flash” and spread energy to their neighbors. The key part here (step 3.2) we can model as a convolution.

Imagine we deal with 2-dimensional matrices, and that we have a masking matrix
such that $M_{ij}=1$ only where a “flash” occurs. The increase to a given
element is the sum of all bordering `1`s in this matrix. And we can count them
by applying a convolution with a simple kernel:

$$\begin{bmatrix} 1 & 1 & 1 \\ 1 & x & 1 \\ 1 & 1 & 1 \end{bmatrix}$$

Where the central element doesn’t matter. This is how image filters (blur, edge detection etc.) in photo editors used to work too, before they got all super advanced and AI.

```
kernel = np.ones((3,3), bool)

def step(octi, b=10):
    """In-place step octopus matrix `octi` with base (threshold) `b`."""
    octi += 1
    flashes = np.zeros(octi.shape, bool)
    while True:
        new = (octi >= b) & ~flashes
        if not np.any(new):
            break
        flashes |= new
        spread = ndi.convolve(new.astype('i1'), kernel, mode='constant', cval=0)
        octi += spread
    octi[octi >= b] = 0
    return np.count_nonzero(flashes)
```

Here I’ve also applied the natural generalization, i.e. we can change the “base” from 10 to some other number.

## Part 1 & Part 2

With this, it’s easy:

```
answer1 = sum(step(octi.copy()) for i in range(100))
print(f'answer to part 1: {answer1}')

def first_flash(octi):
    # XXX: a loop detection algorithm should be used instead, as this will run
    # into an infinite loop on most random inputs.
    for i in count(1):
        if step(octi) == octi.size:
            return i

answer2 = first_flash(octi)
print(f'answer to part 2: {answer2}')
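One way to address that `XXX` comment, sketched generically (the names here are my own): snapshot each state, and report the cycle length instead of spinning forever when a state repeats before full synchronization.

```python
from itertools import count

def run_until_sync(state, step, is_sync, key=bytes):
    """Advance `state` in place with `step` until `is_sync(flash count)` holds,
    snapshotting `key(state)` each iteration so an input that merely cycles
    (and never synchronizes) is detected instead of looping forever.

    Returns (iteration, None) on success, or (None, cycle_length) on a cycle.
    """
    seen = {}
    for i in count(1):
        k = key(state)
        if k in seen:
            return None, i - seen[k]
        seen[k] = i
        if is_sync(step(state)):
            return i, None
```

With the arrays above this would be called as `run_until_sync(octi, step, lambda f: f == octi.size, key=lambda a: a.tobytes())`, and the cycle-length output is also handy for the experiments in the next section.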

## But There’s a Ton to Explore

The input is specifically crafted so that the matrix ends up in a cycle of length 10, with all values in the array or matrix being synchronized.

- What other kinds of cycles can we get?

It turns out that length-10 cycles are actually fairly rare. Almost all random states end up in a cycle whose length is a multiple of 7. (From cycles of length 7 to cycles with length of several thousands (yet still multiples of 7). Of course the larger the matrix, the longer the potential cycles.)

It *seems* that *all* cycles (in this setup) are either 10 or some multiple of
7, but I haven’t attempted to prove it.

- Why 7?

More generally, it seems that most random matrices end up in cycles with a factor $B−3$ where $B$ is the base. That’s why it’s 7 for base 10. Why? I don’t know, but my guess is it has to do with the diagonals and the fact that the corners of the matrix influence only three other values?

- What about higher-dimensional arrays?

Haven’t checked.

- What is the maximum cycle length given a matrix M×N and a base B?

I have no idea.

- What happens at different topologies? (I.e. let’s say the edges wrapped around, which would give a torus-like topology.)

Again, a question for those with more time on their hands.

# Passage Pathing

The first pure graph problem?

The problem statement is to count the number of possible paths from the nodes
`start` to `end` in an undirected graph while only visiting *most* vertices
once.

I say “*most*” because of course they found a way to make the problem
description overly convoluted. There’s two types of vertices, uppercase and
lowercase, and in part 1 we can only visit the lowercase vertices once, although
in part 2 we can visit *at most one lowercase node twice*. Uppercase vertices we
can visit any number of times.

So we’re kind of looking to count Hamiltonian-ish paths.

First some reasonable observations:

1. We cannot have any cycles with uppercase vertices that are still connected to the end point, otherwise we would have an infinite number of paths. In fact, no two uppercase vertices can be adjacent (connected), as we could just go back and forth.

2. This feels very much like an NP problem. *Enumerating* (listing) all the paths would certainly be exponential. (Consider a complete k-graph: there would be $(k-2)!$ different paths of length $k-1$.) However, I’m uncertain if merely counting them markedly changes the scenario in our case since we’re still dealing with general graphs. It *feels* like we can’t go subexponential, so we should probably be happy with an exponential solution.

3. We can *simplify* the graph a little by outright *removing* the uppercase vertices and connecting up all the lowercase vertices that could reach each other through them. This should also substantially lower the cost of our algorithm (although we’re still exponential).

   Why does this work? Because the uppercase vertices don’t actually constrain us in any way, we can travel over them freely to any of their neighbors as if we did nothing at all; so we might as well travel directly to that neighbor.

   This removes the need to deal with uppercase vertices, however now we have to deal with multiplicities of edges, because there might be two edges between `a` and `b` and they count for two different paths, so whenever we go from `a` to `b` or vice versa, we have to multiply the result by 2.

The simplest approach to most graph problems is either DFS (depth-first search) or BFS (breadth-first search), depending on the problem. We’ll stick with that here. By point (3) above we could instead turn it into some dynamic programming thing, since we could use bitmasks for vertex membership, but that would complicate our part 2 and blah, blah, blah.
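For the curious, the bitmask idea could look something like the sketch below. The memoization state is (current node, visited-set-as-bitmask, doubles left). The adjacency dict is an assumption: it’s the small AoC example (`start-A, start-b, A-c, A-b, b-d, A-end, b-end`) with the uppercase cave removed by hand; self-loops like `b-b` stand for detours such as `b-A-b`.

```python
from functools import lru_cache

# Hand-simplified lowercase graph of AoC's small example (assumed input);
# values are edge multiplicities, self-loops are detours like b-A-b.
adj = {
    'start': {'b': 2, 'c': 1, 'end': 1},
    'b':     {'start': 2, 'b': 1, 'c': 1, 'd': 1, 'end': 2},
    'c':     {'start': 1, 'c': 1, 'b': 1, 'end': 1},
    'd':     {'b': 1},
    'end':   {},  # never expanded; reaching it terminates a path
}
bit = {v: 1 << i for i, v in enumerate(adj)}

def count_paths(duplicates=0):
    @lru_cache(maxsize=None)
    def go(node, visited, dups):
        if node == 'end':
            return 1
        total = 0
        for nxt, m in adj[node].items():
            if not visited & bit[nxt]:
                # Unvisited: mark it and recurse, weighted by multiplicity.
                total += m * go(nxt, visited | bit[nxt], dups)
            elif dups and nxt != 'start':
                # Already visited: spend one of our allowed double-visits.
                total += m * go(nxt, visited, dups - 1)
        return total
    return go('start', bit['start'], duplicates)

print(count_paths(0), count_paths(1))  # 10 36, the known example answers
```

Since the state is hashable, `lru_cache` gives the memoization for free; the downside (and why I wouldn’t bother) is that threading the duplicate-allowance through the state is exactly the part-2 complication mentioned above.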

## Prelude

The template prelude as per usual. It’s growing.

```
aoc = __import__('aoc-common').aoc()
import sys
import numpy as np
import scipy.ndimage as ndi
import itertools as itt
import functools as fnt
import collections as coll
test = '''fs-end
he-DX
fs-he
start-DX
pj-DX
end-zg
zg-sl
zg-pj
pj-he
RW-he
fs-DX
pj-RW
zg-RW
start-pj
he-WI
zg-he
pj-fs
start-RW'''
text = test if '-t' in sys.argv else aoc.input_text(12)
lines = text.splitlines()
```

## Counting Paths

First the code for simplifying the input by removing all the uppercase caves.
Note that this assumes no two uppercase caves are adjacent, which they
*technically could be* so long as they don’t connect to the `end` node
(i.e. so long as they can’t be part of any path). But I’ll ignore that.

```
def simplify(edges):
    big_caves = coll.defaultdict(list)
    G = graph()
    for e1,e2 in edges:
        e1,e2 = min(e1, e2), max(e1, e2)
        assert not e2.isupper()
        if e1.isupper():
            big_caves[e1].append(e2)
            for x in big_caves[e1]:
                G.add_edge(e2, x)
        else:
            G.add_edge(e1, e2)
    return G
```

And then a simple utility class with a pretty generic `count_paths()` that works
for both part 1 and part 2 of the problem.

```
class graph(coll.defaultdict):
    def __init__(self):
        super().__init__(coll.Counter)
    def add_edge(self, e1, e2):
        self[e1][e2] += 1
        if e1 != e2:
            self[e2][e1] += 1
    def count_paths(self, start='start', end='end', duplicates=0):
        path = []
        def _count_ex(node, dups):
            if node == end:
                return 1
            cnt = 0
            path.append(node)
            for nxt,m in self[node].items():
                if nxt not in path:
                    cnt += m * _count_ex(nxt, dups)
                elif dups > 0 and nxt != start:
                    cnt += m * _count_ex(nxt, dups-1)
            path.pop()
            return cnt
        return _count_ex(start, duplicates)
```

And then just:

```
g = simplify(x.split('-') for x in lines)
answer1 = g.count_paths('start', 'end', 0)
answer2 = g.count_paths('start', 'end', 1)
print(f'answer to part 1: {answer1}')
print(f'answer to part 2: {answer2}')
```

## A Note on the Simplification

How good is our simplification?

Consider a simple naive `count_paths()` which still works with uppercase vertices:

```
def naive_count(node, path=list(), double=False, Q=[0]):
    Q[0] += 1  # invocation counter, for the measurements below
    if node == 'end':
        return 1
    if node.islower():
        path.append(node)
    cnt = 0
    for nxt in connected[node]:  # `connected` is a global adjacency map
        if nxt not in path:
            cnt += naive_count(nxt, path, double)
        elif double and nxt != 'start':
            cnt += naive_count(nxt, path, False)
    if node.islower():
        path.pop()
    return cnt
```

A decent measure in Python would be to simply count the number of invocations of our function.

Here I’ll also measure another “naive” but straightforward way to accomplish
part 2: instead of tracking a bool for whether we’ve used a vertex twice or not,
we’ll iterate over all the vertices and *actually duplicate* the vertex in the
graph, then count the number of paths like normal, and finally subtract and
divide out all the paths we counted several times in this process. I call this
“naive” not because it’s simple but because it’s a sort of brute force approach
to reusing another solution. Its *theoretical* complexity isn’t actually that much worse
(since we’re already in exponential land, it’s hard to worsen it), but of course in practice
it kind of sucks:

| method | CALLS | answer |
|---|---|---|
| naive_count | 13292 | 4885 |
| simplified | 1436 | 4885 |
| duplicate & naive_count | 753592 | 117095 |
| naive_count(double=True) | 316603 | 117095 |
| simplified | 22182 | 117095 |

Here we see how good it is to simplify: we improve by at least an order of magnitude.

# Transparent Origami

Not a very interesting problem. Just an exercise in bug avoidance.

Problem statement: given a sequence of “folds” (horizontal or vertical lines) which mirror each point in $Z^2$ (providing an equivalence relation), reduce a set of input points to their canonical equivalence class representatives (smallest positive values or whatever).

Algorithmically I don’t think you can get around $O(nk)$ with $n$ points and $k$ folds without making some assumptions (e.g. that the points are bounded to be “small” or the folds coprime)? My first thought was that we could do some modulus trick, but since the transformation changes depending on the original value, I’m not so sure that works.

So my low-effort solution is just:

```
aoc = __import__('aoc-common').aoc()
import sys
import numpy as np
import scipy.ndimage as ndi
import itertools as itt
import functools as fnt
import collections as coll
import re
text = aoc.input_text(13)
expr = re.compile(r'^fold along ([xy])=(\d+)', re.M)
# fuck it, we'll hard commit to input assumptions.
pts, folds = text.split('\n\n')
pts = np.array(pts.replace(',', '\n').split(), np.int64).reshape(-1, 2).T
folds = ((m.group(1), int(m.group(2))) for m in expr.finditer(folds))
# pts[0] are x-coords, pts[1] are y-coords.
def dofold(pts, axis, coord):
    axis = int(axis == 'y')
    pts[axis] = np.where(pts[axis] < coord, pts[axis], 2*coord - pts[axis])
    return np.unique(pts, axis=1)
pts = dofold(pts, *next(folds))
print(f'answer to part 1: {pts.shape[1]}')
for ax,c in folds:
    pts = dofold(pts, ax, c)
A = np.ones(tuple(np.max(pts, 1) + 1), 'u1') * 32
A[tuple(pts)] = ord('#')
# nah, fuck you AoC, I'm not going to do a visual ascii decoder.
answer2 = ...
print(f'answer to part 2: {answer2}')
for r in A.T:
print(r.tobytes().decode())
```

In the end I just print out the points as ASCII art:

```
[franksh@moso aoc2021] python day13-transparent-origami.py
answer to part 1: 607
answer to part 2: Ellipsis
## ### #### # ### #### #### #
# # # # # # # # # # #
# # # # # # # ### # #
# ### # # ### # # #
# # # # # # # # #
## # #### #### # # #### ####
```

# Extended Polymerization

Aaand back to NumPy.

Or well, perhaps the NumPy connection isn’t immediately obvious.

Problem statement: we’re given an input string (starting point) like `ABCCABAC` and a set of rewrite rules of the form `AB -> C`, `CC -> ABC`, and so on. The rule `AB -> ...` means that `AB` is rewritten to `A...B`. I.e. the right-hand side is inserted between the two characters on the left-hand side. The left-hand side will always have two and only two characters. All rules are applied simultaneously to the input and each such application is a step. The goal is to count the final characters in the string after $n$ steps.

So given a starting point of `ABC`, no matter what set of rules we apply, or how many steps we go through, we’ll always end up with a string of the form `A...B...C` where the ellipses denote arbitrary characters.

I’ve actually generalized the problem a little here since it doesn’t cost us
anything. Pedantically they only allow rules that expand two characters into
three, inserting a single character between them^{1}.

## The Problem & Naive Solution

First observation is that it’s probably easier to think in terms of counts of character-pairs than it is to think about the characters directly, as our rules apply on the level of character-pairs.

Let’s say we have a function like `step("ABC", n)` which gives a dictionary of `pair -> count` saying how many occurrences of a given pair there are in the final string after $n$ steps.

And so the second observation is that we have a simple recurrence relation here, because if we have `step("ABC", n)` we can easily calculate `step("ABC", n+1)` by going over all the pairs and expanding them according to the rules. I.e. with a rule like `AB -> C`, the number of `AC` and `CB` pairs at step $n+1$ will increase by the number of `AB` pairs at step $n$.
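As a quick sanity check, one such step over pair counts can be sketched with plain `Counter`s and a single made-up rule (the real solutions below use `number_dict` instead):

```python
from collections import Counter

rules = {'AB': 'C'}  # hypothetical single rule: AB -> ACB

def step(pairs):
    """Advance the pair counts by one rewriting step."""
    out = Counter()
    for p, n in pairs.items():
        if p in rules:
            expanded = p[0] + rules[p] + p[1]       # 'AB' -> 'ACB'
            for q in (a + b for a, b in zip(expanded, expanded[1:])):
                out[q] += n                         # every AB begets an AC and a CB
        else:
            out[p] += n                             # unmatched pairs persist
    return out

pairs = Counter(a + b for a, b in zip('ABC', 'BC'))  # {'AB': 1, 'BC': 1}
print(step(pairs))  # Counter({'AC': 1, 'CB': 1, 'BC': 1})
```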

Finally, given an input string like `"ABC"` we’re free to calculate the trajectories for `"AB"` and `"BC"` separately, as they will never influence each other.

So this problem is just a more complicated Lanternfish, really, and I’ll provide both of the basic solutions.

## Boilerplate

The usual prelude with two notables:

- I’m using my own utility class `number_dict`, which functions like `collections.Counter()` in most cases but has some extra functionality. Notably it “acts like a number” when doing arithmetic with it, broadcasting the arithmetic operations to its elements. I.e. `x//2` divides all the counts in half.
- The trivial `all_pairs()` helper function, which will be ubiquitous in both solutions below.

```
aoc = __import__('aoc-common').aoc()
import sys
import numpy as np
import scipy.ndimage as ndi
import itertools as itt
import functools as fnt
import collections as coll
from collections import abc, Counter
import re
from flagmining.dicts import number_dict
test = '''NNCB
CH -> B
HH -> N
CB -> H
NH -> C
HB -> C
HC -> B
HN -> C
NN -> C
BH -> H
NC -> B
NB -> B
BN -> B
BB -> N
BC -> B
CC -> N
CN -> C
'''
text = test if '-t' in sys.argv else aoc.input_text(14)
polystr, rules = text.split('\n\n')
rules = dict(t.split(' -> ') for t in rules.splitlines())
def all_pairs(s):
    return [x+y for x,y in zip(s, s[1:])]
# ...
# ...
# ...
# ...
# solver = AbstractPairRewriteMemoizationProxyMixinFactoryBean(rules)
solver = cleverer_matrix_solution_thing(rules)
cnts = sorted(solver.count_elements(polystr, 10).values())
answer1 = cnts[-1] - cnts[0]
print(f'answer to part 1: {answer1}')
cnts = sorted(solver.count_elements(polystr, 40).values())
answer2 = cnts[-1] - cnts[0]
print(f'answer to part 2: {answer2}')
```

## The Naive Solution

… like with Lanternfish we can start with the most basic solution to recurrence relation problems, which is memoization — or dynamic programming if you will, the former just being a very specific instance of the latter.

```
class AbstractPairRewriteMemoizationProxyMixinFactoryBean(dict):
    def __init__(self, rules):
        super().__init__()
        for k,v in rules.items():
            if len(k) != 2:
                raise ValueError(f"invalid replacement rule: {k} -> {v}")
            self[k,0] = number_dict(all_pairs(k[0] + v + k[1]))
    def count_pairs_memo(self, s, steps=1):
        return number_dict.sum(self[p,steps] for p in all_pairs(s))
    def __getitem__(self, key):
        ss, n = key
        assert len(ss) == 2 and n >= 0
        if n == 0:
            # Only one choice.
            return number_dict.one(ss)
        if (ss,n) not in self:
            # Memoization/cache.
            self[ss,n] = number_dict.sum(
                self[k,n-1]*v for k,v in self.get((ss, 0)).items())
        return super().__getitem__((ss,n))
    def count_elements(self, initial, steps):
        # Compensate for edge chars only being in 1 pair.
        total = number_dict(initial[0] + initial[-1])
        for k,v in self.count_pairs_memo(initial, steps).items():
            for c in k:
                total[c] += v
        return total // 2  # Each element was counted in two pairs.
```

Here wrapped in a class because I tend to consider it the lesser of two evils when compared to nested functions.

## Cleverer

Just like in Lanternfish we can construct a matrix such that we
perform a single step by multiplying with this matrix. Then we can use matrix
exponentiation to calculate the recurrence more “directly” (complexity that
scales with `log(steps)`

rather than linearly).

The same notes apply with regard to how this doesn’t really buy us much if we have exponential growth, since arithmetic on (arbitrarily sized) integers will quickly grow to choke both algorithms way before the steps vs. log(steps) difference comes into play. But it’s very useful for working in finite fields.
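For the finite-field case, note that NumPy’s `**` won’t reduce modulo anything on its own, so you’d square-and-multiply by hand. A minimal sketch (using `object` dtype so the entries stay exact Python ints; the Fibonacci check is just an illustration, not part of this problem):

```python
import numpy as np

def mat_pow_mod(M, e, p):
    """Square-and-multiply matrix exponentiation over Z/pZ (exact ints)."""
    M = np.array(M, dtype=object) % p
    R = np.eye(len(M), dtype=object)  # identity matrix
    while e:
        if e & 1:
            R = R @ M % p             # multiply in the current square
        M = M @ M % p                 # repeated squaring
        e >>= 1
    return R

# Sanity check against the Fibonacci recurrence: M^n holds F(n) off-diagonal.
F = [[1, 1], [1, 0]]
print(mat_pow_mod(F, 10, 10**9 + 7)[0][1])  # 55 == F(10)
```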

There’s some extra boilerplate in this solution because we have to assign an index to each possible pair and then translate back and forth when we go from matrices or vectors over $Z$ to strings.

```
class cleverer_matrix_solution_thing():
    def __init__(self, rules, dtype=np.uint64):
        self.words = dict()
        self.indices = dict()
        for k,v in rules.items():
            assert len(k) == 2
            self._add(k)
            for p in all_pairs(k[0] + v + k[1]):
                self._add(p)
        mat = np.zeros((len(self), len(self)), dtype=dtype)
        for k,v in rules.items():
            for p in all_pairs(k[0] + v + k[1]):
                mat[ self[p], self[k] ] += 1
            # mat[ [self[t] for t in all_pairs(k[0] + v + k[1])], self[k] ] += 1
        self.M = np.matrix(mat)
    def _add(self, word):
        if word not in self.words:
            idx = len(self.words)
            self.words[word] = idx
            self.indices[idx] = word
    def __getitem__(self, key):
        if hasattr(key, '__index__'):
            return self.indices[key]
        else:
            return self.words[key]
    def __len__(self):
        return len(self.words)
    def count_pairs(self, initial, steps):
        inp = number_dict(all_pairs(initial))
        vec = np.array([inp[self[i]] for i in range(len(self))], self.M.dtype)
        res = (self.M**steps @ vec).A.squeeze()
        ix = res.nonzero()[0]
        # Note: would be better to skip this and do direct pairvector->elements here
        # using NumPy indices but eh. I copy-pasted `count_elements()` below.
        return dict((self[i], int(n)) for i,n in zip(ix, res[ix]))
    def count_elements(self, initial, steps):
        # Compensate for edge chars only being in 1 pair.
        total = number_dict(initial[0] + initial[-1])
        for k,v in self.count_pairs(initial, steps).items():
            for c in k:
                total[c] += v
        return total // 2  # Each element was counted in two pairs.
```

I believe this kind of failure to apply obvious “zero-cost generalizations” is the reason programming languages tend to end up like JavaScript more often than Ruby or Python.↩

# Project Euler

Haskell was my main language when doing Project Euler.

TODO: write things here, detail problems, expose Haskell code?

# CryptoHack

The first time I discovered CryptoHack I did all the tasks that were 100 points or more, since those were the only ones that looked interesting.

It’s a bit sad that doing the most difficult tasks nets little reward, as all that matters is your “total score,” which is completely dominated by the countless trivial problems. So it’s more a measure of stamina than anything else.

I went back later and completed everything though, all the riff-raff. That was
the most painful part. The tasks in the 30-80 range are the worst, since they
often require *some* work, yet the problems themselves are often trivial and/or
not interesting. Felt endless.

CryptoHack is not a fan of putting solutions online, but I’ll share with anyone who wants to contact me.

I heavily employed flagmining during my solve run. I wrote all the solutions as self-contained scripts in an Org file. E.g. this was the prelude:

```
#+begin_src python :session ch :cache false
from flagmining.all import *
from pwn import remote, context
from flagrs import *
context.encoding = 'utf-8'
class jsremote(remote):
    def recvjson(self):
        return json.loads(self.recvuntil(b'}\n'), object_pairs_hook=jsdict)
    def sendjson(self, d):
        return self.sendline(json.dumps(d).encode())
json_get = lambda *args: requests.get(*args).json(object_pairs_hook=jsdict)
#+end_src
```

And then a solution would typically look like this:

```
#+begin_src python :session ch :results output :cache yes
def enc(data):
    url = f'https://aes.cryptohack.org/lazy_cbc/encrypt/{data.hex()}/'
    return bytes.fromhex(json_get(url).ciphertext)
def test(ct):
    url = f'https://aes.cryptohack.org/lazy_cbc/receive/{ct.hex()}/'
    res = json_get(url)
    if 'error' in res:
        return bytes.fromhex(res.error[len('Invalid plaintext: '):])
    return None
null_block = enc(bytes(32))
key = xor_bytes(null_block[:16], test(null_block[16:]))
url = f'https://aes.cryptohack.org/lazy_cbc/get_flag/{key.hex()}/'
res = json_get(url)
print(bytes.fromhex(res.plaintext).decode())
#+end_src
```


# Games

I always wanted to make games, *but not really*. That is, I like “game design.”
That is, *my imagination far exceeds my persistence or diligence*.
That is, monsters like Unreal and the Web always frightened me and I was weak of heart.

And even 1980s Bill Gates would say my sense of design and UI aesthetics was awful.

So I don’t make games, I just play them. Sometimes.

## The Good

My favorites tend to be in the creative sandbox genre—building, optimization,
planning, that kind of thing^{1}. Games that focus on establishing the rules of
some system, not necessarily goals or solutions—those
are largely left for the player to explore, set their own goals, find their
own creative solutions, and so forth.

Factorio, for example, is probably one of the most perfect games ever made in my opinion.

## The ~~Ugly~~ Clever

Indie puzzle games are another big favorite, one of the biggest being *The Witness*, by the legendary naysayer Jon Blow^{2}.

But if you want a pro tip, Simon Tatham’s Puzzle Pack is available for (at least) Android, and it is, in my opinion, the best `DIVIDE-BY-ZERO` value you can get in a single app.

## The Bad

That said I also have an unhealthy amount of hours in grindy escape-games. Games
you can play relentlessly, desperately, in order to escape the septic fire that
is your life. Games that are endless logarithmic curves tucked away inside shiny
slot-machines. I call these *depression games*.

Path of Exile is the best of its kind.

It’s easy to identify the three tiers of players in these games:

- the ignorant casuals, just popping in for some relaxation. They’ve got shit to
do, places to be, people to meet, children to raise. These are the invisible
99%, and nobody really cares about them except the financial
department of the company that makes the game, who cares about them *a lot*.
- the miserable middle-class: they don’t have anywhere to be, and prefer avoiding to meeting, so they can play a hundred times more. Yet for all their grind they never feel wealthy or efficient enough. They oscillate between arrogant elitism and jealousy, crippling insecurity, beset by extreme envy of those that have more than them. They’re fuelled by self-loathing, depression, and a thousand real life problems they’re anxiously avoiding. They have everything to prove. They’re the reason a community grows toxic, as the clinically depressed mind is one of extreme cynicism and constant, relentless negativity. They live and breathe depressive realism, cursed to see the cynical redpill “truth” behind the veil of blissfully ignorant bluepill lies that others delude themselves with.
- actually well-adjusted veteran players: people who can no-life with the best
of them yet who are not motivated by misery, having nothing to prove. Valued
community members. Among *streamers* and public figures specifically, these are the ones who will succeed in that space, so from the outside it will *seem* like the majority of no-lifers are in this category—even though they’re the 1% of the 1% of mental health. Their very existence fuels the middle class’ misery, because they’re the successful brothers their parents love more, they’re the colleague who quit and got a better job—just like that—while you’re left behind nursing your dying soul under phosphorescent lights, trapped in a cubicle you hate but are too afraid to leave.

Guess my category.

## (Abstract) Board Games

In particular I’m a big fan of Go and used to play it a lot. I am or was around 1-dan to 4-dan depending on the ranking system used. Good enough to know I am bad.

It has inspired several programmatic stabs at making abstract game libraries or
UIs for such, and I was fully immersed in Google’s machine learning campaign
where they exploited Go as a platform for marketing their cloud computing^{3}.

But more generally I think of these games as across three axes:

- topology: usually a graph.
- mechanics: how “pieces” interact with the graph and other pieces.
- sequencing: the ordering of moves, the number of players, etc.

I’m a big fan of *variations*. That is, investigating what happens to a game
when some core part of its mechanics or sequencing is changed. For example, Go
where each player places two stones at a time is a very different game indeed.
(A drastic change to its sequencing.) Yet Go played on a torus is still
essentially very Go-like (since all the mechanics are retained and only the
topology is changed).

Notable fun games:

- TwixT
- Hex
- Amazons

## Creations

I’ve also made a ton of tiny things, not really worth mentioning. Maybe later.

### Omstokk [wip]

### Descramb [defunct]

### Empathy Web [defunct]

### Dual Snake

### QPlanarity

# Omstokk

Omstokk is a tiny-teensy anagram web game.

Mostly just shuffling around some calls to jQuery & jQuery UI. It looks like shit because my web design skills are on par with a blind Volvo engineer from the 80s.

The original plan was to use it to learn Flutter and Dart, but I lost my will to live pretty quickly while working in that…ecosystem?

## note (fun fact)

`aekrst` has the most anagrams in Norwegian (according to NSF), with **26** valid words.

Second place is `aeknst` with 25, and then there’s a steep drop-off to third place, `aeknrst`, with 19.

## note (anagrams)

Given a set of *permissible words*, for example a list of legal words in
Scrabble for a given language, the first natural reduction is to find
a normal form such that anagrams are equivalent. For example `hatter` and
`threat` are the same when talking about anagrams, as they consist of the
same letters.

The simplest idea is to sort the letters in each word:

```
def normalize(s):
    return ''.join(sorted(s))
```

This works fine: the keys are short strings, and it’s pretty efficient. More
complicated representations, like a hashmap of `char -> int` or an $|A|$-length
vector of counts, don’t really offer anything new, especially not in Python.

An alternative is to map anagrams to positive integers via the primes. Basically we map each unique letter in the alphabet to a prime number and the exponent of that prime is how many times the letter occurs.

If we sort the alphabet so that the most common letters map to the smallest primes, this allows for up to 12-letter anagrams to be stored in 64-bit words. (For the Norwegian word list.)

Now several common operations on anagrams-as-multisets translate to arithmetic
operations on the integers: the union of two anagrams (e.g. `normalize(a+b)`
if using the strings above) becomes plain multiplication $AB$; $lcm(A,B)$ is the *join* of two
anagrams (the minimum set needed to make every word possible from either set); subtracting the possible
words of one anagram from another can be expressed with $A/gcd(A,B)$; and so on.

Doing `gcd` and `lcm` like this might seem inefficient—and indeed, it is, for
Python’s stupid homemade bigints—but it allows us to use `numpy`, and `numpy`
über alles once we’ve crossed the threshold to speak of Python performance…
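The encoding itself can be sketched in a few lines. The letter-to-prime ordering here is the standard English frequency order purely as an illustration; a real implementation would derive it from the word list in question:

```python
from math import lcm, prod

# Hypothetical letter -> prime map; commonest letters get the smallest
# primes so typical products stay small (English frequency order assumed).
ORDER = 'etaoinshrdlcumwfgypbvkjxqz'
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
          59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
P = dict(zip(ORDER, PRIMES))

def encode(word):
    """Map a word to the product of its letters' primes (anagram-invariant)."""
    return prod(P[c] for c in word.lower())

assert encode('hatter') == encode('threat')           # anagrams collide
assert encode('ab') * encode('ba') == encode('abab')  # union = product
assert encode('hatter') % encode('hat') == 0          # containment = divisibility
assert lcm(encode('hat'), encode('rat')) == encode('hart')  # join = lcm
```

(`math.lcm` and `math.prod` need Python 3.9+ and 3.8+ respectively.)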

## note (an algorithm)

### Definitions

If $A$ is a multiset of letters, let $|A|_k$ be the number of $k$-letter words that can be made by using letters from this set (without replacement). Let $C_k$ be the set of all $k$-letter multisets (anagrams).

### Goal

We want an algorithm `R = ALGORITHM(len, lo, hi)` such that $lo \le |R|_{len} < hi$.

### Constaints

- we want all the letters in `R` to be “useful” in some way (e.g. part of a possible $k$-letter word).
- we want the letter-set `R` to be near minimal, tho not necessarily perfectly minimal (as that may constrain randomness?).
- we want the set to be sufficiently “random” such that a large percentage of (if not all) possible such letter-sets below a given length might be a possible outcome.
- we probably want the randomness to be weighted in some way toward more common combinations of letters?

### Variations

- is there an alternative algorithm that can compute multiple such sets in parallel? (probably yes.)

### Sketch of Simple Implementation

Probabilistic and slow:

- start with a random $k$-letter anagram $a \in C_k$ s.t. $|a|_k < hi$.
- if $lo \le |a|_k$, then terminate the algorithm.
- compute the $y \in C_k$ for which $y/gcd(y,a)$ is prime (i.e. $y$ adds exactly one new letter to $a$).
- pick a random (weighted) such $y$ with $|lcm(a,y)|_k < hi$. If impossible, abort and restart.
- update $a := lcm(a,y)$ and go to step (2).

Or in plain English:

- start with a valid anagram.
- if number of k-letter words is within bounds we’re done.
- find letters such that adding this letter means we’ll be able to form more unique k-letter words.
- add a random such letter, making sure we don’t exceed the bounds.
- repeat, retry on failure.
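The plain-English steps above can be sketched as a toy implementation, using `Counter` containment instead of the prime encoding, and a hypothetical eight-word list; `count_k` plays the role of $|a|_k$:

```python
import random
from collections import Counter

def count_k(letters, words, k):
    """|a|_k: how many k-letter words in `words` can be spelled from `letters`."""
    pool = Counter(letters)
    return sum(1 for w in words if len(w) == k and not (Counter(w) - pool))

def pick_letter_set(words, k, lo, hi, tries=200):
    k_words = [w for w in words if len(w) == k]
    alphabet = {c for w in k_words for c in w}
    for _ in range(tries):
        a = random.choice(k_words)               # step 1: a valid anagram
        while True:
            n = count_k(a, words, k)
            if lo <= n < hi:                     # step 2: within bounds, done
                return ''.join(sorted(a))
            # step 3: letters whose addition unlocks new words w/o overshooting
            cands = [c for c in alphabet if n < count_k(a + c, words, k) < hi]
            if not cands:
                break                            # stuck or overshot: restart
            a += random.choice(cands)            # step 4: add one such letter
    return None

# Hypothetical tiny word list:
words = ['cat', 'act', 'tac', 'car', 'arc', 'rat', 'tar', 'art']
print(pick_letter_set(words, 3, 4, 9))  # 'acrt': all 8 words spellable
```

The inner loop terminates because the count is strictly increasing and bounded; no weighting is done here, so the "weighted randomness" constraint is left out of this sketch.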


# Keyboard Layout

Being a special snowflake, I use a custom keyboard layout.

It looks like this:

*(layout diagram missing from this export)*

It is based on Colemak, but modified to give me easy access to the Norwegian
letters `æøå`

, and to put common punctuation and programming symbols on the
three base rows accessed through `AltGr`

. I avoid using the number line for
anything but actual numbers and some rare symbols.

I am pretty hungry for mod-keys in general. Ctrl, Alt, Shift tend to be for the application space. The left Win key I use for my window manager (XMonad or i3), AltGr for common punctuation, and the last one (Menu key or right Win key) for rarer symbols and unicode.

I favor combining characters over precomposed characters (except for the three Norwegian ones) as I feel it’s a more elegant solution when only occasionally needing access to foreign diacritics, tho I acknowledge that a lot of software has problems with combining characters.

# Choice of Colemak

When I learned to type in my childhood it happened entirely in a vacuum, I never
used techniques like touch typing, so I ended up with my own idiosyncratic way
of typing QWERTY. I only used two (left) or three (right) fingers for the main
letter keys, and the pinkies for things like `Shift` and `Enter`. Notably I
hardly ever used the ring fingers except for certain muscle memory sequences. It
worked pretty well, I could type comfortably at 80wpm, peaking at around 100wpm,
but of course, “it could be optimized.”

Every nerd with high enough openness probably goes through a phase where they, too, want to optimize their keyboard usage in some way. For me that optimization came in the form of switching to a better (more comfortable) keyboard layout while simultaneously learning proper touch typing. Like a full reboot of my muscle memory.

The two main competitors at the time were Dvorak and Colemak (then a relative
newcomer, but quickly gaining popularity). I tried Dvorak briefly, but found it
was *incredibly* painful to learn, progression was dauntingly slow, even after
several days. In contrast, Colemak was very pleasant from the beginning, and
early progression was very fast. When it became clear that Colemak would be my
new layout of choice, I went the extra mile by creating an early version of the
above-mentioned layout which I could install on Linux and Windows. In the end,
even having later looked at layouts like Workman etc., I’m very happy with my
choice^{1} of Colemak.

I started out with a singular mindset of wanting to get *faster* (which I did;
my peak with Colemak is around 125wpm), basically tuning out layout enthusiasts
when they started to talk about “ergonomics.” Ergonomics are for old geezers who
take breaks every hour, who need to eat real meals for their digestion, and
can’t hack an 18 hour code session. The zoomer mindset.

I’ve since predictably changed this very naïve mindset^{2}. Now I too would say
that *comfort* is the single most important factor that should inform
anyone’s decision vis-à-vis switching layouts. And Colemak is so, so, *so* much
more comfortable than QWERTY. It just *feels good* to type with it, in a way
that QWERTY never did.

# Half the Value of Colemak Is a Gimmick

The single, most important thing about Colemak is that it puts Backspace on Caps Lock.

People are not aware of how much they use Backspace. It’s *extremely likely* one
of your top five most used keys on your whole keyboard (this includes space,
Enter, everything). No matter who you are, no matter how good you are at typing,
it’s still a pretty safe bet.

*Any* layout would instantly be improved by an order of magnitude simply by
swapping these two keys. The original placement of Backspace deep in Timbuktu is
mind-boggling.

# Surviving Being a Snowflake

Choosing to employ a non-QWERTY layout also means you’re playing on hard mode when it comes to being a software user. Not that many people are mindful of the existence of non-QWERTY layouts, and there’s lots of badly written games and software that are entirely coupled to the layout, without any way of modifying it.

A big customization pain point can be vim and evil-mode. I refused to settle for
`hjkl`-navigation on Colemak, even if that is something that is often
recommended.

Inspired and enlightened by the configuration of a certain `theniceboy@github`,
I learned that the necessary modifications aren’t daunting. It turns out only two
sets of four keys need to be cycled in behavior to end up with a good Colemak
vim/evil-mode keymap.

Instead of using `hjkl` on a single row, it uses

```
        +---+
        | u |
+---+---+---+
| n | e | i |
+---+---+---+
```

Which I actually prefer, since it’s so similar to the regular cursor keys or the WASD-keys (QWERTY gaming).

The swapped chains are:

```
u -> k :: 'u' loses <UNDO>, becomes <UP>
k -> i :: 'k' loses <UP>, becomes <INSERT>
i -> l :: 'i' loses <INSERT>, becomes <RIGHT>
l -> u :: 'l' loses <RIGHT>, becomes <UNDO>
```

for the first step. Note that we unfortunately have to move `i`. It’s hard to avoid that. And the second chain we cycle is:

```
h -> e :: 'h' loses <LEFT>, becomes <WORD-END>
e -> j :: 'e' loses <WORD-END>, becomes <DOWN>
j -> n :: (OPTIONAL) 'j' loses <DOWN>, becomes <SEARCH-NEXT>
n -> h :: 'n' loses <SEARCH-NEXT>, becomes <LEFT>
```

Note here that the functionality of `n` (next regex search hit) ends up on the
very painful `j`, which is why I recommend also using something like EasyMotion,
Clever-f, Sneak, etc. and using `;` and `,` as a unified interface for
navigating search hits. That way we can drop `j` entirely, and it becomes a
‘free’ key for customization.

The only gripe I have with Colemak is the placement of `g`. It feels like a rather painful position, and I use it quite a lot (editors, programming, Norwegian). I would probably switch `b` and `g`, but have never done so.↩

Speed is not a *bad* metric though, because the better the ergonomics, the higher the ceiling is likely to be.↩

# Amphetype

While teaching myself Colemak I also made a type-training program for myself—Amphetype. The first version was written in some hypomania of non-stop coding, with the core idea and functionality completed after just two days. I would continue to improve it somewhat for about a month, as I was actively using it myself, though the post-Christmas depression that year was pretty bad and severed all my ties with it.

Much later I discovered that it was actually in use by other people, that others had found it useful. Some had implemented several modifications. Someone had even written a clone completely from scratch (in C#) (tho with fewer features as far as I could tell).

This inspired me to revitalize the project somewhat. I found my old code in the
Google Code archive and updated the code base aggressively, switching from Qt4
to Qt5. I also implemented a lot of fixes that I found posted from back then
(which I had ignored entirely), added a few new features, made it into a `pip`

package for easier installation, and so on.


# The Employees, by Olga Ravn

Stupidly I bought the book in English, not knowing the author was Danish.

- and my mind is like a hand, it touches rather than thinks.

Reminded me somewhat of *Notable American Women* by Ben Marcus in how it (seems
to) render the familiar alien, a sort of mystification of the commonplace.
Although the language style is markedly different, and not nearly as fully
developed here as it is in Ben Marcus.


# The Foundation, by Isaac Asimov

I recently caught some episodes of *The Foundation* TV-series and quite enjoyed
it. There’s a ton of interesting concepts, engaging characters (Empire being my
favorite), good actors (I love Jared Harris), all encapsulated into a well-built
world. It does have its share of trope cheese, like equating “hero” and “action
hero,”^{1} forcing all heroes to do epic action hero things. It also milks the
young-protagonist-with-a-gifted-mind cliche to a higher state of climax than
even *Dune* or *Ender’s Game*.

I hadn’t read any of the books, though they’ve been lingering on my optimistic to-read list since I was a child. As good an opportunity as any, so I started with the eponymous “book 1.”

It is a smooth and easy read, though it admittedly became a bit of a disingenuous
hate-read on my part. The basic summary is that I wasn’t a big fan of it;
I had all the wrong expectations. It’s one
of the few instances where I think the TV-adaptation trumps the source,
easily^{2}.

One of the great masterworks of science fiction …

Yeah, I don’t know, Bob.

First let’s trace the difference between my expectations and the actual text
though. It quickly became apparent the TV-series is *extremely* different from
the book(s^{3}), not only in content, but in structure and style. The TV-series
shifts between several complex parallel narratives and stories (probably weaving
in parts of the other books?), sometimes non-chronologically, whereas the book
is very simple and straightforward.

On the superficial side, the TV-series adds a touch of wokeness by upgrading
half the characters’ genders, free of charge. It became abundantly clear why
they did it as I read the book, though. In it, as far as I can recall, women
didn’t even exist *as a concept* until some 80% through^{4}. No sisters,
mothers, et cetera^{5}. It’s a book of capable, asexual men, much in the spirit
of the short stories of Arthur C. Clarke^{6}.

But that’s a trifle compared to the heavy-handed story rewrites and deep-tissue archetypal transformations the characters go through. The differences there, too, border on the comical.

Take Gaal, for example. In the TV-series, she is a super-special ~~Kwisatz
Haderach~~ genius with unknown but seemingly unbounded mental capabilities. She
is furnished with an emotional backstory: rejected by her own kind (religious
luddites) for her super-special super-genius, she brings shame on her family
simply by being what she is. Having solved one of mathematics’ (big?) unsolved
mysteries, she arrives, dramatically, in the capital city of the Galaxy, the
lavish heart of all things, on the personal invitation of the great mathematician
Hari Seldon. Other clues about her special-ness are given on the journey, like
the fact that her conscious mind can withstand a space-time jump, when ordinary
mortals must be unconscious for the duration. She impresses the begrudging Hari
Seldon with her abilities, even though he had very high expectations already.
Stuff happens, politics ensue, but her path from peasant to political player is
clearly delineated. She goes on trial with Hari, they are exiled. She’s
hurriedly granted a love interest in Hari’s foster-son, allotted moments of
heroism, and so on. If anything her story is “over the top.”

In the book *he*, Gaal, is just some random peasant with a fresh doctorate
degree, invited to Trantor to work for Hari, a “renowned psychologist.” Beset
with some vague “British abroad” mentality, his first course of action after
checking in to a hotel is to ask the concierge about going on a space tour or
sky tour or whatever it was. Afterwards he is visited by Hari who doesn’t seem
too impressed, but warns him about the government cracking down on their
project. The Hari-trial happens, as a formal question-answer excerpt. However,
despite Hari supposedly having a hundred thousand people following and working
under him, probably scores of lieutenants or trusted nth-in-commands, he (and
the city’s government) chooses this unknown and unfaceted bumpkin boy as his
prime associate. They inexplicably involve him in the trial. Hari grooms him
with expositions on what’s what, explains all the things he did in the TV
version. But Gaal being given a temporary central position in the plot here
seems to have little to do with logic, merit, confluence or even chance; it
feels a lot more like a rushed plot simply doing the readers a favor by
installing him as a privileged observer. He has no story and does nothing but
observe and listen to Hari, offering only the weakest and shallowest responses.
He lasts about 40 pages in the book and is then promptly forgotten completely as
the book jumps forward in time. He seems a nobody^{7}.

I freely admit I fell for the TV-series first, and that’s no doubt what shaped my preference for it. This is my own bias. In another timeline I might equally argue the opposing viewpoint, applauding the book for starting off in a much more mute and unassuming manner, detailing the perspective of some random pawn in the grand scheme of things, without immediately trying to sell us the trope of grand heroes strangled by their own rarefied specialness…

Anyway, that’s part one of the book. The book consists of five parts, the episodes I’ve seen barely covered one-and-a-half of them. The second part was likewise totally different. And there’s a ton of stuff in the series (the cloned emperors) not even mentioned in this first book; I suspect those are part of the other books in the series.

Speaking solely of the book, it’s a multi-generational epic about politics and
socio-economic forces at the grand scale of civilizations. The use of religion
to consolidate the power of the masses, allowing the many to overcome the few
with superior firepower. The use of wealth and economy (“money power”) to
further consolidate large-scale power in a technologically advancing secular
age, and so on. Which *sounds great*, on the surface of it, but…

[..] Now what do you suppose will happen once the tiny nuclear generators begin failing, and one gadget after another goes out of commission?

The small household appliances go first. After a half a year of this stalemate that you abhor, a woman’s^{8} nuclear knife won’t work any more. Her stove begins failing. Her washer doesn’t do a good job. The temperature-humidity control in her house dies on a hot summer day.

What happens?

Which is taken from one of the predictable patterns of the book. Each
section is structured around some “crisis of civilization” that the Foundation
faces. These crises are all solved by confident, resourceful men who calmly and
cleverly find some way to make the right political choice, which is, naturally,
the opposite choice of the panicking mob or less cool-headed politicians. Each
of them is then allocated a scene with some underling, side-kick, or opponent,
cut from a denser cloth, to whom they explain what will happen (or has
happened), how everything fits together in the grand
socio-religio-cultural-economic scheme, in grating pseudo-monologues. The
monologue-enabler naturally fails to see the big picture (just like us!) so the
hero also gets to gloat with phrases like “you’ve missed the mark entirely my
boy,” “no, that’s not it at all, you fail to understand the real issue!” or
“don’t you see? It’s obvious!”^{9} before revealing his raw, throbbing insights
and out-of-the-box-thinking. I’m no psycho-historian, but those insights also
come off as a bit naive.

That’s all about the *content*. The *language* inspired a number of thoughts of
its own, but they are not limited to this particular book. So I will instead
switch to discussing “science-fiction and fantasy books I don’t like” (a concept
I’ll henceforth abbreviate).

# SFFBIDLs (Science-Fiction and Fantasy Books I Don’t Like)

… is a narcissistic term, but this clearly isn’t an objective or academic
“critique”—it’s a personal rant. Little of what I say can be applied
universally. *My* main issue with books is usually on the point of language,
not plot or characters or subject-matter, which is clearly not universal. And
I’ve long had such an issue with fantasy (most) and science-fiction (some),
without clearly tracing out what it is. I’d like to try that here.

It comes about, I think, when language is used in a “deplorably practical” way.
The books are mere feedbags of words, *an ugly mechanical pump*, strapped to the
reader’s face, pushing the language into the reader’s eyes and mouth as fast as
they will go. After all, the whole point of it is the belly-warming plot and
mouth-watering world-building, so why care about the method of delivery?

The easiest approach is to give examples of what it is *not*. I’ve had the fortune to
read a lot of sci-fi of which I have no traumatic SFFBIDL-memories: works by
Delany, Russ, Le Guin, Huxley, Lem… Fantasy fares a bit worse here, as Tolkien
is the *only* example I personally know of that is not SFFBIDL. I’m not sure
what it is about fantasy in particular that attracts SFFBIDL writing, I
honestly don’t see a reason for it, but I strongly suspect there is a root cause
even so.

## A Furtive Rant

The cliche of bad writing using too many adjectives is well-known and well-condemned, but I would like to wag an equal finger at writing that attempts to cheat condemnation by crowding a dozen “theatrical verbs” onto every page. These verbs are but adjectives with Groucho glasses.

Lips twitch and quiver; eyes shine, blaze, or twinkle, but mostly squint or widen; eyebrows—sometimes also
the forehead, as the two are often conflated into “brow” for unknown artistic
reasons—will knot, wrinkle, or be raised in surprise, but they *always* come
down frowning or scowling. There’s so much frowning and scowling. Voices harden,
soften, waver, or crack. Hands are placed, clenched, clasped, opened to
revelation. Unnamed body parts are forever fidgeting (hands) or shuddering
(shoulders). Anger and temper are functional synonyms, as they both spark, rise,
break, boil, before being subdued, reined in, masked, and controlled. Figures
skulk, servants scurry; women, more than men, startle and shriek; men, more than
women, grumble and growl. Heroes sometimes grin (like the Spanish), chuckle
(unlike the British), or snort (in the German tradition)—all verbs which
villains tend to avoid unless they can affix it with an appropriately villainous
adverb (*contemptuously*, *derisively*, *coldly*, and so forth).
Evil characters prefer to smirk, sulk (young), or condescend (adult) instead.

Certain verbs are worse than others when it comes to adverb smuggling. Smiling
is a notorious one, smiling is hard to do in a single word. One smiles
sardonically, grimly, wearily, coldly, distractedly, sarcastically, or *to
himself* (rarely herself). Faces, or the aforementioned brows, are also good
smugglers, being hard, soft (like voices), grim (like smiles), grizzled, weary
(there it is again), or even intelligent^{10}. Gestures are never left alone to
be themselves, but always pushed to either extreme: wild, ostentatious; or
hidden, furtive.

The descriptive language in SFFBIDLs is so cyclical and repetitive it can become
entirely too familiar. When I was younger, reading mostly these books that I now
disdain (gah, I, too, am *afflicted*), I thought that “furtively” meant “in an
annoyed, peevish, or sulking manner.” Someone who was *furtive* was annoyed or irritated, usually in a somewhat childish or ineffectual way.

I blame the strongly incestuous vocabulary of SFFBIDLs. All the furtively stolen glances by peevish or sulking villains, the furtive looks by the prince at the girl he desires, annoyed by her love for another, the emperor with wounded pride who furtively signals his soldiers to attack, the child who hides in the grass to peer furtively at the strangers from another planet, the petty man furtively looking over his neighbor’s shoulder at the farming geegaw that was never returned to him, the exposed con man who furtively signals his partner for help, and so on. You see, it’s an easy word to infer from context!

I never looked up the word, even though I tend to do so with unfamiliar words. I was already more than confident in my definition. Reading SFFBIDLs it made perfect sense. I confidently used the word myself, no doubt causing comical confusion.

I wasn’t set straight until well into my late 20s, when an author of general
fiction used the word in what I thought was a very strange way. I believe it had
to do with a “furtive affair,” that was nonetheless sexually passionate, devoid
of any love-hate peevishness that the word seemed to indicate… The shock of
discovering the real meaning left me feeling a deep kinship with *Crome
Yellow*‘s Denis^{11} and his love for the “romantic” word *carminative*.

## You Must See What I See

It seems to me there’s an anxious obsession among writers of SFFBIDL to describe everything. I’ve seen would-be science-fiction and YA writers ask questions online, “What’s the best way to describe my character’s luscious red hair in a natural way? I originally had her consider it when she looked in the mirror, but now I’m having doubts. Could I have another character comment on it? Or do I simply offer a description of her in the narrative when introduced?” Or: “my character has lost an eye, how can I make her eye(s) widen without making it ridiculous?”

These kinds of surface descriptions, especially when it comes to how people are
perceived or look, gain tenfold traction when it comes to how they act or
behave, naming all the actions of their characters, *he picked up the cup*, *he
turned it in his hands*, *he looked at it*… It makes me wonder if earlier
drafts didn’t have sentences closer to “he extended his right arm slowly (XXX:
but not unnaturally slowly?) to pick up the red cup (chipped on the side?) by
its metallic handle (it was cold), admiring the glimmering nubolites below the paint’s surface (fine Qrtzyxxcyan craftsmanship - include brief history of Q. as peace-loving artisans?).”

It feels all too common among SFFBIDL writers (and readers!) to approach writing as one would approach a sort of mental cinematography or directing, showing only the most epic movies.

The scenes’ exterior actions are described moment-to-moment, but if the reader
is granted access to the character’s thoughts and feelings, it is by contrast
unlike any moment-to-moment subjective experience. The contrast—or rather, the
*lack of contrast* between outer and inner—can be startling. The thoughts of
SFFBIDL characters never stray from what is immediately relevant, follow
naturalistic cause-and-effect laws, and remain fixed by the exterior of the
scene. Their inner lives are but the scene interpolated inward. Even when the
plot calls for anxiety, doubt, and vicissitude—chaos!—they will barrel down on
it harder and more stubbornly than Dostoyevski’s Raskolnikov. SFFBIDL characters
have an ADHD diagnosis rate of 0.0%.

And because the diagnosis rate is probably reversed in readers, we have the
mania of over-attributing spoken dialogue. It’s clarity gone mad. The authors
usually limit the number of characters appearing on the stage together.
Otherwise the text would *have to* laboriously attribute all dialogue with
repeated “he said,” “said Grynnyk,” “said Budwoein,” “said Mmmumleif,” and so
on, wouldn’t it? Lest there be confusion! But sometimes the script calls for
three or more people to speak in the same room, necessitating a solution.

And the solution is again a verb cheat. *He growled*, *she hissed*, *he spat*,
*she all but shouted*, *he barked*, *she cried*, *he lamented*, *he admitted*.
Characters start to declare, observe, and offer all their lines of dialogue.
This, *on top of* all the descriptive trills like “with a scowl,” “wincing,”
“with a sardonic smile,” “while trying not to laugh.”

I imagine many SFFBIDL authors started out by constructing some kind of mental
spinwheel of adjectives, adverbs, verbs, and so forth, which can be used for
generating these pseudo-random artistic choices. Lesser artistry such as
alliteration, surprising collocations, suggestive connotations, etc. is
necessarily stripped away. Yet isn’t this the language of the mechanized
intelligences SFFBIDLs were enamoured with mid 20th century^{12}? In general
fiction I’d grant slight differences in meaning between a person’s expression
being “tranquil,” “calm,” or “serene,” but I don’t extend that trust to
SFFBIDLs.

This mistrust seems to be mutual. SFFBIDLs seem to get terribly anxious that the
readers won’t “experience” the movie in the same way, or see it the way it’s
intended to be seen^{13}. And they definitely *do not* trust the reader to
figure out tone or sarcasm for themselves, or allow a different interpretation.
Is it a matter of canon or control? If the authoritarian author intends a
character to be empathetic even though she’s forced into making tough and brutal
choices, we cannot have her being thought of as callous by anyone! Those readers
would be *wrong*!

Tone is not fool-proof. And it takes a long time to establish and familiarize the
reader with, even when it can be applied. It can hardly be justified when there’s
the entire galactic history to go through, or a fifteen-year D&D campaign
spanning three continents! It’s more economical to steady his voice, reply with
a blank expression, disguise his fear, mask her anger, mutter under his
breath, give a sarcastic laugh, snort contemptuously, or, of course, indulge a favorite
of *The Foundation*: *smiling sardonically*.

It takes a lot of effort to make characters alive to the point where readers,
especially casual ones, can be trusted to *know* those characters. To know them
to the point of being able to tell their mood from just a few words on a page
that carry no picture or sound on their own. So signals are needed; SFFBIDLs
just err on the safe side—far, far on the safe side. That’s a rather unique
feature of SFFBIDLs: you can pick up a new book, open it at a random page, and
still be fully informed of the emotional state of the characters, *visually* up
to date, as they cower, shudder, sheepishly admit, hold back tears, and give a
booming chuckle. Every new paragraph constantly updates all the external data
relevant for the mental screening. Just as you can walk in mid-plot of a movie
and plainly *see* that the character is sad.

Hardin had almost gotten out of the habit of laughing^{14}, but after Sermak and his three silent partners were well out of earshot, he indulged in a dry chuckle and bent an amused look on Lee.

*Dune*, *The Foundation*, *The Wheel of Time*, the *Malazan* books, the *Night’s
Dawn* trilogy, etc.^{15}—they all give a vibe of being… transcribed? As if
they started out as (epic) theater plays or films from the author’s private movie
studio, only later made into books. Then again, maybe most fiction
writing is like that? Polished transcripts from the imagination. So then what—

A sort of militant misapplication of show-don’t-tell? “So what you’re saying is
I shouldn’t *tell* the readers my character is happy, I should *show* it? Hm, so
I guess I will make him laugh or something? Or smile or—the *clack*, *clack*
sound of a spinwheel losing momentum—*give his eyes a mischievous twinkle*…
Hey, you’re right, this is good stuff!” In every paragraph, for as long as the
emotion subsists! May every action show its telling! Amen.

## Does It Have to Be This Way?

**INT.**: A tavern in a destitute part of the land.

Early afternoon, still bright. Thin smoke fills the room, piss-colored swill is served as mead, the late-autumn rain blows heavily against the windows; it appears bitterly cold. The fireplace is recently lit, not yet warm. The early patrons are all quiet, sitting apart, avoiding eye contact. No one is talking this early.

There’s a scene, taken from an arbitrary fantasy universe. Peasants, miners,
never mind the stereotype. Uneducated and consigned, faceless men whose lives
play the same role as women in *The Foundation*. I.e. they are merely referred
to obliquely (or *en masse*), in service of world-building some sadistic king or
whatever.

Collate and compare a dozen imagined SFFBIDL-descriptions of this scene. The hero walks in, sits down, and, looking around at the poor-folk who surround him—what? Try to capture the mood that envelops him.

Weary, tired, or haggard men? Miserable? You’re given free rein to use words
like “abject,” “gloomy,” and “despondent” if you deem it necessary^{16}. Just
mind we’re not exactly talking about a “writhing in hellfire” kind of misery
here, but the grey misery reflected in the weather outside. How do we draw a
circle around the emptiness of drinking piss mead early in the afternoon?

How many authorial voices would come up with something distinct, or even surprising? How many are merely writing down permutations of the next writer with varying synonyms?

Consider: “All around them men drank alone, staring out of their faces.”^{17}
This sentence isn’t mine, nor is it from a fantasy or science-fiction book. But
what’s stopping SFFBIDLs, in particular, from coming up with such sentences in
their own style?^{18} Nothing, except that *SFFBIDLs rarely have any detectable
style*. For all their colorful adjectives, adverbs, and verbs, the *voice* tends
to be colorless: flat, authoritarian, and monotone.

Instead it’s basically what I’ve outlined above: heavy-handed, but plainwoven: the piss-colored swill, the weary, haggard men, the grizzled faces, eyes peering out suspiciously from under heavy brows, the disheveled, threadbare garments, et cetera, et cetera, with synonyms permuted appropriately.

And that’s where the D comes in.

This action hero obsession seems to be American TV’s eternal priapism.↩

Other easy examples are *American Psycho* and *The Rules of Attraction*.↩

Of course, I can only speak for book #1, *The Foundation*. I imagine the series borrows non-linearly from all the other books.↩

However, when gender does make an appearance, it is definitely with a boom!

Mallow drew gently out of an inner pocket a flat, linked chain of polished metal. “This, for instance.”

“What is it?”

“That’s got to be demonstrated. Can you get a woman? Any young female will do. *And* a mirror, full length.”

“Hm-m-m. Let’s get indoors, then.”

Remember, this is (probably) the first time—and how—the book acknowledges women exist. From there on, for the last 60 or 70 pages, several women are mentioned though. An unnamed daughter is used to euphemistically speak about wartime rape, the insouciant hero gives a smart-aleck reply as to why certain details were left out of a report by asking if they want the name of his mistress too, and so on. The only wife to make an appearance is a fully-fledged Shakespearian *shrew*, who does nothing but belittle and emasculate her much-older husband (who seems to hate-desire her in turn).↩

Addendum: no, I recall now (later) that it was actually briefly mentioned in a sentence early on, where Hari Seldon talked about his followers, saying they had *families*, and perhaps the word women was even used.↩

Who at least acknowledge that women exist, even if they’re just a “necessary evil,” sirens calling men away from the noble pursuit of science.↩

Hari doesn’t seem to have a foster-son in this universe, so no love interest either.↩

This is from the last part of the book, when women have been introduced as a concept.↩

This is somewhat reminiscent of stuff like Jubal Harshaw’s speeches in *Stranger in a Strange Land*: speeches held forth to some weak interlocutor who must usually concede to the wisdom and intelligence of Jubal (roman-à-clef), or at least bear it out.↩

What are intelligent faces, anyway? I read about them constantly, but I’m still not sure what they look like. Asking for a friend who suspects he has a stupid face.↩

A fantastic extract:

‘It’s a word I’ve treasured since my earliest infancy,’ said Denis, ‘treasured and loved. They used to give me cinnamon when I had a cold - quite useless, but not disagreeable. One poured it drop by drop out of narrow bottles, a golden liquor, fierce and fiery. On the label was a list of its virtues and, among other things, it was described as being in the highest degree carminative. I adored the word. ‘Isn’t it carminative,’ I used to say to myself when I’d taken my dose. It seemed so wonderfully to describe that sensation of internal warmth, that glow, that - what shall I call it? - physical satisfaction which followed the drinking of cinnamon. Later when I discovered alcohol, ‘carminative’ described that similar, but nobler, more spiritual glow which wine evokes not only in the body but the soul as well. The carminative virtues of burgundy, of rum, of old brandy, of Lacryma Christi, of Marsala, of Aleatico, of stout, of gin, of champagne, of claret, of the raw new wine of this year’s Tuscan vintage - I compared them, I classified them. Marsala is rosily, downily carminative; gin pricks and refreshes while it warms. I had a whole table of carmination values. And now,’ Denis spread out his hands, palms upwards, despairingly, ‘now I know what carminative really means.’

‘Well, what does it mean?’ asked Mr Scogan, a little impatiently.

‘Carminative,’ said Denis, lingering lovingly over the syllables, ‘carminative. I imagined vaguely that it had something to do with carmen-carminis, still more vaguely with caro-carnis, and its derivatives, like carnival and carnation. Carminative - there was the idea of singing and the idea of flesh, rose-coloured and warm, with a suggestion of the jollities of mi-Careme and the masked holidays of Venice. Carminative - the warmth, the glow, the interior ripeness were all in the word.’

C.f., as a small example, *Dune*’s incessant reiteration about the Fremen’s blue eyes. The blue eyes—no white at all. No white at all. Blue-within-blue. No white in their eyes. No white at all!

Chill out, my dude. I kept waiting for some meta-insert like, “Hi, it’s me, Frank. Frank Herbert. Sorry, I don’t mean to jar you with this meta-narrative interruption, but these eyes, the eyes of the Fremen, they’re just really, *really* blue, OK? I can’t impress upon you how blue they are, and how spooky it is seeing them. I know I just keep repeating the same phrases whenever they appear, but I’m not sure what else to do. It feels important to me that you’re as impressed as I was when I saw them in my dream. It really freaked me out, and I’d really like to capture that feeling. There’s really no white at all—at all! It’s so disturbing seeing human eyes that way. If you saw it like I’ve seen it, you’d know what I mean.”↩

A barefaced lie.↩

Want to add *Ready Player One*, but that has its whole other slew of problems. There’s definitely also parts of Heinlein and Iain M. Banks to be mentioned… But my memory is hazy.↩

Though for some reason SFFBIDLs tend to stay away from the words “depressing” or “depression.” Is depression not colorful enough?↩

From *Angels* by Denis Johnson, by way of a note by David Foster Wallace mentioning five underappreciated novels.↩

Case: Delany’s writing, for example, is densely woven with these unassuming sentences heavy with meaning. “He thought his own thoughts, occasionally glancing to wonder what hers were.” Or this little exchange:

“[..] is it all right that I remembered your poem; and wrote it down?”

“Eh…yeah.” He smiled, and wished desperately she would correct that comma.

# Aetherial Worlds, by Tatyana Tolstaya

Short stories.

# Havfruehjerte, by Ingvild Lothe

- There are so many impossibilities in the world; I can sit all day and think about impossibilities.


# Endless Consumption

A year of gluttony.

## A+pain

Oh my dear life and heart. Rarely pain-free or comfortable. Sometimes a tad on the intense side. Hard to watch alone.

#### Normal People

#### Succession

#### Scenes from a Marriage

#### White Lotus

#### Easy

#### Gilmore Girls

#### The Office (US)

## A-tier

Great, but easier on the soul. This is Goldilocks TV—soothing for mind and stomach.

#### The Foundation

See The Foundation.

#### The Kominsky Method

A show about old actors having trouble peeing.

#### Osmosis

#### Katla

Beautiful cinematography.

#### Brand New Cherry Flavor

#### The Marvelous Mrs. Maisel

Love the character of Maisel’s father, Abe.

#### The End of the Fucking World

#### The Expanse

All but season 5. Season 5 is awful.

#### The Twelve

## C-tier

Fine, enjoyable; don’t regret watching it, in the literal sense.

#### Deadwind

#### Rita

#### The Sinner (season 1)

#### Mythomaniac

#### Newsroom

#### Babylon Berlin

#### Here and Now

#### Queen’s Gambit

## D through F

Little merit, try again, or actively trying to cause me pain.

#### Killing Eve

The story is pain, the characters are pain, the writing is pain, all is pain.

#### Squid Game

A group from the Anime Acting Academy fight to the death in this subtle social
commentary of economic inequality. *If* there is a plot twist at the end, it is
completely natural and not at all forced.

#### Star Trek: Discovery

I was taken in at first because it seemed so grand… Crisp visuals, great computer graphics and effects—

But the show seemed to get high on its own special sauce, insisting on constant mindless action, especially favoring the most ridiculous form of action in a hyper-advanced universe full of magic science: hand-to-hand combat.

I’d have liked it much better if it chilled out and rather committed to the silliness and deadpan humor.

#### The Stand

#### Dead to Me

Two female stereotypes joined at the hip. The control-freak shrew with a raging temper, taking it upon herself to make the tough choices, to become hard when others are too soft. The wide-eyed bird-like nurturer, forever tending her neighbor’s flowerbed between her yoga classes.

Yet the show seems to be heavily intended for women. Perhaps giving each a pole to identify with?

# Ex, Unsorted

- Emma
- The Hobbit
- Dune
- The Remains of the Day
- Invasion


## A part of the series *I Hate the Web*

I used to generate HTML from a single big Org-mode file. But it grew and grew and the exported page became sluggish, as the poor little export theme I was using didn’t seem to handle that much content well.

So today I ~~spent~~ wasted *more than ten hours* looking for a simple way to
generate a static site from Markdown-like data.

Conditions:

- A simple dark mode theme.
- Self-hosted, no third-party website involved.
- Simple and sane generation from plain files (Org or Markdown), preferably without needing to edit any layouts or templates.
- KaTeX support, preferably offline as a post-processing option.
- Code highlighting by default and the usual stuff.
- Time from download to having a command-line setup to generate output should be less than two hours.

Easy, right?

- **Jekyll**: Seemed like a pain to get offline KaTeX rendering going, plus it’s Ruby. So it was pretty easy to dismiss, because I expected it to be easy to find a good alternative.
- **Hyde**: Python **2**?! Instant delete.
- **Pelican**: Pure Python, which is nice. Found a plugin for math, lots of themes, was raring to go. But then as I went through the themes and templates I started to despair… The few dark ones I found were bad or broken.
- **nikola**: Looked pretty clean and well-designed. Also Python, checks most of the boxes, but has *one* dedicated dark mode theme, and it doesn’t look very good.
- **Retype**: Tons of good usability features by default, but closed source? There wasn’t a way to do the simplest modifications. It seems they’re preparing to go commercial with it.
- **Hugo**: Some promising themes and layouts. I’d given up on offline KaTeX at this point, so I was in compromise mode. I was so tired that I actually started using it. After several hours I was deep into editing HTML templates for page listings and whatever other nonsense before the existential question “what am I doing with my life?” jolted me awake.
- **hexo**: A convoluted and bloated mess.
- At some point I also started to seriously consider just writing some scripts of my own using **pandoc**, but then I’d have to generate indices and menus and…

In the end I just gave up and went with mdBook even though it lacks several features (the side bar being extremely limiting, not allowing links to sub-sections or generic content).

When I’m less depressed I might go back and try Jekyll or some other alternatives again.

# Web = Borg

A problem I often run into with huge monolithic cross-platform environments is that there will be a lot of vague problems reported that tend to be difficult to reproduce or nail down. For example, the startup time for the web application is *sometimes* too long, *sometimes* it gets stuck on a compilation step for over a minute, certain UI elements are *sometimes* super sluggish, *sometimes* a dragged widget is suddenly dropped. In the related issue threads there will just be a ton of “can’t reproduce”s, “I have no idea”s, and “closing because not helpful”s.

The web in particular is where big, monolithic systems can go to breed and grow fat. The scope and breadth of the web is so big, there’s so many disparate technologies stacked on top of each other, some dating back to the 80s, some switched out every year, so there’s always room to grow the Borg frameworks to cover even more ground. The web offers a smörgåsbord of huge, redundant, broken, impossible standards to implement, with a free roll of duct tape for every meal.

The web can only grow. The web assimilates all technology and platforms. The web is the place to be.

In a way, it’s beautiful. It’s the technological primordial soup before our very
eyes. Yet cold and inexorable like time itself; it *will* drown us all.

In other spheres like Haskell you can suffocate by climbing too far up the ivory tower of mathematical abstraction, forever chasing some category theoretical ideal of beauty and elegance, but getting nowhere in the process.

With the web it’s the opposite: everyone surviving on the web is *obsessed* with going places, getting things done, “whatever works will do.” This ruthless pragmatism led to the birth (and revolting success) of phenomena like PHP, a blight upon the minds of the people.

# AXIOMS FOR PROGRAMMING LANGUAGES:

## AXIOM 0: 0-indexed > N-indexed

The rationale comes from basic group theory. Indices are more often treated as
elements of an additive group than a multiplicative one. I.e. indices are more
often *added* than they are *multiplied*, so it is natural for the *base*
index to be the additive unit, namely $0$.

Cf. dynamic programming algorithms that build up a solution using arrays, where the inductive base case naturally corresponds to index zero.
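As a minimal sketch of that point (my own illustration, not from the axiom itself; `count_coin_sums` is a made-up name): in a typical bottom-up DP, the base case sits at index $0$ and all index manipulation is additive.

```python
# Illustrative sketch: with 0-based indexing the DP base case is the
# additive identity, and all index arithmetic is addition/subtraction,
# never scaling.

def count_coin_sums(coins, target):
    """Number of ways to write `target` as an *ordered* sum of `coins`."""
    ways = [0] * (target + 1)
    ways[0] = 1  # base case: the empty sum, naturally at index 0
    for t in range(1, target + 1):
        for c in coins:
            if t - c >= 0:              # indices are shifted by addition,
                ways[t] += ways[t - c]  # never multiplied
    return ways[target]

count_coin_sums([1, 2], 3)  # 3, namely 1+1+1, 1+2, 2+1
```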

## AXIOM 1: AST macros > templates > language syntax

Examples of languages that do it right: Nim, Scheme.

Rust is an example of a language that tried to do the right thing, but failed horribly at it. Its AST macro system is pathetically gimped and cowardly.

Programming languages that did not adopt an ethos of meta-programming or AST manipulation early on have invariably ended up feeling extremely stale and static (e.g. Java).

Imagine a puzzle game: a language in which you can easily work with widgets, manipulate stateful UI changes, animations, etc. on the front-end probably loses a lot of its naturalness on the back-end, when you try to express the game rules in a logical, declarative way.

Different problems are usually solved using different languages, so a language in which you can program the language itself is inherently more versatile and has a wider scope than a static language without any meta-programming capabilities.

Yet modern languages like Go and Dart purposefully break from this axiom, chasing the success trajectory of C, Java, et cetera.

In static languages new features must be constantly added to improve ergonomics:

- Good news! We’ve added a special short-hand syntax to the for-loop statement for iterating over arrays!
- Good news! You can now implement custom iterable types that can be used in for-loops!
- Good news! For loops now have optional syntax to make the index available during its scope.
- Good news! We’ve added a special syntax for using for-loops as an inline expressions which evaluate to arrays.
- Good news! You can now collect expression-for-loops into custom data structures that implement interface so-and-so!
- Good news! We’ve now added special syntax support for writing generator functions that compile into regular for-loops!
- … and so on.

Some languages end up adding a *ton* of such special cases or additional syntax,
almost all of which could have been made as user-land macros or
user-controllable libraries if the language had a non-static parsing process in
mind from the start.

Some macro-less languages get extremely far with no AST macros, though, especially dynamically typed languages or well-designed functional languages. Notable examples here are Python and Haskell. The former has excellent meta-programming capabilities through run-time reflection, and the latter is hard carried by the functional programming paradigm’s emphasis on raw expressive power.
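To make the Python claim concrete, here’s a tiny sketch of the run-time reflection I mean: ordinary code building a class at run time, no special syntax required. (`make_record` is a hypothetical helper of my own, not a stdlib function.)

```python
# Run-time metaprogramming in Python: classes are just objects that
# ordinary code can construct with the three-argument type() call.

def make_record(name, fields):
    """Build a class with a positional __init__ and a nice __repr__."""
    def __init__(self, *args):
        for f, a in zip(fields, args):
            setattr(self, f, a)

    def __repr__(self):
        vals = ', '.join(f'{f}={getattr(self, f)!r}' for f in fields)
        return f'{name}({vals})'

    return type(name, (), {'__init__': __init__, '__repr__': __repr__})

Point = make_record('Point', ['x', 'y'])
p = Point(1, 2)
repr(p)  # 'Point(x=1, y=2)'
```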

## AXIOM 2: 0-cost >= safety

TODO

## AXIOM 3: expressions > statements

Or: why functional gets so many things for free.

TODO

## AXIOM 4: integrated build system > third-party

TODO

Note: see https://youtu.be/Gv2I7qTux7g?t=2111 about Zig’s `zig build`.

## AXIOM 5: libgmp > homemade bigints

People need to stop rolling their own arbitrary precision integers.

Integers are *extremely important*, and they need to be extremely efficient. No language that rolls its own toy big-integers has ever gotten close to par with GMP. (Cf. Python.)

## AXIOM 6: type classes > object-oriented

TODO

## AXIOM 7: algebraic pattern matching > if-chains or switch-statements

TODO: should be obvious.

## AXIOM 8: cohesion > gimmicks > pandering

TODO

# Markdown is Ugly-Adjacent

Markdown is simple^{1}. Markdown is loved because it is simple.

Simple is good. Simple is better liked than complex.

But `simple <-> complex` is not the only axis that matters here.

There’s another axis, which I’ll name `malleable <-> rigid`.

A piece of paper is simple, yet it can be folded in on itself to create
incredibly complex origami figures or paper planes. A slab of concrete is also
simple, but there’s not much you can *do* with it except to just place it
somewhere and stare at it disapprovingly. You can change the piece of paper to
suit your needs, you can’t change the slab of concrete (unless you’re very
strong or use other tools).

Malleable languages allow for the expression of higher complexity at equal cost.

## A Regular Example

Imagine a subset of regular expressions where you can only use the symbols `A|{}()`. This is a simple language. But when your boss asks you to express an unknown number of `A`s and you start typing `|A|AA|AAA|AAAA|AAAAA|...`, it becomes immediately clear how rigid the language is. It explodes with repetitive code from this seemingly simple task. That is, the cost of typing increases with every new case, and there is never a way to simplify this process, to build shortcuts, to *fold the paper back in on itself*.

This is also an unnecessarily complex subset, since the parentheses are redundant and the curly braces are completely unusable (you have no digits or the comma). It is a (relatively) ugly language.

Imagine if you got to replace some of the symbols in the subset, making it a more *well-designed subset*, such as `A|*`. This subset is much smaller and simpler, yet it is also more powerful! You can now make your boss happy by simply typing `A*` and it will cover an *infinite* number of cases. Imagine that! Two characters to reach infinity.
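You can check the contrast directly with Python’s `re` module (a quick demo of mine, not part of the original example):

```python
import re

# The rigid subset forces '|A|AA|AAA|...' and still only covers finitely
# many cases; the well-designed subset covers them all with 'A*'.
assert re.fullmatch(r'A*', '') is not None            # zero A's
assert re.fullmatch(r'A*', 'A' * 1000) is not None    # any number of A's
assert re.fullmatch(r'A*', 'AB') is None              # and nothing else
```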

## The Four Quadrants

The intersection between simple and malleable is the *quadrant of beauty*. Things
that are both simple and malleable are beautiful.

A beautiful language is reasonably simple at its core, but contains very versatile building blocks that can be used to express high complexity at very low cost. It might even have meta-rules for changing its own rules (macros) to adapt to novel complexities. It enjoys epithets such as clean, composable, extensible, versatile, etc.

Things which are both rigid and complex are *ugly*.

An ugly language is one where the language itself is already complex and you can do very little to lower that complexity or deal with novelty. For example it might have a lot of inconsistencies, edge cases that crop up in many situations, tons of keywords and different statements, without any form of abstractions to deal with all these cases.

Things that are complex but malleable are very daunting, and they have a big
upfront cost which might not be worth paying. They tend to be well-liked by
those that have already paid that cost. But it’s still hard for them to gain
traction due to the initial complexity hurdle. This is *the ivory tower
quadrant*.

This could also be called the quadrant of unpopularity, as these languages and tools will very rarely pierce the mainstream.

Things that are simple but rigid are very beginner-friendly. They have no
upfront cost and can gain rapid traction.
This is the *quadrant of popularity*.

## The Quadrant of Popular Frustration

The popularity quadrant could also be called the quadrant of *frustration*, because later when trying to
do non-beginner or novel things you find it is much harder than it “ought” to be
(i.e. than it would be if the system was just a little bit more malleable about
this or that, a little better designed for how you want it to be).

But I claim most fashionable and popular things are still in this quadrant. This is natural because it has the lowest startup cost and it’s the easiest to get started with; it can accommodate large swaths of new people writing new tools and libraries, and so these tools and languages simply get more chances at breaking through before they’re ingrained in systems for decades to come.

Many people would even argue that this is the *optimal* quadrant, because it has
the least cognitive load of all the quadrants, and it forms a basis of building
*replaceable tools* for *replaceable programmers*. In this way it could be said
to be malleable *as a tool*, on a meta level, even if it’s not a malleable tool
in itself. However, as it becomes more and more ingrained, it’s harder to
replace, and we’re back to frustration…

Markdown is like this.

Markdown is simple, but rigid. Markdown is beginner-friendly, and it’s easy to get started.

Markdown is here to stay. Probably for my lifetime.

But it can be frustrating. It lacks a lot of features, features that are
*simple* generalizations, *and* there is no innate malleability for making those
generalizations (folds) yourself. Instead you’re left to invoke its
meta-malleability by replacing the tool itself with one of the dozens of
Markdown-dialects that support what you need, or where plugins can be written,
or even write your own parser.

But as you do so, if, as your needs grow, you’re building up your own Markdown parser to meet them, or writing Markdown plugins for a different library, what did Markdown really grant you in the first place? You feel tricked. Hoodwinked by its allure of simplicity and widespread adoption.

## My Own Journey

`mdbook`, which I’m using here, uses a Markdown parser with many restrictions. It strives to “follow the spec,” but the spec is the simplest and most rigid Markdown of all.

OK, so it at least supports footnotes, that’s a good start.

And there’s a $\KaTeX$ plugin. Good, great.

But hm, I’d like to automatically sort the footnotes instead of showing them in the order I inserted them. Is that possible? I guess I’ll write a plugin…

Why are quote-blocks so ugly? In fact, the indented code blocks were also a terrible idea. ```-blocks are so much better. Is there something like that for quote-blocks? I want a way to add a “tag” (like “caution” or “note”) to quote-blocks!

Oh, and I want a way to short-hand link to various stuff, like headings. Surely it can’t be intended that I write out the full `[Apple Pie](#apple-pie)` when the link is generated by Markdown itself? Is there no shorthand for it?

And I want to hook in simple filters on pure text fragments (anything outside of HTML or links or code blocks) to replace `--` with the beautiful —, and so on. Should be super simple…
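In fact, the core of such a filter really is a few lines of Python. A toy sketch of the idea (my own illustration; `smarten_dashes` is a made-up name, and a real mdbook preprocessor would also have to skip fenced blocks and HTML):

```python
import re

# Replace '--' with an em dash, but leave inline code spans untouched.
def smarten_dashes(text):
    # Split on inline code spans, keeping them as separate parts.
    parts = re.split(r'(`[^`]*`)', text)
    return ''.join(
        p if p.startswith('`') else p.replace('--', '\u2014')
        for p in parts
    )

smarten_dashes('a -- b and `x--y` stays')  # 'a \u2014 b and `x--y` stays'
```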

What do you mean I can’t put several paragraphs inside a footnote? A footnote
can only be *one* paragraph!? What—

And no code blocks…?

Hey, you know what would be cool—

## The Reality

Is that I basically don’t use `mdbook` for any Markdown at all… It is fed HTML, which it looks at, does nothing with, and copies out together with the CSS and JS stuff. It’s stupid.

I’d argue it looks a lot simpler than it is, but it’s still a long way from being *complex*.↩

# Python Is Slow

As a matter of intellectual honesty I feel it’s necessary to state unequivocally that Python is *slow*. Python is absolutely, incredibly, and sometimes painfully slow. There are always a lot of “buts” and coping mechanisms that couch this simple fact.

- But Numpy…
- But Numba…
- But PyPy…
- But language X is even slower…
- But it doesn’t matter most of the time…
- But selecting a language based on speed is a form of premature optimization…
- But you can still use Cython or write C plugins…
- But Python is so good at…
- But it depends on what you mean by “slow”…

But I say it’s okay. We all intuitively know what it means for Python to be a
slow language. We still use and love it. There’s nothing to be scared of — it’s a
morally neutral statement. It is both honest and *totally satisfactory* to state
it up front.

# Less Python Is Faster Python

By admitting that Python is slow, we learn the main way of writing *efficient*
Python: execute as little code as possible. If there’s two ways to accomplish
some task, it’s usually better to choose the path that involves *less Python code*.
Less code also means less code to read for others, less code is more
maintainable, less code is more extensible, less code can be rewritten more
easily. From this we get two guidelines on how to approach efficient Python
programming:

1. **Learn idioms, the standard library, and esoterics. Learn succinct coding.**

   Try to leverage Python’s powerful expressiveness as much as possible. If you need to reverse a string, just use `s[::-1]` rather than something more explicit and arguably less “magic” like `''.join(s[i-1] for i in range(len(s), 0, -1))`^{1}.

   Knowledge is power. If you want to select random characters from some alphabet, a lot of beginners would do `[random.choice('ABCDwxyz') for _ in range(10_000)]` or similar, perhaps not realizing that there exists `random.choices('ABCDwxyz', k=10_000)`, which is both simpler *and* faster.

   On encountering anything that involves a loop, take a moment to think whether there’s some way to leverage Python’s built-in functionality or the standard library to do the same thing as a single expression, moving the loop to the C side, or at least to the standard library, where the code is likely to be better and faster than yours.

   A good way to gain knowledge like this is to actually look at what people do in code golf. Another option is just plain regular ad hoc scripts, the messier the better. Perhaps counter-intuitively, specialized or “dirty” code like this is much more densely packed with information and learning opportunities for the reader, *because* it’s unfiltered and has a high expression range. Highly developed generic and abstract code is, conversely, terrible for learning. All the code is homogenized and consists of doc-strings and methods that simply defer to other methods; there are hierarchies of plugins and backends and factories and proxies, and little of the code is actually interesting. While maybe good from an API perspective, polished, abstract code is a wasteland when it comes to learning opportunities.

2. **Do not worry about redundancy. Use the general solution.**

   Intuitive efficiency does not translate into real efficiency in Python. For example it’s far better to execute highly redundant code on the native C side, like something that loops over a structure in several passes, than to do the loop even once on the Python side.

   If you have a string that consists of a billion words and numbers in some relative proportion, and you want to end up with a list of the numbers and then another list of the words (but *uppercased*), it’s far better to just uppercase the entire string in a single call than it is to first do the separation and then only uppercase the strings that need it. (In fact, this might hold even for C, depending on the proportion, and how the string is processed.)

   Similarly, it tends to be better to simply work with a single large matrix in Numpy and use masks to manipulate parts of it, than it is to work with a ton of individual small matrices where you programmatically select and update “only the ones that need updating,” as long as it leads to less code being executed in Python.
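The idioms in the first guideline can be checked directly (a quick sanity demo of mine; `random.choices` has been in the standard library since Python 3.6):

```python
import random

# The slice idiom really does reverse the string, identically to the
# explicit generator-expression version.
s = 'hello world'
assert s[::-1] == ''.join(s[i-1] for i in range(len(s), 0, -1))

# And random.choices() does the whole selection in one call.
sample = random.choices('ABCDwxyz', k=10_000)
assert len(sample) == 10_000
assert set(sample) <= set('ABCDwxyz')
```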

We might refocus these guidelines slightly, and consider what to do if we actually do need “speed” in Python. In that case:

1. **Use an external library that does what you want to do.** Examples include cryptography, graphics, event handling, and the like.

2. **Or Numpy.** Numpy is the answer to a lot of questions. In my opinion, Numpy ought to be built into Python itself; it ought to come with the standard library. If you have lists or arrays and you’re looping over these to do some kind of filtering or computation, then in *almost all cases* you can replace the loops with a Numpy indexing expression that translates the loops from the Python side to the C side.
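The loop-to-mask translation looks like this in its simplest form (my own toy example of the pattern, not from the case study below):

```python
import numpy as np

# Clamp negatives to zero: once with a Python-side loop, once with a
# boolean mask, which pushes the loop to the C side.
xs = np.array([3, -1, 4, -1, 5, -9, 2, 6])

# Python-side loop:
ys = xs.copy()
for i in range(len(ys)):
    if ys[i] < 0:
        ys[i] = 0

# NumPy-side equivalent:
zs = xs.copy()
zs[zs < 0] = 0

assert (ys == zs).all()
```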

# Case Study

Here’s a little exploration I did based on AoC, day 3. The input is a single string that consists of `n`-length sequences of `'0'` and `'1'`, each sequence separated by a newline. The output should be a list of numbers where each is the count of these substrings that have a `'1'` character at the given position. Or, rephrased: you get a single string that can be divided up into lines where each line is a row in a matrix, and you want to count how many `'1'` characters there are in each column of this matrix.

Below is the code using various tricks and idioms.

See if you can predict which performs better in what circumstance. Results are at the bottom.

```python
# The functions below take two arguments:
#
# `inp` (str) consists of rows of 0s and 1s separated by a newline,
# e.g. '1011\n1111\n1001\n1000\n'
#
# `n` (int) is the number of columns in each line/row. Some functions do
# not need/rely on this.
#
# Benchmarks are run on two sets: a "narrow" set with 10_000 lines and
# 8 columns, and a "wide" set with 2_000 lines and 40 columns.

import itertools as itt
import re
from collections import Counter

import numpy as np


def py_naive(inp, n):
    """Arguably the most obvious and straightforward way to do it.
    Split the string into lines, then for each column, count the number
    of lines that has a '1' in that position.
    """
    lines = inp.split()
    return [sum(line[k] == '1' for line in lines) for k in range(n)]


def py_zip_join(inp, _n):
    """The usual transpose-zip combined with a join."""
    return [''.join(x).count('1') for x in zip(*inp.split())]


def py_zip_counter(inp, _n):
    """Like `py_zip_join()` but using a `Counter()` for the count on the
    resulting tuples.
    """
    return [Counter(x)['1'] for x in zip(*inp.split())]


def py_clever(inp, n):
    """Under the assumption that the input format is set in stone, we can
    cleverly slice the string into substrings consisting of the desired
    columns.
    """
    return [inp[k::n+1].count('1') for k in range(n)]


def ex_resplit(inp, n):
    """This is just to show the difference with `py_clever()` where we
    cannot assume uniformity of input, so we force a `.split()` and then
    join it up again to add artificial overhead.
    """
    inp = ''.join(inp.split())
    return [inp[k::n].count('1') for k in range(n)]


def np_count(inp, n):
    """Elaborating on `py_clever()` and under the same assumption (that
    the input format is absolute), we can use Numpy to accomplish the same
    sort of count-by-columns directly, though here without the need to
    create per-column substrings.
    """
    mat = np.frombuffer(inp.encode(), 'u1').reshape(-1, n+1)
    return list((mat == ord('1')).sum(0)[:-1])


def np_split(inp, n):
    """This looks ugly but it's just an artificial example to show the
    case where we actually `split()` the incoming string.
    """
    return list(np.array(inp.split(), f'S{n}')[..., None].view('S1').astype('i4').sum(0))


def np_itersplit(inp, n):
    """Fully splitting all the values before passing them off to numpy
    for the sum.
    """
    mat = np.array(list(map(list, inp.split())), 'i4')
    return list(mat.sum(0))


def py_bigint(inp, n):
    """Over-engineered solution translating the entire string to a single
    integer and masking out the bits in question.
    Although useless here since we still end up converting back to
    strings, it's a generally useful trick in other areas.
    """
    inp = inp.replace('\n', '')
    as_int = int(inp, 2)
    bit_1 = 2**(8 * len(inp)) // (2**n - 1)
    return [bin(as_int & (bit_1 << k)).count('1') for k in range(n)][::-1]


def py_regex(inp, n):
    """Another over-engineered solution using a regex search to count
    each of the patterns of a column having a '1'.
    Also useless here, but a useful trick.
    """
    return [len(re.findall('^' + '.'*k + '1', inp, re.M)) for k in range(n)]


def py_enumcounter(inp, n):
    """Cycle a counter together with the string, counting all
    occurrence-combinations. Note that this counts everything, including
    `'0'` characters and newlines.
    """
    cnts = Counter(zip(itt.cycle(range(n+1)), inp))
    return [cnts[(k, '1')] for k in range(n)]


def py_num_trick(inp, n):
    """A variant on the numerical tricks, converting the rows to
    (pseudo-)base-65536 numbers which can be summed directly without the
    digits overflowing. Then a formatting trick to split the resulting
    number up by column sums.
    """
    cols = sum(int(x, 16) for x in inp.replace('0', '0000').replace('1', '0001').splitlines())
    return [int(x, 16) for x in f'{cols:_x}'.split('_')]
```
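Before the numbers, the stride-slicing trick behind `py_clever()` deserves a closer look, since it’s the star of the pure-Python entries. A small self-contained check (my own, using a tiny input rather than the benchmark sets) that it agrees with the naive per-column count:

```python
# Sanity check: the stride-slicing trick agrees with the naive count.
inp = '1011\n1111\n1001\n1000\n'
n = 4

naive = [sum(line[k] == '1' for line in inp.split()) for k in range(n)]

# Each row is n characters plus a newline, so column k of the matrix is
# exactly the slice inp[k::n+1].
clever = [inp[k::n + 1].count('1') for k in range(n)]

assert naive == clever == [4, 1, 2, 3]
```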

# Beauty as a Side-effect

Python is slow. But in writing efficient Python one starts to develop an
intuition that *succinct code* is usually *more efficient code*. In my opinion,
this is a very fortunate heuristic, one that speaks to the heart of what is
beautiful about Python.

This heuristic *does not* hold for all languages. For most compiled languages, it tends to be the opposite. Performance-optimized Haskell tends to be uglier, more verbose (and more imperative) than succinct or “naive” Haskell. Performant languages like C also tend to blow up to some degree when they’re optimized to the point where loops are strided, structs are split into arrays, and compiler hints are inserted everywhere. If you’ve ever seen C code that tries to convey likely and unlikely branches in `if`-statements you know what I mean.

Many dynamic languages that lack Python’s expressiveness or syntactic freedom also do not show this side-effect. In these languages one tends to break down the abstractions and write code that is as C-like as possible, praying the interpreter has an easier time recognizing patterns and/or triggering its JIT compilation. One example is JavaScript^{2}, which offers way less in terms of raw language power or a standard library, and where people still write C-style `for`-loops.

When you have a very expressive (but slow) dynamic language it leads to a sort of harmony where elegance and efficiency tend to align rather than diverge. Instead of being told to break down all the abstractions and write assembly-like low-level code (such as in performant unboxed Haskell or JIT-friendly JavaScript), you’re actively encouraged to go even higher, and try to pack as much power into every expression as possible.

Ruby would be an example — another slow, dynamic language — that appears to share the same harmony. I don’t know much about it, but I nonetheless expect succinct Ruby to be efficient Ruby.

Saying that Python’s slowness *encourages* a beautifully succinct aesthetic isn’t an excuse, though. Of course I’d much rather have Python as-is be fast. I’m not arguing that slowness is a good thing^{3}.

Although, for me, Python (and Ruby) presents a tenet that I wish were true for all languages: high-level code should not be slower than low-level code. I don’t know why so many accept it as inviolable mathematical fact that high-level code is necessarily slower than low-level code. The relationship is *at most* `<=`, not `<`. Thus I also strongly believe that in natively compiled languages, *all abstractions should be zero-cost*. If they are not, that is a *failure* of language design, albeit one we might have to live with temporarily.

# Benchmark Results

Narrow lines:

```
------------------------------------------ benchmark: 12 tests -------------------------------------------
Name                      Mean (us)            Median (us)          StdDev (us)          OPS          Rounds
-----------------------------------------------------------------------------------------------------------
test_np_count           201.1564 (1.0)       196.4490 (1.0)       16.4426 (1.0)     4,971.2569 (1.0)   12198
test_clever             376.4197 (1.87)      342.8305 (1.75)      78.1109 (4.75)    2,656.6094 (0.53)   7420
test_ex_resplit         641.5690 (3.19)      624.5760 (3.18)      59.4850 (3.62)    1,558.6789 (0.31)   3925
test_zip_join         2,776.8000 (13.80)   1,845.5670 (9.39)   2,572.5337 (156.46)   360.1268 (0.07)   1360
test_bigint_trick     3,262.8631 (16.22)   3,232.3440 (16.45)    229.3595 (13.95)    306.4793 (0.06)    847
test_zip_counter      4,419.2994 (21.97)   3,496.5080 (17.80)  2,598.6162 (158.04)   226.2802 (0.05)    693
test_naive            4,609.4335 (22.91)   4,600.3455 (23.42)    170.4481 (10.37)    216.9464 (0.04)    686
test_num_trick        4,639.5709 (23.06)   4,566.1065 (23.24)    227.4872 (13.84)    215.5372 (0.04)    646
test_np_split         5,373.7856 (26.71)   5,347.6110 (27.22)    135.4890 (8.24)     186.0886 (0.04)    561
test_enum_counter     7,680.5651 (38.18)   7,634.6120 (38.86)    192.3201 (11.70)    130.1988 (0.03)    379
test_regex            7,758.5097 (38.57)   7,698.3045 (39.19)    433.0626 (26.34)    128.8907 (0.03)    402
test_np_itersplit    11,649.3359 (57.91)  10,608.8820 (54.00)  2,946.2972 (179.19)    85.8418 (0.02)    280
-----------------------------------------------------------------------------------------------------------
```

Wider lines:

```
------------------------------------------ benchmark: 12 tests -------------------------------------------
Name                      Mean (us)            Median (us)          StdDev (us)          OPS          Rounds
-----------------------------------------------------------------------------------------------------------
test_np_count            70.8971 (1.0)        68.6465 (1.0)       11.1233 (1.0)    14,104.9396 (1.0)  25304
test_clever             393.2690 (5.55)      358.8175 (5.23)      83.2271 (7.48)    2,542.7888 (0.18)  7230
test_ex_resplit         517.6856 (7.30)      483.3975 (7.04)      90.4284 (8.13)    1,931.6744 (0.14)  5256
test_zip_join         1,512.8239 (21.34)   1,332.8995 (19.42)  1,037.6603 (93.29)    661.0155 (0.05)   1876
test_zip_counter      2,997.4516 (42.28)   2,860.1275 (41.66)    874.2208 (78.59)    333.6167 (0.02)    276
test_num_trick        3,472.1816 (48.97)   3,397.6545 (49.49)    298.9875 (26.88)    288.0034 (0.02)    834
test_naive            4,335.9662 (61.16)   4,218.6585 (61.45)    446.8619 (40.17)    230.6291 (0.02)    598
test_np_split         4,717.6956 (66.54)   4,682.5300 (68.21)    169.6819 (15.25)    211.9679 (0.02)    677
test_bigint_trick     6,445.4938 (90.91)   6,368.7445 (92.78)    429.7515 (38.64)    155.1472 (0.01)    424
test_np_itersplit     6,954.7329 (98.10)   6,807.0015 (99.16)  1,018.6224 (91.58)    143.7870 (0.01)    422
test_enum_counter     7,935.9839 (111.94)  7,903.2715 (115.13)   272.1491 (24.47)    126.0083 (0.01)    382
test_regex           23,734.6893 (334.78) 23,165.2080 (337.46) 2,301.2013 (206.88)    42.1324 (0.00)    129
-----------------------------------------------------------------------------------------------------------
```

As expected `np_count()` shines, notably because it can exploit Python’s buffer views. But what I really want to highlight is how good `py_clever()` is. It’s simple, straightforward, and the fastest pure-Python variant, compared with the naive way, which performs about a full order of magnitude worse.

Next comes the zip-join, which is arguably the most idiomatic Python, and it’s still better than the “brute force” loops.

The extra-clever tricks of using numbers or regex don’t perform well here, but they are mostly included for illustrative purposes^{4}. And `np_itersplit` shows what happens when using libraries in a bad way: doing all the splitting on the Python side circumvents half the point of using Numpy.

The absolute *worst* is to write explicit C-like `for`-loops that constantly index and update variables. It’s not more readable. It’s just bad.↩

I’m actually talking about “naive” JavaScript here that doesn’t have any JIT compilation, but which stays a dynamic interpreted language. Due to JIT compilation, performance in JavaScript is a whole research field in and by itself.↩

I also will never truly forgive Python for using a homemade ad hoc bigint implementation over something like GMP.↩

Though `py_num_trick()` performs pretty well for what it is.↩