Omstokk

Omstokk is a tiny-teensy anagram web game.

Mostly just shuffling around some calls to jQuery & jQuery UI. It looks like shit because my web design skills are on par with a blind Volvo engineer from the 80s.

The original plan was to use it to learn Flutter and Dart, but I lost my will to live pretty quickly while working in that…ecosystem?

note (fun fact)

aekrst has the most anagrams in Norwegian (according to NSF), with 26 valid words.

Second place is aeknst with 25, and then there’s a steep drop-off for third place aeknrst with 19.

Given a set of permissible words, for example a list of legal words in Scrabble for a given language, the first natural reduction is to find a normal form such that anagrams are equivalent. For example hatter and threat are the same when talking about anagrams as they consist of the same letter set.

The simplest idea is to sort the letters in each word:

def normalize(s):
  return ''.join(sorted(s))

This works fine. The keys are short strings, and pretty efficient. More complicated patterns like having a hashmap of char -> int, or an $∣ A ∣$ -length vector of counts, etc. don’t really offer anything new, especially not in Python.

An alternative is to map anagrams to positive integers via the primes. Basically we map each unique letter in the alphabet to a prime number and the exponent of that prime is how many times the letter occurs.

If we sort the alphabet so that the most common letters map to the smallest primes, this allows for up to 12-letter anagrams to be stored in 64-bit words. (For the Norwegian word list.)

Now several common operations on anagrams-as-multisets translate to arithmetic operations on the integers: the union of two anagrams (e.g. normalize(a+b) if using the strings above) become plain multiplication $A B$ , $lcm (A, B)$ is the join of two anagrams (the minimum set needed to make every word possible to make from either set), subtracting the possible words of one anagram from another can be expressed with $\frac{A}{g c d ( A , B )}$ , and so on.

Doing gcd and lcm like this might seem inefficient—and indeed, it is, for Python’s stupid homemade bigints—but it allows us to use numpy and numpy über alles once we’ve crossed the threshold to speak of Python performance…

we want all the letters in R to be “useful” in some way (e.g. part of a possible $k$ -letter word).
we want the letter-set R to be near minimal, tho not necessarily perfectly minimal (as it may constrain randomness?).
we want the set to be sufficiently “random” such that a large percentage (if not all) possible such letter-sets below a given length might be a possible outcome.
we probably want the randomness to be weighted in some way toward more common combinations of letters?

Variations

is there an alternative algorithm that can compute multiple such sets in parallel? (probably yes.)

Sketch of Simple Implementation

Probabilistic and slow:

start with a random $k$ -letter anagram $a \in C_{k}$ s.t. $∣ a ∣_{k} < hi$ .
if $l o \leq ∣ a ∣_{k}$ , then terminate the algorithm.
compute: $y \in C_{k}$ where $\frac{y}{g c d ( y , a )}$ is prime
pick a random (weighted) $y$ such that $∣ lcm (a, y) ∣_{k} < hi$ . If impossible, abort and restart.
update $a := lcm (a, y)$ and go to step (2).

Or in plain English:

start with a valid anagram.
if number of k-letter words is within bounds we’re done.
find letters such that adding this letter means we’ll be able to form more unique k-letter words.
add a random such letter, making sure we don’t exceed the bounds.
repeat, retry on failure.

a soft spiracle

Omstokk

note (fun fact)

note (anagrams)

note (an algorithm)

Definitions

Goal

Constaints

Variations

Sketch of Simple Implementation