# Omstokk

Omstokk is a tiny-teensy anagram web game.

Mostly just shuffling around some calls to jQuery & jQuery UI. It looks like shit because my web design skills are on par with a blind Volvo engineer from the 80s.

The original plan was to use it to learn Flutter and Dart, but I lost my will to live pretty quickly while working in that…ecosystem?

## note (fun fact)

`aekrst`

has the most anagrams in Norwegian (according to NSF), with **26** valid words.

Second place is `aeknst`

with 25, and then there’s a steep drop-off for third place `aeknrst`

with 19.

## note (anagrams)

Given a set of *permissible words*, for example a list of legal words in
Scrabble for a given language, the first natural reduction is to find
a normal form such that anagrams are equivalent.
For example `hatter`

and `threat`

are the same when
talking about anagrams as they consist of the same letter set.

The simplest idea is to sort the letters in each word:

```
def normalize(s):
return ''.join(sorted(s))
```

This works fine. The keys are short strings, and pretty efficient. More
complicated patterns like having a hashmap of `char -> int`

, or an $∣A∣$-length
vector of counts, etc. don’t really offer anything new, especially not in
Python.

An alternative is to map anagrams to positive integers via the primes. Basically we map each unique letter in the alphabet to a prime number and the exponent of that prime is how many times the letter occurs.

If we sort the alphabet so that the most common letters map to the smallest primes, this allows for up to 12-letter anagrams to be stored in 64-bit words. (For the Norwegian word list.)

Now several common operations on anagrams-as-multisets translate to arithmetic
operations on the integers: the union of two anagrams (e.g. `normalize(a+b)`

if using the strings above)
become plain multiplication $AB$, $lcm(A,B)$ is the *join* of two
anagrams (the minimum set needed to make every word possible to make from either set), subtracting the possible
words of one anagram from another can be expressed with $gcd(A,B)A $, and so on.

Doing `gcd`

and `lcm`

like this might seem inefficient—and indeed, it is, for
Python’s stupid homemade bigints—but it allows us to use `numpy`

and `numpy`

über alles once we’ve crossed the threshold to speak of Python performance…

## note (an algorithm)

### Definitions

If $A$ is a multiset of letters, let $∣A∣_{k}$ be the number of $k$-letter words that can be made by using letters from this set (without replacement). Let $C_{k}$ be the set of all $k$-letter multisets (anagram).

### Goal

We want an algorithm `R = ALGORITHM(len, lo, hi)`

such that $lo≤∣R∣_{len}<hi$.

### Constaints

- we want all the letters in
`R`

to be “useful” in some way (e.g. part of a possible $k$-letter word). - we want the letter-set
`R`

to be near minimal, tho not necessarily perfectly minimal (as it may constrain randomness?). - we want the set to be sufficiently “random” such that a large percentage (if not all) possible such letter-sets below a given length might be a possible outcome.
- we probably want the randomness to be weighted in some way toward more common combinations of letters?

### Variations

- is there an alternative algorithm that can compute multiple such sets in parallel? (probably yes.)

### Sketch of Simple Implementation

Probabilistic and slow:

- start with a random $k$-letter anagram $a∈C_{k}$ s.t. $∣a∣_{k}<hi$.
- if $lo≤∣a∣_{k}$, then terminate the algorithm.
- compute: $y∈C_{k}$ where $gcd(y,a)y $ is prime
- pick a random (weighted) $y$ such that $∣lcm(a,y)∣_{k}<hi$. If impossible, abort and restart.
- update $a:=lcm(a,y)$ and go to step (2).

Or in plain English:

- start with a valid anagram.
- if number of k-letter words is within bounds we’re done.
- find letters such that adding this letter means we’ll be able to form more unique k-letter words.
- add a random such letter, making sure we don’t exceed the bounds.
- repeat, retry on failure.