# Extended Polymerization

Aaand back to NumPy.

Or well, perhaps the NumPy connection isn’t immediately obvious.

Problem statement: we’re given an input string (starting point) like `ABCCABAC` and a set of rewrite rules of the form `AB -> C`, `CC -> ABC`, and so on. A rule `AB -> ...` means that `AB` is rewritten to `A...B`; i.e. the right-hand side is inserted between the two characters on the left-hand side. The left-hand side will always have exactly two characters. All rules are applied simultaneously to the input, and each such application is a step. The goal is to count the final characters in the string after $n$ steps.

So given a starting point of `ABC`, no matter what set of rules we apply, or how many steps we go through, we’ll always end up with a string of the form `A...B...C`, where the ellipses denote arbitrary characters.

I’ve actually generalized the problem a little here, since it doesn’t cost us anything. Pedantically, the puzzle only allows rules that expand two characters into three, inserting a single character between them^{1}.
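For concreteness, a single rewrite step on the raw string can be sketched like this (an illustration of the rules only, not the approach used below; `naive_step` is a name I’m making up):

```python
# One rewrite step applied directly to the string: for every adjacent
# pair, insert the rule's right-hand side (if any) between the two
# characters. All insertions happen "simultaneously".
def naive_step(s, rules):
    out = [s[0]]
    for a, b in zip(s, s[1:]):
        out.append(rules.get(a + b, ''))  # insertion may be empty
        out.append(b)
    return ''.join(out)
```

E.g. with the rule `AB -> C`, the string `AB` becomes `ACB`.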

## The Problem

The first observation is that it’s probably easier to think in terms of counts of character pairs than about the characters directly, since the rules operate on the level of character pairs.

Let’s say we have a function like `step("ABC", n)` which gives a dictionary of `pair -> count`, saying how many occurrences of each pair there are in the final string after $n$ steps.

The second observation is that we have a simple recurrence relation here: if we have `step("ABC", n)` we can easily calculate `step("ABC", n+1)` by going over all the pairs and expanding them according to the rules. I.e. with a rule like `AB -> C`, the number of `AC` and `CB` pairs at step $n+1$ increases by the number of `AB` pairs at step $n$.
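In code, one step of this recurrence on pair counts might look like the following (a sketch using a plain `Counter`; `step_counts` is a made-up name):

```python
from collections import Counter

# One step on pair counts: a rule like AB -> C turns every AB pair
# into the pairs of "ACB", i.e. one AC and one CB, so those counts
# grow by the number of AB pairs at the previous step.
def step_counts(counts, rules):
    nxt = Counter()
    for pair, n in counts.items():
        expanded = pair[0] + rules.get(pair, '') + pair[1]
        for a, b in zip(expanded, expanded[1:]):
            nxt[a + b] += n
    return nxt
```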

Finally, given an input string like `"ABC"`, we’re free to calculate the trajectories for `"AB"` and `"BC"` separately, as they will never influence each other.

So this problem is really just a more complicated Lanternfish, and I’ll give both of the basic solutions.

## Boilerplate

The usual prelude, with two notables:

- I’m using my own utility class `number_dict`, which functions like `collections.Counter()` in most cases but has some extra functionality. Notably it “acts like a number” when doing arithmetic with it, broadcasting the arithmetic operations to its elements; i.e. `x//2` divides all the counts in half.
- The trivial `all_pairs()` helper function, which will be ubiquitous in both solutions below.
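I won’t reproduce `number_dict` here, but a hypothetical minimal stand-in for the behavior used in this post could look like:

```python
from collections import Counter

# Hypothetical minimal stand-in for number_dict: a Counter that
# broadcasts arithmetic over its values. Only floor division is
# shown; the real class supports more operations than this.
class NumberDict(Counter):
    def __floordiv__(self, n):
        return NumberDict({k: v // n for k, v in self.items()})
```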

```
aoc = __import__('aoc-common').aoc()
import sys
import numpy as np
import scipy.ndimage as ndi
import itertools as itt
import functools as fnt
import collections as coll
from collections import abc, Counter
import re
from flagmining.dicts import number_dict
test = '''NNCB

CH -> B
HH -> N
CB -> H
NH -> C
HB -> C
HC -> B
HN -> C
NN -> C
BH -> H
NC -> B
NB -> B
BN -> B
BB -> N
BC -> B
CC -> N
CN -> C
'''
text = test if '-t' in sys.argv else aoc.input_text(14)
polystr, rules = text.split('\n\n')
rules = dict(t.split(' -> ') for t in rules.splitlines())
def all_pairs(s):
    return [x+y for x,y in zip(s, s[1:])]
# ...
# ...
# ...
# ...
# solver = AbstractPairRewriteMemoizationProxyMixinFactoryBean(rules)
solver = cleverer_matrix_solution_thing(rules)
cnts = sorted(solver.count_elements(polystr, 10).values())
answer1 = cnts[-1] - cnts[0]
print(f'answer to part 1: {answer1}')
cnts = sorted(solver.count_elements(polystr, 40).values())
answer2 = cnts[-1] - cnts[0]
print(f'answer to part 2: {answer2}')
```

## The Naive Solution

Like with Lanternfish, we can start with the most basic solution to recurrence-relation problems, which is memoization, or dynamic programming if you will (the former just being a very specific instance of the latter).

```
class AbstractPairRewriteMemoizationProxyMixinFactoryBean(dict):
    def __init__(self, rules):
        super().__init__()
        for k, v in rules.items():
            if len(k) != 2:
                raise ValueError(f"invalid replacement rule: {k} -> {v}")
            self[k, 0] = number_dict(all_pairs(k[0] + v + k[1]))

    def count_pairs_memo(self, s, steps=1):
        return number_dict.sum(self[p, steps] for p in all_pairs(s))

    def __getitem__(self, key):
        ss, n = key
        assert len(ss) == 2 and n >= 0
        if n == 0:
            # Only one choice.
            return number_dict.one(ss)
        if (ss, n) not in self:
            # Memoization/cache.
            self[ss, n] = number_dict.sum(
                self[k, n-1]*v for k, v in self.get((ss, 0)).items())
        return super().__getitem__((ss, n))

    def count_elements(self, initial, steps):
        # Compensate for edge chars only being in 1 pair.
        total = number_dict(initial[0] + initial[-1])
        for k, v in self.count_pairs_memo(initial, steps).items():
            for c in k:
                total[c] += v
        return total // 2  # Each element was counted in two pairs.
```

Here wrapped in a class because I tend to consider it the lesser of two evils when compared to nested functions.

## Cleverer

Just like in Lanternfish, we can construct a matrix such that we perform a single step by multiplying with this matrix. Then we can use matrix exponentiation to calculate the recurrence more “directly” (complexity that scales with `log(steps)` rather than linearly).

The same notes apply with regard to how this doesn’t really buy us much when the growth is exponential, since arithmetic on arbitrary-precision integers will quickly grow to choke both algorithms long before the steps vs. `log(steps)` difference comes into play. But it’s very useful for working in finite fields.
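As an aside, the `log(steps)` exponentiation is just repeated squaring, which is easy to combine with a modular reduction after every multiply when working over a finite field (my own sketch; `mat_pow_mod` is a made-up name):

```python
import numpy as np

# Matrix exponentiation by repeated squaring, reducing mod `m` after
# every multiply so entries stay below m. Keep m small enough that
# intermediate products fit in int64 (roughly m < 2**31 for tiny dims).
def mat_pow_mod(M, n, m):
    R = np.eye(M.shape[0], dtype=np.int64)
    M = M.astype(np.int64) % m
    while n:
        if n & 1:
            R = (R @ M) % m
        M = (M @ M) % m
        n >>= 1
    return R
```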

There’s some extra boilerplate in this solution because we have to assign an index to each possible pair and then translate back and forth when we go between strings and matrices or vectors over $\mathbb{Z}$.
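That translation layer amounts to something like the following (a simplified sketch, not the class used below; as an extra assumption I give pairs without a rule an identity column so they persist unchanged, which the puzzle input never actually needs since it has a rule for every pair):

```python
import numpy as np

# Assign an index to every pair that can ever occur and build the
# one-step matrix: column j says which pairs one unit of pair j
# becomes after a single step.
def build_matrix(rules):
    pairs = set(rules)
    for k, v in rules.items():
        expanded = k[0] + v + k[1]
        pairs.update(a + b for a, b in zip(expanded, expanded[1:]))
    idx = {p: i for i, p in enumerate(sorted(pairs))}
    M = np.zeros((len(idx), len(idx)), dtype=np.int64)
    for p, j in idx.items():
        expanded = p[0] + rules.get(p, '') + p[1]
        for a, b in zip(expanded, expanded[1:]):
            M[idx[a + b], j] += 1
    return M, idx
```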

```
class cleverer_matrix_solution_thing():
    def __init__(self, rules, dtype=np.uint64):
        self.words = dict()
        self.indices = dict()
        for k, v in rules.items():
            assert len(k) == 2
            self._add(k)
            for p in all_pairs(k[0] + v + k[1]):
                self._add(p)
        mat = np.zeros((len(self), len(self)), dtype=dtype)
        for k, v in rules.items():
            for p in all_pairs(k[0] + v + k[1]):
                mat[self[p], self[k]] += 1
            # mat[[self[t] for t in all_pairs(k[0] + v + k[1])], self[k]] += 1
        self.M = np.matrix(mat)

    def _add(self, word):
        if word not in self.words:
            idx = len(self.words)
            self.words[word] = idx
            self.indices[idx] = word

    def __getitem__(self, key):
        if hasattr(key, '__index__'):
            return self.indices[key]
        else:
            return self.words[key]

    def __len__(self):
        return len(self.words)

    def count_pairs(self, initial, steps):
        inp = number_dict(all_pairs(initial))
        vec = np.array([inp[self[i]] for i in range(len(self))], self.M.dtype)
        res = (self.M**steps @ vec).A.squeeze()
        ix = res.nonzero()[0]
        # Note: would be better to skip this and do a direct
        # pair-vector -> elements translation with NumPy indexing,
        # but eh. I copy-pasted `count_elements()` below.
        return dict((self[i], int(n)) for i, n in zip(ix, res[ix]))

    def count_elements(self, initial, steps):
        # Compensate for edge chars only being in 1 pair.
        total = number_dict(initial[0] + initial[-1])
        for k, v in self.count_pairs(initial, steps).items():
            for c in k:
                total[c] += v
        return total // 2  # Each element was counted in two pairs.
```

1. I believe this kind of failure to apply obvious “zero-cost generalizations” is the reason programming languages tend to end up like JavaScript more often than Ruby or Python.↩