Extended Polymerization

Aaand back to NumPy.

Or well, perhaps the NumPy connection isn’t immediately obvious.

Problem statement: we’re given an input string (starting point) like ABCCABAC and a set of rewrite rules of the form AB -> C, CC -> ABC, and so on. The rule AB -> ... means that AB is rewritten to A...B, i.e. the right-hand side is inserted between the two characters on the left-hand side. The left-hand side always has exactly two characters. All rules are applied simultaneously to the input, and each such application is a step. The goal is to count the characters in the final string after n steps.
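To make the step concrete, here’s a throwaway direct string rewrite (the function name is mine, and the two rules are just the examples from above):

```python
def rewrite_step(s, rules):
    # One simultaneous step: walk over adjacent character pairs and
    # insert each matching rule's right-hand side between the two.
    out = [s[0]]
    for a, b in zip(s, s[1:]):
        out.append(rules.get(a + b, ''))  # no matching rule: insert nothing
        out.append(b)
    return ''.join(out)

print(rewrite_step('ABCCABAC', {'AB': 'C', 'CC': 'ABC'}))
```

Note that the string grows at every step, which is exactly why we won’t be working on the string directly below.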

So given a starting point of ABC, no matter what set of rules we apply, or how many steps we go through, we’ll always end up with a string of the form A...B...C where the ellipses denote arbitrary characters.

I’ve actually generalized the problem a little here since it doesn’t cost us anything. Pedantically they only allow rules that expand two characters into three, inserting a single character between them¹.

The Problem & Naive Solution

The first observation is that it’s probably easier to think in terms of counts of character-pairs than about the characters directly, as our rules apply at the level of character-pairs.

Let’s say we have a function like step("ABC", n) which gives a dictionary of pair -> count saying how many occurrences of a given pair there are in the final string after n steps.

And so the second observation is that we have a simple recurrence relation here, because if we have step("ABC", n) we can easily calculate step("ABC", n+1) by going over all the pairs and expanding them according to the rules. I.e. with a rule like AB -> C, the number of AC and CB pairs at step n+1 will increase by the number of AB pairs at step n.

Finally, given an input string like "ABC" we’re free to calculate the trajectory for "AB" and "BC" separately, as they will never influence each other.
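That recurrence in standalone sketch form, using a plain collections.Counter rather than the number_dict used later:

```python
from collections import Counter

def step_pairs(pairs, rules):
    # One simultaneous step on pair counts: a rule AB -> C turns every
    # AB into one AC and one CB; pairs without a rule carry over as-is.
    nxt = Counter()
    for p, n in pairs.items():
        mid = rules.get(p)
        if mid is None:
            nxt[p] += n
        else:
            a, b = p
            for q in zip(a + mid, mid + b):  # the adjacent pairs of a+mid+b
                nxt[''.join(q)] += n
    return nxt
```

And since the pairs never interact, stepping the pair counts of "ABC" is the same as stepping "AB" and "BC" separately and adding the results.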

So this problem is just a more complicated Lanternfish, really, and I’ll provide both of the basic solutions.


The usual prelude with two notables:

  1. I’m using my own utility class number_dict which functions like collections.Counter() in most cases but has some extra functionality. Notably it “acts like a number” when doing arithmetic with it, broadcasting the arithmetic operations to its elements. I.e. x//2 divides all the counts in half.

  2. The trivial all_pairs() helper function which will be ubiquitous in both solutions below.
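For readers without the flagmining library, here’s a rough stand-in for number_dict covering just the methods used below. This is an assumption-laden sketch of the interface as it’s used in this post, not the actual implementation:

```python
from collections import Counter

class number_dict(Counter):
    # "Acts like a number": arithmetic broadcasts over the counts.
    def __mul__(self, n):
        return number_dict({k: v * n for k, v in self.items()})

    def __floordiv__(self, n):
        return number_dict({k: v // n for k, v in self.items()})

    @classmethod
    def one(cls, key):
        # A dict holding a single key with count 1.
        return cls({key: 1})

    @classmethod
    def sum(cls, dicts):
        # Element-wise sum of an iterable of dicts.
        total = cls()
        for d in dicts:
            for k, v in d.items():
                total[k] += v
        return total
```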

aoc = __import__('aoc-common').aoc()
import sys
import numpy as np
import scipy.ndimage as ndi
import itertools as itt
import functools as fnt
import collections as coll
from collections import abc, Counter
import re

from flagmining.dicts import number_dict

test = '''NNCB

CH -> B
HH -> N
CB -> H
NH -> C
HB -> C
HC -> B
HN -> C
NN -> C
BH -> H
NC -> B
NB -> B
BN -> B
BB -> N
BC -> B
CC -> N
CN -> C'''

text = test if '-t' in sys.argv else aoc.input_text(14)
polystr, rules = text.split('\n\n')
rules = dict(t.split(' -> ') for t in rules.splitlines())

def all_pairs(s):
  return [x+y for x,y in zip(s, s[1:])]

# ...
# ...
# ...
# ...

# solver = AbstractPairRewriteMemoizationProxyMixinFactoryBean(rules)
solver = cleverer_matrix_solution_thing(rules)

cnts = sorted(solver.count_elements(polystr, 10).values())
answer1 = cnts[-1] - cnts[0]

print(f'answer to part 1: {answer1}')

cnts = sorted(solver.count_elements(polystr, 40).values())
answer2 = cnts[-1] - cnts[0]

print(f'answer to part 2: {answer2}')

The Naive Solution

… like with Lanternfish we can start with the most basic solution to recurrence relation problems, which is memoization — or dynamic programming if you will, the former just being a very specific instance of the latter.

class AbstractPairRewriteMemoizationProxyMixinFactoryBean(dict):
  def __init__(self, rules):
    for k,v in rules.items():
      if len(k) != 2:
        raise ValueError(f"invalid replacement rule: {k} -> {v}")
      self[k,0] = number_dict(all_pairs(k[0] + v + k[1]))

  def count_pairs_memo(self, s, steps=1):
    return number_dict.sum(self[p,steps] for p in all_pairs(s))

  def __getitem__(self, key):
    ss, n = key
    assert len(ss) == 2 and n >= 0

    if n == 0:
      # Only one choice.
      return number_dict.one(ss)

    if (ss,n) not in self:
      # Memoization/cache.
      self[ss,n] = number_dict.sum(
        self[k,n-1]*v for k,v in self.get((ss, 0)).items())

    return super().__getitem__((ss,n))

  def count_elements(self, initial, steps):
    # Compensate for edge chars only being in 1 pair.
    total = number_dict(initial[0] + initial[-1])
    for k,v in self.count_pairs_memo(initial, steps).items():
      for c in k:
        total[c] += v
    return total // 2  # Each element was counted in two pairs.

Here wrapped in a class because I tend to consider it the lesser of two evils when compared to nested functions.
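As a quick sanity check, the same recurrence in fully standalone form with plain Counters, run on the example rule set from the prelude (1588 is the well-known answer for that example after 10 steps):

```python
from collections import Counter

# The example rule set from the prelude above.
rules = dict(line.split(' -> ') for line in '''\
CH -> B
HH -> N
CB -> H
NH -> C
HB -> C
HC -> B
HN -> C
NN -> C
BH -> H
NC -> B
NB -> B
BN -> B
BB -> N
BC -> B
CC -> N
CN -> C'''.splitlines())

def count_elements(s, steps):
    pairs = Counter(a + b for a, b in zip(s, s[1:]))
    for _ in range(steps):
        nxt = Counter()
        for p, n in pairs.items():
            a, b = p
            mid = rules[p]  # this rule set covers every possible pair
            for q in zip(a + mid, mid + b):  # the adjacent pairs of a+mid+b
                nxt[''.join(q)] += n
        pairs = nxt
    # Every element sits in two pairs except the two edge characters,
    # which sit in one; compensate, then halve.
    total = Counter(s[0] + s[-1])
    for p, n in pairs.items():
        for c in p:
            total[c] += n
    return Counter({k: v // 2 for k, v in total.items()})

cnts = sorted(count_elements('NNCB', 10).values())
print(cnts[-1] - cnts[0])  # 1588 for the example input
```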


The Matrix Solution

Just like in Lanternfish we can construct a matrix such that we perform a single step by multiplying with this matrix. Then we can use matrix exponentiation to calculate the recurrence more “directly” (complexity that scales with log(steps) rather than linearly).

The same notes apply, with regard to how this doesn’t really buy us much if we have exponential growth, since arithmetic on arbitrarily sized integers will quickly grow to choke both algorithms way before the steps vs. log(steps) difference comes into play. But it’s very useful for working in finite fields.
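The idea in miniature, on a made-up 3×3 transition matrix rather than the puzzle’s rules:

```python
import numpy as np

# Made-up transition matrix: entry T[i, j] is how many pairs of kind i
# a single pair of kind j becomes after one step.
T = np.array([[0, 1, 0],
              [1, 0, 1],
              [1, 1, 0]], dtype=np.int64)

v0 = np.array([1, 0, 0], dtype=np.int64)  # initial pair counts

# Twenty single steps...
v = v0
for _ in range(20):
    v = T @ v

# ...agree with a single jump via the 20th matrix power, which NumPy
# computes with O(log n) matrix multiplications.
assert (np.linalg.matrix_power(T, 20) @ v0 == v).all()
```

With dtype=object the same code runs on arbitrary-precision Python ints instead, at the arithmetic cost mentioned above.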

There’s some extra boilerplate in this solution because we have to assign an index to each possible pair and then translate back and forth when we go from matrices or vectors over to strings.

class cleverer_matrix_solution_thing():
  def __init__(self, rules, dtype=np.uint64):
    self.words = dict()
    self.indices = dict()
    for k,v in rules.items():
      assert len(k) == 2
      self._add(k)
      for p in all_pairs(k[0] + v + k[1]):
        self._add(p)

    mat = np.zeros((len(self), len(self)), dtype=dtype)
    for k,v in rules.items():
      for p in all_pairs(k[0] + v + k[1]):
        mat[ self[p], self[k] ] += 1
      # mat[ [self[t] for t in all_pairs(k[0] + v + k[1])], self[k] ] += 1
    self.M = np.matrix(mat)

  def _add(self, word):
    if word not in self.words:
      idx = len(self.words)
      self.words[word] = idx
      self.indices[idx] = word

  def __getitem__(self, key):
    if hasattr(key, '__index__'):
      return self.indices[key]
    return self.words[key]

  def __len__(self):
    return len(self.words)

  def count_pairs(self, initial, steps):
    inp = number_dict(all_pairs(initial))
    vec = np.array([inp[self[i]] for i in range(len(self))], self.M.dtype)
    res = (self.M**steps @ vec).A.squeeze()
    ix = res.nonzero()[0]
    # Note: would be better to skip this and do direct pairvector->elements here
    # using NumPy indices but eh. I copy-pasted `count_elements()` below.
    return dict((self[i], int(n)) for i,n in zip(ix, res[ix]))

  def count_elements(self, initial, steps):
    # Compensate for edge chars only being in 1 pair.
    total = number_dict(initial[0] + initial[-1])
    for k,v in self.count_pairs(initial, steps).items():
      for c in k:
        total[c] += v
    return total // 2  # Each element was counted in two pairs.

1. I believe this kind of failure to apply obvious “zero-cost generalizations” is the reason programming languages tend to end up like JavaScript more often than Ruby or Python.