Passage Pathing

The first pure graph problem?

The problem statement is to count the number of possible paths from the nodes start to end in a unidirected graph while only visiting most vertices once.

I say “most” because of course they found a way to make the problem description overly convoluted. There’s two types of vertices, uppercase and lowercase, and in part 1 we can only visit the lowercase vertices once, although in part 2 we can visit at most one lowercase node twice. Uppercase vertices we can visit any number of times.

So we’re kind of looking to count Hamiltonian-ish paths.

First some reasonable observations:

  1. We cannot have any cycles with uppercase vertices that are still connected to the end point, otherwise we would have an infinite number of paths. In fact, no two uppercase vertices can be adjacent (connected), as we could just go back and forth.

  2. This feels very much like an NP problem. Enumerating (listing) all the paths would certainly be exponential. (Consider a complete k-graph: there would be different paths of length .) However, I’m uncertain if merely counting them markedly changes the scenario in our case since we’re still dealing with general graphs. It feels like we can’t go subexponential, so we should probably be happy with an exponential solution.

  3. We can simplify the graph a little by outright removing the uppercase vertices and connecting up all the lowercase vertices that could reach each other through them. This should also substantially lower the cost of our algorithm (although we’re still exponential).

    1. Why does this work? Because the uppercase vertices don’t actually constrain us in any way, we can travel over them freely to any of their neighbors as if we did nothing at all; so we might as well travel directly to that neighbor.

    2. This simplifies us having to deal with uppercase vertices, however now we have to deal with multiplicities of edges, because there might be two edges between a and b and they count for two different paths, so whenever we go from a to b or vice versa, we have to multiply the result by 2.

The simplest approach to most graph problems is either DFS (depth-first search) or BFS (breadth-first search), depending on the problem. We’ll stick with that here. Even though by point (3) above we could turn it into some dynamic programming thing because we could use bitmasks for vertex membership, but that would complicate our part 2 and blah, blah, blah.

Prelude

The template prelude as per usual. It’s growing.

aoc = __import__('aoc-common').aoc()
import sys
import numpy as np
import scipy.ndimage as ndi
import itertools as itt
import functools as fnt
import collections as coll

test = '''fs-end
he-DX
fs-he
start-DX
pj-DX
end-zg
zg-sl
zg-pj
pj-he
RW-he
fs-DX
pj-RW
zg-RW
start-pj
he-WI
zg-he
pj-fs
start-RW'''

text = test if '-t' in sys.argv else aoc.input_text(12)
lines = text.splitlines()

Counting Paths

First the code for simplifying the input by removing all the uppercase caves. Note that this assumes no two uppercase caves are adjacent, which they technically could be so long as they don’t connect to the end node (i.e. they can’t be part of any path). But I’ll ignore that.

def simplify(edges):
  big_caves = coll.defaultdict(list)
  G = graph()
  for e1,e2 in edges:
    e1,e2 = min(e1, e2), max(e1, e2)
    assert not e2.isupper()
    if e1.isupper():
      big_caves[e1].append(e2)
      for x in big_caves[e1]:
        G.add_edge(e2, x)
    else:
      G.add_edge(e1, e2)
  return G

And then a simple utility class with a pretty generic count_paths() that works for both part 1 and part 2 of the problem.

class graph(coll.defaultdict):
  def __init__(self):
    super().__init__(coll.Counter)

  def add_edge(self, e1, e2):
    self[e1][e2] += 1
    if e1 != e2:
      self[e2][e1] += 1

  def count_paths(self, start='start', end='end', duplicates=0):
    path = []

    def _count_ex(node, dups):
      if node == end:
        return 1
      cnt = 0
      path.append(node)
      for nxt,m in self[node].items():
        if nxt not in path:
          cnt += m * _count_ex(nxt, dups)
        elif dups > 0 and nxt != start:
          cnt += m * _count_ex(nxt, dups-1)
      path.pop()
      return cnt

    return _count_ex(start, duplicates)

And then just:

g = simplify(x.split('-') for x in lines)
answer1 = g.count_paths('start', 'end', 0)
answer2 = g.count_paths('start', 'end', 1)

print(f'answer to part 1: {answer1}')
print(f'answer to part 2: {answer2}')

A Note on the Simplification

How good is our simplification?

Consider a simple naive count_paths() which still works with uppercase vertices:

def naive_count(node, path=list(), double=False, Q=[0]):
  if node == 'end':
    return 1

  if node.islower():
    path.append(node)

  cnt = 0
  for nxt in connected[node]:
    if nxt not in banned:
      cnt += naive_count(nxt, path, double)
    elif double and nxt != 'start':
      cnt += naive_count(nxt, path, False)

  if node.islower():
    path.pop()
  return cnt

A decent measure in Python would be to simply count the number of invocation of our function.

Here I’ll also measure another “naive” but straightforward way to accomplish part 2: instead of tracking a bool for whether we’ve used a vertex twice or not, we’ll iterate over all the vertices and actually duplicate the vertex in the graph, then count the number of paths like normal, and finally subtract and divide out all the paths we counted several times in this process. I call this “naive” not because it’s simple but because it’s a sort of brute force approach to reusing another solution. Its theoretical complexity isn’t actually that much worse (since we’re already in exponential land, it’s hard to worsen it), but of course in practice it kind of sucks:

method CALLS answer
naive_count 13292 4885
simplified 1436 4885
duplicate & naive_count 753592 117095
naive_count(double=True) 316603 117095
simplified 22182 117095

Here we see how good it is to simplify: we improve by at least an order of magnitude.