# Passage Pathing

The first pure graph problem?

The problem statement is to count the number of possible paths from the nodes
`start`

to `end`

in a unidirected graph while only visiting *most* vertices
once.

I say “*most*” because of course they found a way to make the problem
description overly convoluted. There’s two types of vertices, uppercase and
lowercase, and in part 1 we can only visit the lowercase vertices once, although
in part 2 we can visit *at most one lowercase node twice*. Uppercase vertices we
can visit any number of times.

So we’re kind of looking to count Hamiltonian-ish paths.

First some reasonable observations:

We cannot have any cycles with uppercase vertices that are still connected to the end point, otherwise we would have an infinite number of paths. In fact, no two uppercase vertices can be adjacent (connected), as we could just go back and forth.

This feels very much like an NP problem.

*Enumerating*(listing) all the paths would certainly be exponential. (Consider a complete k-graph: there would be $(k−2)!$ different paths of length $k−1$.) However, I’m uncertain if merely counting them markedly changes the scenario in our case since we’re still dealing with general graphs. It*feels*like we can’t go subexponential, so we should probably be happy with an exponential solution.We can

*simplify*the graph a little by outright*removing*the uppercase vertices and connecting up all the lowercase vertices that could reach each other through them. This should also substantially lower the cost of our algorithm (although we’re still exponential).Why does this work? Because the uppercase vertices don’t actually constrain us in any way, we can travel over them freely to any of their neighbors as if we did nothing at all; so we might as well travel directly to that neighbor.

This simplifies us having to deal with uppercase vertices, however now we have to deal with multiplicities of edges, because there might be two edges between

`a`

and`b`

and they count for two different paths, so whenever we go from`a`

to`b`

or vice versa, we have to multiply the result by 2.

The simplest approach to most graph problems is either DFS (depth-first search) or BFS (breadth-first search), depending on the problem. We’ll stick with that here. Even though by point (3) above we could turn it into some dynamic programming thing because we could use bitmasks for vertex membership, but that would complicate our part 2 and blah, blah, blah.

## Prelude

The template prelude as per usual. It’s growing.

```
aoc = __import__('aoc-common').aoc()
import sys
import numpy as np
import scipy.ndimage as ndi
import itertools as itt
import functools as fnt
import collections as coll
test = '''fs-end
he-DX
fs-he
start-DX
pj-DX
end-zg
zg-sl
zg-pj
pj-he
RW-he
fs-DX
pj-RW
zg-RW
start-pj
he-WI
zg-he
pj-fs
start-RW'''
text = test if '-t' in sys.argv else aoc.input_text(12)
lines = text.splitlines()
```

## Counting Paths

First the code for simplifying the input by removing all the uppercase caves.
Note that this assumes no two uppercase caves are adjacent, which they
*technically could be* so long as they don’t connect to the `end`

node (i.e.
they can’t be part of any path). But I’ll ignore that.

```
def simplify(edges):
big_caves = coll.defaultdict(list)
G = graph()
for e1,e2 in edges:
e1,e2 = min(e1, e2), max(e1, e2)
assert not e2.isupper()
if e1.isupper():
big_caves[e1].append(e2)
for x in big_caves[e1]:
G.add_edge(e2, x)
else:
G.add_edge(e1, e2)
return G
```

And then a simple utility class with a pretty generic `count_paths()`

that works
for both part 1 and part 2 of the problem.

```
class graph(coll.defaultdict):
def __init__(self):
super().__init__(coll.Counter)
def add_edge(self, e1, e2):
self[e1][e2] += 1
if e1 != e2:
self[e2][e1] += 1
def count_paths(self, start='start', end='end', duplicates=0):
path = []
def _count_ex(node, dups):
if node == end:
return 1
cnt = 0
path.append(node)
for nxt,m in self[node].items():
if nxt not in path:
cnt += m * _count_ex(nxt, dups)
elif dups > 0 and nxt != start:
cnt += m * _count_ex(nxt, dups-1)
path.pop()
return cnt
return _count_ex(start, duplicates)
```

And then just:

```
g = simplify(x.split('-') for x in lines)
answer1 = g.count_paths('start', 'end', 0)
answer2 = g.count_paths('start', 'end', 1)
print(f'answer to part 1: {answer1}')
print(f'answer to part 2: {answer2}')
```

## A Note on the Simplification

How good is our simplification?

Consider a simple naive `count_paths()`

which still works with uppercase vertices:

```
def naive_count(node, path=list(), double=False, Q=[0]):
if node == 'end':
return 1
if node.islower():
path.append(node)
cnt = 0
for nxt in connected[node]:
if nxt not in banned:
cnt += naive_count(nxt, path, double)
elif double and nxt != 'start':
cnt += naive_count(nxt, path, False)
if node.islower():
path.pop()
return cnt
```

A decent measure in Python would be to simply count the number of invocation of our function.

Here I’ll also measure another “naive” but straightforward way to accomplish
part 2: instead of tracking a bool for whether we’ve used a vertex twice or not,
we’ll iterate over all the vertices and *actually duplicate* the vertex in the
graph, then count the number of paths like normal, and finally subtract and
divide out all the paths we counted several times in this process. I call this
“naive” not because it’s simple but because it’s a sort of brute force approach
to reusing another solution. Its *theoretical* complexity isn’t actually that much worse
(since we’re already in exponential land, it’s hard to worsen it), but of course in practice
it kind of sucks:

method | CALLS | answer |
---|---|---|

naive_count | 13292 | 4885 |

simplified | 1436 | 4885 |

duplicate & naive_count | 753592 | 117095 |

naive_count(double=True) | 316603 | 117095 |

simplified | 22182 | 117095 |

Here we see how good it is to simplify: we improve by at least an order of magnitude.