Syntax Scoring

What’s this, a non-NumPy task? They do exist.

Restated problem: match "<>()[]{}" delimiters in a string to find delimiter imbalance.

Fun fact: it would seem I learned nothing from this task because I fucked it up on a recent CTF task just a couple of days later. Picard’s forehead.

Note to self: think before coding.

Parser

First we might mention a trick that isn’t really a parser, but it works in this specific case:

def collapse(s):
  reps = '[] () {} <>'.split()
  sʻ = None
  while s != sʻ:
    sʻ = s
    for r in reps:
      s = s.replace(r, '')
  return s

Like I said, not really a parser, but it’s worth mentioning because this will be a lot more efficient than a character-by-character parser for simple cases that are not deeply nested.

As for the actual parsing, we can consider the underlying language that of matched parenthesis, and it is likely the simplest form of a structured language you can have. Yet it’s not as simple as regular languages, so normal regular expressions can’t match them1. It is a context-free langauge, so some form of state is required to parse them. The simplest is to just use a stack, either in the form of recursion (top-down parsers) or maintain some list, such as:

def simple_parser(s):
  table = dict(zip('{[(<', '}])>'))
  stack = []

  for c in s:
    if c in table:
      stack.append(table[c])
    elif stack and c == stack[-1]:
      stack.pop()
    else:
      return c

  return stack

In a more general setting (if there were text between the parenthesis for example), it would in general be better to search forward to the next point-of-interest rather than going character-by-character. Here regular expressions could come in handy, as there is no str.findany().

And what we’re seeking is providing proof that the given string are not part of the language, in the form of imbalanced or unclosed brackets.

Part 1

Focuses on finding unopened delimiters, i.e. find the first closing delimiter that was never opened.

There’s some extra noise about scoring them, which I find totally unnecessary, but alright.

So, including both the code snippets mentioned above:


aoc = __import__('aoc-common').aoc()
import numpy as np
from itertools import repeat

# test = [
#   '{([(<{}[<>[]}>{[]{[(<()>',
#   '[[<[([]))<([[{}[[()]]]',
#   '[{[{({}]{}}([{[{{{}}([]',
#   '[<(<(<(<{}))><([]([]()',
#   '<{([([[(<>()){}]>(<<{{',
# ]

text = aoc.input_text(10)
lines = text.splitlines()


# The hack.
def collapse(s):
  # Python's lack of do-while...
  reps = '[] () {} <>'.split()
  sʻ = None
  while s != sʻ:
    sʻ = s
    for r in reps:
      s = s.replace(r, '')
  return s


# The proper.
def simple_parser(s):
  table = dict(zip('{[(<', '}])>'))
  stack = []

  for c in s:
    if c in table:
      stack.append(table[c])
    elif stack and c == stack[-1]:
      stack.pop()
    else:
      return c

  return stack



# collapsed = [collapse(l) for l in lines]
collapsed = [simple_parser(l) for l in lines]

corrupted = [x for x in collapsed if isinstance(x, str)]
scores1 = {
  ')': 3,
  ']': 57,
  '}': 1197,
  '>': 25137,
}

answer1 = sum(map(scores1.get, corrupted))
print(f'answer to part 1: {answer1}')

Part 2

Here we are to find the unclosed delimiters. I.e. what set of delimiters would it take to balance the string.

Again there’s a silly scoring system2. It seems like this additional red tape is to verify the correctness of our answer, but I personally feel the most “honest” way to do it would be to require you input sha3( <answer> ).hexdigest() or something on the website instead of inventing some arbitrary “scoring method” for each task.

from statistics import median  # did you even know `statistics` existed; be honest.

incomplete = [x for x in collapsed if not isinstance(x, str)]
scores2 = {
  ')': 1,
  ']': 2,
  '}': 3,
  '>': 4,
}

# Basically treat the delimiters as (non-zero) digits in a base-5 number.
answer2 = median(sum(5**i * scores2[c] for i,c in enumerate(st)) for st in incomplete)

print(f'answer to part 2: {answer2}')
  1. to say nothing of Perl’s REs, they are a beast of their own.

  2. which actually just simplifies to interpreting our resulting string as a base-5 number, though it is not explained as such.