Part 1

It seems the first example of language parsing is writing a basic four-function calculator… not really remembering any of that, I’m going to write a recursive state machine, with an accumulator and three states:

  1. Scanning for next token. If we encounter a digit or open paren, change state. If an operator, store it.
  2. In a number. Once it ends (encounter whitespace or close paren, or reach the end of the expression), we apply it to the accumulator with the saved operator.
  3. In a paren block of depth n (each encountered open paren, including the first one, while in this state increases depth by one, close parens decrease by one). Once depth is 0 we recurse on the internal expression, applying the result to accumulator. If we’re in this state at the end of the expression it’s an error.

State machines can be tricky, since it’s easy to get into an unexpected state, or forget a state change, or forget to clean up based on the state it’s in at the end. I had to deal with a couple of those, using liberal @debug statements. Then there was a stack overflow, since I was recursing with the parenthentical expression including parens. But overall pretty straightforward following the outlined logic.

Part 2

Well I suppose now I do have to fully parse the expression into an AST and then evaluate it. I’ll store each node in a mutable struct:

mutable struct Node
    left::Union{Node, Nothing}
    right::Union{Node, Nothing}
    op::Union{Function, Nothing}

The parser will be a simplified version of the state machine from part 1:

  1. Scanning for next token. If we encounter an asterisk then set op to * and everything before it goes into left and everything after goes into right for further parsing. If we encounter a plus then note its location. If an open paren then change state.
  2. In a paren block of depth n. If a closing paren decrease depth. If depth now is 0, set left to everything enclosed in the top-level parens (this may be overwritten later) and go back to scanning.

We need to do some cleanup at the end of parsing if op is nothing:

  1. If we noted a location for a plus then set op to + and everything before it goes into left and everything after goes into right for further parsing.
  2. If we set something for left but there’s nothing for right then it’s a fully enclosed parenthentical, so just promote left.

After parsing we need to eval() the expression, which is pretty simple:

  1. If both left and right are nothing, then we’re at a leaf (a number), so parse it into an integer.
  2. Otherwise call op with eval(left) and eval(right) as arguments.

I’m sure there’s easier/better ways to do this, but I’m happy I was able to work it out. It did take quite a bit of trial-and-error with test cases though. And worse, all the test cases passed but the answer was wrong! I ended up grabbing someone’s solution code from r/adventofcode and having it print out the answer for every line of my test input (without looking at the logic. No, really!), then comparing to my output. I then turned one of the wrong expressions into another test case. It was pretty clear what was wrong - I wasn’t allowing multiple sub-expressions to be on the left - which was actually from over-complicated logic.