Python Packrat Parsing

To kick things off: what is parsing? It’s like when your mom used to read you bedtime stories as a kid, but instead of reading words off a page, the parser is reading code and working out its structure. And just like with bedtime stories, sometimes there are multiple ways to interpret the same thing (like whether Harry Potter should be pronounced “Harry Pottah” or “Pot-er”). That’s where parsing comes in: it figures out which interpretation is correct based on a set of rules, called a grammar.

Now packrat parsing specifically. Packrat parsing is a type of recursive descent parser that uses memoization to avoid redundant computations. In other words, instead of recomputing the same thing over and over again (like your mom reading you the same bedtime story every night), it remembers what it’s already computed so it can use that information later on.

So how does packrat parsing work? Let’s take a look at an example grammar:

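# Import only the pyparsing names used below
from pyparsing import (
    Forward, Group, Keyword, Literal, ParserElement, Word,
    cStyleComment, nums, pythonStyleComment,
)

# Turn on packrat memoization (it is off by default); do this before building the grammar
ParserElement.enablePackrat()

# Forward declaration so that expr can refer to itself
expr = Forward()

# Integer operands, so the operators have something to operate on
operand = Word(nums)

# Addition and subtraction operators
add_op = Literal("+") | Literal("-")

# Multiplication, floor division, division, and modulus operators ("//" is tried before "/")
mul_op = Literal("*") | Literal("//") | Literal("/") | Literal("%")

# Comparison operators
cmp_op = Literal("==") | Literal("!=") | Keyword("is")

# An expression is an operand, optionally followed by an operator and another expression
# (no operator precedence is modeled). Assign to the Forward once with <<=;
# repeated << assignments would overwrite each other.
expr <<= Group(operand + (add_op | mul_op | cmp_op) + expr) | operand

# Starting point for parsing
START = expr

# Comments: "#" to end of line, or /* ... */, using pyparsing's built-in comment expressions
COMMENTS = pythonStyleComment | cStyleComment
START.ignore(COMMENTS)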


This grammar defines a small expression language with addition, subtraction, multiplication, division, floor division, modulo, and equality, identity, and inequality comparisons. The `expr` variable is a forward reference: it’s declared up front as an empty placeholder and filled in afterwards, which lets the rule refer to itself (like when your mom would read you the end of the story before she got to the beginning).
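If the placeholder trick feels like magic, here is a rough standard-library sketch of the idea. The names `Placeholder`, `char`, `seq`, and `alt` are made up for illustration; this is only loosely modeled on what a forward reference does and is not how pyparsing implements `Forward`.

```python
from typing import Callable, Optional

# A parser takes (text, position) and returns the end position, or None on failure.
Parser = Callable[[str, int], Optional[int]]


class Placeholder:
    """Hypothetical stand-in for a rule that is used before it is defined."""

    def __init__(self) -> None:
        self.target: Optional[Parser] = None

    def define(self, parser: Parser) -> None:
        self.target = parser

    def __call__(self, text: str, pos: int) -> Optional[int]:
        if self.target is None:
            raise RuntimeError("rule used before it was defined")
        return self.target(text, pos)


def char(c: str) -> Parser:
    """Match a single literal character."""
    return lambda text, pos: pos + 1 if text.startswith(c, pos) else None


def seq(*parsers: Parser) -> Parser:
    """Match every parser in order, each starting where the previous one ended."""
    def parse(text: str, pos: int) -> Optional[int]:
        for parser in parsers:
            result = parser(text, pos)
            if result is None:
                return None
            pos = result
        return pos
    return parse


def alt(*parsers: Parser) -> Parser:
    """Return the result of the first parser that matches."""
    def parse(text: str, pos: int) -> Optional[int]:
        for parser in parsers:
            end = parser(text, pos)
            if end is not None:
                return end
        return None
    return parse


# nested <- '(' nested ')' / 'x'
nested = Placeholder()  # declared first, still empty
nested.define(alt(seq(char("("), nested, char(")")), char("x")))  # filled in later
```

Without the placeholder, the right-hand side would have to mention `nested` before any such name existed, which is exactly the chicken-and-egg problem a forward reference solves.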

The grammar uses recursive descent parsing, which means each rule acts like a function: it tries to match the input at the current position and calls other rules (or itself) to match the pieces it’s made of. Here, an expression can contain an addition or subtraction operator (`+` or `-`) followed by another expression (`expr`), or a multiplication, division, modulo, floor division, or comparison operator (`*`, `/`, `%`, `//`, `==`, `is`, or `!=`) followed by another expression.
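To make "each rule is a function" concrete, here is a small hand-written sketch using only the standard library. It assumes integer operands and simply nests everything to the right with no operator precedence; `tokenize`, `parse_operand`, `parse_expr`, and `parse` are illustrative names, not part of pyparsing.

```python
import re
from typing import List, Tuple, Union

Tree = Union[str, List["Tree"]]

OPERATORS = {"+", "-", "*", "/", "%", "//", "==", "!=", "is"}
# "//" comes before the single-character class so it is not split into two "/" tokens.
TOKEN = re.compile(r"\s*(//|==|!=|is\b|[-+*/%]|\d+)")


def tokenize(text: str) -> List[str]:
    """Split the input into operator and integer tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_operand(tokens: List[str], pos: int) -> Tuple[Tree, int]:
    """Rule: operand <- integer."""
    if pos < len(tokens) and tokens[pos].isdigit():
        return tokens[pos], pos + 1
    raise SyntaxError(f"expected a number at token {pos}")


def parse_expr(tokens: List[str], pos: int) -> Tuple[Tree, int]:
    """Rule: expr <- operand operator expr / operand."""
    left, pos = parse_operand(tokens, pos)
    if pos < len(tokens) and tokens[pos] in OPERATORS:
        operator = tokens[pos]
        right, pos = parse_expr(tokens, pos + 1)  # the rule calls itself
        return [left, operator, right], pos
    return left, pos


def parse(text: str) -> Tree:
    """Parse a whole string and insist that every token was consumed."""
    tokens = tokenize(text)
    tree, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse("2 + 3 * 4"))  # ['2', '+', ['3', '*', '4']]
```

The nesting in the result mirrors the call stack: the outer `parse_expr` handled `2 +` and handed the rest of the input to a recursive call.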

The packrat parsing part comes in when we use memoization to avoid redundant computations. A recursive descent parser that backtracks can end up applying the same rule at the same position in the input more than once. For example, while parsing `2 + 3 * 4`, the parser may try one alternative, parse `3 * 4` along the way, fail a little further on, and then try another alternative that needs `3 * 4` again from the same starting point. A packrat parser stores the result of every rule at every position it has tried, so the second attempt is a table lookup instead of a fresh parse. This can significantly improve performance for grammars that backtrack a lot, at the cost of extra memory for the table. In pyparsing, packrat memoization is off by default and is switched on by calling `ParserElement.enablePackrat()`.
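Here is a toy illustration of that table, again standard library only. The grammar, the `ToyParser` class, and `count_rule_calls` are invented for this sketch and say nothing about pyparsing’s internals; the point is just to count how often a rule body runs with and without the memo table.

```python
from typing import Callable, Dict, Optional, Tuple


class ToyParser:
    """Recognizer for a tiny grammar (hypothetical, for illustration only):

        sum     <- product '+' sum / product
        product <- atom '*' product / atom
        atom    <- digit / '(' sum ')'

    Each rule takes a start position and returns the end position of the
    match, or None if the rule does not match there.
    """

    def __init__(self, text: str, memoize: bool) -> None:
        self.text = text
        self.memoize = memoize
        self.memo: Dict[Tuple[str, int], Optional[int]] = {}
        self.calls = 0  # how many times a rule body actually ran

    def _apply(self, name: str, body: Callable[[int], Optional[int]],
               pos: int) -> Optional[int]:
        key = (name, pos)
        if self.memoize and key in self.memo:
            return self.memo[key]  # packrat: reuse the stored result
        self.calls += 1
        result = body(pos)
        if self.memoize:
            self.memo[key] = result
        return result

    def _peek(self, pos: int) -> str:
        return self.text[pos] if pos < len(self.text) else ""

    def rule_sum(self, pos: int) -> Optional[int]:
        return self._apply("sum", self._sum, pos)

    def rule_product(self, pos: int) -> Optional[int]:
        return self._apply("product", self._product, pos)

    def rule_atom(self, pos: int) -> Optional[int]:
        return self._apply("atom", self._atom, pos)

    def _sum(self, pos: int) -> Optional[int]:
        end = self.rule_product(pos)  # alternative 1: product '+' sum
        if end is not None and self._peek(end) == "+":
            rest = self.rule_sum(end + 1)
            if rest is not None:
                return rest
        return self.rule_product(pos)  # alternative 2: same rule, same position

    def _product(self, pos: int) -> Optional[int]:
        end = self.rule_atom(pos)  # alternative 1: atom '*' product
        if end is not None and self._peek(end) == "*":
            rest = self.rule_product(end + 1)
            if rest is not None:
                return rest
        return self.rule_atom(pos)  # alternative 2: same rule, same position

    def _atom(self, pos: int) -> Optional[int]:
        if self._peek(pos).isdigit():
            return pos + 1
        if self._peek(pos) == "(":
            end = self.rule_sum(pos + 1)
            if end is not None and self._peek(end) == ")":
                return end + 1
        return None


def count_rule_calls(text: str, memoize: bool) -> int:
    """Parse the whole text and report how many rule bodies were executed."""
    parser = ToyParser(text, memoize)
    if parser.rule_sum(0) != len(text):
        raise SyntaxError(f"could not parse {text!r}")
    return parser.calls
```

For the input `(((1)))`, `count_rule_calls` reports 12 rule executions with memoization and 595 without, because every extra level of parentheses makes the unmemoized parser redo all the work inside it several times. The memoized version never runs a rule more than once per position, which is where the table’s memory cost comes from.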

It may not be as exciting as bedtime stories, but it’s definitely more useful (and less likely to put you to sleep).
