Python Pest
Python Pest is a port of the Rust Pest parsing library - a powerful, elegant PEG (Parsing Expression Grammar) parser generator. We use exactly the same grammar syntax as Pest v2, so existing Pest grammars can be used without modification.
Install
Use pip
or your favorite package manager.
Example
Let's say we want to parse input like:
Our grammar is just a Python string, possibly read from a file (reading from a file means we don't need to worry about double escaping):
array = { "[" ~ int_list ~ "]" }
int_list = { int ~ ("," ~ int)* }
int = @{ "0" | ASCII_NONZERO_DIGIT ~ ASCII_DIGIT* }
WHITESPACE = _{ " " }
array
, int_list
, int
and WHITESPACE
are rules. A rule can reference other rules by name. ~
means followed by and |
is ordered choice.
To use a Pest grammar in Python we pass it to the static method Parser.from_grammar()
, which returns a new instance of Parser
.
from pest import Pair
from pest import Parser
# Read grammar rules from a file.
with open("example.pest", encoding="utf-8") as fd:
grammar = fd.read()
parser = Parser.from_grammar(grammar)
Instances of Parser
have a parse()
method that takes the name of the rule to start parsing from and the input text to parse. Here we parse our example input and dump a compact representation of the resulting parse tree.
# ... continued
parse_tree = parser.parse("array", "[1, 2, 3, 42]")
print(parse_tree.dumps())
# - array > int_list
# - int: "1"
# - int: "2"
# - int: "3"
# - int: "42"
Debug output
A more detailed parse tree representation is available by passing compact=False
to Pairs.dumps()
.
[
{
"rule": "array",
"span": {
"str": "[1, 2, 3, 42]",
"start": 0,
"end": 13
},
"inner": [
{
"rule": "int_list",
"span": {
"str": "1, 2, 3, 42",
"start": 1,
"end": 12
},
"inner": [
{
"rule": "int",
"span": {
"str": "1",
"start": 1,
"end": 2
},
"inner": []
},
{
"rule": "int",
"span": {
"str": "2",
"start": 4,
"end": 5
},
"inner": []
},
{
"rule": "int",
"span": {
"str": "3",
"start": 7,
"end": 8
},
"inner": []
},
{
"rule": "int",
"span": {
"str": "42",
"start": 10,
"end": 12
},
"inner": []
}
]
}
]
}
]
A parse tree is composed of token Pair
and Pairs
types, where each node represents a matched grammar rule and all descendant rules. To make traversing and transforming that tree expressive and type-safe, we recommend using Python's structural pattern matching (match
/case
) syntax. It lets you destructure parse tree nodes directly by rule name and inner content, clearly showing what each branch of your parser expects.
For this very simple example, we need just one match expression to match the array
token pair and unpack its inner int_list
.
# ... continued
match parse_tree.first():
case Pair("array", [Pair("int_list", inner)]):
numbers = [int(p.text) for p in inner]
case _:
raise ValueError("unexpected parse tree")
print(numbers)
Code generation
So far we've parsed input text directly from a grammar tree (the Parser
instance), but you can also generate a Python module with Parser.generate()
. This is something you'd do once after modifying your grammar.
TODO: Show how to do this with the CLI - once I've written it.
Generated parser modules expose a parse()
function, a Rule
enum and a simple command line interface for testing your generated parser manually.
Important
This example is not continued from above. We are importing from the parser.py
module we've just generated.
from parser import Rule
from parser import parse
from pest import Pair
parse_tree = parse(Rule.ARRAY, "[1, 2, 3, 42]")
match parse_tree.first():
case Pair(Rule.ARRAY, [Pair(Rule.INT_LIST, inner)]):
numbers = [int(p.text) for p in inner]
case _:
raise ValueError("unexpected parse tree")
print(numbers)
Parse trees obtained from generated code are identical to those returned by Parser.parse()
.
More examples
More involved and realistic examples can be found in the examples/
folder in the root of this projects source tree.
examples/jsonpath
is an implementation of RFC 9535 that uses the precedence climbing technique to handle operator precedence. You can compare it directly to python-jsonpath-rfc9535, which is implemented with a hand-crafted parser and identical internal representation.
examples/calculator
shows three different approaches to handling operator precedence:
examples/calculator/prec_climber.py
- Precedence climbing method,examples/calculator/pratt.py
- Pratt parsing (this one is the most readable),examples/calculator/grammar_encoded_prec.py
- precedence encoded directly in the grammar.