A parser engine
Puzzles and problems related to parsing occur naturally at
computer and human interaction. Often the hard parsing problems lead
to searching for alternatives that eventually turn out to be just as
problematic as the original challenge.
This parsing module is
an implementation of the marpa
parsing algorithm . It is distinctively easy to use.
This
module can be used to interpret a sequence or a string according to a
context free grammar. Alternative uses include generating strings or
detecting ambiguity in generated output.
This is one of the
libraries that made Lever feasible to implement by a single person.
Table of contents ↑ Table of contents 01. Short example of
use
from chartparser import Rule, Nonterminal, Terminal, preprocess
expr = Nonterminal("expr")
number = Nonterminal("number")
digit = Terminal("digit")
plus = Terminal("plus")
grammar = [
Rule(expr, [expr, plus, number], "math_op"),
Rule(expr, [number], "pass_number"),
Rule(number, [digit], "new_number"),
Rule(number, [number, digit], "add_digit")
]
new_parser = preprocess(grammar, expr)
parser = new_parser()
for ch in "123 + 456"
if ch.is_space()
continue
elif ch.is_digit()
parser.step(digit, ch)
elif ch == "+"
parser.step(plus, ch)
else
print("parsing failed")
print(parser.expect...)
exit(1)
print("accept?", parser.accepted)
if parser.accepted
result = parser.traverse((rule, tree, start, stop):
name = rule.annotation
if name == "new_number"
num = parse_int(tree[0])
return num
elif name == "add_digit"
return tree[0] * 10 + parse_int(tree[1])
elif name == "pass_number"
print("a number", tree[0])
return tree[0]
elif name == "math_op"
print("adding two numbers", tree[0], tree[2])
return tree[0] + tree[2]
else
assert false
"Not implemented: " ++ name)
print(result)
The
output:
accept? true
a number 123
adding two numbers 123 456
579
The
writing of grammar rules may be tedious, so there is a separate
language for writing rules and annotations into them.
02. Core interface A rule in a context-free
grammar.
This object is used to represent production rules in a
grammar. The meaning is roughly, that things on the rhs can be used in
the place of a lhs.
+init (self, lhs, rhs, annotation = null ) lhs Left hand side, expected to be
a Nonterminal. rhs Right hand side,
expected to be a sequence of Nonterminals and Terminals.
annotation A field for a custom annotation.
You can access this during the traverse to recognize the structure
from the rules that are present.
Creates a new
Rule.
+repr (self ) returns Representation of the rule.
LHS is separated from RHS with a '->'.
volatile
A symbol that is not
supposed to appear in the formal language that a grammar represents.
+init (self, name ) name Name, in theory it's
optional and not used for parsing, but it's a good idea to name
your nonterminals.
A symbol, which may appear
in the formal language that a grammar represents.
+init (self, name ) name Name, in theory it's
optional and not used for parsing, but it's a good idea to name
your terminals.
+repr (self ) returns the repr(name), prefixed
with '@'
volatile
preprocess (user_grammar, default_accept ) user_grammar List of Rule objects
representing the grammar you want to parse.
default_accept Default rule to accept, can
be chosen when invoking the initiator.
returns An Initiator.
Preprocesses a grammar and prepares for
parsing.
The preprocessing of a grammar should be relatively fast.
When it begins to take hundreds of milliseconds, you may want to
consider memoizing the Initiator.
Represents an preprocessed
grammar, ready for parsing.
+call (self, accept = null ) accept This parsing algorithm
doesn't optimize the grammar by the accept rule, so you can pick
any nonterminal during instantiation as the accept symbol.
returns a new Parser, containing starting
state.
Instantiates the parser.
+init (self, grammar, blankset, right_recursive, default_accept ) returns Creates a new initiator.
Use the 'preprocess' function to
create a new Initiator.
internal
A parser state you can
step through the input.
the name may change
+init (self, init, accept, output ) init Initiator
accept Accept rule this parser was created
with. output Starting output, usually
empty.
Use the Initiator.+call to create a new
parser, rather than this.
internal
boolean, on whether
.output is filled or not.
Indicates whether the input is accepted.
Related to ambiguity resolution,
throws an exception.
internal
A set of symbols the
parser is expecting.
expecting (self, symbol ) symbol Symbol to be checked.
returns true or false
Can be used to check whether a specific
terminal or nonterminal is expected.
step (self, term, token, start = null, stop = null ) term The Terminal (or Nonterminal)
to step with. token The token to associate
to this step. start Start source location
(if used) stop Stop source location (if
used).
Steps the parsing state with a symbol.
traverse (self, postorder_cb, blank_cb = null, resolve_ambiguity = null ) postorder_cb Traversing function to
use. Returns: (rule, tree, start, stop) blank_cb
An optional callback for generating blank symbol. As a
default the postorder_cb is used with generated arguments.
resolve_ambiguity A function callback for
resolving ambiguity in a grammar during postorder traverse. Not
documented for now. returns Whatever the
postorder_cb does.
Postorder traverse the
parse tree.
03. Implementation details
not documented
volatile
+hash (self ) self not documented
returns not documented
not documented
volatile
+init (self, rule, pos, origin ) self not documented
rule not documented pos
not documented origin
not documented
returns not documented
not documented
volatile
+repr (self ) self not documented
returns not documented
not documented
volatile
is_completed (self ) self not documented
returns not documented
not documented
volatile
is_confirmed (self ) self not documented
returns not documented
not documented
volatile
is_predicted (self ) self not documented
returns not documented
not documented
volatile
next (self ) self not documented
returns not documented
not documented
volatile
penult (self ) self not documented
returns not documented
not documented
volatile
postdot (self ) self not documented
returns not documented
not documented
volatile
not documented
volatile
+init (self, pos = null, indent = null, dedent = null, newline = null ) self not documented
pos not documented indent
not documented dedent
not documented newline
not documented
returns not documented
not documented
volatile
finish (self, parser, pos ) self not documented
parser not documented pos
not documented
returns not documented
not documented
volatile
slip (self, parser, pos, source ) self not documented
parser not documented pos
not documented source
not documented
returns not documented
not documented
volatile
step (self, parser, pos, source ) self not documented
parser not documented pos
not documented source
not documented
returns not documented
not documented
volatile
not documented
volatile
+init (self, left, cc ) self not documented
left not documented cc
not documented
returns not documented
not documented
volatile
to_sppf (self ) self not documented
returns not documented
not documented
volatile
not documented
volatile
+init (self, left, rule, sppf ) self not documented
left not documented rule
not documented sppf not documented
returns not documented
not documented
volatile
not documented
volatile
+init (self, left, right, link = null ) self not documented
left not documented right
not documented link not documented
returns not documented
not documented
volatile
not documented
volatile
+init (self, rule, present ) self not documented
rule not documented
present not documented
returns not documented
not documented
volatile
not documented
volatile
+init (self, value ) self not documented
value not documented
returns not documented
not documented
volatile
not documented
volatile
+init (self, start, stop, cell, link ) self not documented
start not documented stop
not documented cell not documented
link not documented
returns not documented
not documented
volatile
+iter (self ) self not documented
returns not documented
not documented
volatile
insert (self, left, right ) self not documented
left not documented right
not documented
returns not documented
not documented
volatile
is_leaf (self ) self not documented
returns not documented
not documented
volatile
single (self ) self not documented
returns not documented
not documented
volatile
to_sppf (self ) self not documented
returns not documented
not documented
volatile
not documented
volatile
+init (self, message, location, source, at_eof = null ) self not documented
message not documented
location not documented
source not documented
at_eof not documented
returns not documented
not documented
volatile
+repr (self ) self not documented
returns not documented
not documented
volatile
not documented
volatile
+init (self, expect, location, source, at_eof = null ) self not documented
expect not documented
location not documented
source not documented
at_eof not documented
returns not documented
not documented
volatile
+repr (self ) self not documented
returns not documented
not documented
volatile
all_nullable (rhs, nullable ) rhs not documented
nullable not documented
returns not documented
not documented
volatile
ambiguity_traverser (sppf, postorder_cb, blank_cb, resolve_ambiguity ) sppf not documented
postorder_cb not documented
blank_cb not documented
resolve_ambiguity not documented
returns not documented
not documented
volatile
build_nnf (grammar, nullable ) grammar not documented
nullable not documented
returns not documented
not documented
volatile
cache_transitions (transitions, eim, cc, leims ) transitions not documented
eim not documented cc
not documented leims
not documented
returns not documented
not documented
volatile
count_nonrec (rule ) rule not documented
returns not documented
not documented
volatile
expand (start, stop, cell, blank_callback, seq ) start not documented
stop not documented cell
not documented blank_callback
not documented seq not documented
returns not documented
not documented
volatile
find_nullable (grammar ) grammar not documented
returns not documented
not documented
volatile
is_leo_eligible (edges, right_recursive ) edges not documented
right_recursive not documented
returns not documented
not documented
volatile
make_default_blank (parser, postorder_cb ) parser not documented
postorder_cb not documented
returns not documented
not documented
volatile
nihilist_rule (rule, index, nullable ) rule not documented
index not documented
nullable not documented
returns not documented
not documented
volatile
prediction (current, nodes, grammar, transitions, postdot ) current not documented
nodes not documented
grammar not documented
transitions not documented
postdot not documented
returns not documented
not documented
volatile
repr_spaces (seq, space = null ) seq not documented
space not documented
returns not documented
not documented
volatile
shift_eim (current, nodes, eim, bb, cc ) current not documented
nodes not documented eim
not documented bb not documented
cc not documented
returns not documented
not documented
volatile
shift_eims (current, nodes, edges, cc, right_recursive, leims ) current not documented
nodes not documented
edges not documented cc
not documented right_recursive
not documented leims
not documented
returns not documented
not documented
volatile
(a : Terminal , b : Nonterminal ) a not documented
b not documented
returns not documented
not documented volatile
(a : Nonterminal , b : Terminal ) a not documented
b not documented
returns not documented
not documented volatile
(a : Nonterminal , b : Nonterminal ) a not documented
b not documented
returns not documented
not documented volatile
(a : Terminal , b : Terminal ) a not documented
b not documented
returns not documented
not documented volatile
not documented
volatile
traverse_sppf (stack, postorder_cb, blank_cb, resolve_ambiguity ) stack not documented
postorder_cb not documented
blank_cb not documented
resolve_ambiguity not documented
returns not documented
not documented
volatile