Manual: chartparser

A parser engine

Puzzles and problems related to parsing occur naturally at computer and human interaction. Often the hard parsing problems lead to searching for alternatives that eventually turn out to be just as problematic as the original challenge.

This parsing module is an implementation of the marpa parsing algorithm. It is distinctively easy to use.

This module can be used to interpret a sequence or a string according to a context free grammar. Alternative uses include generating strings or detecting ambiguity in generated output.

This is one of the libraries that made Lever feasible to implement by a single person.

Table of contents ↑
Table of contents
01. Short example of use
02. Core interface
03. Implementation details

01. Short example of use

from chartparser import Rule, Nonterminal, Terminal, preprocess

expr   = Nonterminal("expr")
number = Nonterminal("number")
digit = Terminal("digit")
plus  = Terminal("plus")

grammar = [
    Rule(expr, [expr, plus, number], "math_op"),
    Rule(expr, [number],             "pass_number"),
    Rule(number, [digit],            "new_number"),
    Rule(number, [number, digit],    "add_digit")
]
new_parser = preprocess(grammar, expr)
parser = new_parser()
for ch in "123 + 456"
    if ch.is_space()
        continue
    elif ch.is_digit()
        parser.step(digit, ch)
    elif ch == "+"
        parser.step(plus, ch)
    else
        print("parsing failed")
        print(parser.expect...)
        exit(1)
print("accept?", parser.accepted)
if parser.accepted
    result = parser.traverse((rule, tree, start, stop):
        name = rule.annotation
        if name == "new_number"
            num = parse_int(tree[0])
            return num
        elif name == "add_digit"
            return tree[0] * 10 + parse_int(tree[1])
        elif name == "pass_number"
            print("a number", tree[0])
            return tree[0]
        elif name == "math_op"
            print("adding two numbers", tree[0], tree[2])
            return tree[0] + tree[2]
        else
            assert false
                "Not implemented: " ++ name)
    print(result)

The output:

accept? true
a number 123
adding two numbers 123 456
579

The writing of grammar rules may be tedious, so there is a separate language for writing rules and annotations into them.

02. Core interface

class Rule extends object
A rule in a context-free grammar.

This object is used to represent production rules in a grammar. The meaning is roughly, that things on the rhs can be used in the place of a lhs.

+init(self, lhs, rhs, annotation = null)
lhs Left hand side, expected to be a Nonterminal.
rhs Right hand side, expected to be a sequence of Nonterminals and Terminals.
annotation A field for a custom annotation. You can access this during the traverse to recognize the structure from the rules that are present.
Creates a new Rule.
+repr(self)
returns Representation of the rule. LHS is separated from RHS with a '->'.
volatile
class Nonterminal extends object
A symbol that is not supposed to appear in the formal language that a grammar represents.
+init(self, name)
name Name, in theory it's optional and not used for parsing, but it's a good idea to name your nonterminals.
+repr(self)
returns repr(self.name)
volatile
class Terminal extends object
A symbol, which may appear in the formal language that a grammar represents.
+init(self, name)
name Name, in theory it's optional and not used for parsing, but it's a good idea to name your terminals.
+repr(self)
returns the repr(name), prefixed with '@'
volatile
preprocess(user_grammar, default_accept)
user_grammar List of Rule objects representing the grammar you want to parse.
default_accept Default rule to accept, can be chosen when invoking the initiator.
returns An Initiator.
Preprocesses a grammar and prepares for parsing.

The preprocessing of a grammar should be relatively fast. When it begins to take hundreds of milliseconds, you may want to consider memoizing the Initiator.

class Initiator extends object
Represents an preprocessed grammar, ready for parsing.
+call(self, accept = null)
accept This parsing algorithm doesn't optimize the grammar by the accept rule, so you can pick any nonterminal during instantiation as the accept symbol.
returns a new Parser, containing starting state.
Instantiates the parser.
+init(self, grammar, blankset, right_recursive, default_accept)
returns Creates a new initiator.
Use the 'preprocess' function to create a new Initiator.
internal
class Parser extends object
A parser state you can step through the input.
the name may change
+init(self, init, accept, output)
init Initiator
accept Accept rule this parser was created with.
output Starting output, usually empty.
Use the Initiator.+call to create a new parser, rather than this.
internal
accepted : property
boolean, on whether .output is filled or not.

Indicates whether the input is accepted.

default_ambiguity_resolution(self, sppf)
Related to ambiguity resolution, throws an exception.
internal
expect : property
A set of symbols the parser is expecting.
expecting(self, symbol)
symbol Symbol to be checked.
returns true or false
Can be used to check whether a specific terminal or nonterminal is expected.
step(self, term, token, start = null, stop = null)
term The Terminal (or Nonterminal) to step with.
token The token to associate to this step.
start Start source location (if used)
stop Stop source location (if used).
Steps the parsing state with a symbol.
traverse(self, postorder_cb, blank_cb = null, resolve_ambiguity = null)
postorder_cb Traversing function to use. Returns: (rule, tree, start, stop)
blank_cb An optional callback for generating blank symbol. As a default the postorder_cb is used with generated arguments.
resolve_ambiguity A function callback for resolving ambiguity in a grammar during postorder traverse. Not documented for now.
returns Whatever the postorder_cb does.
Postorder traverse the parse tree.

03. Implementation details

class EIM extends object
not documented
volatile
+hash(self)
self not documented
returns not documented
not documented
volatile
+init(self, rule, pos, origin)
self not documented
rule not documented
pos not documented
origin not documented
returns not documented
not documented
volatile
+repr(self)
self not documented
returns not documented
not documented
volatile
is_completed(self)
self not documented
returns not documented
not documented
volatile
is_confirmed(self)
self not documented
returns not documented
not documented
volatile
is_predicted(self)
self not documented
returns not documented
not documented
volatile
next(self)
self not documented
returns not documented
not documented
volatile
penult(self)
self not documented
returns not documented
not documented
volatile
postdot(self)
self not documented
returns not documented
not documented
volatile
class IndentParser extends object
not documented
volatile
+init(self, pos = null, indent = null, dedent = null, newline = null)
self not documented
pos not documented
indent not documented
dedent not documented
newline not documented
returns not documented
not documented
volatile
finish(self, parser, pos)
self not documented
parser not documented
pos not documented
returns not documented
not documented
volatile
slip(self, parser, pos, source)
self not documented
parser not documented
pos not documented
source not documented
returns not documented
not documented
volatile
step(self, parser, pos, source)
self not documented
parser not documented
pos not documented
source not documented
returns not documented
not documented
volatile
class LEO extends object
not documented
volatile
+init(self, left, cc)
self not documented
left not documented
cc not documented
returns not documented
not documented
volatile
stop : property
not documented
volatile
to_sppf(self)
self not documented
returns not documented
not documented
volatile
class NNF extends object
not documented
volatile
+init(self, rule, present)
self not documented
rule not documented
present not documented
returns not documented
not documented
volatile
class Resolve extends object
not documented
volatile
+init(self, value)
self not documented
value not documented
returns not documented
not documented
volatile
class SPPF extends object
not documented
volatile
+init(self, start, stop, cell, link)
self not documented
start not documented
stop not documented
cell not documented
link not documented
returns not documented
not documented
volatile
+iter(self)
self not documented
returns not documented
not documented
volatile
insert(self, left, right)
self not documented
left not documented
right not documented
returns not documented
not documented
volatile
is_leaf(self)
self not documented
returns not documented
not documented
volatile
single(self)
self not documented
returns not documented
not documented
volatile
to_sppf(self)
self not documented
returns not documented
not documented
volatile
class SyntaxError extends Exception
not documented
volatile
+init(self, message, location, source, at_eof = null)
self not documented
message not documented
location not documented
source not documented
at_eof not documented
returns not documented
not documented
volatile
+repr(self)
self not documented
returns not documented
not documented
volatile
class SyntaxErrorExpected extends SyntaxError
not documented
volatile
+init(self, expect, location, source, at_eof = null)
self not documented
expect not documented
location not documented
source not documented
at_eof not documented
returns not documented
not documented
volatile
+repr(self)
self not documented
returns not documented
not documented
volatile
all_nonterminals(rhs)
rhs not documented
returns not documented
not documented
volatile
all_nullable(rhs, nullable)
rhs not documented
nullable not documented
returns not documented
not documented
volatile
ambiguity_traverser(sppf, postorder_cb, blank_cb, resolve_ambiguity)
sppf not documented
postorder_cb not documented
blank_cb not documented
resolve_ambiguity not documented
returns not documented
not documented
volatile
build_nnf(grammar, nullable)
grammar not documented
nullable not documented
returns not documented
not documented
volatile
cache_transitions(transitions, eim, cc, leims)
transitions not documented
eim not documented
cc not documented
leims not documented
returns not documented
not documented
volatile
count_nonrec(rule)
rule not documented
returns not documented
not documented
volatile
dir : path
not documented
volatile
expand(start, stop, cell, blank_callback, seq)
start not documented
stop not documented
cell not documented
blank_callback not documented
seq not documented
returns not documented
not documented
volatile
find_nullable(grammar)
grammar not documented
returns not documented
not documented
volatile
find_right_recursive(grammar)
grammar not documented
returns not documented
not documented
volatile
format_origin(source, location, message = null)
source not documented
location not documented
message not documented
returns not documented
not documented
volatile
import : Import
not documented
volatile
is_leo_eligible(edges, right_recursive)
edges not documented
right_recursive not documented
returns not documented
not documented
volatile
main()
returns not documented
not documented
volatile
make_default_blank(parser, postorder_cb)
parser not documented
postorder_cb not documented
returns not documented
not documented
volatile
name = "chartparser"
not documented
volatile
nihilist_rule(rule, index, nullable)
rule not documented
index not documented
nullable not documented
returns not documented
not documented
volatile
prediction(current, nodes, grammar, transitions, postdot)
current not documented
nodes not documented
grammar not documented
transitions not documented
postdot not documented
returns not documented
not documented
volatile
repr_spaces(seq, space = null)
seq not documented
space not documented
returns not documented
not documented
volatile
shift_eim(current, nodes, eim, bb, cc)
current not documented
nodes not documented
eim not documented
bb not documented
cc not documented
returns not documented
not documented
volatile
shift_eims(current, nodes, edges, cc, right_recursive, leims)
current not documented
nodes not documented
edges not documented
cc not documented
right_recursive not documented
leims not documented
returns not documented
not documented
volatile
symbol_lt
(a : Terminal, b : Nonterminal)
a not documented
b not documented
returns not documented
not documented
volatile
(a : Nonterminal, b : Terminal)
a not documented
b not documented
returns not documented
not documented
volatile
(a : Nonterminal, b : Nonterminal)
a not documented
b not documented
returns not documented
not documented
volatile
(a : Terminal, b : Terminal)
a not documented
b not documented
returns not documented
not documented
volatile
not documented
volatile
traverse_sppf(stack, postorder_cb, blank_cb, resolve_ambiguity)
stack not documented
postorder_cb not documented
blank_cb not documented
resolve_ambiguity not documented
returns not documented
not documented
volatile
warshall_transitive_closure(a)
a not documented
returns not documented
not documented
volatile