A distinct effort has been made to make the Lever language itself
easier to develop, and the grammar language described here plays a big
role in that.

Here's the context-free attribute grammar for the grammar language,
described in itself:
    use alias()
        a = [symbol]
        on = ['constructive', 'use', 'terminal', 'append', 'concat']
    use indentation(indent, dedent, newline)
        can_close = [')', ']', '}']
    file: sep(decl, newline)
    decl:
        'use' symbol '(' sep(symbol, ',') ')' (indent join(option, newline) dedent)? / use
        'constructive'? 'terminal' join(symbol, ',') / terminal
        symbol '(' sep(symbol, ',') ')' ':' body / template
        symbol ':' body / rule
    option: symbol '=' '[' sep(primitive, ',') ']'
    primitive:
        symbol / symbol
        string / string
    body:
        prod / [.]
        indent join(prod, newline) dedent
    prod: expr* ('/' [annotation, symbol / shorthand])? / prod
    expr:
        term
        '(' arg_block(prod) ')' '+' / plus
        '(' arg_block(prod) ')' '*' / star
        '(' arg_block(prod) ')' '?' / opt
        symbol '(' arg_block(prod) ')' / expand
    term:
        term '+' / plus
        term '*' / star
        term '?' / opt
        symbol / symbol
        string / string
        '[' arg_block(prod) ']' / prod_set
    annotation:
        annotation_term
        annotation 'append' annotation_term / append
        annotation 'concat' annotation_term / concat
    annotation_term:
        '(' annotation ')'
        int / index
        '.' / dot
        '..' / dotdot
        'null' / a_null
        symbol '(' sep(annotation, ',') ')' / label
        '[' sep(annotation, ',') ']' / a_list
    arg_block(x):
        sep(x, ',', ',')
        indent sep_concat(x, newline) dedent
    sep_concat(x, y):
        sep(x, ',', ',')
        seq_b(x) y sep(x, ',', ',') / (. concat .)
01. Structure
The 'lhs: rhs' part of the grammar is the production rule, and the
'/ thing' part is the annotation that the rule maps into.
01.1. How to read the attribute
The annotation that follows the rule may consist of a single symbol or
even just an expression.

The annotation is a template that is filled in by deducing the details
from the expression.
01.2. Capturing elements
Some pieces of the right-hand side of a rule are ignored. These are
the keyword string tokens and the tokens that have been marked as
'constructive', such as the indent and dedent tokens.

If no element at all would be captured, the ignored elements are
retrieved in whole; otherwise only the capturing elements are
captured.
When no annotation is provided: if only one element is captured, it is
captured as is; otherwise the captured elements are inserted into a
list.

When the annotation is a single symbol, the elements are captured into
an attribute that carries the name given by that symbol.
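
The capturing rules in this section can be modelled in a few lines of
Python. This is only a sketch for reading the rules, not the code the
grammar library runs: the function name 'capture', the (value, ignored)
pairs and the (name, elements) tuple used for a labelled result are all
assumptions made for the illustration.

```python
def capture(items, annotation=None):
    """Model the default capture rules for one right-hand side.

    items: (value, ignored) pairs, where 'ignored' is True for keyword
           strings and constructive terminals (hypothetical encoding).
    annotation: None, or the single symbol that follows the '/'.
    """
    kept = [value for value, ignored in items if not ignored]
    if not kept:
        # Nothing captures, so the ignored elements are retrieved in whole.
        kept = [value for value, _ in items]
    if annotation is None:
        # A lone element passes through as is, several become a list.
        return kept[0] if len(kept) == 1 else kept
    # A single symbol labels the captured elements with its name.
    return (annotation, kept)
```

For instance, a rule shaped like "term '+' term / add" keeps both terms
and drops the '+' keyword, while a rule with no annotation and a single
capturing element returns that element unchanged.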
01.3. Index and dot annotations
The index and dot annotations are read such that the elements captured
by an index are marked as retrieved and are not captured a second time
by the dot annotations.

Every item, captured or not, has a unique index. These indices start
from 1, which is merely a convention of the grammar language. If
you're a programmer, you can think of the index '0' as the symbol on
the left-hand side, although the left-hand side cannot be captured.

The single dot '.' captures one symbol, whereas the double dot '..'
captures all remaining symbols. The double dot may occur only once,
and single dots may appear both before and after the double dot.
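
One way to read these rules is as the small Python model below. It is a
sketch under assumptions, not the library's implementation: the name
'resolve_slots' and the (value, ignored) encoding are made up for the
illustration, and returning the double dot's share as a nested list is
a guess about the result shape.

```python
def resolve_slots(slots, items):
    """Fill annotation slots from the items of one right-hand side.

    slots: 1-based integer indices, '.' or '..' (hypothetical encoding)
    items: (value, ignored) pairs in right-hand side order
    """
    if slots.count("..") > 1:
        raise ValueError("the double dot may only occur once")
    # Items picked by an index are marked as retrieved up front,
    # so the dots never capture them a second time.
    taken = {slot for slot in slots if isinstance(slot, int)}
    pool = [value for pos, (value, ignored) in enumerate(items, start=1)
            if not ignored and pos not in taken]
    # Single dots after the double dot are served from the end of the pool.
    if ".." in slots:
        dots_after = slots[slots.index("..") + 1:].count(".")
    else:
        dots_after = 0
    tail = len(pool) - dots_after
    front, back, out = 0, tail, []
    seen_dotdot = False
    for slot in slots:
        if isinstance(slot, int):
            out.append(items[slot - 1][0])
        elif slot == "..":
            out.append(pool[front:tail])  # everything that remains
            seen_dotdot = True
        elif seen_dotdot:
            out.append(pool[back])
            back += 1
        else:
            out.append(pool[front])
            front += 1
    return out
```

With the items 'if' (ignored), c, a, b, d and the slots 2 . .. . the
index takes c, the first dot takes a, the last dot takes d, and the
double dot is left with b.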
01.4. Annotation expressions
In the annotation you may denote that the contents are placed into a
list, that something is appended to a list, or that lists are
concatenated together.

You may also denote that a null is explicitly passed into an
annotation.
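
The list, append, concat and null forms can be pictured with the
following Python sketch. The tuple-shaped nodes and the name 'evaluate'
are hypothetical and exist only for this illustration; the sketch
assumes that the slots have already been replaced by captured values.

```python
def evaluate(node):
    """Evaluate a tiny annotation expression given as nested tuples."""
    kind = node[0]
    if kind == "value":      # an already captured element
        return node[1]
    if kind == "null":       # an explicit null
        return None
    if kind == "list":       # [a, b, ...]
        return [evaluate(child) for child in node[1:]]
    if kind == "append":     # (list append item)
        return evaluate(node[1]) + [evaluate(node[2])]
    if kind == "concat":     # (list concat list)
        return evaluate(node[1]) + evaluate(node[2])
    raise ValueError("unknown annotation node: %r" % (kind,))
```

The difference between the two list operations shows up when the right
operand is itself a list: 'append' adds it as one element, whereas
'concat' splices its elements in.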
02. Extensions
Extensions do not change how the source grammar is parsed, but they
provide additional hints about how it will be used.

The use of an extension is declared with the 'use' directive, and the
ordering of the extensions determines the order in which they are
invoked during parsing.

The symbols passed as 'arguments' are defined as constructive
terminals, which means they are not gathered during parsing. These are used
02.1. indentation extension
The 'indentation' extension lets you define layout-sensitive grammars.
The terminals given as arguments name the tokens that may be used to
trigger the layout sensitivity.

The indent, dedent and newline tokens are implemented by feeding such
a token to the parser whenever it may appear and the parsing state
expects it. As a consequence, the current implementation of layout
sensitivity may produce surprising results with ambiguous grammars.

The 'can_close' option lets you select tokens that allow the
indentation level to decrease without reaching the end of the line
first.
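
The "may appear and is expected" condition can be pictured with a toy
Python model. Everything in it is an assumption made for illustration:
the function name 'layout_token', the column-based indentation stack
and the set of expected terminals are not taken from the library, and
the 'can_close' option is left out entirely.

```python
def layout_token(column, levels, expected):
    """Pick the layout token to feed at the start of a line, if any.

    column:   indentation column of the new line
    levels:   stack of currently open indentation columns
    expected: terminals the parser can accept in its current state
    """
    if column > levels[-1]:
        candidate = "indent"     # deeper than the innermost open block
    elif column < levels[-1]:
        candidate = "dedent"     # shallower, so a block may be closing
    else:
        candidate = "newline"    # same depth, just a new line
    # The token is fed only when the parsing state expects it.
    return candidate if candidate in expected else None
```

Because the decision depends on what the parser expects at that moment,
a grammar that accepts several layout tokens in the same state leaves
room for the surprises mentioned above.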
02.2. alias extension
The 'alias' extension is meant for situations where you want to have
keywords but do not want them to interfere with user code.

The effect of 'alias' is that whenever the parsing cannot proceed with
one of the terminals in the 'on' list, but could proceed with one of
the terminals in the 'a' list, the alias extension renames the
terminal so that the parsing can proceed.

As with the indentation extension, the details here are still a bit
vague and may be subject to change.
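
The renaming rule can be sketched in Python as follows. The function
name 'alias_terminal' and the idea of passing the parser's expected
terminals as a set are assumptions made for this sketch; the real
extension is free to work differently.

```python
def alias_terminal(terminal, expected, on, a):
    """Rename a keyword terminal when the parser cannot use it as is.

    terminal: the terminal of the incoming token
    expected: terminals the parser can accept in its current state
    on:       terminals that are allowed to be renamed
    a:        terminals they may be renamed to
    """
    if terminal in expected:
        return terminal          # parsing can proceed, nothing to do
    if terminal in on:
        for candidate in a:
            if candidate in expected:
                return candidate # rename so that parsing can proceed
    return terminal              # no alias applies

ON = ["constructive", "use", "terminal", "append", "concat"]
A = ["symbol"]
```

With the 'on' and 'a' lists from the grammar at the top of this page, a
'use' token arriving where only a symbol fits is treated as a symbol,
so a rule may still be named 'use'.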
03. A Simple Example
Here's a complicated way to print '4' with the grammar library:

The 'calc.grammar':

    addition: term '+' term / add
    term: int / int

The 'calc.lc':

    import grammar

    main = (args):
        # Load the grammar and parse a string with it.
        calc_lang = grammar.read_file(dir ++ 'calc.grammar')
        result = calc_lang.read_string("1 + 3")
        # Turn the parse result into a value with the 'traverse' callback.
        result = result.traverse(traverse)
        print(result)

    traverse = (name, args, loc):
        if name == 'int'
            return parse_int(args[0])
        elif name == 'add'
            return args[0] + args[1]
        else
            assert false, [name, args, loc]
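
If you want to follow what the callback does without running Lever,
the same idea can be modelled in Python. This is a sketch under
assumptions: the (name, args, loc) tuples, the 'walk' helper standing
in for the traverse method, and the location values are all
hypothetical and only mimic the shape of the example.

```python
def traverse(name, args, loc):
    """Python counterpart of the callback in the example."""
    if name == "int":
        return int(args[0])
    if name == "add":
        return args[0] + args[1]
    raise AssertionError((name, args, loc))

def walk(node, callback):
    """Visit a (name, args, loc) tree bottom-up, like a fold."""
    if isinstance(node, tuple):
        name, args, loc = node
        return callback(name, [walk(arg, callback) for arg in args], loc)
    return node  # a plain token, passed through unchanged

# A hand-built tree for "1 + 3": the '+' keyword is ignored by the
# capture rules, so 'add' only receives the two terms.
tree = ("add", [("int", ["1"], 0), ("int", ["3"], 4)], 0)
result = walk(tree, traverse)
print(result)
```

The children are evaluated before their parent, so by the time 'add'
is called its arguments are already numbers.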