├── .gitignore ├── README.md ├── combinator_grammars.py ├── eg_B_compiler ├── README.md ├── ast.py ├── b2020.parson ├── bcomp.py ├── eg │ ├── eg0.b │ ├── eg0.s.ref │ ├── eg1.b │ ├── eg1.s.ref │ ├── eg2.b │ ├── eg2.s.ref │ ├── eg3.b │ └── eg3.s.ref ├── error_tests │ └── notb.b ├── gen_vm_asm.py ├── structs.py └── testme.sh ├── eg_basic.py ├── eg_bicicleta.py ├── eg_calc.py ├── eg_calc_compile.py ├── eg_calc_to_rpn.py ├── eg_ebnf ├── c_emit.py ├── ebnf.py ├── metagrammar.py ├── notes.text ├── structs.py └── vm.py ├── eg_fp.py ├── eg_itsy ├── README.md ├── ast.py ├── c_emitter.py ├── c_prelude.h ├── complainer.py ├── eg │ ├── examples.itsy │ ├── regex.itsy │ ├── sieve.itsy │ ├── superopt.itsy │ └── um.itsy ├── error_tests │ ├── bad.itsy │ ├── bad2.itsy │ └── lvalues.itsy ├── grammar ├── halpme.py ├── itsy.py ├── primitives.py ├── reref.sh ├── structs.py ├── testme.sh └── typecheck.py ├── eg_json.py ├── eg_linear_equations.py ├── eg_metapeg.py ├── eg_microses.py ├── eg_misc.py ├── eg_mutagen_from_js.py ├── eg_oberon0.py ├── eg_oberon0_with_lexer.py ├── eg_outline.py ├── eg_phone_num.py ├── eg_pother.py ├── eg_precedence.py ├── eg_puzzler.py ├── eg_regex.py ├── eg_roman.py ├── eg_templite.py ├── eg_trees.py ├── eg_url.py ├── eg_wc.py ├── microses.py ├── parson.py ├── peg.py ├── peglet_to_parson.py ├── pegvm.py ├── setup.py ├── structs.py ├── testsmoke.py └── treepeg.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.py[cod] 2 | *~ 3 | 4 | # C extensions 5 | *.so 6 | 7 | # Packages 8 | *.egg 9 | *.egg-info 10 | dist 11 | build 12 | eggs 13 | parts 14 | bin 15 | var 16 | sdist 17 | develop-eggs 18 | .installed.cfg 19 | lib 20 | lib64 21 | 22 | # Installer logs 23 | pip-log.txt 24 | 25 | # Unit test / coverage reports 26 | .coverage 27 | .tox 28 | nosetests.xml 29 | 30 | # Translations 31 | *.mo 32 | 33 | # Mr Developer 34 | .mr.developer.cfg 35 | .project 36 | .pydevproject 37 | 
-------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | parson 2 | ====== 3 | 4 | Yet another PEG parser combinator library in Python. Selling points: 5 | 6 | * The optional concrete syntax for grammars incorporates semantic 7 | actions in a concise host-language-independent way. A Parson 8 | grammar won't tie you to Python. 9 | 10 | * Whole grammars can be analyzed and compiled, even if built at 11 | runtime using combinators. (Contrast with a monadic library, where 12 | this is uncomputable.) 13 | 14 | * Semantic actions take and return values in a kind of point-free 15 | style. 16 | 17 | * You can use the concrete syntax with about as little ceremony as 18 | `re.match`. 19 | 20 | * You can parse non-string sequences. 21 | 22 | Anti-selling points: 23 | 24 | * This library's design is still in flux: undocumented, utterly 25 | untuned, etc. I'd like you to use it if you think you might give 26 | feedback on the design; otherwise, no promises. 27 | 28 | * Semantic actions work in a nontraditional way that may remind you 29 | of Forth and which I haven't yet tried to make play well in typed 30 | languages like Haskell. It's concise and just right for parsing, 31 | but maybe in the end it'll turn out too cute and make me rip it 32 | out if I want this to be used. 33 | 34 | * I don't intend to make grammars work in other host languages 35 | before the design settles. (I have done this a bit for the 36 | [Peglet](https://github.com/darius/peglet) library, a more basic 37 | and settled expression of the same approach to actions: it has 38 | Python and JavaScript ports.) 39 | 40 | I guess the most similar library out there is LPEG, and that's way way 41 | more polished. 42 | 43 | 44 | Examples 45 | ======== 46 | 47 | For now, see all the eg_whatever.py files here. eg_calc.py, 48 | eg_misc.py, eg_wc.py, and eg_regex.py have the smallest ones.
49 | eg_trees.py shows parsing of tree structures, OMeta-style. Other 50 | examples include programming languages and somewhat-bigger 51 | stuff. 52 | 53 | Basic things still to explain: 54 | * grammar syntax 55 | * combinators 56 | * recursion with combinators 57 | * actions 58 | 59 | Projects where I've used it for more than just examples: 60 | * [IDEAL](https://github.com/darius/unreal/blob/master/parser.py), a drawing language 61 | * [Linogram](https://github.com/darius/goobergram/blob/master/parser.py), also a drawing language 62 | * [Pythological](https://github.com/darius/pythological/blob/master/parser.py), a MiniKanren with a vaguely Prologish frontend 63 | * [tinyhiss](https://github.com/darius/tinyhiss/blob/master/parser.py) -- Smalltalkish 64 | * [Squee](https://github.com/darius/squee/blob/master/parse_sans_offsides.py), an experimental language not much like any others 65 | * [Toot](https://github.com/darius/toot/blob/master/parse.py), a tutorial on writing a bytecode compiler 66 | 67 | 68 | Needs more work: 69 | ================ 70 | 71 | * There's a way to make a grammar automatically skip whitespace and 72 | comments and such ('FNORD' rules), which probably should be done 73 | differently. 74 | 75 | * It should be made easy to use with a separate lexer, and I haven't 76 | tried this enough to say it's ready (it's probably not). 77 | 78 | * It should also be easy to write a 'real' compiler, where source-location 79 | info gets added to all the AST nodes or whatever representation 80 | you're building. This is doable but should be more automated. 81 | 82 | After these design issues, this ought to be ported to a 83 | different-enough language to bring out issues of working nicely with 84 | multiple languages. 85 | 86 | After *that*, I think it'd be time to tackle quality of implementation.
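To make the "point-free", Forth-reminiscent action style concrete: in a rule like eg_calc.py's `exp0 : exp1 ('-' exp1 :sub)*.`, each action consumes the most recently captured values and pushes back its result. The sketch below hand-simulates parsing `5-3-1` that way. It is only an illustration of the calling convention, not Parson's implementation; the `act` helper and its explicit `arity` argument are assumptions made for this sketch.

```python
import operator

def act(vals, fn, arity):
    """Replace the last `arity` captured values with fn applied to them.
    (Illustrative only -- Parson itself determines this without an
    explicit arity argument.)"""
    return vals[:len(vals) - arity] + (fn(*vals[len(vals) - arity:]),)

# Hand-simulate the left-associative '-' loop on '5-3-1':
vals = (5,)                               # exp1 captured the int 5
vals = act(vals + (3,), operator.sub, 2)  # '-' exp1 :sub  ->  (2,)
vals = act(vals + (1,), operator.sub, 2)  # '-' exp1 :sub  ->  (1,)
assert vals == (1,)                       # agrees with eg_calc.py: 5-3-1 == 1
```

Because each action rewrites the value tuple in place of naming intermediate results, left associativity falls out of the loop structure rather than the grammar's shape.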
87 | -------------------------------------------------------------------------------- /combinator_grammars.py: -------------------------------------------------------------------------------- 1 | """ 2 | A convenience for defining recursive grammars in the combinator DSL. 3 | The delay() combinator works for this, but code using it is maybe uglier. 4 | """ 5 | 6 | import parson as P 7 | 8 | class Grammar(object): # XXX call it something else? name clash 9 | def __init__(self): 10 | object.__setattr__(self, '_rules', {}) 11 | object.__setattr__(self, '_stubs', {}) 12 | 13 | def __getattr__(self, name): 14 | try: return self._rules[name] 15 | except KeyError: pass 16 | try: return self._stubs[name] 17 | except KeyError: pass 18 | self._stubs[name] = result = P.delay(lambda: self._rules[name], '<%s>', name) 19 | return result 20 | 21 | def __setattr__(self, name, value): 22 | self._rules[name] = value 23 | 24 | # Example: 25 | ## g = Grammar() 26 | ## g.a = 'A' + g.b 27 | ## g.b = 'B' 28 | ## g.a('AB') 29 | #. () 30 | 31 | # TODO try fancier examples 32 | # TODO investigate implementing via descriptors instead 33 | # TODO nicer error when misused 34 | -------------------------------------------------------------------------------- /eg_B_compiler/README.md: -------------------------------------------------------------------------------- 1 | A compiler from 2 | https://github.com/johnwcowan/pdp8x/blob/master/b202x.md to a custom 3 | virtual machine. 4 | 5 | May not exactly fit the spec: I haven't yet added all the new 6 | features, and I followed the C operator-precedence table, which might 7 | have minor differences from B. 8 | -------------------------------------------------------------------------------- /eg_B_compiler/ast.py: -------------------------------------------------------------------------------- 1 | """ 2 | Abstract syntax of B2020. 
3 | """ 4 | 5 | from structs import Struct 6 | 7 | 8 | # Global declarations 9 | 10 | class Global( Struct('name opt_size opt_init')): pass 11 | class Proc( Struct('name params stmt')): pass 12 | 13 | 14 | # Statements 15 | 16 | class Auto( Struct('decls')): pass 17 | class Extern( Struct('names')): pass 18 | class Static( Struct('names')): pass 19 | class Block( Struct('stmts')): pass 20 | class If_stmt( Struct('exp then_ opt_else')): pass 21 | class While( Struct('exp stmt')): pass 22 | class Switch( Struct('exp stmt')): pass 23 | class Goto( Struct('exp')): pass 24 | class Return( Struct('opt_exp')): pass 25 | class Label( Struct('name stmt')): pass 26 | class Case( Struct('literal stmt')): pass 27 | class Exp( Struct('opt_exp')): pass 28 | 29 | 30 | # Expressions 31 | 32 | class Assign( Struct('e1 binop e2')): pass 33 | class If_exp( Struct('e1 e2 e3')): pass 34 | class Binary_exp( Struct('e1 binop e2')): pass 35 | class Call( Struct('e1 args')): pass 36 | class Pre_incr( Struct('op e1')): pass 37 | class Post_incr( Struct('e1 op')): pass 38 | class Literal( Struct('text kind')): pass # TODO check octal constants for /[89]/ 39 | class Variable( Struct('name')): pass 40 | class Unary_exp( Struct('unop e1')): pass 41 | 42 | class Address_of( Struct('e1')): pass # TODO these are currently under Unary_exp instead 43 | 44 | class And( Struct('e1 e2')): pass # TODO maybe use instead of Binary_exp 45 | class Or( Struct('e1 e2')): pass 46 | 47 | def Index(e1, e2): 48 | return Unary_exp('*', Binary_exp(e1, '+', e2)) 49 | -------------------------------------------------------------------------------- /eg_B_compiler/b2020.parson: -------------------------------------------------------------------------------- 1 | # Changes from the old B grammar: 2 | # spell extrn as extern 3 | # \ instead of * as a character and string escape 4 | # the && and || operators avoid the need for special treatment of & and | 5 | # octal constants have octal digits only -- leaving this up to 
semantic actions 6 | # the assignment operators are reversed (+= as in C, not =+ as in B)] 7 | # declare internal variables with static instead of no keyword 8 | # allocate arrays with syntax like `auto x[42]` 9 | 10 | # Not yet since it's a new feature, not just a change: 11 | # initialize variables in declarations with = 12 | 13 | 14 | program: 15 | _ definition* :end. 16 | 17 | definition: 18 | name ('[' (constant) ']' | :None) ['=' ival++',' :hug | :None] ';' :Global 19 | | name '(' [name**',' :hug] ')' statement :Proc. 20 | 21 | ival: 22 | constant 23 | | name :Variable. 24 | 25 | statement: 26 | "auto" [name ('[' constant ']' | :None) :hug]++',' ';' :hug :Auto 27 | | "extern" name++',' ';' :hug :Extern 28 | | "static" name++',' ';' :hug :Static 29 | | '{' statement* '}' :hug :Block 30 | | "if" '(' exp ')' statement ("else" statement | :None) :If_stmt 31 | | "while" '(' exp ')' statement :While 32 | | "switch" exp statement :Switch 33 | | "goto" exp ';' :Goto 34 | | "return" (exp | :None) ';' :Return 35 | | "case" constant ':' statement :Case 36 | | name ':' statement :Label 37 | | (exp | :None) ';' :Exp. 38 | 39 | 40 | #### 41 | # https://en.cppreference.com/w/c/language/operator_precedence 42 | # XXX This isn't identical to kbman precedences 43 | #### 44 | 45 | exp1: 46 | ( name :Variable 47 | | constant 48 | | '(' exp ')' 49 | ) ( inc_dec :Post_incr 50 | | '[' exp ']' :Index 51 | | '(' [exp**',' :hug] ')' :Call 52 | )*. 53 | 54 | exp2: 55 | unaryop exp2 :Unary_exp 56 | | '&' !/[&=]/ exp2 :Address_of 57 | | inc_dec exp2 :Pre_incr 58 | | exp1. 59 | 60 | exp3: exp2 (op3 exp2 :Binary_exp)*. op3 ~: { '*' !/=/ | '/' !/[*=]/ | '%' !/=/ } _. 61 | exp4: exp3 (op4 exp3 :Binary_exp)*. op4 ~: { '+' !/[+=]/ | '-' !/[-=]/ } _. 62 | exp5: exp4 (op5 exp4 :Binary_exp)*. op5 ~: { '<<' !/=/ | '>>' !/=/ } _. 63 | exp6: exp5 (op6 exp5 :Binary_exp)*. op6 ~: { '<=' | '>=' | '<' | '>' } _. 64 | exp7: exp6 (op7 exp6 :Binary_exp)*. op7 ~: { '==' | '!=' } _. 
65 | exp8: exp7 (op8 exp7 :Binary_exp)*. op8 ~: { '&' !/[&=]/ } _. 66 | exp9: exp8 (op9 exp8 :Binary_exp)*. op9 ~: { '^' !/=/} _. 67 | exp10: exp9 (op10 exp9 :Binary_exp)*. op10 ~: { '|' !/[|=]/} _. 68 | exp11: exp10 (op11 exp10 :And)*. op11 ~: '&&' _. 69 | exp12: exp11 (op12 exp11 :Or)*. op12 ~: '||' _. 70 | exp13: exp12 ('?' exp ':' exp13 :If_exp)?. 71 | exp14: exp13 (assign exp14 :Assign)?. 72 | exp: exp14. 73 | 74 | 75 | # Lexical grammar 76 | 77 | assign ~: 78 | opassign | /(=)(?!=)/ _. 79 | 80 | inc_dec ~: 81 | { '++' 82 | | '--' 83 | } _. 84 | 85 | unaryop ~: 86 | { '-' !/[-=]/ 87 | | '~' 88 | | '!' !'=' 89 | | '*' !'=' 90 | } _. 91 | 92 | opassign ~: 93 | { '<<=' 94 | | '>>=' 95 | | '|=' 96 | | '&=' 97 | | '^=' 98 | | '-=' 99 | | '+=' 100 | | '%=' 101 | | '*=' 102 | | '/=' 103 | } _. 104 | 105 | 106 | constant ~: 107 | {'0' digit+} _ :'octal' :Literal 108 | | { digit+} _ :'decimal' :Literal 109 | | {/'/ sqchar sqchar? /'/} _ :'char' :Literal 110 | | {/"/ dqchar* /"/} _ :'string' :Literal. 111 | 112 | sqchar ~: escape | /[^']/. 113 | dqchar ~: escape | /[^"]/. 114 | escape ~: /\\./. 115 | 116 | name ~: !keyword {alpha (alpha|digit)*} _. 117 | 118 | keyword = /(auto|extern|static|if|while|switch|goto|return|case)\b/. 119 | 120 | alpha ~: /[A-Za-z_]/. # "and backspace"?! I'm just ignoring that. 121 | digit ~: /[0-9]/. 122 | 123 | FNORD ~: _. 124 | _ ~: (/\s+/ | comment)*. 125 | 126 | comment ~: '/*' commentbody. # (The following awkward definition is to save Python stack space.) 127 | commentbody ~: '*/' | /[^*]+/ commentbody | '*' commentbody. 128 | 129 | # TODO better definition: 130 | # comment ~: '/*' (!'*/' :anyone)* '*/'. 131 | -------------------------------------------------------------------------------- /eg_B_compiler/bcomp.py: -------------------------------------------------------------------------------- 1 | """ 2 | Tie the modules together into a compiler. 3 | It writes VM assembly to stdout. 
4 | """ 5 | 6 | import sys 7 | 8 | import ast 9 | from gen_vm_asm import gen_program 10 | from parson import Grammar, Unparsable 11 | 12 | with open('b2020.parson') as f: 13 | grammar_source = f.read() 14 | parser = Grammar(grammar_source).bind(ast) 15 | 16 | def main(argv): 17 | err = 0 18 | for filename in argv[1:]: 19 | err |= compiler_main(filename) 20 | return err 21 | 22 | def compiler_main(filename, out_filename=None): 23 | with open(filename) as f: 24 | text = f.read() 25 | try: 26 | global_decls = parser.program(text) 27 | except Unparsable as exc: 28 | (before, after) = exc.failure 29 | complain(filename, before, after, "Syntax error") 30 | return 1 31 | gen_program(global_decls) 32 | return 0 33 | 34 | def complain(filename, before, after, plaint): 35 | line_no = before.count('\n') 36 | prefix = (before+'\n').splitlines()[line_no] 37 | suffix = (after+'\n').splitlines()[0] # XXX what if right on newline? 38 | prefix, suffix = sanitize(prefix), sanitize(suffix) 39 | message = ["%s:%d:%d: %s" % (filename, line_no+1, len(prefix), plaint), 40 | ' ' + prefix + suffix, 41 | ' ' + ' '*len(prefix) + '^'] 42 | sys.stderr.write('\n'.join(message) + '\n') 43 | 44 | def sanitize(s): 45 | "Make s predictably printable, sans control characters like tab." 
46 | unprintable = chr(127) 47 | return ''.join(c if ' ' <= c < unprintable else ' ' # XXX crude 48 | for c in s) 49 | 50 | if __name__ == '__main__': 51 | sys.exit(main(sys.argv)) 52 | -------------------------------------------------------------------------------- /eg_B_compiler/eg/eg0.b: -------------------------------------------------------------------------------- 1 | printn() {} 2 | -------------------------------------------------------------------------------- /eg_B_compiler/eg/eg0.s.ref: -------------------------------------------------------------------------------- 1 | printn proc 2 | params 3 | return_void 4 | endproc 5 | 6 | -------------------------------------------------------------------------------- /eg_B_compiler/eg/eg1.b: -------------------------------------------------------------------------------- 1 | /* The following function will print a non-negative number, n, to 2 | the base b, where 2<=b<=10, This routine uses the fact that 3 | in the ASCII character set, the digits 0 to 9 have sequential 4 | code values. 
*/ 5 | 6 | printn(n,b) { 7 | extern putchar; 8 | auto a; 9 | 10 | if(a=n/b) /* assignment, not test for equality */ 11 | printn(a, b); /* recursive */ 12 | putchar(n%b + '0'); 13 | } 14 | -------------------------------------------------------------------------------- /eg_B_compiler/eg/eg1.s.ref: -------------------------------------------------------------------------------- 1 | printn proc 2 | params n, b 3 | putchar extern 4 | a local 5 | addr a 6 | value n 7 | value b 8 | op2 / 9 | assign = 10 | if_not endif.0 11 | value printn 12 | value a 13 | value b 14 | call 2 15 | pop 16 | endif.0 17 | value putchar 18 | value n 19 | value b 20 | op2 % 21 | push '0' 22 | op2 + 23 | call 1 24 | pop 25 | return_void 26 | endproc 27 | 28 | -------------------------------------------------------------------------------- /eg_B_compiler/eg/eg2.b: -------------------------------------------------------------------------------- 1 | /* The following program will calculate the constant e-2 to about 2 | 4000 decimal digits, and print it 50 characters to the line in 3 | groups of 5 characters. The method is simple output conversion 4 | of the expansion 5 | 1/2! + 1/3! + ... = .111.... 6 | where the bases of the digits are 2, 3, 4, . . . */ 7 | 8 | main() { 9 | extern putchar, n, v; 10 | auto i, c, col, a; 11 | 12 | i = col = 0; 13 | while(i${fs}; then 9 | echo "Didn't fail!" 10 | fi 11 | done 12 | 13 | for f in eg/*.b; do 14 | echo 15 | echo "To assembly:" ${f} 16 | fs=${f%.*}.s 17 | if python bcomp.py ${f} >${fs}; then 18 | echo -n # Expected success (btw what's a no-op in bash?) 19 | else 20 | echo "Failed!" 
21 | fi 22 | if test -f ${fs}.ref; then 23 | diff -u ${fs}.ref ${fs} 24 | # TODO raise error at exit if there was a diff 25 | else 26 | echo ' (No ref)' 27 | fi 28 | done 29 | -------------------------------------------------------------------------------- /eg_basic.py: -------------------------------------------------------------------------------- 1 | """ 2 | BASIC interpreter, inspired by Tiny BASIC. 3 | """ 4 | 5 | import bisect, operator, sys 6 | from parson import Grammar, alter 7 | 8 | def chat(): 9 | print "I am Puny Basic. Enter 'bye' to dismiss me." 10 | while True: 11 | try: text = raw_input('> ').strip() 12 | except EOFError: break 13 | if text == 'bye': break 14 | try: basic.command(text) 15 | except Exception as e: 16 | # TODO: put the current line# in the prompt instead, if any; 17 | # should work nicely with a resumable STOP statement 18 | print e, ('' if pc is None else 'at line %d' % lines[pc][0]) 19 | 20 | grammar = Grammar(r""" 21 | command : /(\d+)/ :int /(.*)/ /$/ :set_line 22 | | "run" /$/ :run 23 | | "new" /$/ :new 24 | | "load" /(\S+)/ /$/ :load 25 | | "save" /(\S+)/ /$/ :save 26 | | stmt 27 | | /$/. 28 | 29 | stmt : "print" printing /$/ :next 30 | | '?' printing /$/ :next 31 | | "input" id /$/ :input :next 32 | | "goto" exp /$/ :goto 33 | | "if" relexp "then" exp /$/ :if_goto 34 | | "gosub" exp /$/ :gosub 35 | | "return" /$/ :return_ 36 | | "end" /$/ :end 37 | | "list" /$/ :list :next 38 | | "rem" /.*/ /$/ :next 39 | | "let"? id '=' exp /$/ :store :next. 40 | 41 | printing : (display writes)?. 42 | writes : ';' printing 43 | | ',' :space printing 44 | | :newline. 45 | 46 | display ~: exp :write 47 | | '"' [qchar :write]* '"' FNORD. 48 | qchar ~: /"(")/ # Two consecutive double-quotes mean '"'. 49 | | /([^"])/. # Any other character just means itself. 50 | 51 | relexp : exp ( '<>' exp :ne 52 | | '<=' exp :le 53 | | '<' exp :lt 54 | | '=' exp :eq 55 | | '>=' exp :ge 56 | | '>' exp :gt 57 | )?. 
58 | exp : exp1 ( '+' exp1 :add 59 | | '-' exp1 :sub 60 | )*. 61 | exp1 : exp2 ( '*' exp2 :mul 62 | | '/' exp2 :idiv 63 | )*. 64 | exp2 : primary ('^' exp2 :pow)?. 65 | 66 | primary : '-' exp1 :neg 67 | | /(\d+)/ :int 68 | | id :fetch 69 | | '(' exp ')'. 70 | 71 | id : /([a-z])/. # TODO: longer names, screening out reserved words 72 | 73 | FNORD ~: /\s*/. 74 | """) 75 | 76 | 77 | lines = [] # A sorted array of (line_number, source_line) pairs. 78 | pc = None # The program counter: an index into lines[], or None. 79 | return_stack = [] # A stack of line numbers of GOSUBs in progress. 80 | env = {} # Current variable values. 81 | 82 | def run(): 83 | reset() 84 | go() 85 | 86 | def reset(): 87 | global pc 88 | pc = 0 if lines else None 89 | return_stack[:] = [] 90 | env.clear() 91 | 92 | def go(): 93 | global pc 94 | while pc is not None: # TODO: check for stopped, instead 95 | _, line = lines[pc] 96 | pc, = basic.stmt(line) 97 | 98 | def new(): 99 | lines[:] = [] 100 | reset() 101 | 102 | def load(filename): 103 | with open(filename) as f: 104 | new() 105 | for line in f: 106 | basic.command(line) 107 | 108 | def save(filename): 109 | with open(filename, 'w') as f: 110 | for pair in lines: 111 | f.write('%d %s\n' % pair) 112 | 113 | def listing(): 114 | for n, line in lines: 115 | print n, line 116 | 117 | def find(n): # The slice of lines[] including line n, or where to insert it. 
118 | i = bisect.bisect(lines, (n, '')) 119 | return slice(i, i+1 if i < len(lines) and lines[i][0] == n else i) 120 | 121 | def set_line(n, text): 122 | lines[find(n)] = [(n, text)] if text else [] 123 | 124 | def goto(n): 125 | sl = find(n) 126 | if sl.start == sl.stop: raise Exception("Missing line", n) 127 | return sl.start 128 | 129 | def if_goto(flag, n): 130 | return goto(n) if flag else next_line(pc) 131 | 132 | def next_line(a_pc): 133 | return None if a_pc in (None, len(lines)-1) else a_pc+1 134 | 135 | def gosub(n): 136 | target = goto(n) 137 | return_stack.append(lines[pc][0]) 138 | return target 139 | 140 | def return_(): 141 | return next_line(goto(return_stack.pop())) 142 | 143 | # Parson's default meaning for a function appearing in a grammar is a 144 | # semantic action returning one value. In this Basic we do some actions 145 | # only for effect: this wraps those actions to produce no values. 146 | def for_effect(fn): 147 | def fn_for_effect(*args): 148 | fn(*args) 149 | return () 150 | return alter(fn_for_effect) 151 | 152 | basic = grammar( 153 | fetch = env.__getitem__, 154 | store = for_effect(env.__setitem__), 155 | input = for_effect(lambda var: env.__setitem__(var, int(raw_input()))), 156 | set_line = for_effect(set_line), 157 | goto = goto, 158 | if_goto = if_goto, 159 | gosub = gosub, 160 | return_ = return_, 161 | eq = operator.eq, 162 | ne = operator.ne, 163 | lt = operator.lt, 164 | le = operator.le, 165 | ge = operator.ge, 166 | gt = operator.gt, 167 | add = operator.add, 168 | sub = operator.sub, 169 | mul = operator.mul, 170 | idiv = operator.idiv, 171 | pow = operator.pow, 172 | neg = operator.neg, 173 | end = lambda: None, 174 | list = for_effect(listing), 175 | run = for_effect(run), 176 | next = lambda: next_line(pc), 177 | new = for_effect(new), 178 | load = for_effect(load), 179 | save = for_effect(save), 180 | write = for_effect(lambda x: sys.stdout.write(str(x))), 181 | space = for_effect(lambda: sys.stdout.write(' ')), 182 
| newline = for_effect(lambda: sys.stdout.write('\n')), 183 | ) 184 | 185 | 186 | if __name__ == '__main__': 187 | chat() 188 | 189 | ## basic.command('100 print "hello"') 190 | #. () 191 | ## lines 192 | #. [(100, 'print "hello"')] 193 | ## basic.command('100 print "goodbye"') 194 | #. () 195 | ## lines 196 | #. [(100, 'print "goodbye"')] 197 | ## basic.command('99 print 42,') 198 | #. () 199 | ## lines 200 | #. [(99, 'print 42,'), (100, 'print "goodbye"')] 201 | 202 | ## basic.command('run') 203 | #. 42 goodbye 204 | #. () 205 | 206 | 207 | ## basic.command('print') 208 | #. (None,) 209 | ## basic.command('let x = 5') 210 | #. (None,) 211 | ## basic.command('print x*x') 212 | #. 25 213 | #. (None,) 214 | ## basic.command('print 2+2; -5, "hi"') 215 | #. 4-5 hi 216 | #. (None,) 217 | ## basic.command('? 42 * (5-3) + -2^2') 218 | #. 80 219 | #. (None,) 220 | ## basic.command('print 2^3^2, ') 221 | #. 512 222 | #. (None,) 223 | ## basic.command('print 5-3-1') 224 | #. 1 225 | #. (None,) 226 | ## basic.command('print 3/2') 227 | #. 1 228 | #. (None,) 229 | 230 | ## basic.command('new') 231 | #. () 232 | ## basic.command('load countdown.bas') 233 | #. () 234 | ## basic.command('list') 235 | #. 10 let a = 10 236 | #. 20 if a < 0 then 60 237 | #. 30 print a 238 | #. 40 a = a - 1 239 | #. 50 goto 20 240 | #. 60 print "Blast off!" 241 | #. 70 end 242 | #. (None,) 243 | ## basic.command('run') 244 | #. 10 245 | #. 9 246 | #. 8 247 | #. 7 248 | #. 6 249 | #. 5 250 | #. 4 251 | #. 3 252 | #. 2 253 | #. 1 254 | #. 0 255 | #. Blast off! 256 | #. () 257 | -------------------------------------------------------------------------------- /eg_calc.py: -------------------------------------------------------------------------------- 1 | """ 2 | The customary calculator example. 3 | """ 4 | 5 | import operator 6 | from parson import Grammar 7 | 8 | calc = Grammar(r""" exp0 :end. 9 | 10 | exp0 : exp1 ( '+' exp1 :add 11 | | '-' exp1 :sub )*. 
12 | exp1 : exp2 ( '*' exp2 :mul 13 | | '//' exp2 :div 14 | | '/' exp2 :truediv 15 | | '%' exp2 :mod )*. 16 | exp2 : exp3 ( '^' exp2 :pow )?. 17 | 18 | exp3 : '(' exp0 ')' 19 | | '-' exp1 :neg 20 | | /(\d+)/ :int. 21 | 22 | FNORD~: /\s*/. 23 | 24 | """).bind(operator).expecting_one_result() 25 | 26 | ## calc('42 * (5-3) + -2^2') 27 | #. 80 28 | ## calc('2^3^2') 29 | #. 512 30 | ## calc('5-3-1') 31 | #. 1 32 | ## calc('3//2') 33 | #. 1 34 | ## calc('3/2') 35 | #. 1.5 36 | -------------------------------------------------------------------------------- /eg_calc_compile.py: -------------------------------------------------------------------------------- 1 | """ 2 | After http://www.vpri.org/pdf/rn2010001_programm.pdf 3 | """ 4 | 5 | from parson import Grammar 6 | 7 | def assign(v, exp): return exp(0) + ['sw r0, ' + v] 8 | 9 | def ld_const(value): return lambda s: ['lc r%d, %d' % (s, value)] 10 | def ld_var(name): return lambda s: ['lw r%d, %s' % (s, name)] 11 | 12 | def add(exp1, exp2): return lambda s: (exp1(s) + exp2(s+1) 13 | + ['add r%d, r%d, r%d' % (s, s+1, s)]) 14 | def mul(exp1, exp2): return lambda s: (exp1(s) + exp2(s+1) 15 | + ['mul r%d, r%d, r%d' % (s, s+1, s)]) 16 | 17 | g = Grammar(r""" stmt :end. 18 | 19 | stmt : ident ':=' exp0 :assign. 20 | 21 | exp0 : exp1 ('+' exp1 :add)*. 22 | exp1 : exp2 ('*' exp2 :mul)*. 23 | 24 | exp2 : '(' exp0 ')' 25 | | /(\d+)/ :int :ld_const 26 | | ident :ld_var. 27 | 28 | ident : /([A-Za-z]+)/. 29 | 30 | FNORD~: /\s*/. 31 | 32 | """)(**globals()).expecting_one_result() 33 | 34 | ## for line in g('v := 42 * (5+3) + 2*2'): print line 35 | #. lc r0, 42 36 | #. lc r1, 5 37 | #. lc r2, 3 38 | #. add r1, r2, r1 39 | #. mul r0, r1, r0 40 | #. lc r1, 2 41 | #. lc r2, 2 42 | #. mul r1, r2, r1 43 | #. add r0, r1, r0 44 | #. 
sw r0, v 45 | -------------------------------------------------------------------------------- /eg_calc_to_rpn.py: -------------------------------------------------------------------------------- 1 | """ 2 | Tiny example of 'compiling'. 3 | """ 4 | 5 | from parson import Grammar, alter 6 | 7 | g = Grammar(r""" stmt* :end. 8 | 9 | stmt : ident '=' exp0 ';' :assign. 10 | 11 | exp0 : exp1 ('+' exp1 :'add')*. 12 | exp1 : exp2 ('*' exp2 :'mul')*. 13 | 14 | exp2 : '(' exp0 ')' 15 | | /(\d+)/ 16 | | ident :'fetch'. 17 | 18 | ident : /([A-Za-z]+)/. 19 | 20 | FNORD ~: /\s*/. 21 | """)(assign=alter(lambda name, *rpn: rpn + (name, 'store'))) 22 | 23 | ## print ' '.join(g('v = 42 * (5+3) + 2*2; v = v + 1;')) 24 | #. 42 5 3 add mul 2 2 mul add v store v fetch 1 add v store 25 | -------------------------------------------------------------------------------- /eg_ebnf/c_emit.py: -------------------------------------------------------------------------------- 1 | """ 2 | Generate a parser in C. 3 | """ 4 | 5 | import codecs 6 | from structs import Visitor 7 | 8 | def gen_kinds_enum(self): 9 | return '\n'.join(gen_kinds(self)) 10 | 11 | def gen_parser(self): 12 | return '\n'.join(codegen(self)) 13 | 14 | # TODO shouldn't an EOF also be a kind? 15 | def gen_kinds(grammar): 16 | tokens = grammar.lexer_symbols() 17 | kinds = sorted(map(c_encode_token, tokens)) 18 | yield 'enum {' 19 | for kind in kinds: 20 | yield kind + ',' 21 | yield '};' 22 | 23 | def gen_lexer_fns(grammar): 24 | syms = grammar.lexer_symbols() 25 | assert all(t.text for t in syms) 26 | assert len(syms) == len(set(t.text for t in syms)) 27 | lits = tuple(t for t in syms if t.kind == 'literal') 28 | kwds = tuple(t for t in syms if t.kind == 'keyword') 29 | yield gen_lexer_fn('lex_lits', lits) 30 | yield '' 31 | yield gen_lexer_fn('lex_keywords', kwds) 32 | # TODO skip lex_keywords if no keywords. In principle there might be no lits, too. 
33 | 34 | def gen_lexer_fn(name, syms): 35 | return ('void %s(void) %s' 36 | % (name, embrace('\n'.join(gen_trie_lexer(syms))))) 37 | 38 | def gen_trie_lexer(syms): 39 | trie = sprout({t.text: t for t in syms}) 40 | for line in gen_lex_dispatch(trie, 0): 41 | yield line 42 | 43 | def sprout(rel): 44 | """Given a map of {string: value}, represent it as a trie 45 | (opt_value_for_empty_string, {leading_char: subtrie}).""" 46 | parts = map_from_relation((k[0], (k[1:], v)) 47 | for k,v in rel.items() if k) 48 | return (rel.get(''), 49 | {head: sprout(dict(tails)) for head,tails in parts.items()}) 50 | 51 | def map_from_relation(pairs): 52 | result = {} 53 | for k, v in pairs: 54 | result.setdefault(k, []).append(v) 55 | return result 56 | 57 | def gen_lex_dispatch((opt_on_empty, branches), offset): 58 | heads = sorted(branches.keys()) 59 | if opt_on_empty: 60 | default = ('token.kind = %s; scan += %d; return;' 61 | % (c_encode_token(opt_on_empty), offset)) 62 | else: 63 | default = '' 64 | if heads: 65 | yield 'switch (scan[%d]) {' % offset 66 | for head in heads: 67 | yield 'case %s:' % c_char_literal(head) 68 | for line in gen_lex_dispatch(branches[head], offset + 1): 69 | yield ' ' + line 70 | yield ' break;' 71 | if default: 72 | yield 'default:' 73 | yield ' ' + default 74 | yield '}' 75 | elif default: 76 | yield default 77 | 78 | def c_char_literal(ch): 79 | # TODO anywhere this doesn't match C? 
80 | return "'%s'" % codecs.encode(ch, 'string_escape') 81 | 82 | def c_encode_token(token): 83 | # TODO rename to TOKEN_%s or something 84 | return 'kind_%s' % ''.join(escapes.get(c, c) for c in token.text) 85 | 86 | escapes = { 87 | '!': '_BANG', 88 | '@': '_AT', 89 | '#': '_HASH', 90 | '$': '_DOLLAR', 91 | '%': '_PERCENT', 92 | '^': '_HAT', 93 | '&': '_AMPERSAND', 94 | '*': '_STAR', 95 | '(': '_LPAREN', 96 | ')': '_RPAREN', 97 | '-': '_DASH', 98 | '_': '_UNDERSCORE', 99 | '\\': '_BACKSLASH', 100 | '|': '_BAR', 101 | "'": '_QUOTE', 102 | '"': '_DOUBLEQUOTE', 103 | '/': '_SLASH', 104 | '?': '_QUESTION', 105 | ',': '_COMMA', 106 | '.': '_DOT', 107 | '<': '_LESS', 108 | '>': '_GREATER', 109 | '[': '_LBRACKET', 110 | ']': '_RBRACKET', 111 | '=': '_EQUALS', 112 | '+': '_PLUS', 113 | '`': '_BACKQUOTE', 114 | '~': '_TILDE', 115 | '{': '_LBRACE', 116 | '}': '_RBRACE', 117 | ';': '_SEMICOLON', 118 | ':': '_COLON', 119 | } 120 | 121 | def codegen(grammar): 122 | for plaint in grammar.errors: 123 | yield '// ' + plaint 124 | if grammar.errors: yield '' 125 | for block in gen_lexer_fns(grammar): 126 | yield block 127 | yield '' 128 | for name in grammar.nonterminals: 129 | yield 'void parse_%s(void);' % name 130 | for name in grammar.nonterminals: 131 | body = gen(grammar.directed[name]) 132 | yield '' 133 | yield 'void parse_%s(void) %s' % (name, embrace(body)) 134 | 135 | def embrace(s): return '{%s\n}' % indent('\n' + s) 136 | def indent(s): return s.replace('\n', '\n ') 137 | 138 | class Gen(Visitor): 139 | def Empty(self, t): return '' 140 | def Symbol(self, t): return 'eat(%s);' % c_encode_token(t) 141 | def Call(self, t): return 'parse_%s();' % t.name 142 | def Branch(self, t): return gen_switch(t) 143 | def Fail(self, t): return 'parser_fail();' 144 | def Chain(self, t): return '\n'.join(filter(None, [self(t.e1), self(t.e2)])) 145 | def Loop(self, t): return gen_while(t.firsts, self(t.body)) 146 | def Action(self, t): return '/* XXX action */' 147 | gen = Gen() 148 | 
149 | def gen_while(firsts, body): 150 | test = ' || '.join(map(gen_test, sorted(firsts))) 151 | return 'while (%s) %s' % (test, embrace(body)) 152 | 153 | def gen_test(token): 154 | return 'token.kind == %s' % c_encode_token(token) 155 | 156 | def gen_switch(t): 157 | cases = ['%s %s' % ('\n'.join('case %s:' % c_encode_token(c) 158 | for c in sorted(kinds)), 159 | embrace(gen(alt))) 160 | for kinds, alt in t.cases] 161 | default = 'default: ' + embrace(gen(t.default)) 162 | return 'switch (token.kind) ' + embrace(' break;\n'.join(cases + [default])) 163 | 164 | 165 | # Smoke test 166 | 167 | ## from ebnf import Grammar, eg 168 | ## import operator 169 | ## actions = dict(X=lambda: 3, **operator.__dict__) 170 | 171 | ## egg = Grammar(eg, actions) 172 | 173 | ## print gen_parser(egg) 174 | #. void lex_lits(void) { 175 | #. switch (scan[0]) { 176 | #. case '(': 177 | #. token.kind = kind__LPAREN; scan += 1; return; 178 | #. break; 179 | #. case ')': 180 | #. token.kind = kind__RPAREN; scan += 1; return; 181 | #. break; 182 | #. case '*': 183 | #. token.kind = kind__STAR; scan += 1; return; 184 | #. break; 185 | #. case '+': 186 | #. token.kind = kind__PLUS; scan += 1; return; 187 | #. break; 188 | #. case '-': 189 | #. token.kind = kind__DASH; scan += 1; return; 190 | #. break; 191 | #. case 'b': 192 | #. token.kind = kind_b; scan += 1; return; 193 | #. break; 194 | #. case 'x': 195 | #. token.kind = kind_x; scan += 1; return; 196 | #. break; 197 | #. case 'y': 198 | #. token.kind = kind_y; scan += 1; return; 199 | #. break; 200 | #. } 201 | #. } 202 | #. 203 | #. void lex_keywords(void) { 204 | #. 205 | #. } 206 | #. 207 | #. void parse_A(void); 208 | #. void parse_B(void); 209 | #. void parse_C(void); 210 | #. void parse_exp(void); 211 | #. void parse_term(void); 212 | #. void parse_factor(void); 213 | #. 214 | #. void parse_A(void) { 215 | #. switch (token.kind) { 216 | #. case kind_b: { 217 | #. parse_B(); 218 | #. eat(kind_x); 219 | #. parse_A(); 220 | #. 
} break; 221 | #. case kind_y: { 222 | #. eat(kind_y); 223 | #. } break; 224 | #. default: { 225 | #. parser_fail(); 226 | #. } 227 | #. } 228 | #. } 229 | #. 230 | #. void parse_B(void) { 231 | #. eat(kind_b); 232 | #. } 233 | #. 234 | #. void parse_C(void) { 235 | #. 236 | #. } 237 | #. 238 | #. void parse_exp(void) { 239 | #. parse_term(); 240 | #. switch (token.kind) { 241 | #. case kind__PLUS: { 242 | #. eat(kind__PLUS); 243 | #. parse_exp(); 244 | #. /* XXX action */ 245 | #. } break; 246 | #. case kind__DASH: { 247 | #. eat(kind__DASH); 248 | #. parse_exp(); 249 | #. /* XXX action */ 250 | #. } break; 251 | #. default: { 252 | #. 253 | #. } 254 | #. } 255 | #. } 256 | #. 257 | #. void parse_term(void) { 258 | #. parse_factor(); 259 | #. while (token.kind == kind__STAR) { 260 | #. eat(kind__STAR); 261 | #. parse_factor(); 262 | #. /* XXX action */ 263 | #. } 264 | #. } 265 | #. 266 | #. void parse_factor(void) { 267 | #. switch (token.kind) { 268 | #. case kind_x: { 269 | #. eat(kind_x); 270 | #. /* XXX action */ 271 | #. } break; 272 | #. case kind__LPAREN: { 273 | #. eat(kind__LPAREN); 274 | #. parse_exp(); 275 | #. eat(kind__RPAREN); 276 | #. } break; 277 | #. default: { 278 | #. parser_fail(); 279 | #. } 280 | #. } 281 | #. } 282 | -------------------------------------------------------------------------------- /eg_ebnf/metagrammar.py: -------------------------------------------------------------------------------- 1 | """ 2 | Abstract and concrete syntax of grammars. 
3 | """ 4 | 5 | from structs import Struct as _S 6 | 7 | class Empty (_S('')): pass 8 | class Symbol(_S('text kind')): pass 9 | class Call (_S('name')): pass 10 | class Either(_S('e1 e2')): pass 11 | class Chain (_S('e1 e2')): pass 12 | class Star (_S('e1')): pass 13 | class Action(_S('name')): pass 14 | 15 | # TODO more efficient implementations: 16 | def Maybe(e1): return Either(e1, Empty()) 17 | def Plus(e1): return Chain(e1, Star(e1)) 18 | def Plus2(e1, e2): return Chain(e1, Star(Chain(e2, e1))) 19 | def Star2(e1, e2): return Maybe(Plus2(e1, e2)) 20 | 21 | metagrammar_text = r""" 22 | '' rule* :end. 23 | 24 | rule : name ':' exp '.' :hug. 25 | 26 | exp : term ('|' exp :Either)? 27 | | :Empty. 28 | 29 | term : factor (term :Chain)?. 30 | factor : primary ('**' primary :Star2 31 | |'++' primary :Plus2 32 | |'*' :Star 33 | |'+' :Plus 34 | |'?' :Maybe 35 | )?. 36 | 37 | primary : qstring :'literal' :Symbol 38 | | dqstring :'keyword' :Symbol 39 | | '$' name :'lexer' :Symbol 40 | | name :Call 41 | | ':' name :Action 42 | | ':' qstring :Action 43 | | '[' exp ']' # Dunno if we'll still want this for semantics. 44 | # I'm keeping this production enabled because it's 45 | # used in itsy.grammar, but XXX this should be either 46 | # deleted or given a proper semantic action. 47 | | '(' exp ')'. 48 | 49 | name : /([A-Za-z_]\w*)/. 50 | 51 | qstring ~: /'/ quoted_char* /'/ FNORD :join. 52 | dqstring ~: '"' dquoted_char* '"' FNORD :join. 53 | 54 | quoted_char ~: /\\(.)/ | /([^'])/. 55 | dquoted_char~: /\\(.)/ | /([^"])/. 56 | 57 | FNORD ~: whitespace?. 58 | whitespace ~: /(?:\s|#.*)+/. 
59 | """ 60 | -------------------------------------------------------------------------------- /eg_ebnf/notes.text: -------------------------------------------------------------------------------- 1 | see also 2 | https://python-history.blogspot.com/2018/05/the-origins-of-pgen.html 3 | https://github.com/rvirding/spell1 4 | https://os.ghalkes.nl/LLnextgen/ 5 | 6 | def Maybe(e1): return Either(e1, Empty()) 7 | def Plus(e1): return Chain(e1, Star(e1)) # inefficient because dup 8 | def Plus2(e1, e2): return Chain(e1, Star(Chain(e2, e1))) # inefficient because dup 9 | def Star2(e1, e2): return Maybe(Plus2(e1, e2)) 10 | # TODO more efficient implementations 11 | So, how to do that? We could generate custom code for each, but that's 12 | extra work, especially if we want to vary backends. 13 | 14 | [[e*]] = loop { unless(!!e) break; [[e]]; } 15 | [[e+]] = loop { [[e]]; unless(!!e) break; } 16 | [[e++sep]] = loop { [[e]]; unless(!!sep) break; [[sep]]; } 17 | [[e**sep]] = if(!!e) [[e++sep]]; # assuming e is not nullable 18 | 19 | This suggests replacing the Star type with a Loop type like 20 | 21 | Loop(break_at, es) 22 | where break_at is an index in [0..len(es)] 23 | inserting a break test at that index, 24 | which checks against the first-set of es[break_at % len(es)] 25 | 26 | or similarly 27 | 28 | Loop(es_before_break, es_after_break) 29 | 30 | (These lists es here can be restricted to length <= 1, but they do 31 | require empty-list to be distinct from Empty(), to make it clear when 32 | the break test's lookahead wraps around.) 33 | 34 | nullable(Loop(before, after)) = nullable(before) 35 | firsts(Loop(before, after)) = firsts(before) | (firsts(after) if nullable(before) else {}) 36 | # I guess. 37 | 38 | Another approach: just translate everything to BNF and let tail-call 39 | optimization sort it out. 
Not a crazy idea, but I think it'd take more 40 | work in imperative target languages with no goto, and produce 41 | less-predictable code there, and might not fly well with semantic 42 | actions. 43 | 44 | I think we could simplify the current code by combining analyze and 45 | directify into one function (saving the analysis and the direct-form 46 | for each rule). And coalesce the interpreter and VM compiler into a 47 | nonrecursive interpreter. 48 | -------------------------------------------------------------------------------- /eg_ebnf/structs.py: -------------------------------------------------------------------------------- 1 | """ 2 | Define a named-tuple-like type, but for immutable values, and simpler. 3 | Also Visitor to dispatch on datatypes defined this way. 4 | """ 5 | 6 | # TODO figure out how to use __slots__ 7 | 8 | def Struct(field_names, name=None, supertype=(object,)): 9 | if isinstance(field_names, (str, unicode)): 10 | field_names = tuple(field_names.split()) 11 | 12 | if name is None: 13 | name = 'Struct<%s>' % ','.join(field_names) 14 | def get_name(self): return self.__class__.__name__ 15 | else: 16 | def get_name(self): return name 17 | 18 | def __init__(self, *args): 19 | if len(field_names) != len(args): 20 | raise TypeError("%s takes %d arguments (%d given)" 21 | % (get_name(self), len(field_names), len(args))) 22 | self.__dict__.update(zip(field_names, args)) 23 | 24 | def __repr__(self): 25 | return '%s(%s)' % (get_name(self), ', '.join(repr(getattr(self, f)) 26 | for f in field_names)) 27 | 28 | def __hash__(self): 29 | return hash((name, tuple(map(self.__dict__.__getitem__, field_names)))) 30 | 31 | def __eq__(self, other): 32 | return (self.__class__ is other.__class__ # I guess... 
33 | and all(self.__dict__[field] == other.__dict__[field] 34 | for field in field_names)) 35 | def __ne__(self, other): 36 | return not __eq__(self, other) 37 | def compare(self, other): 38 | if self.__class__ is not other.__class__: 39 | raise NotImplemented 40 | return cmp(map(self.__dict__.__getitem__, field_names), 41 | map(other.__dict__.__getitem__, field_names)) 42 | 43 | # (for use with pprint) 44 | def my_as_sexpr(self): # XXX better name? 45 | return (get_name(self),) + tuple(as_sexpr(getattr(self, f)) 46 | for f in field_names) 47 | my_as_sexpr.__name__ = 'as_sexpr' 48 | 49 | return type(name, 50 | supertype, 51 | dict(__init__=__init__, 52 | __repr__=__repr__, 53 | __hash__=__hash__, 54 | __eq__=__eq__, 55 | __ne__=__ne__, 56 | __lt__=lambda self, other: compare(self, other) < 0, 57 | __le__=lambda self, other: compare(self, other) <= 0, 58 | __gt__=lambda self, other: compare(self, other) > 0, 59 | __ge__=lambda self, other: compare(self, other) >= 0, 60 | as_sexpr=my_as_sexpr, 61 | _meta_fields=field_names)) 62 | 63 | def as_sexpr(obj): 64 | if hasattr(obj, 'as_sexpr'): 65 | return getattr(obj, 'as_sexpr')() 66 | elif isinstance(obj, list): 67 | return map(as_sexpr, obj) 68 | elif isinstance(obj, tuple): 69 | return tuple(map(as_sexpr, obj)) 70 | else: 71 | return obj 72 | 73 | 74 | # Is there a nicer way to do this? 75 | 76 | class Visitor(object): 77 | def __call__(self, subject, *args): 78 | tag = subject.__class__.__name__ 79 | method = getattr(self, tag, None) 80 | if method is None: 81 | try: 82 | method = getattr(self, 'default') 83 | except AttributeError: 84 | raise AttributeError("%r has no method for %r argument %r" % (self, tag, subject)) 85 | return method(subject, *args) 86 | 87 | 88 | # Test comparisons and hashing: 89 | ## class Action(Struct('name')): pass 90 | ## Action('x') == Action('x') 91 | #. True 92 | ## Action('x') == Action('y') 93 | #. False 94 | ## Action('x') < Action('y') 95 | #. 
True 96 | ## Action('x') > Action('y') 97 | #. False 98 | ## d = {Action('x'): 1} 99 | ## d[Action('x')] 100 | #. 1 101 | ## set([Action('x'), Action('x')]) 102 | #. set([Action('x')]) 103 | -------------------------------------------------------------------------------- /eg_ebnf/vm.py: -------------------------------------------------------------------------------- 1 | """ 2 | Parse by interpreting a compiled VM. 3 | """ 4 | 5 | from structs import Visitor 6 | from ebnf import Grammar 7 | 8 | def compile_grammar(grammar): 9 | labels = {} 10 | insns = [] 11 | for name in grammar.nonterminals: 12 | labels[name] = len(insns) 13 | insns.extend(compiling(grammar.directed[name])) 14 | insns.append(('return', None)) 15 | # TODO: make sure the client knows about grammar.errors 16 | return Code(insns, labels, grammar.actions) 17 | 18 | class Code(object): 19 | def __init__(self, insns, labels, actions): 20 | self.insns = insns 21 | self.labels = labels 22 | self.actions = actions 23 | self.label_of_addr = dict(zip(labels.values(), labels.keys())) 24 | 25 | def show(self): 26 | for pc in range(len(self.insns)): 27 | self.show_insn(pc) 28 | 29 | def show_insn(self, pc): 30 | label = self.label_of_addr.get(pc, '') 31 | op, arg = self.insns[pc] 32 | if op == 'return': 33 | arg = '' 34 | elif op == 'branch': 35 | cases, default = arg 36 | cases = [(','.join(sorted(s.text for s in kinds)), dest) 37 | for kinds, dest in cases] 38 | arg = cases, default 39 | print '%-10s %3d %-6s %r' % (label, pc, op, arg) 40 | 41 | def parse(self, tokens, start='start'): 42 | limit = 100 43 | tokens = list(tokens) + [None] # EOF sentinel 44 | i = 0 45 | frames = [[]] 46 | return_stack = [None] 47 | pc = self.labels[start] 48 | print 'starting', pc 49 | while pc is not None: 50 | limit -= 1 51 | if limit <= 0: break 52 | print ' '*50, zip(return_stack, frames) 53 | self.show_insn(pc) 54 | op, arg = self.insns[pc] 55 | pc += 1 56 | if op == 'call': 57 | frames.append([]) 58 | 
return_stack.append(pc) 59 | pc = self.labels[arg] 60 | elif op == 'return': 61 | pc = return_stack.pop() 62 | results = frames.pop() 63 | if pc is None: 64 | return results 65 | frames[-1].extend(results) 66 | elif op == 'eat': 67 | if tokens[i] == arg.text: 68 | i += 1 69 | else: 70 | raise SyntaxError("Missing %r" % arg) 71 | elif op == 'branch': 72 | cases, default = arg 73 | for kinds, dest in cases: 74 | if tokens[i] in [s.text for s in kinds]: # XXX awkward 75 | pc += dest 76 | break 77 | else: 78 | pc += default 79 | elif op == 'jump': 80 | pc += arg 81 | elif op == 'act': 82 | action = self.actions[arg] 83 | frame = frames[-1] 84 | frame[:] = [action(*frame)] 85 | elif op == 'fail': 86 | raise SyntaxError("Unexpected token %r; expecting one of %r" 87 | % (tokens[i], sorted(arg))) 88 | else: 89 | assert False 90 | 91 | class Compiling(Visitor): 92 | def Empty(self, t): return [] 93 | def Symbol(self, t): return [('eat', t)] 94 | def Call(self, t): return [('call', t.name)] 95 | def Branch(self, t): return compile_branch(t) 96 | def Fail(self, t): return [('fail', t.possibles)] 97 | def Chain(self, t): return self(t.e1) + self(t.e2) 98 | def Loop(self, t): return compile_loop(t) 99 | def Action(self, t): return [('act', t.name)] 100 | compiling = Compiling() 101 | 102 | def compile_branch(t): 103 | cases = [] 104 | insns = [] 105 | fixups = [] 106 | for kinds, alt in t.cases: 107 | dest = len(insns) # Offset from the branch insn 108 | cases.append((kinds, dest)) 109 | insns.extend(compiling(alt)) 110 | fixups.append(len(insns)) 111 | insns.append(None) # to be fixed up 112 | default = len(insns) # Offset from the branch insn 113 | insns.extend(compiling(t.default)) 114 | for addr in fixups: 115 | insns[addr] = ('jump', len(insns) - (addr + 1)) # Skip to the common exit point 116 | insns.insert(0, ('branch', (cases, default))) 117 | return insns 118 | 119 | def compile_loop(t): 120 | body = compiling(t.body) 121 | return ([('branch', ([(t.firsts, 0)], 
len(body)+1))] 122 | + body 123 | + [('jump', -len(body)-2)]) 124 | 125 | 126 | # Smoke test 127 | 128 | ## from ebnf import eg 129 | ## import operator 130 | ## actions = dict(X=lambda: 3, **operator.__dict__) 131 | ## egg = Grammar(eg, actions) 132 | 133 | ## egc = compile_grammar(egg) 134 | ## egc.show() 135 | #. A 0 branch ([('b', 0), ('y', 4)], 6) 136 | #. 1 call 'B' 137 | #. 2 eat Symbol('x', 'literal') 138 | #. 3 call 'A' 139 | #. 4 jump 3 140 | #. 5 eat Symbol('y', 'literal') 141 | #. 6 jump 1 142 | #. 7 fail frozenset([Symbol('b', 'literal'), Symbol('y', 'literal')]) 143 | #. 8 return '' 144 | #. B 9 eat Symbol('b', 'literal') 145 | #. 10 return '' 146 | #. C 11 return '' 147 | #. exp 12 call 'term' 148 | #. 13 branch ([('+', 0), ('-', 4)], 8) 149 | #. 14 eat Symbol('+', 'literal') 150 | #. 15 call 'exp' 151 | #. 16 act 'add' 152 | #. 17 jump 4 153 | #. 18 eat Symbol('-', 'literal') 154 | #. 19 call 'exp' 155 | #. 20 act 'sub' 156 | #. 21 jump 0 157 | #. 22 return '' 158 | #. term 23 call 'factor' 159 | #. 24 branch ([('*', 0)], 4) 160 | #. 25 eat Symbol('*', 'literal') 161 | #. 26 call 'factor' 162 | #. 27 act 'mul' 163 | #. 28 jump -5 164 | #. 29 return '' 165 | #. factor 30 branch ([('x', 0), ('(', 3)], 7) 166 | #. 31 eat Symbol('x', 'literal') 167 | #. 32 act 'X' 168 | #. 33 jump 5 169 | #. 34 eat Symbol('(', 'literal') 170 | #. 35 call 'exp' 171 | #. 36 eat Symbol(')', 'literal') 172 | #. 37 jump 1 173 | #. 38 fail frozenset([Symbol('x', 'literal'), Symbol('(', 'literal')]) 174 | #. 39 return '' 175 | ### egc.parse("x", start='exp') 176 | ### egc.parse("x+x-x", start='exp') 177 | ### egc.parse("x+(x*x+x)", start='exp') 178 | -------------------------------------------------------------------------------- /eg_fp.py: -------------------------------------------------------------------------------- 1 | """ 2 | A concatenative variant of John Backus's FP language. 
3 | http://en.wikipedia.org/wiki/FP_%28programming_language%29 4 | """ 5 | 6 | from __future__ import division 7 | 8 | from parson import Grammar 9 | 10 | program = {} 11 | 12 | def FP(text): 13 | global program 14 | program = dict(primitives) 15 | program.update(fp_parse(text)) 16 | 17 | def mk_def(name, exp): return (name, exp) 18 | def mk_call(name): return lambda arg: program[name](arg) 19 | def mk_if(c, t, e): return lambda arg: (t if c(arg) else e)(arg) 20 | def mk_compose(g, f): return lambda arg: f(g(arg)) 21 | def mk_map(f): return lambda arg: map(f, arg) 22 | def mk_insertl(f): return lambda arg: insertl(f, arg) 23 | def mk_insertr(f): return lambda arg: insertr(f, arg) 24 | def mk_filter(f): return lambda arg: filter(f, arg) 25 | def mk_aref(n): return (lambda arg: arg[n-1]) if 0 < n else (lambda arg: arg[n]) 26 | def mk_literal(n): return lambda _: n 27 | def mk_op(name): return ops[name] 28 | def mk_list(*exps): return lambda arg: [f(arg) for f in exps] 29 | 30 | escape = lambda s: s.decode('unicode-escape') 31 | 32 | fp_parse = Grammar(r""" def* :end. 33 | 34 | def : name '==' exp '.' :mk_def. 35 | 36 | exp : term ('->' term ';' exp :mk_if)?. 37 | 38 | term : factor (term :mk_compose)?. 39 | 40 | factor : '@' factor :mk_map 41 | | '/' factor :mk_insertr 42 | | '\\' factor :mk_insertl 43 | | '?' factor :mk_filter 44 | | primary. 45 | 46 | primary : integer :mk_aref 47 | | '~' integer :mk_literal 48 | | string :mk_literal 49 | | name :mk_call 50 | | /([<=>*%+-])/~ !opchar '' 51 | :mk_op 52 | | '[' exp ** ',' ']' :mk_list 53 | | '(' exp ')'. 54 | 55 | opchar : /[\w@\/\\?<=>*%+-]/. 56 | 57 | decimal : /(\d+)/ :int. 58 | integer : /(-?\d+)/ :int. 59 | name : /([A-Za-z]\w*)/. 60 | 61 | string ~: '"' schar* '"' FNORD :join. 62 | schar ~: /([^\x00-\x1f"\\])/ 63 | | /\\(["\\])/ 64 | | /(\\[bfnrt])/ :escape. 65 | 66 | FNORD ~: /\s*/. 
67 | 68 | """)(**globals()) 69 | 70 | def insertl(f, xs): 71 | if not xs: return function_identity(f) 72 | return reduce(lambda x, y: f([x, y]), xs) 73 | 74 | def insertr(f, xs): 75 | if not xs: return function_identity(f) 76 | z = xs[-1] 77 | for x in xs[-2::-1]: 78 | z = f([x, z]) 79 | return z 80 | 81 | add = lambda (x, y): x + y 82 | sub = lambda (x, y): x - y 83 | mul = lambda (x, y): x * y 84 | divide = lambda (x, y): x / y 85 | intdiv = lambda (x, y): x // y 86 | mod = lambda (x, y): x % y 87 | eq = lambda (x, y): x == y 88 | lt = lambda (x, y): x < y 89 | gt = lambda (x, y): x > y 90 | 91 | ops = {'+': add, '-': sub, '*': mul, '%': divide, # N.B. '/' is reserved for insertr 92 | '=': eq, '<': lt, '>': gt} 93 | 94 | primitives = dict( 95 | apndl = lambda (x, xs): [x] + xs, 96 | apndr = lambda (xs, x): xs + [x], 97 | chain = lambda lists: sum(lists, []), 98 | distl = lambda (x, ys): [[x, y] for y in ys], 99 | distr = lambda (xs, y): [[x, y] for x in xs], 100 | div = intdiv, 101 | enumerate = lambda xs: [(x, i) for i,x in enumerate(xs, 1)], # XXX unused 102 | id = lambda x: x, 103 | iota = lambda n: range(1, n+1), 104 | join = lambda (strs, sep): sep.join(strs), 105 | length = len, 106 | mod = mod, 107 | rev = lambda xs: xs[::-1], 108 | slice = lambda (xs, n): [xs[:n-1], xs[n-1], xs[n:]], 109 | sort = sorted, 110 | split = lambda (s, sep): s.split(sep), 111 | tl = lambda xs: xs[1:], 112 | transpose = lambda arg: zip(*arg), 113 | ) 114 | primitives['and'] = lambda (x, y): x and y 115 | primitives['or'] = lambda (x, y): x or y 116 | 117 | def function_identity(f): 118 | if f in (add, sub): return 0 119 | if f in (mul, divide, intdiv): return 1 120 | # XXX could add chain, and, or, lt, gt, ... 121 | raise Exception("No known identity element", f) 122 | 123 | 124 | examples = r""" 125 | factorial == iota /*. 126 | 127 | e_sum == [~0, iota] apndl @(factorial [~1, id] %) /+. 128 | 129 | dot == transpose @* \+. 130 | matmult == [1, 2 transpose] distr @distl @@dot. 
131 | 132 | iszero == [id, ~0] =. 133 | divisible == mod iszero. 134 | iseven == [id, ~2] divisible. 135 | 136 | max == /(< -> 2; 1). 137 | 138 | qsort == [length, ~2] < -> id; 139 | [id, 1] distr [?< @1 qsort, ?= @1, ?> @1 qsort] chain. 140 | 141 | euler1 == iota ?([[id, ~3] divisible, [id, ~5] divisible] or) /+. 142 | 143 | fibs == [~40, 1] < -> tl; [[1,2] +, id] apndl fibs. 144 | euler2 == [~2,~1] fibs ?iseven /+. 145 | 146 | fibsr == [~40, -1] < -> rev tl rev; [id, [-1,-2] +] apndr fibsr. 147 | euler2r == [~1,~2] fibsr ?iseven /+. 148 | """ 149 | 150 | def defs(names): return [program[name] for name in names.split()] 151 | 152 | ## FP(examples) 153 | ## factorial, e_sum, dot, matmult = defs('factorial e_sum dot matmult') 154 | ## divisible, euler1 = defs('divisible euler1') 155 | ## qmax, qsort = defs('max qsort') 156 | ## qmax([1, 5, 3]) 157 | #. 5 158 | ## qmax([5, 1]) 159 | #. 5 160 | ## qsort([]) 161 | #. [] 162 | ## qsort([3,1,4,1,5,9]) 163 | #. [1, 1, 3, 4, 5, 9] 164 | 165 | ## fibs, euler2, fibsr = defs('fibs euler2 fibsr') 166 | ## fibs([1,1]) 167 | #. [34, 21, 13, 8, 5, 3, 2, 1, 1] 168 | ## euler2(0) 169 | #. 44 170 | ## fibsr([1,1]) 171 | #. [1, 1, 2, 3, 5, 8, 13, 21, 34] 172 | 173 | ## divisible([9, 5]), divisible([10, 5]), 174 | #. (False, True) 175 | ## euler1(9) 176 | #. 23 177 | 178 | ## factorial(0) 179 | #. 1 180 | ## factorial(5) 181 | #. 120 182 | 183 | ## dot([[1,2], [3,4]]) 184 | #. 11 185 | ## dot([]) 186 | #. 0 187 | 188 | ## matmult([ [], [] ]) 189 | #. [] 190 | ## matmult([ [[4]], [[5]] ]) 191 | #. [[20]] 192 | ## matmult([ [[2,0],[0,2]], [[5,6],[7,8]] ]) 193 | #. [[10, 12], [14, 16]] 194 | ## matmult([ [[0,1],[1,0]], [[5,6],[7,8]] ]) 195 | #. [[7, 8], [5, 6]] 196 | 197 | ## e_sum(20) 198 | #. 2.718281828459045 199 | 200 | 201 | # Inspired by James Morris, "Real programming in functional 202 | # languages", figure 1. 203 | 204 | kwic_program = r""" 205 | kwic == lines split kwiclines lines join. 
206 | 207 | kwiclines == @(words split generate) chain sort @2. 208 | generate == [id, length iota] distl @label. 209 | label == slice [2, 210 | [1, [["<",2,">"] chars join], 3] chain words join]. 211 | 212 | chars == [id, ""]. 213 | words == [id, " "]. 214 | lines == [id, "\n"]. 215 | """ 216 | 217 | ## FP(kwic_program) 218 | ## kwic, = defs('kwic') 219 | ## print kwic("leaves of grass\nflowers of evil") 220 | #. flowers of 221 | #. of evil 222 | #. leaves of 223 | #. of grass 224 | #. flowers evil 225 | #. leaves grass 226 | 227 | 228 | # Prime numbers from 3 to ? 229 | # (adapted from an example from Andy Valencia's C implementation of FP) 230 | primes_program = r""" 231 | primes == candidates ?isprime. 232 | isprime == [id, [id, ~4] div candidates] distl ?divisible isempty. 233 | isempty == [length, ~0] =. 234 | divisible == [mod, ~0] =. 235 | candidates == iota @(double add1). 236 | double == [~2, id] *. 237 | add1 == [id, ~1] +. 238 | """ 239 | ## FP(primes_program) 240 | ## primes, = defs('primes') 241 | ## primes(20) 242 | #. [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41] 243 | -------------------------------------------------------------------------------- /eg_itsy/README.md: -------------------------------------------------------------------------------- 1 | ## Itsy is not C but can translate to it directly. 2 | 3 | An example language inspired by the design goals of Per Vognsen's 4 | [Ion](https://github.com/pervognsen/bitwise/blob/master/notes/ion_motivation.md) 5 | language, but the syntax was slapped together mostly before seeing his 6 | work. (The operator precedences are taken from Ion, though.) Also, I'm 7 | pretty unconcerned about familiarity to C people. 8 | 9 | Not finished, and I dunno if it ever will be. 10 | 11 | The syntax for pointers and arrays (their declaration and use) has 12 | appeared earlier in the SPECS alternative syntax for C++ (Ben Werther 13 | & Damian Conway) and in [Odin](https://github.com/odin-lang/Odin). 
I 14 | didn't get it from them but by following the implications of a remark 15 | in Ritchie's "The Development of the C Language": "Sethi observed that 16 | many of the nested declarations and expressions would become simpler 17 | if the indirection operator had been taken as a postfix operator 18 | instead of prefix, but by then it was too late to change." For this he 19 | references R. Sethi, "Uniform Syntax for Type Expressions and 20 | Declarators." Softw., Pract. Exper. 11 (6): 623-628 21 | (1981). Unfortunately I can't find that paper online. 22 | -------------------------------------------------------------------------------- /eg_itsy/ast.py: -------------------------------------------------------------------------------- 1 | """ 2 | Abstract syntax of itsy. 3 | """ 4 | 5 | from structs import Struct 6 | 7 | 8 | # Declarations 9 | 10 | class To( Struct('pos name signature body')): pass 11 | class Typedef( Struct('pos name type')): pass 12 | class Let( Struct('pos names type opt_exp')): pass 13 | class Enum( Struct('pos opt_name pairs')): pass 14 | class Record( Struct('pos kind opt_name fields')): pass # kind = 'struct' or 'union' 15 | 16 | 17 | # Statements 18 | 19 | class Block( Struct('pos parts')): pass 20 | 21 | class Exp( Struct('pos opt_exp')): pass 22 | class Return( Struct('pos opt_exp')): pass 23 | class Break( Struct('pos ')): pass 24 | class Continue( Struct('pos ')): pass 25 | class While( Struct('pos exp block')): pass 26 | class Do( Struct('pos block exp')): pass 27 | class For( Struct('pos opt_e1 opt_e2 opt_e3 block')): pass 28 | class If_stmt( Struct('pos exp then_ opt_else')): pass 29 | class Switch( Struct('pos exp cases')): pass 30 | 31 | class Case( Struct('pos exps block')): pass 32 | class Default( Struct('pos block')): pass 33 | 34 | 35 | # Types 36 | 37 | class Type_name( Struct('pos name')): pass 38 | class Pointer( Struct('pos type')): pass 39 | class Array( Struct('pos size type')): pass 40 | class Signature( Struct('pos params 
return_type')): pass # params are (type, (name or '')) pairs 41 | 42 | # TODO rename fields like 'type' to 'base_type' or something 43 | 44 | class Int_type( Struct('size signedness')): pass 45 | class Float_type(Struct('size')): pass 46 | 47 | def Void(pos): return Type_name(pos, 'void') # for now, anyway 48 | 49 | def spread_params(names, type_): 50 | return tuple((type_, name) for name in names) 51 | 52 | def chain(*seqs): 53 | return sum(seqs, ()) 54 | 55 | 56 | # Expressions 57 | 58 | class Seq( Struct('e1 e2')): pass 59 | class Assign( Struct('e1 opt_binop e2')): pass 60 | class If_exp( Struct('e1 e2 e3')): pass 61 | class And( Struct('e1 e2')): pass 62 | class Or( Struct('e1 e2')): pass 63 | class Binary_exp( Struct('e1 binop e2')): pass 64 | class Index( Struct('e1 e2')): pass 65 | class Call( Struct('e1 args')): pass 66 | class Dot( Struct('e1 field')): pass 67 | class Deref( Struct('e1')): pass 68 | class Post_incr( Struct('e1 op')): pass 69 | class Cast( Struct('e1 type')): pass 70 | 71 | class Literal( Struct('pos text kind')): pass 72 | class Variable( Struct('pos name')): pass 73 | class Address_of( Struct('pos e1')): pass 74 | class Sizeof_type( Struct('pos type')): pass 75 | class Sizeof( Struct('pos e1')): pass 76 | class Unary_exp( Struct('pos unop e1')): pass 77 | class Pre_incr( Struct('pos e1 op')): pass 78 | class Compound_exp(Struct('pos exps')): pass 79 | -------------------------------------------------------------------------------- /eg_itsy/c_emitter.py: -------------------------------------------------------------------------------- 1 | """ 2 | Emit C code from an AST. 
3 | """ 4 | 5 | from structs import Visitor 6 | import ast 7 | 8 | def indent(s): 9 | return s.replace('\n', '\n ') 10 | 11 | def embrace(lines): 12 | return '{\n %s\n}' % indent('\n'.join(lines)) 13 | 14 | def opt_c_exp(opt_e, if_some='', p=0): 15 | return '' if opt_e is None else if_some + c_exp(opt_e, p) 16 | 17 | def opt_space(opt_s): 18 | return ' %s' % opt_s if opt_s else '' 19 | 20 | 21 | # Declarations and statements (either can appear in a block) 22 | # TODO rename 'declaration' to avoid confusion with c_decl below 23 | 24 | class CEmitter(Visitor): 25 | 26 | def To(self, t): 27 | return '%s %s' % (c_decl(t.signature, t.name), c(t.body)) 28 | 29 | def Typedef(self, t): 30 | return 'typedef %s;' % c_decl(t.type, t.name) 31 | 32 | def Let(self, t): 33 | assert t.opt_exp is None or len(t.names) == 1 34 | assign = opt_c_exp(t.opt_exp, ' = ', elem_prec) 35 | return '\n'.join('%s%s;' % (c_decl(t.type, name), assign) 36 | for name in t.names) 37 | 38 | def Enum(self, t): 39 | enums = ['%s%s,' % (name, opt_c_exp(opt_exp, ' = ')) 40 | for name, opt_exp in t.pairs] 41 | lines = [] 42 | if t.opt_name: 43 | lines.append('typedef enum %s %s;' % (t.opt_name, t.opt_name)) 44 | lines.append('enum%s %s;' % (opt_space(t.opt_name), embrace(enums))) 45 | return '\n'.join(lines) 46 | 47 | def Record(self, t): 48 | lines = [] 49 | if t.opt_name: 50 | lines.append('typedef %s %s %s;' % (t.kind, t.opt_name, t.opt_name)) 51 | c_defn = '%s%s %s;' % (t.kind, 52 | opt_space(t.opt_name), 53 | embrace(c_decl(type_, name) + ';' 54 | for type_, name in t.fields)) 55 | lines.append(c_defn) 56 | return '\n'.join(lines) 57 | 58 | def Block(self, t): 59 | return embrace(map(c, t.parts)); 60 | 61 | def Exp(self, t): 62 | return opt_c_exp(t.opt_exp) + ';' 63 | 64 | def Return(self, t): 65 | return 'return%s;' % opt_c_exp(t.opt_exp, ' ') 66 | 67 | def Break(self, t): 68 | return 'break;' 69 | 70 | def Continue(self, t): 71 | return 'continue;' 72 | 73 | def While(self, t): 74 | return 'while 
(%s) %s' % (c_exp(t.exp), c(t.block)) 75 | 76 | def Do(self, t): 77 | return 'do %s while (%s);' % (c(t.block), c_exp(t.exp)) 78 | 79 | def If_stmt(self, t): 80 | branches = [] 81 | while isinstance(t, ast.If_stmt): 82 | branches.append('if (%s) %s' % (c_exp(t.exp), c(t.then_))) 83 | t = t.opt_else 84 | if t is not None: 85 | branches.append(c(t)) 86 | return ' else '.join(branches) 87 | 88 | def For(self, t): 89 | return 'for (%s; %s; %s) %s' % (opt_c_exp(t.opt_e1), 90 | opt_c_exp(t.opt_e2), 91 | opt_c_exp(t.opt_e3), 92 | c(t.block)) 93 | 94 | def Switch(self, t): 95 | return 'switch (%s) %s' % (c_exp(t.exp), 96 | embrace(map(c, t.cases))) 97 | 98 | def Case(self, t): 99 | cases = '\n'.join('case %s:' % c_exp(e) for e in t.exps) 100 | return '%s %s break;' % (cases, c(t.block)) 101 | 102 | def Default(self, t): 103 | return 'default: %s break;' % c(t.block) 104 | 105 | c = c_emit = CEmitter() 106 | 107 | 108 | # Types 109 | 110 | def c_type(type_): 111 | return c_decl(type_, '') 112 | 113 | def c_decl(type_, name): 114 | return ('%s %s' % decl_pair(type_, name, 0)).rstrip() 115 | 116 | class DeclPair(Visitor): 117 | 118 | def Type_name(self, t, e, p): # e: expression-like C fragment, p: surrounding precedence 119 | return t.name, e 120 | 121 | def Pointer(self, t, e, p): 122 | return self(t.type, 123 | '*%s' % hug(e, p, 2), 124 | 2) 125 | 126 | def Array(self, t, e, p): 127 | return self(t.type, 128 | '%s[%s]' % (hug(e, p, 1), opt_c_exp(t.size)), 129 | 1) 130 | 131 | def Signature(self, t, e, p): 132 | c_params = ', '.join(c_decl(type_, name) for type_, name in t.params) 133 | return self(t.return_type, 134 | '%s(%s)' % (hug(e, p, 1), c_params or 'void'), 135 | 1) 136 | 137 | decl_pair = DeclPair() 138 | 139 | def hug(s, outer, inner): 140 | return s if outer <= inner else '(%s)' % s # XXX is '<=' quite right? instead of '<'? 
141 | 142 | 143 | # Expressions 144 | 145 | def c_exp(e, p=0): # p: surrounding precedence 146 | return c_exp_emitter(e, p) 147 | 148 | class CExpEmitter(Visitor): 149 | 150 | def Literal(self, t, p): 151 | return t.text 152 | 153 | def Variable(self, t, p): 154 | return t.name 155 | 156 | def Address_of(self, t, p): 157 | return fmt1(p, unary_prec, '&%s', t.e1) 158 | 159 | def Sizeof_type(self, t, p): 160 | return 'sizeof(%s)' % c_type(t.type) 161 | 162 | def Sizeof(self, t, p): 163 | return fmt1(p, unary_prec, 'sizeof %s', t.e1) 164 | 165 | def Deref(self, t, p): 166 | return fmt1(p, unary_prec, '*%s', t.e1) 167 | 168 | def Unary_exp(self, t, p): 169 | return fmt1(p, unary_prec, t.unop + '%s', t.e1) 170 | 171 | def Cast(self, t, p): 172 | return wrap(p, cast_prec, '(%s) %s' % (c_type(t.type), 173 | self(t.e1, cast_prec))) 174 | 175 | def Seq(self, t, p): 176 | return fmt2(p, ',', t.e1, t.e2, fmt_str = '%s%s %s') 177 | 178 | def Pre_incr(self, t, p): 179 | return fmt1(p, unary_prec, t.op+'%s', t.e1) 180 | 181 | def Post_incr(self, t, p): 182 | return fmt1(p, postfix_prec, '%s'+t.op, t.e1) 183 | 184 | def If_exp(self, t, p): 185 | lp, rp = binaries['?:'] 186 | return wrap(p, rp, # TODO recheck that rp is the right thing here in place of the usual lp 187 | '%s ? 
%s : %s' % (self(t.e2, lp), 188 | self(t.e1, 0), 189 | self(t.e3, rp))) 190 | 191 | def Assign(self, t, p): 192 | return fmt2(p, (t.opt_binop or '') + '=', t.e1, t.e2) # TODO clumsy 193 | 194 | def Binary_exp(self, t, p): 195 | op = '^' if t.binop == '@' else t.binop 196 | return fmt2(p, op, t.e1, t.e2) 197 | 198 | def Index(self, t, p): 199 | return wrap(p, postfix_prec, 200 | '%s[%s]' % (self(t.e1, postfix_prec), 201 | self(t.e2, 0))) 202 | 203 | def Call(self, t, p): 204 | return wrap(p, postfix_prec, 205 | '%s(%s)' % (self(t.e1, postfix_prec), 206 | ', '.join(self(e, elem_prec) 207 | for e in t.args))) 208 | 209 | def Dot(self, t, p): 210 | if isinstance(t.e1, ast.Deref): 211 | lhs, op = t.e1.e1, '->' 212 | else: 213 | lhs, op = t.e1, '.' 214 | return wrap(p, postfix_prec, self(lhs, postfix_prec) + op + t.field) 215 | 216 | def And(self, t, p): 217 | return fmt2(p, '&&', t.e1, t.e2) 218 | 219 | def Or(self, t, p): 220 | return fmt2(p, '||', t.e1, t.e2) 221 | 222 | def Compound_exp(self, t, p): 223 | # XXX I think C imposes restrictions on where this can appear? 
224 | # Also, the indentation might be awful without more work 225 | return embrace(c_exp(e, elem_prec) + ',' for e in t.exps) 226 | 227 | c_exp_emitter = CExpEmitter() 228 | 229 | 230 | # Parenthesizing by precedence 231 | 232 | infix_precedence_tower = """\ 233 | , 234 | = 235 | ?: 236 | || 237 | && 238 | | 239 | ^ 240 | & 241 | == != 242 | < > <= >= 243 | << >> 244 | + - 245 | * / % 246 | (cast) 247 | (unary) 248 | (postfix)""".splitlines() 249 | 250 | binaries = {op: (2*i, 2*i+1) 251 | for i, line in enumerate(infix_precedence_tower) 252 | for op in line.split()} 253 | cast_prec = binaries['(cast)'][0] 254 | unary_prec = binaries['(unary)'][0] 255 | postfix_prec = binaries['(postfix)'][0] 256 | 257 | elem_prec = binaries['='][0] # The next precedence after ',' 258 | # Make the left precedence of the assignment operator be unary_expression, and right-associative: 259 | binaries['='] = (unary_prec, binaries['='][0]) 260 | 261 | # Also, ?: is also right-associative: 262 | binaries['?:'] = (binaries['?:'][0], binaries['?:'][0]) 263 | 264 | def fmt1(outer, inner, fmt_str, e1): 265 | return wrap(outer, inner, fmt_str % c_exp(e1, inner)) 266 | 267 | def wrap(outer, inner, s): 268 | return '(%s)' % s if inner < outer else s 269 | 270 | def fmt2(p, op, e1, e2, fmt_str='%s %s %s'): 271 | lp, rp = binaries['=' if op.endswith('=') else op] 272 | return wrap(p, lp, fmt_str % (c_exp(e1, lp), op, c_exp(e2, rp))) 273 | -------------------------------------------------------------------------------- /eg_itsy/c_prelude.h: -------------------------------------------------------------------------------- 1 | // Standard prelude for Itsy code translated to C. 
2 | 3 | #include <assert.h> 4 | #include <ctype.h> 5 | #include <errno.h> 6 | #include <limits.h> 7 | #include <math.h> 8 | #include <stdarg.h> 9 | #include <stdbool.h> 10 | #include <stddef.h> 11 | #include <stdint.h> 12 | #include <stdio.h> 13 | #include <stdlib.h> 14 | #include <string.h> 15 | 16 | typedef unsigned int uint; 17 | 18 | typedef int8_t int8; 19 | typedef int16_t int16; 20 | typedef int32_t int32; 21 | typedef int64_t int64; 22 | 23 | typedef uint8_t uint8; 24 | typedef uint16_t uint16; 25 | typedef uint32_t uint32; 26 | typedef uint64_t uint64; 27 | 28 | typedef float float32; 29 | typedef double float64; 30 | 31 | // Definitions from Ion by Per Vognsen. I need these for the Bitwise 32 | // homework for now. In the longer term we'll presumably want a more 33 | // flexible way to bring in external headers/libraries. 34 | 35 | #define MAX(x, y) ((x) >= (y) ? (x) : (y)) 36 | 37 | static void *xrealloc(void *ptr, size_t num_bytes) { 38 | ptr = realloc(ptr, num_bytes); 39 | if (!ptr) { 40 | perror("xrealloc failed"); 41 | exit(1); 42 | } 43 | return ptr; 44 | } 45 | 46 | static void *xmalloc(size_t num_bytes) { 47 | void *ptr = malloc(num_bytes); 48 | if (!ptr) { 49 | perror("xmalloc failed"); 50 | exit(1); 51 | } 52 | return ptr; 53 | } 54 | 55 | static void fatal(const char *fmt, ...) { 56 | va_list args; 57 | va_start(args, fmt); 58 | printf("FATAL: "); 59 | vprintf(fmt, args); 60 | printf("\n"); 61 | va_end(args); 62 | exit(1); 63 | } 64 | 65 | // Stretchy buffers, invented (?) by Sean Barrett 66 | 67 | typedef struct BufHdr { 68 | size_t len; 69 | size_t cap; 70 | char buf[]; 71 | } BufHdr; 72 | 73 | #define buf__hdr(b) ((BufHdr *)((char *)(b) - offsetof(BufHdr, buf))) 74 | 75 | #define buf_len(b) ((b) ? buf__hdr(b)->len : 0) 76 | #define buf_cap(b) ((b) ? buf__hdr(b)->cap : 0) 77 | #define buf_end(b) ((b) + buf_len(b)) 78 | #define buf_sizeof(b) ((b) ? buf_len(b)*sizeof(*b) : 0) 79 | 80 | #define buf_free(b) ((b) ? (free(buf__hdr(b)), (b) = NULL) : 0) 81 | #define buf_fit(b, n) ((n) <= buf_cap(b) ?
0 : ((b) = buf__grow((b), (n), sizeof(*(b))))) 82 | #define buf_push(b, ...) (buf_fit((b), 1 + buf_len(b)), (b)[buf__hdr(b)->len++] = (__VA_ARGS__)) 83 | 84 | static void *buf__grow(const void *buf, size_t new_len, size_t elem_size) { 85 | assert(buf_cap(buf) <= (SIZE_MAX - 1)/2); 86 | size_t new_cap = MAX(16, MAX(1 + 2*buf_cap(buf), new_len)); 87 | assert(new_len <= new_cap); 88 | assert(new_cap <= (SIZE_MAX - offsetof(BufHdr, buf))/elem_size); 89 | size_t new_size = offsetof(BufHdr, buf) + new_cap*elem_size; 90 | BufHdr *new_hdr; 91 | if (buf) { 92 | new_hdr = xrealloc(buf__hdr(buf), new_size); 93 | } else { 94 | new_hdr = xmalloc(new_size); 95 | new_hdr->len = 0; 96 | } 97 | new_hdr->cap = new_cap; 98 | return new_hdr->buf; 99 | } 100 | 101 | // End of prelude. 102 | -------------------------------------------------------------------------------- /eg_itsy/complainer.py: -------------------------------------------------------------------------------- 1 | """ 2 | Format parse errors with a vaguely-friendly display of the position. 3 | TODO Parson itself ought to help with something like this 4 | TODO Position info needs to be a range, not a single coordinate. 5 | Or at least point to the '+' in 'a+a' instead of to the 'a'. 
6 | """ 7 | 8 | from structs import Struct 9 | import sys 10 | 11 | status_ok, status_error = range(2) 12 | 13 | class Complainer(object): 14 | 15 | def __init__(self, text, filename): 16 | self.text = text 17 | self.filename = filename 18 | self.status = status_ok 19 | 20 | def ok(self): 21 | return self.status == status_ok 22 | 23 | def syntax_error(self, exc): 24 | self.complain(exc.failure, "Syntax error") 25 | 26 | def semantic_error(self, plaint, pos): 27 | self.complain((self.text[:pos], self.text[pos:]), plaint) 28 | 29 | def complain(self, (before, after), plaint): 30 | self.status = status_error 31 | line_no = before.count('\n') 32 | prefix = (before+'\n').splitlines()[line_no] 33 | suffix = (after+'\n').splitlines()[0] # XXX what if right on newline? 34 | prefix, suffix = sanitize(prefix), sanitize(suffix) 35 | message = ["%s:%d:%d: %s" % (self.filename, line_no+1, len(prefix), plaint), 36 | ' ' + prefix + suffix, 37 | ' ' + ' '*len(prefix) + '^'] 38 | sys.stderr.write('\n'.join(message) + '\n') 39 | 40 | def sanitize(s): 41 | "Make s predictably printable, sans control characters like tab." 
42 | return ''.join(c if ' ' <= c < chr(127) else ' ' # XXX crude 43 | for c in s) 44 | -------------------------------------------------------------------------------- /eg_itsy/eg/examples.itsy: -------------------------------------------------------------------------------- 1 | let a: int = 5; 2 | 3 | to f(x: int): int { 4 | return x * x; 5 | } 6 | 7 | to fib(n: int): int { 8 | return 1 if n < 2 else fib(n-1) + fib(n-2); 9 | } 10 | 11 | to fact(n: int): int { 12 | let p: int = 1; 13 | for ; 0 < n; --n { 14 | p *= n; 15 | } 16 | return p; 17 | } 18 | 19 | to asm_store_field(address: Address, L, R: int, cell: Cell): void { 20 | assert(address < memory_size); 21 | if VERBOSE { 22 | let temp: [12]int; 23 | unparse_cell(temp, cell); 24 | printf("%4o(%u,%u): %s\n", address, L, R, temp); 25 | } 26 | memory[address] = set_field(cell, make_field_spec(L, R), memory[address]); 27 | } 28 | 29 | to make16(a, b: uint8): int { 30 | return a:int + b:int << 8; 31 | } 32 | 33 | to foo() { 34 | do { 35 | if a == sizeof:int { continue; } 36 | else if b { break; } 37 | } while x-->0; 38 | } 39 | 40 | struct Closure { 41 | f: ^(^Closure, float64) void; 42 | free_var1, free_var2: int; 43 | } 44 | -------------------------------------------------------------------------------- /eg_itsy/eg/regex.itsy: -------------------------------------------------------------------------------- 1 | // star_thompsonlike_lowlevel.py ported to C ported to itsy 2 | // TODO explicit clarity on signed vs. 
unsigned 3 | 4 | enum { loud = 0 } 5 | 6 | to error(plaint: ^char) { 7 | fprintf(stderr, "%s\n", plaint); 8 | exit(1); 9 | } 10 | 11 | enum { max_insns = 8192, accept = 0 } 12 | enum { op_accept, op_eat, op_fork, op_loop, } 13 | 14 | let ninsns: int; 15 | let accepts, ops: [max_insns]uint8; 16 | let arg1, arg2: [max_insns]int16; 17 | 18 | let names: [4]^char = [ // TODO leave out the 4 19 | "win", "eat", "fork", "loop", 20 | ]; 21 | 22 | to dump1(pc: int) { 23 | printf("%c %2u: %-4s ", '*' if accepts[pc] else ' ', pc, names[ops[pc]]); 24 | printf("\n" if pc == accept else "'%c' %d\n" if ops[pc] == op_eat else "%d %d\n", 25 | arg1[pc], arg2[pc]); 26 | } 27 | 28 | to dump() { 29 | let pc: int; 30 | for pc = ninsns-1; 0 <= pc; --pc { 31 | dump1(pc); 32 | } 33 | } 34 | 35 | let occupied: [max_insns]uint8; 36 | 37 | to after(ch: char, start, end: int, next_states: ^^int) { 38 | while start != end { 39 | let r: int = arg1[start]; 40 | let s: int = arg2[start]; 41 | match ops[start] { 42 | on op_eat { 43 | if r == ch && !occupied[s] { 44 | next_states^++^ = s; 45 | occupied[s] = 1; 46 | } 47 | return; 48 | } 49 | on op_fork { 50 | after(ch, r, end, next_states); 51 | start = s; 52 | } 53 | on op_loop { 54 | after(ch, r, start, next_states); 55 | start = s; 56 | } 57 | else { 58 | error("Can't happen"); 59 | } 60 | } 61 | } 62 | } 63 | 64 | let states0, states1: [max_insns]int; 65 | 66 | to run(start: int, input: ^char): int { 67 | if accepts[start] { 68 | return 1; 69 | } 70 | let cur_start, cur_end, next_start, next_end: ^int; 71 | cur_start = states0, cur_end = cur_start; 72 | next_start = states1, next_end = next_start; 73 | cur_end++^ = start; 74 | memset(occupied, 0, ninsns); // N.B. 
we could avoid this by always 75 | // finishing the next_start..next_end 76 | // loop below 77 | 78 | for ; input^; ++input { 79 | let state: ^int; 80 | for state = cur_start; state < cur_end; ++state { 81 | after(input^, state^, accept, &next_end); 82 | } 83 | for state = next_start; state < next_end; ++state { 84 | if accepts[state^] { 85 | return 1; 86 | } 87 | occupied[state^] = 0; 88 | } 89 | let t: ^int = cur_start; 90 | cur_start = next_start, cur_end = next_end; 91 | next_start = next_end = t; 92 | } 93 | return 0; 94 | } 95 | 96 | to emit(op: uint8, r, s: int, accepting: uint8): int { 97 | if max_insns <= ninsns { error("Pattern too long"); } 98 | ops[ninsns] = op, arg1[ninsns] = r, arg2[ninsns] = s; 99 | accepts[ninsns] = accepting; 100 | return ninsns++; 101 | } 102 | 103 | // start, current parsing position 104 | let pattern, pp: ^char; 105 | 106 | to eat(c: char): int { 107 | return (--pp, 1) if pattern < pp && pp[-1] == c else 0; 108 | } 109 | 110 | to parsing(precedence, state: int): int { 111 | let rhs: int; 112 | if pattern == pp || pp[-1] == '(' || pp[-1] == '|' { 113 | rhs = state; 114 | } 115 | else if eat(')') { 116 | rhs = parsing(0, state); 117 | if !eat('(') { error("Mismatched ')'"); } 118 | } 119 | else if eat('*') { 120 | rhs = emit(op_loop, 0, state, accepts[state]); // (0 is a placeholder... 121 | arg1[rhs] = parsing(6, rhs); // ...filled in here.) 
122 | } 123 | else { 124 | rhs = emit(op_eat, (--pp)^, state, 0); 125 | } 126 | while pattern < pp && pp[-1] != '(' { 127 | let prec: int = 3 if pp[-1] == '|' else 5; 128 | if prec <= precedence { break; } 129 | if eat('|') { 130 | let rhs2: int = parsing(prec, state); 131 | rhs = emit(op_fork, rhs, rhs2, accepts[rhs] || accepts[rhs2]); 132 | } 133 | else { 134 | rhs = parsing(prec, rhs); 135 | } 136 | } 137 | return rhs; 138 | } 139 | 140 | to parse(string: ^char): int { 141 | pattern = string; pp = pattern + strlen(pattern); 142 | ninsns = 0; 143 | let state: int = parsing(0, emit(op_accept, 0, 0, 1)); 144 | if pattern != pp { error("Bad pattern"); } 145 | return state; 146 | } 147 | 148 | to main(argc: int, argv: ^^char): int { 149 | if argc != 2 { error("Usage: grep pattern"); } 150 | let start_state: int = parse(argv[1]); 151 | if loud { 152 | printf("start: %u\n", start_state); 153 | dump(); 154 | } 155 | let matched: int = 0; 156 | let line: [9999]char; 157 | while fgets(line, sizeof line, stdin) { 158 | if run(start_state, line) { 159 | fputs(line, stdout); 160 | matched = 1; 161 | } 162 | } 163 | return !matched; 164 | } 165 | -------------------------------------------------------------------------------- /eg_itsy/eg/sieve.itsy: -------------------------------------------------------------------------------- 1 | // Sieve of Eratosthenes benchmark. 
2 | 3 | enum { SIZE = 8190 } 4 | 5 | let flags: [SIZE+1] bool; 6 | 7 | to main(): int { 8 | printf("10 iterations\n"); 9 | let count: int; 10 | let iter: int; 11 | for iter = 1; iter <= 10; ++iter { 12 | count = 0; 13 | let i: int; 14 | for i = 0; i <= SIZE; ++i { flags[i] = true; } 15 | for i = 0; i <= SIZE; ++i { 16 | if flags[i] { 17 | let prime: int = i + i + 3; 18 | let k: int; 19 | for k = i + prime; k <= SIZE; k += prime { 20 | flags[k] = false; 21 | } 22 | ++count; 23 | } 24 | } 25 | } 26 | printf("%d primes\n", count); 27 | return 0; 28 | } 29 | -------------------------------------------------------------------------------- /eg_itsy/eg/superopt.itsy: -------------------------------------------------------------------------------- 1 | // Ported from my superbench repo 2 | 3 | enum { max_wires = 20 } 4 | enum { max_inputs = 5 } 5 | 6 | typedef Word = uint; 7 | 8 | let argv0: ^char = ""; 9 | 10 | to error(plaint: ^char) { 11 | fprintf(stderr, "%s: %s\n", argv0, plaint); 12 | exit(1); 13 | } 14 | 15 | let target_output: Word; 16 | 17 | let ninputs: int; 18 | let mask: Word; 19 | 20 | let found: bool = false; 21 | let nwires: int; 22 | let wires: [max_wires]Word; 23 | let linputs, rinputs: [max_wires]int; 24 | 25 | to vname(w: int): char { 26 | return w + ('A' if w < ninputs else 'a'); 27 | } 28 | 29 | to print_circuit() { 30 | let w: int; 31 | for w = ninputs; w < nwires; ++w { 32 | printf("%s%c = ~(%c %c)", 33 | "" if w == ninputs else "; ", 34 | vname(w), vname(linputs[w]), vname(rinputs[w])); 35 | } 36 | printf("\n"); 37 | } 38 | 39 | to compute(left_input, right_input: Word): Word { 40 | return ~(left_input & right_input); // A ~& operator would be nice. 41 | } 42 | 43 | to sweeping(w: int) { 44 | let ll: int; 45 | for ll = 0; ll < w; ++ll { 46 | let llwire: Word = wires[ll]; 47 | linputs[w] = ll; 48 | if w+1 == nwires { 49 | let rr: int; 50 | for rr = 0; rr <= ll; ++rr { 51 | if mask & compute(llwire, wires[rr]) == target_output { // N.B. 
& precedence 52 | found = true; 53 | rinputs[w] = rr; 54 | print_circuit(); 55 | } 56 | } 57 | } 58 | else { 59 | let rr: int; 60 | for rr = 0; rr <= ll; ++rr { 61 | wires[w] = compute(llwire, wires[rr]); 62 | rinputs[w] = rr; 63 | sweeping(w + 1); 64 | } 65 | } 66 | } 67 | } 68 | 69 | to tabulate_inputs() { 70 | let i: int; 71 | for i = 1; i <= ninputs; ++i { 72 | let shift: Word = 1 << (i-1); 73 | wires[ninputs-i] = (1 << shift) - 1; // XXX 1u // N.B. could leave out parens 74 | let j: int; 75 | for j = ninputs-i+1; j < ninputs; ++j { 76 | wires[j] |= wires[j] << shift; 77 | } 78 | } 79 | } 80 | 81 | to find_circuits(max_gates: int) { 82 | mask = (1 << (1 << ninputs)) - 1; 83 | tabulate_inputs(); 84 | printf("Trying 0 gates...\n"); 85 | if target_output == 0 || target_output == mask { 86 | printf("%c = %d\n", vname(ninputs), target_output & 1); 87 | return; 88 | } 89 | let w: int; 90 | for w = 0; w < ninputs; ++w { 91 | if target_output == wires[w] { 92 | printf("%c = %c\n", vname(ninputs), vname(w)); 93 | return; 94 | } 95 | } 96 | let ngates: int; 97 | for ngates = 1; ngates <= max_gates; ++ngates { 98 | printf("Trying %d gates...\n", ngates); 99 | nwires = ninputs + ngates; 100 | assert(nwires <= 26); // vnames must be letters 101 | sweeping(ninputs); 102 | if found { return; } 103 | } 104 | } 105 | 106 | to parse_uint(s: ^char, base: uint): uint { 107 | let end: ^char; 108 | let u: uint64 = strtoul(s, &end, base); 109 | if u == 0 && errno == EINVAL { 110 | error(strerror(errno)); 111 | } 112 | if end^ != '\0' { 113 | error("Literal has crud in it, or extra spaces, or something"); 114 | } 115 | return u:uint; 116 | } 117 | 118 | to superopt(tt_output: ^char, max_gates: int) { 119 | ninputs = log2(strlen(tt_output)): int; 120 | if 1 << ninputs != strlen(tt_output) { 121 | error("truth_table_output must have a power-of-2 size"); 122 | } 123 | if max_inputs < ninputs { 124 | error("Truth table too big. 
I can't represent so many inputs."); 125 | } 126 | target_output = parse_uint(tt_output, 2); 127 | find_circuits(max_gates); 128 | } 129 | 130 | to main(argc: int, argv: ^^char): int { 131 | argv0 = argv[0]; 132 | assert((1 << (1 << max_inputs)) - 1 <= UINT_MAX); 133 | if argc != 3 { 134 | error("Usage: circuitoptimizer truth_table_output max_gates"); 135 | } 136 | superopt(argv[1], parse_uint(argv[2], 10): int); 137 | return 0; 138 | } 139 | -------------------------------------------------------------------------------- /eg_itsy/eg/um.itsy: -------------------------------------------------------------------------------- 1 | // Interpreter for the "Universal Machine" 2 | // documented at http://boundvariable.org/task.shtml 3 | 4 | // Compile-time options: 5 | 6 | // Turn off safety checks; assume the UM image is neither incorrect 7 | // nor malevolent. 8 | enum { trusting = 0 } 9 | 10 | // Max # of arrays that can be active at once. 11 | enum { max_arrays = 8 * 1024 * 1024 } 12 | 13 | 14 | // Standard helper functions 15 | 16 | to panic(complaint: ^char) { 17 | fprintf(stderr, "%s\n", complaint); 18 | exit(1); 19 | } 20 | 21 | to allot(size: size_t): ^void { 22 | let r: ^void = malloc(size); 23 | if NULL == r && 0 < size { panic(strerror(errno)); } 24 | return r; 25 | } 26 | 27 | to open_file(filename: ^char, mode: ^char): ^FILE { 28 | if 0 == strcmp("-", filename) { 29 | return stdin if 'r' == mode[0] else stdout; 30 | } 31 | let r: ^FILE = fopen(filename, mode); 32 | if NULL == r { panic(strerror(errno)); } 33 | return r; 34 | } 35 | 36 | 37 | // UM state 38 | 39 | typedef u8 = uint8; // maybe these should be the real Itsy names 40 | typedef u32 = uint32; 41 | 42 | typedef Platter = u32; 43 | 44 | let r: [8]Platter; // The registers 45 | 46 | struct Array { 47 | size: u32; 48 | _: ^Platter; 49 | } 50 | 51 | to make_array(nplatters: u32): Array { 52 | return [nplatters, allot(nplatters * sizeof:^Platter)]: Array; // the cast eventually shouldn't be needed 53 | } 
54 | 55 | to fetch(a: Array, i: u32): Platter { 56 | if !trusting && a.size <= i { panic("Fetch out of bounds"); } 57 | return a._[i]; 58 | } 59 | 60 | to store(a: Array, i: u32, p: Platter) { 61 | if !trusting && a.size <= i { panic("Store out of bounds"); } 62 | a._[i] = p; 63 | } 64 | 65 | let first_free: uint; 66 | let next_free: [max_arrays]uint; // 0 value means active 67 | let arrays: [max_arrays]Array; 68 | 69 | to set_up_free_list() { 70 | first_free = ~0; 71 | let i: uint; 72 | for i = max_arrays; 1 < i; --i { 73 | next_free[i - 1] = first_free; 74 | first_free = i - 1; 75 | } 76 | next_free[0] = 0; 77 | } 78 | 79 | to get_array(id: u32): Array { 80 | if trusting || (id < max_arrays && 0 == next_free[id]) { 81 | return arrays[id]; 82 | } 83 | panic("Bad array"); 84 | return arrays[0]; 85 | } 86 | 87 | to allocate(size: u32): u32 { 88 | let i: uint = first_free; 89 | if ~0:uint == i { panic("Out of arrays"); } 90 | first_free = next_free[i]; 91 | next_free[i] = 0; 92 | arrays[i] = make_array(size); 93 | memset(arrays[i]._, 0, size * sizeof arrays[i]._[0]); 94 | return i; 95 | } 96 | 97 | to abandon(id: u32) { 98 | let a: Array = get_array(id); 99 | free(a._); 100 | next_free[id] = first_free; 101 | first_free = id; 102 | if !trusting && 0 == id { panic("Abandoned 0"); } 103 | } 104 | 105 | to duplicate(id: u32) { 106 | if 0 != id { 107 | let a: Array = get_array(id); 108 | free(arrays[0]._); 109 | arrays[0] = make_array(a.size); 110 | memcpy(arrays[0]._, a._, a.size * sizeof a._[0]); 111 | } 112 | } 113 | 114 | enum Opcodes { 115 | cond_move, 116 | array_index, 117 | array_amend, 118 | add, 119 | mult, 120 | division, 121 | not_and, 122 | halt, 123 | alloc, 124 | abandonment, 125 | output, 126 | input, 127 | load_program, 128 | orthography, 129 | } 130 | 131 | to spin_cycle() { 132 | let finger: u32 = 0; 133 | 134 | for ;; { 135 | let insn: Platter = fetch(arrays[0], finger); 136 | ++finger; 137 | 138 | // These unfortunately for speed are not quite always 
used. 139 | // In the C version they were macros: 140 | let a: uint = 7 & (insn >> 6); 141 | let b: uint = 7 & (insn >> 3); 142 | let c: uint = 7 & (insn >> 0); 143 | 144 | match insn >> 28 { 145 | on cond_move { 146 | if 0 != r[c] { 147 | r[a] = r[b]; 148 | } 149 | } 150 | on array_index { 151 | r[a] = fetch(get_array(r[b]), r[c]); 152 | } 153 | on array_amend { 154 | store(get_array(r[a]), r[b], r[c]); 155 | } 156 | on add { 157 | r[a] = r[b] + r[c]; 158 | } 159 | on mult { 160 | r[a] = r[b] * r[c]; 161 | } 162 | on division { 163 | r[a] = r[b] / r[c]; 164 | } 165 | on not_and { 166 | r[a] = ~(r[b] & r[c]); 167 | } 168 | on halt { 169 | return; 170 | } 171 | on alloc { 172 | r[b] = allocate(r[c]); 173 | } 174 | on abandonment { 175 | abandon(r[c]); 176 | } 177 | on output { 178 | putchar(r[c]); 179 | } 180 | on input { 181 | fflush(stdout); 182 | let ch: int = getchar(); 183 | r[c] = ~0 if EOF == ch else 0xff & ch; 184 | } 185 | on load_program { 186 | duplicate(r[b]); 187 | finger = r[c]; 188 | } 189 | on orthography { 190 | let a1: uint = 7 & (insn >> (32 - 7)); 191 | r[a1] = insn & 0x01ffffff; 192 | } 193 | else { 194 | panic("Unknown instruction"); 195 | } 196 | } 197 | } 198 | } 199 | 200 | 201 | // Loading and running the image 202 | 203 | to make_platter(a, b, c, d: u8): u32 { 204 | return a<<24 | b<<16 | c<<8 | d; 205 | } 206 | 207 | to file_size_and_rewind(f: ^FILE): size_t { 208 | fseek(f, 0, SEEK_END); 209 | let rc: long = ftell(f); 210 | rewind(f); 211 | return rc: size_t; 212 | } 213 | 214 | to read_zero(f: ^FILE) { 215 | let size: size_t = file_size_and_rewind(f); 216 | assert(0 == size % 4); 217 | arrays[0] = make_array(size / 4); 218 | let i: u32 = 0; 219 | for i = 0; i < size / 4; ++i { 220 | let a: int = fgetc(f); 221 | let b: int = fgetc(f); 222 | let c: int = fgetc(f); 223 | let d: int = fgetc(f); 224 | assert(EOF != a && EOF != b && EOF != c && EOF != d); 225 | arrays[0]._[i] = make_platter(a, b, c, d); 226 | } 227 | assert(EOF == fgetc(f)); 
228 | } 229 | 230 | to main(argc: int, argv: ^^char): int { 231 | if 2 != argc { panic("Usage: vm filename"); } 232 | set_up_free_list(); 233 | let f: ^FILE = open_file(argv[1], "rb"); 234 | read_zero(f); 235 | fclose(f); 236 | spin_cycle(); 237 | return 0; 238 | } 239 | -------------------------------------------------------------------------------- /eg_itsy/error_tests/bad.itsy: -------------------------------------------------------------------------------- 1 | // Not actual Itsy code; for testing compiler error handling. 2 | 3 | to main(argc: int, argv: ^^char): int { 4 | gooble blarg printf("Hello, sailor!\n"); 5 | return 0; 6 | } 7 | -------------------------------------------------------------------------------- /eg_itsy/error_tests/bad2.itsy: -------------------------------------------------------------------------------- 1 | // Not actual Itsy code; for testing compiler error handling. 2 | 3 | to main(argc: int, argv: ^^char): int { 4 | printf("Hello, sailor!\n"); 5 | let a, b: int = [1, 2]; // Illegal 6 | return 0; 7 | } 8 | -------------------------------------------------------------------------------- /eg_itsy/error_tests/lvalues.itsy: -------------------------------------------------------------------------------- 1 | to f() { 2 | let a: ^int = NULL; 3 | (a + a) = 42; 4 | } 5 | -------------------------------------------------------------------------------- /eg_itsy/grammar: -------------------------------------------------------------------------------- 1 | # Something like C but re-syntaxed. 2 | 3 | top 4 | : '' declaration* :end. 5 | 6 | 7 | # Declarations 8 | 9 | declaration 10 | : function_definition 11 | | decl 12 | . 13 | 14 | function_definition: :position ( 15 | "to" id 16 | [:position '(' [param_decl**',' :chain] ')' [':' type | :position :Void] :Signature] 17 | block :To 18 | ). 19 | 20 | param_decl 21 | : id++',' :hug ':' type :spread_params 22 | . 23 | 24 | decl 25 | : type_decl 26 | | var_decl 27 | . 
28 | 29 | type_decl: :position ( 30 | "typedef" id '=' type ';' :Typedef 31 | | "enum" (id | :None) '{' 32 | [enumerator**',' :hug] ','? 33 | '}' :Enum 34 | | "struct" :'struct' id '{' [field* :chain] '}' :Record 35 | | "union" :'union' id '{' [field* :chain] '}' :Record 36 | ). 37 | 38 | field 39 | : param_decl ';' 40 | . 41 | 42 | enumerator 43 | : id ('=' elem_exp | :None) :hug 44 | . 45 | 46 | var_decl: :position ( 47 | "let" [id++',' :hug] ':' type ('=' elem_exp | :None) ';' :Let 48 | ). 49 | 50 | 51 | # Types 52 | 53 | type: :position ( 54 | '^' type :Pointer 55 | | '[' (exp | :None) ']' type :Array 56 | | '(' [[type :'' :hug]**',' :hug] ')' type :Signature 57 | | "void" :Void 58 | | id :Type_name 59 | ). 60 | 61 | 62 | # Statements 63 | 64 | block: :position ( 65 | '{' [(decl | statement)* :hug] '}' :Block 66 | ). 67 | 68 | statement 69 | : block 70 | | if_stmt 71 | | :position ( 72 | "while" exp block :While 73 | | "do" block "while" exp ';' :Do 74 | | "for" opt_exp ';' opt_exp ';' opt_exp block 75 | :For 76 | | "continue" ';' :Continue 77 | | "break" ';' :Break 78 | | "return" opt_exp ';' :Return 79 | | "match" exp '{' [case* :hug] '}' :Switch 80 | | opt_exp ';' :Exp 81 | ) 82 | . 83 | opt_exp : exp | :None. 84 | 85 | if_stmt : :position ( 86 | "if" exp block ( "else" (if_stmt | block) 87 | | :None ) :If_stmt 88 | ). 89 | 90 | case: :position ( 91 | "on" [elem_exp++',' :hug] block :Case 92 | | "else" block :Default 93 | ). 94 | 95 | 96 | # Expressions 97 | 98 | exp 99 | : assignment_exp (',' assignment_exp :Seq)* 100 | . 101 | 102 | elem_exp = assignment_exp. # "no comma" expression. 103 | 104 | assignment_exp 105 | : if_exp (assignment_operator assignment_exp :Assign)? 106 | . 107 | 108 | assignment_operator 109 | : '=' :None 110 | | '*=' :'*' 111 | | '/=' :'/' 112 | | '%=' :'%' 113 | | '+=' :'+' 114 | | '-=' :'-' 115 | | '<<=' :'<<' 116 | | '>>=' :'>>' 117 | | '&=' :'&' 118 | | '@=' :'@' 119 | | '|=' :'|' 120 | . 
121 | 122 | if_exp : logical_or_exp ("if" logical_or_exp "else" if_exp :If_exp)?. 123 | 124 | logical_or_exp 125 | : logical_and_exp ('||' logical_and_exp :Or)* 126 | . 127 | 128 | logical_and_exp 129 | : exp3 ('&&' logical_and_exp :And)* 130 | . 131 | 132 | exp3 133 | : exp4 ( '==' :'==' exp4 :Binary_exp 134 | | '!=' :'!=' exp4 :Binary_exp 135 | | '<=' :'<=' exp4 :Binary_exp 136 | | '>=' :'>=' exp4 :Binary_exp 137 | | '<' !'=' :'<' exp4 :Binary_exp 138 | | '>' !'=' :'>' exp4 :Binary_exp 139 | )* 140 | . 141 | 142 | exp4 143 | : exp5 ( '+' !/[+=]/ :'+' exp5 :Binary_exp 144 | | '-' !/[-=]/ :'-' exp5 :Binary_exp 145 | | '|' !/[|=]/ :'|' exp5 :Binary_exp 146 | | '@' !'=' :'@' exp5 :Binary_exp 147 | )* 148 | . 149 | 150 | exp5 151 | : exp6 ( '*' !'=' :'*' exp6 :Binary_exp 152 | | '/' !/[=\/]/ :'/' exp6 :Binary_exp 153 | | '%' !'=' :'%' exp6 :Binary_exp 154 | | '&' !/[&=]/ :'&' exp6 :Binary_exp 155 | | '<<' !'=' :'<<' exp6 :Binary_exp 156 | | '>>' !'=' :'>>' exp6 :Binary_exp 157 | )* 158 | . 159 | 160 | exp6 : unary_exp ( ':' type :Cast )*. 161 | 162 | unary_exp 163 | : :position ( 164 | '++' unary_exp :'++' :Pre_incr 165 | | '--' unary_exp :'--' :Pre_incr 166 | | '&' !/[&=]/ unary_exp :Address_of 167 | | unary_operator unary_exp :Unary_exp 168 | | "sizeof" ( ':' type :Sizeof_type 169 | | unary_exp :Sizeof ) 170 | ) 171 | | postfix_exp 172 | . 173 | 174 | unary_operator 175 | : '-' !'-' :'-' 176 | | '~' :'~' 177 | | '!' !'=' :'!' 178 | . 179 | 180 | postfix_exp 181 | : primary_exp 182 | ( '[' exp ']' :Index 183 | | '(' [elem_exp**',' :hug] ')' :Call 184 | | '.' id :Dot 185 | | '^' :Deref 186 | | '++' :'++' :Post_incr 187 | | '--' :'--' :Post_incr 188 | )* 189 | . 190 | 191 | primary_exp 192 | : '(' exp ')' 193 | | :position ( 194 | id :Variable 195 | | integer :'integer' :Literal 196 | | string_literal :'string' :Literal 197 | | char_literal :'char' :Literal 198 | | '[' [elem_exp**',' :hug] ','? ']' :Compound_exp 199 | ) 200 | . 
201 | 202 | string_literal ~: /("[^"]*")/ FNORD. # TODO 203 | char_literal ~: /('[^']*')/ FNORD. # TODO 204 | 205 | FNORD ~: whitespace*. 206 | whitespace ~: /\s+/ | comment. 207 | comment ~: /\/\/.*/. 208 | keyword ~: /break|continue|do|else|enum|for|if|let|match|on|return|sizeof|struct|to|typedef|union|void|while/ /\b/. 209 | 210 | id: !keyword /([A-Za-z_][A-Za-z_0-9]*)/. 211 | integer: /(0x[0-9A-Fa-f]+|\d+)/. # TODO negative too; and unsigned, etc. 212 | -------------------------------------------------------------------------------- /eg_itsy/itsy.py: -------------------------------------------------------------------------------- 1 | """ 2 | Tie the modules together into a compiler. 3 | """ 4 | 5 | import ast 6 | from parson import Grammar, Unparsable 7 | from complainer import Complainer 8 | from c_emitter import c_emit 9 | import primitives 10 | import typecheck 11 | import sys 12 | 13 | with open('grammar') as f: 14 | grammar_source = f.read() 15 | parser = Grammar(grammar_source).bind(ast) 16 | 17 | with open('c_prelude.h') as f: 18 | c_prelude = f.read() 19 | 20 | 21 | def main(argv): 22 | assert 2 <= len(argv), "usage: %s source_file.itsy [output_file.c]" % argv[0] 23 | return to_c_main(*argv[1:]) 24 | 25 | def to_c_main(filename, out_filename=None): 26 | if out_filename is None: 27 | out_filename = filename[:-5] + filename[-5:].replace('.itsy', '') + '.c' 28 | with open(filename) as f: 29 | text = f.read() 30 | opt_c = c_from_itsy(Complainer(text, filename)) 31 | if opt_c is None: 32 | return 1 33 | with open(out_filename, 'w') as f: 34 | f.write(opt_c) 35 | return 0 36 | 37 | def c_from_itsy(complainer): 38 | try: 39 | defs = parser.top(complainer.text) 40 | except Unparsable as exc: 41 | complainer.syntax_error(exc) 42 | return None 43 | typecheck.check(defs, primitives.prims, complainer) 44 | if not complainer.ok(): 45 | return None 46 | c_defs = map(c_emit, defs) 47 | return c_prelude + '\n' + '\n\n'.join(c_defs) + '\n' 48 | 49 | 50 | if __name__ == 
'__main__': 51 | sys.exit(main(sys.argv)) 52 | -------------------------------------------------------------------------------- /eg_itsy/primitives.py: -------------------------------------------------------------------------------- 1 | """ 2 | Built-in global definitions 3 | """ 4 | 5 | from ast import Int_type, Float_type 6 | 7 | prims = {} 8 | 9 | prims['int8'] = Int_type(1, 'i') 10 | prims['int16'] = Int_type(2, 'i') 11 | prims['int32'] = Int_type(4, 'i') 12 | prims['int64'] = Int_type(8, 'i') 13 | 14 | prims['uint8'] = Int_type(1, 'u') 15 | prims['uint16'] = Int_type(2, 'u') 16 | prims['uint32'] = Int_type(4, 'u') 17 | prims['uint64'] = Int_type(8, 'u') 18 | 19 | prims['float32'] = Float_type(4) 20 | prims['float64'] = Float_type(8) 21 | 22 | # XXX platform-dependent; make configurable or something 23 | # XXX check that these match my C compiler's defs 24 | prims['bool'] = prims['uint8'] 25 | prims['char'] = prims['int8'] 26 | prims['int'] = prims['int64'] 27 | prims['uint'] = prims['uint64'] 28 | prims['size_t'] = prims['uint64'] 29 | 30 | #prims['true'] = XXX 31 | 32 | # TODO: defs from c_prelude.h 33 | -------------------------------------------------------------------------------- /eg_itsy/reref.sh: -------------------------------------------------------------------------------- 1 | # Reset the reference outputs to match the current outputs. 2 | 3 | cd eg 4 | for f in *.c; do 5 | cp ${f} ${f}.ref 6 | done 7 | -------------------------------------------------------------------------------- /eg_itsy/structs.py: -------------------------------------------------------------------------------- 1 | """ 2 | Define a named-tuple-like type, but simpler. 3 | Also Visitor to dispatch on datatypes defined this way. 
4 | """ 5 | 6 | # TODO figure out how to use __slots__ 7 | 8 | def Struct(field_names, name=None, supertype=(object,)): 9 | if isinstance(field_names, (str, unicode)): 10 | field_names = tuple(field_names.split()) 11 | 12 | if name is None: 13 | name = 'Struct<%s>' % ','.join(field_names) 14 | def get_name(self): return self.__class__.__name__ 15 | else: 16 | def get_name(self): return name 17 | 18 | def __init__(self, *args): 19 | if len(field_names) != len(args): 20 | raise TypeError("%s takes %d arguments (%d given)" 21 | % (get_name(self), len(field_names), len(args))) 22 | self.__dict__.update(zip(field_names, args)) 23 | 24 | def __repr__(self): 25 | return '%s(%s)' % (get_name(self), ', '.join(repr(getattr(self, f)) 26 | for f in field_names)) 27 | 28 | # (for use with pprint) 29 | def my_as_sexpr(self): # XXX better name? 30 | return (get_name(self),) + tuple(as_sexpr(getattr(self, f)) 31 | for f in field_names) 32 | my_as_sexpr.__name__ = 'as_sexpr' 33 | 34 | return type(name, 35 | supertype, 36 | dict(__init__=__init__, 37 | __repr__=__repr__, 38 | as_sexpr=my_as_sexpr, 39 | _meta_fields=field_names)) 40 | 41 | def as_sexpr(obj): 42 | if hasattr(obj, 'as_sexpr'): 43 | return getattr(obj, 'as_sexpr')() 44 | elif isinstance(obj, list): 45 | return map(as_sexpr, obj) 46 | elif isinstance(obj, tuple): 47 | return tuple(map(as_sexpr, obj)) 48 | else: 49 | return obj 50 | 51 | 52 | # Is there a nicer way to do this? 
53 | 54 | class Visitor(object): 55 | def __call__(self, subject, *args): 56 | tag = subject.__class__.__name__ 57 | method = getattr(self, tag, None) 58 | if method is None: 59 | try: 60 | method = getattr(self, 'default') 61 | except AttributeError: 62 | raise AttributeError("%r has no method for %r argument %r" % (self, tag, subject)) 63 | return method(subject, *args) 64 | -------------------------------------------------------------------------------- /eg_itsy/testme.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Automated test. 3 | 4 | python -m coverage erase 5 | 6 | for f in error_tests/*.itsy; do 7 | echo 8 | echo "Should fail:" ${f} 9 | if python -m coverage run --source=. -a itsy.py ${f}; then 10 | echo "Didn't fail!" 11 | fi 12 | done 13 | 14 | for f in eg/*.itsy; do 15 | echo 16 | echo "To C:" ${f} 17 | if python -m coverage run --source=. -a itsy.py ${f}; then 18 | echo -n # Expected success (btw what's a no-op in bash?) 19 | else 20 | echo "Failed!" 21 | fi 22 | fc=${f%.*}.c 23 | if test -f ${fc}.ref; then 24 | diff -u ${fc}.ref ${fc} 25 | # TODO raise error at exit if there was a diff 26 | else 27 | echo ' (No ref)' 28 | fi 29 | done 30 | 31 | echo 32 | echo 'Halping halpme' 33 | python -m coverage run --source=. -a pyhalp.py ' # XXX 11 | 12 | def mk_empty(): return empty 13 | 14 | def fnordify(peg): return peg #+ '' 15 | 16 | meta_grammar = r""" rule+ :end | anon :end. 17 | 18 | anon : :None :'' [:'' :literal fnordly pe :chain] :hug ('.' rule*)?. 19 | 20 | rule : name {'~'?} ('=' pe 21 | |':'~whitespace [pe :seclude]) 22 | '.' :hug. 23 | 24 | pe : term ('|' pe :either)? 25 | | :mk_empty. 26 | term : factor (term :chain)?. 27 | factor : '!' factor :invert 28 | | primary ('**' primary :star 29 | |'++' primary :plus 30 | |'*' :star 31 | |'+' :plus 32 | |'?' :maybe)?. 
33 | primary : '(' pe ')' 34 | | '[' pe ']' :seclude 35 | | '{' pe '}' :capture 36 | | qstring :literal fnordly 37 | | dqstring :literal fnordly 38 | | regex :match fnordly 39 | | ':'~( name :meta_mk_feed 40 | | qstring :push) 41 | | name :meta_mk_rule_ref. 42 | 43 | fnordly = ('~' | :fnordify). 44 | 45 | name : /([A-Za-z_]\w*)/. 46 | 47 | FNORD ~: whitespace?. 48 | whitespace ~: /(?:\s|#.*)+/. 49 | 50 | qstring ~: /'/ quoted_char* /'/ FNORD :join. 51 | dqstring ~: '"' dquoted_char* '"' FNORD :join. 52 | regex ~: '/' regex_char* '/' FNORD :join. 53 | 54 | quoted_char ~: /\\(.)/ | /([^'])/. 55 | dquoted_char~: /\\(.)/ | /([^"])/. 56 | regex_char ~: /(\\.)/ | /([^\/])/. 57 | """ 58 | 59 | metapeg = Grammar(meta_grammar)(**globals()) 60 | ## for k, af, v in metapeg(meta_grammar): print k, af, v 61 | #. None (literal('') ((('')+ :end)|('' :end))) 62 | #. anon [(:None (push('') ([(push('') (:literal ('' ('' :chain))))] (:hug ((literal('.') ('')*))?))))] 63 | #. rule [('' (capture((literal('~'))?) (((literal('=') '')|(literal(':') ('' [('' :seclude)]))) (literal('.') :hug))))] 64 | #. pe [(('' ((literal('|') ('' :either)))?)|:mk_empty)] 65 | #. term [('' (('' :chain))?)] 66 | #. factor [((literal('!') ('' :invert))|('' (((literal('**') ('' :star))|((literal('++') ('' :plus))|((literal('*') :star)|((literal('+') :plus)|(literal('?') :maybe))))))?))] 67 | #. primary [((literal('(') ('' literal(')')))|((literal('[') ('' (literal(']') :seclude)))|((literal('{') ('' (literal('}') :capture)))|(('' (:literal ''))|(('' (:literal ''))|(('' (:match ''))|((literal(':') (('' :meta_mk_feed)|('' :push)))|('' :meta_mk_rule_ref))))))))] 68 | #. fnordly (literal('~')|:fnordify) 69 | #. name [/([A-Za-z_]\w*)/] 70 | #. FNORD ~ [('')?] 71 | #. whitespace ~ [/(?:\s|#.*)+/] 72 | #. qstring ~ [(/'/ (('')* (/'/ ('' :join))))] 73 | #. dqstring ~ [(literal('"') (('')* (literal('"') ('' :join))))] 74 | #. regex ~ [(literal('/') (('')* (literal('/') ('' :join))))] 75 | #. 
quoted_char ~ [(/\\(.)/|/([^'])/)] 76 | #. dquoted_char ~ [(/\\(.)/|/([^"])/)] 77 | #. regex_char ~ [(/(\\.)/|/([^\/])/)] 78 | -------------------------------------------------------------------------------- /eg_microses.py: -------------------------------------------------------------------------------- 1 | """ 2 | Let's try and port 3 | ~/othergit/quasiParserGenerator/test/microses/microses.es6 4 | """ 5 | 6 | from parson import Grammar 7 | 8 | g = r""" body :end. 9 | 10 | # Exclude "arguments" from IDENT in microses. 11 | # XXX defnordify 12 | RESERVED_WORD: 13 | KEYWORD | ES6_ONLY_KEYWORD | FUTURE_RESERVED_WORD 14 | | "arguments". 15 | 16 | KEYWORD: "false" | "true" 17 | | "break" | "case" | "catch" | "const" 18 | | "debugger" | "default" | "delete" 19 | | "else" | "export" | "finally" 20 | | "for" | "if" | "import" 21 | | "return" | "switch" | "throw" | "try" 22 | | "typeof" | "void" | "while". 23 | 24 | # We enumerate these anyway, in order to exclude them from the 25 | # IDENT token. 26 | ES6_ONLY_KEYWORD: 27 | "null" | "class" | "continue" | "do" | "extends" 28 | | "function" | "in" | "instanceof" | "new" | "super" 29 | | "this" | "var" | "with" | "yield". 30 | 31 | FUTURE_RESERVED_WORD: 32 | "enum" | "await" 33 | | "implements" | "interface" | "package" 34 | | "private" | "protected" | "public". 35 | 36 | 37 | primaryExpr: (NUMBER | STRING | {"true" | "false" | "null"}) :Data # XXX do it my way instead? also, "null" was missing in the original 38 | | '[' arg ** ',' ']' :hug :Array 39 | | '{' prop ** ',' '}' :hug :Object 40 | | quasiExpr 41 | | '(' expr ')' 42 | | IDENT :Variable 43 | | HOLE :ExprHole. 44 | 45 | pattern: (NUMBER | STRING | {"true" | "false" | "null"}) :MatchData # XXX ditto 46 | | '[' param ** ',' ']' :hug :MatchArray 47 | | '{' propParam ** ',' '}' :hug :MatchObject 48 | | IDENT :MatchVariable 49 | | HOLE :PatternHole. 50 | 51 | arg: '...' expr :Spread 52 | | expr. 53 | 54 | param: '...' 
pattern :Rest 55 | | IDENT '=' expr :Optional 56 | | pattern. 57 | 58 | # No method definition. XXX why not? 59 | prop: '...' expr :SpreadObj 60 | | key ':' expr :Prop 61 | | IDENT :None :Prop. 62 | 63 | propParam: '...' pattern :RestObj 64 | | key ':' pattern :MatchProp 65 | | IDENT '=' expr :OptionalProp 66 | | IDENT :None :MatchProp. # ditto 67 | 68 | key: IDENT | RESERVED_WORD | STRING | NUMBER 69 | | '[' expr ']' :Computed. 70 | 71 | # XXX look up es6 quasiliteral syntax 72 | quasiExpr ~: '`' qfill ** qhole '`' FNORD :hug :Quasi. 73 | qfill ~: {(!'`' !'${' :anyone)*}. 74 | qhole ~: '${' FNORD expr '}'. 75 | 76 | later: NO_NEWLINE '!'. 77 | 78 | # No "new", "super", or MetaProperty. Without "new" we don't need 79 | # separate MemberExpr and CallExpr productions. 80 | postExpr: primaryExpr postOp*. 81 | postOp = '.' IDENT :Get 82 | | '[' expr ']' :Index 83 | | '(' [arg**',' :hug] ')' :Call 84 | | quasiExpr :Tag 85 | | later ( IDENT :GetLater 86 | | '[' expr ']' :IndexLater 87 | | '(' [arg**',' :hug] ')' :CallLater 88 | | quasiExpr :TagLater ). 89 | 90 | preExpr: "delete" fieldExpr :Delete 91 | | preOp preExpr :UnaryOp 92 | | postExpr. 93 | 94 | # No prefix or postfix "++" or "--". 95 | preOp: { "void" | "typeof" | '+' | '-' | '!' }. # XXX strip 96 | 97 | # No bitwise operators, "instanceof", or "in". Unlike ES6, none 98 | # of the relational operators associate. To help readers, mixing 99 | # relational operators always requires explicit parens. 100 | multExpr: preExpr ({'*' | '|' | '%'} preExpr :BinaryOp)*. 101 | addExpr: multExpr ({'+' | '-'} multExpr :BinaryOp)*. 102 | relExpr: addExpr (relOp addExpr :BinaryOp)?. 103 | relOp: { '<=' | '>=' | '<' | '>' | '===' | '!==' }. 104 | andThenExpr: relExpr ('&&' relExpr :AndThen)*. 105 | orElseExpr: andThenExpr ('||' andThenExpr :OrElse)*. 106 | 107 | # No trinary ("?:") expression 108 | # No comma expression, so assignment expression is expr. 109 | expr: lValue assignOp expr :Assign 110 | | arrow 111 | | orElseExpr. 
112 | 113 | lValue: fieldExpr | IDENT. # (out of order in the original) 114 | 115 | fieldExpr: primaryExpr 116 | ( '.' IDENT :Get 117 | | '[' expr ']' :Index 118 | | later ( '.' IDENT :GetLater 119 | | '[' expr ']' :IndexLater ) ). 120 | 121 | assignOp: {'=' !'=' # (the ! is implicit in the original?) 122 | | '*=' | '/=' | '%=' | '+=' | '-='}. 123 | 124 | arrow: params :hug NO_NEWLINE '=>' ( block :Arrow 125 | | expr :Lambda ). 126 | 127 | params: IDENT 128 | | '(' param ** ',' ')'. 129 | 130 | # No "var", empty statement, "continue", "with", "do/while", 131 | # "for/in", or labelled statement. None of the insane variations 132 | # of "for". Only blocks are accepted for flow-of-control 133 | # statements. 134 | statement: block 135 | | "if" '(' expr ')' 136 | block 137 | ("else" block | :None) :If 138 | | "for" '(' declaration 139 | (expr|:None) ';' 140 | (expr|:None) ')' 141 | block :For 142 | | "for" '(' declOp binding "of" expr ')' 143 | block :ForOf 144 | | "while" '(' expr ')' block :While 145 | | "try" block catcher (finalizer|:None) :Try 146 | | "try" block :None finalizer :Try 147 | | "switch" '(' expr ')' 148 | '{' [branch* :hug] '}' :Switch 149 | | terminator 150 | | "debugger" ';' :Debugger 151 | | expr ';'. 152 | 153 | # Each case branch must end in a terminating statement. No 154 | # labelled break. 155 | terminator: "return" NO_NEWLINE expr ';' :Return 156 | | "return" :None ';' :Return 157 | | "break" ';' :Break 158 | | "throw" expr ';' :Throw. 159 | 160 | # no "function", generator, or "class" declaration. 161 | declaration: declOp [binding ** ',' :hug] ';' :Decl. 162 | declOp: {"const"|"let"}. # XXX must be stripped 163 | # Initializer is mandatory 164 | binding: pattern '=' expr :hug. 165 | 166 | catcher: "catch" '(' pattern ')' block :hug. 167 | finalizer: "finally" block. 168 | 169 | branch: caseLabel+ :hug 170 | '{' body terminator '}' :Branch. 171 | caseLabel: "case" expr ':' :Case 172 | | "default" ':' :Default. 
173 | 174 | block: '{' body '}' :Block. 175 | body: (statement | declaration)* :hug. 176 | 177 | FNORD ~: space*. 178 | space ~: /\s+|\/\/.*/ | '/*' (!'*/' :anyone)* '*/'. 179 | 180 | NO_NEWLINE ~: . # XXX 181 | HOLE ~: 'XXX I will be a hole'. 182 | 183 | NUMBER : /(\d+)/. # XXX 184 | STRING ~: /"([^"]*)"/ FNORD # XXX 185 | | /'([^']*)'/ FNORD. 186 | IDENT ~: !RESERVED_WORD {IdentifierName} FNORD. 187 | 188 | # XXX incomplete 189 | IdentifierName ~= IdentifierStart IdentifierPart*. 190 | IdentifierStart ~= UnicodeLetter 191 | | '$' 192 | | '_'. 193 | IdentifierPart ~= IdentifierStart. 194 | UnicodeLetter ~= /[A-Za-z]/. 195 | 196 | """ 197 | 198 | #import sys; sys.setrecursionlimit(5000) 199 | gr = Grammar(g) 200 | import microses 201 | grr = gr.bind(microses).expecting_one_result() 202 | 203 | def test(filename): 204 | from pprint import pprint 205 | print 'testing', filename 206 | with open(filename) as f: 207 | text = f.read() 208 | result = grr(text) 209 | for form in result: 210 | pprint(form.as_sexpr()) 211 | 212 | ## grr('a.b =c;') 213 | #. (Assign(Get(Variable('a'), 'b'), '=', Variable('c')),) 214 | 215 | ## test('es6/a.es6') 216 | #. testing es6/a.es6 217 | #. ('Decl', 218 | #. 'let ', 219 | #. ((('MatchVariable', 'a'), 220 | #. ('BinaryOp', 221 | #. ('Get', ('Variable', 'module'), 'exports'), 222 | #. '*', 223 | #. ('Data', '2'))),)) 224 | #. ('Assign', ('Get', ('Variable', 'module'), 'exports'), '= ', ('Data', '5')) 225 | -------------------------------------------------------------------------------- /eg_misc.py: -------------------------------------------------------------------------------- 1 | """ 2 | A bunch of small examples, some of them from the LPEG documentation. 3 | Crudely converted from Peglet. TODO: make them nicer. 
4 | """ 5 | 6 | from parson import Grammar, Unparsable, exceptionally 7 | 8 | parse_words = Grammar(r'/\W*(\w+)/*')() 9 | 10 | # The equivalent re.split() would return extra '' results first and last: 11 | ## parse_words('"Hi, there", he said.') 12 | #. ('Hi', 'there', 'he', 'said') 13 | 14 | class Tagger(dict): 15 | def __missing__(self, key): 16 | return lambda *parts: (key,) + parts 17 | 18 | name = Grammar(r""" 19 | name : title first middle last. 20 | title : (/(Dr|Mr|Ms|Mrs|St)[.]?/ | /(Pope(?:ss)?)/) _ :Title |. 21 | first : /([A-Za-z]+)/ _ :First. 22 | middle : (/([A-Z])[.]/ | /([A-Za-z]+)/) _ :Middle |. 23 | last : /([A-Za-z]+)/ :Last. 24 | _ : /\s+/. 25 | """).bind(Tagger()) 26 | 27 | ## name.name('Popess Darius Q. Bacon') 28 | #. (('Title', 'Popess'), ('First', 'Darius'), ('Middle', 'Q'), ('Last', 'Bacon')) 29 | 30 | ichbins = Grammar(r""" 31 | _ sexp :end. 32 | 33 | sexp : /\\(.)/ _ :lit_char 34 | | '"' qchar* '"' _ :join 35 | | symchar+ _ :join 36 | | /'/ _ sexp :quote 37 | | '(' _ sexp* ')' _ :hug. 38 | 39 | qchar : /\\(.)/ 40 | | /([^"])/. 41 | 42 | symchar : /([^\s\\"'()])/. 43 | 44 | _ : /\s*/. 45 | """)(lit_char = ord, 46 | quote = lambda x: ('quote', x)) 47 | 48 | ## ichbins.sexp('(hey)') 49 | #. (('hey',),) 50 | 51 | ## ichbins('hi') 52 | #. ('hi',) 53 | ## ichbins(r"""(hi '(john mccarthy) \c )""") 54 | #. (('hi', ('quote', ('john', 'mccarthy')), 99),) 55 | ## ichbins(r""" "" """) 56 | #. ('',) 57 | ## ichbins(r""" "hey" """) 58 | #. ('hey',) 59 | 60 | # From http://www.inf.puc-rio.br/~roberto/lpeg/ 61 | 62 | as_and_bs = Grammar(r""" 63 | S :end. 64 | 65 | S : 'a' B 66 | | 'b' A 67 | | . 68 | 69 | A : 'a' S 70 | | 'b' A A. 71 | 72 | B : 'b' S 73 | | 'a' B B. 74 | """)() 75 | 76 | ## as_and_bs("abaabbbbaa") 77 | #. () 78 | 79 | sum_nums = Grammar(r""" 80 | num ** ',' :end :hug :sum. 81 | num: /(\d+)/ :int. 82 | """)() 83 | 84 | ## sum_nums('10,30,43') 85 | #. 
(83,) 86 | 87 | one_word = Grammar(r"/\w+/ :position")() 88 | 89 | ## one_word('hello') 90 | #. (5,) 91 | ## one_word('hello there') 92 | #. (5,) 93 | ## one_word.attempt(' ') 94 | 95 | namevalues = Grammar(r""" pair* :end. 96 | pair : name '=' name /[,;]?/ :hug. 97 | name : /(\w+)/. 98 | FNORD ~: /\s*/. 99 | """)() 100 | namevalues_dict = lambda s: dict(namevalues(s)) 101 | ## namevalues_dict("a=b, c = hi; next = pi") 102 | #. {'a': 'b', 'c': 'hi', 'next': 'pi'} 103 | 104 | # Splitting a string. TODO: But with lpeg it's parametric over a pattern p. 105 | # NB this assumes p doesn't match '', and that it doesn't capture. 106 | 107 | splitting = Grammar(r""" 108 | split : (p | chunk :join) split | . # XXX why not a *? 109 | chunk : p 110 | | /(.)/ chunk. 111 | p : /\s/. 112 | """)() 113 | ## splitting.split('hello a world is nice ') 114 | #. ('hello', 'a', 'world', 'is', 'nice') 115 | ## splitting.chunk('hello a world is nice ') 116 | #. ('h', 'e', 'l', 'l', 'o') 117 | 118 | # Searching for a pattern: also parameterized by p. 119 | # (skipped) 120 | 121 | balanced_parens = Grammar(r""" 122 | bal : '(' c* ')'. 123 | c : /[^()]/ 124 | | bal. 125 | """)() 126 | 127 | ## balanced_parens.bal.attempt('()') 128 | #. () 129 | ## balanced_parens.bal.attempt('(()') 130 | 131 | # gsub: another parameterized one 132 | 133 | gsub = lambda text, pattern, replacement: ''.join(Grammar(r""" 134 | gsub: (p | /(.)/) gsub 135 | | . 136 | p: :pattern :replace. 137 | """)(pattern=pattern, replace=lambda: replacement).gsub(text)) 138 | 139 | ## gsub('hi there WHEEWHEE to you WHEEEE', 'WHEE', 'GLARG') 140 | #. 'hi there GLARGGLARG to you GLARGEE' 141 | 142 | csv = Grammar(r""" 143 | record : field ** ',' !/./. 144 | 145 | field : '"' qchar* /"\s*/ :join 146 | | /([^,"\n]*)/. 147 | 148 | qchar : /([^"])/ 149 | | '""' :'"'. 150 | """)() 151 | 152 | ## csv.record('') 153 | #. ('',) 154 | ## csv.record('""') 155 | #. ('',) 156 | ## csv.record("""hi,there,,"this,isa""test" """) 157 | #. 
('hi', 'there', '', 'this,isa"test') 158 | 159 | 160 | ## Grammar('x : .')().x('') 161 | #. () 162 | 163 | def p(grammar, rule, text): 164 | parse = getattr(Grammar(grammar)(**globals()), 165 | rule) 166 | try: 167 | return parse(text) 168 | except Unparsable, e: 169 | return e 170 | 171 | metagrammar = r""" 172 | grammar : '' rule+. 173 | rule : name '=' expr '.' :make_rule. 174 | expr : term ('|' expr :alt)?. 175 | term : factors (':' name :reduce_)?. 176 | factors : factor factors :seq 177 | | :empty. 178 | factor : /'((?:\\.|[^'])*)'/ :literal 179 | | name :rule_ref. 180 | name : /(\w+)/. 181 | FNORD ~: /\s*/. 182 | """ 183 | 184 | def make_rule(name, expr): return '%s: %s' % (name, expr) 185 | def alt(e1, e2): return '%s/%s' % (e1, e2) 186 | def reduce_(e, name): return '%s =>%s' % (e, name) 187 | def seq(e1, *e2): return '%s+%s' % ((e1,) + e2) if e2 else e1 188 | def empty(): return '<>' 189 | def literal(regex): return '/%s/' % regex 190 | def rule_ref(name): return '<%s>' % name 191 | 192 | ## p(metagrammar, 'grammar', ' hello = bargle. goodbye = hey there.aloha=.') 193 | #. ('hello: +<>', 'goodbye: ++<>', 'aloha: <>') 194 | ## p(metagrammar, 'grammar', ' hello arg = bargle.') 195 | #. Unparsable(grammar, ' hello ', 'arg = bargle.') 196 | ## p(metagrammar, 'term', "'goodbye' world") 197 | #. ('/goodbye/++<>',) 198 | 199 | bal = r""" 200 | FNORD ~: /\s*/. 201 | allbalanced : '' bal :end. 202 | bal : '(' bal ')' :hug bal 203 | | /(\w+)/ 204 | | . 205 | """ 206 | ## p(bal, 'allbalanced', '(x) y') 207 | #. (('x',), 'y') 208 | ## p(bal, 'allbalanced', 'x y') 209 | #. Unparsable(allbalanced, 'x ', 'y') 210 | 211 | curl = r""" 212 | FNORD ~: /\s*/. 213 | one_expr : '' expr :end. 214 | expr : '{' expr* '}' :hug 215 | | /([^{}\s]+)/. 216 | """ 217 | ## p(curl, 'one_expr', '') 218 | #. Unparsable(one_expr, '', '') 219 | ## p(curl, 'one_expr', '{}') 220 | #. ((),) 221 | ## p(curl, 'one_expr', 'hi') 222 | #. 
('hi',) 223 | ## p(curl, 'one_expr', '{hi {there} {{}}}') 224 | #. (('hi', ('there',), ((),)),) 225 | 226 | multiline_rules = r""" 227 | hi : /this/ /is/ 228 | /a/ /rule/ 229 | | /or/ /this/. 230 | """ 231 | 232 | ## p(multiline_rules, 'hi', "thisisarule") 233 | #. () 234 | ## p(multiline_rules, 'hi', "orthis") 235 | #. () 236 | ## p(multiline_rules, 'hi', "thisisnot") 237 | #. Unparsable(hi, 'thisis', 'not') 238 | 239 | paras = Grammar(r""" 240 | paras: para* _ :end. 241 | para: _ word+ (/\n\n/ | :end) :hug. 242 | word: /(\S+)/ _. 243 | _: (!/\n\n/ /\s/)*. 244 | """)() 245 | 246 | eg = r""" hi there hey 247 | how are you? 248 | fine. 249 | 250 | thanks. 251 | 252 | ok then.""" 253 | 254 | ## exceptionally(lambda: paras.paras(eg)) 255 | #. (('hi', 'there', 'hey', 'how', 'are', 'you?', 'fine.'), ('thanks.',), ('ok', 'then.')) 256 | -------------------------------------------------------------------------------- /eg_oberon0.py: -------------------------------------------------------------------------------- 1 | """ 2 | The Oberon-0 programming language. 3 | Wirth, _Compiler Construction_, Appendix A. 4 | """ 5 | 6 | from parson import Grammar 7 | 8 | grammar_source = r""" 9 | ident: !keyword /([A-Za-z][A-Za-z0-9]*)/. 10 | integer: digit+ FNORD :join :int. 11 | selector: ('.' ident | '[' expression ']')*. 12 | factor: ident selector 13 | | integer 14 | | '(' expression ')' 15 | | '~' factor. 16 | term: factor ++ MulOperator. 17 | MulOperator: '*' | "DIV" | "MOD" | '&'. 18 | SimpleExpression: ('+'|'-')? term ++ AddOperator. 19 | AddOperator: '+' | '-' | "OR". 20 | expression: SimpleExpression (relation SimpleExpression)?. 21 | relation: '=' | '#' | '<=' | '<' | '>=' | '>'. 22 | assignment: ident selector ':=' expression. 23 | ActualParameters: '(' expression ** ',' ')'. 24 | ProcedureCall: ident ActualParameters?. 25 | IfStatement: "IF" expression "THEN" StatementSequence 26 | ("ELSIF" expression "THEN" StatementSequence)* 27 | ("ELSE" StatementSequence)? 28 | "END". 
29 | WhileStatement: "WHILE" expression "DO" StatementSequence "END". 30 | statement: (assignment | ProcedureCall | IfStatement | WhileStatement)?. 31 | StatementSequence: statement ++ ';'. # XXX isn't it a problem that statement can be empty? 32 | IdentList: ident ++ ','. 33 | ArrayType: "ARRAY" expression "OF" type. 34 | FieldList: (IdentList ':' type)?. 35 | RecordType: "RECORD" FieldList ++ ';' "END". 36 | type: ident | ArrayType | RecordType. 37 | FPSection: ("VAR")? IdentList ':' type. 38 | FormalParameters: '(' FPSection ** ';' ')'. 39 | ProcedureHeading: "PROCEDURE" ident FormalParameters?. 40 | ProcedureBody: declarations ("BEGIN" StatementSequence)? "END". 41 | ProcedureDeclaration: ProcedureHeading ';' ProcedureBody ident. 42 | declarations: ("CONST" (ident '=' expression ';')*)? 43 | ("TYPE" (ident '=' type ';')*)? 44 | ("VAR" (IdentList ':' type ';')*)? 45 | (ProcedureDeclaration ';')*. 46 | module: "MODULE" ident ';' declarations 47 | ("BEGIN" StatementSequence)? "END" ident '.'. 48 | 49 | FNORD ~: whitespace*. 50 | whitespace ~: /\s+/ | comment. 51 | comment ~: '(*' commentchunk* '*)'. 52 | commentchunk ~: comment | !'*)' :anyone. # XXX are comments nested in Oberon-0? 53 | keyword ~: /BEGIN|END|MODULE|VAR|TYPE|CONST|PROCEDURE|RECORD|ARRAY|OF|WHILE|DO|IF|ELSIF|THEN|ELSE|OR|DIV|MOD/ /\b/. 54 | digit ~: /(\d)/. 55 | 56 | top: '' module :end. 57 | """ 58 | grammar = Grammar(grammar_source)() 59 | 60 | # TODO test for expected parse failures 61 | 62 | ## from parson import exceptionally 63 | ## import glob 64 | 65 | ## for filename in sorted(glob.glob('ob-bad/*.ob')): print exceptionally(lambda: test(filename)) 66 | #. testing ob-bad/badassign.ob 67 | #. (top, 'MODULE badassign;\n\nBEGIN\n ', '1 := 2\nEND badassign.\n') 68 | #. testing ob-bad/badcase.ob 69 | #. (top, 'MODULE badcase;\n\nVAR\n avar : INTEGER;\n bvar : BOOLEAN;\n\nBEGIN\n CASE ', 'bvar OF\n 18 : avar := 19\n END;\n\n CASE 1 < 2 OF\n avar : avar := 3\n | avar + 1 .. 
avar + 10 : avar := 5\n END;\n\n CASE avar OF\n 3 DIV 0 : avar := 1\n END\nEND badcase.\n') 70 | #. testing ob-bad/badfor.ob 71 | #. (top, 'MODULE badfor;\n\nCONST\n aconst = 10;\n \nTYPE\n atype = INTEGER; \n\nVAR\n avar : INTEGER;\n bvar : BOOLEAN;\n cvar : INTEGER;\n\nBEGIN\n FOR ', 'aconst := 1 TO 10 DO\n avar := 1\n END;\n\n FOR atype := 1 TO 10 DO\n avar := 1\n END;\n\n FOR bvar := FALSE TO TRUE DO\n avar := 42\n END;\n\n FOR avar := 1 TO 2 BY cvar * 2 DO\n avar := 42\n END;\n\n FOR dvar := 1 TO 2 DO\n dvar := 99\n END;\n\n FOR avar := 8 TO 10 BY 3 DIV 0 DO\n cvar := 100\n END\nEND badfor.\n') 72 | #. testing ob-bad/commentnoend.ob 73 | #. (top, "MODULE commentnoend;\n (* started off well,\n but didn't finish\nEND commentnoend.\n", '') 74 | #. testing ob-bad/keywordasname.ob 75 | #. (top, 'MODULE ', 'VAR;\nEND VAR.\n') 76 | #. testing ob-bad/repeatsection.ob 77 | #. (top, 'MODULE repeatsection;\n\nCONST\n aconst = 10;\n\n', 'CONST\n aconst = 20;\n\nEND repeatsection.\n') 78 | 79 | ## for filename in sorted(glob.glob('ob-ok/*.ob')): test(filename) 80 | #. testing ob-ok/arrayname.ob 81 | #. testing ob-ok/assign.ob 82 | #. testing ob-ok/badarg.ob 83 | #. testing ob-ok/badarray.ob 84 | #. testing ob-ok/badcond.ob 85 | #. testing ob-ok/badeq.ob 86 | #. testing ob-ok/badproc.ob 87 | #. testing ob-ok/badrecord.ob 88 | #. testing ob-ok/badwhile.ob 89 | #. testing ob-ok/comment.ob 90 | #. testing ob-ok/cond.ob 91 | #. testing ob-ok/condname.ob 92 | #. testing ob-ok/const.ob 93 | #. testing ob-ok/emptybody.ob 94 | #. testing ob-ok/emptydeclsections.ob 95 | #. testing ob-ok/emptymodule.ob 96 | #. testing ob-ok/factorial.ob 97 | #. testing ob-ok/gcd.ob 98 | #. testing ob-ok/intoverflow.ob 99 | #. testing ob-ok/keywordprefix.ob 100 | #. testing ob-ok/nominalarg.ob 101 | #. testing ob-ok/nonintconstant.ob 102 | #. testing ob-ok/nonlocalvar.ob 103 | #. testing ob-ok/nonmoduleasmodulename.ob 104 | #. testing ob-ok/proc.ob 105 | #. testing ob-ok/recordname.ob 106 | #. 
testing ob-ok/recurse.ob 107 | #. testing ob-ok/redefinteger.ob 108 | #. testing ob-ok/redeftrue.ob 109 | #. testing ob-ok/selfref.ob 110 | #. testing ob-ok/type.ob 111 | #. testing ob-ok/typenodecl.ob 112 | #. testing ob-ok/var.ob 113 | #. testing ob-ok/while.ob 114 | #. testing ob-ok/whilename.ob 115 | #. testing ob-ok/wrongmodulename.ob 116 | #. testing ob-ok/wrongprocedurename.ob 117 | 118 | def test(filename): 119 | print 'testing', filename 120 | with open(filename) as f: 121 | text = f.read() 122 | grammar.top(text) 123 | -------------------------------------------------------------------------------- /eg_oberon0_with_lexer.py: -------------------------------------------------------------------------------- 1 | """ 2 | Like eg_oberon0.py, but using a separate lexer. 3 | """ 4 | 5 | import re 6 | from parson import Grammar, one_that, label, alter, match 7 | 8 | class LexedGrammar(Grammar): 9 | def __init__(self, string): 10 | super(LexedGrammar, self).__init__(string) 11 | self.literals = set() 12 | self.keywords = set() 13 | def literal(self, string): 14 | self.literals.add(string) 15 | return literal_kind(string) 16 | def match(self, regex): 17 | assert False 18 | def keyword(self, string): 19 | self.keywords.add(string) 20 | return literal_kind(string) 21 | 22 | def literal_kind(string): 23 | return one_that(lambda token: token.kind == string) 24 | 25 | class Token(object): 26 | def __init__(self, kind, string): 27 | self.kind = kind 28 | self.string = string 29 | def __repr__(self): 30 | return repr(self.string) 31 | 32 | grammar_source = r""" 33 | selector: ('.' :IDENT | '[' expression ']')*. 34 | factor: :IDENT selector 35 | | :INTEGER 36 | | '(' expression ')' 37 | | '~' factor. 38 | term: factor ++ MulOperator. 39 | MulOperator: '*' | "DIV" | "MOD" | '&'. 40 | SimpleExpression: ('+'|'-')? term ++ AddOperator. 41 | AddOperator: '+' | '-' | "OR". 42 | expression: SimpleExpression (relation SimpleExpression)?. 
43 | relation: '=' | '#' | '<=' | '<' | '>=' | '>'. 44 | assignment: :IDENT selector ':=' expression. 45 | ActualParameters: '(' expression ** ',' ')'. 46 | ProcedureCall: :IDENT ActualParameters?. 47 | IfStatement: "IF" expression "THEN" StatementSequence 48 | ("ELSIF" expression "THEN" StatementSequence)* 49 | ("ELSE" StatementSequence)? 50 | "END". 51 | WhileStatement: "WHILE" expression "DO" StatementSequence "END". 52 | statement: (assignment | ProcedureCall | IfStatement | WhileStatement)?. 53 | StatementSequence: statement ++ ';'. # XXX isn't it a problem that statement can be empty? 54 | IdentList: :IDENT ++ ','. 55 | ArrayType: "ARRAY" expression "OF" type. 56 | FieldList: (IdentList ':' type)?. 57 | RecordType: "RECORD" FieldList ++ ';' "END". 58 | type: :IDENT | ArrayType | RecordType. 59 | FPSection: ("VAR")? IdentList ':' type. 60 | FormalParameters: '(' FPSection ** ';' ')'. 61 | ProcedureHeading: "PROCEDURE" :IDENT FormalParameters?. 62 | ProcedureBody: declarations ("BEGIN" StatementSequence)? "END". 63 | ProcedureDeclaration: ProcedureHeading ';' ProcedureBody :IDENT. 64 | declarations: ("CONST" (:IDENT '=' expression ';')*)? 65 | ("TYPE" (:IDENT '=' type ';')*)? 66 | ("VAR" (IdentList ':' type ';')*)? 67 | (ProcedureDeclaration ';')*. 68 | module: "MODULE" :IDENT ';' declarations 69 | ("BEGIN" StatementSequence)? "END" :IDENT '.'. 70 | 71 | top: module :end. 72 | """ 73 | builder = LexedGrammar(grammar_source) 74 | grammar = builder(IDENT = literal_kind('#IDENT'), 75 | INTEGER = literal_kind('#INTEGER')) 76 | 77 | ## builder.keywords 78 | #. set(['THEN', 'BEGIN', 'END', 'DO', 'OF', 'ARRAY', 'MODULE', 'ELSE', 'RECORD', 'WHILE', 'ELSIF', 'VAR', 'CONST', 'DIV', 'MOD', 'TYPE', 'OR', 'PROCEDURE', 'IF']) 79 | ## builder.literals 80 | #. 
set(['#', '<=', '>=', '&', ')', '(', '+', '*', '-', ',', '.', ':=', ':', '=', ';', '[', '>', ']', '<', '~']) 81 | 82 | def one_of(strings): 83 | # Sort longest first because re's '|' matches left-to-right, not greedily: 84 | alts = sorted(strings, key=len, reverse=True) 85 | return '|'.join(map(re.escape, alts)) 86 | 87 | a_literal = one_of(builder.literals) 88 | a_keyword = one_of(builder.keywords) 89 | 90 | lex_grammar_source = r""" token* :end. 91 | token : whitespace | :keyword :Token | :punct :Token | ident | integer. 92 | 93 | whitespace : /\s+/ | comment. 94 | comment : '(*' in_comment* '*)'. 95 | in_comment : comment | !'*)' :anyone. # XXX are comments nested in Oberon-0? 96 | 97 | ident : /([A-Za-z][A-Za-z0-9]*)/ :'#IDENT' :Token. 98 | integer : /(\d+)/ :'#INTEGER' :Token. 99 | """ 100 | lex_grammar = Grammar(lex_grammar_source)(Token = lambda s, kind=None: Token(kind or s, s), 101 | keyword = match(r'(%s)\b' % a_keyword), 102 | punct = match(r'(%s)' % a_literal)) 103 | 104 | ## import sys; sys.setrecursionlimit(5000) 105 | ## import glob 106 | ## from parson import exceptionally 107 | 108 | ## for filename in sorted(glob.glob('ob-bad/*.ob')): print exceptionally(lambda: test(filename)) 109 | #. testing ob-bad/badassign.ob 110 | #. (top, ('MODULE', 'badassign', ';', 'BEGIN'), ('1', ':=', '2', 'END', 'badassign', '.')) 111 | #. testing ob-bad/badcase.ob 112 | #. ((literal('') ((token)* end)), 'MODULE badcase;\n\nVAR\n avar : INTEGER;\n bvar : BOOLEAN;\n\nBEGIN\n CASE bvar OF\n 18 : avar := 19\n END;\n\n CASE 1 < 2 OF\n avar : avar := 3\n ', '| avar + 1 .. avar + 10 : avar := 5\n END;\n\n CASE avar OF\n 3 DIV 0 : avar := 1\n END\nEND badcase.\n') 113 | #. testing ob-bad/badfor.ob 114 | #. 
(top, ('MODULE', 'badfor', ';', 'CONST', 'aconst', '=', '10', ';', 'TYPE', 'atype', '=', 'INTEGER', ';', 'VAR', 'avar', ':', 'INTEGER', ';', 'bvar', ':', 'BOOLEAN', ';', 'cvar', ':', 'INTEGER', ';', 'BEGIN', 'FOR'), ('aconst', ':=', '1', 'TO', '10', 'DO', 'avar', ':=', '1', 'END', ';', 'FOR', 'atype', ':=', '1', 'TO', '10', 'DO', 'avar', ':=', '1', 'END', ';', 'FOR', 'bvar', ':=', 'FALSE', 'TO', 'TRUE', 'DO', 'avar', ':=', '42', 'END', ';', 'FOR', 'avar', ':=', '1', 'TO', '2', 'BY', 'cvar', '*', '2', 'DO', 'avar', ':=', '42', 'END', ';', 'FOR', 'dvar', ':=', '1', 'TO', '2', 'DO', 'dvar', ':=', '99', 'END', ';', 'FOR', 'avar', ':=', '8', 'TO', '10', 'BY', '3', 'DIV', '0', 'DO', 'cvar', ':=', '100', 'END', 'END', 'badfor', '.')) 115 | #. testing ob-bad/commentnoend.ob 116 | #. ((literal('') ((token)* end)), "MODULE commentnoend;\n (* started off well,\n but didn't finish\nEND commentnoend.\n", '') 117 | #. testing ob-bad/keywordasname.ob 118 | #. (top, ('MODULE',), ('VAR', ';', 'END', 'VAR', '.')) 119 | #. testing ob-bad/repeatsection.ob 120 | #. (top, ('MODULE', 'repeatsection', ';', 'CONST', 'aconst', '=', '10', ';'), ('CONST', 'aconst', '=', '20', ';', 'END', 'repeatsection', '.')) 121 | 122 | ## for filename in sorted(glob.glob('ob-ok/*.ob')): test(filename) 123 | #. testing ob-ok/arrayname.ob 124 | #. testing ob-ok/assign.ob 125 | #. testing ob-ok/badarg.ob 126 | #. testing ob-ok/badarray.ob 127 | #. testing ob-ok/badcond.ob 128 | #. testing ob-ok/badeq.ob 129 | #. testing ob-ok/badproc.ob 130 | #. testing ob-ok/badrecord.ob 131 | #. testing ob-ok/badwhile.ob 132 | #. testing ob-ok/comment.ob 133 | #. testing ob-ok/cond.ob 134 | #. testing ob-ok/condname.ob 135 | #. testing ob-ok/const.ob 136 | #. testing ob-ok/emptybody.ob 137 | #. testing ob-ok/emptydeclsections.ob 138 | #. testing ob-ok/emptymodule.ob 139 | #. testing ob-ok/factorial.ob 140 | #. testing ob-ok/gcd.ob 141 | #. testing ob-ok/intoverflow.ob 142 | #. testing ob-ok/keywordprefix.ob 143 | #. 
testing ob-ok/nominalarg.ob 144 | #. testing ob-ok/nonintconstant.ob 145 | #. testing ob-ok/nonlocalvar.ob 146 | #. testing ob-ok/nonmoduleasmodulename.ob 147 | #. testing ob-ok/proc.ob 148 | #. testing ob-ok/recordname.ob 149 | #. testing ob-ok/recurse.ob 150 | #. testing ob-ok/redefinteger.ob 151 | #. testing ob-ok/redeftrue.ob 152 | #. testing ob-ok/selfref.ob 153 | #. testing ob-ok/type.ob 154 | #. testing ob-ok/typenodecl.ob 155 | #. testing ob-ok/var.ob 156 | #. testing ob-ok/while.ob 157 | #. testing ob-ok/whilename.ob 158 | #. testing ob-ok/wrongmodulename.ob 159 | #. testing ob-ok/wrongprocedurename.ob 160 | 161 | def test(filename): 162 | print 'testing', filename 163 | with open(filename) as f: 164 | text = f.read() 165 | tokens = lex_grammar(text) 166 | if 0: 167 | for token in tokens: 168 | print token.kind, 169 | print 170 | if tokens is not None: 171 | grammar.top(tokens) 172 | -------------------------------------------------------------------------------- /eg_outline.py: -------------------------------------------------------------------------------- 1 | """ 2 | Parse an outline using indentation. 3 | After Higher Order Perl, section 8.6. 4 | """ 5 | 6 | import parson as P 7 | 8 | def Node(margin): 9 | return P.seclude(P.match(r'( {%d,})' % margin) 10 | + P.dynamic(lambda indent: 11 | (line + Node(len(indent)+1).star()) >> P.hug)) 12 | 13 | line = '* ' + P.match(r'(.*)\n?') 14 | 15 | outline = Node(0).star() + ~P.anyone 16 | 17 | 18 | eg = """\ 19 | * Hello 20 | * Aloha 21 | * Bonjour 22 | * Adieu 23 | * also 24 | * Whatever 25 | * yay?""" 26 | 27 | ## from pprint import pprint; pprint(outline(eg)) 28 | #. (('Hello', ('Aloha', ('Bonjour',), ('Adieu',)), ('also',)), 29 | #. 
('Whatever', ('yay?',))) 30 | -------------------------------------------------------------------------------- /eg_phone_num.py: -------------------------------------------------------------------------------- 1 | """ 2 | The phone-number example at the top of 3 | https://github.com/modernserf/little-language-lab 4 | """ 5 | 6 | from parson import Grammar 7 | 8 | # const join = (...values) => values.join("") 9 | # const phone = lang` 10 | # Root = ~("+"? "1" _)? AreaCode ~(_ "-"? _) Exchange ~(_ "-"? _) Line 11 | # ${(areaCode, exchange, line) => ({ areaCode, exchange, line })} 12 | # AreaCode = "(" _ (D D D ${join}) _ ")" ${(_, __, digits) => digits} 13 | # | D D D ${join} 14 | # Exchange = D D D ${join} 15 | # Line = D D D D ${join} 16 | # D = %digit 17 | # _ = %whitespace* 18 | # ` 19 | # phone.match("+1 (800) 555-1234") 20 | # // { ok: true, value: { areaCode: "800", exchange: "555", line: "1234" } } 21 | 22 | 23 | # Version 1: just gimme the data. 24 | 25 | grammar1 = r""" 26 | Root : ('+'? '1' _)? AreaCode _ '-'? _ Exchange _ '-'? _ Line. 27 | AreaCode : '(' _ {D D D} _ ')' 28 | | {D D D}. 29 | Exchange : {D D D}. 30 | Line : {D D D D}. 31 | D = /\d/. 32 | _ = /\s*/. 33 | """ 34 | g1 = Grammar(grammar1)() 35 | ## g1.Root("+1 (800) 555-1234") 36 | #. ('800', '555', '1234') 37 | 38 | 39 | # Version 2, returning a dict. 40 | # We have to pass the dict constructor in as a semantic parameter, since 41 | # Python lacks template strings. 42 | 43 | grammar2 = r""" 44 | Root : ('+'? '1' _)? AreaCode _ '-'? _ Exchange _ '-'? _ Line :hug :dict. 45 | AreaCode : :'areaCode' ('(' _ {D D D} _ ')' 46 | | {D D D}) :hug. 47 | Exchange : :'exchange' {D D D} :hug. 48 | Line : :'line' {D D D D} :hug. 49 | D = /\d/. 50 | _ = /\s*/. 51 | """ 52 | parse_phone_number2 = Grammar(grammar2)(dict=dict).Root.expecting_one_result() 53 | ## parse_phone_number2("+1 (800) 555-1234") 54 | #. {'areaCode': '800', 'line': '1234', 'exchange': '555'} 55 | 56 | 57 | # Version 3, more my usual style. 
58 | # (Pass in semantic actions for the main productions 59 | # and avoid the _ noise using a FNORD production.) 60 | 61 | from structs import Struct 62 | 63 | class PhoneNumber(Struct('area_code exchange line')): 64 | pass 65 | 66 | grammar3 = r""" 67 | Root : /[+]?1/? AreaCode '-'? Exchange '-'? Line :PhoneNumber. 68 | AreaCode : '(' AreaDigits ')' | AreaDigits. 69 | 70 | AreaDigits ~: /(\d\d\d)/. 71 | Exchange ~: /(\d\d\d)/. 72 | Line ~: /(\d\d\d\d)/. 73 | FNORD ~: /\s*/. 74 | """ 75 | g3 = Grammar(grammar3)(PhoneNumber=PhoneNumber) 76 | parse_phone_number3 = g3.Root.expecting_one_result() 77 | ## parse_phone_number3("+1 (800) 555-1234") 78 | #. PhoneNumber('800', '555', '1234') 79 | -------------------------------------------------------------------------------- /eg_pother.py: -------------------------------------------------------------------------------- 1 | """ 2 | Parser adapted from github.com/darius/pother as an example and for testing. 3 | """ 4 | 5 | from parson import Grammar 6 | 7 | def make_var(v): return v 8 | def make_const(c): return c 9 | def make_lam(v, e): return '(lambda (%s) %s)' % (v, e) 10 | def make_app(e1, e2): return '(%s %s)' % (e1, e2) 11 | def make_send(e1, e2): return '(%s <- %s)' % (e1, e2) 12 | def make_lit_sym(v): return '(quote %s)' % v 13 | 14 | def make_let(decls, e): 15 | return '(let %s %s)' % (' '.join(decls), e) 16 | 17 | def make_defer(v): return '(defer %s)' % v 18 | def make_bind(v, e): return '(bind %s %s)' % (v, e) 19 | def make_eqn(vs, e): 20 | assert isinstance(vs, tuple) 21 | assert isinstance(e, str), "hey %r" % (e,) 22 | return '((%s) %s)' % (' '.join(vs), e) 23 | 24 | def make_list_pattern(*params): 25 | return '(list %s)' % ' '.join(map(str, params)) 26 | 27 | def make_list_expr(es): 28 | return '(list %s)' % ' '.join(map(str, es)) 29 | 30 | def make_case(e, cases): return ('(case %s %s)' 31 | % (e, ' '.join('(%s %s)' % pair 32 | for pair in cases))) 33 | 34 | def foldr(f, z, xs): 35 | for x in reversed(xs): 36 
| z = f(x, z) 37 | return z 38 | 39 | def fold_app(f, fs): return reduce(make_app, fs, f) 40 | def fold_apps(fs): return reduce(make_app, fs) 41 | def fold_send(f, fs): return reduce(make_send, fs, f) 42 | def fold_lam(vp, e): return foldr(make_lam, e, vp) 43 | 44 | # XXX not sure about paramlist here: 45 | fold_infix_app = lambda _left, _op, _right: \ 46 | fold_app(_op, [fold_apps(_left), _right]) 47 | 48 | # XXX & was \ for lambda 49 | 50 | # XXX 51 | # [Param,operator,_,Param, 52 | # lambda _left,_op,_right: [_op, _left, _right]] 53 | 54 | toy_grammar = Grammar(r""" E :end. 55 | 56 | E : Fp '`' V '`' E :fold_infix_app 57 | | Fp :fold_apps 58 | | '&' Vp '=>' E :fold_lam 59 | | "let" Decls E :make_let 60 | | "case" E Cases :make_case. 61 | 62 | Cases : Case+ :hug. 63 | Case : '|' Param '=>' E :hug. 64 | 65 | Param : Const 66 | | V 67 | | '(' Param ')' 68 | | '[' ParamList ']'. 69 | 70 | ParamList : Param ',' Param :make_list_pattern. 71 | 72 | Decls : Decl+ :hug. 73 | Decl : "defer" V ';' :make_defer 74 | | "bind" V '=' E ';' :make_bind 75 | | Vp '=' E ';' :make_eqn. 76 | 77 | Fp : F+ :hug. 78 | F : Const :make_const 79 | | V :make_var 80 | | '(' E ')' 81 | | '{' F Fp '}' :fold_send 82 | | '[' E ** ',' ']' :hug :make_list_expr. 83 | 84 | Vp : V+ :hug. 85 | V : Identifier 86 | | Operator. 87 | 88 | Identifier : /(?!let\b|case\b|defer\b|bind\b)([A-Za-z_]\w*)/. 89 | Operator : /(<=|:=|[!+-.])/. 90 | 91 | Const : '.' V :make_lit_sym 92 | | /"([^"]*)"/ :repr 93 | | /(-?\d+)/ 94 | | '(' ')' :'()' 95 | | '[' ']' :'[]'. 96 | 97 | FNORD ~: /\s*/. 98 | """)(**globals()).expecting_one_result() 99 | 100 | ## toy_grammar('.+') 101 | #. '(quote +)' 102 | ## toy_grammar('0 .+') 103 | #. '(0 (quote +))' 104 | 105 | ## print toy_grammar('0') 106 | #. 0 107 | ## print toy_grammar('x') 108 | #. x 109 | ## print toy_grammar('let x=y; x') 110 | #. (let ((x) y) x) 111 | ## print toy_grammar.attempt('') 112 | #. None 113 | ## print toy_grammar('x x . y') 114 | #. 
((x x) (quote y)) 115 | ## print toy_grammar.attempt('(when (in the)') 116 | #. None 117 | ## print toy_grammar('&M => (&f => M (f f)) (&f => M (f f))') 118 | #. (lambda (M) ((lambda (f) (M (f f))) (lambda (f) (M (f f))))) 119 | ## print toy_grammar('&a b c => a b') 120 | #. (lambda (a) (lambda (b) (lambda (c) (a b)))) 121 | 122 | ## toy_grammar('x') 123 | #. 'x' 124 | ## toy_grammar('let x=y; x') 125 | #. '(let ((x) y) x)' 126 | ## toy_grammar.attempt('') 127 | ## toy_grammar('x x . y') 128 | #. '((x x) (quote y))' 129 | ## toy_grammar.attempt('(when (in the)') 130 | ## toy_grammar('&M => (&f => M (f f)) (&f => M (f f))') 131 | #. '(lambda (M) ((lambda (f) (M (f f))) (lambda (f) (M (f f)))))' 132 | ## toy_grammar('&a b c => a b') 133 | #. '(lambda (a) (lambda (b) (lambda (c) (a b))))' 134 | 135 | mint = r""" 136 | let make_mint name = 137 | case make_brand name 138 | | [sealer, unsealer] => 139 | 140 | let defer mint; 141 | real_mint name msg = case msg 142 | 143 | | .__print_on => &out => out .print (name .. "'s mint") 144 | 145 | | .make_purse => &initial_balance => 146 | (let _ = assert (is_int initial_balance); 147 | _ = assert (0 .<= initial_balance); 148 | balance = make_box initial_balance; 149 | decr amount = (let _ = assert (is_int amount); 150 | _ = assert ((0 .<= amount) 151 | .and (amount .<= balance)); 152 | balance .:= (balance .! .- amount)); 153 | purse msg = case msg 154 | | .__print_on => &out => 155 | out .print ("has " .. (to_str balance) 156 | .. name .. " bucks") 157 | | .balance => balance .! 158 | | .sprout => mint .make_purse 0 159 | | .get_decr => sealer .seal decr 160 | | .deposit => &amount source => 161 | (let _ = unsealer .unseal (source .get_decr) amount; 162 | balance .:= (balance .! 
.+ amount)); 163 | purse); 164 | 165 | bind mint = real_mint; 166 | mint; 167 | 168 | make_mint 169 | """ 170 | #try: print toy_grammar(mint) 171 | #except Unparsable, e: 172 | # print e.args[1][0] 173 | # print 'XXX' 174 | # print e.args[1][1] 175 | #print toy_grammar('let defer mint; mint') 176 | ## print toy_grammar(mint) 177 | #. (let ((make_mint name) (case (make_brand name) ((list sealer unsealer) (let (defer mint) ((real_mint name msg) (case msg ((quote __print_on) (lambda (out) ((out (quote print)) ((name (quote .)) "'s mint")))) ((quote make_purse) (lambda (initial_balance) (let ((_) (assert (is_int initial_balance))) ((_) (assert ((0 (quote <=)) initial_balance))) ((balance) (make_box initial_balance)) ((decr amount) (let ((_) (assert (is_int amount))) ((_) (assert ((((0 (quote <=)) amount) (quote and)) ((amount (quote <=)) balance)))) ((balance (quote :=)) (((balance (quote !)) (quote -)) amount)))) ((purse msg) (case msg ((quote __print_on) (lambda (out) ((out (quote print)) (((((('has ' (quote .)) (to_str balance)) (quote .)) name) (quote .)) ' bucks')))) ((quote balance) (balance (quote !))) ((quote sprout) ((mint (quote make_purse)) 0)) ((quote get_decr) ((sealer (quote seal)) decr)) ((quote deposit) (lambda (amount) (lambda (source) (let ((_) (((unsealer (quote unseal)) (source (quote get_decr))) amount)) ((balance (quote :=)) (((balance (quote !)) (quote +)) amount)))))))) purse))))) (bind mint real_mint) mint)))) make_mint) 178 | 179 | mintskel = r""" 180 | let make_mint name = 181 | case make_brand name 182 | | [sealer, unsealer] => 183 | 184 | let defer mint; 185 | mint; 186 | 187 | make_mint 188 | """ 189 | ## print toy_grammar(mintskel) 190 | #. (let ((make_mint name) (case (make_brand name) ((list sealer unsealer) (let (defer mint) mint)))) make_mint) 191 | 192 | voting = r""" 193 | let make_one_shot f = 194 | let armed = make_box True; 195 | &x => let _ = assert (armed .! 
.not); 196 | _ = armed .:= False; 197 | f x; 198 | 199 | start_voting voters choices timer = 200 | let ballot_box = map (&_ => make_box 0) choices; 201 | poll voter = 202 | let make_checkbox pair = 203 | case pair 204 | | [choice, tally] => 205 | [choice, make_one_shot (&_ => 206 | tally .:= (tally .! .+ 1))]; 207 | ballot = map make_checkbox (zip choices ballot_box); 208 | {voter ballot}; 209 | _ = for_each poll voters; 210 | [close_polls, totals]; 211 | 212 | start_voting 213 | """ 214 | ## print toy_grammar(voting) 215 | #. (let ((make_one_shot f) (let ((armed) (make_box True)) (lambda (x) (let ((_) (assert ((armed (quote !)) (quote not)))) ((_) ((armed (quote :=)) False)) (f x))))) ((start_voting voters choices timer) (let ((ballot_box) ((map (lambda (_) (make_box 0))) choices)) ((poll voter) (let ((make_checkbox pair) (case pair ((list choice tally) (list (((choice ,) make_one_shot) (lambda (_) ((tally (quote :=)) (((tally (quote !)) (quote +)) 1)))))))) ((ballot) ((map make_checkbox) ((zip choices) ballot_box))) (voter <- ballot))) ((_) ((for_each poll) voters)) (list ((close_polls ,) totals)))) start_voting) 216 | -------------------------------------------------------------------------------- /eg_precedence.py: -------------------------------------------------------------------------------- 1 | """ 2 | Infix parsing with operator precedence (inefficient implementation). 
3 | """ 4 | 5 | from parson import Grammar, recur, seclude, either, fail 6 | 7 | def PrecedenceParser(primary_expr, table): 8 | return foldr(lambda make_expr, subexpr: make_expr(subexpr), 9 | primary_expr, 10 | table) 11 | 12 | def LeftAssoc(*pairs): 13 | return lambda subexpr: \ 14 | seclude(subexpr + alt([peg + subexpr + oper 15 | for peg, oper in pairs]).star()) 16 | 17 | def RightAssoc(*pairs): 18 | return lambda subexpr: \ 19 | recur(lambda expr: 20 | seclude(subexpr + alt([peg + expr + oper 21 | for peg, oper in pairs]).maybe())) 22 | 23 | def alt(pegs): 24 | return foldr(either, fail, pegs) 25 | 26 | def foldr(f, z, xs): 27 | for x in reversed(xs): 28 | z = f(x, z) 29 | return z 30 | 31 | 32 | # eg_calc.py example 33 | 34 | from operator import * 35 | from parson import delay 36 | 37 | _ = delay(lambda: g.FNORD) 38 | exp3 = delay(lambda: g.exp3) 39 | 40 | exp1 = PrecedenceParser(exp3, [ 41 | LeftAssoc(('*'+_, mul), ('//'+_, div), ('/'+_, truediv), ('%'+_, mod)), 42 | RightAssoc(('^'+_, pow)), 43 | ]) 44 | 45 | exps = PrecedenceParser(exp1, [ 46 | LeftAssoc(('+'+_, add), ('-'+_, sub)), 47 | ]) 48 | 49 | g = Grammar(r""" 50 | :exps :end. 51 | 52 | exp3 : '(' :exps ')' 53 | | '-' :exp1 :neg 54 | | /(\d+)/ :int. 55 | 56 | FNORD ~= /\s*/. 57 | """)(**globals()) 58 | 59 | ## g('42 *(5-3) + -2^2') 60 | #. (80,) 61 | ## g('2^3^2') 62 | #. (512,) 63 | ## g('5-3-1') 64 | #. (1,) 65 | ## g('3//2') 66 | #. (1,) 67 | ## g('3/2') 68 | #. (1.5,) 69 | -------------------------------------------------------------------------------- /eg_puzzler.py: -------------------------------------------------------------------------------- 1 | """ 2 | Port of ~/git/mccarthy-to-bryant/puzzler.py 3 | Uses modules from that repo (not included in this one).
4 | """ 5 | 6 | import operator 7 | from parson import Grammar, Unparsable 8 | import dd 9 | 10 | def mk_var(name): 11 | return dd.Variable(enter(name)) 12 | 13 | var_names = [] 14 | def enter(name): 15 | try: 16 | return var_names.index(name) 17 | except ValueError: 18 | var_names.append(name) 19 | return len(var_names) - 1 20 | 21 | # This grammar is complicated by requiring that whitespace mean AND 22 | # *only* within a line, not spanning lines -- an implicit AND spanning 23 | # lines would be too error prone. 24 | g = r""" expr :end. 25 | 26 | expr: sentence (',' expr :And)?. 27 | 28 | sentence: sum ('=' sum :Equiv)?. 29 | 30 | sum: term ( '|' sum :Or 31 | | '=>' term :Implies )?. 32 | 33 | term: factor (factor :And)* FNORD. 34 | 35 | factor ~: '~'_ primary :Not 36 | | primary. 37 | 38 | primary ~: '(' FNORD expr ')'_ 39 | | id :Var. 40 | 41 | id ~: /([A-Za-z_]\w*)/_. 42 | 43 | _ ~: /[ \t]*/. # Spaces within a line. 44 | FNORD ~: /\s+|#.*/*. # Spaces/comments that may span lines. 45 | """ 46 | 47 | parse = Grammar(g)( 48 | Equiv = dd.Equiv, 49 | Implies = dd.Implies, 50 | And = operator.and_, 51 | Or = operator.or_, 52 | Not = operator.inv, 53 | Var = mk_var 54 | ).expecting_one_result() 55 | 56 | def solve(condition): 57 | if dd.is_valid(condition): 58 | print("Valid.") 59 | else: 60 | show(dd.satisfy(condition, 1)) 61 | 62 | def show(opt_env): 63 | if opt_env is None: 64 | print("Unsatisfiable.") 65 | else: 66 | for k, v in sorted(opt_env.items()): 67 | if k is not None: 68 | print("%s%s" % ("" if v else "~", var_names[k])) 69 | 70 | ## solve(parse(' hey (there | ~there), ~hey | ~there')) 71 | #. hey 72 | #. ~there 73 | ## solve(parse(' hey (there, ~there)')) 74 | #. Unsatisfiable. 75 | ## solve(parse('a=>b = ~b=>~a')) 76 | #. Valid. 
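The `solve`/`show` flow above leans on the external `dd` (BDD) module, which this repo doesn't include. As a rough, hypothetical stand-in — names like `satisfy_brute` are mine, not from the repo — the same satisfy-then-print idea can be brute-forced over truth tables for small variable counts:

```python
from itertools import product

def satisfy_brute(vars_, condition):
    """Return the first assignment (a dict) making condition(env) true, or None.
    condition is a plain Python predicate over the environment dict."""
    for values in product([False, True], repeat=len(vars_)):
        env = dict(zip(vars_, values))
        if condition(env):
            return env
    return None

# hey (there | ~there), ~hey | ~there  -- the first doctest above,
# written out as a Python predicate instead of a parsed BDD expression.
cond = lambda e: (e['hey'] and (e['there'] or not e['there'])
                  and (not e['hey'] or not e['there']))
env = satisfy_brute(['hey', 'there'], cond)
# Only hey=True, there=False satisfies it, matching "hey / ~there" above.
```

Unlike `dd`'s BDDs, this is exponential in the number of variables, so it only illustrates the interface, not a practical solver.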
77 | 78 | 79 | def main(filename, text): 80 | try: 81 | problem = parse(text) 82 | except Unparsable as e: 83 | syntax_error(e, filename) 84 | sys.exit(1) 85 | else: 86 | solve(problem) 87 | 88 | # TODO: extract something I can stick in the library 89 | # let's try writing something similar for oberon0-with-lexer, to triangulate 90 | 91 | def syntax_error(e, filename): 92 | line_no, prefix, suffix = where(e) 93 | prefix, suffix = sanitize(prefix), sanitize(suffix) 94 | sys.stderr.write("%s:%d:%d: Syntax error\n" % (filename, line_no, len(prefix))) 95 | sys.stderr.write(' ' + prefix + suffix + '\n') 96 | sys.stderr.write(' ' + ' '*len(prefix) + '^\n') 97 | 98 | def where(e): 99 | before, after = e.failure 100 | line_no = before.count('\n') 101 | prefix = (before+'\n').splitlines()[line_no] 102 | suffix = (after+'\n').splitlines()[0] # XXX what if right on newline? 103 | return line_no+1, prefix, suffix 104 | 105 | def sanitize(s): 106 | "Make s predictably printable, sans control characters like tab." 107 | return ''.join(c if ' ' <= c < chr(127) else ' ' # XXX crude 108 | for c in s) 109 | 110 | 111 | if __name__ == '__main__': 112 | import sys 113 | main('stdin', sys.stdin.read()) # (try it on carroll or wise-pigs) 114 | -------------------------------------------------------------------------------- /eg_regex.py: -------------------------------------------------------------------------------- 1 | """ 2 | Parse a regular expression and generate the strings it matches. 3 | Generator from 4 | http://www.udacity.com/wiki/CS212%20Unit%203%20Code?course=cs212#regex_generatorpy 5 | http://forums.udacity.com/questions/5008809/unit3-18-startx-paramater-on-genseq-is-a-hack#cs212 6 | and in embryo 7 | https://github.com/darius/halp/blob/master/examples/learn-the-hell-out-of-regular-expressions/whats_a_regex_soln.py 8 | """ 9 | 10 | from parson import Grammar, join 11 | 12 | def generate(regex, Ns): 13 | "Return the strings matching regex whose length is in Ns." 
14 | return sorted(parser(regex)(Ns), 15 | key=lambda s: (len(s), s)) 16 | 17 | def literal(s): return lambda Ns: set([s]) if len(s) in Ns else null 18 | def either(x, y): return lambda Ns: x(Ns) | y(Ns) 19 | def plus(x): return chain(x, star(x)) 20 | def star(x): return lambda Ns: optional(chain(nonempty(x), star(x)))(Ns) 21 | def nonempty(x): return lambda Ns: x(Ns - set([0])) 22 | def oneof(chars): return lambda Ns: set(chars) if 1 in Ns else null 23 | def chain(x, y): return lambda Ns: genseq(x, y, Ns) 24 | def optional(x): return either(empty(), x) 25 | def dot(): return oneof('?') # (Could be more, for lots more output.) 26 | def empty(): return literal('') 27 | 28 | null = frozenset([]) 29 | 30 | def genseq(x, y, Ns): 31 | """Return the set of matches to xy whose total length is in Ns. We 32 | ask y only for lengths that are remainders after an x-match in 33 | 0..max(Ns). (And we call neither x nor y if there are no Ns.)""" 34 | if not Ns: 35 | return null 36 | xmatches = x(set(range(max(Ns)+1))) 37 | Ns_x = set(len(m) for m in xmatches) 38 | Ns_y = set(n-m for n in Ns for m in Ns_x if n-m >= 0) 39 | ymatches = y(Ns_y) 40 | return set(m1+m2 for m1 in xmatches for m2 in ymatches if len(m1+m2) in Ns) 41 | 42 | grammar = Grammar(r""" exp :end. 43 | exp : term ('|' exp :either)* 44 | | :empty. 45 | term : factor (term :chain)*. 46 | factor : primary ( '*' :star 47 | | '+' :plus 48 | | '?' :optional 49 | )?. 50 | primary : '(' exp ')' 51 | | '[' char* ']' :join :oneof 52 | | '.' :dot 53 | | /\\(.)/ :literal 54 | | /([^.()*+?|[\]])/ :literal. 55 | char : /\\(.)/ 56 | | /([^\]])/. 57 | """) 58 | parser = grammar(**globals()).expecting_one_result() 59 | 60 | ## generate('.+', range(5)) 61 | #. ['?', '??', '???', '????'] 62 | ## generate('a[xy]+z()*|c.hi', range(5)) 63 | #. ['axz', 'ayz', 'axxz', 'axyz', 'ayxz', 'ayyz', 'c?hi'] 64 | ## generate('(Chloe|Yvette), a( precocious)? (toddler|writer)', range(28)) 65 | #. 
['Chloe, a writer', 'Chloe, a toddler', 'Yvette, a writer', 'Yvette, a toddler', 'Chloe, a precocious writer', 'Chloe, a precocious toddler', 'Yvette, a precocious writer'] 66 | 67 | ## parser.attempt('{"hi"](') 68 | -------------------------------------------------------------------------------- /eg_roman.py: -------------------------------------------------------------------------------- 1 | """ 2 | Convert from roman numeral to int. 3 | """ 4 | 5 | from parson import Grammar 6 | 7 | g = Grammar(r""" 8 | numeral = digit+ :end. 9 | digit = 'CM' :'900' | 'CD' :'400' | 'XC' :'90' | 'XL' :'40' | 'IX' :'9' | 'IV' :'4' 10 | | 'M' :'1000' | 'D' :'500' | 'C' :'100' | 'L' :'50' | 'X' :'10' | 'V' :'5' | 'I' :'1'. 11 | """)() 12 | 13 | def int_from_roman(string): 14 | return sum(map(int, g.numeral(string.strip()))) 15 | 16 | ## int_from_roman('MCMLXXIX') 17 | #. 1979 18 | -------------------------------------------------------------------------------- /eg_templite.py: -------------------------------------------------------------------------------- 1 | """ 2 | A template language, similar to templite by Ned Batchelder. 3 | https://github.com/aosabook/500lines/tree/master/template-engine 4 | (Still missing a few features.) 5 | """ 6 | 7 | from parson import Grammar 8 | from structs import Struct, Visitor 9 | 10 | grammar = r""" block :end. 11 | 12 | block: chunk* :hug :Block. 13 | 14 | chunk: '{#' (!'#}' /./)* '#}' 15 | | '{{'_ expr '}}' :Expr 16 | | '{%'_ 'if'_ expr '%}' block '{%'_ 'endif'_ '%}' :If 17 | | '{%'_ 'for'_ ident _ 'in'_ expr '%}' block '{%'_ 'endfor'_ '%}' :For 18 | | (!/{[#{%]/ /(.)/)+ :join :Literal. 19 | 20 | expr: access ('|' function :Call)* _ . 21 | access: ident :VarRef ('.' ident :Access)*. 22 | function: ident. 23 | 24 | ident: /([A-Za-z_][A-Za-z_0-9]*)/. 25 | 26 | _: /\s*/. 
27 | """ 28 | 29 | class Block(Struct('chunks')): pass 30 | class Literal(Struct('string')): pass 31 | class If(Struct('expr block')): pass 32 | class For(Struct('variable expr block')): pass 33 | class Expr(Struct('expr')): pass 34 | class VarRef(Struct('variable')): pass 35 | class Access(Struct('base attribute')): pass 36 | class Call(Struct('operand function')): pass 37 | 38 | parse = Grammar(grammar)(**globals()).expecting_one_result() 39 | 40 | def compile_template(text): 41 | code = gen(parse(text)) 42 | env = {} 43 | exec(code, env) 44 | return env['_expand'] 45 | 46 | def gen(template): 47 | py = """\ 48 | def _expand(_context): 49 | _acc = [] 50 | _append = _acc.append 51 | %s 52 | %s 53 | return ''.join(_acc)""" 54 | decls = '\n'.join('v_%s = _context[%r]' % (name, name) 55 | for name in free_vars(template)) 56 | return py % (indent(decls), indent(gen_visitor(template))) 57 | 58 | class Gen(Visitor): 59 | def Block(self, t): return '\n'.join(map(self, t.chunks)) 60 | def Literal(self, t): return '_append(%r)' % t.string 61 | def If(self, t): return ('if %s:\n %s' 62 | % (self(t.expr), 63 | indent(self(t.block)))) 64 | def For(self, t): return ('for v_%s in %s:\n %s' 65 | % (t.variable, self(t.expr), 66 | indent(self(t.block)))) 67 | def Expr(self, t): return '_append(str(%s))' % self(t.expr) 68 | def VarRef(self, t): return 'v_%s' % t.variable 69 | def Access(self, t): return '%s.%s' % (self(t.base), t.attribute) 70 | def Call(self, t): return '%s(%s)' % (t.function, self(t.operand)) 71 | 72 | gen_visitor = Gen() 73 | 74 | class FreeVars(Visitor): 75 | def Block(self, t): return set().union(*map(self, t.chunks)) 76 | def Literal(self, t): return set() 77 | def If(self, t): return self(t.expr) | self(t.block) 78 | def For(self, t): return ((self(t.expr) | self(t.block)) 79 | - set([t.variable])) 80 | def Expr(self, t): return self(t.expr) 81 | def VarRef(self, t): return set([t.variable]) 82 | def Access(self, t): return self(t.base) 83 | def Call(self, 
t): return self(t.operand) 84 | 85 | free_vars = FreeVars() 86 | 87 | def indent(s): 88 | return s.replace('\n', '\n ') 89 | 90 | ## parse('hello {{world}} yay') 91 | #. Block((Literal('hello '), Expr(VarRef('world')), Literal(' yay'))) 92 | 93 | ## print gen(parse('hello {{world}} yay')) 94 | #. def _expand(_context): 95 | #. _acc = [] 96 | #. _append = _acc.append 97 | #. v_world = _context['world'] 98 | #. _append('hello ') 99 | #. _append(str(v_world)) 100 | #. _append(' yay') 101 | #. return ''.join(_acc) 102 | 103 | ## f = compile_template('hello {{world}} yay'); print f(dict(world="globe")) 104 | #. hello globe yay 105 | 106 | ## print gen(parse('{% if foo.bar %} {% for x in xs|ok %} {{x}} {% endfor %} yay {% endif %}')) 107 | #. def _expand(_context): 108 | #. _acc = [] 109 | #. _append = _acc.append 110 | #. v_xs = _context['xs'] 111 | #. v_foo = _context['foo'] 112 | #. if v_foo.bar: 113 | #. _append(' ') 114 | #. for v_x in ok(v_xs): 115 | #. _append(' ') 116 | #. _append(str(v_x)) 117 | #. _append(' ') 118 | #. _append(' yay ') 119 | #. return ''.join(_acc) 120 | 121 | ## f = compile_template('hello {%for x in xs%} whee{{x}} {% endfor %} yay'); print f(dict(xs='abc')) 122 | #. hello wheea wheeb wheec yay 123 | 124 | ## f = compile_template(' {%if x%} whee{{x}} {% endif %} yay {%if y%} ok{{y}} {% endif %}'); print f(dict(x='', y='42')) 125 | #. yay ok42 126 | -------------------------------------------------------------------------------- /eg_trees.py: -------------------------------------------------------------------------------- 1 | """ 2 | Testing out tree parsing. 
3 | """ 4 | 5 | from operator import add, sub 6 | from parson import anyone, capture, delay, nest, one_of, one_that 7 | 8 | end = ~anyone 9 | 10 | def match(p, x): return (p + end)([x]) 11 | 12 | def an_instance(type_): return one_that(lambda x: isinstance(x, type_)) 13 | 14 | def capture1(p): return capture(p) >> (lambda x: x[0]) # Ouch 15 | var = capture1(anyone) 16 | ## (var + var)(eg) 17 | #. ('+', 2) 18 | 19 | calc = delay(lambda: 20 | nest(one_of('+') + calc + calc + end) >> add 21 | | nest(one_of('-') + calc + calc + end) >> sub 22 | | capture1(an_instance(int))) 23 | 24 | eg = ['+', 2, 3] 25 | ## match(calc, eg) 26 | #. (5,) 27 | 28 | eg2 = ['+', ['-', 2, 4], 3] 29 | ## match(calc, eg2) 30 | #. (1,) 31 | 32 | 33 | # Exercise: transforming trees with generic walks 34 | 35 | flatten1 = delay(lambda: 36 | nest(one_of('+') + flatten1.star() + end) 37 | | capture1(an_instance(int))) 38 | 39 | ## match(flatten1, ['+', ['+', ['+', 1, ['+', 2]]]]) 40 | #. (1, 2) 41 | 42 | # Figure 2.7 in the OMeta thesis, more or less: 43 | 44 | def walk(p, q=capture1(an_instance(int))): 45 | return ( nest(one_of('+') + p.star() + end) >> tag('+') 46 | | nest(one_of('-') + p.star() + end) >> tag('-') 47 | | q) 48 | 49 | def tag(constant): 50 | return lambda *args: (constant,) + args 51 | 52 | flatten2 = delay(lambda: 53 | nest(one_of('+') + flatten2 + end) 54 | | nest(one_of('+') + inside.star() + end) >> tag('+') 55 | | walk(flatten2)) 56 | inside = delay(lambda: 57 | nest(one_of('+') + inside.star() + end) 58 | | flatten2) 59 | 60 | ## match(flatten2, ['+', ['+', ['+', 1, ['+', 2], ['+', 3, 4]]]]) 61 | #. (('+', 1, 2, 3, 4),) 62 | -------------------------------------------------------------------------------- /eg_url.py: -------------------------------------------------------------------------------- 1 | """ 2 | Based on https://www.w3.org/Addressing/URL/5_BNF.html 3 | because I'm a lazy bastard; it's clearly not up to date 4 | (as shown by 'right=wrong' below). 
5 | """ 6 | 7 | from parson import Grammar 8 | 9 | grammar = r""" url :end. 10 | 11 | url : httpaddress | mailtoaddress. 12 | 13 | mailtoaddress : {'mailto'} ':' :'protocol' 14 | {(!'@' xalpha)+} :'user' 15 | '@' {hostname} :'host'. 16 | 17 | httpaddress : {'http'} '://' :'protocol' hostport ('/' path)? ('?' search)? ('#' fragment)?. 18 | 19 | hostport : host (':' port)?. 20 | 21 | host : {hostname | hostnumber} :'host'. 22 | hostname : ialpha ++ '.'. 23 | hostnumber : digits '.' digits '.' digits '.' digits. 24 | 25 | port : {digits} :'port'. 26 | 27 | path : {(segment '/')* segment?} :'path'. 28 | segment : xpalpha+. 29 | 30 | search : {(xalpha+) ++ '+'} :'search'. 31 | fragment : {xalpha+} :'fragment'. 32 | 33 | xalpha : alpha | digit | safe | extra | escape. 34 | xpalpha : xalpha | '+'. 35 | 36 | ialpha : alpha xalpha*. 37 | 38 | alpha : /[a-zA-Z]/. 39 | digit : /\d/. 40 | digits : /\d+/. 41 | safe : /[$_@.&+-]/. 42 | extra : /[!*"'(),]/. 43 | escape : '%' hex hex. 44 | hex : /[\dA-Fa-f]/. 45 | """ 46 | g = Grammar(grammar)() 47 | 48 | ## g.attempt('true') 49 | ## g('mailto:coyote@acme.com') 50 | #. ('mailto', 'protocol', 'coyote', 'user', 'acme.com', 'host') 51 | ## g('http://google.com') 52 | #. ('http', 'protocol', 'google.com', 'host') 53 | ## g.attempt('http://google.com//') 54 | ## g('http://en.wikipedia.org/wiki/Uniform_resource_locator') 55 | #. ('http', 'protocol', 'en.wikipedia.org', 'host', 'wiki/Uniform_resource_locator', 'path') 56 | ## g.attempt('http://wry.me/fun/toys/yes.html?right=wrong#fraggle') 57 | ## g( 'http://wry.me/fun/toys/yes.html?rightwrong#fraggle') 58 | #. ('http', 'protocol', 'wry.me', 'host', 'fun/toys/yes.html', 'path', 'rightwrong', 'search', 'fraggle', 'fragment') 59 | -------------------------------------------------------------------------------- /eg_wc.py: -------------------------------------------------------------------------------- 1 | """ 2 | Word-count function. 
3 | """ 4 | 5 | from parson import match, feed 6 | 7 | blanks = match(r'\s*') 8 | marks = match(r'\S+') 9 | zero = feed(lambda: 0) 10 | add1 = feed(lambda n: n+1) 11 | 12 | wc = zero + blanks + (add1 + marks + blanks).star() 13 | 14 | ## wc(' ') 15 | #. (0,) 16 | ## wc('a b c ') 17 | #. (3,) 18 | ## wc(example_input) 19 | #. (10,) 20 | 21 | example_input = r""" hi there hey 22 | how are you? 23 | fine. 24 | 25 | thanks. 26 | 27 | ok then.""" 28 | -------------------------------------------------------------------------------- /microses.py: -------------------------------------------------------------------------------- 1 | """ 2 | Abstract syntax for MicroSES. 3 | """ 4 | 5 | from structs import Struct as S 6 | 7 | class Data(S('string')): 8 | pass 9 | 10 | class Array(S('args')): 11 | pass 12 | 13 | class Object(S('props')): 14 | pass 15 | 16 | class Variable(S('name')): 17 | pass 18 | 19 | class ExprHole(S('')): 20 | pass 21 | 22 | class MatchData(S('string')): 23 | pass 24 | 25 | class MatchArray(S('params')): 26 | pass 27 | 28 | class MatchObject(S('prop_params')): 29 | pass 30 | 31 | class MatchVariable(S('name')): 32 | pass 33 | 34 | class PatternHole(S('')): 35 | pass 36 | 37 | class Spread(S('expr')): 38 | pass 39 | 40 | class Rest(S('pattern')): 41 | pass 42 | 43 | class Optional(S('name expr')): 44 | pass 45 | 46 | class SpreadObj(S('expr')): 47 | pass 48 | 49 | class Prop(S('key expr_opt')): 50 | pass 51 | 52 | class RestObj(S('pattern')): 53 | pass 54 | 55 | class MatchProp(S('key pattern')): 56 | pass 57 | 58 | class OptionalProp(S('name expr')): 59 | pass 60 | 61 | class Computed(S('expr')): 62 | pass 63 | 64 | class Quasi(S('string')): 65 | pass 66 | 67 | def QUnpack(): 68 | pass 69 | 70 | class Get(S('primary name')): 71 | pass 72 | 73 | class Index(S('primary expr')): 74 | pass 75 | 76 | class Call(S('primary args')): 77 | pass 78 | 79 | class Tag(S('primary quasi')): 80 | pass 81 | 82 | class GetLater(S('primary name')): 83 | pass 84 | 85 | class 
IndexLater(S('primary expr')): 86 | pass 87 | 88 | class CallLater(S('primary args')): 89 | pass 90 | 91 | class TagLater(S('primary quasi')): 92 | pass 93 | 94 | class Delete(S('field_expr')): 95 | pass 96 | 97 | class UnaryOp(S('op expr')): 98 | pass 99 | 100 | class BinaryOp(S('expr1 op expr2')): 101 | pass 102 | 103 | class AndThen(S('expr1 expr2')): 104 | pass 105 | 106 | class OrElse(S('expr1 expr2')): 107 | pass 108 | 109 | class Assign(S('lvalue op expr')): 110 | pass 111 | 112 | class Arrow(S('params block')): 113 | pass 114 | 115 | class Lambda(S('params expr')): 116 | pass 117 | 118 | class If(S('test then else_opt')): 119 | pass 120 | 121 | class For(S('decls test_opt update_opt block')): 122 | pass 123 | 124 | class ForOf(S('decl_op binding expr block')): 125 | pass 126 | 127 | class Decl(S('decl_op bindings')): 128 | pass 129 | 130 | class While(S('expr block')): 131 | pass 132 | 133 | class Try(S('block catcher_opt finalizer_opt')): 134 | pass 135 | 136 | class Switch(S('expr branches')): 137 | pass 138 | 139 | class Debugger(S('')): 140 | pass 141 | 142 | class Return(S('expr_opt')): 143 | pass 144 | 145 | class Break(S('')): 146 | pass 147 | 148 | class Throw(S('expr')): 149 | pass 150 | 151 | class Branch(S('labels body terminator')): 152 | pass 153 | 154 | class Case(S('expr')): 155 | pass 156 | 157 | class Default(S('')): 158 | pass 159 | 160 | class Block(S('body')): 161 | pass 162 | -------------------------------------------------------------------------------- /peg.py: -------------------------------------------------------------------------------- 1 | """ 2 | A PEG parser using explicit control instead of recursion. 3 | Avoids Python stack overflow. 4 | Also a step towards compiling instead of interpreting. 
5 | 6 | to do: test this: optimize q==y with Nip 7 | to do: produce code instead of closures 8 | to do: bounce less often 9 | """ 10 | 11 | # peg constructors 12 | 13 | Fail = 'fail', None 14 | def Alter(fn): return 'alter', fn 15 | def Item(ok): return 'item', ok 16 | def Ref(name): return 'ref', name 17 | def Chain(q, r): return 'chain', (q, r) 18 | def Cond(q, n, y): return 'cond', (q, n, y) 19 | 20 | 21 | # (program, peg, fail_cont, success_cont) -> cont 22 | # where program: dict(string -> cont) 23 | 24 | def translate(pr, peg, f, s): 25 | tag, arg = peg 26 | if tag == 'fail': return KDrop(f) 27 | elif tag == 'alter': return KAlter(arg, s) 28 | elif tag == 'item': return KItem(arg, f, s) 29 | elif tag == 'ref': return KCall(pr, arg, f, s) 30 | elif tag == 'chain': return translate(pr, arg[0], f, 31 | translate(pr, arg[1], f, s)) 32 | elif tag == 'cond': 33 | q, n, y = arg 34 | if y == q: 35 | yy = KNip(s) 36 | elif y[0] == 'chain' and y[1][0] == q: 37 | yy = KNip(translate(pr, y[1][1], f, s)) 38 | else: 39 | yy = KDrop(translate(pr, y, f, s)) 40 | return KDup(translate(pr, q, translate(pr, n, f, s), yy)) 41 | else: 42 | assert False 43 | 44 | def run(q, defns, vs, cs): 45 | pr = {} 46 | for name, defn in defns.items(): 47 | pr[name] = translate(pr, defn, KFail, KSucceed) 48 | return trampoline(translate(pr, q, KFinalFail, KFinalSucceed), 49 | ((vs,cs,()), ())) 50 | 51 | 52 | # The parsing machine. 
53 | # continuation: trail -> result 54 | # trail: () | ((vs,cs,ks), trail) 55 | # vs: tuple of values from semantic actions 56 | # cs: input character sequence (the current tail thereof) 57 | # ks: () | ((fail_cont,success_cont), ks) 58 | 59 | def trampoline(cont, trail): 60 | while cont is not None: 61 | cont, trail = cont(trail) 62 | return trail 63 | 64 | def KFinalFail(trail): 65 | assert trail is () 66 | return None, 'fail' 67 | def KFinalSucceed(((vs,cs,ks), trail)): 68 | assert trail is () 69 | assert ks is () 70 | return None, (vs, cs) 71 | 72 | def KDrop(cont): return lambda (_, trail): (cont, trail) 73 | def KNip(cont): return lambda (entry, (_, trail)): (cont, (entry, trail)) 74 | def KDup(cont): return lambda (entry, trail): (cont, (entry, (entry, trail))) 75 | 76 | def KAlter(fn, s): 77 | return lambda ((vs,cs,ks), trail): (s, ((fn(*vs),cs,ks), trail)) 78 | 79 | def KItem(ok, f, s): 80 | return lambda ((vs,cs,ks), trail): ( 81 | (s, ((vs+(cs[0],), cs[1:], ks), trail)) if cs and ok(cs[0]) 82 | else (f, trail)) 83 | 84 | def KCall(pr, name, f, s): 85 | return lambda ((vs,cs,ks), trail): ( 86 | pr[name], ((vs,cs,((f,s),ks)), trail)) 87 | def KFail(((vs,cs,((fk,_),ks)), trail)): 88 | return fk, trail 89 | def KSucceed(((vs,cs,((_,sk),ks)), trail)): 90 | return sk, ((vs,cs,ks), trail) 91 | 92 | 93 | # Smoke test 94 | 95 | def Lit(c): return Item(lambda c1: c == c1) 96 | def Or(q, r): return Cond(q, r, q) 97 | # XXX is this really equivalent to a 'native' Or. As a tail call, too. 98 | Succeed = Alter(lambda *vals: vals) 99 | 100 | bit = Or(Lit('0'), Lit('1')) 101 | twobits = Chain(bit, bit) 102 | 103 | nbits_defs = {'nbits': Cond(bit, Succeed, Chain(bit, Ref('nbits')))} 104 | 105 | def test(string): 106 | # return run(Lit('0'), (), string) 107 | # return run(bit, (), string) 108 | # return run(twobits, {}, (), string) 109 | return run(Ref('nbits'), nbits_defs, (), string) 110 | 111 | ## test('xy') 112 | #. ((), 'xy') 113 | ## test('01101a') 114 | #. 
(('0', '1', '1', '0', '1'), 'a') 115 | -------------------------------------------------------------------------------- /peglet_to_parson.py: -------------------------------------------------------------------------------- 1 | """ 2 | Convert a Peglet grammar to a Parson one. 3 | """ 4 | 5 | import re 6 | from parson import Grammar, alter 7 | 8 | name = r'[A-Za-z_]\w*' 9 | 10 | grammar = Grammar(r""" 11 | grammar : _? rule* :end. 12 | rule : name _ '= ' :equ token* :'.' _?. 13 | token : '|' :'|' 14 | | /(\/\w*\/\s)/ 15 | | name !(_ '= ') 16 | | '!' :'!' 17 | | _ !(name _ '= ' | :end) 18 | | !('= '|name) /(\S+)/ :mk_regex. 19 | name : /("""+name+""")/ !!(/\s/ | :end). 20 | _ : /(\s+)/. 21 | """) 22 | def mk_regex(s): return '/' + s.replace('/', '\\/') + '/' 23 | 24 | def peglet_to_parson(text): 25 | nonterminals = set() 26 | def equ(name, space): 27 | nonterminals.add(name) 28 | return name, space, ': ' 29 | g = grammar(equ=alter(equ), mk_regex=mk_regex) 30 | tokens = g.grammar(text) 31 | return ''.join(':'+token if re.match(name+'$', token) and token not in nonterminals 32 | else token 33 | for token in tokens) 34 | 35 | if __name__ == '__main__': 36 | import sys 37 | print peglet_to_parson(sys.stdin.read()) 38 | -------------------------------------------------------------------------------- /pegvm.py: -------------------------------------------------------------------------------- 1 | """ 2 | A PEG parser using explicit control instead of recursion. 3 | Avoids Python stack overflow. 4 | Also a step towards compiling instead of interpreting. 5 | 6 | to do: test this: optimize q==y with Nip 7 | to do: produce code instead of closures 8 | to do: bounce less often 9 | 10 | to do: compare: 11 | (Knuth 1971), Donald Knuth describes an abstract parsing machine. This 12 | machine runs programs in which the instructions either recognize and 13 | consume a token from the input, or call a subroutine to recognize an 14 | instance of a non-terminal. 
Each instruction has two continuations for 15 | success and failure, and part of the subroutine mechanism is that if a 16 | subroutine returns with failure, then the input pointer is reset to 17 | where it was when the subroutine was called. A subroutine that 18 | returns successfully, however, deletes the record of the old position 19 | of the input pointer. There is a natural translation of context-free 20 | grammars into programs for this machine, which behave exactly like 21 | combinator parsers based on Maybe. 22 | """ 23 | 24 | dbg = 1 25 | 26 | # peg constructors 27 | 28 | Fail = 'fail', () 29 | def Alter(fn): return 'alter', fn 30 | def Item(ok): return 'item', ok 31 | def Ref(name): return 'ref', name 32 | def Chain(q, r): return 'chain', (q, r) 33 | def Cond(q, n, y): return 'cond', (q, n, y) 34 | 35 | 36 | # (install, program, peg, fail_cont, success_cont) -> cont 37 | # where program: dict(string -> cont) 38 | # code: [('operator', (operands,))] 39 | # cont: int -- index into code 40 | def translate(install, pr, peg, f, s): 41 | tag, arg = peg 42 | if tag == 'fail': return install(KDrop, f) 43 | elif tag == 'alter': return install(KAlter, arg, s) 44 | elif tag == 'item': return install(KItem, arg, f, s) 45 | elif tag == 'ref': return install(KCall, pr, arg, f, s) # TODO: how about a jump op for when f is KFail, s is KSucceed? 
46 | elif tag == 'chain': return translate(install, pr, arg[0], f, 47 | translate(install, pr, arg[1], f, s)) 48 | elif tag == 'cond': 49 | q, n, y = arg 50 | if y == q: 51 | yy = install(KNip, s) 52 | elif y[0] == 'chain' and y[1][0] == q: 53 | yy = install(KNip, translate(install, pr, y[1][1], f, s)) 54 | else: 55 | yy = install(KDrop, translate(install, pr, y, f, s)) 56 | return install(KDup, translate(install, pr, q, translate(install, pr, n, f, s), yy)) 57 | else: 58 | assert False 59 | 60 | def run(q, defns, vs, cs): 61 | code = [] 62 | def install(*insn): 63 | try: return code.index(insn) 64 | except ValueError: 65 | try: return len(code) 66 | finally: code.append(insn) 67 | pr = {} 68 | entry = translate(install, pr, q, install(KFinalFail), install(KFinalSucceed)) 69 | for name, defn in defns.items(): 70 | pr[name] = translate(install, pr, defn, install(KFail), install(KSucceed)) 71 | if dbg: 72 | for name in sorted(pr.keys()): 73 | print name, '=>', pr[name] 74 | for i, insn in enumerate(code): 75 | print i, show(insn) 76 | return trampoline(entry, code, 77 | ((vs,cs,()), ())) 78 | 79 | def show(insn): 80 | return '%s(%s)' % (insn[0].__name__, 81 | ', '.join(x.__name__ if callable(x) else repr(x) 82 | for x in insn[1:])) 83 | 84 | # The parsing machine. 
85 | # continuation: trail -> result 86 | # trail: () | ((vs,cs,ks), trail) 87 | # vs: tuple of values from semantic actions 88 | # cs: input character sequence (the current tail thereof) 89 | # ks: () | ((fail_cont,success_cont), ks) 90 | 91 | def trampoline(pc, code, trail): 92 | while pc is not None: 93 | insn = code[pc] 94 | if dbg: print 'pc', pc, 'insn', show(insn) 95 | pc, trail = insn[0](*insn[1:])(trail) 96 | assert pc is None or isinstance(pc, int), pc 97 | return trail 98 | 99 | def KFinalFail(): 100 | def k(trail): 101 | assert trail is () 102 | return None, 'fail' 103 | return k 104 | def KFinalSucceed(): 105 | def k(((vs,cs,ks), trail)): 106 | assert trail is () 107 | assert ks is () 108 | return None, (vs, cs) 109 | return k 110 | 111 | def KDrop(cont): return lambda (_, trail): (cont, trail) 112 | def KNip(cont): return lambda (entry, (_, trail)): (cont, (entry, trail)) 113 | def KDup(cont): return lambda (entry, trail): (cont, (entry, (entry, trail))) 114 | 115 | def KAlter(fn, s): 116 | return lambda ((vs,cs,ks), trail): (s, ((fn(*vs),cs,ks), trail)) 117 | 118 | def KItem(ok, f, s): 119 | return lambda ((vs,cs,ks), trail): ( 120 | (s, ((vs+(cs[0],), cs[1:], ks), trail)) if cs and ok(cs[0]) 121 | else (f, trail)) 122 | 123 | def KCall(pr, name, f, s): 124 | return lambda ((vs,cs,ks), trail): ( 125 | pr[name], ((vs,cs,((f,s),ks)), trail)) 126 | def KFail(): 127 | return lambda ((vs,cs,((fk,_),ks)), trail): (fk, trail) 128 | def KSucceed(): 129 | return lambda ((vs,cs,((_,sk),ks)), trail): (sk, ((vs,cs,ks), trail)) 130 | 131 | 132 | # Smoke test 133 | 134 | def Lit(c): 135 | def t(c1): return c == c1 136 | t.__name__ = 'eq %r' % c 137 | return Item(t) 138 | def Or(q, r): return Cond(q, r, q) 139 | def identity(*vals): return vals 140 | # XXX is this really equivalent to a 'native' Or. As a tail call, too. 
141 | Succeed = Alter(identity) 142 | 143 | bit = Or(Lit('0'), Lit('1')) 144 | twobits = Chain(bit, bit) 145 | 146 | nbits_defs = {'nbits': Cond(bit, Succeed, Chain(bit, Ref('nbits')))} 147 | 148 | def test(string): 149 | # return run(Lit('0'), (), string) 150 | # return run(bit, (), string) 151 | # return run(twobits, {}, (), string) 152 | return run(Ref('nbits'), nbits_defs, (), string) 153 | 154 | ## test('xy') 155 | #. nbits => 12 156 | #. 0 KFinalFail() 157 | #. 1 KFinalSucceed() 158 | #. 2 KCall({'nbits': 12}, 'nbits', 0, 1) 159 | #. 3 KFail() 160 | #. 4 KSucceed() 161 | #. 5 KCall({'nbits': 12}, 'nbits', 3, 4) 162 | #. 6 KNip(5) 163 | #. 7 KAlter(identity, 4) 164 | #. 8 KNip(6) 165 | #. 9 KItem(eq '1', 7, 6) 166 | #. 10 KItem(eq '0', 9, 8) 167 | #. 11 KDup(10) 168 | #. 12 KDup(11) 169 | #. pc 2 insn KCall({'nbits': 12}, 'nbits', 0, 1) 170 | #. pc 12 insn KDup(11) 171 | #. pc 11 insn KDup(10) 172 | #. pc 10 insn KItem(eq '0', 9, 8) 173 | #. pc 9 insn KItem(eq '1', 7, 6) 174 | #. pc 7 insn KAlter(identity, 4) 175 | #. pc 4 insn KSucceed() 176 | #. pc 1 insn KFinalSucceed() 177 | #. ((), 'xy') 178 | ## test('01101a') 179 | #. nbits => 12 180 | #. 0 KFinalFail() 181 | #. 1 KFinalSucceed() 182 | #. 2 KCall({'nbits': 12}, 'nbits', 0, 1) 183 | #. 3 KFail() 184 | #. 4 KSucceed() 185 | #. 5 KCall({'nbits': 12}, 'nbits', 3, 4) 186 | #. 6 KNip(5) 187 | #. 7 KAlter(identity, 4) 188 | #. 8 KNip(6) 189 | #. 9 KItem(eq '1', 7, 6) 190 | #. 10 KItem(eq '0', 9, 8) 191 | #. 11 KDup(10) 192 | #. 12 KDup(11) 193 | #. pc 2 insn KCall({'nbits': 12}, 'nbits', 0, 1) 194 | #. pc 12 insn KDup(11) 195 | #. pc 11 insn KDup(10) 196 | #. pc 10 insn KItem(eq '0', 9, 8) 197 | #. pc 8 insn KNip(6) 198 | #. pc 6 insn KNip(5) 199 | #. pc 5 insn KCall({'nbits': 12}, 'nbits', 3, 4) 200 | #. pc 12 insn KDup(11) 201 | #. pc 11 insn KDup(10) 202 | #. pc 10 insn KItem(eq '0', 9, 8) 203 | #. pc 9 insn KItem(eq '1', 7, 6) 204 | #. pc 6 insn KNip(5) 205 | #. 
pc 5 insn KCall({'nbits': 12}, 'nbits', 3, 4) 206 | #. pc 12 insn KDup(11) 207 | #. pc 11 insn KDup(10) 208 | #. pc 10 insn KItem(eq '0', 9, 8) 209 | #. pc 9 insn KItem(eq '1', 7, 6) 210 | #. pc 6 insn KNip(5) 211 | #. pc 5 insn KCall({'nbits': 12}, 'nbits', 3, 4) 212 | #. pc 12 insn KDup(11) 213 | #. pc 11 insn KDup(10) 214 | #. pc 10 insn KItem(eq '0', 9, 8) 215 | #. pc 8 insn KNip(6) 216 | #. pc 6 insn KNip(5) 217 | #. pc 5 insn KCall({'nbits': 12}, 'nbits', 3, 4) 218 | #. pc 12 insn KDup(11) 219 | #. pc 11 insn KDup(10) 220 | #. pc 10 insn KItem(eq '0', 9, 8) 221 | #. pc 9 insn KItem(eq '1', 7, 6) 222 | #. pc 6 insn KNip(5) 223 | #. pc 5 insn KCall({'nbits': 12}, 'nbits', 3, 4) 224 | #. pc 12 insn KDup(11) 225 | #. pc 11 insn KDup(10) 226 | #. pc 10 insn KItem(eq '0', 9, 8) 227 | #. pc 9 insn KItem(eq '1', 7, 6) 228 | #. pc 7 insn KAlter(identity, 4) 229 | #. pc 4 insn KSucceed() 230 | #. pc 4 insn KSucceed() 231 | #. pc 4 insn KSucceed() 232 | #. pc 4 insn KSucceed() 233 | #. pc 4 insn KSucceed() 234 | #. pc 4 insn KSucceed() 235 | #. pc 1 insn KFinalSucceed() 236 | #. 
(('0', '1', '1', '0', '1'), 'a') 237 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from distutils.core import setup 2 | 3 | version = '0.1.0dev' 4 | 5 | setup(name = 'Parson', 6 | version = version, 7 | author = 'Darius Bacon', 8 | author_email = 'darius@wry.me', 9 | py_modules = ['parson'], 10 | url = 'https://github.com/darius/parson', 11 | description = "A fancier parsing package.", # XXX 12 | long_description = open('README.md').read(), 13 | license = 'GNU General Public License (GPL)', 14 | classifiers = [ 15 | 'Development Status :: 4 - Beta', 16 | 'Intended Audience :: Developers', 17 | 'Intended Audience :: Education', 18 | 'License :: OSI Approved :: GNU General Public License (GPL)', 19 | 'Natural Language :: English', 20 | 'Operating System :: OS Independent', 21 | 'Programming Language :: Python :: 2.6', 22 | 'Topic :: Software Development :: Interpreters', 23 | 'Topic :: Software Development :: Libraries :: Python Modules', 24 | 'Topic :: Text Processing', 25 | ], 26 | keywords = 'parse,parser,parsing,peg,packrat,regex,grammar', 27 | ) 28 | -------------------------------------------------------------------------------- /structs.py: -------------------------------------------------------------------------------- 1 | """ 2 | Define a named-tuple-like type, but simpler. 3 | Also Visitor to dispatch on datatypes defined this way. 4 | This module is only used for examples, at least currently. 
5 | """ 6 | 7 | # TODO figure out how to use __slots__ 8 | 9 | def Struct(field_names, name=None, supertype=(object,)): 10 | if isinstance(field_names, (str, unicode)): 11 | field_names = tuple(field_names.split()) 12 | 13 | if name is None: 14 | name = 'Struct<%s>' % ','.join(field_names) 15 | def get_name(self): return self.__class__.__name__ 16 | else: 17 | def get_name(self): return name 18 | 19 | def __init__(self, *args): 20 | if len(field_names) != len(args): 21 | raise TypeError("%s takes %d arguments (%d given)" 22 | % (get_name(self), len(field_names), len(args))) 23 | self.__dict__.update(zip(field_names, args)) 24 | 25 | def __repr__(self): 26 | return '%s(%s)' % (get_name(self), ', '.join(repr(getattr(self, f)) 27 | for f in field_names)) 28 | 29 | # (for use with pprint) 30 | def my_as_sexpr(self): # XXX better name? 31 | return (get_name(self),) + tuple(as_sexpr(getattr(self, f)) 32 | for f in field_names) 33 | my_as_sexpr.__name__ = 'as_sexpr' 34 | 35 | return type(name, 36 | supertype, 37 | dict(__init__=__init__, 38 | __repr__=__repr__, 39 | as_sexpr=my_as_sexpr)) 40 | 41 | def as_sexpr(obj): 42 | if hasattr(obj, 'as_sexpr'): 43 | return getattr(obj, 'as_sexpr')() 44 | elif isinstance(obj, list): 45 | return map(as_sexpr, obj) 46 | elif isinstance(obj, tuple): 47 | return tuple(map(as_sexpr, obj)) 48 | else: 49 | return obj 50 | 51 | 52 | # Is there a nicer way to do this? 53 | 54 | class Visitor(object): 55 | def __call__(self, subject, *args): 56 | tag = subject.__class__.__name__ 57 | method = getattr(self, tag, None) 58 | if method is None: 59 | method = getattr(self, 'default') 60 | return method(subject, *args) 61 | -------------------------------------------------------------------------------- /testsmoke.py: -------------------------------------------------------------------------------- 1 | """ 2 | Smoke test for parson 3 | """ 4 | 5 | from parson import * 6 | 7 | # Smoke test: combinators 8 | 9 | ## empty 10 | #. 
empty 11 | ## fail.attempt('hello') 12 | ## empty('hello') 13 | #. () 14 | ## match(r'(x)').attempt('hello') 15 | ## match(r'(h)')('hello') 16 | #. ('h',) 17 | 18 | ## (match(r'(H)') | match('(.)'))('hello') 19 | #. ('h',) 20 | ## (match(r'(h)') + match('(.)'))('hello') 21 | #. ('h', 'e') 22 | 23 | ## (match(r'h(e)') + match(r'(.)'))('hello') 24 | #. ('e', 'l') 25 | ## (~match(r'h(e)') + match(r'(.)'))('xhello') 26 | #. ('x',) 27 | 28 | ## empty.run('', [0], (0, ())) 29 | #. [(0, ())] 30 | ## chain(empty, empty)('') 31 | #. () 32 | 33 | ## (match(r'(.)') >> hug)('hello') 34 | #. (('h',),) 35 | 36 | ## match(r'(.)').star()('') 37 | #. () 38 | 39 | ## (match(r'(.)').star())('hello') 40 | #. ('h', 'e', 'l', 'l', 'o') 41 | 42 | ## (match(r'(.)').star() >> join)('hello') 43 | #. ('hello',) 44 | 45 | 46 | # Example 47 | 48 | def make_var(v): return v 49 | def make_lam(v, e): return '(lambda (%s) %s)' % (v, e) 50 | def make_app(e1, e2): return '(%s %s)' % (e1, e2) 51 | def make_let(v, e1, e2): return '(let ((%s %s)) %s)' % (v, e1, e2) 52 | 53 | eof = match(r'$') 54 | _ = match(r'\s*') 55 | identifier = match(r'([A-Za-z_]\w*)\s*') 56 | 57 | def test1(): 58 | V = identifier 59 | E = delay(lambda: 60 | V >> make_var 61 | | '\\' +_+ V + '.' +_+ E >> make_lam 62 | | '(' +_+ E + E + ')' +_ >> make_app) 63 | start = _+ E #+ eof 64 | return lambda s: start(s)[0] 65 | 66 | ## test1()('x y') 67 | #. 'x' 68 | ## test1()(r'\x.x') 69 | #. '(lambda (x) x)' 70 | ## test1()('(x x)') 71 | #. '(x x)' 72 | 73 | 74 | def test2(string): 75 | V = identifier 76 | F = delay(lambda: 77 | V >> make_var 78 | | '\\' +_+ V.plus() + hug + '.' 
+_+ E >> fold_lam 79 | | '(' +_+ E + ')' +_) 80 | E = F + F.star() >> fold_app 81 | start = _+ E 82 | 83 | vals = start.attempt(string) 84 | return vals and vals[0] 85 | 86 | def fold_app(f, *fs): return reduce(make_app, fs, f) 87 | def fold_lam(vp, e): return foldr(make_lam, e, vp) 88 | 89 | def foldr(f, z, xs): 90 | for x in reversed(xs): 91 | z = f(x, z) 92 | return z 93 | 94 | ## test2('x') 95 | #. 'x' 96 | ## test2('\\x.x') 97 | #. '(lambda (x) x)' 98 | ## test2('(x x)') 99 | #. '(x x)' 100 | 101 | ## test2('hello') 102 | #. 'hello' 103 | ## test2(' x') 104 | #. 'x' 105 | ## test2('\\x . y ') 106 | #. '(lambda (x) y)' 107 | ## test2('((hello world))') 108 | #. '(hello world)' 109 | 110 | ## test2(' hello ') 111 | #. 'hello' 112 | ## test2('hello there hi') 113 | #. '((hello there) hi)' 114 | ## test2('a b c d e') 115 | #. '((((a b) c) d) e)' 116 | 117 | ## test2('') 118 | ## test2('x x . y') 119 | #. '(x x)' 120 | ## test2('\\.x') 121 | ## test2('(when (in the)') 122 | ## test2('((when (in the)))') 123 | #. '(when (in the))' 124 | 125 | ## test2('\\a.a') 126 | #. '(lambda (a) a)' 127 | 128 | ## test2(' \\hello . (hello)x \t') 129 | #. '(lambda (hello) (hello x))' 130 | 131 | ## test2('\\M . (\\f . M (f f)) (\\f . M (f f))') 132 | #. '(lambda (M) ((lambda (f) (M (f f))) (lambda (f) (M (f f)))))' 133 | 134 | ## test2('\\a b.a') 135 | #. '(lambda (a) (lambda (b) a))' 136 | 137 | ## test2('\\a b c . a b') 138 | #. '(lambda (a) (lambda (b) (lambda (c) (a b))))' 139 | 140 | 141 | # Smoke test: grammars 142 | 143 | ## exceptionally(lambda: Grammar(r"a = . b = a. a = .")()) 144 | #. GrammarError('Multiply-defined rules: a',) 145 | 146 | ## exceptionally(lambda: Grammar(r"a = b|c|d. c = .")()) 147 | #. GrammarError('Undefined rules: b, d',) 148 | 149 | ## exceptionally(lambda: Grammar(r"a = ")()) 150 | #. GrammarError('Bad grammar', ('a = ', '')) 151 | 152 | pushy = Grammar(r""" 153 | main: :'x'. 154 | """)() 155 | ## pushy.main('') 156 | #. 
('x',) 157 | 158 | nums = Grammar(r""" 159 | # This is a comment. 160 | main : nums !/./. # So's this. 161 | nums : num ** ','. 162 | num : /([0-9]+)/ :int. 163 | """)() 164 | sum_nums = lambda s: sum(nums.main(s)) 165 | 166 | ## sum_nums('10,30,43') 167 | #. 83 168 | ## nums.nums('10,30,43') 169 | #. (10, 30, 43) 170 | ## nums.nums('') 171 | #. () 172 | ## nums.num('10,30,43') 173 | #. (10,) 174 | 175 | ## nums.main('10,30,43') 176 | #. (10, 30, 43) 177 | ## nums.main.attempt('10,30,43 xxx') 178 | 179 | 180 | gsub_grammar = Grammar(r""" 181 | gsub = [:p :replace | /(.)/]*. 182 | """) 183 | def gsub(text, p, replacement): 184 | g = gsub_grammar(p=p, replace=lambda: replacement) 185 | return ''.join(g.gsub(text)) 186 | ## gsub('hi there WHEEWHEE to you WHEEEE', 'WHEE', 'GLARG') 187 | #. 'hi there GLARGGLARG to you GLARGEE' 188 | 189 | 190 | def catch_position(parse, string): 191 | try: parse(string) 192 | except Unparsable, e: 193 | print e.position 194 | 195 | ## catch_position(Grammar(r" 'x'* /$/ ")(), 'xxxhi') 196 | #. 3 197 | 198 | 199 | # Like test2, but in the grammar syntax and using immediate actions 200 | # instead of folds: 201 | test3_grammar = Grammar(r""" 202 | start: FNORD E. 203 | E: F (F :make_app)*. 204 | F: V :make_var 205 | | '\\' Lam 206 | | '(' E ')'. 207 | Lam: V ('.' E | Lam) :make_lam. 208 | V: /([A-Za-z]+)/. 209 | FNORD ~: /\s*/. 210 | """) 211 | test3 = test3_grammar(**globals()).start.expecting_one_result() 212 | 213 | # Same checks as test2: 214 | 215 | ## test3('x') 216 | #. 'x' 217 | ## test3('\\x.x') 218 | #. '(lambda (x) x)' 219 | ## test3('(x x)') 220 | #. '(x x)' 221 | 222 | ## test3('hello') 223 | #. 'hello' 224 | ## test3(' x') 225 | #. 'x' 226 | ## test3('\\x . y ') 227 | #. '(lambda (x) y)' 228 | ## test3('((hello world))') 229 | #. '(hello world)' 230 | 231 | ## test3(' hello ') 232 | #. 'hello' 233 | ## test3('hello there hi') 234 | #. '((hello there) hi)' 235 | ## test3('a b c d e') 236 | #. 
'((((a b) c) d) e)' 237 | 238 | ## test3.attempt('') 239 | ## test3('x x . y') 240 | #. '(x x)' 241 | ## test3.attempt('\\.x') 242 | ## test3.attempt('(when (in the)') 243 | ## test3('((when (in the)))') 244 | #. '(when (in the))' 245 | 246 | ## test3('\\a.a') 247 | #. '(lambda (a) a)' 248 | 249 | ## test3(' \\hello . (hello)x \t') 250 | #. '(lambda (hello) (hello x))' 251 | 252 | ## test3('\\M . (\\f . M (f f)) (\\f . M (f f))') 253 | #. '(lambda (M) ((lambda (f) (M (f f))) (lambda (f) (M (f f)))))' 254 | 255 | ## test3('\\a b.a') 256 | #. '(lambda (a) (lambda (b) a))' 257 | 258 | ## test3('\\a b c . a b') 259 | #. '(lambda (a) (lambda (b) (lambda (c) (a b))))' 260 | -------------------------------------------------------------------------------- /treepeg.py: -------------------------------------------------------------------------------- 1 | """ 2 | Exploring making tree parsing central. 3 | """ 4 | 5 | import re 6 | 7 | 8 | # Some derived combinators 9 | 10 | def invert(p): return cond(p, fail, succeed) 11 | def either(p, q): return cond(p, p, q) 12 | def both(p, q): return cond(p, q, fail) 13 | 14 | def feed(p, f): return alter(p, lambda *vals: (f(*vals),)) 15 | 16 | def maybe(p): return either(p, succeed) 17 | def plus(p): return recur(lambda p_plus: chain(p, maybe(p_plus))) 18 | def star(p): return maybe(plus(p)) 19 | 20 | def recur(fn): 21 | p = delay(lambda: fn(p)) 22 | return p 23 | 24 | 25 | # Peg objects 26 | 27 | def Peg(x): 28 | if isinstance(x, _Peg): return x 29 | # if isinstance(x, (str, unicode)): return literal(x) 30 | if callable(x): return satisfying(x) 31 | raise ValueError("Not a Peg", x) 32 | 33 | class _Peg(object): 34 | def __init__(self, run): 35 | self.run = run 36 | 37 | def __call__(self, sequence): 38 | for vals, _ in self.run(sequence): 39 | return vals 40 | return None 41 | 42 | def __add__(self, other): return chain(self, Peg(other)) 43 | def __radd__(self, other): return chain(Peg(other), self) 44 | def __or__(self, other): return 
either(self, Peg(other)) 45 | def __ror__(self, other): return either(Peg(other), self) 46 | 47 | __rshift__ = feed 48 | __invert__ = invert 49 | maybe = maybe 50 | plus = plus 51 | star = star 52 | 53 | 54 | # Basic combinators 55 | 56 | nil = ['nil'] 57 | 58 | fail = _Peg(lambda s: []) 59 | succeed = _Peg(lambda s: [((), s)]) 60 | 61 | ## anything('hi') 62 | #. ('hi',) 63 | ## chain(anything, succeed)('hi') 64 | #. ('hi',) 65 | 66 | def cond(p, q, r): 67 | def run(s): 68 | pv = p.run(s) 69 | choice = q if pv else r 70 | if choice is p: return pv # (an optimization) 71 | else: return choice.run(s) 72 | return _Peg(run) 73 | 74 | def satisfying(ok): 75 | "Eat a subject s when ok(s), producing (s,)." 76 | return _Peg(lambda s: [((s,), nil)] if s is not nil and ok(s) else []) 77 | 78 | def chain(p, q): 79 | return _Peg(lambda s: [(pvals + qvals, qnub) 80 | for pvals, pnub in p.run(s) 81 | for qvals, qnub in q.run(pnub)]) 82 | 83 | def alter(p, f): 84 | return _Peg(lambda s: [(f(*vals), nub) 85 | for vals, nub in p.run(s)]) 86 | 87 | def delay(thunk): 88 | def run(s): 89 | q.run = Peg(thunk()).run 90 | return q.run(s) 91 | q = _Peg(run) 92 | return q 93 | 94 | def item(p): 95 | "Eat the first item of a sequence, iff p matches it." 
96 | def run(s): 97 | if s is nil: return [] 98 | try: first = s[0] 99 | except IndexError: return [] 100 | except TypeError: return [] 101 | except KeyError: return [] 102 | return [(vals, s[1:]) for vals, _ in p.run(first)] 103 | return _Peg(run) 104 | 105 | def match(regex, flags=0): 106 | compiled = re.compile(regex, flags) 107 | return _Peg(lambda s: 108 | [] if s is nil 109 | else [(m.groups(), s[m.end():]) 110 | for m in [compiled.match(s)] if m]) 111 | 112 | def capture(p): 113 | def run(s): 114 | for vals, nub in p.run(s): 115 | # XXX use the position change instead, once we're tracking that: 116 | if s is not nil and nub is not nil: 117 | i = len(s) - len(nub) 118 | if s[i:] == nub: 119 | return [((s[:i],), nub)] 120 | raise Exception("Bad capture") 121 | return [] 122 | return _Peg(run) 123 | 124 | ## capture(match('h..') + match('.'))('hi there') 125 | #. ('hi t',) 126 | ## capture(item(anything) + item(anything))([3]) 127 | ## capture(item(anything) + item(anything))([3, 1]) 128 | #. ([3, 1],) 129 | 130 | 131 | # More derived combinators 132 | 133 | ## startswith('hi')('hi there') 134 | #. () 135 | 136 | def startswith(s): return match(re.escape(s)) 137 | 138 | anything = satisfying(lambda s: True) 139 | def literal(c): return drop(satisfying(lambda s: c == s)) 140 | def drop(p): return alter(p, lambda *vals: ()) 141 | 142 | end = invert(item(anything)) # Hmmm 143 | 144 | def an_instance(type_): 145 | return satisfying(lambda x: isinstance(x, type_)) 146 | 147 | def alt(*ps): 148 | if not ps: return fail 149 | if not ps[1:]: return ps[0] 150 | return either(ps[0], alt(*ps[1:])) 151 | 152 | def items(*ps): 153 | if not ps: return end 154 | return chain(item(ps[0]), items(*ps[1:])) 155 | 156 | def seq(*ps): 157 | if not ps: return succeed 158 | return chain(ps[0], seq(*ps[1:])) 159 | 160 | give = lambda c: feed(succeed, lambda: c) 161 | 162 | 163 | # Examples 164 | 165 | from operator import * 166 | 167 | ## fail(42) 168 | ## anything(42) 169 | #. 
(42,) 170 | ## chain(item(literal(5)), item(literal(0)))([5, 0, 2]) 171 | #. () 172 | ## an_instance(int)(42) 173 | #. (42,) 174 | 175 | calc = delay(lambda: 176 | alt(feed(items(literal('+'), calc, calc), add), 177 | feed(items(literal('-'), calc, calc), sub), 178 | an_instance(int))) 179 | 180 | ## calc(42) 181 | #. (42,) 182 | ## calc(['-', 3, 1]) 183 | #. (2,) 184 | ## calc(['+', ['-', 2, 1], 3]) 185 | #. (4,) 186 | 187 | singleton = lambda v: (v,) 188 | cat = lambda *lists: sum(lists, ()) 189 | flatten1 = delay(lambda: 190 | alt(seq(item(literal('+')), star(item(flatten1)), end), 191 | an_instance(int))) 192 | 193 | ## flatten1(['+', ['+', ['+', 1, ['+', 2]]]]) 194 | #. (1, 2) 195 | ## flatten1(42) 196 | #. (42,) 197 | ## flatten1(['+']) 198 | #. () 199 | ## flatten1(['+', 42]) 200 | #. (42,) 201 | ## flatten1(['+', 42, 43]) 202 | #. (42, 43) 203 | 204 | ## chain(item(literal('+')), anything)(['+', 42]) 205 | #. ([42],) 206 | 207 | ## star(item(anything))([1,2,3]) 208 | #. (1, 2, 3) 209 | 210 | ## star(match('hi() '))('hi hi hi there') 211 | #. ('', '', '') 212 | --------------------------------------------------------------------------------
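The central trick shared by peg.py and pegvm.py above is running a PEG parser under explicit control — combinators build continuation-passing "steps", and a trampoline bounces thunks — so deeply recursive grammars never grow Python's call stack. Those files lean on Python 2 idioms (tuple parameter unpacking in lambdas, `print` statements). Here is a minimal, self-contained sketch of the same explicit-control idea in version-neutral Python; all names (`lit`, `alt`, `run`, ...) are illustrative, not Parson's API, and it tracks only the match extent, not the semantic-value trail:

```python
def lit(c):
    """Match one literal character at position i."""
    def step(s, i, f, k):
        return (lambda: k(i + 1)) if i < len(s) and s[i] == c else (lambda: f())
    return step

def epsilon(s, i, f, k):
    """Match the empty string."""
    return lambda: k(i)

def chain(p, q):
    """Match p, then q, threading the fail continuation through both."""
    def step(s, i, f, k):
        return lambda: p(s, i, f, lambda j: q(s, j, f, k))
    return step

def alt(p, q):
    """Ordered choice: try p; if it fails, retry q from the same position."""
    def step(s, i, f, k):
        return lambda: p(s, i, lambda: q(s, i, f, k), k)
    return step

def ref(table, name):
    """Late-bound rule lookup, so grammars can be recursive."""
    def step(s, i, f, k):
        return lambda: table[name](s, i, f, k)
    return step

def run(p, s):
    """Trampoline: bounce thunks until a final continuation fires.
    Returns the matched prefix of s, or None on failure."""
    out = []
    thunk = lambda: p(s, 0, lambda: out.append(None),
                            lambda j: out.append(s[:j]))
    while not out:
        thunk = thunk()
    return out[0]

# The nbits smoke-test grammar from peg.py, restated in this sketch:
rules = {}
bit = alt(lit('0'), lit('1'))
rules['bits'] = alt(chain(bit, ref(rules, 'bits')), epsilon)

run(ref(rules, 'bits'), '01101a')   # => '01101'
```

Mirroring peg.py's `nbits` test, `run(ref(rules, 'bits'), '01101a')` returns `'01101'` and `'xy'` yields the empty match `''`. Because each bounce is a constant number of Python frames, a 40,000-character input parses fine even though every character adds one grammar-level recursion — a directly recursive parser would hit CPython's default recursion limit of 1000 long before that.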