├── .gitignore ├── README.md ├── combinator_grammars.py ├── eg_B_compiler ├── README.md ├── ast.py ├── b2020.parson ├── bcomp.py ├── eg │ ├── eg0.b │ ├── eg0.s.ref │ ├── eg1.b │ ├── eg1.s.ref │ ├── eg2.b │ ├── eg2.s.ref │ ├── eg3.b │ └── eg3.s.ref ├── error_tests │ └── notb.b ├── gen_vm_asm.py ├── structs.py └── testme.sh ├── eg_basic.py ├── eg_bicicleta.py ├── eg_calc.py ├── eg_calc_compile.py ├── eg_calc_to_rpn.py ├── eg_ebnf ├── c_emit.py ├── ebnf.py ├── metagrammar.py ├── notes.text ├── structs.py └── vm.py ├── eg_fp.py ├── eg_itsy ├── README.md ├── ast.py ├── c_emitter.py ├── c_prelude.h ├── complainer.py ├── eg │ ├── examples.itsy │ ├── regex.itsy │ ├── sieve.itsy │ ├── superopt.itsy │ └── um.itsy ├── error_tests │ ├── bad.itsy │ ├── bad2.itsy │ └── lvalues.itsy ├── grammar ├── halpme.py ├── itsy.py ├── primitives.py ├── reref.sh ├── structs.py ├── testme.sh └── typecheck.py ├── eg_json.py ├── eg_linear_equations.py ├── eg_metapeg.py ├── eg_microses.py ├── eg_misc.py ├── eg_mutagen_from_js.py ├── eg_oberon0.py ├── eg_oberon0_with_lexer.py ├── eg_outline.py ├── eg_phone_num.py ├── eg_pother.py ├── eg_precedence.py ├── eg_puzzler.py ├── eg_regex.py ├── eg_roman.py ├── eg_templite.py ├── eg_trees.py ├── eg_url.py ├── eg_wc.py ├── microses.py ├── parson.py ├── peg.py ├── peglet_to_parson.py ├── pegvm.py ├── setup.py ├── structs.py ├── testsmoke.py └── treepeg.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.py[cod] 2 | *~ 3 | 4 | # C extensions 5 | *.so 6 | 7 | # Packages 8 | *.egg 9 | *.egg-info 10 | dist 11 | build 12 | eggs 13 | parts 14 | bin 15 | var 16 | sdist 17 | develop-eggs 18 | .installed.cfg 19 | lib 20 | lib64 21 | 22 | # Installer logs 23 | pip-log.txt 24 | 25 | # Unit test / coverage reports 26 | .coverage 27 | .tox 28 | nosetests.xml 29 | 30 | # Translations 31 | *.mo 32 | 33 | # Mr Developer 34 | .mr.developer.cfg 35 | .project 36 | .pydevproject 37 | 
-------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | parson 2 | ====== 3 | 4 | Yet another PEG parser combinator library in Python. Selling points: 5 | 6 | * The optional concrete syntax for grammars incorporates semantic 7 | actions in a concise host-language-independent way. A Parson 8 | grammar won't tie you to Python. 9 | 10 | * Whole grammars can be analyzed and compiled, even if built at 11 | runtime using combinators. (Contrast with a monadic library, where 12 | this is uncomputable.) 13 | 14 | * Semantic actions take and return values in a kind of point-free 15 | style. 16 | 17 | * You can use the concrete syntax with about as little ceremony as 18 | `re.match`. 19 | 20 | * You can parse non-string sequences. 21 | 22 | Anti-selling points: 23 | 24 | * This library's design is still in flux: undocumented, utterly 25 | untuned, etc. I'd like you to use it if you think you might give 26 | feedback on the design; otherwise, no promises. 27 | 28 | * Semantic actions work in a nontraditional way that may remind you 29 | of Forth and which I haven't yet tried to make play well in typed 30 | languages like Haskell. It's concise and just right for parsing, 31 | but maybe in the end it'll turn out too cute and make me rip it 32 | out if I want this to be used. 33 | 34 | * I don't intend to make grammars work in other host languages 35 | before the design settles. (I have done this a bit for the 36 | [Peglet](https://github.com/darius/peglet) library, a more basic 37 | and settled expression of the same approach to actions: it has 38 | Python and JavaScript ports.) 39 | 40 | I guess the most similar library out there is LPEG, and that's way way 41 | more polished. 42 | 43 | 44 | Examples 45 | ======== 46 | 47 | For now, see all the eg_whatever.py files here. eg_calc.py, 48 | eg_misc.py, eg_wc.py, and eg_regex.py have the smallest ones.
49 | eg_trees.py shows parsing of tree structures, OMeta-style. Other 50 | examples include programming languages and somewhat-bigger 51 | stuff. 52 | 53 | Basic things still to explain: 54 | * grammar syntax 55 | * combinators 56 | * recursion with combinators 57 | * actions 58 | 59 | Projects where I've used it for more than just examples: 60 | * [IDEAL](https://github.com/darius/unreal/blob/master/parser.py), a drawing language 61 | * [Linogram](https://github.com/darius/goobergram/blob/master/parser.py), also a drawing language 62 | * [Pythological](https://github.com/darius/pythological/blob/master/parser.py), a MiniKanren with a vaguely Prologish frontend 63 | * [tinyhiss](https://github.com/darius/tinyhiss/blob/master/parser.py) -- Smalltalkish 64 | * [Squee](https://github.com/darius/squee/blob/master/parse_sans_offsides.py), an experimental language not much like any others 65 | * [Toot](https://github.com/darius/toot/blob/master/parse.py), a tutorial on writing a bytecode compiler 66 | 67 | 68 | Needs more work: 69 | ================ 70 | 71 | * There's a way to make a grammar automatically skip whitespace and 72 | comments and such ('FNORD' rules), which probably should be done 73 | differently. 74 | 75 | * It should be made easy to use with a separate lexer, and I haven't 76 | tried this enough to say it's ready (it's probably not). 77 | 78 | * It should also be easy to write a 'real' compiler, where source-location 79 | info gets added to all the AST nodes or whatever representation 80 | you're building. This is doable but should be more automated. 81 | 82 | After these design issues, this ought to be ported to a 83 | different-enough language to bring out issues of working nicely with 84 | multiple languages. 85 | 86 | After *that*, I think it'd be time to tackle quality of implementation.
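To make the "point-free", Forth-reminiscent action style concrete: in a rule like eg_calc.py's `exp0 : exp1 ('-' exp1 :sub)*.`, each action consumes the most recently captured values and pushes back its result. The sketch below hand-simulates parsing `5-3-1` that way. It is only an illustration of the calling convention, not Parson's implementation; the `act` helper and its explicit `arity` argument are assumptions made for this sketch.

```python
import operator

def act(vals, fn, arity):
    """Replace the last `arity` captured values with fn applied to them.
    (Illustrative only -- Parson itself determines this without an
    explicit arity argument.)"""
    return vals[:len(vals) - arity] + (fn(*vals[len(vals) - arity:]),)

# Hand-simulate the left-associative '-' loop on '5-3-1':
vals = (5,)                               # exp1 captured the int 5
vals = act(vals + (3,), operator.sub, 2)  # '-' exp1 :sub  ->  (2,)
vals = act(vals + (1,), operator.sub, 2)  # '-' exp1 :sub  ->  (1,)
assert vals == (1,)                       # agrees with eg_calc.py: 5-3-1 == 1
```

Because each action rewrites the value tuple in place of naming intermediate results, left associativity falls out of the loop structure rather than the grammar's shape.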
87 | -------------------------------------------------------------------------------- /combinator_grammars.py: -------------------------------------------------------------------------------- 1 | """ 2 | A convenience for defining recursive grammars in the combinator DSL. 3 | The delay() combinator works for this, but code using it is maybe uglier. 4 | """ 5 | 6 | import parson as P 7 | 8 | class Grammar(object): # XXX call it something else? name clash 9 | def __init__(self): 10 | object.__setattr__(self, '_rules', {}) 11 | object.__setattr__(self, '_stubs', {}) 12 | 13 | def __getattr__(self, name): 14 | try: return self._rules[name] 15 | except KeyError: pass 16 | try: return self._stubs[name] 17 | except KeyError: pass 18 | self._stubs[name] = result = P.delay(lambda: self._rules[name], '<%s>', name) 19 | return result 20 | 21 | def __setattr__(self, name, value): 22 | self._rules[name] = value 23 | 24 | # Example: 25 | ## g = Grammar() 26 | ## g.a = 'A' + g.b 27 | ## g.b = 'B' 28 | ## g.a('AB') 29 | #. () 30 | 31 | # TODO try fancier examples 32 | # TODO investigate implementing via descriptors instead 33 | # TODO nicer error when misused 34 | -------------------------------------------------------------------------------- /eg_B_compiler/README.md: -------------------------------------------------------------------------------- 1 | A compiler from 2 | https://github.com/johnwcowan/pdp8x/blob/master/b202x.md to a custom 3 | virtual machine. 4 | 5 | May not exactly fit the spec: I haven't yet added all the new 6 | features, and I followed the C operator-precedence table, which might 7 | have minor differences from B. 8 | -------------------------------------------------------------------------------- /eg_B_compiler/ast.py: -------------------------------------------------------------------------------- 1 | """ 2 | Abstract syntax of B2020. 
3 | """ 4 | 5 | from structs import Struct 6 | 7 | 8 | # Global declarations 9 | 10 | class Global( Struct('name opt_size opt_init')): pass 11 | class Proc( Struct('name params stmt')): pass 12 | 13 | 14 | # Statements 15 | 16 | class Auto( Struct('decls')): pass 17 | class Extern( Struct('names')): pass 18 | class Static( Struct('names')): pass 19 | class Block( Struct('stmts')): pass 20 | class If_stmt( Struct('exp then_ opt_else')): pass 21 | class While( Struct('exp stmt')): pass 22 | class Switch( Struct('exp stmt')): pass 23 | class Goto( Struct('exp')): pass 24 | class Return( Struct('opt_exp')): pass 25 | class Label( Struct('name stmt')): pass 26 | class Case( Struct('literal stmt')): pass 27 | class Exp( Struct('opt_exp')): pass 28 | 29 | 30 | # Expressions 31 | 32 | class Assign( Struct('e1 binop e2')): pass 33 | class If_exp( Struct('e1 e2 e3')): pass 34 | class Binary_exp( Struct('e1 binop e2')): pass 35 | class Call( Struct('e1 args')): pass 36 | class Pre_incr( Struct('op e1')): pass 37 | class Post_incr( Struct('e1 op')): pass 38 | class Literal( Struct('text kind')): pass # TODO check octal constants for /[89]/ 39 | class Variable( Struct('name')): pass 40 | class Unary_exp( Struct('unop e1')): pass 41 | 42 | class Address_of( Struct('e1')): pass # TODO these are currently under Unary_exp instead 43 | 44 | class And( Struct('e1 e2')): pass # TODO maybe use instead of Binary_exp 45 | class Or( Struct('e1 e2')): pass 46 | 47 | def Index(e1, e2): 48 | return Unary_exp('*', Binary_exp(e1, '+', e2)) 49 | -------------------------------------------------------------------------------- /eg_B_compiler/b2020.parson: -------------------------------------------------------------------------------- 1 | # Changes from the old B grammar: 2 | # spell extrn as extern 3 | # \ instead of * as a character and string escape 4 | # the && and || operators avoid the need for special treatment of & and | 5 | # octal constants have octal digits only -- leaving this up to 
semantic actions 6 | # the assignment operators are reversed (+= as in C, not =+ as in B)] 7 | # declare internal variables with static instead of no keyword 8 | # allocate arrays with syntax like `auto x[42]` 9 | 10 | # Not yet since it's a new feature, not just a change: 11 | # initialize variables in declarations with = 12 | 13 | 14 | program: 15 | _ definition* :end. 16 | 17 | definition: 18 | name ('[' (constant) ']' | :None) ['=' ival++',' :hug | :None] ';' :Global 19 | | name '(' [name**',' :hug] ')' statement :Proc. 20 | 21 | ival: 22 | constant 23 | | name :Variable. 24 | 25 | statement: 26 | "auto" [name ('[' constant ']' | :None) :hug]++',' ';' :hug :Auto 27 | | "extern" name++',' ';' :hug :Extern 28 | | "static" name++',' ';' :hug :Static 29 | | '{' statement* '}' :hug :Block 30 | | "if" '(' exp ')' statement ("else" statement | :None) :If_stmt 31 | | "while" '(' exp ')' statement :While 32 | | "switch" exp statement :Switch 33 | | "goto" exp ';' :Goto 34 | | "return" (exp | :None) ';' :Return 35 | | "case" constant ':' statement :Case 36 | | name ':' statement :Label 37 | | (exp | :None) ';' :Exp. 38 | 39 | 40 | #### 41 | # https://en.cppreference.com/w/c/language/operator_precedence 42 | # XXX This isn't identical to kbman precedences 43 | #### 44 | 45 | exp1: 46 | ( name :Variable 47 | | constant 48 | | '(' exp ')' 49 | ) ( inc_dec :Post_incr 50 | | '[' exp ']' :Index 51 | | '(' [exp**',' :hug] ')' :Call 52 | )*. 53 | 54 | exp2: 55 | unaryop exp2 :Unary_exp 56 | | '&' !/[&=]/ exp2 :Address_of 57 | | inc_dec exp2 :Pre_incr 58 | | exp1. 59 | 60 | exp3: exp2 (op3 exp2 :Binary_exp)*. op3 ~: { '*' !/=/ | '/' !/[*=]/ | '%' !/=/ } _. 61 | exp4: exp3 (op4 exp3 :Binary_exp)*. op4 ~: { '+' !/[+=]/ | '-' !/[-=]/ } _. 62 | exp5: exp4 (op5 exp4 :Binary_exp)*. op5 ~: { '<<' !/=/ | '>>' !/=/ } _. 63 | exp6: exp5 (op6 exp5 :Binary_exp)*. op6 ~: { '<=' | '>=' | '<' | '>' } _. 64 | exp7: exp6 (op7 exp6 :Binary_exp)*. op7 ~: { '==' | '!=' } _. 
65 | exp8: exp7 (op8 exp7 :Binary_exp)*. op8 ~: { '&' !/[&=]/ } _. 66 | exp9: exp8 (op9 exp8 :Binary_exp)*. op9 ~: { '^' !/=/} _. 67 | exp10: exp9 (op10 exp9 :Binary_exp)*. op10 ~: { '|' !/[|=]/} _. 68 | exp11: exp10 (op11 exp10 :And)*. op11 ~: '&&' _. 69 | exp12: exp11 (op12 exp11 :Or)*. op12 ~: '||' _. 70 | exp13: exp12 ('?' exp ':' exp13 :If_exp)?. 71 | exp14: exp13 (assign exp14 :Assign)?. 72 | exp: exp14. 73 | 74 | 75 | # Lexical grammar 76 | 77 | assign ~: 78 | opassign | /(=)(?!=)/ _. 79 | 80 | inc_dec ~: 81 | { '++' 82 | | '--' 83 | } _. 84 | 85 | unaryop ~: 86 | { '-' !/[-=]/ 87 | | '~' 88 | | '!' !'=' 89 | | '*' !'=' 90 | } _. 91 | 92 | opassign ~: 93 | { '<<=' 94 | | '>>=' 95 | | '|=' 96 | | '&=' 97 | | '^=' 98 | | '-=' 99 | | '+=' 100 | | '%=' 101 | | '*=' 102 | | '/=' 103 | } _. 104 | 105 | 106 | constant ~: 107 | {'0' digit+} _ :'octal' :Literal 108 | | { digit+} _ :'decimal' :Literal 109 | | {/'/ sqchar sqchar? /'/} _ :'char' :Literal 110 | | {/"/ dqchar* /"/} _ :'string' :Literal. 111 | 112 | sqchar ~: escape | /[^']/. 113 | dqchar ~: escape | /[^"]/. 114 | escape ~: /\\./. 115 | 116 | name ~: !keyword {alpha (alpha|digit)*} _. 117 | 118 | keyword = /(auto|extern|static|if|while|switch|goto|return|case)\b/. 119 | 120 | alpha ~: /[A-Za-z_]/. # "and backspace"?! I'm just ignoring that. 121 | digit ~: /[0-9]/. 122 | 123 | FNORD ~: _. 124 | _ ~: (/\s+/ | comment)*. 125 | 126 | comment ~: '/*' commentbody. # (The following awkward definition is to save Python stack space.) 127 | commentbody ~: '*/' | /[^*]+/ commentbody | '*' commentbody. 128 | 129 | # TODO better definition: 130 | # comment ~: '/*' (!'*/' :anyone)* '*/'. 131 | -------------------------------------------------------------------------------- /eg_B_compiler/bcomp.py: -------------------------------------------------------------------------------- 1 | """ 2 | Tie the modules together into a compiler. 3 | It writes VM assembly to stdout. 
4 | """ 5 | 6 | import sys 7 | 8 | import ast 9 | from gen_vm_asm import gen_program 10 | from parson import Grammar, Unparsable 11 | 12 | with open('b2020.parson') as f: 13 | grammar_source = f.read() 14 | parser = Grammar(grammar_source).bind(ast) 15 | 16 | def main(argv): 17 | err = 0 18 | for filename in argv[1:]: 19 | err |= compiler_main(filename) 20 | return err 21 | 22 | def compiler_main(filename, out_filename=None): 23 | with open(filename) as f: 24 | text = f.read() 25 | try: 26 | global_decls = parser.program(text) 27 | except Unparsable as exc: 28 | (before, after) = exc.failure 29 | complain(filename, before, after, "Syntax error") 30 | return 1 31 | gen_program(global_decls) 32 | return 0 33 | 34 | def complain(filename, before, after, plaint): 35 | line_no = before.count('\n') 36 | prefix = (before+'\n').splitlines()[line_no] 37 | suffix = (after+'\n').splitlines()[0] # XXX what if right on newline? 38 | prefix, suffix = sanitize(prefix), sanitize(suffix) 39 | message = ["%s:%d:%d: %s" % (filename, line_no+1, len(prefix), plaint), 40 | ' ' + prefix + suffix, 41 | ' ' + ' '*len(prefix) + '^'] 42 | sys.stderr.write('\n'.join(message) + '\n') 43 | 44 | def sanitize(s): 45 | "Make s predictably printable, sans control characters like tab." 
46 | unprintable = chr(127) 47 | return ''.join(c if ' ' <= c < unprintable else ' ' # XXX crude 48 | for c in s) 49 | 50 | if __name__ == '__main__': 51 | sys.exit(main(sys.argv)) 52 | -------------------------------------------------------------------------------- /eg_B_compiler/eg/eg0.b: -------------------------------------------------------------------------------- 1 | printn() {} 2 | -------------------------------------------------------------------------------- /eg_B_compiler/eg/eg0.s.ref: -------------------------------------------------------------------------------- 1 | printn proc 2 | params 3 | return_void 4 | endproc 5 | 6 | -------------------------------------------------------------------------------- /eg_B_compiler/eg/eg1.b: -------------------------------------------------------------------------------- 1 | /* The following function will print a non-negative number, n, to 2 | the base b, where 2<=b<=10, This routine uses the fact that 3 | in the ASCII character set, the digits 0 to 9 have sequential 4 | code values. 
*/ 5 | 6 | printn(n,b) { 7 | extern putchar; 8 | auto a; 9 | 10 | if(a=n/b) /* assignment, not test for equality */ 11 | printn(a, b); /* recursive */ 12 | putchar(n%b + '0'); 13 | } 14 | -------------------------------------------------------------------------------- /eg_B_compiler/eg/eg1.s.ref: -------------------------------------------------------------------------------- 1 | printn proc 2 | params n, b 3 | putchar extern 4 | a local 5 | addr a 6 | value n 7 | value b 8 | op2 / 9 | assign = 10 | if_not endif.0 11 | value printn 12 | value a 13 | value b 14 | call 2 15 | pop 16 | endif.0 17 | value putchar 18 | value n 19 | value b 20 | op2 % 21 | push '0' 22 | op2 + 23 | call 1 24 | pop 25 | return_void 26 | endproc 27 | 28 | -------------------------------------------------------------------------------- /eg_B_compiler/eg/eg2.b: -------------------------------------------------------------------------------- 1 | /* The following program will calculate the constant e-2 to about 2 | 4000 decimal digits, and print it 50 characters to the line in 3 | groups of 5 characters. The method is simple output conversion 4 | of the expansion 5 | 1/2! + 1/3! + ... = .111.... 6 | where the bases of the digits are 2, 3, 4, . . . */ 7 | 8 | main() { 9 | extern putchar, n, v; 10 | auto i, c, col, a; 11 | 12 | i = col = 0; 13 | while(i${fs}; then 9 | echo "Didn't fail!" 10 | fi 11 | done 12 | 13 | for f in eg/*.b; do 14 | echo 15 | echo "To assembly:" ${f} 16 | fs=${f%.*}.s 17 | if python bcomp.py ${f} >${fs}; then 18 | echo -n # Expected success (btw what's a no-op in bash?) 19 | else 20 | echo "Failed!" 
21 | fi 22 | if test -f ${fs}.ref; then 23 | diff -u ${fs}.ref ${fs} 24 | # TODO raise error at exit if there was a diff 25 | else 26 | echo ' (No ref)' 27 | fi 28 | done 29 | -------------------------------------------------------------------------------- /eg_basic.py: -------------------------------------------------------------------------------- 1 | """ 2 | BASIC interpreter, inspired by Tiny BASIC. 3 | """ 4 | 5 | import bisect, operator, sys 6 | from parson import Grammar, alter 7 | 8 | def chat(): 9 | print "I am Puny Basic. Enter 'bye' to dismiss me." 10 | while True: 11 | try: text = raw_input('> ').strip() 12 | except EOFError: break 13 | if text == 'bye': break 14 | try: basic.command(text) 15 | except Exception as e: 16 | # TODO: put the current line# in the prompt instead, if any; 17 | # should work nicely with a resumable STOP statement 18 | print e, ('' if pc is None else 'at line %d' % lines[pc][0]) 19 | 20 | grammar = Grammar(r""" 21 | command : /(\d+)/ :int /(.*)/ /$/ :set_line 22 | | "run" /$/ :run 23 | | "new" /$/ :new 24 | | "load" /(\S+)/ /$/ :load 25 | | "save" /(\S+)/ /$/ :save 26 | | stmt 27 | | /$/. 28 | 29 | stmt : "print" printing /$/ :next 30 | | '?' printing /$/ :next 31 | | "input" id /$/ :input :next 32 | | "goto" exp /$/ :goto 33 | | "if" relexp "then" exp /$/ :if_goto 34 | | "gosub" exp /$/ :gosub 35 | | "return" /$/ :return_ 36 | | "end" /$/ :end 37 | | "list" /$/ :list :next 38 | | "rem" /.*/ /$/ :next 39 | | "let"? id '=' exp /$/ :store :next. 40 | 41 | printing : (display writes)?. 42 | writes : ';' printing 43 | | ',' :space printing 44 | | :newline. 45 | 46 | display ~: exp :write 47 | | '"' [qchar :write]* '"' FNORD. 48 | qchar ~: /"(")/ # Two consecutive double-quotes mean '"'. 49 | | /([^"])/. # Any other character just means itself. 50 | 51 | relexp : exp ( '<>' exp :ne 52 | | '<=' exp :le 53 | | '<' exp :lt 54 | | '=' exp :eq 55 | | '>=' exp :ge 56 | | '>' exp :gt 57 | )?. 
58 | exp : exp1 ( '+' exp1 :add 59 | | '-' exp1 :sub 60 | )*. 61 | exp1 : exp2 ( '*' exp2 :mul 62 | | '/' exp2 :idiv 63 | )*. 64 | exp2 : primary ('^' exp2 :pow)?. 65 | 66 | primary : '-' exp1 :neg 67 | | /(\d+)/ :int 68 | | id :fetch 69 | | '(' exp ')'. 70 | 71 | id : /([a-z])/. # TODO: longer names, screening out reserved words 72 | 73 | FNORD ~: /\s*/. 74 | """) 75 | 76 | 77 | lines = [] # A sorted array of (line_number, source_line) pairs. 78 | pc = None # The program counter: an index into lines[], or None. 79 | return_stack = [] # A stack of line numbers of GOSUBs in progress. 80 | env = {} # Current variable values. 81 | 82 | def run(): 83 | reset() 84 | go() 85 | 86 | def reset(): 87 | global pc 88 | pc = 0 if lines else None 89 | return_stack[:] = [] 90 | env.clear() 91 | 92 | def go(): 93 | global pc 94 | while pc is not None: # TODO: check for stopped, instead 95 | _, line = lines[pc] 96 | pc, = basic.stmt(line) 97 | 98 | def new(): 99 | lines[:] = [] 100 | reset() 101 | 102 | def load(filename): 103 | with open(filename) as f: 104 | new() 105 | for line in f: 106 | basic.command(line) 107 | 108 | def save(filename): 109 | with open(filename, 'w') as f: 110 | for pair in lines: 111 | f.write('%d %s\n' % pair) 112 | 113 | def listing(): 114 | for n, line in lines: 115 | print n, line 116 | 117 | def find(n): # The slice of lines[] including line n, or where to insert it. 
118 | i = bisect.bisect(lines, (n, '')) 119 | return slice(i, i+1 if i < len(lines) and lines[i][0] == n else i) 120 | 121 | def set_line(n, text): 122 | lines[find(n)] = [(n, text)] if text else [] 123 | 124 | def goto(n): 125 | sl = find(n) 126 | if sl.start == sl.stop: raise Exception("Missing line", n) 127 | return sl.start 128 | 129 | def if_goto(flag, n): 130 | return goto(n) if flag else next_line(pc) 131 | 132 | def next_line(a_pc): 133 | return None if a_pc in (None, len(lines)-1) else a_pc+1 134 | 135 | def gosub(n): 136 | target = goto(n) 137 | return_stack.append(lines[pc][0]) 138 | return target 139 | 140 | def return_(): 141 | return next_line(goto(return_stack.pop())) 142 | 143 | # Parson's default meaning for a function appearing in a grammar is a 144 | # semantic action returning one value. In this Basic we do some actions 145 | # only for effect: this wraps those actions to produce no values. 146 | def for_effect(fn): 147 | def fn_for_effect(*args): 148 | fn(*args) 149 | return () 150 | return alter(fn_for_effect) 151 | 152 | basic = grammar( 153 | fetch = env.__getitem__, 154 | store = for_effect(env.__setitem__), 155 | input = for_effect(lambda var: env.__setitem__(var, int(raw_input()))), 156 | set_line = for_effect(set_line), 157 | goto = goto, 158 | if_goto = if_goto, 159 | gosub = gosub, 160 | return_ = return_, 161 | eq = operator.eq, 162 | ne = operator.ne, 163 | lt = operator.lt, 164 | le = operator.le, 165 | ge = operator.ge, 166 | gt = operator.gt, 167 | add = operator.add, 168 | sub = operator.sub, 169 | mul = operator.mul, 170 | idiv = operator.idiv, 171 | pow = operator.pow, 172 | neg = operator.neg, 173 | end = lambda: None, 174 | list = for_effect(listing), 175 | run = for_effect(run), 176 | next = lambda: next_line(pc), 177 | new = for_effect(new), 178 | load = for_effect(load), 179 | save = for_effect(save), 180 | write = for_effect(lambda x: sys.stdout.write(str(x))), 181 | space = for_effect(lambda: sys.stdout.write(' ')), 182 
| newline = for_effect(lambda: sys.stdout.write('\n')), 183 | ) 184 | 185 | 186 | if __name__ == '__main__': 187 | chat() 188 | 189 | ## basic.command('100 print "hello"') 190 | #. () 191 | ## lines 192 | #. [(100, 'print "hello"')] 193 | ## basic.command('100 print "goodbye"') 194 | #. () 195 | ## lines 196 | #. [(100, 'print "goodbye"')] 197 | ## basic.command('99 print 42,') 198 | #. () 199 | ## lines 200 | #. [(99, 'print 42,'), (100, 'print "goodbye"')] 201 | 202 | ## basic.command('run') 203 | #. 42 goodbye 204 | #. () 205 | 206 | 207 | ## basic.command('print') 208 | #. (None,) 209 | ## basic.command('let x = 5') 210 | #. (None,) 211 | ## basic.command('print x*x') 212 | #. 25 213 | #. (None,) 214 | ## basic.command('print 2+2; -5, "hi"') 215 | #. 4-5 hi 216 | #. (None,) 217 | ## basic.command('? 42 * (5-3) + -2^2') 218 | #. 80 219 | #. (None,) 220 | ## basic.command('print 2^3^2, ') 221 | #. 512 222 | #. (None,) 223 | ## basic.command('print 5-3-1') 224 | #. 1 225 | #. (None,) 226 | ## basic.command('print 3/2') 227 | #. 1 228 | #. (None,) 229 | 230 | ## basic.command('new') 231 | #. () 232 | ## basic.command('load countdown.bas') 233 | #. () 234 | ## basic.command('list') 235 | #. 10 let a = 10 236 | #. 20 if a < 0 then 60 237 | #. 30 print a 238 | #. 40 a = a - 1 239 | #. 50 goto 20 240 | #. 60 print "Blast off!" 241 | #. 70 end 242 | #. (None,) 243 | ## basic.command('run') 244 | #. 10 245 | #. 9 246 | #. 8 247 | #. 7 248 | #. 6 249 | #. 5 250 | #. 4 251 | #. 3 252 | #. 2 253 | #. 1 254 | #. 0 255 | #. Blast off! 256 | #. () 257 | -------------------------------------------------------------------------------- /eg_calc.py: -------------------------------------------------------------------------------- 1 | """ 2 | The customary calculator example. 3 | """ 4 | 5 | import operator 6 | from parson import Grammar 7 | 8 | calc = Grammar(r""" exp0 :end. 9 | 10 | exp0 : exp1 ( '+' exp1 :add 11 | | '-' exp1 :sub )*. 
12 | exp1 : exp2 ( '*' exp2 :mul 13 | | '//' exp2 :div 14 | | '/' exp2 :truediv 15 | | '%' exp2 :mod )*. 16 | exp2 : exp3 ( '^' exp2 :pow )?. 17 | 18 | exp3 : '(' exp0 ')' 19 | | '-' exp1 :neg 20 | | /(\d+)/ :int. 21 | 22 | FNORD~: /\s*/. 23 | 24 | """).bind(operator).expecting_one_result() 25 | 26 | ## calc('42 * (5-3) + -2^2') 27 | #. 80 28 | ## calc('2^3^2') 29 | #. 512 30 | ## calc('5-3-1') 31 | #. 1 32 | ## calc('3//2') 33 | #. 1 34 | ## calc('3/2') 35 | #. 1.5 36 | -------------------------------------------------------------------------------- /eg_calc_compile.py: -------------------------------------------------------------------------------- 1 | """ 2 | After http://www.vpri.org/pdf/rn2010001_programm.pdf 3 | """ 4 | 5 | from parson import Grammar 6 | 7 | def assign(v, exp): return exp(0) + ['sw r0, ' + v] 8 | 9 | def ld_const(value): return lambda s: ['lc r%d, %d' % (s, value)] 10 | def ld_var(name): return lambda s: ['lw r%d, %s' % (s, name)] 11 | 12 | def add(exp1, exp2): return lambda s: (exp1(s) + exp2(s+1) 13 | + ['add r%d, r%d, r%d' % (s, s+1, s)]) 14 | def mul(exp1, exp2): return lambda s: (exp1(s) + exp2(s+1) 15 | + ['mul r%d, r%d, r%d' % (s, s+1, s)]) 16 | 17 | g = Grammar(r""" stmt :end. 18 | 19 | stmt : ident ':=' exp0 :assign. 20 | 21 | exp0 : exp1 ('+' exp1 :add)*. 22 | exp1 : exp2 ('*' exp2 :mul)*. 23 | 24 | exp2 : '(' exp0 ')' 25 | | /(\d+)/ :int :ld_const 26 | | ident :ld_var. 27 | 28 | ident : /([A-Za-z]+)/. 29 | 30 | FNORD~: /\s*/. 31 | 32 | """)(**globals()).expecting_one_result() 33 | 34 | ## for line in g('v := 42 * (5+3) + 2*2'): print line 35 | #. lc r0, 42 36 | #. lc r1, 5 37 | #. lc r2, 3 38 | #. add r1, r2, r1 39 | #. mul r0, r1, r0 40 | #. lc r1, 2 41 | #. lc r2, 2 42 | #. mul r1, r2, r1 43 | #. add r0, r1, r0 44 | #. 
sw r0, v 45 | -------------------------------------------------------------------------------- /eg_calc_to_rpn.py: -------------------------------------------------------------------------------- 1 | """ 2 | Tiny example of 'compiling'. 3 | """ 4 | 5 | from parson import Grammar, alter 6 | 7 | g = Grammar(r""" stmt* :end. 8 | 9 | stmt : ident '=' exp0 ';' :assign. 10 | 11 | exp0 : exp1 ('+' exp1 :'add')*. 12 | exp1 : exp2 ('*' exp2 :'mul')*. 13 | 14 | exp2 : '(' exp0 ')' 15 | | /(\d+)/ 16 | | ident :'fetch'. 17 | 18 | ident : /([A-Za-z]+)/. 19 | 20 | FNORD ~: /\s*/. 21 | """)(assign=alter(lambda name, *rpn: rpn + (name, 'store'))) 22 | 23 | ## print ' '.join(g('v = 42 * (5+3) + 2*2; v = v + 1;')) 24 | #. 42 5 3 add mul 2 2 mul add v store v fetch 1 add v store 25 | -------------------------------------------------------------------------------- /eg_ebnf/c_emit.py: -------------------------------------------------------------------------------- 1 | """ 2 | Generate a parser in C. 3 | """ 4 | 5 | import codecs 6 | from structs import Visitor 7 | 8 | def gen_kinds_enum(self): 9 | return '\n'.join(gen_kinds(self)) 10 | 11 | def gen_parser(self): 12 | return '\n'.join(codegen(self)) 13 | 14 | # TODO shouldn't an EOF also be a kind? 15 | def gen_kinds(grammar): 16 | tokens = grammar.lexer_symbols() 17 | kinds = sorted(map(c_encode_token, tokens)) 18 | yield 'enum {' 19 | for kind in kinds: 20 | yield kind + ',' 21 | yield '};' 22 | 23 | def gen_lexer_fns(grammar): 24 | syms = grammar.lexer_symbols() 25 | assert all(t.text for t in syms) 26 | assert len(syms) == len(set(t.text for t in syms)) 27 | lits = tuple(t for t in syms if t.kind == 'literal') 28 | kwds = tuple(t for t in syms if t.kind == 'keyword') 29 | yield gen_lexer_fn('lex_lits', lits) 30 | yield '' 31 | yield gen_lexer_fn('lex_keywords', kwds) 32 | # TODO skip lex_keywords if no keywords. In principle there might be no lits, too. 
33 | 34 | def gen_lexer_fn(name, syms): 35 | return ('void %s(void) %s' 36 | % (name, embrace('\n'.join(gen_trie_lexer(syms))))) 37 | 38 | def gen_trie_lexer(syms): 39 | trie = sprout({t.text: t for t in syms}) 40 | for line in gen_lex_dispatch(trie, 0): 41 | yield line 42 | 43 | def sprout(rel): 44 | """Given a map of {string: value}, represent it as a trie 45 | (opt_value_for_empty_string, {leading_char: subtrie}).""" 46 | parts = map_from_relation((k[0], (k[1:], v)) 47 | for k,v in rel.items() if k) 48 | return (rel.get(''), 49 | {head: sprout(dict(tails)) for head,tails in parts.items()}) 50 | 51 | def map_from_relation(pairs): 52 | result = {} 53 | for k, v in pairs: 54 | result.setdefault(k, []).append(v) 55 | return result 56 | 57 | def gen_lex_dispatch((opt_on_empty, branches), offset): 58 | heads = sorted(branches.keys()) 59 | if opt_on_empty: 60 | default = ('token.kind = %s; scan += %d; return;' 61 | % (c_encode_token(opt_on_empty), offset)) 62 | else: 63 | default = '' 64 | if heads: 65 | yield 'switch (scan[%d]) {' % offset 66 | for head in heads: 67 | yield 'case %s:' % c_char_literal(head) 68 | for line in gen_lex_dispatch(branches[head], offset + 1): 69 | yield ' ' + line 70 | yield ' break;' 71 | if default: 72 | yield 'default:' 73 | yield ' ' + default 74 | yield '}' 75 | elif default: 76 | yield default 77 | 78 | def c_char_literal(ch): 79 | # TODO anywhere this doesn't match C? 
80 | return "'%s'" % codecs.encode(ch, 'string_escape') 81 | 82 | def c_encode_token(token): 83 | # TODO rename to TOKEN_%s or something 84 | return 'kind_%s' % ''.join(escapes.get(c, c) for c in token.text) 85 | 86 | escapes = { 87 | '!': '_BANG', 88 | '@': '_AT', 89 | '#': '_HASH', 90 | '$': '_DOLLAR', 91 | '%': '_PERCENT', 92 | '^': '_HAT', 93 | '&': '_AMPERSAND', 94 | '*': '_STAR', 95 | '(': '_LPAREN', 96 | ')': '_RPAREN', 97 | '-': '_DASH', 98 | '_': '_UNDERSCORE', 99 | '\\': '_BACKSLASH', 100 | '|': '_BAR', 101 | "'": '_QUOTE', 102 | '"': '_DOUBLEQUOTE', 103 | '/': '_SLASH', 104 | '?': '_QUESTION', 105 | ',': '_COMMA', 106 | '.': '_DOT', 107 | '<': '_LESS', 108 | '>': '_GREATER', 109 | '[': '_LBRACKET', 110 | ']': '_RBRACKET', 111 | '=': '_EQUALS', 112 | '+': '_PLUS', 113 | '`': '_BACKQUOTE', 114 | '~': '_TILDE', 115 | '{': '_LBRACE', 116 | '}': '_RBRACE', 117 | ';': '_SEMICOLON', 118 | ':': '_COLON', 119 | } 120 | 121 | def codegen(grammar): 122 | for plaint in grammar.errors: 123 | yield '// ' + plaint 124 | if grammar.errors: yield '' 125 | for block in gen_lexer_fns(grammar): 126 | yield block 127 | yield '' 128 | for name in grammar.nonterminals: 129 | yield 'void parse_%s(void);' % name 130 | for name in grammar.nonterminals: 131 | body = gen(grammar.directed[name]) 132 | yield '' 133 | yield 'void parse_%s(void) %s' % (name, embrace(body)) 134 | 135 | def embrace(s): return '{%s\n}' % indent('\n' + s) 136 | def indent(s): return s.replace('\n', '\n ') 137 | 138 | class Gen(Visitor): 139 | def Empty(self, t): return '' 140 | def Symbol(self, t): return 'eat(%s);' % c_encode_token(t) 141 | def Call(self, t): return 'parse_%s();' % t.name 142 | def Branch(self, t): return gen_switch(t) 143 | def Fail(self, t): return 'parser_fail();' 144 | def Chain(self, t): return '\n'.join(filter(None, [self(t.e1), self(t.e2)])) 145 | def Loop(self, t): return gen_while(t.firsts, self(t.body)) 146 | def Action(self, t): return '/* XXX action */' 147 | gen = Gen() 148 | 
149 | def gen_while(firsts, body): 150 | test = ' || '.join(map(gen_test, sorted(firsts))) 151 | return 'while (%s) %s' % (test, embrace(body)) 152 | 153 | def gen_test(token): 154 | return 'token.kind == %s' % c_encode_token(token) 155 | 156 | def gen_switch(t): 157 | cases = ['%s %s' % ('\n'.join('case %s:' % c_encode_token(c) 158 | for c in sorted(kinds)), 159 | embrace(gen(alt))) 160 | for kinds, alt in t.cases] 161 | default = 'default: ' + embrace(gen(t.default)) 162 | return 'switch (token.kind) ' + embrace(' break;\n'.join(cases + [default])) 163 | 164 | 165 | # Smoke test 166 | 167 | ## from ebnf import Grammar, eg 168 | ## import operator 169 | ## actions = dict(X=lambda: 3, **operator.__dict__) 170 | 171 | ## egg = Grammar(eg, actions) 172 | 173 | ## print gen_parser(egg) 174 | #. void lex_lits(void) { 175 | #. switch (scan[0]) { 176 | #. case '(': 177 | #. token.kind = kind__LPAREN; scan += 1; return; 178 | #. break; 179 | #. case ')': 180 | #. token.kind = kind__RPAREN; scan += 1; return; 181 | #. break; 182 | #. case '*': 183 | #. token.kind = kind__STAR; scan += 1; return; 184 | #. break; 185 | #. case '+': 186 | #. token.kind = kind__PLUS; scan += 1; return; 187 | #. break; 188 | #. case '-': 189 | #. token.kind = kind__DASH; scan += 1; return; 190 | #. break; 191 | #. case 'b': 192 | #. token.kind = kind_b; scan += 1; return; 193 | #. break; 194 | #. case 'x': 195 | #. token.kind = kind_x; scan += 1; return; 196 | #. break; 197 | #. case 'y': 198 | #. token.kind = kind_y; scan += 1; return; 199 | #. break; 200 | #. } 201 | #. } 202 | #. 203 | #. void lex_keywords(void) { 204 | #. 205 | #. } 206 | #. 207 | #. void parse_A(void); 208 | #. void parse_B(void); 209 | #. void parse_C(void); 210 | #. void parse_exp(void); 211 | #. void parse_term(void); 212 | #. void parse_factor(void); 213 | #. 214 | #. void parse_A(void) { 215 | #. switch (token.kind) { 216 | #. case kind_b: { 217 | #. parse_B(); 218 | #. eat(kind_x); 219 | #. parse_A(); 220 | #. 
} break; 221 | #. case kind_y: { 222 | #. eat(kind_y); 223 | #. } break; 224 | #. default: { 225 | #. parser_fail(); 226 | #. } 227 | #. } 228 | #. } 229 | #. 230 | #. void parse_B(void) { 231 | #. eat(kind_b); 232 | #. } 233 | #. 234 | #. void parse_C(void) { 235 | #. 236 | #. } 237 | #. 238 | #. void parse_exp(void) { 239 | #. parse_term(); 240 | #. switch (token.kind) { 241 | #. case kind__PLUS: { 242 | #. eat(kind__PLUS); 243 | #. parse_exp(); 244 | #. /* XXX action */ 245 | #. } break; 246 | #. case kind__DASH: { 247 | #. eat(kind__DASH); 248 | #. parse_exp(); 249 | #. /* XXX action */ 250 | #. } break; 251 | #. default: { 252 | #. 253 | #. } 254 | #. } 255 | #. } 256 | #. 257 | #. void parse_term(void) { 258 | #. parse_factor(); 259 | #. while (token.kind == kind__STAR) { 260 | #. eat(kind__STAR); 261 | #. parse_factor(); 262 | #. /* XXX action */ 263 | #. } 264 | #. } 265 | #. 266 | #. void parse_factor(void) { 267 | #. switch (token.kind) { 268 | #. case kind_x: { 269 | #. eat(kind_x); 270 | #. /* XXX action */ 271 | #. } break; 272 | #. case kind__LPAREN: { 273 | #. eat(kind__LPAREN); 274 | #. parse_exp(); 275 | #. eat(kind__RPAREN); 276 | #. } break; 277 | #. default: { 278 | #. parser_fail(); 279 | #. } 280 | #. } 281 | #. } 282 | -------------------------------------------------------------------------------- /eg_ebnf/metagrammar.py: -------------------------------------------------------------------------------- 1 | """ 2 | Abstract and concrete syntax of grammars. 
3 | """ 4 | 5 | from structs import Struct as _S 6 | 7 | class Empty (_S('')): pass 8 | class Symbol(_S('text kind')): pass 9 | class Call (_S('name')): pass 10 | class Either(_S('e1 e2')): pass 11 | class Chain (_S('e1 e2')): pass 12 | class Star (_S('e1')): pass 13 | class Action(_S('name')): pass 14 | 15 | # TODO more efficient implementations: 16 | def Maybe(e1): return Either(e1, Empty()) 17 | def Plus(e1): return Chain(e1, Star(e1)) 18 | def Plus2(e1, e2): return Chain(e1, Star(Chain(e2, e1))) 19 | def Star2(e1, e2): return Maybe(Plus2(e1, e2)) 20 | 21 | metagrammar_text = r""" 22 | '' rule* :end. 23 | 24 | rule : name ':' exp '.' :hug. 25 | 26 | exp : term ('|' exp :Either)? 27 | | :Empty. 28 | 29 | term : factor (term :Chain)?. 30 | factor : primary ('**' primary :Star2 31 | |'++' primary :Plus2 32 | |'*' :Star 33 | |'+' :Plus 34 | |'?' :Maybe 35 | )?. 36 | 37 | primary : qstring :'literal' :Symbol 38 | | dqstring :'keyword' :Symbol 39 | | '$' name :'lexer' :Symbol 40 | | name :Call 41 | | ':' name :Action 42 | | ':' qstring :Action 43 | | '[' exp ']' # Dunno if we'll still want this for semantics. 44 | # I'm keeping this production enabled because it's 45 | # used in itsy.grammar, but XXX this should be either 46 | # deleted or given a proper semantic action. 47 | | '(' exp ')'. 48 | 49 | name : /([A-Za-z_]\w*)/. 50 | 51 | qstring ~: /'/ quoted_char* /'/ FNORD :join. 52 | dqstring ~: '"' dquoted_char* '"' FNORD :join. 53 | 54 | quoted_char ~: /\\(.)/ | /([^'])/. 55 | dquoted_char~: /\\(.)/ | /([^"])/. 56 | 57 | FNORD ~: whitespace?. 58 | whitespace ~: /(?:\s|#.*)+/. 
59 | """ 60 | -------------------------------------------------------------------------------- /eg_ebnf/notes.text: -------------------------------------------------------------------------------- 1 | see also 2 | https://python-history.blogspot.com/2018/05/the-origins-of-pgen.html 3 | https://github.com/rvirding/spell1 4 | https://os.ghalkes.nl/LLnextgen/ 5 | 6 | def Maybe(e1): return Either(e1, Empty()) 7 | def Plus(e1): return Chain(e1, Star(e1)) # inefficient because dup 8 | def Plus2(e1, e2): return Chain(e1, Star(Chain(e2, e1))) # inefficient because dup 9 | def Star2(e1, e2): return Maybe(Plus2(e1, e2)) 10 | # TODO more efficient implementations 11 | So, how to do that? We could generate custom code for each, but that's 12 | extra work, especially if we want to vary backends. 13 | 14 | [[e*]] = loop { unless(!!e) break; [[e]]; } 15 | [[e+]] = loop { [[e]]; unless(!!e) break; } 16 | [[e++sep]] = loop { [[e]]; unless(!!sep) break; [[sep]]; } 17 | [[e**sep]] = if(!!e) [[e++sep]]; # assuming e is not nullable 18 | 19 | This suggests replacing the Star type with a Loop type like 20 | 21 | Loop(break_at, es) 22 | where break_at is an index in [0..len(es)] 23 | inserting a break test at that index, 24 | which checks against the first-set of es[break_at % len(es)] 25 | 26 | or similarly 27 | 28 | Loop(es_before_break, es_after_break) 29 | 30 | (These lists es here can be restricted to length <= 1, but they do 31 | require empty-list to be distinct from Empty(), to make it clear when 32 | the break test's lookahead wraps around.) 33 | 34 | nullable(Loop(before, after)) = nullable(before) 35 | firsts(Loop(before, after)) = firsts(before) | (firsts(after) if nullable(before) else {}) 36 | # I guess. 37 | 38 | Another approach: just translate everything to BNF and let tail-call 39 | optimization sort it out. 
Not a crazy idea, but I think it'd take more 40 | work in imperative target languages with no goto, and produce 41 | less-predictable code there, and might not fly well with semantic 42 | actions. 43 | 44 | I think we could simplify the current code by combining analyze and 45 | directify into one function (saving the analysis and the direct-form 46 | for each rule). And coalesce the interpreter and VM compiler into a 47 | nonrecursive interpreter. 48 | -------------------------------------------------------------------------------- /eg_ebnf/structs.py: -------------------------------------------------------------------------------- 1 | """ 2 | Define a named-tuple-like type, but for immutable values, and simpler. 3 | Also Visitor to dispatch on datatypes defined this way. 4 | """ 5 | 6 | # TODO figure out how to use __slots__ 7 | 8 | def Struct(field_names, name=None, supertype=(object,)): 9 | if isinstance(field_names, (str, unicode)): 10 | field_names = tuple(field_names.split()) 11 | 12 | if name is None: 13 | name = 'Struct<%s>' % ','.join(field_names) 14 | def get_name(self): return self.__class__.__name__ 15 | else: 16 | def get_name(self): return name 17 | 18 | def __init__(self, *args): 19 | if len(field_names) != len(args): 20 | raise TypeError("%s takes %d arguments (%d given)" 21 | % (get_name(self), len(field_names), len(args))) 22 | self.__dict__.update(zip(field_names, args)) 23 | 24 | def __repr__(self): 25 | return '%s(%s)' % (get_name(self), ', '.join(repr(getattr(self, f)) 26 | for f in field_names)) 27 | 28 | def __hash__(self): 29 | return hash((name, tuple(map(self.__dict__.__getitem__, field_names)))) 30 | 31 | def __eq__(self, other): 32 | return (self.__class__ is other.__class__ # I guess... 
33 | and all(self.__dict__[field] == other.__dict__[field] 34 | for field in field_names)) 35 | def __ne__(self, other): 36 | return not __eq__(self, other) 37 | def compare(self, other): 38 | if self.__class__ is not other.__class__: 39 | raise NotImplemented 40 | return cmp(map(self.__dict__.__getitem__, field_names), 41 | map(other.__dict__.__getitem__, field_names)) 42 | 43 | # (for use with pprint) 44 | def my_as_sexpr(self): # XXX better name? 45 | return (get_name(self),) + tuple(as_sexpr(getattr(self, f)) 46 | for f in field_names) 47 | my_as_sexpr.__name__ = 'as_sexpr' 48 | 49 | return type(name, 50 | supertype, 51 | dict(__init__=__init__, 52 | __repr__=__repr__, 53 | __hash__=__hash__, 54 | __eq__=__eq__, 55 | __ne__=__ne__, 56 | __lt__=lambda self, other: compare(self, other) < 0, 57 | __le__=lambda self, other: compare(self, other) <= 0, 58 | __gt__=lambda self, other: compare(self, other) > 0, 59 | __ge__=lambda self, other: compare(self, other) >= 0, 60 | as_sexpr=my_as_sexpr, 61 | _meta_fields=field_names)) 62 | 63 | def as_sexpr(obj): 64 | if hasattr(obj, 'as_sexpr'): 65 | return getattr(obj, 'as_sexpr')() 66 | elif isinstance(obj, list): 67 | return map(as_sexpr, obj) 68 | elif isinstance(obj, tuple): 69 | return tuple(map(as_sexpr, obj)) 70 | else: 71 | return obj 72 | 73 | 74 | # Is there a nicer way to do this? 75 | 76 | class Visitor(object): 77 | def __call__(self, subject, *args): 78 | tag = subject.__class__.__name__ 79 | method = getattr(self, tag, None) 80 | if method is None: 81 | try: 82 | method = getattr(self, 'default') 83 | except AttributeError: 84 | raise AttributeError("%r has no method for %r argument %r" % (self, tag, subject)) 85 | return method(subject, *args) 86 | 87 | 88 | # Test comparisons and hashing: 89 | ## class Action(Struct('name')): pass 90 | ## Action('x') == Action('x') 91 | #. True 92 | ## Action('x') == Action('y') 93 | #. False 94 | ## Action('x') < Action('y') 95 | #. 
True 96 | ## Action('x') > Action('y') 97 | #. False 98 | ## d = {Action('x'): 1} 99 | ## d[Action('x')] 100 | #. 1 101 | ## set([Action('x'), Action('x')]) 102 | #. set([Action('x')]) 103 | -------------------------------------------------------------------------------- /eg_ebnf/vm.py: -------------------------------------------------------------------------------- 1 | """ 2 | Parse by interpreting a compiled VM. 3 | """ 4 | 5 | from structs import Visitor 6 | from ebnf import Grammar 7 | 8 | def compile_grammar(grammar): 9 | labels = {} 10 | insns = [] 11 | for name in grammar.nonterminals: 12 | labels[name] = len(insns) 13 | insns.extend(compiling(grammar.directed[name])) 14 | insns.append(('return', None)) 15 | # TODO: make sure the client knows about grammar.errors 16 | return Code(insns, labels, grammar.actions) 17 | 18 | class Code(object): 19 | def __init__(self, insns, labels, actions): 20 | self.insns = insns 21 | self.labels = labels 22 | self.actions = actions 23 | self.label_of_addr = dict(zip(labels.values(), labels.keys())) 24 | 25 | def show(self): 26 | for pc in range(len(self.insns)): 27 | self.show_insn(pc) 28 | 29 | def show_insn(self, pc): 30 | label = self.label_of_addr.get(pc, '') 31 | op, arg = self.insns[pc] 32 | if op == 'return': 33 | arg = '' 34 | elif op == 'branch': 35 | cases, default = arg 36 | cases = [(','.join(sorted(s.text for s in kinds)), dest) 37 | for kinds, dest in cases] 38 | arg = cases, default 39 | print '%-10s %3d %-6s %r' % (label, pc, op, arg) 40 | 41 | def parse(self, tokens, start='start'): 42 | limit = 100 43 | tokens = list(tokens) + [None] # EOF sentinel 44 | i = 0 45 | frames = [[]] 46 | return_stack = [None] 47 | pc = self.labels[start] 48 | print 'starting', pc 49 | while pc is not None: 50 | limit -= 1 51 | if limit <= 0: break 52 | print ' '*50, zip(return_stack, frames) 53 | self.show_insn(pc) 54 | op, arg = self.insns[pc] 55 | pc += 1 56 | if op == 'call': 57 | frames.append([]) 58 | 
return_stack.append(pc) 59 | pc = self.labels[arg] 60 | elif op == 'return': 61 | pc = return_stack.pop() 62 | results = frames.pop() 63 | if pc is None: 64 | return results 65 | frames[-1].extend(results) 66 | elif op == 'eat': 67 | if tokens[i] == arg.text: 68 | i += 1 69 | else: 70 | raise SyntaxError("Missing %r" % arg) 71 | elif op == 'branch': 72 | cases, default = arg 73 | for kinds, dest in cases: 74 | if tokens[i] in [s.text for s in kinds]: # XXX awkward 75 | pc += dest 76 | break 77 | else: 78 | pc += default 79 | elif op == 'jump': 80 | pc += arg 81 | elif op == 'act': 82 | action = self.actions[arg] 83 | frame = frames[-1] 84 | frame[:] = [action(*frame)] 85 | elif op == 'fail': 86 | raise SyntaxError("Unexpected token %r; expecting one of %r" 87 | % (tokens[i], sorted(arg))) 88 | else: 89 | assert False 90 | 91 | class Compiling(Visitor): 92 | def Empty(self, t): return [] 93 | def Symbol(self, t): return [('eat', t)] 94 | def Call(self, t): return [('call', t.name)] 95 | def Branch(self, t): return compile_branch(t) 96 | def Fail(self, t): return [('fail', t.possibles)] 97 | def Chain(self, t): return self(t.e1) + self(t.e2) 98 | def Loop(self, t): return compile_loop(t) 99 | def Action(self, t): return [('act', t.name)] 100 | compiling = Compiling() 101 | 102 | def compile_branch(t): 103 | cases = [] 104 | insns = [] 105 | fixups = [] 106 | for kinds, alt in t.cases: 107 | dest = len(insns) # Offset from the branch insn 108 | cases.append((kinds, dest)) 109 | insns.extend(compiling(alt)) 110 | fixups.append(len(insns)) 111 | insns.append(None) # to be fixed up 112 | default = len(insns) # Offset from the branch insn 113 | insns.extend(compiling(t.default)) 114 | for addr in fixups: 115 | insns[addr] = ('jump', len(insns) - (addr + 1)) # Skip to the common exit point 116 | insns.insert(0, ('branch', (cases, default))) 117 | return insns 118 | 119 | def compile_loop(t): 120 | body = compiling(t.body) 121 | return ([('branch', ([(t.firsts, 0)], 
len(body)+1))] 122 | + body 123 | + [('jump', -len(body)-2)]) 124 | 125 | 126 | # Smoke test 127 | 128 | ## from ebnf import eg 129 | ## import operator 130 | ## actions = dict(X=lambda: 3, **operator.__dict__) 131 | ## egg = Grammar(eg, actions) 132 | 133 | ## egc = compile_grammar(egg) 134 | ## egc.show() 135 | #. A 0 branch ([('b', 0), ('y', 4)], 6) 136 | #. 1 call 'B' 137 | #. 2 eat Symbol('x', 'literal') 138 | #. 3 call 'A' 139 | #. 4 jump 3 140 | #. 5 eat Symbol('y', 'literal') 141 | #. 6 jump 1 142 | #. 7 fail frozenset([Symbol('b', 'literal'), Symbol('y', 'literal')]) 143 | #. 8 return '' 144 | #. B 9 eat Symbol('b', 'literal') 145 | #. 10 return '' 146 | #. C 11 return '' 147 | #. exp 12 call 'term' 148 | #. 13 branch ([('+', 0), ('-', 4)], 8) 149 | #. 14 eat Symbol('+', 'literal') 150 | #. 15 call 'exp' 151 | #. 16 act 'add' 152 | #. 17 jump 4 153 | #. 18 eat Symbol('-', 'literal') 154 | #. 19 call 'exp' 155 | #. 20 act 'sub' 156 | #. 21 jump 0 157 | #. 22 return '' 158 | #. term 23 call 'factor' 159 | #. 24 branch ([('*', 0)], 4) 160 | #. 25 eat Symbol('*', 'literal') 161 | #. 26 call 'factor' 162 | #. 27 act 'mul' 163 | #. 28 jump -5 164 | #. 29 return '' 165 | #. factor 30 branch ([('x', 0), ('(', 3)], 7) 166 | #. 31 eat Symbol('x', 'literal') 167 | #. 32 act 'X' 168 | #. 33 jump 5 169 | #. 34 eat Symbol('(', 'literal') 170 | #. 35 call 'exp' 171 | #. 36 eat Symbol(')', 'literal') 172 | #. 37 jump 1 173 | #. 38 fail frozenset([Symbol('x', 'literal'), Symbol('(', 'literal')]) 174 | #. 39 return '' 175 | ### egc.parse("x", start='exp') 176 | ### egc.parse("x+x-x", start='exp') 177 | ### egc.parse("x+(x*x+x)", start='exp') 178 | -------------------------------------------------------------------------------- /eg_fp.py: -------------------------------------------------------------------------------- 1 | """ 2 | A concatenative variant of John Backus's FP language. 
3 | http://en.wikipedia.org/wiki/FP_%28programming_language%29 4 | """ 5 | 6 | from __future__ import division 7 | 8 | from parson import Grammar 9 | 10 | program = {} 11 | 12 | def FP(text): 13 | global program 14 | program = dict(primitives) 15 | program.update(fp_parse(text)) 16 | 17 | def mk_def(name, exp): return (name, exp) 18 | def mk_call(name): return lambda arg: program[name](arg) 19 | def mk_if(c, t, e): return lambda arg: (t if c(arg) else e)(arg) 20 | def mk_compose(g, f): return lambda arg: f(g(arg)) 21 | def mk_map(f): return lambda arg: map(f, arg) 22 | def mk_insertl(f): return lambda arg: insertl(f, arg) 23 | def mk_insertr(f): return lambda arg: insertr(f, arg) 24 | def mk_filter(f): return lambda arg: filter(f, arg) 25 | def mk_aref(n): return (lambda arg: arg[n-1]) if 0 < n else (lambda arg: arg[n]) 26 | def mk_literal(n): return lambda _: n 27 | def mk_op(name): return ops[name] 28 | def mk_list(*exps): return lambda arg: [f(arg) for f in exps] 29 | 30 | escape = lambda s: s.decode('unicode-escape') 31 | 32 | fp_parse = Grammar(r""" def* :end. 33 | 34 | def : name '==' exp '.' :mk_def. 35 | 36 | exp : term ('->' term ';' exp :mk_if)?. 37 | 38 | term : factor (term :mk_compose)?. 39 | 40 | factor : '@' factor :mk_map 41 | | '/' factor :mk_insertr 42 | | '\\' factor :mk_insertl 43 | | '?' factor :mk_filter 44 | | primary. 45 | 46 | primary : integer :mk_aref 47 | | '~' integer :mk_literal 48 | | string :mk_literal 49 | | name :mk_call 50 | | /([<=>*%+-])/~ !opchar '' 51 | :mk_op 52 | | '[' exp ** ',' ']' :mk_list 53 | | '(' exp ')'. 54 | 55 | opchar : /[\w@\/\\?<=>*%+-]/. 56 | 57 | decimal : /(\d+)/ :int. 58 | integer : /(-?\d+)/ :int. 59 | name : /([A-Za-z]\w*)/. 60 | 61 | string ~: '"' schar* '"' FNORD :join. 62 | schar ~: /([^\x00-\x1f"\\])/ 63 | | /\\(["\\])/ 64 | | /(\\[bfnrt])/ :escape. 65 | 66 | FNORD ~: /\s*/. 
67 | 68 | """)(**globals()) 69 | 70 | def insertl(f, xs): 71 | if not xs: return function_identity(f) 72 | return reduce(lambda x, y: f([x, y]), xs) 73 | 74 | def insertr(f, xs): 75 | if not xs: return function_identity(f) 76 | z = xs[-1] 77 | for x in xs[-2::-1]: 78 | z = f([x, z]) 79 | return z 80 | 81 | add = lambda (x, y): x + y 82 | sub = lambda (x, y): x - y 83 | mul = lambda (x, y): x * y 84 | divide = lambda (x, y): x / y 85 | intdiv = lambda (x, y): x // y 86 | mod = lambda (x, y): x % y 87 | eq = lambda (x, y): x == y 88 | lt = lambda (x, y): x < y 89 | gt = lambda (x, y): x > y 90 | 91 | ops = {'+': add, '-': sub, '*': mul, '%': divide, # N.B. '/' is reserved for insertr 92 | '=': eq, '<': lt, '>': gt} 93 | 94 | primitives = dict( 95 | apndl = lambda (x, xs): [x] + xs, 96 | apndr = lambda (xs, x): xs + [x], 97 | chain = lambda lists: sum(lists, []), 98 | distl = lambda (x, ys): [[x, y] for y in ys], 99 | distr = lambda (xs, y): [[x, y] for x in xs], 100 | div = intdiv, 101 | enumerate = lambda xs: [(x, i) for i,x in enumerate(xs, 1)], # XXX unused 102 | id = lambda x: x, 103 | iota = lambda n: range(1, n+1), 104 | join = lambda (strs, sep): sep.join(strs), 105 | length = len, 106 | mod = mod, 107 | rev = lambda xs: xs[::-1], 108 | slice = lambda (xs, n): [xs[:n-1], xs[n-1], xs[n:]], 109 | sort = sorted, 110 | split = lambda (s, sep): s.split(sep), 111 | tl = lambda xs: xs[1:], 112 | transpose = lambda arg: zip(*arg), 113 | ) 114 | primitives['and'] = lambda (x, y): x and y 115 | primitives['or'] = lambda (x, y): x or y 116 | 117 | def function_identity(f): 118 | if f in (add, sub): return 0 119 | if f in (mul, divide, intdiv): return 1 120 | # XXX could add chain, and, or, lt, gt, ... 121 | raise Exception("No known identity element", f) 122 | 123 | 124 | examples = r""" 125 | factorial == iota /*. 126 | 127 | e_sum == [~0, iota] apndl @(factorial [~1, id] %) /+. 128 | 129 | dot == transpose @* \+. 130 | matmult == [1, 2 transpose] distr @distl @@dot. 
131 | 132 | iszero == [id, ~0] =. 133 | divisible == mod iszero. 134 | iseven == [id, ~2] divisible. 135 | 136 | max == /(< -> 2; 1). 137 | 138 | qsort == [length, ~2] < -> id; 139 | [id, 1] distr [?< @1 qsort, ?= @1, ?> @1 qsort] chain. 140 | 141 | euler1 == iota ?([[id, ~3] divisible, [id, ~5] divisible] or) /+. 142 | 143 | fibs == [~40, 1] < -> tl; [[1,2] +, id] apndl fibs. 144 | euler2 == [~2,~1] fibs ?iseven /+. 145 | 146 | fibsr == [~40, -1] < -> rev tl rev; [id, [-1,-2] +] apndr fibsr. 147 | euler2r == [~1,~2] fibsr ?iseven /+. 148 | """ 149 | 150 | def defs(names): return [program[name] for name in names.split()] 151 | 152 | ## FP(examples) 153 | ## factorial, e_sum, dot, matmult = defs('factorial e_sum dot matmult') 154 | ## divisible, euler1 = defs('divisible euler1') 155 | ## qmax, qsort = defs('max qsort') 156 | ## qmax([1, 5, 3]) 157 | #. 5 158 | ## qmax([5, 1]) 159 | #. 5 160 | ## qsort([]) 161 | #. [] 162 | ## qsort([3,1,4,1,5,9]) 163 | #. [1, 1, 3, 4, 5, 9] 164 | 165 | ## fibs, euler2, fibsr = defs('fibs euler2 fibsr') 166 | ## fibs([1,1]) 167 | #. [34, 21, 13, 8, 5, 3, 2, 1, 1] 168 | ## euler2(0) 169 | #. 44 170 | ## fibsr([1,1]) 171 | #. [1, 1, 2, 3, 5, 8, 13, 21, 34] 172 | 173 | ## divisible([9, 5]), divisible([10, 5]), 174 | #. (False, True) 175 | ## euler1(9) 176 | #. 23 177 | 178 | ## factorial(0) 179 | #. 1 180 | ## factorial(5) 181 | #. 120 182 | 183 | ## dot([[1,2], [3,4]]) 184 | #. 11 185 | ## dot([]) 186 | #. 0 187 | 188 | ## matmult([ [], [] ]) 189 | #. [] 190 | ## matmult([ [[4]], [[5]] ]) 191 | #. [[20]] 192 | ## matmult([ [[2,0],[0,2]], [[5,6],[7,8]] ]) 193 | #. [[10, 12], [14, 16]] 194 | ## matmult([ [[0,1],[1,0]], [[5,6],[7,8]] ]) 195 | #. [[7, 8], [5, 6]] 196 | 197 | ## e_sum(20) 198 | #. 2.718281828459045 199 | 200 | 201 | # Inspired by James Morris, "Real programming in functional 202 | # languages", figure 1. 203 | 204 | kwic_program = r""" 205 | kwic == lines split kwiclines lines join. 
206 | 207 | kwiclines == @(words split generate) chain sort @2. 208 | generate == [id, length iota] distl @label. 209 | label == slice [2, 210 | [1, [["<",2,">"] chars join], 3] chain words join]. 211 | 212 | chars == [id, ""]. 213 | words == [id, " "]. 214 | lines == [id, "\n"]. 215 | """ 216 | 217 | ## FP(kwic_program) 218 | ## kwic, = defs('kwic') 219 | ## print kwic("leaves of grass\nflowers of evil") 220 | #. flowers of 221 | #. of evil 222 | #. leaves of 223 | #. of grass 224 | #. flowers evil 225 | #. leaves grass 226 | 227 | 228 | # Prime numbers from 3 to ? 229 | # (adapted from an example from Andy Valencia's C implementation of FP) 230 | primes_program = r""" 231 | primes == candidates ?isprime. 232 | isprime == [id, [id, ~4] div candidates] distl ?divisible isempty. 233 | isempty == [length, ~0] =. 234 | divisible == [mod, ~0] =. 235 | candidates == iota @(double add1). 236 | double == [~2, id] *. 237 | add1 == [id, ~1] +. 238 | """ 239 | ## FP(primes_program) 240 | ## primes, = defs('primes') 241 | ## primes(20) 242 | #. [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41] 243 | -------------------------------------------------------------------------------- /eg_itsy/README.md: -------------------------------------------------------------------------------- 1 | ## Itsy is not C but can translate to it directly. 2 | 3 | An example language inspired by the design goals of Per Vognsen's 4 | [Ion](https://github.com/pervognsen/bitwise/blob/master/notes/ion_motivation.md) 5 | language, but the syntax was slapped together mostly before seeing his 6 | work. (The operator precedences are taken from Ion, though.) Also, I'm 7 | pretty unconcerned about familiarity to C people. 8 | 9 | Not finished, and I dunno if it ever will be. 10 | 11 | The syntax for pointers and arrays (their declaration and use) has 12 | appeared earlier in the SPECS alternative syntax for C++ (Ben Werther 13 | & Damian Conway) and in [Odin](https://github.com/odin-lang/Odin). 
I 14 | didn't get it from them but by following the implications of a remark 15 | in Ritchie's "The Development of the C Language": "Sethi observed that 16 | many of the nested declarations and expressions would become simpler 17 | if the indirection operator had been taken as a postfix operator 18 | instead of prefix, but by then it was too late to change." For this he 19 | references R. Sethi, "Uniform Syntax for Type Expressions and 20 | Declarators." Softw., Pract. Exper. 11 (6): 623-628 21 | (1981). Unfortunately I can't find that paper online. 22 | -------------------------------------------------------------------------------- /eg_itsy/ast.py: -------------------------------------------------------------------------------- 1 | """ 2 | Abstract syntax of itsy. 3 | """ 4 | 5 | from structs import Struct 6 | 7 | 8 | # Declarations 9 | 10 | class To( Struct('pos name signature body')): pass 11 | class Typedef( Struct('pos name type')): pass 12 | class Let( Struct('pos names type opt_exp')): pass 13 | class Enum( Struct('pos opt_name pairs')): pass 14 | class Record( Struct('pos kind opt_name fields')): pass # kind = 'struct' or 'union' 15 | 16 | 17 | # Statements 18 | 19 | class Block( Struct('pos parts')): pass 20 | 21 | class Exp( Struct('pos opt_exp')): pass 22 | class Return( Struct('pos opt_exp')): pass 23 | class Break( Struct('pos ')): pass 24 | class Continue( Struct('pos ')): pass 25 | class While( Struct('pos exp block')): pass 26 | class Do( Struct('pos block exp')): pass 27 | class For( Struct('pos opt_e1 opt_e2 opt_e3 block')): pass 28 | class If_stmt( Struct('pos exp then_ opt_else')): pass 29 | class Switch( Struct('pos exp cases')): pass 30 | 31 | class Case( Struct('pos exps block')): pass 32 | class Default( Struct('pos block')): pass 33 | 34 | 35 | # Types 36 | 37 | class Type_name( Struct('pos name')): pass 38 | class Pointer( Struct('pos type')): pass 39 | class Array( Struct('pos size type')): pass 40 | class Signature( Struct('pos params 
return_type')): pass # params are (type, (name or '')) pairs 41 | 42 | # TODO rename fields like 'type' to 'base_type' or something 43 | 44 | class Int_type( Struct('size signedness')): pass 45 | class Float_type(Struct('size')): pass 46 | 47 | def Void(pos): return Type_name(pos, 'void') # for now, anyway 48 | 49 | def spread_params(names, type_): 50 | return tuple((type_, name) for name in names) 51 | 52 | def chain(*seqs): 53 | return sum(seqs, ()) 54 | 55 | 56 | # Expressions 57 | 58 | class Seq( Struct('e1 e2')): pass 59 | class Assign( Struct('e1 opt_binop e2')): pass 60 | class If_exp( Struct('e1 e2 e3')): pass 61 | class And( Struct('e1 e2')): pass 62 | class Or( Struct('e1 e2')): pass 63 | class Binary_exp( Struct('e1 binop e2')): pass 64 | class Index( Struct('e1 e2')): pass 65 | class Call( Struct('e1 args')): pass 66 | class Dot( Struct('e1 field')): pass 67 | class Deref( Struct('e1')): pass 68 | class Post_incr( Struct('e1 op')): pass 69 | class Cast( Struct('e1 type')): pass 70 | 71 | class Literal( Struct('pos text kind')): pass 72 | class Variable( Struct('pos name')): pass 73 | class Address_of( Struct('pos e1')): pass 74 | class Sizeof_type( Struct('pos type')): pass 75 | class Sizeof( Struct('pos e1')): pass 76 | class Unary_exp( Struct('pos unop e1')): pass 77 | class Pre_incr( Struct('pos e1 op')): pass 78 | class Compound_exp(Struct('pos exps')): pass 79 | -------------------------------------------------------------------------------- /eg_itsy/c_emitter.py: -------------------------------------------------------------------------------- 1 | """ 2 | Emit C code from an AST. 
3 | """ 4 | 5 | from structs import Visitor 6 | import ast 7 | 8 | def indent(s): 9 | return s.replace('\n', '\n ') 10 | 11 | def embrace(lines): 12 | return '{\n %s\n}' % indent('\n'.join(lines)) 13 | 14 | def opt_c_exp(opt_e, if_some='', p=0): 15 | return '' if opt_e is None else if_some + c_exp(opt_e, p) 16 | 17 | def opt_space(opt_s): 18 | return ' %s' % opt_s if opt_s else '' 19 | 20 | 21 | # Declarations and statements (either can appear in a block) 22 | # TODO rename 'declaration' to avoid confusion with c_decl below 23 | 24 | class CEmitter(Visitor): 25 | 26 | def To(self, t): 27 | return '%s %s' % (c_decl(t.signature, t.name), c(t.body)) 28 | 29 | def Typedef(self, t): 30 | return 'typedef %s;' % c_decl(t.type, t.name) 31 | 32 | def Let(self, t): 33 | assert t.opt_exp is None or len(t.names) == 1 34 | assign = opt_c_exp(t.opt_exp, ' = ', elem_prec) 35 | return '\n'.join('%s%s;' % (c_decl(t.type, name), assign) 36 | for name in t.names) 37 | 38 | def Enum(self, t): 39 | enums = ['%s%s,' % (name, opt_c_exp(opt_exp, ' = ')) 40 | for name, opt_exp in t.pairs] 41 | lines = [] 42 | if t.opt_name: 43 | lines.append('typedef enum %s %s;' % (t.opt_name, t.opt_name)) 44 | lines.append('enum%s %s;' % (opt_space(t.opt_name), embrace(enums))) 45 | return '\n'.join(lines) 46 | 47 | def Record(self, t): 48 | lines = [] 49 | if t.opt_name: 50 | lines.append('typedef %s %s %s;' % (t.kind, t.opt_name, t.opt_name)) 51 | c_defn = '%s%s %s;' % (t.kind, 52 | opt_space(t.opt_name), 53 | embrace(c_decl(type_, name) + ';' 54 | for type_, name in t.fields)) 55 | lines.append(c_defn) 56 | return '\n'.join(lines) 57 | 58 | def Block(self, t): 59 | return embrace(map(c, t.parts)); 60 | 61 | def Exp(self, t): 62 | return opt_c_exp(t.opt_exp) + ';' 63 | 64 | def Return(self, t): 65 | return 'return%s;' % opt_c_exp(t.opt_exp, ' ') 66 | 67 | def Break(self, t): 68 | return 'break;' 69 | 70 | def Continue(self, t): 71 | return 'continue;' 72 | 73 | def While(self, t): 74 | return 'while 
(%s) %s' % (c_exp(t.exp), c(t.block)) 75 | 76 | def Do(self, t): 77 | return 'do %s while (%s);' % (c(t.block), c_exp(t.exp)) 78 | 79 | def If_stmt(self, t): 80 | branches = [] 81 | while isinstance(t, ast.If_stmt): 82 | branches.append('if (%s) %s' % (c_exp(t.exp), c(t.then_))) 83 | t = t.opt_else 84 | if t is not None: 85 | branches.append(c(t)) 86 | return ' else '.join(branches) 87 | 88 | def For(self, t): 89 | return 'for (%s; %s; %s) %s' % (opt_c_exp(t.opt_e1), 90 | opt_c_exp(t.opt_e2), 91 | opt_c_exp(t.opt_e3), 92 | c(t.block)) 93 | 94 | def Switch(self, t): 95 | return 'switch (%s) %s' % (c_exp(t.exp), 96 | embrace(map(c, t.cases))) 97 | 98 | def Case(self, t): 99 | cases = '\n'.join('case %s:' % c_exp(e) for e in t.exps) 100 | return '%s %s break;' % (cases, c(t.block)) 101 | 102 | def Default(self, t): 103 | return 'default: %s break;' % c(t.block) 104 | 105 | c = c_emit = CEmitter() 106 | 107 | 108 | # Types 109 | 110 | def c_type(type_): 111 | return c_decl(type_, '') 112 | 113 | def c_decl(type_, name): 114 | return ('%s %s' % decl_pair(type_, name, 0)).rstrip() 115 | 116 | class DeclPair(Visitor): 117 | 118 | def Type_name(self, t, e, p): # e: expression-like C fragment, p: surrounding precedence 119 | return t.name, e 120 | 121 | def Pointer(self, t, e, p): 122 | return self(t.type, 123 | '*%s' % hug(e, p, 2), 124 | 2) 125 | 126 | def Array(self, t, e, p): 127 | return self(t.type, 128 | '%s[%s]' % (hug(e, p, 1), opt_c_exp(t.size)), 129 | 1) 130 | 131 | def Signature(self, t, e, p): 132 | c_params = ', '.join(c_decl(type_, name) for type_, name in t.params) 133 | return self(t.return_type, 134 | '%s(%s)' % (hug(e, p, 1), c_params or 'void'), 135 | 1) 136 | 137 | decl_pair = DeclPair() 138 | 139 | def hug(s, outer, inner): 140 | return s if outer <= inner else '(%s)' % s # XXX is '<=' quite right? instead of '<'? 
141 | 142 | 143 | # Expressions 144 | 145 | def c_exp(e, p=0): # p: surrounding precedence 146 | return c_exp_emitter(e, p) 147 | 148 | class CExpEmitter(Visitor): 149 | 150 | def Literal(self, t, p): 151 | return t.text 152 | 153 | def Variable(self, t, p): 154 | return t.name 155 | 156 | def Address_of(self, t, p): 157 | return fmt1(p, unary_prec, '&%s', t.e1) 158 | 159 | def Sizeof_type(self, t, p): 160 | return 'sizeof(%s)' % c_type(t.type) 161 | 162 | def Sizeof(self, t, p): 163 | return fmt1(p, unary_prec, 'sizeof %s', t.e1) 164 | 165 | def Deref(self, t, p): 166 | return fmt1(p, unary_prec, '*%s', t.e1) 167 | 168 | def Unary_exp(self, t, p): 169 | return fmt1(p, unary_prec, t.unop + '%s', t.e1) 170 | 171 | def Cast(self, t, p): 172 | return wrap(p, cast_prec, '(%s) %s' % (c_type(t.type), 173 | self(t.e1, cast_prec))) 174 | 175 | def Seq(self, t, p): 176 | return fmt2(p, ',', t.e1, t.e2, fmt_str = '%s%s %s') 177 | 178 | def Pre_incr(self, t, p): 179 | return fmt1(p, unary_prec, t.op+'%s', t.e1) 180 | 181 | def Post_incr(self, t, p): 182 | return fmt1(p, postfix_prec, '%s'+t.op, t.e1) 183 | 184 | def If_exp(self, t, p): 185 | lp, rp = binaries['?:'] 186 | return wrap(p, rp, # TODO recheck that rp is the right thing here in place of the usual lp 187 | '%s ? 
%s : %s' % (self(t.e2, lp), 188 | self(t.e1, 0), 189 | self(t.e3, rp))) 190 | 191 | def Assign(self, t, p): 192 | return fmt2(p, (t.opt_binop or '') + '=', t.e1, t.e2) # TODO clumsy 193 | 194 | def Binary_exp(self, t, p): 195 | op = '^' if t.binop == '@' else t.binop 196 | return fmt2(p, op, t.e1, t.e2) 197 | 198 | def Index(self, t, p): 199 | return wrap(p, postfix_prec, 200 | '%s[%s]' % (self(t.e1, postfix_prec), 201 | self(t.e2, 0))) 202 | 203 | def Call(self, t, p): 204 | return wrap(p, postfix_prec, 205 | '%s(%s)' % (self(t.e1, postfix_prec), 206 | ', '.join(self(e, elem_prec) 207 | for e in t.args))) 208 | 209 | def Dot(self, t, p): 210 | if isinstance(t.e1, ast.Deref): 211 | lhs, op = t.e1.e1, '->' 212 | else: 213 | lhs, op = t.e1, '.' 214 | return wrap(p, postfix_prec, self(lhs, postfix_prec) + op + t.field) 215 | 216 | def And(self, t, p): 217 | return fmt2(p, '&&', t.e1, t.e2) 218 | 219 | def Or(self, t, p): 220 | return fmt2(p, '||', t.e1, t.e2) 221 | 222 | def Compound_exp(self, t, p): 223 | # XXX I think C imposes restrictions on where this can appear? 
224 | # Also, the indentation might be awful without more work 225 | return embrace(c_exp(e, elem_prec) + ',' for e in t.exps) 226 | 227 | c_exp_emitter = CExpEmitter() 228 | 229 | 230 | # Parenthesizing by precedence 231 | 232 | infix_precedence_tower = """\ 233 | , 234 | = 235 | ?: 236 | || 237 | && 238 | | 239 | ^ 240 | & 241 | == != 242 | < > <= >= 243 | << >> 244 | + - 245 | * / % 246 | (cast) 247 | (unary) 248 | (postfix)""".splitlines() 249 | 250 | binaries = {op: (2*i, 2*i+1) 251 | for i, line in enumerate(infix_precedence_tower) 252 | for op in line.split()} 253 | cast_prec = binaries['(cast)'][0] 254 | unary_prec = binaries['(unary)'][0] 255 | postfix_prec = binaries['(postfix)'][0] 256 | 257 | elem_prec = binaries['='][0] # The next precedence after ',' 258 | # Make the left precedence of the assignment operator be unary_expression, and right-associative: 259 | binaries['='] = (unary_prec, binaries['='][0]) 260 | 261 | # Also, ?: is also right-associative: 262 | binaries['?:'] = (binaries['?:'][0], binaries['?:'][0]) 263 | 264 | def fmt1(outer, inner, fmt_str, e1): 265 | return wrap(outer, inner, fmt_str % c_exp(e1, inner)) 266 | 267 | def wrap(outer, inner, s): 268 | return '(%s)' % s if inner < outer else s 269 | 270 | def fmt2(p, op, e1, e2, fmt_str='%s %s %s'): 271 | lp, rp = binaries['=' if op.endswith('=') else op] 272 | return wrap(p, lp, fmt_str % (c_exp(e1, lp), op, c_exp(e2, rp))) 273 | -------------------------------------------------------------------------------- /eg_itsy/c_prelude.h: -------------------------------------------------------------------------------- 1 | // Standard prelude for Itsy code translated to C. 
2 | 3 | #include <assert.h> 4 | #include <ctype.h> 5 | #include <errno.h> 6 | #include <limits.h> 7 | #include <math.h> 8 | #include <stdarg.h> 9 | #include <stdbool.h> 10 | #include <stddef.h> 11 | #include <stdint.h> 12 | #include <stdio.h> 13 | #include <stdlib.h> 14 | #include <string.h> 15 | 16 | typedef unsigned int uint; 17 | 18 | typedef int8_t int8; 19 | typedef int16_t int16; 20 | typedef int32_t int32; 21 | typedef int64_t int64; 22 | 23 | typedef uint8_t uint8; 24 | typedef uint16_t uint16; 25 | typedef uint32_t uint32; 26 | typedef uint64_t uint64; 27 | 28 | typedef float float32; 29 | typedef double float64; 30 | 31 | // Definitions from Ion by Per Vognsen. I need these for the Bitwise 32 | // homework for now. In the longer term we'll presumably want a more 33 | // flexible way to bring in external headers/libraries. 34 | 35 | #define MAX(x, y) ((x) >= (y) ? (x) : (y)) 36 | 37 | static void *xrealloc(void *ptr, size_t num_bytes) { 38 | ptr = realloc(ptr, num_bytes); 39 | if (!ptr) { 40 | perror("xrealloc failed"); 41 | exit(1); 42 | } 43 | return ptr; 44 | } 45 | 46 | static void *xmalloc(size_t num_bytes) { 47 | void *ptr = malloc(num_bytes); 48 | if (!ptr) { 49 | perror("xmalloc failed"); 50 | exit(1); 51 | } 52 | return ptr; 53 | } 54 | 55 | static void fatal(const char *fmt, ...) { 56 | va_list args; 57 | va_start(args, fmt); 58 | printf("FATAL: "); 59 | vprintf(fmt, args); 60 | printf("\n"); 61 | va_end(args); 62 | exit(1); 63 | } 64 | 65 | // Stretchy buffers, invented (?) by Sean Barrett 66 | 67 | typedef struct BufHdr { 68 | size_t len; 69 | size_t cap; 70 | char buf[]; 71 | } BufHdr; 72 | 73 | #define buf__hdr(b) ((BufHdr *)((char *)(b) - offsetof(BufHdr, buf))) 74 | 75 | #define buf_len(b) ((b) ? buf__hdr(b)->len : 0) 76 | #define buf_cap(b) ((b) ? buf__hdr(b)->cap : 0) 77 | #define buf_end(b) ((b) + buf_len(b)) 78 | #define buf_sizeof(b) ((b) ? buf_len(b)*sizeof(*b) : 0) 79 | 80 | #define buf_free(b) ((b) ? (free(buf__hdr(b)), (b) = NULL) : 0) 81 | #define buf_fit(b, n) ((n) <= buf_cap(b) ?
0 : ((b) = buf__grow((b), (n), sizeof(*(b))))) 82 | #define buf_push(b, ...) (buf_fit((b), 1 + buf_len(b)), (b)[buf__hdr(b)->len++] = (__VA_ARGS__)) 83 | 84 | static void *buf__grow(const void *buf, size_t new_len, size_t elem_size) { 85 | assert(buf_cap(buf) <= (SIZE_MAX - 1)/2); 86 | size_t new_cap = MAX(16, MAX(1 + 2*buf_cap(buf), new_len)); 87 | assert(new_len <= new_cap); 88 | assert(new_cap <= (SIZE_MAX - offsetof(BufHdr, buf))/elem_size); 89 | size_t new_size = offsetof(BufHdr, buf) + new_cap*elem_size; 90 | BufHdr *new_hdr; 91 | if (buf) { 92 | new_hdr = xrealloc(buf__hdr(buf), new_size); 93 | } else { 94 | new_hdr = xmalloc(new_size); 95 | new_hdr->len = 0; 96 | } 97 | new_hdr->cap = new_cap; 98 | return new_hdr->buf; 99 | } 100 | 101 | // End of prelude. 102 | -------------------------------------------------------------------------------- /eg_itsy/complainer.py: -------------------------------------------------------------------------------- 1 | """ 2 | Format parse errors with a vaguely-friendly display of the position. 3 | TODO Parson itself ought to help with something like this 4 | TODO Position info needs to be a range, not a single coordinate. 5 | Or at least point to the '+' in 'a+a' instead of to the 'a'. 
6 | """ 7 | 8 | from structs import Struct 9 | import sys 10 | 11 | status_ok, status_error = range(2) 12 | 13 | class Complainer(object): 14 | 15 | def __init__(self, text, filename): 16 | self.text = text 17 | self.filename = filename 18 | self.status = status_ok 19 | 20 | def ok(self): 21 | return self.status == status_ok 22 | 23 | def syntax_error(self, exc): 24 | self.complain(exc.failure, "Syntax error") 25 | 26 | def semantic_error(self, plaint, pos): 27 | self.complain((self.text[:pos], self.text[pos:]), plaint) 28 | 29 | def complain(self, (before, after), plaint): 30 | self.status = status_error 31 | line_no = before.count('\n') 32 | prefix = (before+'\n').splitlines()[line_no] 33 | suffix = (after+'\n').splitlines()[0] # XXX what if right on newline? 34 | prefix, suffix = sanitize(prefix), sanitize(suffix) 35 | message = ["%s:%d:%d: %s" % (self.filename, line_no+1, len(prefix), plaint), 36 | ' ' + prefix + suffix, 37 | ' ' + ' '*len(prefix) + '^'] 38 | sys.stderr.write('\n'.join(message) + '\n') 39 | 40 | def sanitize(s): 41 | "Make s predictably printable, sans control characters like tab." 
42 | return ''.join(c if ' ' <= c < chr(127) else ' ' # XXX crude 43 | for c in s) 44 | -------------------------------------------------------------------------------- /eg_itsy/eg/examples.itsy: -------------------------------------------------------------------------------- 1 | let a: int = 5; 2 | 3 | to f(x: int): int { 4 | return x * x; 5 | } 6 | 7 | to fib(n: int): int { 8 | return 1 if n < 2 else fib(n-1) + fib(n-2); 9 | } 10 | 11 | to fact(n: int): int { 12 | let p: int = 1; 13 | for ; 0 < n; --n { 14 | p *= n; 15 | } 16 | return p; 17 | } 18 | 19 | to asm_store_field(address: Address, L, R: int, cell: Cell): void { 20 | assert(address < memory_size); 21 | if VERBOSE { 22 | let temp: [12]int; 23 | unparse_cell(temp, cell); 24 | printf("%4o(%u,%u): %s\n", address, L, R, temp); 25 | } 26 | memory[address] = set_field(cell, make_field_spec(L, R), memory[address]); 27 | } 28 | 29 | to make16(a, b: uint8): int { 30 | return a:int + b:int << 8; 31 | } 32 | 33 | to foo() { 34 | do { 35 | if a == sizeof:int { continue; } 36 | else if b { break; } 37 | } while x-->0; 38 | } 39 | 40 | struct Closure { 41 | f: ^(^Closure, float64) void; 42 | free_var1, free_var2: int; 43 | } 44 | -------------------------------------------------------------------------------- /eg_itsy/eg/regex.itsy: -------------------------------------------------------------------------------- 1 | // star_thompsonlike_lowlevel.py ported to C ported to itsy 2 | // TODO explicit clarity on signed vs. 
unsigned 3 | 4 | enum { loud = 0 } 5 | 6 | to error(plaint: ^char) { 7 | fprintf(stderr, "%s\n", plaint); 8 | exit(1); 9 | } 10 | 11 | enum { max_insns = 8192, accept = 0 } 12 | enum { op_accept, op_eat, op_fork, op_loop, } 13 | 14 | let ninsns: int; 15 | let accepts, ops: [max_insns]uint8; 16 | let arg1, arg2: [max_insns]int16; 17 | 18 | let names: [4]^char = [ // TODO leave out the 4 19 | "win", "eat", "fork", "loop", 20 | ]; 21 | 22 | to dump1(pc: int) { 23 | printf("%c %2u: %-4s ", '*' if accepts[pc] else ' ', pc, names[ops[pc]]); 24 | printf("\n" if pc == accept else "'%c' %d\n" if ops[pc] == op_eat else "%d %d\n", 25 | arg1[pc], arg2[pc]); 26 | } 27 | 28 | to dump() { 29 | let pc: int; 30 | for pc = ninsns-1; 0 <= pc; --pc { 31 | dump1(pc); 32 | } 33 | } 34 | 35 | let occupied: [max_insns]uint8; 36 | 37 | to after(ch: char, start, end: int, next_states: ^^int) { 38 | while start != end { 39 | let r: int = arg1[start]; 40 | let s: int = arg2[start]; 41 | match ops[start] { 42 | on op_eat { 43 | if r == ch && !occupied[s] { 44 | next_states^++^ = s; 45 | occupied[s] = 1; 46 | } 47 | return; 48 | } 49 | on op_fork { 50 | after(ch, r, end, next_states); 51 | start = s; 52 | } 53 | on op_loop { 54 | after(ch, r, start, next_states); 55 | start = s; 56 | } 57 | else { 58 | error("Can't happen"); 59 | } 60 | } 61 | } 62 | } 63 | 64 | let states0, states1: [max_insns]int; 65 | 66 | to run(start: int, input: ^char): int { 67 | if accepts[start] { 68 | return 1; 69 | } 70 | let cur_start, cur_end, next_start, next_end: ^int; 71 | cur_start = states0, cur_end = cur_start; 72 | next_start = states1, next_end = next_start; 73 | cur_end++^ = start; 74 | memset(occupied, 0, ninsns); // N.B. 
we could avoid this by always 75 | // finishing the next_start..next_end 76 | // loop below 77 | 78 | for ; input^; ++input { 79 | let state: ^int; 80 | for state = cur_start; state < cur_end; ++state { 81 | after(input^, state^, accept, &next_end); 82 | } 83 | for state = next_start; state < next_end; ++state { 84 | if accepts[state^] { 85 | return 1; 86 | } 87 | occupied[state^] = 0; 88 | } 89 | let t: ^int = cur_start; 90 | cur_start = next_start, cur_end = next_end; 91 | next_start = next_end = t; 92 | } 93 | return 0; 94 | } 95 | 96 | to emit(op: uint8, r, s: int, accepting: uint8): int { 97 | if max_insns <= ninsns { error("Pattern too long"); } 98 | ops[ninsns] = op, arg1[ninsns] = r, arg2[ninsns] = s; 99 | accepts[ninsns] = accepting; 100 | return ninsns++; 101 | } 102 | 103 | // start, current parsing position 104 | let pattern, pp: ^char; 105 | 106 | to eat(c: char): int { 107 | return (--pp, 1) if pattern < pp && pp[-1] == c else 0; 108 | } 109 | 110 | to parsing(precedence, state: int): int { 111 | let rhs: int; 112 | if pattern == pp || pp[-1] == '(' || pp[-1] == '|' { 113 | rhs = state; 114 | } 115 | else if eat(')') { 116 | rhs = parsing(0, state); 117 | if !eat('(') { error("Mismatched ')'"); } 118 | } 119 | else if eat('*') { 120 | rhs = emit(op_loop, 0, state, accepts[state]); // (0 is a placeholder... 121 | arg1[rhs] = parsing(6, rhs); // ...filled in here.) 
122 | } 123 | else { 124 | rhs = emit(op_eat, (--pp)^, state, 0); 125 | } 126 | while pattern < pp && pp[-1] != '(' { 127 | let prec: int = 3 if pp[-1] == '|' else 5; 128 | if prec <= precedence { break; } 129 | if eat('|') { 130 | let rhs2: int = parsing(prec, state); 131 | rhs = emit(op_fork, rhs, rhs2, accepts[rhs] || accepts[rhs2]); 132 | } 133 | else { 134 | rhs = parsing(prec, rhs); 135 | } 136 | } 137 | return rhs; 138 | } 139 | 140 | to parse(string: ^char): int { 141 | pattern = string; pp = pattern + strlen(pattern); 142 | ninsns = 0; 143 | let state: int = parsing(0, emit(op_accept, 0, 0, 1)); 144 | if pattern != pp { error("Bad pattern"); } 145 | return state; 146 | } 147 | 148 | to main(argc: int, argv: ^^char): int { 149 | if argc != 2 { error("Usage: grep pattern"); } 150 | let start_state: int = parse(argv[1]); 151 | if loud { 152 | printf("start: %u\n", start_state); 153 | dump(); 154 | } 155 | let matched: int = 0; 156 | let line: [9999]char; 157 | while fgets(line, sizeof line, stdin) { 158 | if run(start_state, line) { 159 | fputs(line, stdout); 160 | matched = 1; 161 | } 162 | } 163 | return !matched; 164 | } 165 | -------------------------------------------------------------------------------- /eg_itsy/eg/sieve.itsy: -------------------------------------------------------------------------------- 1 | // Sieve of Eratosthenes benchmark. 
2 | 3 | enum { SIZE = 8190 } 4 | 5 | let flags: [SIZE+1] bool; 6 | 7 | to main(): int { 8 | printf("10 iterations\n"); 9 | let count: int; 10 | let iter: int; 11 | for iter = 1; iter <= 10; ++iter { 12 | count = 0; 13 | let i: int; 14 | for i = 0; i <= SIZE; ++i { flags[i] = true; } 15 | for i = 0; i <= SIZE; ++i { 16 | if flags[i] { 17 | let prime: int = i + i + 3; 18 | let k: int; 19 | for k = i + prime; k <= SIZE; k += prime { 20 | flags[k] = false; 21 | } 22 | ++count; 23 | } 24 | } 25 | } 26 | printf("%d primes\n", count); 27 | return 0; 28 | } 29 | -------------------------------------------------------------------------------- /eg_itsy/eg/superopt.itsy: -------------------------------------------------------------------------------- 1 | // Ported from my superbench repo 2 | 3 | enum { max_wires = 20 } 4 | enum { max_inputs = 5 } 5 | 6 | typedef Word = uint; 7 | 8 | let argv0: ^char = ""; 9 | 10 | to error(plaint: ^char) { 11 | fprintf(stderr, "%s: %s\n", argv0, plaint); 12 | exit(1); 13 | } 14 | 15 | let target_output: Word; 16 | 17 | let ninputs: int; 18 | let mask: Word; 19 | 20 | let found: bool = false; 21 | let nwires: int; 22 | let wires: [max_wires]Word; 23 | let linputs, rinputs: [max_wires]int; 24 | 25 | to vname(w: int): char { 26 | return w + ('A' if w < ninputs else 'a'); 27 | } 28 | 29 | to print_circuit() { 30 | let w: int; 31 | for w = ninputs; w < nwires; ++w { 32 | printf("%s%c = ~(%c %c)", 33 | "" if w == ninputs else "; ", 34 | vname(w), vname(linputs[w]), vname(rinputs[w])); 35 | } 36 | printf("\n"); 37 | } 38 | 39 | to compute(left_input, right_input: Word): Word { 40 | return ~(left_input & right_input); // A ~& operator would be nice. 41 | } 42 | 43 | to sweeping(w: int) { 44 | let ll: int; 45 | for ll = 0; ll < w; ++ll { 46 | let llwire: Word = wires[ll]; 47 | linputs[w] = ll; 48 | if w+1 == nwires { 49 | let rr: int; 50 | for rr = 0; rr <= ll; ++rr { 51 | if mask & compute(llwire, wires[rr]) == target_output { // N.B. 
& precedence 52 | found = true; 53 | rinputs[w] = rr; 54 | print_circuit(); 55 | } 56 | } 57 | } 58 | else { 59 | let rr: int; 60 | for rr = 0; rr <= ll; ++rr { 61 | wires[w] = compute(llwire, wires[rr]); 62 | rinputs[w] = rr; 63 | sweeping(w + 1); 64 | } 65 | } 66 | } 67 | } 68 | 69 | to tabulate_inputs() { 70 | let i: int; 71 | for i = 1; i <= ninputs; ++i { 72 | let shift: Word = 1 << (i-1); 73 | wires[ninputs-i] = (1 << shift) - 1; // XXX 1u // N.B. could leave out parens 74 | let j: int; 75 | for j = ninputs-i+1; j < ninputs; ++j { 76 | wires[j] |= wires[j] << shift; 77 | } 78 | } 79 | } 80 | 81 | to find_circuits(max_gates: int) { 82 | mask = (1 << (1 << ninputs)) - 1; 83 | tabulate_inputs(); 84 | printf("Trying 0 gates...\n"); 85 | if target_output == 0 || target_output == mask { 86 | printf("%c = %d\n", vname(ninputs), target_output & 1); 87 | return; 88 | } 89 | let w: int; 90 | for w = 0; w < ninputs; ++w { 91 | if target_output == wires[w] { 92 | printf("%c = %c\n", vname(ninputs), vname(w)); 93 | return; 94 | } 95 | } 96 | let ngates: int; 97 | for ngates = 1; ngates <= max_gates; ++ngates { 98 | printf("Trying %d gates...\n", ngates); 99 | nwires = ninputs + ngates; 100 | assert(nwires <= 26); // vnames must be letters 101 | sweeping(ninputs); 102 | if found { return; } 103 | } 104 | } 105 | 106 | to parse_uint(s: ^char, base: uint): uint { 107 | let end: ^char; 108 | let u: uint64 = strtoul(s, &end, base); 109 | if u == 0 && errno == EINVAL { 110 | error(strerror(errno)); 111 | } 112 | if end^ != '\0' { 113 | error("Literal has crud in it, or extra spaces, or something"); 114 | } 115 | return u:uint; 116 | } 117 | 118 | to superopt(tt_output: ^char, max_gates: int) { 119 | ninputs = log2(strlen(tt_output)): int; 120 | if 1 << ninputs != strlen(tt_output) { 121 | error("truth_table_output must have a power-of-2 size"); 122 | } 123 | if max_inputs < ninputs { 124 | error("Truth table too big. 
I can't represent so many inputs."); 125 | } 126 | target_output = parse_uint(tt_output, 2); 127 | find_circuits(max_gates); 128 | } 129 | 130 | to main(argc: int, argv: ^^char): int { 131 | argv0 = argv[0]; 132 | assert((1 << (1 << max_inputs)) - 1 <= UINT_MAX); 133 | if argc != 3 { 134 | error("Usage: circuitoptimizer truth_table_output max_gates"); 135 | } 136 | superopt(argv[1], parse_uint(argv[2], 10): int); 137 | return 0; 138 | } 139 | -------------------------------------------------------------------------------- /eg_itsy/eg/um.itsy: -------------------------------------------------------------------------------- 1 | // Interpreter for the "Universal Machine" 2 | // documented at http://boundvariable.org/task.shtml 3 | 4 | // Compile-time options: 5 | 6 | // Turn off safety checks; assume the UM image is neither incorrect 7 | // nor malevolent. 8 | enum { trusting = 0 } 9 | 10 | // Max # of arrays that can be active at once. 11 | enum { max_arrays = 8 * 1024 * 1024 } 12 | 13 | 14 | // Standard helper functions 15 | 16 | to panic(complaint: ^char) { 17 | fprintf(stderr, "%s\n", complaint); 18 | exit(1); 19 | } 20 | 21 | to allot(size: size_t): ^void { 22 | let r: ^void = malloc(size); 23 | if NULL == r && 0 < size { panic(strerror(errno)); } 24 | return r; 25 | } 26 | 27 | to open_file(filename: ^char, mode: ^char): ^FILE { 28 | if 0 == strcmp("-", filename) { 29 | return stdin if 'r' == mode[0] else stdout; 30 | } 31 | let r: ^FILE = fopen(filename, mode); 32 | if NULL == r { panic(strerror(errno)); } 33 | return r; 34 | } 35 | 36 | 37 | // UM state 38 | 39 | typedef u8 = uint8; // maybe these should be the real Itsy names 40 | typedef u32 = uint32; 41 | 42 | typedef Platter = u32; 43 | 44 | let r: [8]Platter; // The registers 45 | 46 | struct Array { 47 | size: u32; 48 | _: ^Platter; 49 | } 50 | 51 | to make_array(nplatters: u32): Array { 52 | return [nplatters, allot(nplatters * sizeof:^Platter)]: Array; // the cast eventually shouldn't be needed 53 | } 
54 | 55 | to fetch(a: Array, i: u32): Platter { 56 | if !trusting && a.size <= i { panic("Fetch out of bounds"); } 57 | return a._[i]; 58 | } 59 | 60 | to store(a: Array, i: u32, p: Platter) { 61 | if !trusting && a.size <= i { panic("Store out of bounds"); } 62 | a._[i] = p; 63 | } 64 | 65 | let first_free: uint; 66 | let next_free: [max_arrays]uint; // 0 value means active 67 | let arrays: [max_arrays]Array; 68 | 69 | to set_up_free_list() { 70 | first_free = ~0; 71 | let i: uint; 72 | for i = max_arrays; 1 < i; --i { 73 | next_free[i - 1] = first_free; 74 | first_free = i - 1; 75 | } 76 | next_free[0] = 0; 77 | } 78 | 79 | to get_array(id: u32): Array { 80 | if trusting || (id < max_arrays && 0 == next_free[id]) { 81 | return arrays[id]; 82 | } 83 | panic("Bad array"); 84 | return arrays[0]; 85 | } 86 | 87 | to allocate(size: u32): u32 { 88 | let i: uint = first_free; 89 | if ~0:uint == i { panic("Out of arrays"); } 90 | first_free = next_free[i]; 91 | next_free[i] = 0; 92 | arrays[i] = make_array(size); 93 | memset(arrays[i]._, 0, size * sizeof arrays[i]._[0]); 94 | return i; 95 | } 96 | 97 | to abandon(id: u32) { 98 | let a: Array = get_array(id); 99 | free(a._); 100 | next_free[id] = first_free; 101 | first_free = id; 102 | if !trusting && 0 == id { panic("Abandoned 0"); } 103 | } 104 | 105 | to duplicate(id: u32) { 106 | if 0 != id { 107 | let a: Array = get_array(id); 108 | free(arrays[0]._); 109 | arrays[0] = make_array(a.size); 110 | memcpy(arrays[0]._, a._, a.size * sizeof a._[0]); 111 | } 112 | } 113 | 114 | enum Opcodes { 115 | cond_move, 116 | array_index, 117 | array_amend, 118 | add, 119 | mult, 120 | division, 121 | not_and, 122 | halt, 123 | alloc, 124 | abandonment, 125 | output, 126 | input, 127 | load_program, 128 | orthography, 129 | } 130 | 131 | to spin_cycle() { 132 | let finger: u32 = 0; 133 | 134 | for ;; { 135 | let insn: Platter = fetch(arrays[0], finger); 136 | ++finger; 137 | 138 | // These unfortunately for speed are not quite always 
used. 139 | // In the C version they were macros: 140 | let a: uint = 7 & (insn >> 6); 141 | let b: uint = 7 & (insn >> 3); 142 | let c: uint = 7 & (insn >> 0); 143 | 144 | match insn >> 28 { 145 | on cond_move { 146 | if 0 != r[c] { 147 | r[a] = r[b]; 148 | } 149 | } 150 | on array_index { 151 | r[a] = fetch(get_array(r[b]), r[c]); 152 | } 153 | on array_amend { 154 | store(get_array(r[a]), r[b], r[c]); 155 | } 156 | on add { 157 | r[a] = r[b] + r[c]; 158 | } 159 | on mult { 160 | r[a] = r[b] * r[c]; 161 | } 162 | on division { 163 | r[a] = r[b] / r[c]; 164 | } 165 | on not_and { 166 | r[a] = ~(r[b] & r[c]); 167 | } 168 | on halt { 169 | return; 170 | } 171 | on alloc { 172 | r[b] = allocate(r[c]); 173 | } 174 | on abandonment { 175 | abandon(r[c]); 176 | } 177 | on output { 178 | putchar(r[c]); 179 | } 180 | on input { 181 | fflush(stdout); 182 | let ch: int = getchar(); 183 | r[c] = ~0 if EOF == ch else 0xff & ch; 184 | } 185 | on load_program { 186 | duplicate(r[b]); 187 | finger = r[c]; 188 | } 189 | on orthography { 190 | let a1: uint = 7 & (insn >> (32 - 7)); 191 | r[a1] = insn & 0x01ffffff; 192 | } 193 | else { 194 | panic("Unknown instruction"); 195 | } 196 | } 197 | } 198 | } 199 | 200 | 201 | // Loading and running the image 202 | 203 | to make_platter(a, b, c, d: u8): u32 { 204 | return a<<24 | b<<16 | c<<8 | d; 205 | } 206 | 207 | to file_size_and_rewind(f: ^FILE): size_t { 208 | fseek(f, 0, SEEK_END); 209 | let rc: long = ftell(f); 210 | rewind(f); 211 | return rc: size_t; 212 | } 213 | 214 | to read_zero(f: ^FILE) { 215 | let size: size_t = file_size_and_rewind(f); 216 | assert(0 == size % 4); 217 | arrays[0] = make_array(size / 4); 218 | let i: u32 = 0; 219 | for i = 0; i < size / 4; ++i { 220 | let a: int = fgetc(f); 221 | let b: int = fgetc(f); 222 | let c: int = fgetc(f); 223 | let d: int = fgetc(f); 224 | assert(EOF != a && EOF != b && EOF != c && EOF != d); 225 | arrays[0]._[i] = make_platter(a, b, c, d); 226 | } 227 | assert(EOF == fgetc(f)); 
228 | } 229 | 230 | to main(argc: int, argv: ^^char): int { 231 | if 2 != argc { panic("Usage: vm filename"); } 232 | set_up_free_list(); 233 | let f: ^FILE = open_file(argv[1], "rb"); 234 | read_zero(f); 235 | fclose(f); 236 | spin_cycle(); 237 | return 0; 238 | } 239 | -------------------------------------------------------------------------------- /eg_itsy/error_tests/bad.itsy: -------------------------------------------------------------------------------- 1 | // Not actual Itsy code; for testing compiler error handling. 2 | 3 | to main(argc: int, argv: ^^char): int { 4 | gooble blarg printf("Hello, sailor!\n"); 5 | return 0; 6 | } 7 | -------------------------------------------------------------------------------- /eg_itsy/error_tests/bad2.itsy: -------------------------------------------------------------------------------- 1 | // Not actual Itsy code; for testing compiler error handling. 2 | 3 | to main(argc: int, argv: ^^char): int { 4 | printf("Hello, sailor!\n"); 5 | let a, b: int = [1, 2]; // Illegal 6 | return 0; 7 | } 8 | -------------------------------------------------------------------------------- /eg_itsy/error_tests/lvalues.itsy: -------------------------------------------------------------------------------- 1 | to f() { 2 | let a: ^int = NULL; 3 | (a + a) = 42; 4 | } 5 | -------------------------------------------------------------------------------- /eg_itsy/grammar: -------------------------------------------------------------------------------- 1 | # Something like C but re-syntaxed. 2 | 3 | top 4 | : '' declaration* :end. 5 | 6 | 7 | # Declarations 8 | 9 | declaration 10 | : function_definition 11 | | decl 12 | . 13 | 14 | function_definition: :position ( 15 | "to" id 16 | [:position '(' [param_decl**',' :chain] ')' [':' type | :position :Void] :Signature] 17 | block :To 18 | ). 19 | 20 | param_decl 21 | : id++',' :hug ':' type :spread_params 22 | . 23 | 24 | decl 25 | : type_decl 26 | | var_decl 27 | . 
28 | 29 | type_decl: :position ( 30 | "typedef" id '=' type ';' :Typedef 31 | | "enum" (id | :None) '{' 32 | [enumerator**',' :hug] ','? 33 | '}' :Enum 34 | | "struct" :'struct' id '{' [field* :chain] '}' :Record 35 | | "union" :'union' id '{' [field* :chain] '}' :Record 36 | ). 37 | 38 | field 39 | : param_decl ';' 40 | . 41 | 42 | enumerator 43 | : id ('=' elem_exp | :None) :hug 44 | . 45 | 46 | var_decl: :position ( 47 | "let" [id++',' :hug] ':' type ('=' elem_exp | :None) ';' :Let 48 | ). 49 | 50 | 51 | # Types 52 | 53 | type: :position ( 54 | '^' type :Pointer 55 | | '[' (exp | :None) ']' type :Array 56 | | '(' [[type :'' :hug]**',' :hug] ')' type :Signature 57 | | "void" :Void 58 | | id :Type_name 59 | ). 60 | 61 | 62 | # Statements 63 | 64 | block: :position ( 65 | '{' [(decl | statement)* :hug] '}' :Block 66 | ). 67 | 68 | statement 69 | : block 70 | | if_stmt 71 | | :position ( 72 | "while" exp block :While 73 | | "do" block "while" exp ';' :Do 74 | | "for" opt_exp ';' opt_exp ';' opt_exp block 75 | :For 76 | | "continue" ';' :Continue 77 | | "break" ';' :Break 78 | | "return" opt_exp ';' :Return 79 | | "match" exp '{' [case* :hug] '}' :Switch 80 | | opt_exp ';' :Exp 81 | ) 82 | . 83 | opt_exp : exp | :None. 84 | 85 | if_stmt : :position ( 86 | "if" exp block ( "else" (if_stmt | block) 87 | | :None ) :If_stmt 88 | ). 89 | 90 | case: :position ( 91 | "on" [elem_exp++',' :hug] block :Case 92 | | "else" block :Default 93 | ). 94 | 95 | 96 | # Expressions 97 | 98 | exp 99 | : assignment_exp (',' assignment_exp :Seq)* 100 | . 101 | 102 | elem_exp = assignment_exp. # "no comma" expression. 103 | 104 | assignment_exp 105 | : if_exp (assignment_operator assignment_exp :Assign)? 106 | . 107 | 108 | assignment_operator 109 | : '=' :None 110 | | '*=' :'*' 111 | | '/=' :'/' 112 | | '%=' :'%' 113 | | '+=' :'+' 114 | | '-=' :'-' 115 | | '<<=' :'<<' 116 | | '>>=' :'>>' 117 | | '&=' :'&' 118 | | '@=' :'@' 119 | | '|=' :'|' 120 | . 
121 | 122 | if_exp : logical_or_exp ("if" logical_or_exp "else" if_exp :If_exp)?. 123 | 124 | logical_or_exp 125 | : logical_and_exp ('||' logical_and_exp :Or)* 126 | . 127 | 128 | logical_and_exp 129 | : exp3 ('&&' logical_and_exp :And)* 130 | . 131 | 132 | exp3 133 | : exp4 ( '==' :'==' exp4 :Binary_exp 134 | | '!=' :'!=' exp4 :Binary_exp 135 | | '<=' :'<=' exp4 :Binary_exp 136 | | '>=' :'>=' exp4 :Binary_exp 137 | | '<' !'=' :'<' exp4 :Binary_exp 138 | | '>' !'=' :'>' exp4 :Binary_exp 139 | )* 140 | . 141 | 142 | exp4 143 | : exp5 ( '+' !/[+=]/ :'+' exp5 :Binary_exp 144 | | '-' !/[-=]/ :'-' exp5 :Binary_exp 145 | | '|' !/[|=]/ :'|' exp5 :Binary_exp 146 | | '@' !'=' :'@' exp5 :Binary_exp 147 | )* 148 | . 149 | 150 | exp5 151 | : exp6 ( '*' !'=' :'*' exp6 :Binary_exp 152 | | '/' !/[=\/]/ :'/' exp6 :Binary_exp 153 | | '%' !'=' :'%' exp6 :Binary_exp 154 | | '&' !/[&=]/ :'&' exp6 :Binary_exp 155 | | '<<' !'=' :'<<' exp6 :Binary_exp 156 | | '>>' !'=' :'>>' exp6 :Binary_exp 157 | )* 158 | . 159 | 160 | exp6 : unary_exp ( ':' type :Cast )*. 161 | 162 | unary_exp 163 | : :position ( 164 | '++' unary_exp :'++' :Pre_incr 165 | | '--' unary_exp :'--' :Pre_incr 166 | | '&' !/[&=]/ unary_exp :Address_of 167 | | unary_operator unary_exp :Unary_exp 168 | | "sizeof" ( ':' type :Sizeof_type 169 | | unary_exp :Sizeof ) 170 | ) 171 | | postfix_exp 172 | . 173 | 174 | unary_operator 175 | : '-' !'-' :'-' 176 | | '~' :'~' 177 | | '!' !'=' :'!' 178 | . 179 | 180 | postfix_exp 181 | : primary_exp 182 | ( '[' exp ']' :Index 183 | | '(' [elem_exp**',' :hug] ')' :Call 184 | | '.' id :Dot 185 | | '^' :Deref 186 | | '++' :'++' :Post_incr 187 | | '--' :'--' :Post_incr 188 | )* 189 | . 190 | 191 | primary_exp 192 | : '(' exp ')' 193 | | :position ( 194 | id :Variable 195 | | integer :'integer' :Literal 196 | | string_literal :'string' :Literal 197 | | char_literal :'char' :Literal 198 | | '[' [elem_exp**',' :hug] ','? ']' :Compound_exp 199 | ) 200 | . 
201 | 202 | string_literal ~: /("[^"]*")/ FNORD. # TODO 203 | char_literal ~: /('[^']*')/ FNORD. # TODO 204 | 205 | FNORD ~: whitespace*. 206 | whitespace ~: /\s+/ | comment. 207 | comment ~: /\/\/.*/. 208 | keyword ~: /break|continue|do|else|enum|for|if|let|match|on|return|sizeof|struct|to|typedef|union|void|while/ /\b/. 209 | 210 | id: !keyword /([A-Za-z_][A-Za-z_0-9]*)/. 211 | integer: /(0x[0-9A-Fa-f]+|\d+)/. # TODO negative too; and unsigned, etc. 212 | -------------------------------------------------------------------------------- /eg_itsy/itsy.py: -------------------------------------------------------------------------------- 1 | """ 2 | Tie the modules together into a compiler. 3 | """ 4 | 5 | import ast 6 | from parson import Grammar, Unparsable 7 | from complainer import Complainer 8 | from c_emitter import c_emit 9 | import primitives 10 | import typecheck 11 | import sys 12 | 13 | with open('grammar') as f: 14 | grammar_source = f.read() 15 | parser = Grammar(grammar_source).bind(ast) 16 | 17 | with open('c_prelude.h') as f: 18 | c_prelude = f.read() 19 | 20 | 21 | def main(argv): 22 | assert 2 <= len(argv), "usage: %s source_file.itsy [output_file.c]" % argv[0] 23 | return to_c_main(*argv[1:]) 24 | 25 | def to_c_main(filename, out_filename=None): 26 | if out_filename is None: 27 | out_filename = filename[:-5] + filename[-5:].replace('.itsy', '') + '.c' 28 | with open(filename) as f: 29 | text = f.read() 30 | opt_c = c_from_itsy(Complainer(text, filename)) 31 | if opt_c is None: 32 | return 1 33 | with open(out_filename, 'w') as f: 34 | f.write(opt_c) 35 | return 0 36 | 37 | def c_from_itsy(complainer): 38 | try: 39 | defs = parser.top(complainer.text) 40 | except Unparsable as exc: 41 | complainer.syntax_error(exc) 42 | return None 43 | typecheck.check(defs, primitives.prims, complainer) 44 | if not complainer.ok(): 45 | return None 46 | c_defs = map(c_emit, defs) 47 | return c_prelude + '\n' + '\n\n'.join(c_defs) + '\n' 48 | 49 | 50 | if __name__ == 
'__main__': 51 | sys.exit(main(sys.argv)) 52 | -------------------------------------------------------------------------------- /eg_itsy/primitives.py: -------------------------------------------------------------------------------- 1 | """ 2 | Built-in global definitions 3 | """ 4 | 5 | from ast import Int_type, Float_type 6 | 7 | prims = {} 8 | 9 | prims['int8'] = Int_type(1, 'i') 10 | prims['int16'] = Int_type(2, 'i') 11 | prims['int32'] = Int_type(4, 'i') 12 | prims['int64'] = Int_type(8, 'i') 13 | 14 | prims['uint8'] = Int_type(1, 'u') 15 | prims['uint16'] = Int_type(2, 'u') 16 | prims['uint32'] = Int_type(4, 'u') 17 | prims['uint64'] = Int_type(8, 'u') 18 | 19 | prims['float32'] = Float_type(4) 20 | prims['float64'] = Float_type(8) 21 | 22 | # XXX platform-dependent; make configurable or something 23 | # XXX check that these match my C compiler's defs 24 | prims['bool'] = prims['uint8'] 25 | prims['char'] = prims['int8'] 26 | prims['int'] = prims['int64'] 27 | prims['uint'] = prims['uint64'] 28 | prims['size_t'] = prims['uint64'] 29 | 30 | #prims['true'] = XXX 31 | 32 | # TODO: defs from c_prelude.h 33 | -------------------------------------------------------------------------------- /eg_itsy/reref.sh: -------------------------------------------------------------------------------- 1 | # Reset the reference outputs to match the current outputs. 2 | 3 | cd eg 4 | for f in *.c; do 5 | cp ${f} ${f}.ref 6 | done 7 | -------------------------------------------------------------------------------- /eg_itsy/structs.py: -------------------------------------------------------------------------------- 1 | """ 2 | Define a named-tuple-like type, but simpler. 3 | Also Visitor to dispatch on datatypes defined this way. 
4 | """ 5 | 6 | # TODO figure out how to use __slots__ 7 | 8 | def Struct(field_names, name=None, supertype=(object,)): 9 | if isinstance(field_names, (str, unicode)): 10 | field_names = tuple(field_names.split()) 11 | 12 | if name is None: 13 | name = 'Struct<%s>' % ','.join(field_names) 14 | def get_name(self): return self.__class__.__name__ 15 | else: 16 | def get_name(self): return name 17 | 18 | def __init__(self, *args): 19 | if len(field_names) != len(args): 20 | raise TypeError("%s takes %d arguments (%d given)" 21 | % (get_name(self), len(field_names), len(args))) 22 | self.__dict__.update(zip(field_names, args)) 23 | 24 | def __repr__(self): 25 | return '%s(%s)' % (get_name(self), ', '.join(repr(getattr(self, f)) 26 | for f in field_names)) 27 | 28 | # (for use with pprint) 29 | def my_as_sexpr(self): # XXX better name? 30 | return (get_name(self),) + tuple(as_sexpr(getattr(self, f)) 31 | for f in field_names) 32 | my_as_sexpr.__name__ = 'as_sexpr' 33 | 34 | return type(name, 35 | supertype, 36 | dict(__init__=__init__, 37 | __repr__=__repr__, 38 | as_sexpr=my_as_sexpr, 39 | _meta_fields=field_names)) 40 | 41 | def as_sexpr(obj): 42 | if hasattr(obj, 'as_sexpr'): 43 | return getattr(obj, 'as_sexpr')() 44 | elif isinstance(obj, list): 45 | return map(as_sexpr, obj) 46 | elif isinstance(obj, tuple): 47 | return tuple(map(as_sexpr, obj)) 48 | else: 49 | return obj 50 | 51 | 52 | # Is there a nicer way to do this? 
53 | 54 | class Visitor(object): 55 | def __call__(self, subject, *args): 56 | tag = subject.__class__.__name__ 57 | method = getattr(self, tag, None) 58 | if method is None: 59 | try: 60 | method = getattr(self, 'default') 61 | except AttributeError: 62 | raise AttributeError("%r has no method for %r argument %r" % (self, tag, subject)) 63 | return method(subject, *args) 64 | -------------------------------------------------------------------------------- /eg_itsy/testme.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Automated test. 3 | 4 | python -m coverage erase 5 | 6 | for f in error_tests/*.itsy; do 7 | echo 8 | echo "Should fail:" ${f} 9 | if python -m coverage run --source=. -a itsy.py ${f}; then 10 | echo "Didn't fail!" 11 | fi 12 | done 13 | 14 | for f in eg/*.itsy; do 15 | echo 16 | echo "To C:" ${f} 17 | if python -m coverage run --source=. -a itsy.py ${f}; then 18 | echo -n # Expected success (btw what's a no-op in bash?) 19 | else 20 | echo "Failed!" 21 | fi 22 | fc=${f%.*}.c 23 | if test -f ${fc}.ref; then 24 | diff -u ${fc}.ref ${fc} 25 | # TODO raise error at exit if there was a diff 26 | else 27 | echo ' (No ref)' 28 | fi 29 | done 30 | 31 | echo 32 | echo 'Halping halpme' 33 | python -m coverage run --source=. -a pyhalp.py ' # XXX 11 | 12 | def mk_empty(): return empty 13 | 14 | def fnordify(peg): return peg #+ '' 15 | 16 | meta_grammar = r""" rule+ :end | anon :end. 17 | 18 | anon : :None :'' [:'' :literal fnordly pe :chain] :hug ('.' rule*)?. 19 | 20 | rule : name {'~'?} ('=' pe 21 | |':'~whitespace [pe :seclude]) 22 | '.' :hug. 23 | 24 | pe : term ('|' pe :either)? 25 | | :mk_empty. 26 | term : factor (term :chain)?. 27 | factor : '!' factor :invert 28 | | primary ('**' primary :star 29 | |'++' primary :plus 30 | |'*' :star 31 | |'+' :plus 32 | |'?' :maybe)?. 
33 | primary : '(' pe ')' 34 | | '[' pe ']' :seclude 35 | | '{' pe '}' :capture 36 | | qstring :literal fnordly 37 | | dqstring :literal fnordly 38 | | regex :match fnordly 39 | | ':'~( name :meta_mk_feed 40 | | qstring :push) 41 | | name :meta_mk_rule_ref. 42 | 43 | fnordly = ('~' | :fnordify). 44 | 45 | name : /([A-Za-z_]\w*)/. 46 | 47 | FNORD ~: whitespace?. 48 | whitespace ~: /(?:\s|#.*)+/. 49 | 50 | qstring ~: /'/ quoted_char* /'/ FNORD :join. 51 | dqstring ~: '"' dquoted_char* '"' FNORD :join. 52 | regex ~: '/' regex_char* '/' FNORD :join. 53 | 54 | quoted_char ~: /\\(.)/ | /([^'])/. 55 | dquoted_char~: /\\(.)/ | /([^"])/. 56 | regex_char ~: /(\\.)/ | /([^\/])/. 57 | """ 58 | 59 | metapeg = Grammar(meta_grammar)(**globals()) 60 | ## for k, af, v in metapeg(meta_grammar): print k, af, v 61 | #. None (literal('') ((('')+ :end)|('' :end))) 62 | #. anon [(:None (push('') ([(push('') (:literal ('' ('' :chain))))] (:hug ((literal('.') ('')*))?))))] 63 | #. rule [('' (capture((literal('~'))?) (((literal('=') '')|(literal(':') ('' [('' :seclude)]))) (literal('.') :hug))))] 64 | #. pe [(('' ((literal('|') ('' :either)))?)|:mk_empty)] 65 | #. term [('' (('' :chain))?)] 66 | #. factor [((literal('!') ('' :invert))|('' (((literal('**') ('' :star))|((literal('++') ('' :plus))|((literal('*') :star)|((literal('+') :plus)|(literal('?') :maybe))))))?))] 67 | #. primary [((literal('(') ('' literal(')')))|((literal('[') ('' (literal(']') :seclude)))|((literal('{') ('' (literal('}') :capture)))|(('' (:literal ''))|(('' (:literal ''))|(('' (:match ''))|((literal(':') (('' :meta_mk_feed)|('' :push)))|('' :meta_mk_rule_ref))))))))] 68 | #. fnordly (literal('~')|:fnordify) 69 | #. name [/([A-Za-z_]\w*)/] 70 | #. FNORD ~ [('')?] 71 | #. whitespace ~ [/(?:\s|#.*)+/] 72 | #. qstring ~ [(/'/ (('')* (/'/ ('' :join))))] 73 | #. dqstring ~ [(literal('"') (('')* (literal('"') ('' :join))))] 74 | #. regex ~ [(literal('/') (('')* (literal('/') ('' :join))))] 75 | #. 
quoted_char ~ [(/\\(.)/|/([^'])/)] 76 | #. dquoted_char ~ [(/\\(.)/|/([^"])/)] 77 | #. regex_char ~ [(/(\\.)/|/([^\/])/)] 78 | -------------------------------------------------------------------------------- /eg_microses.py: -------------------------------------------------------------------------------- 1 | """ 2 | Let's try and port 3 | ~/othergit/quasiParserGenerator/test/microses/microses.es6 4 | """ 5 | 6 | from parson import Grammar 7 | 8 | g = r""" body :end. 9 | 10 | # Exclude "arguments" from IDENT in microses. 11 | # XXX defnordify 12 | RESERVED_WORD: 13 | KEYWORD | ES6_ONLY_KEYWORD | FUTURE_RESERVED_WORD 14 | | "arguments". 15 | 16 | KEYWORD: "false" | "true" 17 | | "break" | "case" | "catch" | "const" 18 | | "debugger" | "default" | "delete" 19 | | "else" | "export" | "finally" 20 | | "for" | "if" | "import" 21 | | "return" | "switch" | "throw" | "try" 22 | | "typeof" | "void" | "while". 23 | 24 | # We enumerate these anyway, in order to exclude them from the 25 | # IDENT token. 26 | ES6_ONLY_KEYWORD: 27 | "null" | "class" | "continue" | "do" | "extends" 28 | | "function" | "in" | "instanceof" | "new" | "super" 29 | | "this" | "var" | "with" | "yield". 30 | 31 | FUTURE_RESERVED_WORD: 32 | "enum" | "await" 33 | | "implements" | "interface" | "package" 34 | | "private" | "protected" | "public". 35 | 36 | 37 | primaryExpr: (NUMBER | STRING | {"true" | "false" | "null"}) :Data # XXX do it my way instead? also, "null" was missing in the original 38 | | '[' arg ** ',' ']' :hug :Array 39 | | '{' prop ** ',' '}' :hug :Object 40 | | quasiExpr 41 | | '(' expr ')' 42 | | IDENT :Variable 43 | | HOLE :ExprHole. 44 | 45 | pattern: (NUMBER | STRING | {"true" | "false" | "null"}) :MatchData # XXX ditto 46 | | '[' param ** ',' ']' :hug :MatchArray 47 | | '{' propParam ** ',' '}' :hug :MatchObject 48 | | IDENT :MatchVariable 49 | | HOLE :PatternHole. 50 | 51 | arg: '...' expr :Spread 52 | | expr. 53 | 54 | param: '...' 
pattern :Rest 55 | | IDENT '=' expr :Optional 56 | | pattern. 57 | 58 | # No method definition. XXX why not? 59 | prop: '...' expr :SpreadObj 60 | | key ':' expr :Prop 61 | | IDENT :None :Prop. 62 | 63 | propParam: '...' pattern :RestObj 64 | | key ':' pattern :MatchProp 65 | | IDENT '=' expr :OptionalProp 66 | | IDENT :None :MatchProp. # ditto 67 | 68 | key: IDENT | RESERVED_WORD | STRING | NUMBER 69 | | '[' expr ']' :Computed. 70 | 71 | # XXX look up es6 quasiliteral syntax 72 | quasiExpr ~: '`' qfill ** qhole '`' FNORD :hug :Quasi. 73 | qfill ~: {(!'`' !'${' :anyone)*}. 74 | qhole ~: '${' FNORD expr '}'. 75 | 76 | later: NO_NEWLINE '!'. 77 | 78 | # No "new", "super", or MetaProperty. Without "new" we don't need 79 | # separate MemberExpr and CallExpr productions. 80 | postExpr: primaryExpr postOp*. 81 | postOp = '.' IDENT :Get 82 | | '[' expr ']' :Index 83 | | '(' [arg**',' :hug] ')' :Call 84 | | quasiExpr :Tag 85 | | later ( IDENT :GetLater 86 | | '[' expr ']' :IndexLater 87 | | '(' [arg**',' :hug] ')' :CallLater 88 | | quasiExpr :TagLater ). 89 | 90 | preExpr: "delete" fieldExpr :Delete 91 | | preOp preExpr :UnaryOp 92 | | postExpr. 93 | 94 | # No prefix or postfix "++" or "--". 95 | preOp: { "void" | "typeof" | '+' | '-' | '!' }. # XXX strip 96 | 97 | # No bitwise operators, "instanceof", or "in". Unlike ES6, none 98 | # of the relational operators associate. To help readers, mixing 99 | # relational operators always requires explicit parens. 100 | multExpr: preExpr ({'*' | '|' | '%'} preExpr :BinaryOp)*. 101 | addExpr: multExpr ({'+' | '-'} multExpr :BinaryOp)*. 102 | relExpr: addExpr (relOp addExpr :BinaryOp)?. 103 | relOp: { '<=' | '>=' | '<' | '>' | '===' | '!==' }. 104 | andThenExpr: relExpr ('&&' relExpr :AndThen)*. 105 | orElseExpr: andThenExpr ('||' andThenExpr :OrElse)*. 106 | 107 | # No trinary ("?:") expression 108 | # No comma expression, so assignment expression is expr. 109 | expr: lValue assignOp expr :Assign 110 | | arrow 111 | | orElseExpr. 
112 | 113 | lValue: fieldExpr | IDENT. # (out of order in the original) 114 | 115 | fieldExpr: primaryExpr 116 | ( '.' IDENT :Get 117 | | '[' expr ']' :Index 118 | | later ( '.' IDENT :GetLater 119 | | '[' expr ']' :IndexLater ) ). 120 | 121 | assignOp: {'=' !'=' # (the ! is implicit in the original?) 122 | | '*=' | '/=' | '%=' | '+=' | '-='}. 123 | 124 | arrow: params :hug NO_NEWLINE '=>' ( block :Arrow 125 | | expr :Lambda ). 126 | 127 | params: IDENT 128 | | '(' param ** ',' ')'. 129 | 130 | # No "var", empty statement, "continue", "with", "do/while", 131 | # "for/in", or labelled statement. None of the insane variations 132 | # of "for". Only blocks are accepted for flow-of-control 133 | # statements. 134 | statement: block 135 | | "if" '(' expr ')' 136 | block 137 | ("else" block | :None) :If 138 | | "for" '(' declaration 139 | (expr|:None) ';' 140 | (expr|:None) ')' 141 | block :For 142 | | "for" '(' declOp binding "of" expr ')' 143 | block :ForOf 144 | | "while" '(' expr ')' block :While 145 | | "try" block catcher (finalizer|:None) :Try 146 | | "try" block :None finalizer :Try 147 | | "switch" '(' expr ')' 148 | '{' [branch* :hug] '}' :Switch 149 | | terminator 150 | | "debugger" ';' :Debugger 151 | | expr ';'. 152 | 153 | # Each case branch must end in a terminating statement. No 154 | # labelled break. 155 | terminator: "return" NO_NEWLINE expr ';' :Return 156 | | "return" :None ';' :Return 157 | | "break" ';' :Break 158 | | "throw" expr ';' :Throw. 159 | 160 | # no "function", generator, or "class" declaration. 161 | declaration: declOp [binding ** ',' :hug] ';' :Decl. 162 | declOp: {"const"|"let"}. # XXX must be stripped 163 | # Initializer is mandatory 164 | binding: pattern '=' expr :hug. 165 | 166 | catcher: "catch" '(' pattern ')' block :hug. 167 | finalizer: "finally" block. 168 | 169 | branch: caseLabel+ :hug 170 | '{' body terminator '}' :Branch. 171 | caseLabel: "case" expr ':' :Case 172 | | "default" ':' :Default. 
173 | 174 | block: '{' body '}' :Block. 175 | body: (statement | declaration)* :hug. 176 | 177 | FNORD ~: space*. 178 | space ~: /\s+|\/\/.*/ | '/*' (!'*/' :anyone)* '*/'. 179 | 180 | NO_NEWLINE ~: . # XXX 181 | HOLE ~: 'XXX I will be a hole'. 182 | 183 | NUMBER : /(\d+)/. # XXX 184 | STRING ~: /"([^"]*)"/ FNORD # XXX 185 | | /'([^']*)'/ FNORD. 186 | IDENT ~: !RESERVED_WORD {IdentifierName} FNORD. 187 | 188 | # XXX incomplete 189 | IdentifierName ~= IdentifierStart IdentifierPart*. 190 | IdentifierStart ~= UnicodeLetter 191 | | '$' 192 | | '_'. 193 | IdentifierPart ~= IdentifierStart. 194 | UnicodeLetter ~= /[A-Za-z]/. 195 | 196 | """ 197 | 198 | #import sys; sys.setrecursionlimit(5000) 199 | gr = Grammar(g) 200 | import microses 201 | grr = gr.bind(microses).expecting_one_result() 202 | 203 | def test(filename): 204 | from pprint import pprint 205 | print 'testing', filename 206 | with open(filename) as f: 207 | text = f.read() 208 | result = grr(text) 209 | for form in result: 210 | pprint(form.as_sexpr()) 211 | 212 | ## grr('a.b =c;') 213 | #. (Assign(Get(Variable('a'), 'b'), '=', Variable('c')),) 214 | 215 | ## test('es6/a.es6') 216 | #. testing es6/a.es6 217 | #. ('Decl', 218 | #. 'let ', 219 | #. ((('MatchVariable', 'a'), 220 | #. ('BinaryOp', 221 | #. ('Get', ('Variable', 'module'), 'exports'), 222 | #. '*', 223 | #. ('Data', '2'))),)) 224 | #. ('Assign', ('Get', ('Variable', 'module'), 'exports'), '= ', ('Data', '5')) 225 | -------------------------------------------------------------------------------- /eg_misc.py: -------------------------------------------------------------------------------- 1 | """ 2 | A bunch of small examples, some of them from the LPEG documentation. 3 | Crudely converted from Peglet. TODO: make them nicer. 
4 | """ 5 | 6 | from parson import Grammar, Unparsable, exceptionally 7 | 8 | parse_words = Grammar(r'/\W*(\w+)/*')() 9 | 10 | # The equivalent re.split() would return extra '' results first and last: 11 | ## parse_words('"Hi, there", he said.') 12 | #. ('Hi', 'there', 'he', 'said') 13 | 14 | class Tagger(dict): 15 | def __missing__(self, key): 16 | return lambda *parts: (key,) + parts 17 | 18 | name = Grammar(r""" 19 | name : title first middle last. 20 | title : (/(Dr|Mr|Ms|Mrs|St)[.]?/ | /(Pope(?:ss)?)/) _ :Title |. 21 | first : /([A-Za-z]+)/ _ :First. 22 | middle : (/([A-Z])[.]/ | /([A-Za-z]+)/) _ :Middle |. 23 | last : /([A-Za-z]+)/ :Last. 24 | _ : /\s+/. 25 | """).bind(Tagger()) 26 | 27 | ## name.name('Popess Darius Q. Bacon') 28 | #. (('Title', 'Popess'), ('First', 'Darius'), ('Middle', 'Q'), ('Last', 'Bacon')) 29 | 30 | ichbins = Grammar(r""" 31 | _ sexp :end. 32 | 33 | sexp : /\\(.)/ _ :lit_char 34 | | '"' qchar* '"' _ :join 35 | | symchar+ _ :join 36 | | /'/ _ sexp :quote 37 | | '(' _ sexp* ')' _ :hug. 38 | 39 | qchar : /\\(.)/ 40 | | /([^"])/. 41 | 42 | symchar : /([^\s\\"'()])/. 43 | 44 | _ : /\s*/. 45 | """)(lit_char = ord, 46 | quote = lambda x: ('quote', x)) 47 | 48 | ## ichbins.sexp('(hey)') 49 | #. (('hey',),) 50 | 51 | ## ichbins('hi') 52 | #. ('hi',) 53 | ## ichbins(r"""(hi '(john mccarthy) \c )""") 54 | #. (('hi', ('quote', ('john', 'mccarthy')), 99),) 55 | ## ichbins(r""" "" """) 56 | #. ('',) 57 | ## ichbins(r""" "hey" """) 58 | #. ('hey',) 59 | 60 | # From http://www.inf.puc-rio.br/~roberto/lpeg/ 61 | 62 | as_and_bs = Grammar(r""" 63 | S :end. 64 | 65 | S : 'a' B 66 | | 'b' A 67 | | . 68 | 69 | A : 'a' S 70 | | 'b' A A. 71 | 72 | B : 'b' S 73 | | 'a' B B. 74 | """)() 75 | 76 | ## as_and_bs("abaabbbbaa") 77 | #. () 78 | 79 | sum_nums = Grammar(r""" 80 | num ** ',' :end :hug :sum. 81 | num: /(\d+)/ :int. 82 | """)() 83 | 84 | ## sum_nums('10,30,43') 85 | #. 
(83,) 86 | 87 | one_word = Grammar(r"/\w+/ :position")() 88 | 89 | ## one_word('hello') 90 | #. (5,) 91 | ## one_word('hello there') 92 | #. (5,) 93 | ## one_word.attempt(' ') 94 | 95 | namevalues = Grammar(r""" pair* :end. 96 | pair : name '=' name /[,;]?/ :hug. 97 | name : /(\w+)/. 98 | FNORD ~: /\s*/. 99 | """)() 100 | namevalues_dict = lambda s: dict(namevalues(s)) 101 | ## namevalues_dict("a=b, c = hi; next = pi") 102 | #. {'a': 'b', 'c': 'hi', 'next': 'pi'} 103 | 104 | # Splitting a string. TODO: But with lpeg it's parametric over a pattern p. 105 | # NB this assumes p doesn't match '', and that it doesn't capture. 106 | 107 | splitting = Grammar(r""" 108 | split : (p | chunk :join) split | . # XXX why not a *? 109 | chunk : p 110 | | /(.)/ chunk. 111 | p : /\s/. 112 | """)() 113 | ## splitting.split('hello a world is nice ') 114 | #. ('hello', 'a', 'world', 'is', 'nice') 115 | ## splitting.chunk('hello a world is nice ') 116 | #. ('h', 'e', 'l', 'l', 'o') 117 | 118 | # Searching for a pattern: also parameterized by p. 119 | # (skipped) 120 | 121 | balanced_parens = Grammar(r""" 122 | bal : '(' c* ')'. 123 | c : /[^()]/ 124 | | bal. 125 | """)() 126 | 127 | ## balanced_parens.bal.attempt('()') 128 | #. () 129 | ## balanced_parens.bal.attempt('(()') 130 | 131 | # gsub: another parameterized one 132 | 133 | gsub = lambda text, pattern, replacement: ''.join(Grammar(r""" 134 | gsub: (p | /(.)/) gsub 135 | | . 136 | p: :pattern :replace. 137 | """)(pattern=pattern, replace=lambda: replacement).gsub(text)) 138 | 139 | ## gsub('hi there WHEEWHEE to you WHEEEE', 'WHEE', 'GLARG') 140 | #. 'hi there GLARGGLARG to you GLARGEE' 141 | 142 | csv = Grammar(r""" 143 | record : field ** ',' !/./. 144 | 145 | field : '"' qchar* /"\s*/ :join 146 | | /([^,"\n]*)/. 147 | 148 | qchar : /([^"])/ 149 | | '""' :'"'. 150 | """)() 151 | 152 | ## csv.record('') 153 | #. ('',) 154 | ## csv.record('""') 155 | #. ('',) 156 | ## csv.record("""hi,there,,"this,isa""test" """) 157 | #. 
('hi', 'there', '', 'this,isa"test') 158 | 159 | 160 | ## Grammar('x : .')().x('') 161 | #. () 162 | 163 | def p(grammar, rule, text): 164 | parse = getattr(Grammar(grammar)(**globals()), 165 | rule) 166 | try: 167 | return parse(text) 168 | except Unparsable, e: 169 | return e 170 | 171 | metagrammar = r""" 172 | grammar : '' rule+. 173 | rule : name '=' expr '.' :make_rule. 174 | expr : term ('|' expr :alt)?. 175 | term : factors (':' name :reduce_)?. 176 | factors : factor factors :seq 177 | | :empty. 178 | factor : /'((?:\\.|[^'])*)'/ :literal 179 | | name :rule_ref. 180 | name : /(\w+)/. 181 | FNORD ~: /\s*/. 182 | """ 183 | 184 | def make_rule(name, expr): return '%s: %s' % (name, expr) 185 | def alt(e1, e2): return '%s/%s' % (e1, e2) 186 | def reduce_(e, name): return '%s =>%s' % (e, name) 187 | def seq(e1, *e2): return '%s+%s' % ((e1,) + e2) if e2 else e1 188 | def empty(): return '<>' 189 | def literal(regex): return '/%s/' % regex 190 | def rule_ref(name): return '<%s>' % name 191 | 192 | ## p(metagrammar, 'grammar', ' hello = bargle. goodbye = hey there.aloha=.') 193 | #. ('hello: +<>', 'goodbye: ++<>', 'aloha: <>') 194 | ## p(metagrammar, 'grammar', ' hello arg = bargle.') 195 | #. Unparsable(grammar, ' hello ', 'arg = bargle.') 196 | ## p(metagrammar, 'term', "'goodbye' world") 197 | #. ('/goodbye/++<>',) 198 | 199 | bal = r""" 200 | FNORD ~: /\s*/. 201 | allbalanced : '' bal :end. 202 | bal : '(' bal ')' :hug bal 203 | | /(\w+)/ 204 | | . 205 | """ 206 | ## p(bal, 'allbalanced', '(x) y') 207 | #. (('x',), 'y') 208 | ## p(bal, 'allbalanced', 'x y') 209 | #. Unparsable(allbalanced, 'x ', 'y') 210 | 211 | curl = r""" 212 | FNORD ~: /\s*/. 213 | one_expr : '' expr :end. 214 | expr : '{' expr* '}' :hug 215 | | /([^{}\s]+)/. 216 | """ 217 | ## p(curl, 'one_expr', '') 218 | #. Unparsable(one_expr, '', '') 219 | ## p(curl, 'one_expr', '{}') 220 | #. ((),) 221 | ## p(curl, 'one_expr', 'hi') 222 | #. 
('hi',) 223 | ## p(curl, 'one_expr', '{hi {there} {{}}}') 224 | #. (('hi', ('there',), ((),)),) 225 | 226 | multiline_rules = r""" 227 | hi : /this/ /is/ 228 | /a/ /rule/ 229 | | /or/ /this/. 230 | """ 231 | 232 | ## p(multiline_rules, 'hi', "thisisarule") 233 | #. () 234 | ## p(multiline_rules, 'hi', "orthis") 235 | #. () 236 | ## p(multiline_rules, 'hi', "thisisnot") 237 | #. Unparsable(hi, 'thisis', 'not') 238 | 239 | paras = Grammar(r""" 240 | paras: para* _ :end. 241 | para: _ word+ (/\n\n/ | :end) :hug. 242 | word: /(\S+)/ _. 243 | _: (!/\n\n/ /\s/)*. 244 | """)() 245 | 246 | eg = r""" hi there hey 247 | how are you? 248 | fine. 249 | 250 | thanks. 251 | 252 | ok then.""" 253 | 254 | ## exceptionally(lambda: paras.paras(eg)) 255 | #. (('hi', 'there', 'hey', 'how', 'are', 'you?', 'fine.'), ('thanks.',), ('ok', 'then.')) 256 | -------------------------------------------------------------------------------- /eg_oberon0.py: -------------------------------------------------------------------------------- 1 | """ 2 | The Oberon-0 programming language. 3 | Wirth, _Compiler Construction_, Appendix A. 4 | """ 5 | 6 | from parson import Grammar 7 | 8 | grammar_source = r""" 9 | ident: !keyword /([A-Za-z][A-Za-z0-9]*)/. 10 | integer: digit+ FNORD :join :int. 11 | selector: ('.' ident | '[' expression ']')*. 12 | factor: ident selector 13 | | integer 14 | | '(' expression ')' 15 | | '~' factor. 16 | term: factor ++ MulOperator. 17 | MulOperator: '*' | "DIV" | "MOD" | '&'. 18 | SimpleExpression: ('+'|'-')? term ++ AddOperator. 19 | AddOperator: '+' | '-' | "OR". 20 | expression: SimpleExpression (relation SimpleExpression)?. 21 | relation: '=' | '#' | '<=' | '<' | '>=' | '>'. 22 | assignment: ident selector ':=' expression. 23 | ActualParameters: '(' expression ** ',' ')'. 24 | ProcedureCall: ident ActualParameters?. 25 | IfStatement: "IF" expression "THEN" StatementSequence 26 | ("ELSIF" expression "THEN" StatementSequence)* 27 | ("ELSE" StatementSequence)? 28 | "END". 
29 | WhileStatement: "WHILE" expression "DO" StatementSequence "END". 30 | statement: (assignment | ProcedureCall | IfStatement | WhileStatement)?. 31 | StatementSequence: statement ++ ';'. # XXX isn't it a problem that statement can be empty? 32 | IdentList: ident ++ ','. 33 | ArrayType: "ARRAY" expression "OF" type. 34 | FieldList: (IdentList ':' type)?. 35 | RecordType: "RECORD" FieldList ++ ';' "END". 36 | type: ident | ArrayType | RecordType. 37 | FPSection: ("VAR")? IdentList ':' type. 38 | FormalParameters: '(' FPSection ** ';' ')'. 39 | ProcedureHeading: "PROCEDURE" ident FormalParameters?. 40 | ProcedureBody: declarations ("BEGIN" StatementSequence)? "END". 41 | ProcedureDeclaration: ProcedureHeading ';' ProcedureBody ident. 42 | declarations: ("CONST" (ident '=' expression ';')*)? 43 | ("TYPE" (ident '=' type ';')*)? 44 | ("VAR" (IdentList ':' type ';')*)? 45 | (ProcedureDeclaration ';')*. 46 | module: "MODULE" ident ';' declarations 47 | ("BEGIN" StatementSequence)? "END" ident '.'. 48 | 49 | FNORD ~: whitespace*. 50 | whitespace ~: /\s+/ | comment. 51 | comment ~: '(*' commentchunk* '*)'. 52 | commentchunk ~: comment | !'*)' :anyone. # XXX are comments nested in Oberon-0? 53 | keyword ~: /BEGIN|END|MODULE|VAR|TYPE|CONST|PROCEDURE|RECORD|ARRAY|OF|WHILE|DO|IF|ELSIF|THEN|ELSE|OR|DIV|MOD/ /\b/. 54 | digit ~: /(\d)/. 55 | 56 | top: '' module :end. 57 | """ 58 | grammar = Grammar(grammar_source)() 59 | 60 | # TODO test for expected parse failures 61 | 62 | ## from parson import exceptionally 63 | ## import glob 64 | 65 | ## for filename in sorted(glob.glob('ob-bad/*.ob')): print exceptionally(lambda: test(filename)) 66 | #. testing ob-bad/badassign.ob 67 | #. (top, 'MODULE badassign;\n\nBEGIN\n ', '1 := 2\nEND badassign.\n') 68 | #. testing ob-bad/badcase.ob 69 | #. (top, 'MODULE badcase;\n\nVAR\n avar : INTEGER;\n bvar : BOOLEAN;\n\nBEGIN\n CASE ', 'bvar OF\n 18 : avar := 19\n END;\n\n CASE 1 < 2 OF\n avar : avar := 3\n | avar + 1 .. 
avar + 10 : avar := 5\n END;\n\n CASE avar OF\n 3 DIV 0 : avar := 1\n END\nEND badcase.\n') 70 | #. testing ob-bad/badfor.ob 71 | #. (top, 'MODULE badfor;\n\nCONST\n aconst = 10;\n \nTYPE\n atype = INTEGER; \n\nVAR\n avar : INTEGER;\n bvar : BOOLEAN;\n cvar : INTEGER;\n\nBEGIN\n FOR ', 'aconst := 1 TO 10 DO\n avar := 1\n END;\n\n FOR atype := 1 TO 10 DO\n avar := 1\n END;\n\n FOR bvar := FALSE TO TRUE DO\n avar := 42\n END;\n\n FOR avar := 1 TO 2 BY cvar * 2 DO\n avar := 42\n END;\n\n FOR dvar := 1 TO 2 DO\n dvar := 99\n END;\n\n FOR avar := 8 TO 10 BY 3 DIV 0 DO\n cvar := 100\n END\nEND badfor.\n') 72 | #. testing ob-bad/commentnoend.ob 73 | #. (top, "MODULE commentnoend;\n (* started off well,\n but didn't finish\nEND commentnoend.\n", '') 74 | #. testing ob-bad/keywordasname.ob 75 | #. (top, 'MODULE ', 'VAR;\nEND VAR.\n') 76 | #. testing ob-bad/repeatsection.ob 77 | #. (top, 'MODULE repeatsection;\n\nCONST\n aconst = 10;\n\n', 'CONST\n aconst = 20;\n\nEND repeatsection.\n') 78 | 79 | ## for filename in sorted(glob.glob('ob-ok/*.ob')): test(filename) 80 | #. testing ob-ok/arrayname.ob 81 | #. testing ob-ok/assign.ob 82 | #. testing ob-ok/badarg.ob 83 | #. testing ob-ok/badarray.ob 84 | #. testing ob-ok/badcond.ob 85 | #. testing ob-ok/badeq.ob 86 | #. testing ob-ok/badproc.ob 87 | #. testing ob-ok/badrecord.ob 88 | #. testing ob-ok/badwhile.ob 89 | #. testing ob-ok/comment.ob 90 | #. testing ob-ok/cond.ob 91 | #. testing ob-ok/condname.ob 92 | #. testing ob-ok/const.ob 93 | #. testing ob-ok/emptybody.ob 94 | #. testing ob-ok/emptydeclsections.ob 95 | #. testing ob-ok/emptymodule.ob 96 | #. testing ob-ok/factorial.ob 97 | #. testing ob-ok/gcd.ob 98 | #. testing ob-ok/intoverflow.ob 99 | #. testing ob-ok/keywordprefix.ob 100 | #. testing ob-ok/nominalarg.ob 101 | #. testing ob-ok/nonintconstant.ob 102 | #. testing ob-ok/nonlocalvar.ob 103 | #. testing ob-ok/nonmoduleasmodulename.ob 104 | #. testing ob-ok/proc.ob 105 | #. testing ob-ok/recordname.ob 106 | #. 
testing ob-ok/recurse.ob 107 | #. testing ob-ok/redefinteger.ob 108 | #. testing ob-ok/redeftrue.ob 109 | #. testing ob-ok/selfref.ob 110 | #. testing ob-ok/type.ob 111 | #. testing ob-ok/typenodecl.ob 112 | #. testing ob-ok/var.ob 113 | #. testing ob-ok/while.ob 114 | #. testing ob-ok/whilename.ob 115 | #. testing ob-ok/wrongmodulename.ob 116 | #. testing ob-ok/wrongprocedurename.ob 117 | 118 | def test(filename): 119 | print 'testing', filename 120 | with open(filename) as f: 121 | text = f.read() 122 | grammar.top(text) 123 | -------------------------------------------------------------------------------- /eg_oberon0_with_lexer.py: -------------------------------------------------------------------------------- 1 | """ 2 | Like eg_oberon0.py, but using a separate lexer. 3 | """ 4 | 5 | import re 6 | from parson import Grammar, one_that, label, alter, match 7 | 8 | class LexedGrammar(Grammar): 9 | def __init__(self, string): 10 | super(LexedGrammar, self).__init__(string) 11 | self.literals = set() 12 | self.keywords = set() 13 | def literal(self, string): 14 | self.literals.add(string) 15 | return literal_kind(string) 16 | def match(self, regex): 17 | assert False 18 | def keyword(self, string): 19 | self.keywords.add(string) 20 | return literal_kind(string) 21 | 22 | def literal_kind(string): 23 | return one_that(lambda token: token.kind == string) 24 | 25 | class Token(object): 26 | def __init__(self, kind, string): 27 | self.kind = kind 28 | self.string = string 29 | def __repr__(self): 30 | return repr(self.string) 31 | 32 | grammar_source = r""" 33 | selector: ('.' :IDENT | '[' expression ']')*. 34 | factor: :IDENT selector 35 | | :INTEGER 36 | | '(' expression ')' 37 | | '~' factor. 38 | term: factor ++ MulOperator. 39 | MulOperator: '*' | "DIV" | "MOD" | '&'. 40 | SimpleExpression: ('+'|'-')? term ++ AddOperator. 41 | AddOperator: '+' | '-' | "OR". 42 | expression: SimpleExpression (relation SimpleExpression)?. 
43 | relation: '=' | '#' | '<=' | '<' | '>=' | '>'. 44 | assignment: :IDENT selector ':=' expression. 45 | ActualParameters: '(' expression ** ',' ')'. 46 | ProcedureCall: :IDENT ActualParameters?. 47 | IfStatement: "IF" expression "THEN" StatementSequence 48 | ("ELSIF" expression "THEN" StatementSequence)* 49 | ("ELSE" StatementSequence)? 50 | "END". 51 | WhileStatement: "WHILE" expression "DO" StatementSequence "END". 52 | statement: (assignment | ProcedureCall | IfStatement | WhileStatement)?. 53 | StatementSequence: statement ++ ';'. # XXX isn't it a problem that statement can be empty? 54 | IdentList: :IDENT ++ ','. 55 | ArrayType: "ARRAY" expression "OF" type. 56 | FieldList: (IdentList ':' type)?. 57 | RecordType: "RECORD" FieldList ++ ';' "END". 58 | type: :IDENT | ArrayType | RecordType. 59 | FPSection: ("VAR")? IdentList ':' type. 60 | FormalParameters: '(' FPSection ** ';' ')'. 61 | ProcedureHeading: "PROCEDURE" :IDENT FormalParameters?. 62 | ProcedureBody: declarations ("BEGIN" StatementSequence)? "END". 63 | ProcedureDeclaration: ProcedureHeading ';' ProcedureBody :IDENT. 64 | declarations: ("CONST" (:IDENT '=' expression ';')*)? 65 | ("TYPE" (:IDENT '=' type ';')*)? 66 | ("VAR" (IdentList ':' type ';')*)? 67 | (ProcedureDeclaration ';')*. 68 | module: "MODULE" :IDENT ';' declarations 69 | ("BEGIN" StatementSequence)? "END" :IDENT '.'. 70 | 71 | top: module :end. 72 | """ 73 | builder = LexedGrammar(grammar_source) 74 | grammar = builder(IDENT = literal_kind('#IDENT'), 75 | INTEGER = literal_kind('#INTEGER')) 76 | 77 | ## builder.keywords 78 | #. set(['THEN', 'BEGIN', 'END', 'DO', 'OF', 'ARRAY', 'MODULE', 'ELSE', 'RECORD', 'WHILE', 'ELSIF', 'VAR', 'CONST', 'DIV', 'MOD', 'TYPE', 'OR', 'PROCEDURE', 'IF']) 79 | ## builder.literals 80 | #. 
set(['#', '<=', '>=', '&', ')', '(', '+', '*', '-', ',', '.', ':=', ':', '=', ';', '[', '>', ']', '<', '~']) 81 | 82 | def one_of(strings): 83 | # Sort longest first because re's '|' matches left-to-right, not greedily: 84 | alts = sorted(strings, key=len, reverse=True) 85 | return '|'.join(map(re.escape, alts)) 86 | 87 | a_literal = one_of(builder.literals) 88 | a_keyword = one_of(builder.keywords) 89 | 90 | lex_grammar_source = r""" token* :end. 91 | token : whitespace | :keyword :Token | :punct :Token | ident | integer. 92 | 93 | whitespace : /\s+/ | comment. 94 | comment : '(*' in_comment* '*)'. 95 | in_comment : comment | !'*)' :anyone. # XXX are comments nested in Oberon-0? 96 | 97 | ident : /([A-Za-z][A-Za-z0-9]*)/ :'#IDENT' :Token. 98 | integer : /(\d+)/ :'#INTEGER' :Token. 99 | """ 100 | lex_grammar = Grammar(lex_grammar_source)(Token = lambda s, kind=None: Token(kind or s, s), 101 | keyword = match(r'(%s)\b' % a_keyword), 102 | punct = match(r'(%s)' % a_literal)) 103 | 104 | ## import sys; sys.setrecursionlimit(5000) 105 | ## import glob 106 | ## from parson import exceptionally 107 | 108 | ## for filename in sorted(glob.glob('ob-bad/*.ob')): print exceptionally(lambda: test(filename)) 109 | #. testing ob-bad/badassign.ob 110 | #. (top, ('MODULE', 'badassign', ';', 'BEGIN'), ('1', ':=', '2', 'END', 'badassign', '.')) 111 | #. testing ob-bad/badcase.ob 112 | #. ((literal('') ((token)* end)), 'MODULE badcase;\n\nVAR\n avar : INTEGER;\n bvar : BOOLEAN;\n\nBEGIN\n CASE bvar OF\n 18 : avar := 19\n END;\n\n CASE 1 < 2 OF\n avar : avar := 3\n ', '| avar + 1 .. avar + 10 : avar := 5\n END;\n\n CASE avar OF\n 3 DIV 0 : avar := 1\n END\nEND badcase.\n') 113 | #. testing ob-bad/badfor.ob 114 | #. 
(top, ('MODULE', 'badfor', ';', 'CONST', 'aconst', '=', '10', ';', 'TYPE', 'atype', '=', 'INTEGER', ';', 'VAR', 'avar', ':', 'INTEGER', ';', 'bvar', ':', 'BOOLEAN', ';', 'cvar', ':', 'INTEGER', ';', 'BEGIN', 'FOR'), ('aconst', ':=', '1', 'TO', '10', 'DO', 'avar', ':=', '1', 'END', ';', 'FOR', 'atype', ':=', '1', 'TO', '10', 'DO', 'avar', ':=', '1', 'END', ';', 'FOR', 'bvar', ':=', 'FALSE', 'TO', 'TRUE', 'DO', 'avar', ':=', '42', 'END', ';', 'FOR', 'avar', ':=', '1', 'TO', '2', 'BY', 'cvar', '*', '2', 'DO', 'avar', ':=', '42', 'END', ';', 'FOR', 'dvar', ':=', '1', 'TO', '2', 'DO', 'dvar', ':=', '99', 'END', ';', 'FOR', 'avar', ':=', '8', 'TO', '10', 'BY', '3', 'DIV', '0', 'DO', 'cvar', ':=', '100', 'END', 'END', 'badfor', '.')) 115 | #. testing ob-bad/commentnoend.ob 116 | #. ((literal('') ((token)* end)), "MODULE commentnoend;\n (* started off well,\n but didn't finish\nEND commentnoend.\n", '') 117 | #. testing ob-bad/keywordasname.ob 118 | #. (top, ('MODULE',), ('VAR', ';', 'END', 'VAR', '.')) 119 | #. testing ob-bad/repeatsection.ob 120 | #. (top, ('MODULE', 'repeatsection', ';', 'CONST', 'aconst', '=', '10', ';'), ('CONST', 'aconst', '=', '20', ';', 'END', 'repeatsection', '.')) 121 | 122 | ## for filename in sorted(glob.glob('ob-ok/*.ob')): test(filename) 123 | #. testing ob-ok/arrayname.ob 124 | #. testing ob-ok/assign.ob 125 | #. testing ob-ok/badarg.ob 126 | #. testing ob-ok/badarray.ob 127 | #. testing ob-ok/badcond.ob 128 | #. testing ob-ok/badeq.ob 129 | #. testing ob-ok/badproc.ob 130 | #. testing ob-ok/badrecord.ob 131 | #. testing ob-ok/badwhile.ob 132 | #. testing ob-ok/comment.ob 133 | #. testing ob-ok/cond.ob 134 | #. testing ob-ok/condname.ob 135 | #. testing ob-ok/const.ob 136 | #. testing ob-ok/emptybody.ob 137 | #. testing ob-ok/emptydeclsections.ob 138 | #. testing ob-ok/emptymodule.ob 139 | #. testing ob-ok/factorial.ob 140 | #. testing ob-ok/gcd.ob 141 | #. testing ob-ok/intoverflow.ob 142 | #. testing ob-ok/keywordprefix.ob 143 | #. 
testing ob-ok/nominalarg.ob 144 | #. testing ob-ok/nonintconstant.ob 145 | #. testing ob-ok/nonlocalvar.ob 146 | #. testing ob-ok/nonmoduleasmodulename.ob 147 | #. testing ob-ok/proc.ob 148 | #. testing ob-ok/recordname.ob 149 | #. testing ob-ok/recurse.ob 150 | #. testing ob-ok/redefinteger.ob 151 | #. testing ob-ok/redeftrue.ob 152 | #. testing ob-ok/selfref.ob 153 | #. testing ob-ok/type.ob 154 | #. testing ob-ok/typenodecl.ob 155 | #. testing ob-ok/var.ob 156 | #. testing ob-ok/while.ob 157 | #. testing ob-ok/whilename.ob 158 | #. testing ob-ok/wrongmodulename.ob 159 | #. testing ob-ok/wrongprocedurename.ob 160 | 161 | def test(filename): 162 | print 'testing', filename 163 | with open(filename) as f: 164 | text = f.read() 165 | tokens = lex_grammar(text) 166 | if 0: 167 | for token in tokens: 168 | print token.kind, 169 | print 170 | if tokens is not None: 171 | grammar.top(tokens) 172 | -------------------------------------------------------------------------------- /eg_outline.py: -------------------------------------------------------------------------------- 1 | """ 2 | Parse an outline using indentation. 3 | After Higher Order Perl, section 8.6. 4 | """ 5 | 6 | import parson as P 7 | 8 | def Node(margin): 9 | return P.seclude(P.match(r'( {%d,})' % margin) 10 | + P.dynamic(lambda indent: 11 | (line + Node(len(indent)+1).star()) >> P.hug)) 12 | 13 | line = '* ' + P.match(r'(.*)\n?') 14 | 15 | outline = Node(0).star() + ~P.anyone 16 | 17 | 18 | eg = """\ 19 | * Hello 20 | * Aloha 21 | * Bonjour 22 | * Adieu 23 | * also 24 | * Whatever 25 | * yay?""" 26 | 27 | ## from pprint import pprint; pprint(outline(eg)) 28 | #. (('Hello', ('Aloha', ('Bonjour',), ('Adieu',)), ('also',)), 29 | #. 
('Whatever', ('yay?',))) 30 | -------------------------------------------------------------------------------- /eg_phone_num.py: -------------------------------------------------------------------------------- 1 | """ 2 | The phone-number example at the top of 3 | https://github.com/modernserf/little-language-lab 4 | """ 5 | 6 | from parson import Grammar 7 | 8 | # const join = (...values) => values.join("") 9 | # const phone = lang` 10 | # Root = ~("+"? "1" _)? AreaCode ~(_ "-"? _) Exchange ~(_ "-"? _) Line 11 | # ${(areaCode, exchange, line) => ({ areaCode, exchange, line })} 12 | # AreaCode = "(" _ (D D D ${join}) _ ")" ${(_, __, digits) => digits} 13 | # | D D D ${join} 14 | # Exchange = D D D ${join} 15 | # Line = D D D D ${join} 16 | # D = %digit 17 | # _ = %whitespace* 18 | # ` 19 | # phone.match("+1 (800) 555-1234") 20 | # // { ok: true, value: { areaCode: "800", exchange: "555", line: "1234" } } 21 | 22 | 23 | # Version 1: just gimme the data. 24 | 25 | grammar1 = r""" 26 | Root : ('+'? '1' _)? AreaCode _ '-'? _ Exchange _ '-'? _ Line. 27 | AreaCode : '(' _ {D D D} _ ')' 28 | | {D D D}. 29 | Exchange : {D D D}. 30 | Line : {D D D D}. 31 | D = /\d/. 32 | _ = /\s*/. 33 | """ 34 | g1 = Grammar(grammar1)() 35 | ## g1.Root("+1 (800) 555-1234") 36 | #. ('800', '555', '1234') 37 | 38 | 39 | # Version 2, returning a dict. 40 | # We have to pass the dict constructor in as a semantic parameter, since 41 | # Python lacks template strings. 42 | 43 | grammar2 = r""" 44 | Root : ('+'? '1' _)? AreaCode _ '-'? _ Exchange _ '-'? _ Line :hug :dict. 45 | AreaCode : :'areaCode' ('(' _ {D D D} _ ')' 46 | | {D D D}) :hug. 47 | Exchange : :'exchange' {D D D} :hug. 48 | Line : :'line' {D D D D} :hug. 49 | D = /\d/. 50 | _ = /\s*/. 51 | """ 52 | parse_phone_number2 = Grammar(grammar2)(dict=dict).Root.expecting_one_result() 53 | ## parse_phone_number2("+1 (800) 555-1234") 54 | #. {'areaCode': '800', 'line': '1234', 'exchange': '555'} 55 | 56 | 57 | # Version 3, more my usual style. 
58 | # (Pass in semantic actions for the main productions 59 | # and avoid the _ noise using a FNORD production.) 60 | 61 | from structs import Struct 62 | 63 | class PhoneNumber(Struct('area_code exchange line')): 64 | pass 65 | 66 | grammar3 = r""" 67 | Root : /[+]?1/? AreaCode '-'? Exchange '-'? Line :PhoneNumber. 68 | AreaCode : '(' AreaDigits ')' | AreaDigits. 69 | 70 | AreaDigits ~: /(\d\d\d)/. 71 | Exchange ~: /(\d\d\d)/. 72 | Line ~: /(\d\d\d\d)/. 73 | FNORD ~: /\s*/. 74 | """ 75 | g3 = Grammar(grammar3)(PhoneNumber=PhoneNumber) 76 | parse_phone_number3 = g3.Root.expecting_one_result() 77 | ## parse_phone_number3("+1 (800) 555-1234") 78 | #. PhoneNumber('800', '555', '1234') 79 | -------------------------------------------------------------------------------- /eg_pother.py: -------------------------------------------------------------------------------- 1 | """ 2 | Parser adapted from github.com/darius/pother as an example and for testing. 3 | """ 4 | 5 | from parson import Grammar 6 | 7 | def make_var(v): return v 8 | def make_const(c): return c 9 | def make_lam(v, e): return '(lambda (%s) %s)' % (v, e) 10 | def make_app(e1, e2): return '(%s %s)' % (e1, e2) 11 | def make_send(e1, e2): return '(%s <- %s)' % (e1, e2) 12 | def make_lit_sym(v): return '(quote %s)' % v 13 | 14 | def make_let(decls, e): 15 | return '(let %s %s)' % (' '.join(decls), e) 16 | 17 | def make_defer(v): return '(defer %s)' % v 18 | def make_bind(v, e): return '(bind %s %s)' % (v, e) 19 | def make_eqn(vs, e): 20 | assert isinstance(vs, tuple) 21 | assert isinstance(e, str), "hey %r" % (e,) 22 | return '((%s) %s)' % (' '.join(vs), e) 23 | 24 | def make_list_pattern(*params): 25 | return '(list %s)' % ' '.join(map(str, params)) 26 | 27 | def make_list_expr(es): 28 | return '(list %s)' % ' '.join(map(str, es)) 29 | 30 | def make_case(e, cases): return ('(case %s %s)' 31 | % (e, ' '.join('(%s %s)' % pair 32 | for pair in cases))) 33 | 34 | def foldr(f, z, xs): 35 | for x in reversed(xs): 36 
| z = f(x, z) 37 | return z 38 | 39 | def fold_app(f, fs): return reduce(make_app, fs, f) 40 | def fold_apps(fs): return reduce(make_app, fs) 41 | def fold_send(f, fs): return reduce(make_send, fs, f) 42 | def fold_lam(vp, e): return foldr(make_lam, e, vp) 43 | 44 | # XXX not sure about paramlist here: 45 | fold_infix_app = lambda _left, _op, _right: \ 46 | fold_app(_op, [fold_apps(_left), _right]) 47 | 48 | # XXX & was \ for lambda 49 | 50 | # XXX 51 | # [Param,operator,_,Param, 52 | # lambda _left,_op,_right: [_op, _left, _right]] 53 | 54 | toy_grammar = Grammar(r""" E :end. 55 | 56 | E : Fp '`' V '`' E :fold_infix_app 57 | | Fp :fold_apps 58 | | '&' Vp '=>' E :fold_lam 59 | | "let" Decls E :make_let 60 | | "case" E Cases :make_case. 61 | 62 | Cases : Case+ :hug. 63 | Case : '|' Param '=>' E :hug. 64 | 65 | Param : Const 66 | | V 67 | | '(' Param ')' 68 | | '[' ParamList ']'. 69 | 70 | ParamList : Param ',' Param :make_list_pattern. 71 | 72 | Decls : Decl+ :hug. 73 | Decl : "defer" V ';' :make_defer 74 | | "bind" V '=' E ';' :make_bind 75 | | Vp '=' E ';' :make_eqn. 76 | 77 | Fp : F+ :hug. 78 | F : Const :make_const 79 | | V :make_var 80 | | '(' E ')' 81 | | '{' F Fp '}' :fold_send 82 | | '[' E ** ',' ']' :hug :make_list_expr. 83 | 84 | Vp : V+ :hug. 85 | V : Identifier 86 | | Operator. 87 | 88 | Identifier : /(?!let\b|case\b|defer\b|bind\b)([A-Za-z_]\w*)/. 89 | Operator : /(<=|:=|[!+-.])/. 90 | 91 | Const : '.' V :make_lit_sym 92 | | /"([^"]*)"/ :repr 93 | | /(-?\d+)/ 94 | | '(' ')' :'()' 95 | | '[' ']' :'[]'. 96 | 97 | FNORD ~: /\s*/. 98 | """)(**globals()).expecting_one_result() 99 | 100 | ## toy_grammar('.+') 101 | #. '(quote +)' 102 | ## toy_grammar('0 .+') 103 | #. '(0 (quote +))' 104 | 105 | ## print toy_grammar('0') 106 | #. 0 107 | ## print toy_grammar('x') 108 | #. x 109 | ## print toy_grammar('let x=y; x') 110 | #. (let ((x) y) x) 111 | ## print toy_grammar.attempt('') 112 | #. None 113 | ## print toy_grammar('x x . y') 114 | #. 
((x x) (quote y)) 115 | ## print toy_grammar.attempt('(when (in the)') 116 | #. None 117 | ## print toy_grammar('&M => (&f => M (f f)) (&f => M (f f))') 118 | #. (lambda (M) ((lambda (f) (M (f f))) (lambda (f) (M (f f))))) 119 | ## print toy_grammar('&a b c => a b') 120 | #. (lambda (a) (lambda (b) (lambda (c) (a b)))) 121 | 122 | ## toy_grammar('x') 123 | #. 'x' 124 | ## toy_grammar('let x=y; x') 125 | #. '(let ((x) y) x)' 126 | ## toy_grammar.attempt('') 127 | ## toy_grammar('x x . y') 128 | #. '((x x) (quote y))' 129 | ## toy_grammar.attempt('(when (in the)') 130 | ## toy_grammar('&M => (&f => M (f f)) (&f => M (f f))') 131 | #. '(lambda (M) ((lambda (f) (M (f f))) (lambda (f) (M (f f)))))' 132 | ## toy_grammar('&a b c => a b') 133 | #. '(lambda (a) (lambda (b) (lambda (c) (a b))))' 134 | 135 | mint = r""" 136 | let make_mint name = 137 | case make_brand name 138 | | [sealer, unsealer] => 139 | 140 | let defer mint; 141 | real_mint name msg = case msg 142 | 143 | | .__print_on => &out => out .print (name .. "'s mint") 144 | 145 | | .make_purse => &initial_balance => 146 | (let _ = assert (is_int initial_balance); 147 | _ = assert (0 .<= initial_balance); 148 | balance = make_box initial_balance; 149 | decr amount = (let _ = assert (is_int amount); 150 | _ = assert ((0 .<= amount) 151 | .and (amount .<= balance)); 152 | balance .:= (balance .! .- amount)); 153 | purse msg = case msg 154 | | .__print_on => &out => 155 | out .print ("has " .. (to_str balance) 156 | .. name .. " bucks") 157 | | .balance => balance .! 158 | | .sprout => mint .make_purse 0 159 | | .get_decr => sealer .seal decr 160 | | .deposit => &amount source => 161 | (let _ = unsealer .unseal (source .get_decr) amount; 162 | balance .:= (balance .! 
.+ amount)); 163 | purse); 164 | 165 | bind mint = real_mint; 166 | mint; 167 | 168 | make_mint 169 | """ 170 | #try: print toy_grammar(mint) 171 | #except Unparsable, e: 172 | # print e.args[1][0] 173 | # print 'XXX' 174 | # print e.args[1][1] 175 | #print toy_grammar('let defer mint; mint') 176 | ## print toy_grammar(mint) 177 | #. (let ((make_mint name) (case (make_brand name) ((list sealer unsealer) (let (defer mint) ((real_mint name msg) (case msg ((quote __print_on) (lambda (out) ((out (quote print)) ((name (quote .)) "'s mint")))) ((quote make_purse) (lambda (initial_balance) (let ((_) (assert (is_int initial_balance))) ((_) (assert ((0 (quote <=)) initial_balance))) ((balance) (make_box initial_balance)) ((decr amount) (let ((_) (assert (is_int amount))) ((_) (assert ((((0 (quote <=)) amount) (quote and)) ((amount (quote <=)) balance)))) ((balance (quote :=)) (((balance (quote !)) (quote -)) amount)))) ((purse msg) (case msg ((quote __print_on) (lambda (out) ((out (quote print)) (((((('has ' (quote .)) (to_str balance)) (quote .)) name) (quote .)) ' bucks')))) ((quote balance) (balance (quote !))) ((quote sprout) ((mint (quote make_purse)) 0)) ((quote get_decr) ((sealer (quote seal)) decr)) ((quote deposit) (lambda (amount) (lambda (source) (let ((_) (((unsealer (quote unseal)) (source (quote get_decr))) amount)) ((balance (quote :=)) (((balance (quote !)) (quote +)) amount)))))))) purse))))) (bind mint real_mint) mint)))) make_mint) 178 | 179 | mintskel = r""" 180 | let make_mint name = 181 | case make_brand name 182 | | [sealer, unsealer] => 183 | 184 | let defer mint; 185 | mint; 186 | 187 | make_mint 188 | """ 189 | ## print toy_grammar(mintskel) 190 | #. (let ((make_mint name) (case (make_brand name) ((list sealer unsealer) (let (defer mint) mint)))) make_mint) 191 | 192 | voting = r""" 193 | let make_one_shot f = 194 | let armed = make_box True; 195 | &x => let _ = assert (armed .! 
.not); 196 | _ = armed .:= False; 197 | f x; 198 | 199 | start_voting voters choices timer = 200 | let ballot_box = map (&_ => make_box 0) choices; 201 | poll voter = 202 | let make_checkbox pair = 203 | case pair 204 | | [choice, tally] => 205 | [choice, make_one_shot (&_ => 206 | tally .:= (tally .! .+ 1))]; 207 | ballot = map make_checkbox (zip choices ballot_box); 208 | {voter ballot}; 209 | _ = for_each poll voters; 210 | [close_polls, totals]; 211 | 212 | start_voting 213 | """ 214 | ## print toy_grammar(voting) 215 | #. (let ((make_one_shot f) (let ((armed) (make_box True)) (lambda (x) (let ((_) (assert ((armed (quote !)) (quote not)))) ((_) ((armed (quote :=)) False)) (f x))))) ((start_voting voters choices timer) (let ((ballot_box) ((map (lambda (_) (make_box 0))) choices)) ((poll voter) (let ((make_checkbox pair) (case pair ((list choice tally) (list (((choice ,) make_one_shot) (lambda (_) ((tally (quote :=)) (((tally (quote !)) (quote +)) 1)))))))) ((ballot) ((map make_checkbox) ((zip choices) ballot_box))) (voter <- ballot))) ((_) ((for_each poll) voters)) (list ((close_polls ,) totals)))) start_voting) 216 | -------------------------------------------------------------------------------- /eg_precedence.py: -------------------------------------------------------------------------------- 1 | """ 2 | Infix parsing with operator precedence (inefficient implementation). 
3 | """ 4 | 5 | from parson import Grammar, recur, seclude, either, fail 6 | 7 | def PrecedenceParser(primary_expr, table): 8 | return foldr(lambda make_expr, subexpr: make_expr(subexpr), 9 | primary_expr, 10 | table) 11 | 12 | def LeftAssoc(*pairs): 13 | return lambda subexpr: \ 14 | seclude(subexpr + alt([peg + subexpr + oper 15 | for peg, oper in pairs]).star()) 16 | 17 | def RightAssoc(*pairs): 18 | return lambda subexpr: \ 19 | recur(lambda expr: 20 | seclude(subexpr + alt([peg + expr + oper 21 | for peg, oper in pairs]).maybe())) 22 | 23 | def alt(pegs): 24 | return foldr(either, fail, pegs) 25 | 26 | def foldr(f, z, xs): 27 | for x in reversed(xs): 28 | z = f(x, z) 29 | return z 30 | 31 | 32 | # eg_calc.py example 33 | 34 | from operator import * 35 | from parson import delay 36 | 37 | _ = delay(lambda: g.FNORD) 38 | exp3 = delay(lambda: g.exp3) 39 | 40 | exp1 = PrecedenceParser(exp3, [ 41 | LeftAssoc(('*'+_, mul), ('//'+_, div), ('/'+_, truediv), ('%'+_, mod)), 42 | RightAssoc(('^'+_, pow)), 43 | ]) 44 | 45 | exps = PrecedenceParser(exp1, [ 46 | LeftAssoc(('+'+_, add), ('-'+_, sub)), 47 | ]) 48 | 49 | g = Grammar(r""" 50 | :exps :end. 51 | 52 | exp3 : '(' :exps ')' 53 | | '-' :exp1 :neg 54 | | /(\d+)/ :int. 55 | 56 | FNORD ~= /\s*/. 57 | """)(**globals()) 58 | 59 | ## g('42 *(5-3) + -2^2') 60 | #. (80,) 61 | ## g('2^3^2') 62 | #. (512,) 63 | ## g('5-3-1') 64 | #. (1,) 65 | ## g('3//2') 66 | #. (1,) 67 | ## g('3/2') 68 | #. (1.5,) 69 | -------------------------------------------------------------------------------- /eg_puzzler.py: -------------------------------------------------------------------------------- 1 | """ 2 | Port of ~/git/mccarthy-to-bryant/puzzler.py 3 | Uses modules from that repo (not included in this one).
4 | """ 5 | 6 | import operator 7 | from parson import Grammar, Unparsable 8 | import dd 9 | 10 | def mk_var(name): 11 | return dd.Variable(enter(name)) 12 | 13 | var_names = [] 14 | def enter(name): 15 | try: 16 | return var_names.index(name) 17 | except ValueError: 18 | var_names.append(name) 19 | return len(var_names) - 1 20 | 21 | # This grammar is complicated by requiring that whitespace mean AND 22 | # *only* within a line, not spanning lines -- an implicit AND spanning 23 | # lines would be too error prone. 24 | g = r""" expr :end. 25 | 26 | expr: sentence (',' expr :And)?. 27 | 28 | sentence: sum ('=' sum :Equiv)?. 29 | 30 | sum: term ( '|' sum :Or 31 | | '=>' term :Implies )?. 32 | 33 | term: factor (factor :And)* FNORD. 34 | 35 | factor ~: '~'_ primary :Not 36 | | primary. 37 | 38 | primary ~: '(' FNORD expr ')'_ 39 | | id :Var. 40 | 41 | id ~: /([A-Za-z_]\w*)/_. 42 | 43 | _ ~: /[ \t]*/. # Spaces within a line. 44 | FNORD ~: /\s+|#.*/*. # Spaces/comments that may span lines. 45 | """ 46 | 47 | parse = Grammar(g)( 48 | Equiv = dd.Equiv, 49 | Implies = dd.Implies, 50 | And = operator.and_, 51 | Or = operator.or_, 52 | Not = operator.inv, 53 | Var = mk_var 54 | ).expecting_one_result() 55 | 56 | def solve(condition): 57 | if dd.is_valid(condition): 58 | print("Valid.") 59 | else: 60 | show(dd.satisfy(condition, 1)) 61 | 62 | def show(opt_env): 63 | if opt_env is None: 64 | print("Unsatisfiable.") 65 | else: 66 | for k, v in sorted(opt_env.items()): 67 | if k is not None: 68 | print("%s%s" % ("" if v else "~", var_names[k])) 69 | 70 | ## solve(parse(' hey (there | ~there), ~hey | ~there')) 71 | #. hey 72 | #. ~there 73 | ## solve(parse(' hey (there, ~there)')) 74 | #. Unsatisfiable. 75 | ## solve(parse('a=>b = ~b=>~a')) 76 | #. Valid. 
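The `solve`/`show` flow above leans on the external `dd` (BDD) module, which this repo doesn't include. As a rough, hypothetical stand-in — names like `satisfy_brute` are mine, not from the repo — the same satisfy-then-print idea can be brute-forced over truth tables for small variable counts:

```python
from itertools import product

def satisfy_brute(vars_, condition):
    """Return the first assignment (a dict) making condition(env) true, or None.
    condition is a plain Python predicate over the environment dict."""
    for values in product([False, True], repeat=len(vars_)):
        env = dict(zip(vars_, values))
        if condition(env):
            return env
    return None

# hey (there | ~there), ~hey | ~there  -- the first doctest above,
# written out as a Python predicate instead of a parsed BDD expression.
cond = lambda e: (e['hey'] and (e['there'] or not e['there'])
                  and (not e['hey'] or not e['there']))
env = satisfy_brute(['hey', 'there'], cond)
# Only hey=True, there=False satisfies it, matching "hey / ~there" above.
```

Unlike `dd`'s BDDs, this is exponential in the number of variables, so it only illustrates the interface, not a practical solver.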
77 | 78 | 79 | def main(filename, text): 80 | try: 81 | problem = parse(text) 82 | except Unparsable as e: 83 | syntax_error(e, filename) 84 | sys.exit(1) 85 | else: 86 | solve(problem) 87 | 88 | # TODO: extract something I can stick in the library 89 | # let's try writing something similar for oberon0-with-lexer, to triangulate 90 | 91 | def syntax_error(e, filename): 92 | line_no, prefix, suffix = where(e) 93 | prefix, suffix = sanitize(prefix), sanitize(suffix) 94 | sys.stderr.write("%s:%d:%d: Syntax error\n" % (filename, line_no, len(prefix))) 95 | sys.stderr.write(' ' + prefix + suffix + '\n') 96 | sys.stderr.write(' ' + ' '*len(prefix) + '^\n') 97 | 98 | def where(e): 99 | before, after = e.failure 100 | line_no = before.count('\n') 101 | prefix = (before+'\n').splitlines()[line_no] 102 | suffix = (after+'\n').splitlines()[0] # XXX what if right on newline? 103 | return line_no+1, prefix, suffix 104 | 105 | def sanitize(s): 106 | "Make s predictably printable, sans control characters like tab." 107 | return ''.join(c if ' ' <= c < chr(127) else ' ' # XXX crude 108 | for c in s) 109 | 110 | 111 | if __name__ == '__main__': 112 | import sys 113 | main('stdin', sys.stdin.read()) # (try it on carroll or wise-pigs) 114 | -------------------------------------------------------------------------------- /eg_regex.py: -------------------------------------------------------------------------------- 1 | """ 2 | Parse a regular expression and generate the strings it matches. 3 | Generator from 4 | http://www.udacity.com/wiki/CS212%20Unit%203%20Code?course=cs212#regex_generatorpy 5 | http://forums.udacity.com/questions/5008809/unit3-18-startx-paramater-on-genseq-is-a-hack#cs212 6 | and in embryo 7 | https://github.com/darius/halp/blob/master/examples/learn-the-hell-out-of-regular-expressions/whats_a_regex_soln.py 8 | """ 9 | 10 | from parson import Grammar, join 11 | 12 | def generate(regex, Ns): 13 | "Return the strings matching regex whose length is in Ns." 
14 | return sorted(parser(regex)(Ns), 15 | key=lambda s: (len(s), s)) 16 | 17 | def literal(s): return lambda Ns: set([s]) if len(s) in Ns else null 18 | def either(x, y): return lambda Ns: x(Ns) | y(Ns) 19 | def plus(x): return chain(x, star(x)) 20 | def star(x): return lambda Ns: optional(chain(nonempty(x), star(x)))(Ns) 21 | def nonempty(x): return lambda Ns: x(Ns - set([0])) 22 | def oneof(chars): return lambda Ns: set(chars) if 1 in Ns else null 23 | def chain(x, y): return lambda Ns: genseq(x, y, Ns) 24 | def optional(x): return either(empty(), x) 25 | def dot(): return oneof('?') # (Could be more, for lots more output.) 26 | def empty(): return literal('') 27 | 28 | null = frozenset([]) 29 | 30 | def genseq(x, y, Ns): 31 | """Return the set of matches to xy whose total length is in Ns. We 32 | ask y only for lengths that are remainders after an x-match in 33 | 0..max(Ns). (And we call neither x nor y if there are no Ns.)""" 34 | if not Ns: 35 | return null 36 | xmatches = x(set(range(max(Ns)+1))) 37 | Ns_x = set(len(m) for m in xmatches) 38 | Ns_y = set(n-m for n in Ns for m in Ns_x if n-m >= 0) 39 | ymatches = y(Ns_y) 40 | return set(m1+m2 for m1 in xmatches for m2 in ymatches if len(m1+m2) in Ns) 41 | 42 | grammar = Grammar(r""" exp :end. 43 | exp : term ('|' exp :either)* 44 | | :empty. 45 | term : factor (term :chain)*. 46 | factor : primary ( '*' :star 47 | | '+' :plus 48 | | '?' :optional 49 | )?. 50 | primary : '(' exp ')' 51 | | '[' char* ']' :join :oneof 52 | | '.' :dot 53 | | /\\(.)/ :literal 54 | | /([^.()*+?|[\]])/ :literal. 55 | char : /\\(.)/ 56 | | /([^\]])/. 57 | """) 58 | parser = grammar(**globals()).expecting_one_result() 59 | 60 | ## generate('.+', range(5)) 61 | #. ['?', '??', '???', '????'] 62 | ## generate('a[xy]+z()*|c.hi', range(5)) 63 | #. ['axz', 'ayz', 'axxz', 'axyz', 'ayxz', 'ayyz', 'c?hi'] 64 | ## generate('(Chloe|Yvette), a( precocious)? (toddler|writer)', range(28)) 65 | #. 
['Chloe, a writer', 'Chloe, a toddler', 'Yvette, a writer', 'Yvette, a toddler', 'Chloe, a precocious writer', 'Chloe, a precocious toddler', 'Yvette, a precocious writer'] 66 | 67 | ## parser.attempt('{"hi"](') 68 | -------------------------------------------------------------------------------- /eg_roman.py: -------------------------------------------------------------------------------- 1 | """ 2 | Convert from roman numeral to int. 3 | """ 4 | 5 | from parson import Grammar 6 | 7 | g = Grammar(r""" 8 | numeral = digit+ :end. 9 | digit = 'CM' :'900' | 'CD' :'400' | 'XC' :'90' | 'XL' :'40' | 'IX' :'9' | 'IV' :'4' 10 | | 'M' :'1000' | 'D' :'500' | 'C' :'100' | 'L' :'50' | 'X' :'10' | 'V' :'5' | 'I' :'1'. 11 | """)() 12 | 13 | def int_from_roman(string): 14 | return sum(map(int, g.numeral(string.strip()))) 15 | 16 | ## int_from_roman('MCMLXXIX') 17 | #. 1979 18 | -------------------------------------------------------------------------------- /eg_templite.py: -------------------------------------------------------------------------------- 1 | """ 2 | A template language, similar to templite by Ned Batchelder. 3 | https://github.com/aosabook/500lines/tree/master/template-engine 4 | (Still missing a few features.) 5 | """ 6 | 7 | from parson import Grammar 8 | from structs import Struct, Visitor 9 | 10 | grammar = r""" block :end. 11 | 12 | block: chunk* :hug :Block. 13 | 14 | chunk: '{#' (!'#}' /./)* '#}' 15 | | '{{'_ expr '}}' :Expr 16 | | '{%'_ 'if'_ expr '%}' block '{%'_ 'endif'_ '%}' :If 17 | | '{%'_ 'for'_ ident _ 'in'_ expr '%}' block '{%'_ 'endfor'_ '%}' :For 18 | | (!/{[#{%]/ /(.)/)+ :join :Literal. 19 | 20 | expr: access ('|' function :Call)* _ . 21 | access: ident :VarRef ('.' ident :Access)*. 22 | function: ident. 23 | 24 | ident: /([A-Za-z_][A-Za-z_0-9]*)/. 25 | 26 | _: /\s*/. 
27 | """ 28 | 29 | class Block(Struct('chunks')): pass 30 | class Literal(Struct('string')): pass 31 | class If(Struct('expr block')): pass 32 | class For(Struct('variable expr block')): pass 33 | class Expr(Struct('expr')): pass 34 | class VarRef(Struct('variable')): pass 35 | class Access(Struct('base attribute')): pass 36 | class Call(Struct('operand function')): pass 37 | 38 | parse = Grammar(grammar)(**globals()).expecting_one_result() 39 | 40 | def compile_template(text): 41 | code = gen(parse(text)) 42 | env = {} 43 | exec(code, env) 44 | return env['_expand'] 45 | 46 | def gen(template): 47 | py = """\ 48 | def _expand(_context): 49 | _acc = [] 50 | _append = _acc.append 51 | %s 52 | %s 53 | return ''.join(_acc)""" 54 | decls = '\n'.join('v_%s = _context[%r]' % (name, name) 55 | for name in free_vars(template)) 56 | return py % (indent(decls), indent(gen_visitor(template))) 57 | 58 | class Gen(Visitor): 59 | def Block(self, t): return '\n'.join(map(self, t.chunks)) 60 | def Literal(self, t): return '_append(%r)' % t.string 61 | def If(self, t): return ('if %s:\n %s' 62 | % (self(t.expr), 63 | indent(self(t.block)))) 64 | def For(self, t): return ('for v_%s in %s:\n %s' 65 | % (t.variable, self(t.expr), 66 | indent(self(t.block)))) 67 | def Expr(self, t): return '_append(str(%s))' % self(t.expr) 68 | def VarRef(self, t): return 'v_%s' % t.variable 69 | def Access(self, t): return '%s.%s' % (self(t.base), t.attribute) 70 | def Call(self, t): return '%s(%s)' % (t.function, self(t.operand)) 71 | 72 | gen_visitor = Gen() 73 | 74 | class FreeVars(Visitor): 75 | def Block(self, t): return set().union(*map(self, t.chunks)) 76 | def Literal(self, t): return set() 77 | def If(self, t): return self(t.expr) | self(t.block) 78 | def For(self, t): return ((self(t.expr) | self(t.block)) 79 | - set([t.variable])) 80 | def Expr(self, t): return self(t.expr) 81 | def VarRef(self, t): return set([t.variable]) 82 | def Access(self, t): return self(t.base) 83 | def Call(self, 
t): return self(t.operand) 84 | 85 | free_vars = FreeVars() 86 | 87 | def indent(s): 88 | return s.replace('\n', '\n ') 89 | 90 | ## parse('hello {{world}} yay') 91 | #. Block((Literal('hello '), Expr(VarRef('world')), Literal(' yay'))) 92 | 93 | ## print gen(parse('hello {{world}} yay')) 94 | #. def _expand(_context): 95 | #. _acc = [] 96 | #. _append = _acc.append 97 | #. v_world = _context['world'] 98 | #. _append('hello ') 99 | #. _append(str(v_world)) 100 | #. _append(' yay') 101 | #. return ''.join(_acc) 102 | 103 | ## f = compile_template('hello {{world}} yay'); print f(dict(world="globe")) 104 | #. hello globe yay 105 | 106 | ## print gen(parse('{% if foo.bar %} {% for x in xs|ok %} {{x}} {% endfor %} yay {% endif %}')) 107 | #. def _expand(_context): 108 | #. _acc = [] 109 | #. _append = _acc.append 110 | #. v_xs = _context['xs'] 111 | #. v_foo = _context['foo'] 112 | #. if v_foo.bar: 113 | #. _append(' ') 114 | #. for v_x in ok(v_xs): 115 | #. _append(' ') 116 | #. _append(str(v_x)) 117 | #. _append(' ') 118 | #. _append(' yay ') 119 | #. return ''.join(_acc) 120 | 121 | ## f = compile_template('hello {%for x in xs%} whee{{x}} {% endfor %} yay'); print f(dict(xs='abc')) 122 | #. hello wheea wheeb wheec yay 123 | 124 | ## f = compile_template(' {%if x%} whee{{x}} {% endif %} yay {%if y%} ok{{y}} {% endif %}'); print f(dict(x='', y='42')) 125 | #. yay ok42 126 | -------------------------------------------------------------------------------- /eg_trees.py: -------------------------------------------------------------------------------- 1 | """ 2 | Testing out tree parsing. 
3 | """ 4 | 5 | from operator import add, sub 6 | from parson import anyone, capture, delay, nest, one_of, one_that 7 | 8 | end = ~anyone 9 | 10 | def match(p, x): return (p + end)([x]) 11 | 12 | def an_instance(type_): return one_that(lambda x: isinstance(x, type_)) 13 | 14 | def capture1(p): return capture(p) >> (lambda x: x[0]) # Ouch 15 | var = capture1(anyone) 16 | ## (var + var)(eg) 17 | #. ('+', 2) 18 | 19 | calc = delay(lambda: 20 | nest(one_of('+') + calc + calc + end) >> add 21 | | nest(one_of('-') + calc + calc + end) >> sub 22 | | capture1(an_instance(int))) 23 | 24 | eg = ['+', 2, 3] 25 | ## match(calc, eg) 26 | #. (5,) 27 | 28 | eg2 = ['+', ['-', 2, 4], 3] 29 | ## match(calc, eg2) 30 | #. (1,) 31 | 32 | 33 | # Exercise: transforming trees with generic walks 34 | 35 | flatten1 = delay(lambda: 36 | nest(one_of('+') + flatten1.star() + end) 37 | | capture1(an_instance(int))) 38 | 39 | ## match(flatten1, ['+', ['+', ['+', 1, ['+', 2]]]]) 40 | #. (1, 2) 41 | 42 | # Figure 2.7 in the OMeta thesis, more or less: 43 | 44 | def walk(p, q=capture1(an_instance(int))): 45 | return ( nest(one_of('+') + p.star() + end) >> tag('+') 46 | | nest(one_of('-') + p.star() + end) >> tag('-') 47 | | q) 48 | 49 | def tag(constant): 50 | return lambda *args: (constant,) + args 51 | 52 | flatten2 = delay(lambda: 53 | nest(one_of('+') + flatten2 + end) 54 | | nest(one_of('+') + inside.star() + end) >> tag('+') 55 | | walk(flatten2)) 56 | inside = delay(lambda: 57 | nest(one_of('+') + inside.star() + end) 58 | | flatten2) 59 | 60 | ## match(flatten2, ['+', ['+', ['+', 1, ['+', 2], ['+', 3, 4]]]]) 61 | #. (('+', 1, 2, 3, 4),) 62 | -------------------------------------------------------------------------------- /eg_url.py: -------------------------------------------------------------------------------- 1 | """ 2 | Based on https://www.w3.org/Addressing/URL/5_BNF.html 3 | because I'm a lazy bastard; it's clearly not up to date 4 | (as shown by 'right=wrong' below). 
5 | """ 6 | 7 | from parson import Grammar 8 | 9 | grammar = r""" url :end. 10 | 11 | url : httpaddress | mailtoaddress. 12 | 13 | mailtoaddress : {'mailto'} ':' :'protocol' 14 | {(!'@' xalpha)+} :'user' 15 | '@' {hostname} :'host'. 16 | 17 | httpaddress : {'http'} '://' :'protocol' hostport ('/' path)? ('?' search)? ('#' fragment)?. 18 | 19 | hostport : host (':' port)?. 20 | 21 | host : {hostname | hostnumber} :'host'. 22 | hostname : ialpha ++ '.'. 23 | hostnumber : digits '.' digits '.' digits '.' digits. 24 | 25 | port : {digits} :'port'. 26 | 27 | path : {(segment '/')* segment?} :'path'. 28 | segment : xpalpha+. 29 | 30 | search : {(xalpha+) ++ '+'} :'search'. 31 | fragment : {xalpha+} :'fragment'. 32 | 33 | xalpha : alpha | digit | safe | extra | escape. 34 | xpalpha : xalpha | '+'. 35 | 36 | ialpha : alpha xalpha*. 37 | 38 | alpha : /[a-zA-Z]/. 39 | digit : /\d/. 40 | digits : /\d+/. 41 | safe : /[$_@.&+-]/. 42 | extra : /[!*"'(),]/. 43 | escape : '%' hex hex. 44 | hex : /[\dA-Fa-f]/. 45 | """ 46 | g = Grammar(grammar)() 47 | 48 | ## g.attempt('true') 49 | ## g('mailto:coyote@acme.com') 50 | #. ('mailto', 'protocol', 'coyote', 'user', 'acme.com', 'host') 51 | ## g('http://google.com') 52 | #. ('http', 'protocol', 'google.com', 'host') 53 | ## g.attempt('http://google.com//') 54 | ## g('http://en.wikipedia.org/wiki/Uniform_resource_locator') 55 | #. ('http', 'protocol', 'en.wikipedia.org', 'host', 'wiki/Uniform_resource_locator', 'path') 56 | ## g.attempt('http://wry.me/fun/toys/yes.html?right=wrong#fraggle') 57 | ## g( 'http://wry.me/fun/toys/yes.html?rightwrong#fraggle') 58 | #. ('http', 'protocol', 'wry.me', 'host', 'fun/toys/yes.html', 'path', 'rightwrong', 'search', 'fraggle', 'fragment') 59 | -------------------------------------------------------------------------------- /eg_wc.py: -------------------------------------------------------------------------------- 1 | """ 2 | Word-count function. 
3 | """ 4 | 5 | from parson import match, feed 6 | 7 | blanks = match(r'\s*') 8 | marks = match(r'\S+') 9 | zero = feed(lambda: 0) 10 | add1 = feed(lambda n: n+1) 11 | 12 | wc = zero + blanks + (add1 + marks + blanks).star() 13 | 14 | ## wc(' ') 15 | #. (0,) 16 | ## wc('a b c ') 17 | #. (3,) 18 | ## wc(example_input) 19 | #. (10,) 20 | 21 | example_input = r""" hi there hey 22 | how are you? 23 | fine. 24 | 25 | thanks. 26 | 27 | ok then.""" 28 | -------------------------------------------------------------------------------- /microses.py: -------------------------------------------------------------------------------- 1 | """ 2 | Abstract syntax for MicroSES. 3 | """ 4 | 5 | from structs import Struct as S 6 | 7 | class Data(S('string')): 8 | pass 9 | 10 | class Array(S('args')): 11 | pass 12 | 13 | class Object(S('props')): 14 | pass 15 | 16 | class Variable(S('name')): 17 | pass 18 | 19 | class ExprHole(S('')): 20 | pass 21 | 22 | class MatchData(S('string')): 23 | pass 24 | 25 | class MatchArray(S('params')): 26 | pass 27 | 28 | class MatchObject(S('prop_params')): 29 | pass 30 | 31 | class MatchVariable(S('name')): 32 | pass 33 | 34 | class PatternHole(S('')): 35 | pass 36 | 37 | class Spread(S('expr')): 38 | pass 39 | 40 | class Rest(S('pattern')): 41 | pass 42 | 43 | class Optional(S('name expr')): 44 | pass 45 | 46 | class SpreadObj(S('expr')): 47 | pass 48 | 49 | class Prop(S('key expr_opt')): 50 | pass 51 | 52 | class RestObj(S('pattern')): 53 | pass 54 | 55 | class MatchProp(S('key pattern')): 56 | pass 57 | 58 | class OptionalProp(S('name expr')): 59 | pass 60 | 61 | class Computed(S('expr')): 62 | pass 63 | 64 | class Quasi(S('string')): 65 | pass 66 | 67 | def QUnpack(): 68 | pass 69 | 70 | class Get(S('primary name')): 71 | pass 72 | 73 | class Index(S('primary expr')): 74 | pass 75 | 76 | class Call(S('primary args')): 77 | pass 78 | 79 | class Tag(S('primary quasi')): 80 | pass 81 | 82 | class GetLater(S('primary name')): 83 | pass 84 | 85 | class 
IndexLater(S('primary expr')): 86 | pass 87 | 88 | class CallLater(S('primary args')): 89 | pass 90 | 91 | class TagLater(S('primary quasi')): 92 | pass 93 | 94 | class Delete(S('field_expr')): 95 | pass 96 | 97 | class UnaryOp(S('op expr')): 98 | pass 99 | 100 | class BinaryOp(S('expr1 op expr2')): 101 | pass 102 | 103 | class AndThen(S('expr1 expr2')): 104 | pass 105 | 106 | class OrElse(S('expr1 expr2')): 107 | pass 108 | 109 | class Assign(S('lvalue op expr')): 110 | pass 111 | 112 | class Arrow(S('params block')): 113 | pass 114 | 115 | class Lambda(S('params expr')): 116 | pass 117 | 118 | class If(S('test then else_opt')): 119 | pass 120 | 121 | class For(S('decls test_opt update_opt block')): 122 | pass 123 | 124 | class ForOf(S('decl_op binding expr block')): 125 | pass 126 | 127 | class Decl(S('decl_op bindings')): 128 | pass 129 | 130 | class While(S('expr block')): 131 | pass 132 | 133 | class Try(S('block catcher_opt finalizer_opt')): 134 | pass 135 | 136 | class Switch(S('expr branches')): 137 | pass 138 | 139 | class Debugger(S('')): 140 | pass 141 | 142 | class Return(S('expr_opt')): 143 | pass 144 | 145 | class Break(S('')): 146 | pass 147 | 148 | class Throw(S('expr')): 149 | pass 150 | 151 | class Branch(S('labels body terminator')): 152 | pass 153 | 154 | class Case(S('expr')): 155 | pass 156 | 157 | class Default(S('')): 158 | pass 159 | 160 | class Block(S('body')): 161 | pass 162 | -------------------------------------------------------------------------------- /peg.py: -------------------------------------------------------------------------------- 1 | """ 2 | A PEG parser using explicit control instead of recursion. 3 | Avoids Python stack overflow. 4 | Also a step towards compiling instead of interpreting. 
5 | 6 | to do: test this: optimize q==y with Nip 7 | to do: produce code instead of closures 8 | to do: bounce less often 9 | """ 10 | 11 | # peg constructors 12 | 13 | Fail = 'fail', None 14 | def Alter(fn): return 'alter', fn 15 | def Item(ok): return 'item', ok 16 | def Ref(name): return 'ref', name 17 | def Chain(q, r): return 'chain', (q, r) 18 | def Cond(q, n, y): return 'cond', (q, n, y) 19 | 20 | 21 | # (program, peg, fail_cont, success_cont) -> cont 22 | # where program: dict(string -> cont) 23 | 24 | def translate(pr, peg, f, s): 25 | tag, arg = peg 26 | if tag == 'fail': return KDrop(f) 27 | elif tag == 'alter': return KAlter(arg, s) 28 | elif tag == 'item': return KItem(arg, f, s) 29 | elif tag == 'ref': return KCall(pr, arg, f, s) 30 | elif tag == 'chain': return translate(pr, arg[0], f, 31 | translate(pr, arg[1], f, s)) 32 | elif tag == 'cond': 33 | q, n, y = arg 34 | if y == q: 35 | yy = KNip(s) 36 | elif y[0] == 'chain' and y[1][0] == q: 37 | yy = KNip(translate(pr, y[1][1], f, s)) 38 | else: 39 | yy = KDrop(translate(pr, y, f, s)) 40 | return KDup(translate(pr, q, translate(pr, n, f, s), yy)) 41 | else: 42 | assert False 43 | 44 | def run(q, defns, vs, cs): 45 | pr = {} 46 | for name, defn in defns.items(): 47 | pr[name] = translate(pr, defn, KFail, KSucceed) 48 | return trampoline(translate(pr, q, KFinalFail, KFinalSucceed), 49 | ((vs,cs,()), ())) 50 | 51 | 52 | # The parsing machine. 
53 | # continuation: trail -> result 54 | # trail: () | ((vs,cs,ks), trail) 55 | # vs: tuple of values from semantic actions 56 | # cs: input character sequence (the current tail thereof) 57 | # ks: () | ((fail_cont,success_cont), ks) 58 | 59 | def trampoline(cont, trail): 60 | while cont is not None: 61 | cont, trail = cont(trail) 62 | return trail 63 | 64 | def KFinalFail(trail): 65 | assert trail is () 66 | return None, 'fail' 67 | def KFinalSucceed(((vs,cs,ks), trail)): 68 | assert trail is () 69 | assert ks is () 70 | return None, (vs, cs) 71 | 72 | def KDrop(cont): return lambda (_, trail): (cont, trail) 73 | def KNip(cont): return lambda (entry, (_, trail)): (cont, (entry, trail)) 74 | def KDup(cont): return lambda (entry, trail): (cont, (entry, (entry, trail))) 75 | 76 | def KAlter(fn, s): 77 | return lambda ((vs,cs,ks), trail): (s, ((fn(*vs),cs,ks), trail)) 78 | 79 | def KItem(ok, f, s): 80 | return lambda ((vs,cs,ks), trail): ( 81 | (s, ((vs+(cs[0],), cs[1:], ks), trail)) if cs and ok(cs[0]) 82 | else (f, trail)) 83 | 84 | def KCall(pr, name, f, s): 85 | return lambda ((vs,cs,ks), trail): ( 86 | pr[name], ((vs,cs,((f,s),ks)), trail)) 87 | def KFail(((vs,cs,((fk,_),ks)), trail)): 88 | return fk, trail 89 | def KSucceed(((vs,cs,((_,sk),ks)), trail)): 90 | return sk, ((vs,cs,ks), trail) 91 | 92 | 93 | # Smoke test 94 | 95 | def Lit(c): return Item(lambda c1: c == c1) 96 | def Or(q, r): return Cond(q, r, q) 97 | # XXX is this really equivalent to a 'native' Or. As a tail call, too. 98 | Succeed = Alter(lambda *vals: vals) 99 | 100 | bit = Or(Lit('0'), Lit('1')) 101 | twobits = Chain(bit, bit) 102 | 103 | nbits_defs = {'nbits': Cond(bit, Succeed, Chain(bit, Ref('nbits')))} 104 | 105 | def test(string): 106 | # return run(Lit('0'), (), string) 107 | # return run(bit, (), string) 108 | # return run(twobits, {}, (), string) 109 | return run(Ref('nbits'), nbits_defs, (), string) 110 | 111 | ## test('xy') 112 | #. ((), 'xy') 113 | ## test('01101a') 114 | #. 
(('0', '1', '1', '0', '1'), 'a') 115 | -------------------------------------------------------------------------------- /peglet_to_parson.py: -------------------------------------------------------------------------------- 1 | """ 2 | Convert a Peglet grammar to a Parson one. 3 | """ 4 | 5 | import re 6 | from parson import Grammar, alter 7 | 8 | name = r'[A-Za-z_]\w*' 9 | 10 | grammar = Grammar(r""" 11 | grammar : _? rule* :end. 12 | rule : name _ '= ' :equ token* :'.' _?. 13 | token : '|' :'|' 14 | | /(\/\w*\/\s)/ 15 | | name !(_ '= ') 16 | | '!' :'!' 17 | | _ !(name _ '= ' | :end) 18 | | !('= '|name) /(\S+)/ :mk_regex. 19 | name : /("""+name+""")/ !!(/\s/ | :end). 20 | _ : /(\s+)/. 21 | """) 22 | def mk_regex(s): return '/' + s.replace('/', '\\/') + '/' 23 | 24 | def peglet_to_parson(text): 25 | nonterminals = set() 26 | def equ(name, space): 27 | nonterminals.add(name) 28 | return name, space, ': ' 29 | g = grammar(equ=alter(equ), mk_regex=mk_regex) 30 | tokens = g.grammar(text) 31 | return ''.join(':'+token if re.match(name+'$', token) and token not in nonterminals 32 | else token 33 | for token in tokens) 34 | 35 | if __name__ == '__main__': 36 | import sys 37 | print peglet_to_parson(sys.stdin.read()) 38 | -------------------------------------------------------------------------------- /pegvm.py: -------------------------------------------------------------------------------- 1 | """ 2 | A PEG parser using explicit control instead of recursion. 3 | Avoids Python stack overflow. 4 | Also a step towards compiling instead of interpreting. 5 | 6 | to do: test this: optimize q==y with Nip 7 | to do: produce code instead of closures 8 | to do: bounce less often 9 | 10 | to do: compare: 11 | (Knuth 1971), Donald Knuth describes an abstract parsing machine. This 12 | machine runs programs in which the instructions either recognize and 13 | consume a token from the input, or call a subroutine to recognize an 14 | instance of a non-terminal. 
Each instruction has two continuations for 15 | success and failure, and part of the subroutine mechanism is that if a 16 | subroutine returns with failure, then the input pointer is reset to 17 | where it was when the subroutine was called. A subroutine that 18 | returns successfully, however, deletes the record of the old position 19 | of the input pointer. There is a natural translation of context-free 20 | grammars into programs for this machine, which behave exactly like 21 | combinator parsers based on Maybe. 22 | """ 23 | 24 | dbg = 1 25 | 26 | # peg constructors 27 | 28 | Fail = 'fail', () 29 | def Alter(fn): return 'alter', fn 30 | def Item(ok): return 'item', ok 31 | def Ref(name): return 'ref', name 32 | def Chain(q, r): return 'chain', (q, r) 33 | def Cond(q, n, y): return 'cond', (q, n, y) 34 | 35 | 36 | # (install, program, peg, fail_cont, success_cont) -> cont 37 | # where program: dict(string -> cont) 38 | # code: [('operator', (operands,))] 39 | # cont: int -- index into code 40 | def translate(install, pr, peg, f, s): 41 | tag, arg = peg 42 | if tag == 'fail': return install(KDrop, f) 43 | elif tag == 'alter': return install(KAlter, arg, s) 44 | elif tag == 'item': return install(KItem, arg, f, s) 45 | elif tag == 'ref': return install(KCall, pr, arg, f, s) # TODO: how about a jump op for when f is KFail, s is KSucceed? 
46 | elif tag == 'chain': return translate(install, pr, arg[0], f, 47 | translate(install, pr, arg[1], f, s)) 48 | elif tag == 'cond': 49 | q, n, y = arg 50 | if y == q: 51 | yy = install(KNip, s) 52 | elif y[0] == 'chain' and y[1][0] == q: 53 | yy = install(KNip, translate(install, pr, y[1][1], f, s)) 54 | else: 55 | yy = install(KDrop, translate(install, pr, y, f, s)) 56 | return install(KDup, translate(install, pr, q, translate(install, pr, n, f, s), yy)) 57 | else: 58 | assert False 59 | 60 | def run(q, defns, vs, cs): 61 | code = [] 62 | def install(*insn): 63 | try: return code.index(insn) 64 | except ValueError: 65 | try: return len(code) 66 | finally: code.append(insn) 67 | pr = {} 68 | entry = translate(install, pr, q, install(KFinalFail), install(KFinalSucceed)) 69 | for name, defn in defns.items(): 70 | pr[name] = translate(install, pr, defn, install(KFail), install(KSucceed)) 71 | if dbg: 72 | for name in sorted(pr.keys()): 73 | print name, '=>', pr[name] 74 | for i, insn in enumerate(code): 75 | print i, show(insn) 76 | return trampoline(entry, code, 77 | ((vs,cs,()), ())) 78 | 79 | def show(insn): 80 | return '%s(%s)' % (insn[0].__name__, 81 | ', '.join(x.__name__ if callable(x) else repr(x) 82 | for x in insn[1:])) 83 | 84 | # The parsing machine. 
85 | # continuation: trail -> result 86 | # trail: () | ((vs,cs,ks), trail) 87 | # vs: tuple of values from semantic actions 88 | # cs: input character sequence (the current tail thereof) 89 | # ks: () | ((fail_cont,success_cont), ks) 90 | 91 | def trampoline(pc, code, trail): 92 | while pc is not None: 93 | insn = code[pc] 94 | if dbg: print 'pc', pc, 'insn', show(insn) 95 | pc, trail = insn[0](*insn[1:])(trail) 96 | assert pc is None or isinstance(pc, int), pc 97 | return trail 98 | 99 | def KFinalFail(): 100 | def k(trail): 101 | assert trail is () 102 | return None, 'fail' 103 | return k 104 | def KFinalSucceed(): 105 | def k(((vs,cs,ks), trail)): 106 | assert trail is () 107 | assert ks is () 108 | return None, (vs, cs) 109 | return k 110 | 111 | def KDrop(cont): return lambda (_, trail): (cont, trail) 112 | def KNip(cont): return lambda (entry, (_, trail)): (cont, (entry, trail)) 113 | def KDup(cont): return lambda (entry, trail): (cont, (entry, (entry, trail))) 114 | 115 | def KAlter(fn, s): 116 | return lambda ((vs,cs,ks), trail): (s, ((fn(*vs),cs,ks), trail)) 117 | 118 | def KItem(ok, f, s): 119 | return lambda ((vs,cs,ks), trail): ( 120 | (s, ((vs+(cs[0],), cs[1:], ks), trail)) if cs and ok(cs[0]) 121 | else (f, trail)) 122 | 123 | def KCall(pr, name, f, s): 124 | return lambda ((vs,cs,ks), trail): ( 125 | pr[name], ((vs,cs,((f,s),ks)), trail)) 126 | def KFail(): 127 | return lambda ((vs,cs,((fk,_),ks)), trail): (fk, trail) 128 | def KSucceed(): 129 | return lambda ((vs,cs,((_,sk),ks)), trail): (sk, ((vs,cs,ks), trail)) 130 | 131 | 132 | # Smoke test 133 | 134 | def Lit(c): 135 | def t(c1): return c == c1 136 | t.__name__ = 'eq %r' % c 137 | return Item(t) 138 | def Or(q, r): return Cond(q, r, q) 139 | def identity(*vals): return vals 140 | # XXX is this really equivalent to a 'native' Or. As a tail call, too. 
141 | Succeed = Alter(identity) 142 | 143 | bit = Or(Lit('0'), Lit('1')) 144 | twobits = Chain(bit, bit) 145 | 146 | nbits_defs = {'nbits': Cond(bit, Succeed, Chain(bit, Ref('nbits')))} 147 | 148 | def test(string): 149 | # return run(Lit('0'), (), string) 150 | # return run(bit, (), string) 151 | # return run(twobits, {}, (), string) 152 | return run(Ref('nbits'), nbits_defs, (), string) 153 | 154 | ## test('xy') 155 | #. nbits => 12 156 | #. 0 KFinalFail() 157 | #. 1 KFinalSucceed() 158 | #. 2 KCall({'nbits': 12}, 'nbits', 0, 1) 159 | #. 3 KFail() 160 | #. 4 KSucceed() 161 | #. 5 KCall({'nbits': 12}, 'nbits', 3, 4) 162 | #. 6 KNip(5) 163 | #. 7 KAlter(identity, 4) 164 | #. 8 KNip(6) 165 | #. 9 KItem(eq '1', 7, 6) 166 | #. 10 KItem(eq '0', 9, 8) 167 | #. 11 KDup(10) 168 | #. 12 KDup(11) 169 | #. pc 2 insn KCall({'nbits': 12}, 'nbits', 0, 1) 170 | #. pc 12 insn KDup(11) 171 | #. pc 11 insn KDup(10) 172 | #. pc 10 insn KItem(eq '0', 9, 8) 173 | #. pc 9 insn KItem(eq '1', 7, 6) 174 | #. pc 7 insn KAlter(identity, 4) 175 | #. pc 4 insn KSucceed() 176 | #. pc 1 insn KFinalSucceed() 177 | #. ((), 'xy') 178 | ## test('01101a') 179 | #. nbits => 12 180 | #. 0 KFinalFail() 181 | #. 1 KFinalSucceed() 182 | #. 2 KCall({'nbits': 12}, 'nbits', 0, 1) 183 | #. 3 KFail() 184 | #. 4 KSucceed() 185 | #. 5 KCall({'nbits': 12}, 'nbits', 3, 4) 186 | #. 6 KNip(5) 187 | #. 7 KAlter(identity, 4) 188 | #. 8 KNip(6) 189 | #. 9 KItem(eq '1', 7, 6) 190 | #. 10 KItem(eq '0', 9, 8) 191 | #. 11 KDup(10) 192 | #. 12 KDup(11) 193 | #. pc 2 insn KCall({'nbits': 12}, 'nbits', 0, 1) 194 | #. pc 12 insn KDup(11) 195 | #. pc 11 insn KDup(10) 196 | #. pc 10 insn KItem(eq '0', 9, 8) 197 | #. pc 8 insn KNip(6) 198 | #. pc 6 insn KNip(5) 199 | #. pc 5 insn KCall({'nbits': 12}, 'nbits', 3, 4) 200 | #. pc 12 insn KDup(11) 201 | #. pc 11 insn KDup(10) 202 | #. pc 10 insn KItem(eq '0', 9, 8) 203 | #. pc 9 insn KItem(eq '1', 7, 6) 204 | #. pc 6 insn KNip(5) 205 | #. 
pc 5 insn KCall({'nbits': 12}, 'nbits', 3, 4) 206 | #. pc 12 insn KDup(11) 207 | #. pc 11 insn KDup(10) 208 | #. pc 10 insn KItem(eq '0', 9, 8) 209 | #. pc 9 insn KItem(eq '1', 7, 6) 210 | #. pc 6 insn KNip(5) 211 | #. pc 5 insn KCall({'nbits': 12}, 'nbits', 3, 4) 212 | #. pc 12 insn KDup(11) 213 | #. pc 11 insn KDup(10) 214 | #. pc 10 insn KItem(eq '0', 9, 8) 215 | #. pc 8 insn KNip(6) 216 | #. pc 6 insn KNip(5) 217 | #. pc 5 insn KCall({'nbits': 12}, 'nbits', 3, 4) 218 | #. pc 12 insn KDup(11) 219 | #. pc 11 insn KDup(10) 220 | #. pc 10 insn KItem(eq '0', 9, 8) 221 | #. pc 9 insn KItem(eq '1', 7, 6) 222 | #. pc 6 insn KNip(5) 223 | #. pc 5 insn KCall({'nbits': 12}, 'nbits', 3, 4) 224 | #. pc 12 insn KDup(11) 225 | #. pc 11 insn KDup(10) 226 | #. pc 10 insn KItem(eq '0', 9, 8) 227 | #. pc 9 insn KItem(eq '1', 7, 6) 228 | #. pc 7 insn KAlter(identity, 4) 229 | #. pc 4 insn KSucceed() 230 | #. pc 4 insn KSucceed() 231 | #. pc 4 insn KSucceed() 232 | #. pc 4 insn KSucceed() 233 | #. pc 4 insn KSucceed() 234 | #. pc 4 insn KSucceed() 235 | #. pc 1 insn KFinalSucceed() 236 | #. 
(('0', '1', '1', '0', '1'), 'a') 237 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from distutils.core import setup 2 | 3 | version = '0.1.0dev' 4 | 5 | setup(name = 'Parson', 6 | version = version, 7 | author = 'Darius Bacon', 8 | author_email = 'darius@wry.me', 9 | py_modules = ['parson'], 10 | url = 'https://github.com/darius/parson', 11 | description = "A fancier parsing package.", # XXX 12 | long_description = open('README.md').read(), 13 | license = 'GNU General Public License (GPL)', 14 | classifiers = [ 15 | 'Development Status :: 4 - Beta', 16 | 'Intended Audience :: Developers', 17 | 'Intended Audience :: Education', 18 | 'License :: OSI Approved :: GNU General Public License (GPL)', 19 | 'Natural Language :: English', 20 | 'Operating System :: OS Independent', 21 | 'Programming Language :: Python :: 2.6', 22 | 'Topic :: Software Development :: Interpreters', 23 | 'Topic :: Software Development :: Libraries :: Python Modules', 24 | 'Topic :: Text Processing', 25 | ], 26 | keywords = 'parse,parser,parsing,peg,packrat,regex,grammar', 27 | ) 28 | -------------------------------------------------------------------------------- /structs.py: -------------------------------------------------------------------------------- 1 | """ 2 | Define a named-tuple-like type, but simpler. 3 | Also Visitor to dispatch on datatypes defined this way. 4 | This module is only used for examples, at least currently. 
5 | """ 6 | 7 | # TODO figure out how to use __slots__ 8 | 9 | def Struct(field_names, name=None, supertype=(object,)): 10 | if isinstance(field_names, (str, unicode)): 11 | field_names = tuple(field_names.split()) 12 | 13 | if name is None: 14 | name = 'Struct<%s>' % ','.join(field_names) 15 | def get_name(self): return self.__class__.__name__ 16 | else: 17 | def get_name(self): return name 18 | 19 | def __init__(self, *args): 20 | if len(field_names) != len(args): 21 | raise TypeError("%s takes %d arguments (%d given)" 22 | % (get_name(self), len(field_names), len(args))) 23 | self.__dict__.update(zip(field_names, args)) 24 | 25 | def __repr__(self): 26 | return '%s(%s)' % (get_name(self), ', '.join(repr(getattr(self, f)) 27 | for f in field_names)) 28 | 29 | # (for use with pprint) 30 | def my_as_sexpr(self): # XXX better name? 31 | return (get_name(self),) + tuple(as_sexpr(getattr(self, f)) 32 | for f in field_names) 33 | my_as_sexpr.__name__ = 'as_sexpr' 34 | 35 | return type(name, 36 | supertype, 37 | dict(__init__=__init__, 38 | __repr__=__repr__, 39 | as_sexpr=my_as_sexpr)) 40 | 41 | def as_sexpr(obj): 42 | if hasattr(obj, 'as_sexpr'): 43 | return getattr(obj, 'as_sexpr')() 44 | elif isinstance(obj, list): 45 | return map(as_sexpr, obj) 46 | elif isinstance(obj, tuple): 47 | return tuple(map(as_sexpr, obj)) 48 | else: 49 | return obj 50 | 51 | 52 | # Is there a nicer way to do this? 53 | 54 | class Visitor(object): 55 | def __call__(self, subject, *args): 56 | tag = subject.__class__.__name__ 57 | method = getattr(self, tag, None) 58 | if method is None: 59 | method = getattr(self, 'default') 60 | return method(subject, *args) 61 | -------------------------------------------------------------------------------- /testsmoke.py: -------------------------------------------------------------------------------- 1 | """ 2 | Smoke test for parson 3 | """ 4 | 5 | from parson import * 6 | 7 | # Smoke test: combinators 8 | 9 | ## empty 10 | #. 
empty 11 | ## fail.attempt('hello') 12 | ## empty('hello') 13 | #. () 14 | ## match(r'(x)').attempt('hello') 15 | ## match(r'(h)')('hello') 16 | #. ('h',) 17 | 18 | ## (match(r'(H)') | match('(.)'))('hello') 19 | #. ('h',) 20 | ## (match(r'(h)') + match('(.)'))('hello') 21 | #. ('h', 'e') 22 | 23 | ## (match(r'h(e)') + match(r'(.)'))('hello') 24 | #. ('e', 'l') 25 | ## (~match(r'h(e)') + match(r'(.)'))('xhello') 26 | #. ('x',) 27 | 28 | ## empty.run('', [0], (0, ())) 29 | #. [(0, ())] 30 | ## chain(empty, empty)('') 31 | #. () 32 | 33 | ## (match(r'(.)') >> hug)('hello') 34 | #. (('h',),) 35 | 36 | ## match(r'(.)').star()('') 37 | #. () 38 | 39 | ## (match(r'(.)').star())('hello') 40 | #. ('h', 'e', 'l', 'l', 'o') 41 | 42 | ## (match(r'(.)').star() >> join)('hello') 43 | #. ('hello',) 44 | 45 | 46 | # Example 47 | 48 | def make_var(v): return v 49 | def make_lam(v, e): return '(lambda (%s) %s)' % (v, e) 50 | def make_app(e1, e2): return '(%s %s)' % (e1, e2) 51 | def make_let(v, e1, e2): return '(let ((%s %s)) %s)' % (v, e1, e2) 52 | 53 | eof = match(r'$') 54 | _ = match(r'\s*') 55 | identifier = match(r'([A-Za-z_]\w*)\s*') 56 | 57 | def test1(): 58 | V = identifier 59 | E = delay(lambda: 60 | V >> make_var 61 | | '\\' +_+ V + '.' +_+ E >> make_lam 62 | | '(' +_+ E + E + ')' +_ >> make_app) 63 | start = _+ E #+ eof 64 | return lambda s: start(s)[0] 65 | 66 | ## test1()('x y') 67 | #. 'x' 68 | ## test1()(r'\x.x') 69 | #. '(lambda (x) x)' 70 | ## test1()('(x x)') 71 | #. '(x x)' 72 | 73 | 74 | def test2(string): 75 | V = identifier 76 | F = delay(lambda: 77 | V >> make_var 78 | | '\\' +_+ V.plus() + hug + '.' 
+_+ E >> fold_lam 79 | | '(' +_+ E + ')' +_) 80 | E = F + F.star() >> fold_app 81 | start = _+ E 82 | 83 | vals = start.attempt(string) 84 | return vals and vals[0] 85 | 86 | def fold_app(f, *fs): return reduce(make_app, fs, f) 87 | def fold_lam(vp, e): return foldr(make_lam, e, vp) 88 | 89 | def foldr(f, z, xs): 90 | for x in reversed(xs): 91 | z = f(x, z) 92 | return z 93 | 94 | ## test2('x') 95 | #. 'x' 96 | ## test2('\\x.x') 97 | #. '(lambda (x) x)' 98 | ## test2('(x x)') 99 | #. '(x x)' 100 | 101 | ## test2('hello') 102 | #. 'hello' 103 | ## test2(' x') 104 | #. 'x' 105 | ## test2('\\x . y ') 106 | #. '(lambda (x) y)' 107 | ## test2('((hello world))') 108 | #. '(hello world)' 109 | 110 | ## test2(' hello ') 111 | #. 'hello' 112 | ## test2('hello there hi') 113 | #. '((hello there) hi)' 114 | ## test2('a b c d e') 115 | #. '((((a b) c) d) e)' 116 | 117 | ## test2('') 118 | ## test2('x x . y') 119 | #. '(x x)' 120 | ## test2('\\.x') 121 | ## test2('(when (in the)') 122 | ## test2('((when (in the)))') 123 | #. '(when (in the))' 124 | 125 | ## test2('\\a.a') 126 | #. '(lambda (a) a)' 127 | 128 | ## test2(' \\hello . (hello)x \t') 129 | #. '(lambda (hello) (hello x))' 130 | 131 | ## test2('\\M . (\\f . M (f f)) (\\f . M (f f))') 132 | #. '(lambda (M) ((lambda (f) (M (f f))) (lambda (f) (M (f f)))))' 133 | 134 | ## test2('\\a b.a') 135 | #. '(lambda (a) (lambda (b) a))' 136 | 137 | ## test2('\\a b c . a b') 138 | #. '(lambda (a) (lambda (b) (lambda (c) (a b))))' 139 | 140 | 141 | # Smoke test: grammars 142 | 143 | ## exceptionally(lambda: Grammar(r"a = . b = a. a = .")()) 144 | #. GrammarError('Multiply-defined rules: a',) 145 | 146 | ## exceptionally(lambda: Grammar(r"a = b|c|d. c = .")()) 147 | #. GrammarError('Undefined rules: b, d',) 148 | 149 | ## exceptionally(lambda: Grammar(r"a = ")()) 150 | #. GrammarError('Bad grammar', ('a = ', '')) 151 | 152 | pushy = Grammar(r""" 153 | main: :'x'. 154 | """)() 155 | ## pushy.main('') 156 | #. 
('x',) 157 | 158 | nums = Grammar(r""" 159 | # This is a comment. 160 | main : nums !/./. # So's this. 161 | nums : num ** ','. 162 | num : /([0-9]+)/ :int. 163 | """)() 164 | sum_nums = lambda s: sum(nums.main(s)) 165 | 166 | ## sum_nums('10,30,43') 167 | #. 83 168 | ## nums.nums('10,30,43') 169 | #. (10, 30, 43) 170 | ## nums.nums('') 171 | #. () 172 | ## nums.num('10,30,43') 173 | #. (10,) 174 | 175 | ## nums.main('10,30,43') 176 | #. (10, 30, 43) 177 | ## nums.main.attempt('10,30,43 xxx') 178 | 179 | 180 | gsub_grammar = Grammar(r""" 181 | gsub = [:p :replace | /(.)/]*. 182 | """) 183 | def gsub(text, p, replacement): 184 | g = gsub_grammar(p=p, replace=lambda: replacement) 185 | return ''.join(g.gsub(text)) 186 | ## gsub('hi there WHEEWHEE to you WHEEEE', 'WHEE', 'GLARG') 187 | #. 'hi there GLARGGLARG to you GLARGEE' 188 | 189 | 190 | def catch_position(parse, string): 191 | try: parse(string) 192 | except Unparsable, e: 193 | print e.position 194 | 195 | ## catch_position(Grammar(r" 'x'* /$/ ")(), 'xxxhi') 196 | #. 3 197 | 198 | 199 | # Like test2, but in the grammar syntax and using immediate actions 200 | # instead of folds: 201 | test3_grammar = Grammar(r""" 202 | start: FNORD E. 203 | E: F (F :make_app)*. 204 | F: V :make_var 205 | | '\\' Lam 206 | | '(' E ')'. 207 | Lam: V ('.' E | Lam) :make_lam. 208 | V: /([A-Za-z]+)/. 209 | FNORD ~: /\s*/. 210 | """) 211 | test3 = test3_grammar(**globals()).start.expecting_one_result() 212 | 213 | # Same checks as test2: 214 | 215 | ## test3('x') 216 | #. 'x' 217 | ## test3('\\x.x') 218 | #. '(lambda (x) x)' 219 | ## test3('(x x)') 220 | #. '(x x)' 221 | 222 | ## test3('hello') 223 | #. 'hello' 224 | ## test3(' x') 225 | #. 'x' 226 | ## test3('\\x . y ') 227 | #. '(lambda (x) y)' 228 | ## test3('((hello world))') 229 | #. '(hello world)' 230 | 231 | ## test3(' hello ') 232 | #. 'hello' 233 | ## test3('hello there hi') 234 | #. '((hello there) hi)' 235 | ## test3('a b c d e') 236 | #. 
'((((a b) c) d) e)' 237 | 238 | ## test3.attempt('') 239 | ## test3('x x . y') 240 | #. '(x x)' 241 | ## test3.attempt('\\.x') 242 | ## test3.attempt('(when (in the)') 243 | ## test3('((when (in the)))') 244 | #. '(when (in the))' 245 | 246 | ## test3('\\a.a') 247 | #. '(lambda (a) a)' 248 | 249 | ## test3(' \\hello . (hello)x \t') 250 | #. '(lambda (hello) (hello x))' 251 | 252 | ## test3('\\M . (\\f . M (f f)) (\\f . M (f f))') 253 | #. '(lambda (M) ((lambda (f) (M (f f))) (lambda (f) (M (f f)))))' 254 | 255 | ## test3('\\a b.a') 256 | #. '(lambda (a) (lambda (b) a))' 257 | 258 | ## test3('\\a b c . a b') 259 | #. '(lambda (a) (lambda (b) (lambda (c) (a b))))' 260 | -------------------------------------------------------------------------------- /treepeg.py: -------------------------------------------------------------------------------- 1 | """ 2 | Exploring making tree parsing central. 3 | """ 4 | 5 | import re 6 | 7 | 8 | # Some derived combinators 9 | 10 | def invert(p): return cond(p, fail, succeed) 11 | def either(p, q): return cond(p, p, q) 12 | def both(p, q): return cond(p, q, fail) 13 | 14 | def feed(p, f): return alter(p, lambda *vals: (f(*vals),)) 15 | 16 | def maybe(p): return either(p, succeed) 17 | def plus(p): return recur(lambda p_plus: chain(p, maybe(p_plus))) 18 | def star(p): return maybe(plus(p)) 19 | 20 | def recur(fn): 21 | p = delay(lambda: fn(p)) 22 | return p 23 | 24 | 25 | # Peg objects 26 | 27 | def Peg(x): 28 | if isinstance(x, _Peg): return x 29 | # if isinstance(x, (str, unicode)): return literal(x) 30 | if callable(x): return satisfying(x) 31 | raise ValueError("Not a Peg", x) 32 | 33 | class _Peg(object): 34 | def __init__(self, run): 35 | self.run = run 36 | 37 | def __call__(self, sequence): 38 | for vals, _ in self.run(sequence): 39 | return vals 40 | return None 41 | 42 | def __add__(self, other): return chain(self, Peg(other)) 43 | def __radd__(self, other): return chain(Peg(other), self) 44 | def __or__(self, other): return 
either(self, Peg(other)) 45 | def __ror__(self, other): return either(Peg(other), self) 46 | 47 | __rshift__ = feed 48 | __invert__ = invert 49 | maybe = maybe 50 | plus = plus 51 | star = star 52 | 53 | 54 | # Basic combinators 55 | 56 | nil = ['nil'] 57 | 58 | fail = _Peg(lambda s: []) 59 | succeed = _Peg(lambda s: [((), s)]) 60 | 61 | ## anything('hi') 62 | #. ('hi',) 63 | ## chain(anything, succeed)('hi') 64 | #. ('hi',) 65 | 66 | def cond(p, q, r): 67 | def run(s): 68 | pv = p.run(s) 69 | choice = q if pv else r 70 | if choice is p: return pv # (an optimization) 71 | else: return choice.run(s) 72 | return _Peg(run) 73 | 74 | def satisfying(ok): 75 | "Eat a subject s when ok(s), producing (s,)." 76 | return _Peg(lambda s: [((s,), nil)] if s is not nil and ok(s) else []) 77 | 78 | def chain(p, q): 79 | return _Peg(lambda s: [(pvals + qvals, qnub) 80 | for pvals, pnub in p.run(s) 81 | for qvals, qnub in q.run(pnub)]) 82 | 83 | def alter(p, f): 84 | return _Peg(lambda s: [(f(*vals), nub) 85 | for vals, nub in p.run(s)]) 86 | 87 | def delay(thunk): 88 | def run(s): 89 | q.run = Peg(thunk()).run 90 | return q.run(s) 91 | q = _Peg(run) 92 | return q 93 | 94 | def item(p): 95 | "Eat the first item of a sequence, iff p matches it." 
96 | def run(s): 97 | if s is nil: return [] 98 | try: first = s[0] 99 | except IndexError: return [] 100 | except TypeError: return [] 101 | except KeyError: return [] 102 | return [(vals, s[1:]) for vals, _ in p.run(first)] 103 | return _Peg(run) 104 | 105 | def match(regex, flags=0): 106 | compiled = re.compile(regex, flags) 107 | return _Peg(lambda s: 108 | [] if s is nil 109 | else [(m.groups(), s[m.end():]) 110 | for m in [compiled.match(s)] if m]) 111 | 112 | def capture(p): 113 | def run(s): 114 | for vals, nub in p.run(s): 115 | # XXX use the position change instead, once we're tracking that: 116 | if s is not nil and nub is not nil: 117 | i = len(s) - len(nub) 118 | if s[i:] == nub: 119 | return [((s[:i],), nub)] 120 | raise Exception("Bad capture") 121 | return [] 122 | return _Peg(run) 123 | 124 | ## capture(match('h..') + match('.'))('hi there') 125 | #. ('hi t',) 126 | ## capture(item(anything) + item(anything))([3]) 127 | ## capture(item(anything) + item(anything))([3, 1]) 128 | #. ([3, 1],) 129 | 130 | 131 | # More derived combinators 132 | 133 | ## startswith('hi')('hi there') 134 | #. () 135 | 136 | def startswith(s): return match(re.escape(s)) 137 | 138 | anything = satisfying(lambda s: True) 139 | def literal(c): return drop(satisfying(lambda s: c == s)) 140 | def drop(p): return alter(p, lambda *vals: ()) 141 | 142 | end = invert(item(anything)) # Hmmm 143 | 144 | def an_instance(type_): 145 | return satisfying(lambda x: isinstance(x, type_)) 146 | 147 | def alt(*ps): 148 | if not ps: return fail 149 | if not ps[1:]: return ps[0] 150 | return either(ps[0], alt(*ps[1:])) 151 | 152 | def items(*ps): 153 | if not ps: return end 154 | return chain(item(ps[0]), items(*ps[1:])) 155 | 156 | def seq(*ps): 157 | if not ps: return succeed 158 | return chain(ps[0], seq(*ps[1:])) 159 | 160 | give = lambda c: feed(succeed, lambda: c) 161 | 162 | 163 | # Examples 164 | 165 | from operator import * 166 | 167 | ## fail(42) 168 | ## anything(42) 169 | #. 
(42,) 170 | ## chain(item(literal(5)), item(literal(0)))([5, 0, 2]) 171 | #. () 172 | ## an_instance(int)(42) 173 | #. (42,) 174 | 175 | calc = delay(lambda: 176 | alt(feed(items(literal('+'), calc, calc), add), 177 | feed(items(literal('-'), calc, calc), sub), 178 | an_instance(int))) 179 | 180 | ## calc(42) 181 | #. (42,) 182 | ## calc(['-', 3, 1]) 183 | #. (2,) 184 | ## calc(['+', ['-', 2, 1], 3]) 185 | #. (4,) 186 | 187 | singleton = lambda v: (v,) 188 | cat = lambda *lists: sum(lists, ()) 189 | flatten1 = delay(lambda: 190 | alt(seq(item(literal('+')), star(item(flatten1)), end), 191 | an_instance(int))) 192 | 193 | ## flatten1(['+', ['+', ['+', 1, ['+', 2]]]]) 194 | #. (1, 2) 195 | ## flatten1(42) 196 | #. (42,) 197 | ## flatten1(['+']) 198 | #. () 199 | ## flatten1(['+', 42]) 200 | #. (42,) 201 | ## flatten1(['+', 42, 43]) 202 | #. (42, 43) 203 | 204 | ## chain(item(literal('+')), anything)(['+', 42]) 205 | #. ([42],) 206 | 207 | ## star(item(anything))([1,2,3]) 208 | #. (1, 2, 3) 209 | 210 | ## star(match('hi() '))('hi hi hi there') 211 | #. ('', '', '') 212 | --------------------------------------------------------------------------------
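The central trick shared by peg.py and pegvm.py above is running a PEG parser under explicit control — combinators build continuation-passing "steps", and a trampoline bounces thunks — so deeply recursive grammars never grow Python's call stack. Those files lean on Python 2 idioms (tuple parameter unpacking in lambdas, `print` statements). Here is a minimal, self-contained sketch of the same explicit-control idea in version-neutral Python; all names (`lit`, `alt`, `run`, ...) are illustrative, not Parson's API, and it tracks only the match extent, not the semantic-value trail:

```python
def lit(c):
    """Match one literal character at position i."""
    def step(s, i, f, k):
        return (lambda: k(i + 1)) if i < len(s) and s[i] == c else (lambda: f())
    return step

def epsilon(s, i, f, k):
    """Match the empty string."""
    return lambda: k(i)

def chain(p, q):
    """Match p, then q, threading the fail continuation through both."""
    def step(s, i, f, k):
        return lambda: p(s, i, f, lambda j: q(s, j, f, k))
    return step

def alt(p, q):
    """Ordered choice: try p; if it fails, retry q from the same position."""
    def step(s, i, f, k):
        return lambda: p(s, i, lambda: q(s, i, f, k), k)
    return step

def ref(table, name):
    """Late-bound rule lookup, so grammars can be recursive."""
    def step(s, i, f, k):
        return lambda: table[name](s, i, f, k)
    return step

def run(p, s):
    """Trampoline: bounce thunks until a final continuation fires.
    Returns the matched prefix of s, or None on failure."""
    out = []
    thunk = lambda: p(s, 0, lambda: out.append(None),
                            lambda j: out.append(s[:j]))
    while not out:
        thunk = thunk()
    return out[0]

# The nbits smoke-test grammar from peg.py, restated in this sketch:
rules = {}
bit = alt(lit('0'), lit('1'))
rules['bits'] = alt(chain(bit, ref(rules, 'bits')), epsilon)

run(ref(rules, 'bits'), '01101a')   # => '01101'
```

Mirroring peg.py's `nbits` test, `run(ref(rules, 'bits'), '01101a')` returns `'01101'` and `'xy'` yields the empty match `''`. Because each bounce is a constant number of Python frames, a 40,000-character input parses fine even though every character adds one grammar-level recursion — a directly recursive parser would hit CPython's default recursion limit of 1000 long before that.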