├── .gitignore ├── LICENSE.md ├── README.md ├── chartparser.py ├── example_0.py ├── example_1.py ├── example_IF.py └── visualizer.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2016 Henri Tuhola 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Chart parser 2 | 3 | This parser will soon replace the custom one in lever/; until then it has not been thoroughly tested. Early users, beware of bugs. 
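To give a taste of the algorithm family before walking through the API, below is a miniature, self-contained Earley-style recognizer for the same toy grammar this README uses throughout (`s -> s a`, `s -> `, `a -> x`). It is an illustrative sketch only, not this library's implementation; `chartparser.py` layers NNF preprocessing and SPPF construction on top of the same basic idea:

```python
def earley_recognize(rules, accept, tokens):
    # rules: list of (lhs, rhs) pairs; symbols are plain strings here.
    # A chart item is (rule_index, dot_position, origin_column).
    chart = [set()]
    for i, (lhs, rhs) in enumerate(rules):
        if lhs == accept:
            chart[0].add((i, 0, 0))
    for pos in range(len(tokens) + 1):
        # Grow the current column to a fixpoint with prediction and completion.
        changed = True
        while changed:
            changed = False
            for (r, d, o) in list(chart[pos]):
                lhs, rhs = rules[r]
                if d < len(rhs):
                    # Prediction: expand a nonterminal appearing after the dot.
                    for j, (lhs2, _) in enumerate(rules):
                        if lhs2 == rhs[d] and (j, 0, pos) not in chart[pos]:
                            chart[pos].add((j, 0, pos))
                            changed = True
                else:
                    # Completion: advance items that were waiting for lhs.
                    for (r2, d2, o2) in list(chart[o]):
                        lhs2, rhs2 = rules[r2]
                        if d2 < len(rhs2) and rhs2[d2] == lhs:
                            item = (r2, d2 + 1, o2)
                            if item not in chart[pos]:
                                chart[pos].add(item)
                                changed = True
        if pos < len(tokens):
            # Scanning: consume the next token.
            nxt = set()
            for (r, d, o) in chart[pos]:
                lhs, rhs = rules[r]
                if d < len(rhs) and rhs[d] == tokens[pos]:
                    nxt.add((r, d + 1, o))
            chart.append(nxt)
    return any(rules[r][0] == accept and d == len(rules[r][1]) and o == 0
               for (r, d, o) in chart[-1])

rules = [("s", ["s", "a"]), ("s", []), ("a", ["x"])]
print(earley_recognize(rules, "s", list("xxxxxx")))  # True
print(earley_recognize(rules, "s", list("xy")))      # False
```

The fixpoint loop is the naive way to handle nullable rules correctly; the library avoids this cost by rewriting the grammar into nihilist normal form during preprocessing.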
4 | 5 | Imagine you'd have to make sense of some input, commonly a user interaction. You get a sequence of symbols and have to recognize some structure in it. This is when you need a parsing library. Chart parsers are a straightforward but potentially resource-consuming tool for this. For small inputs or simple languages, they're often sufficient. 6 | 7 | A chart parser takes in a context-free grammar and the input sequence. 8 | 9 | Here's how to input a context-free grammar that the chartparser can read: 10 | 11 | from chartparser import Nonterminal, Terminal, Rule 12 | 13 | s, a = Nonterminal('s'), Nonterminal('a') 14 | x = Terminal('x') 15 | grammar = [ 16 | Rule(s, [s, a]), 17 | Rule(s, []), 18 | Rule(a, [x]), 19 | ] 20 | 21 | Note that you can pass a third parameter to Rule; it will be bound to `annotation`, and how you use this field is entirely up to you. 22 | 23 | Chart parsers are not usually preprocessing-free, but the preprocessing tends to happen in such a short time that nobody worries about it. Here's how to form a parser: 24 | 25 | from chartparser import preprocess 26 | 27 | parser = preprocess(grammar, accept=s)() 28 | 29 | Note that `preprocess` gives a function that you can call to instantiate a parser. This is done so you can parse many inputs with a single preprocessing step. 30 | 31 | Attention: many chart parsers let you select the accept symbol when you start parsing, rather than when you preprocess the grammar. You are required to supply the default symbol in `preprocess`, but this is also acceptable: 32 | 33 | new_parser = preprocess(grammar, accept=s) 34 | parser = new_parser(accept=a) 35 | 36 | If this is not possible in some other variation of this library, `new_parser()` takes the liberty to crash on an argument error. 37 | 38 | The parser provides several interfaces that you will probably need for your use case. The first one is `.step`, which you use to feed input into the parser. The first argument is the symbol this item represents. 
It can be either a Terminal or a Nonterminal. The second argument is the token put into this position. Here's an example that uses makeshift input: 39 | 40 | terminals = {"x": x} 41 | input_string = "xxxxxx" 42 | for token in input_string: 43 | parser.step(terminals[token], token) 44 | 45 | Additionally, the following interfaces are available during parsing: 46 | 47 | * `parser.accepted` tells whether the input so far can be accepted and traversed. 48 | * `parser.expect` contains the terminals and nonterminals that, when given to `.step` next, will result in a non-empty parsing state. 49 | * `parser.expecting(x)` can be used to query whether parsing can `.step` with the given symbol. 50 | 51 | Finally, when you've finished your input, you make sense of it with `.traverse`. Here's how to do it: 52 | 53 | def postorder_call(rule, rhs): 54 | return '(' + ' '.join(rhs) + ')' 55 | def blank(symbol): 56 | return '' 57 | def ambiguous(sppf): 58 | for p in sppf: 59 | print p 60 | raise Exception("ambiguous parse") 61 | # This may also choose one interpretation among ambiguous results. 62 | # In that case, return one of 'p' from this function. 63 | result = parser.traverse(postorder_call, blank, ambiguous) 64 | 65 | Note that blank rules will never reach `postorder_call`. Instead, `blank` is called with the appropriate symbol. This is done so the chart parser doesn't need to produce permutations of the empty rules, if they appear. If you would rather derive empty objects from the blank rules, you can obtain `.nullable` and `.blankset` during preprocessing: 66 | 67 | new_parser = preprocess(grammar, accept=s) 68 | print new_parser.blankset 69 | print new_parser.nullable 70 | 71 | ## Supported 72 | 73 | I promise to respond to support issues as long as I keep writing Python code. I also try to keep the interface described in this README mostly unchanged from what it is now. I took a lot of care to get it this clean. 
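A note on the `.nullable` set mentioned above: it is simply the least fixpoint of "a symbol is nullable if some rule for it has an all-nullable right-hand side". A naive self-contained sketch, for exposition only (the library's `find_nullable` computes the same set more efficiently with a worklist and an inverse lookup table):

```python
def naive_nullable(rules):
    # rules: iterable of (lhs, rhs) pairs; rhs is a list of symbols.
    # Repeat until nothing changes: lhs becomes nullable when every
    # rhs symbol is already nullable (trivially true for an empty rhs).
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in nullable and all(x in nullable for x in rhs):
                nullable.add(lhs)
                changed = True
    return nullable

rules = [("s", ["s", "a"]), ("s", []), ("a", ["x"])]
print(naive_nullable(rules))  # only 's' is nullable
```

Note that a purely self-referential rule such as `b -> b` does not make `b` nullable: the least fixpoint leaves it out, which matches the library's behaviour.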
74 | 75 | The parsing library exposes a lot of other details that you may find necessary to use. For example, my single traversal approach might not satisfy every need. Those details may change in the future. 76 | 77 | ## Origins 78 | 79 | This module builds on [the work of Jeffrey Kegler](http://jeffreykegler.github.io/Marpa-web-site/). He introduced these concepts to me, and the papers about Marpa helped me to refine an adaptation of the parser in Python. 80 | 81 | Before I learned how to write a practical parser of this kind, I found [loup's explanation of Earley parsing](http://loup-vaillant.fr/tutorials/earley-parsing/) well worth reading. 82 | -------------------------------------------------------------------------------- /chartparser.py: -------------------------------------------------------------------------------- 1 | def main(): 2 | s = Nonterminal('s') 3 | a = Nonterminal('a') 4 | b = Nonterminal('b') 5 | x = Terminal('x') 6 | 7 | terminals = {"x": x} 8 | 9 | accept = s 10 | user_grammar = [ 11 | Rule(s, [s, a]), 12 | Rule(s, []), 13 | Rule(a, [x]), 14 | ] 15 | 16 | parser = preprocess(user_grammar, accept)() 17 | input_string = "xxxxxx" 18 | for token in input_string: 19 | parser.step(terminals[token], token) 20 | print parser.accepted, parser.expect, parser.expecting(x) 21 | print parser.traverse(lambda x, a: '(' + ' '.join(a) + ')', lambda x: "") 22 | 23 | def preprocess(user_grammar, accept): 24 | nullable = find_nullable(user_grammar) 25 | grammar = {} 26 | blankset = {} 27 | for rule in build_nnf(user_grammar, nullable): 28 | if len(rule.rhs) == 0: 29 | try: 30 | blankset[rule.lhs].append(rule.annotation.rule) 31 | except KeyError as k: 32 | blankset[rule.lhs] = [rule.annotation.rule] 33 | else: 34 | try: 35 | grammar[rule.lhs].append(rule) 36 | except KeyError as k: 37 | grammar[rule.lhs] = [rule] 38 | def new_parser(accept=accept): 39 | parser = Parser(grammar, accept, []) 40 | # In an Earley parser that uses NNF, 
empty input is a special case, that is taken care of here. 41 | if accept in nullable: 42 | for rule in user_grammar: 43 | if rule.lhs == accept and all(x in nullable for x in rule.rhs): 44 | parser.output.append(Rule(accept, [], NNF(rule, [False for x in rule.rhs]))) 45 | # The first chart column 46 | transitions = {} 47 | nodes = {} 48 | current = [] 49 | prediction(current, nodes, grammar, 0, accept) 50 | for eim in current: 51 | prediction(current, nodes, grammar, 0, eim.postdot()) 52 | cache_transitions(transitions, eim, None) 53 | parser.chart.append(transitions) 54 | return parser 55 | new_parser.blankset = blankset 56 | new_parser.nullable = nullable 57 | return new_parser 58 | 59 | def default_ambiguity_resolution(sppf): 60 | raise Exception(sppf) 61 | 62 | class Parser(object): 63 | def __init__(self, grammar, accept, output): 64 | self.chart = [] 65 | self.grammar = grammar 66 | self.accept = accept 67 | self.output = output 68 | 69 | def step(self, term, token): 70 | # completions proceed in non-deterministic manner, until 71 | # everything has been completed. 
72 | current = [] 73 | transitions = {} 74 | nodes = {} 75 | location = len(self.chart) 76 | output = [] 77 | 78 | bottom = SPPF(location-1, location, token, None) 79 | for eim, bb in self.chart[location-1][term]: 80 | shift_eim(current, nodes, eim, location, bb, bottom) 81 | for eim in current: 82 | # reduction 83 | cc = nodes[eim] 84 | if eim.is_completed(): 85 | for before, bb in self.chart[eim.origin].get(eim.rule.lhs, ()): 86 | shift_eim(current, nodes, before, location, bb, cc) 87 | if eim.rule.lhs == self.accept and eim.origin == 0: 88 | output.append(cc) 89 | prediction(current, nodes, self.grammar, location, eim.postdot()) 90 | cache_transitions(transitions, eim, cc) 91 | self.chart.append(transitions) 92 | self.output = output 93 | 94 | @property 95 | def accepted(self): 96 | return len(self.output) > 0 97 | 98 | @property 99 | def expect(self): 100 | return self.chart[-1].keys() 101 | 102 | def expecting(self, symbol): 103 | return symbol in self.chart[-1] 104 | 105 | def traverse(self, postorder_callback, blank_callback, resolve_ambiguity=default_ambiguity_resolution): 106 | if len(self.output) > 1: 107 | # The resolver takes a single argument, matching the README's 108 | # ambiguity callback signature. 109 | sppf = resolve_ambiguity(self.output) 110 | else: 111 | sppf = self.output[0] 112 | return traverse_sppf(sppf, postorder_callback, blank_callback, resolve_ambiguity) 113 | 114 | def prediction(current, nodes, grammar, location, postdot): 115 | if isinstance(postdot, Nonterminal): 116 | for rule in grammar.get(postdot, ()): 117 | eim = EIM(rule, 0, location) 118 | if not eim in nodes: 119 | nodes[eim] = None 120 | current.append(eim) 121 | 122 | def cache_transitions(transitions, eim, cc): 123 | postdot = eim.postdot() 124 | if not eim.is_completed(): 125 | try: 126 | transitions[postdot].append((eim, cc)) 127 | except KeyError as k: 128 | transitions[postdot] = [(eim, cc)] 129 | 130 | def shift_eim(current, nodes, eim, location, bb, cc): 131 | eim = eim.next() 132 | try: 133 | sppf = nodes[eim] 134 | sppf.insert(bb, cc) 135 | except KeyError as 
k: 134 | assert eim.pos != 0 135 | nodes[eim] = sppf = SPPF(eim.origin, location, eim.rule, Link(bb, cc)) 136 | current.append(eim) 137 | 138 | def build_nnf(grammar, nullable): 139 | for rule in grammar: 140 | order = sum(x in nullable for x in rule.rhs) 141 | for i in range(1 << order): 142 | yield nihilist_rule(rule, i, nullable) 143 | 144 | def nihilist_rule(rule, index, nullable): 145 | present = [] 146 | rhs = [] 147 | for symbol in rule.rhs: 148 | shift = True 149 | if symbol in nullable: 150 | if index & 1 == 0: 151 | shift = False 152 | index >>= 1 153 | present.append(shift) 154 | if shift: 155 | rhs.append(symbol) 156 | return Rule(rule.lhs, rhs, NNF(rule, present)) 157 | 158 | def detect_right_recursion(grammar): 159 | edges = [] 160 | for rule in grammar: 161 | right = rule.rhs[-1] if len(rule.rhs) > 0 else None 162 | row = [] 163 | for other in grammar: 164 | row.append(other.lhs == right) 165 | edges.append(row) 166 | warshall_transitive_closure(edges) 167 | return set(rule for i, rule in enumerate(grammar) if edges[i][i]) 168 | 169 | def warshall_transitive_closure(a): 170 | n = len(a) 171 | for k in range(n): 172 | for i in range(n): 173 | if not a[i][k]: 174 | continue 175 | for j in range(n): 176 | if not a[k][j]: 177 | continue 178 | a[i][j] = True 179 | return a 180 | 181 | def find_nullable(grammar): 182 | nullable = set() 183 | queue = [] 184 | def new_nullable(symbol): 185 | if symbol not in nullable: 186 | nullable.add(symbol) 187 | queue.append(symbol) 188 | 189 | inverse_lookup = {} 190 | def new_lookup(index, symbol): 191 | if symbol in inverse_lookup: 192 | inverse_lookup[symbol].append(index) 193 | else: 194 | inverse_lookup[symbol] = [index] 195 | 196 | nonterminals = [] 197 | nonnullables = [] 198 | 199 | for rule in grammar: 200 | if len(rule) == 0: 201 | new_nullable(rule.lhs) 202 | elif all(isinstance(x, Nonterminal) for x in rule.rhs): 203 | index = len(nonnullables) 204 | for x in rule.rhs: 205 | if x != rule.lhs: 206 | 
new_lookup(index, x) 207 | nonterminals.append(rule.lhs) 208 | nonnullables.append(sum(x != rule.lhs for x in rule.rhs)) 209 | 210 | for n in queue: 211 | for i in inverse_lookup.get(n, ()): 212 | nonnullables[i] -= 1 213 | if nonnullables[i] == 0: 214 | new_nullable(nonterminals[i]) 215 | 216 | return nullable 217 | 218 | def traverse_sppf(sppf, postorder_callback, blank_callback, resolve_ambiguity): 219 | rcount = 1 220 | sstack = [] 221 | rstack = [] 222 | stack = [sppf] 223 | while len(stack) > 0: 224 | sppf = stack.pop() 225 | if sppf.is_leaf(): 226 | sstack.append(sppf.cell) 227 | rcount -= 1 228 | else: 229 | result = sppf.single() 230 | if result is None: 231 | result = resolve_ambiguity(sppf) 232 | rstack.append((rcount - 1, len(result), sppf)) 233 | rcount = len(result) 234 | stack.extend(reversed(result)) 235 | while rcount == 0 and len(rstack) > 0: 236 | rcount, rlen, sppf = rstack.pop(-1) 237 | rule, args = expand(sppf.cell, blank_callback, (sstack.pop(i-rlen) for i in range(rlen))) 238 | sstack.append(postorder_callback(rule, args)) 239 | assert len(sstack) == 1 240 | return sstack[0] 241 | 242 | def expand(cell, blank_callback, seq): 243 | if isinstance(cell.annotation, NNF): 244 | nnf = cell.annotation 245 | result = [] 246 | for i, p in enumerate(nnf.present): 247 | if p: 248 | result.append(seq.next()) 249 | else: 250 | result.append(blank_callback(nnf.rule.rhs[i])) 251 | return nnf.rule, result 252 | return cell, list(seq) 253 | 254 | class Rule(object): 255 | def __init__(self, lhs, rhs, annotation=None): 256 | self.lhs = lhs 257 | self.rhs = rhs 258 | self.annotation = annotation 259 | 260 | def __len__(self): 261 | return len(self.rhs) 262 | 263 | def __repr__(self): 264 | return "{} -> {}".format( 265 | self.lhs, 266 | ' '.join(map(str, self.rhs))) 267 | 268 | # Nihilist normal form 269 | class NNF(object): 270 | def __init__(self, rule, present): 271 | self.rule = rule 272 | self.present = present # tells which fields are present. 
273 | 274 | # Earlier I did not separate terminals from 275 | # non-terminals because it was not strictly 276 | # necessary. That turned out to confuse 277 | # when designing grammars. 278 | class Terminal(object): 279 | def __init__(self, name): 280 | self.name = name 281 | 282 | def __repr__(self): 283 | return "T{!r}".format(self.name) 284 | 285 | class Nonterminal(object): 286 | def __init__(self, name): 287 | self.name = name 288 | 289 | def __repr__(self): 290 | return "{!s}".format(self.name) 291 | 292 | # The chart consists explicitly from earley items. 293 | class EIM(object): 294 | def __init__(self, rule, pos, origin): 295 | self.rule = rule 296 | self.pos = pos 297 | self.origin = origin 298 | assert 0 <= pos <= len(rule) 299 | 300 | def postdot(self): 301 | if self.pos < len(self.rule): 302 | return self.rule.rhs[self.pos] 303 | return None 304 | 305 | def next(self): 306 | if self.postdot() is not None: 307 | return EIM(self.rule, self.pos + 1, self.origin) 308 | return None 309 | 310 | def penult(self): 311 | if self.pos + 1 == len(self.rule): 312 | return self.postdot() 313 | 314 | def is_predicted(self): 315 | return self.pos == 0 316 | 317 | def is_confirmed(self): 318 | return self.pos > 0 319 | 320 | def is_completed(self): 321 | return self.pos == len(self.rule) 322 | 323 | def __hash__(self): 324 | return hash((self.rule, self.pos, self.origin)) 325 | 326 | def __eq__(self, other): 327 | return isinstance(other, EIM) and self.rule == other.rule and self.pos == other.pos and self.origin == other.origin 328 | 329 | def __repr__(self): 330 | if isinstance(self.rule, Rule): 331 | lhs = repr(self.rule.lhs) 332 | pre = ' '.join(map(repr, self.rule.rhs[:self.pos])) 333 | pos = ' '.join(map(repr, self.rule.rhs[self.pos:])) 334 | return "{} -> {} * {} : {}".format(lhs, pre, pos, self.origin) 335 | return object.__repr__(self) 336 | 337 | # Shared packed parse forest 338 | class SPPF(object): 339 | def __init__(self, start, stop, cell, link): 340 | 
self.start = start 341 | self.stop = stop 342 | self.cell = cell 343 | self.link = link 344 | 345 | def is_leaf(self): 346 | return self.link is None 347 | 348 | def insert(self, left, right): 349 | if self.link is None: 350 | self.link = Link(left, right) 351 | return self.link 352 | link = self.link 353 | while True: 354 | if link.left == left and link.right == right: 355 | return link 356 | if link.link is None: 357 | link.link = Link(left, right) 358 | return link.link 359 | link = link.link 360 | 361 | def single(self): 362 | result = [] 363 | link = self.link 364 | while link.left is not None: 365 | if link.link is not None: 366 | return None 367 | result.append(link.right) 368 | link = link.left.link 369 | result.append(link.right) 370 | result.reverse() 371 | return result 372 | 373 | def __iter__(self): 374 | finger = [] 375 | # To produce all parses, the sppf is fingered through. 376 | link = self.link 377 | while len(finger) > 0 or link is not None: 378 | while link.left is not None: 379 | finger.append(link) 380 | link = link.left.link 381 | # Now the link contains the head, while the tail is in the finger list. 382 | while link is not None: 383 | result = [link.right] 384 | result.extend(x.right for x in reversed(finger)) 385 | yield result 386 | link = link.link 387 | # Now some portion of the finger is already iterated, and should be removed. 
388 | while len(finger) > 0 and link is None: 389 | link = finger.pop().link 390 | 391 | def __repr__(self): 392 | return "[{}:{}] {}".format(self.start, self.stop, self.cell) 393 | 394 | class Link(object): 395 | def __init__(self, left, right, link=None): 396 | self.left = left 397 | self.right = right 398 | self.link = link 399 | 400 | if __name__=="__main__": 401 | main() 402 | -------------------------------------------------------------------------------- /example_0.py: -------------------------------------------------------------------------------- 1 | import chartparser 2 | from chartparser import ( 3 | Terminal, Nonterminal, Rule, 4 | preprocess) 5 | import visualizer 6 | 7 | def main(): 8 | s = Nonterminal('s') 9 | a = Nonterminal('a') 10 | b = Nonterminal('b') 11 | x = Terminal('x') 12 | 13 | terminals = {"x": x} 14 | 15 | accept = s 16 | user_grammar = [ 17 | Rule(s, [s, a]), 18 | Rule(s, []), 19 | Rule(a, [x]), 20 | ] 21 | 22 | input_strings = [ 23 | "x", 24 | "xxx", 25 | "xxxxxx" 26 | ] 27 | 28 | visualizer.print_grammar(user_grammar) 29 | 30 | parser = preprocess(user_grammar, accept)() 31 | visualizer.print_nnf_grammar(parser.grammar) 32 | 33 | input_string = visualizer.select_input_string(input_strings) 34 | 35 | visualizer.step_through(parser, terminals, input_string) 36 | 37 | if __name__=='__main__': 38 | main() 39 | -------------------------------------------------------------------------------- /example_1.py: -------------------------------------------------------------------------------- 1 | import chartparser 2 | from chartparser import ( 3 | Terminal, Nonterminal, Rule, 4 | preprocess) 5 | import visualizer 6 | 7 | def main(): 8 | d0 = Terminal('0') 9 | d1 = Terminal('1') 10 | d2 = Terminal('2') 11 | d3 = Terminal('3') 12 | d4 = Terminal('4') 13 | d5 = Terminal('5') 14 | d6 = Terminal('6') 15 | d7 = Terminal('7') 16 | d8 = Terminal('8') 17 | d9 = Terminal('9') 18 | plus = Terminal('+') 19 | minus = Terminal('-') 20 | div = Terminal('/') 21 
| mul = Terminal('*') 22 | open_ = Terminal('(') 23 | close_ = Terminal(')') 24 | dot = Terminal('.') 25 | start = Nonterminal('start') 26 | expr = Nonterminal('expr') 27 | term = Nonterminal('term') 28 | factor = Nonterminal('factor') 29 | integer = Nonterminal('integer') 30 | digit = Nonterminal('digit') 31 | 32 | grammar = [ 33 | Rule(start, [expr]), 34 | Rule(expr, [term, plus,expr]), 35 | Rule(expr, [term, minus,expr]), 36 | Rule(expr, [term]), 37 | Rule(term, [factor]), # This rule was missing in issue/1 38 | Rule(factor, [plus, factor]), 39 | Rule(factor, [minus, factor]), 40 | Rule(factor, [open_, expr, close_]), 41 | Rule(factor, [integer]), 42 | Rule(factor, [integer, dot, integer]), 43 | Rule(integer, [digit, integer]), 44 | Rule(integer, [digit]), 45 | Rule(digit, [d0]), 46 | Rule(digit, [d1]), 47 | Rule(digit, [d2]), 48 | Rule(digit, [d3]), 49 | Rule(digit, [d4]), 50 | Rule(digit, [d5]), 51 | Rule(digit, [d6]), 52 | Rule(digit, [d7]), 53 | Rule(digit, [d8]), 54 | Rule(digit, [d9]), 55 | ] 56 | 57 | accept = start 58 | terminals = { 59 | "0": d0, 60 | "1": d1, 61 | "2": d2, 62 | "3": d3, 63 | "4": d4, 64 | "5": d5, 65 | "6": d6, 66 | "7": d7, 67 | "8": d8, 68 | "9": d9, 69 | "+": plus, 70 | "-": minus, 71 | "/": div, 72 | "*": mul, 73 | "(": open_, 74 | ")": close_, 75 | ".": dot, 76 | } 77 | 78 | user_grammar = grammar 79 | 80 | input_strings = [ 81 | "321345", 82 | ] 83 | 84 | visualizer.print_grammar(user_grammar) 85 | 86 | parser = preprocess(user_grammar, accept)() 87 | visualizer.print_nnf_grammar(parser.grammar) 88 | 89 | input_string = visualizer.select_input_string(input_strings) 90 | 91 | visualizer.step_through(parser, terminals, input_string) 92 | 93 | if __name__=='__main__': 94 | main() 95 | -------------------------------------------------------------------------------- /example_IF.py: -------------------------------------------------------------------------------- 1 | import chartparser 2 | from chartparser import ( 3 | Terminal, 
Nonterminal, Rule, 4 | preprocess) 5 | 6 | command = Nonterminal('start') 7 | 8 | look = Terminal('look') 9 | go = Terminal('go') 10 | to_ = Terminal('to') 11 | inventory = Terminal('inventory') 12 | 13 | place = Terminal('some place') 14 | 15 | token_table = { 16 | "look": look, 17 | "go": go, 18 | "to": to_, 19 | "inventory": inventory, 20 | "inv": inventory, 21 | } 22 | 23 | grammar = [ 24 | Rule(command, [look], annotation='look'), 25 | Rule(command, [go, to_], annotation="goto?"), 26 | Rule(command, [go, to_, place], annotation="goto"), 27 | Rule(command, [inventory], annotation='inventory'), 28 | ] 29 | 30 | class ENVIRONMENT: 31 | def __init__(self): 32 | self.current_room = first_room 33 | self.inventory = set(['axe', 'lamp', 'toilet']) 34 | 35 | def get_grammar(self): 36 | return grammar 37 | 38 | class EMPTY_ROOM: 39 | description = "You are in an empty room" 40 | def __init__(self): 41 | self.passages = {} 42 | self.items = [] 43 | 44 | first_room = EMPTY_ROOM() 45 | second_room = EMPTY_ROOM() 46 | second_room.description = "You are in an another empty room" 47 | 48 | first_room.passages = {"hole": second_room} 49 | first_room.items = set(['some thing', 'other thing']) 50 | 51 | second_room.passages = {"hole": first_room} 52 | second_room.items = set(['dangerous thing', 'weird thing']) 53 | 54 | 55 | def main(): 56 | print "Welcome to the empty room" 57 | environment = ENVIRONMENT() 58 | 59 | while True: 60 | try: 61 | action = attempted_parse_of_input(environment) 62 | except EOFError as eof: 63 | print "okay." 64 | return 65 | if action is None: 66 | print "what do you mean?" 
67 | else: 68 | action(environment) 69 | 70 | def attempted_parse_of_input(environment): 71 | language = preprocess(environment.get_grammar(), command) 72 | parser = language() 73 | 74 | input_string = raw_input("> ") 75 | for word in input_string.strip().split(" "): 76 | cleaned_word = word.strip().lower() 77 | try: 78 | token = recognize_word(environment, cleaned_word) 79 | parser.step(token, cleaned_word) 80 | except KeyError as ke: 81 | print "at word {!r}".format(word) 82 | parser_error_message(parser) 83 | return None 84 | if parser.accepted: 85 | return parser.traverse( 86 | rule_traverse, 87 | empty_traverse) 88 | else: 89 | print "at end of the sentence" 90 | parser_error_message(parser) 91 | return None 92 | 93 | def recognize_word(environment, cleaned_word): 94 | if cleaned_word in environment.current_room.passages: 95 | return place 96 | return token_table[cleaned_word] 97 | 98 | def parser_error_message(parser): 99 | expect = list(parser.expect) 100 | if len(expect) == 0: 101 | print "EXPECTED nothing" 102 | else: 103 | print "EXPECTED one of" 104 | for token in expect: 105 | print " {}".format(token) 106 | 107 | def rule_traverse(rule, arguments): 108 | if rule.annotation == 'look': 109 | return _look_around_ 110 | elif rule.annotation == 'goto?': 111 | def _goto_where_(environment): 112 | print "go to where? 
places to go:" 113 | for place in environment.current_room.passages.keys(): 114 | print " {}".format(place) 115 | return _goto_where_ 116 | elif rule.annotation == 'goto': 117 | def _goto_(environment): 118 | print "you go to", arguments[2] 119 | previous_room = environment.current_room 120 | environment.current_room = previous_room.passages[arguments[2]] 121 | _look_around_(environment) 122 | return _goto_ 123 | elif rule.annotation == 'inventory': 124 | def _print_inventory_(environment): 125 | print "you have items" 126 | for item in environment.inventory: 127 | print " {}".format(item) 128 | return _print_inventory_ 129 | elif rule.annotation == 'place': 130 | return arguments[0] 131 | else: 132 | parse_tree = rule.annotation + '(' + ' '.join( 133 | repr(item) for item in arguments) + ')' 134 | def _placeholder_(environment): 135 | print "not implemented" 136 | print parse_tree 137 | return _placeholder_ 138 | 139 | def empty_traverse(rule): 140 | def _placeholder_(environment): 141 | print "not implemented (empty rule)" 142 | print rule 143 | return _placeholder_ 144 | 145 | def _look_around_(environment): 146 | print environment.current_room.description 147 | if len(environment.current_room.items) > 0: 148 | print "there are items" 149 | for item in environment.current_room.items: 150 | print " {}".format(item) 151 | 152 | if __name__=='__main__': 153 | main() 154 | -------------------------------------------------------------------------------- /visualizer.py: -------------------------------------------------------------------------------- 1 | #import traceback # For printing tracebacks 2 | 3 | # This visualizer was written with the purpose to help 4 | # study of the parsing algorithm. 5 | 6 | # 1. It prints out your grammar, 7 | # 2. then it prints the preprocessing results. 8 | # 3. Finally it lets you choose among the input strings and 9 | # 4. step the parser through. 10 | # 4.b. 
If the parsing step encounters an error, 11 | # the error and traceback are printed and the parser is considered lost. 12 | # 4.c. Asks for more input, or stops if no string is given. 13 | 14 | # The example grammars import this utility as a module, 15 | # but expect that some global state may be stored here later; 16 | # this is more like its own program. 17 | 18 | # The preprocessed items are annotated with NNF nodes 19 | # and connected to the input grammar that way. 20 | # I avoided printing NNF items in the simple outputs because 21 | # it would have required more printout. 22 | 23 | def print_grammar(grammar): 24 | print "THE INPUT GRAMMAR" 25 | for rule in grammar: 26 | print " {0.lhs} -> {1}".format(rule, 27 | " ".join(repr(sym) for sym in rule.rhs)) 28 | raw_input("press return and continue") 29 | 30 | def print_nnf_grammar(grammar): 31 | print "THE PREPROCESSED GRAMMAR" 32 | for lhs, rules in grammar.iteritems(): 33 | for rule in rules: 34 | print " {0.lhs} -> {1}".format(rule, 35 | " ".join(repr(sym) for sym in rule.rhs)) 36 | raw_input("press return and continue") 37 | 38 | # Preprocessing also contains the blankset, 39 | # a set of nonterminals that can produce empty sequences. 40 | # We are not printing that for now. 41 | 42 | def select_input_string(input_strings): 43 | print "Select the input string" 44 | for index, string in enumerate(input_strings): 45 | print " [{}] {!r}".format(index, string) 46 | print " [a] write your own string" 47 | choice = raw_input("? 
") 48 | if choice.lower() == 'a': 49 | return raw_input("input string: ") 50 | else: 51 | try: 52 | return input_strings[int(choice)] 53 | except ValueError as error: 54 | print "error: {}".format(error) 55 | return select_input_string(input_strings) 56 | except IndexError as error: 57 | print "error: no such option" 58 | return select_input_string(input_strings) 59 | 60 | def step_through(parser, terminals, input_string): 61 | while len(input_string) > 0: 62 | for token in input_string: 63 | print_parsing_state(parser) 64 | print "next token: {}".format(terminals[token]) 65 | raw_input("press return and continue") 66 | #try: 67 | parser.step(terminals[token], token) 68 | #except Exception as error: 69 | # traceback.print_exc() 70 | # return 71 | print_parsing_state(parser, True) 72 | input_string = raw_input("more input string? ") 73 | 74 | def print_parsing_state(parser, final=False): 75 | print "CHART {}".format(len(parser.chart)-1) 76 | for term, eims in parser.chart[-1].iteritems(): 77 | for eim, bb in eims: 78 | if bb is None: 79 | print " {}".format(eim) 80 | else: 81 | print " {} [{}]".format(eim, bb) 82 | if parser.accepted: 83 | print "INPUT ACCEPTED" 84 | if not final: 85 | for cc in parser.output: 86 | print " {}".format(cc) 87 | else: 88 | print " ", parser.traverse(lambda x, a: '(' + ' '.join(a) + ')', lambda x: "") 89 | --------------------------------------------------------------------------------