├── .gitignore ├── LICENSE.md ├── README.md ├── chartparser.py ├── example_0.py ├── example_1.py ├── example_IF.py └── visualizer.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2016 Henri Tuhola 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Chart parser 2 | 3 | This parser will soon replace the custom one in lever/; until then it has not been thoroughly tested. Early users, beware of bugs. 
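To give a taste of the algorithm family before walking through the API, below is a miniature, self-contained Earley-style recognizer for the same toy grammar this README uses throughout (`s -> s a`, `s -> `, `a -> x`). It is an illustrative sketch only, not this library's implementation; `chartparser.py` layers NNF preprocessing and SPPF construction on top of the same basic idea:

```python
def earley_recognize(rules, accept, tokens):
    # rules: list of (lhs, rhs) pairs; symbols are plain strings here.
    # A chart item is (rule_index, dot_position, origin_column).
    chart = [set()]
    for i, (lhs, rhs) in enumerate(rules):
        if lhs == accept:
            chart[0].add((i, 0, 0))
    for pos in range(len(tokens) + 1):
        # Grow the current column to a fixpoint with prediction and completion.
        changed = True
        while changed:
            changed = False
            for (r, d, o) in list(chart[pos]):
                lhs, rhs = rules[r]
                if d < len(rhs):
                    # Prediction: expand a nonterminal appearing after the dot.
                    for j, (lhs2, _) in enumerate(rules):
                        if lhs2 == rhs[d] and (j, 0, pos) not in chart[pos]:
                            chart[pos].add((j, 0, pos))
                            changed = True
                else:
                    # Completion: advance items that were waiting for lhs.
                    for (r2, d2, o2) in list(chart[o]):
                        lhs2, rhs2 = rules[r2]
                        if d2 < len(rhs2) and rhs2[d2] == lhs:
                            item = (r2, d2 + 1, o2)
                            if item not in chart[pos]:
                                chart[pos].add(item)
                                changed = True
        if pos < len(tokens):
            # Scanning: consume the next token.
            nxt = set()
            for (r, d, o) in chart[pos]:
                lhs, rhs = rules[r]
                if d < len(rhs) and rhs[d] == tokens[pos]:
                    nxt.add((r, d + 1, o))
            chart.append(nxt)
    return any(rules[r][0] == accept and d == len(rules[r][1]) and o == 0
               for (r, d, o) in chart[-1])

rules = [("s", ["s", "a"]), ("s", []), ("a", ["x"])]
print(earley_recognize(rules, "s", list("xxxxxx")))  # True
print(earley_recognize(rules, "s", list("xy")))      # False
```

The fixpoint loop is the naive way to handle nullable rules correctly; the library avoids this cost by rewriting the grammar into nihilist normal form during preprocessing.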
4 | 5 | Imagine you'd have to make sense of some input, commonly a user interaction. You get a sequence of symbols and have to recognize some structure in it. This is when you need a parsing library. Chart parsers are a straightforward but potentially resource-consuming tool for this. For small inputs or simple languages, they're often sufficient. 6 | 7 | A chart parser takes in a context-free grammar and the input sequence. 8 | 9 | Here's how to input a context-free grammar that the chartparser can read: 10 | 11 | from chartparser import Nonterminal, Terminal, Rule 12 | 13 | s, a = Nonterminal('s'), Nonterminal('a') 14 | x = Terminal('x') 15 | grammar = [ 16 | Rule(s, [s, a]), 17 | Rule(s, []), 18 | Rule(a, [x]), 19 | ] 20 | 21 | Note that you can pass a third parameter to Rule; it will be bound to `annotation`, and how you use this field is entirely up to you. 22 | 23 | Chart parsers are not usually preprocessing-free, but the preprocessing tends to happen in such a short time that nobody worries about it. Here's how to form a parser: 24 | 25 | from chartparser import preprocess 26 | 27 | parser = preprocess(grammar, accept=s)() 28 | 29 | Note that `preprocess` gives a function that you can call to instantiate a parser. This is done so you can parse many inputs with a single preprocessing step. 30 | 31 | Attention: many chart parsers let you select the accept symbol when you start parsing, rather than when you preprocess the grammar. You are required to supply the default symbol in `preprocess`, but this is also acceptable: 32 | 33 | new_parser = preprocess(grammar, accept=s) 34 | parser = new_parser(accept=a) 35 | 36 | If this is not possible in some other variation of this library, `new_parser()` takes the liberty to crash on an argument error. 37 | 38 | The parser provides several interfaces that you will probably need for your use case. The first one is `.step`, which you use to feed input into the parser. The first argument is the symbol this item represents. 
It can be either a Terminal or a Nonterminal. The second argument is the token put into this position. Here's an example that uses makeshift input: 39 | 40 | terminals = {"x": x} 41 | input_string = "xxxxxx" 42 | for token in input_string: 43 | parser.step(terminals[token], token) 44 | 45 | Additionally, the following interfaces are available during parsing: 46 | 47 | * `parser.accepted` tells whether the input so far can be accepted and traversed. 48 | * `parser.expect` contains the terminals and nonterminals that, when given to `.step` next, will result in a non-empty parsing state. 49 | * `parser.expecting(x)` can be used to query whether parsing can `.step` with the given symbol. 50 | 51 | Finally, when you've finished your input, you make sense of it with `.traverse`. Here's how to do it: 52 | 53 | def postorder_call(rule, rhs): 54 | return '(' + ' '.join(rhs) + ')' 55 | def blank(symbol): 56 | return '' 57 | def ambiguous(sppf): 58 | for p in sppf: 59 | print p 60 | raise Exception("ambiguous parse") 61 | # This may also choose one interpretation among ambiguous results. 62 | # In that case, return one of 'p' from this function. 63 | result = parser.traverse(postorder_call, blank, ambiguous) 64 | 65 | Note that blank rules will never reach `postorder_call`. Instead, `blank` is called with the appropriate symbol. This is done so the chart parser doesn't need to produce permutations of the empty rules, if they appear. If you would rather derive empty objects from the blank rules, you can obtain `.nullable` and `.blankset` during preprocessing: 66 | 67 | new_parser = preprocess(grammar, accept=s) 68 | print new_parser.blankset 69 | print new_parser.nullable 70 | 71 | ## Supported 72 | 73 | I promise to respond to support issues as long as I keep writing Python code. I also try to keep the interface described in this README mostly unchanged from what it is now. I took a lot of care to get it this clean. 
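A note on the `.nullable` set mentioned above: it is simply the least fixpoint of "a symbol is nullable if some rule for it has an all-nullable right-hand side". A naive self-contained sketch, for exposition only (the library's `find_nullable` computes the same set more efficiently with a worklist and an inverse lookup table):

```python
def naive_nullable(rules):
    # rules: iterable of (lhs, rhs) pairs; rhs is a list of symbols.
    # Repeat until nothing changes: lhs becomes nullable when every
    # rhs symbol is already nullable (trivially true for an empty rhs).
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in nullable and all(x in nullable for x in rhs):
                nullable.add(lhs)
                changed = True
    return nullable

rules = [("s", ["s", "a"]), ("s", []), ("a", ["x"])]
print(naive_nullable(rules))  # only 's' is nullable
```

Note that a purely self-referential rule such as `b -> b` does not make `b` nullable: the least fixpoint leaves it out, which matches the library's behaviour.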
74 | 75 | The parsing library exposes a lot of other details that you may find necessary to use. For example, my single traversal approach might not satisfy every need. Those details may change in the future. 76 | 77 | ## Origins 78 | 79 | This module builds on [the work of Jeffrey Kegler](http://jeffreykegler.github.io/Marpa-web-site/). He introduced these concepts to me, and the papers about Marpa helped me to refine an adaptation of the parser in Python. 80 | 81 | Before I learned how to write a practical parser of this kind, I found [loup's explanation of Earley parsing](http://loup-vaillant.fr/tutorials/earley-parsing/) well worth reading. 82 | -------------------------------------------------------------------------------- /chartparser.py: -------------------------------------------------------------------------------- 1 | def main(): 2 | s = Nonterminal('s') 3 | a = Nonterminal('a') 4 | b = Nonterminal('b') 5 | x = Terminal('x') 6 | 7 | terminals = {"x": x} 8 | 9 | accept = s 10 | user_grammar = [ 11 | Rule(s, [s, a]), 12 | Rule(s, []), 13 | Rule(a, [x]), 14 | ] 15 | 16 | parser = preprocess(user_grammar, accept)() 17 | input_string = "xxxxxx" 18 | for token in input_string: 19 | parser.step(terminals[token], token) 20 | print parser.accepted, parser.expect, parser.expecting(x) 21 | print parser.traverse(lambda x, a: '(' + ' '.join(a) + ')', lambda x: "") 22 | 23 | def preprocess(user_grammar, accept): 24 | nullable = find_nullable(user_grammar) 25 | grammar = {} 26 | blankset = {} 27 | for rule in build_nnf(user_grammar, nullable): 28 | if len(rule.rhs) == 0: 29 | try: 30 | blankset[rule.lhs].append(rule.annotation.rule) 31 | except KeyError as k: 32 | blankset[rule.lhs] = [rule.annotation.rule] 33 | else: 34 | try: 35 | grammar[rule.lhs].append(rule) 36 | except KeyError as k: 37 | grammar[rule.lhs] = [rule] 38 | def new_parser(accept=accept): 39 | parser = Parser(grammar, accept, []) 40 | # In an Earley parser that uses NNF, 
empty input is a special case, that is taken care of here. 41 | if accept in nullable: 42 | for rule in user_grammar: 43 | if rule.lhs == accept and all(x in nullable for x in rule.rhs): 44 | parser.output.append(Rule(accept, [], NNF(rule, [False for x in rule.rhs]))) 45 | # The first chart column 46 | transitions = {} 47 | nodes = {} 48 | current = [] 49 | prediction(current, nodes, grammar, 0, accept) 50 | for eim in current: 51 | prediction(current, nodes, grammar, 0, eim.postdot()) 52 | cache_transitions(transitions, eim, None) 53 | parser.chart.append(transitions) 54 | return parser 55 | new_parser.blankset = blankset 56 | new_parser.nullable = nullable 57 | return new_parser 58 | 59 | def default_ambiguity_resolution(sppf): 60 | raise Exception(sppf) 61 | 62 | class Parser(object): 63 | def __init__(self, grammar, accept, output): 64 | self.chart = [] 65 | self.grammar = grammar 66 | self.accept = accept 67 | self.output = output 68 | 69 | def step(self, term, token): 70 | # completions proceed in non-deterministic manner, until 71 | # everything has been completed. 
72 | current = [] 73 | transitions = {} 74 | nodes = {} 75 | location = len(self.chart) 76 | output = [] 77 | 78 | bottom = SPPF(location-1, location, token, None) 79 | for eim, bb in self.chart[location-1][term]: 80 | shift_eim(current, nodes, eim, location, bb, bottom) 81 | for eim in current: 82 | # reduction 83 | cc = nodes[eim] 84 | if eim.is_completed(): 85 | for before, bb in self.chart[eim.origin].get(eim.rule.lhs, ()): 86 | shift_eim(current, nodes, before, location, bb, cc) 87 | if eim.rule.lhs == self.accept and eim.origin == 0: 88 | output.append(cc) 89 | prediction(current, nodes, self.grammar, location, eim.postdot()) 90 | cache_transitions(transitions, eim, cc) 91 | self.chart.append(transitions) 92 | self.output = output 93 | 94 | @property 95 | def accepted(self): 96 | return len(self.output) > 0 97 | 98 | @property 99 | def expect(self): 100 | return self.chart[-1].keys() 101 | 102 | def expecting(self, symbol): 103 | return symbol in self.chart[-1] 104 | 105 | def traverse(self, postorder_callback, blank_callback, resolve_ambiguity=default_ambiguity_resolution): 106 | if len(self.output) > 1: 107 | # The resolver takes a single argument, matching the README's 108 | # ambiguity callback signature. 109 | sppf = resolve_ambiguity(self.output) 110 | else: 111 | sppf = self.output[0] 112 | return traverse_sppf(sppf, postorder_callback, blank_callback, resolve_ambiguity) 113 | 114 | def prediction(current, nodes, grammar, location, postdot): 115 | if isinstance(postdot, Nonterminal): 116 | for rule in grammar.get(postdot, ()): 117 | eim = EIM(rule, 0, location) 118 | if not eim in nodes: 119 | nodes[eim] = None 120 | current.append(eim) 121 | 122 | def cache_transitions(transitions, eim, cc): 123 | postdot = eim.postdot() 124 | if not eim.is_completed(): 125 | try: 126 | transitions[postdot].append((eim, cc)) 127 | except KeyError as k: 128 | transitions[postdot] = [(eim, cc)] 129 | 130 | def shift_eim(current, nodes, eim, location, bb, cc): 131 | eim = eim.next() 132 | try: 133 | sppf = nodes[eim] 134 | sppf.insert(bb, cc) 135 | except KeyError as 
k: 134 | assert eim.pos != 0 135 | nodes[eim] = sppf = SPPF(eim.origin, location, eim.rule, Link(bb, cc)) 136 | current.append(eim) 137 | 138 | def build_nnf(grammar, nullable): 139 | for rule in grammar: 140 | order = sum(x in nullable for x in rule.rhs) 141 | for i in range(1 << order): 142 | yield nihilist_rule(rule, i, nullable) 143 | 144 | def nihilist_rule(rule, index, nullable): 145 | present = [] 146 | rhs = [] 147 | for symbol in rule.rhs: 148 | shift = True 149 | if symbol in nullable: 150 | if index & 1 == 0: 151 | shift = False 152 | index >>= 1 153 | present.append(shift) 154 | if shift: 155 | rhs.append(symbol) 156 | return Rule(rule.lhs, rhs, NNF(rule, present)) 157 | 158 | def detect_right_recursion(grammar): 159 | edges = [] 160 | for rule in grammar: 161 | right = rule.rhs[-1] if len(rule.rhs) > 0 else None 162 | row = [] 163 | for other in grammar: 164 | row.append(other.lhs == right) 165 | edges.append(row) 166 | warshall_transitive_closure(edges) 167 | return set(rule for i, rule in enumerate(grammar) if edges[i][i]) 168 | 169 | def warshall_transitive_closure(a): 170 | n = len(a) 171 | for k in range(n): 172 | for i in range(n): 173 | if not a[i][k]: 174 | continue 175 | for j in range(n): 176 | if not a[k][j]: 177 | continue 178 | a[i][j] = True 179 | return a 180 | 181 | def find_nullable(grammar): 182 | nullable = set() 183 | queue = [] 184 | def new_nullable(symbol): 185 | if symbol not in nullable: 186 | nullable.add(symbol) 187 | queue.append(symbol) 188 | 189 | inverse_lookup = {} 190 | def new_lookup(index, symbol): 191 | if symbol in inverse_lookup: 192 | inverse_lookup[symbol].append(index) 193 | else: 194 | inverse_lookup[symbol] = [index] 195 | 196 | nonterminals = [] 197 | nonnullables = [] 198 | 199 | for rule in grammar: 200 | if len(rule) == 0: 201 | new_nullable(rule.lhs) 202 | elif all(isinstance(x, Nonterminal) for x in rule.rhs): 203 | index = len(nonnullables) 204 | for x in rule.rhs: 205 | if x != rule.lhs: 206 | 
new_lookup(index, x) 207 | nonterminals.append(rule.lhs) 208 | nonnullables.append(sum(x != rule.lhs for x in rule.rhs)) 209 | 210 | for n in queue: 211 | for i in inverse_lookup.get(n, ()): 212 | nonnullables[i] -= 1 213 | if nonnullables[i] == 0: 214 | new_nullable(nonterminals[i]) 215 | 216 | return nullable 217 | 218 | def traverse_sppf(sppf, postorder_callback, blank_callback, resolve_ambiguity): 219 | rcount = 1 220 | sstack = [] 221 | rstack = [] 222 | stack = [sppf] 223 | while len(stack) > 0: 224 | sppf = stack.pop() 225 | if sppf.is_leaf(): 226 | sstack.append(sppf.cell) 227 | rcount -= 1 228 | else: 229 | result = sppf.single() 230 | if result is None: 231 | result = resolve_ambiguity(sppf) 232 | rstack.append((rcount - 1, len(result), sppf)) 233 | rcount = len(result) 234 | stack.extend(reversed(result)) 235 | while rcount == 0 and len(rstack) > 0: 236 | rcount, rlen, sppf = rstack.pop(-1) 237 | rule, args = expand(sppf.cell, blank_callback, (sstack.pop(i-rlen) for i in range(rlen))) 238 | sstack.append(postorder_callback(rule, args)) 239 | assert len(sstack) == 1 240 | return sstack[0] 241 | 242 | def expand(cell, blank_callback, seq): 243 | if isinstance(cell.annotation, NNF): 244 | nnf = cell.annotation 245 | result = [] 246 | for i, p in enumerate(nnf.present): 247 | if p: 248 | result.append(seq.next()) 249 | else: 250 | result.append(blank_callback(nnf.rule.rhs[i])) 251 | return nnf.rule, result 252 | return cell, list(seq) 253 | 254 | class Rule(object): 255 | def __init__(self, lhs, rhs, annotation=None): 256 | self.lhs = lhs 257 | self.rhs = rhs 258 | self.annotation = annotation 259 | 260 | def __len__(self): 261 | return len(self.rhs) 262 | 263 | def __repr__(self): 264 | return "{} -> {}".format( 265 | self.lhs, 266 | ' '.join(map(str, self.rhs))) 267 | 268 | # Nihilist normal form 269 | class NNF(object): 270 | def __init__(self, rule, present): 271 | self.rule = rule 272 | self.present = present # tells which fields are present. 
273 | 274 | # Earlier I did not separate terminals from 275 | # non-terminals because it was not strictly 276 | # necessary. That turned out to confuse 277 | # when designing grammars. 278 | class Terminal(object): 279 | def __init__(self, name): 280 | self.name = name 281 | 282 | def __repr__(self): 283 | return "T{!r}".format(self.name) 284 | 285 | class Nonterminal(object): 286 | def __init__(self, name): 287 | self.name = name 288 | 289 | def __repr__(self): 290 | return "{!s}".format(self.name) 291 | 292 | # The chart consists explicitly from earley items. 293 | class EIM(object): 294 | def __init__(self, rule, pos, origin): 295 | self.rule = rule 296 | self.pos = pos 297 | self.origin = origin 298 | assert 0 <= pos <= len(rule) 299 | 300 | def postdot(self): 301 | if self.pos < len(self.rule): 302 | return self.rule.rhs[self.pos] 303 | return None 304 | 305 | def next(self): 306 | if self.postdot() is not None: 307 | return EIM(self.rule, self.pos + 1, self.origin) 308 | return None 309 | 310 | def penult(self): 311 | if self.pos + 1 == len(self.rule): 312 | return self.postdot() 313 | 314 | def is_predicted(self): 315 | return self.pos == 0 316 | 317 | def is_confirmed(self): 318 | return self.pos > 0 319 | 320 | def is_completed(self): 321 | return self.pos == len(self.rule) 322 | 323 | def __hash__(self): 324 | return hash((self.rule, self.pos, self.origin)) 325 | 326 | def __eq__(self, other): 327 | return isinstance(other, EIM) and self.rule == other.rule and self.pos == other.pos and self.origin == other.origin 328 | 329 | def __repr__(self): 330 | if isinstance(self.rule, Rule): 331 | lhs = repr(self.rule.lhs) 332 | pre = ' '.join(map(repr, self.rule.rhs[:self.pos])) 333 | pos = ' '.join(map(repr, self.rule.rhs[self.pos:])) 334 | return "{} -> {} * {} : {}".format(lhs, pre, pos, self.origin) 335 | return object.__repr__(self) 336 | 337 | # Shared packed parse forest 338 | class SPPF(object): 339 | def __init__(self, start, stop, cell, link): 340 | 
self.start = start 341 | self.stop = stop 342 | self.cell = cell 343 | self.link = link 344 | 345 | def is_leaf(self): 346 | return self.link is None 347 | 348 | def insert(self, left, right): 349 | if self.link is None: 350 | self.link = Link(left, right) 351 | return self.link 352 | link = self.link 353 | while True: 354 | if link.left == left and link.right == right: 355 | return link 356 | if link.link is None: 357 | link.link = Link(left, right) 358 | return link.link 359 | link = link.link 360 | 361 | def single(self): 362 | result = [] 363 | link = self.link 364 | while link.left is not None: 365 | if link.link is not None: 366 | return None 367 | result.append(link.right) 368 | link = link.left.link 369 | result.append(link.right) 370 | result.reverse() 371 | return result 372 | 373 | def __iter__(self): 374 | finger = [] 375 | # To produce all parses, the sppf is fingered through. 376 | link = self.link 377 | while len(finger) > 0 or link is not None: 378 | while link.left is not None: 379 | finger.append(link) 380 | link = link.left.link 381 | # Now the link contains the head, while the tail is in the finger list. 382 | while link is not None: 383 | result = [link.right] 384 | result.extend(x.right for x in reversed(finger)) 385 | yield result 386 | link = link.link 387 | # Now some portion of the finger is already iterated, and should be removed. 
388 | while len(finger) > 0 and link is None: 389 | link = finger.pop().link 390 | 391 | def __repr__(self): 392 | return "[{}:{}] {}".format(self.start, self.stop, self.cell) 393 | 394 | class Link(object): 395 | def __init__(self, left, right, link=None): 396 | self.left = left 397 | self.right = right 398 | self.link = link 399 | 400 | if __name__=="__main__": 401 | main() 402 | -------------------------------------------------------------------------------- /example_0.py: -------------------------------------------------------------------------------- 1 | import chartparser 2 | from chartparser import ( 3 | Terminal, Nonterminal, Rule, 4 | preprocess) 5 | import visualizer 6 | 7 | def main(): 8 | s = Nonterminal('s') 9 | a = Nonterminal('a') 10 | b = Nonterminal('b') 11 | x = Terminal('x') 12 | 13 | terminals = {"x": x} 14 | 15 | accept = s 16 | user_grammar = [ 17 | Rule(s, [s, a]), 18 | Rule(s, []), 19 | Rule(a, [x]), 20 | ] 21 | 22 | input_strings = [ 23 | "x", 24 | "xxx", 25 | "xxxxxx" 26 | ] 27 | 28 | visualizer.print_grammar(user_grammar) 29 | 30 | parser = preprocess(user_grammar, accept)() 31 | visualizer.print_nnf_grammar(parser.grammar) 32 | 33 | input_string = visualizer.select_input_string(input_strings) 34 | 35 | visualizer.step_through(parser, terminals, input_string) 36 | 37 | if __name__=='__main__': 38 | main() 39 | -------------------------------------------------------------------------------- /example_1.py: -------------------------------------------------------------------------------- 1 | import chartparser 2 | from chartparser import ( 3 | Terminal, Nonterminal, Rule, 4 | preprocess) 5 | import visualizer 6 | 7 | def main(): 8 | d0 = Terminal('0') 9 | d1 = Terminal('1') 10 | d2 = Terminal('2') 11 | d3 = Terminal('3') 12 | d4 = Terminal('4') 13 | d5 = Terminal('5') 14 | d6 = Terminal('6') 15 | d7 = Terminal('7') 16 | d8 = Terminal('8') 17 | d9 = Terminal('9') 18 | plus = Terminal('+') 19 | minus = Terminal('-') 20 | div = Terminal('/') 21 
| mul = Terminal('*') 22 | open_ = Terminal('(') 23 | close_ = Terminal(')') 24 | dot = Terminal('.') 25 | start = Nonterminal('start') 26 | expr = Nonterminal('expr') 27 | term = Nonterminal('term') 28 | factor = Nonterminal('factor') 29 | integer = Nonterminal('integer') 30 | digit = Nonterminal('digit') 31 | 32 | grammar = [ 33 | Rule(start, [expr]), 34 | Rule(expr, [term, plus,expr]), 35 | Rule(expr, [term, minus,expr]), 36 | Rule(expr, [term]), 37 | Rule(term, [factor]), # This rule was missing in issue/1 38 | Rule(factor, [plus, factor]), 39 | Rule(factor, [minus, factor]), 40 | Rule(factor, [open_, expr, close_]), 41 | Rule(factor, [integer]), 42 | Rule(factor, [integer, dot, integer]), 43 | Rule(integer, [digit, integer]), 44 | Rule(integer, [digit]), 45 | Rule(digit, [d0]), 46 | Rule(digit, [d1]), 47 | Rule(digit, [d2]), 48 | Rule(digit, [d3]), 49 | Rule(digit, [d4]), 50 | Rule(digit, [d5]), 51 | Rule(digit, [d6]), 52 | Rule(digit, [d7]), 53 | Rule(digit, [d8]), 54 | Rule(digit, [d9]), 55 | ] 56 | 57 | accept = start 58 | terminals = { 59 | "0": d0, 60 | "1": d1, 61 | "2": d2, 62 | "3": d3, 63 | "4": d4, 64 | "5": d5, 65 | "6": d6, 66 | "7": d7, 67 | "8": d8, 68 | "9": d9, 69 | "+": plus, 70 | "-": minus, 71 | "/": div, 72 | "*": mul, 73 | "(": open_, 74 | ")": close_, 75 | ".": dot, 76 | } 77 | 78 | user_grammar = grammar 79 | 80 | input_strings = [ 81 | "321345", 82 | ] 83 | 84 | visualizer.print_grammar(user_grammar) 85 | 86 | parser = preprocess(user_grammar, accept)() 87 | visualizer.print_nnf_grammar(parser.grammar) 88 | 89 | input_string = visualizer.select_input_string(input_strings) 90 | 91 | visualizer.step_through(parser, terminals, input_string) 92 | 93 | if __name__=='__main__': 94 | main() 95 | -------------------------------------------------------------------------------- /example_IF.py: -------------------------------------------------------------------------------- 1 | import chartparser 2 | from chartparser import ( 3 | Terminal, 
Nonterminal, Rule, 4 | preprocess) 5 | 6 | command = Nonterminal('start') 7 | 8 | look = Terminal('look') 9 | go = Terminal('go') 10 | to_ = Terminal('to') 11 | inventory = Terminal('inventory') 12 | 13 | place = Terminal('some place') 14 | 15 | token_table = { 16 | "look": look, 17 | "go": go, 18 | "to": to_, 19 | "inventory": inventory, 20 | "inv": inventory, 21 | } 22 | 23 | grammar = [ 24 | Rule(command, [look], annotation='look'), 25 | Rule(command, [go, to_], annotation="goto?"), 26 | Rule(command, [go, to_, place], annotation="goto"), 27 | Rule(command, [inventory], annotation='inventory'), 28 | ] 29 | 30 | class ENVIRONMENT: 31 | def __init__(self): 32 | self.current_room = first_room 33 | self.inventory = set(['axe', 'lamp', 'toilet']) 34 | 35 | def get_grammar(self): 36 | return grammar 37 | 38 | class EMPTY_ROOM: 39 | description = "You are in an empty room" 40 | def __init__(self): 41 | self.passages = {} 42 | self.items = [] 43 | 44 | first_room = EMPTY_ROOM() 45 | second_room = EMPTY_ROOM() 46 | second_room.description = "You are in an another empty room" 47 | 48 | first_room.passages = {"hole": second_room} 49 | first_room.items = set(['some thing', 'other thing']) 50 | 51 | second_room.passages = {"hole": first_room} 52 | second_room.items = set(['dangerous thing', 'weird thing']) 53 | 54 | 55 | def main(): 56 | print "Welcome to the empty room" 57 | environment = ENVIRONMENT() 58 | 59 | while True: 60 | try: 61 | action = attempted_parse_of_input(environment) 62 | except EOFError as eof: 63 | print "okay." 64 | return 65 | if action is None: 66 | print "what do you mean?" 
67 | else: 68 | action(environment) 69 | 70 | def attempted_parse_of_input(environment): 71 | language = preprocess(environment.get_grammar(), command) 72 | parser = language() 73 | 74 | input_string = raw_input("> ") 75 | for word in input_string.strip().split(" "): 76 | cleaned_word = word.strip().lower() 77 | try: 78 | token = recognize_word(environment, cleaned_word) 79 | parser.step(token, cleaned_word) 80 | except KeyError as ke: 81 | print "at word {!r}".format(word) 82 | parser_error_message(parser) 83 | return None 84 | if parser.accepted: 85 | return parser.traverse( 86 | rule_traverse, 87 | empty_traverse) 88 | else: 89 | print "at end of the sentence" 90 | parser_error_message(parser) 91 | return None 92 | 93 | def recognize_word(environment, cleaned_word): 94 | if cleaned_word in environment.current_room.passages: 95 | return place 96 | return token_table[cleaned_word] 97 | 98 | def parser_error_message(parser): 99 | expect = list(parser.expect) 100 | if len(expect) == 0: 101 | print "EXPECTED nothing" 102 | else: 103 | print "EXPECTED one of" 104 | for token in expect: 105 | print " {}".format(token) 106 | 107 | def rule_traverse(rule, arguments): 108 | if rule.annotation == 'look': 109 | return _look_around_ 110 | elif rule.annotation == 'goto?': 111 | def _goto_where_(environment): 112 | print "go to where? 
places to go:" 113 | for place in environment.current_room.passages.keys(): 114 | print " {}".format(place) 115 | return _goto_where_ 116 | elif rule.annotation == 'goto': 117 | def _goto_(environment): 118 | print "you go to", arguments[2] 119 | previous_room = environment.current_room 120 | environment.current_room = previous_room.passages[arguments[2]] 121 | _look_around_(environment) 122 | return _goto_ 123 | elif rule.annotation == 'inventory': 124 | def _print_inventory_(environment): 125 | print "you have items" 126 | for item in environment.inventory: 127 | print " {}".format(item) 128 | return _print_inventory_ 129 | elif rule.annotation == 'place': 130 | return arguments[0] 131 | else: 132 | parse_tree = rule.annotation + '(' + ' '.join( 133 | repr(item) for item in arguments) + ')' 134 | def _placeholder_(environment): 135 | print "not implemented" 136 | print parse_tree 137 | return _placeholder_ 138 | 139 | def empty_traverse(rule): 140 | def _placeholder_(environment): 141 | print "not implemented (empty rule)" 142 | print rule 143 | return _placeholder_ 144 | 145 | def _look_around_(environment): 146 | print environment.current_room.description 147 | if len(environment.current_room.items) > 0: 148 | print "there are items" 149 | for item in environment.current_room.items: 150 | print " {}".format(item) 151 | 152 | if __name__=='__main__': 153 | main() 154 | -------------------------------------------------------------------------------- /visualizer.py: -------------------------------------------------------------------------------- 1 | #import traceback # For printing tracebacks 2 | 3 | # This visualizer was written with the purpose to help 4 | # study of the parsing algorithm. 5 | 6 | # 1. It prints out your grammar, 7 | # 2. then it prints the preprocessing results. 8 | # 3. Finally it lets you choose among the input strings and 9 | # 4. step the parser through. 10 | # 4.b. 
If the parsing step encounters an error, 11 | # the error and traceback are printed and the parser is considered lost. 12 | # 4.c. Asks for more input, or stops if no string is given. 13 | 14 | # The example grammars import this utility as a module, 15 | # but expect that some global state may be stored here later; 16 | # this is more like its own program. 17 | 18 | # The preprocessed items are annotated with NNF nodes 19 | # and connected to the input grammar that way. 20 | # I avoided printing NNF items in the simple outputs because 21 | # it would have required more printout. 22 | 23 | def print_grammar(grammar): 24 | print "THE INPUT GRAMMAR" 25 | for rule in grammar: 26 | print " {0.lhs} -> {1}".format(rule, 27 | " ".join(repr(sym) for sym in rule.rhs)) 28 | raw_input("press return and continue") 29 | 30 | def print_nnf_grammar(grammar): 31 | print "THE PREPROCESSED GRAMMAR" 32 | for lhs, rules in grammar.iteritems(): 33 | for rule in rules: 34 | print " {0.lhs} -> {1}".format(rule, 35 | " ".join(repr(sym) for sym in rule.rhs)) 36 | raw_input("press return and continue") 37 | 38 | # Preprocessing also contains the blankset, 39 | # a set of nonterminals that can produce empty sequences. 40 | # We are not printing that for now. 41 | 42 | def select_input_string(input_strings): 43 | print "Select the input string" 44 | for index, string in enumerate(input_strings): 45 | print " [{}] {!r}".format(index, string) 46 | print " [a] write your own string" 47 | choice = raw_input("? 
") 48 | if choice.lower() == 'a': 49 | return raw_input("input string: ") 50 | else: 51 | try: 52 | return input_strings[int(choice)] 53 | except ValueError as error: 54 | print "error: {}".format(error) 55 | return select_input_string(input_strings) 56 | except IndexError as error: 57 | print "error: no such option" 58 | return select_input_string(input_strings) 59 | 60 | def step_through(parser, terminals, input_string): 61 | while len(input_string) > 0: 62 | for token in input_string: 63 | print_parsing_state(parser) 64 | print "next token: {}".format(terminals[token]) 65 | raw_input("press return and continue") 66 | #try: 67 | parser.step(terminals[token], token) 68 | #except Exception as error: 69 | # traceback.print_exc() 70 | # return 71 | print_parsing_state(parser, True) 72 | input_string = raw_input("more input string? ") 73 | 74 | def print_parsing_state(parser, final=False): 75 | print "CHART {}".format(len(parser.chart)-1) 76 | for term, eims in parser.chart[-1].iteritems(): 77 | for eim, bb in eims: 78 | if bb is None: 79 | print " {}".format(eim) 80 | else: 81 | print " {} [{}]".format(eim, bb) 82 | if parser.accepted: 83 | print "INPUT ACCEPTED" 84 | if not final: 85 | for cc in parser.output: 86 | print " {}".format(cc) 87 | else: 88 | print " ", parser.traverse(lambda x, a: '(' + ' '.join(a) + ')', lambda x: "") 89 | --------------------------------------------------------------------------------