├── .gitignore ├── LICENSE ├── Makefile ├── README.md ├── examples ├── c11-ast.lua ├── json-ast.lua └── lua-ast.lua ├── inputs ├── fact.c ├── fact.lua └── sample.json ├── lpegrex.lua ├── parsers ├── c11.lua ├── csv.lua ├── json.lua └── lua.lua ├── rockspecs └── lpegrex-0.2.2-1.rockspec └── tests ├── c11-test.lua ├── csv-test.lua ├── json-test.lua ├── lester.lua ├── lpegrex-test.lua └── test.lua /.gitignore: -------------------------------------------------------------------------------- 1 | *.out 2 | *.rock 3 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2021 Eduardo Bart 4 | Copyright (c) 2014-2020 Sérgio Medeiros 5 | Copyright (c) 2007-2019 Lua.org, PUC-Rio. 6 | 7 | Permission is hereby granted, free of charge, to any person obtaining a copy 8 | of this software and associated documentation files (the "Software"), to deal 9 | in the Software without restriction, including without limitation the rights 10 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 11 | copies of the Software, and to permit persons to whom the Software is 12 | furnished to do so, subject to the following conditions: 13 | 14 | The above copyright notice and this permission notice shall be included in all 15 | copies or substantial portions of the Software. 16 | 17 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 18 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 19 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 20 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 21 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 22 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 23 | SOFTWARE. 
24 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | LUA=lua 2 | 3 | test: 4 | $(LUA) tests/test.lua 5 | $(LUA) examples/json-ast.lua inputs/sample.json 6 | $(LUA) examples/lua-ast.lua inputs/fact.lua 7 | $(LUA) examples/c11-ast.lua inputs/fact.c 8 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # LPegRex 2 | 3 | LPegRex is a re-implementation of the 4 | [LPeg](http://www.inf.puc-rio.br/~roberto/lpeg/)/ 5 | [LPegLabel](https://github.com/sqmedeiros/lpeglabel) 6 | `re` module with some extensions that make it 7 | easy to parse language grammars into an AST (abstract syntax tree) 8 | while maintaining readability. 9 | 10 | LPegRex stands for *LPeg Regular Expression eXtended*. 11 | 12 | ## Goals 13 | 14 | The goal of this library is to extend the LPeg 15 | [re module](http://www.inf.puc-rio.br/~roberto/lpeg/re.html) 16 | with some minor additions that make it easy to parse a whole 17 | programming language grammar into an abstract syntax tree 18 | using a single, simple, compact and clear PEG grammar. 19 | 20 | For instance, one goal of the project is to parse Lua 5.4 source 21 | files with complete syntax into an abstract syntax tree in under 100 lines 22 | of clear PEG grammar rules while generating an output suitable to be analyzed by a compiler. 23 | **This goal was accomplished, see the Lua example section below.** 24 | 25 | The new extensions should not break any existing `re` syntax. 26 | 27 | This project will later be incorporated 28 | in the [Nelua](https://github.com/edubart/nelua-lang) 29 | programming language compiler. 
30 | **This goal was accomplished, and LPegRex is the new parsing engine 31 | for the Nelua compiler.** 32 | 33 | ## Additional Features 34 | 35 | * New predefined patterns for control characters (`%ca` `%cb` `%ct` `%cn` `%cv` `%cf` `%cr`). 36 | * New predefined patterns for UTF-8 (`%utf8` `%utf8seq` `%ascii`). 37 | * New predefined pattern for spaces independent of locale (`%sp`). 38 | * New syntax for capturing arbitrary values while matching empty strings (e.g. `$true`). 39 | * New syntax for optional captures (e.g. `patt~?`). 40 | * New syntax for throwing label errors on failure of expected matches (e.g. `@rule`). 41 | * New syntax for rules that capture AST Nodes (e.g. `NodeName <== patt`). 42 | * New syntax for rules that capture tables (e.g. `MyList <-| patt`). 43 | * New syntax for matching unique tokens with automatic skipping (e.g. `` `,` ``). 44 | * New syntax for matching unique keywords with automatic skipping (e.g. `` `for` ``). 45 | * Auto-generate `KEYWORD` rule based on used keywords in the grammar. 46 | * Auto-generate `TOKEN` rule based on used tokens in the grammar. 47 | * Use supplied `NAME_SUFFIX` rule for generating each keyword rule. 48 | * Use supplied `SKIP` rule for generating each keyword or token rule. 49 | * Capture nodes with initial and final positions. 50 | * Support using `-` character in rule names. 51 | * Predefine some useful auxiliary functions: 52 | * `tonil` Substitute captures by `nil`. 53 | * `totrue` Substitute captures by `true`. 54 | * `tofalse` Substitute captures by `false`. 55 | * `toemptytable` Substitute captures by `{}`. 56 | * `tonumber` Substitute a string capture by its corresponding number. 57 | * `tochar` Substitute a numeric code capture by its corresponding character byte. 58 | * `toutf8char` Substitute a numeric code capture by its corresponding UTF-8 byte sequence. 59 | * `foldleft` Fold tables to the left (use only with `~>`). 60 | * `foldright` Fold tables to the right (use only with `->`). 
61 | * `rfoldleft` Fold tables to the left in reverse order (use only with `->`). 62 | * `rfoldright` Fold tables to the right in reverse order (use only with `~>`). 63 | 64 | ## Quick References 65 | 66 | For reference on how to use `re` and its syntax, 67 | please check [its manual](http://www.inf.puc-rio.br/~roberto/lpeg/re.html) first. 68 | 69 | Here is a quick reference of the new syntax additions: 70 | 71 | | Purpose | Example Syntax | Equivalent Re Syntax | 72 | |-|-|-| 73 | | Rule | `name <-- patt` | `name <- patt` | 74 | | Capture node rule | `Node <== patt` | `Node <- {\| {:pos:{}:} {:tag:''->'Node':} patt {:endpos:{}:} \|}` | 75 | | Capture tagged node rule | `name : Node <== patt` | `name <- {\| {:pos:{}:} {:tag:''->'Node':} patt {:endpos:{}:} \|}` | 76 | | Capture table rule | `name <-\| patt` | `name <- {\| patt \|}` | 77 | | Match keyword | `` `keyword` `` | `'keyword' !NAME_SUFFIX SKIP` | 78 | | Match token | `` `.` `..` `` | `!('..' SKIP) '.' SKIP '..' SKIP` | 79 | | Capture token or keyword | `` {`,`} `` | `{','} SKIP` | 80 | | Optional capture | `` patt~? `` | `patt / ''->tofalse` | 81 | | Match control character | `%cn` | `%nl` | 82 | | Arbitrary capture | `$'string'` | `''->'string'` | 83 | | Expected match | `@'string' @rule` | `'string'^Expected_string rule^Expected_rule` | 84 | 85 | As you can see, the additional syntax is mostly sugar 86 | for common capture patterns that are used when defining programming language grammars. 87 | 88 | ## Folding auxiliary functions 89 | 90 | Often we need to reduce a list of captured AST nodes into a single captured AST node 91 | (e.g. when reducing a call chain); 92 | here we call this operation folding. 
93 | The following table demonstrates the four ways to fold a list of nodes: 94 | 95 | | Purpose | Example Input | Corresponding Output | Syntax | 96 | |-|-|-|-| 97 | | Fold tables to the left | `{1}, {2}, {3}` | `{{{1}, 2}, 3}` | `patt ~> foldleft` | 98 | | Fold tables to the right | `{1}, {2}, {3}` | `{1, {2, {3}}}` | `patt -> foldright` | 99 | | Fold tables to the left in reverse order | `{1}, {2}, {3}` | `{{{3}, 2}, 1}` | `patt -> rfoldleft` | 100 | | Fold tables to the right in reverse order | `{1}, {2}, {3}` | `{3, {2, {1}}}` | `patt ~> rfoldright` | 101 | 102 | Where the pattern `patt` captures a list of tables with at least one capture. 103 | Note that depending on the fold operation you must use its correct arrow (`->` or `~>`). 104 | 105 | ## Capture auxiliary syntax 106 | 107 | Sometimes it is useful to match empty strings and capture some arbitrary values; 108 | the following table shows auxiliary syntax to help with that: 109 | 110 | | Syntax | Captured Lua Value | 111 | |-|-| 112 | | `$nil` | `nil` | 113 | | `$true` | `true` | 114 | | `$false` | `false` | 115 | | `$name` | `defs[name]` | 116 | | `${}` | `{}` | 117 | | `$16` | `16` | 118 | | `$'string'` | `"string"` | 119 | | `p~?` | `p` captures if it matches, otherwise `false` | 120 | 121 | ## Capture auxiliary functions 122 | 123 | Sometimes it is useful to substitute a list of captures by a Lua value; 124 | the following table shows auxiliary functions to help with that: 125 | 126 | | Purpose | Syntax | Captured Value | 127 | |-|-|-| 128 | | Substitute captures by `nil` | `p -> tonil` | `nil` | 129 | | Substitute captures by `false` | `p -> tofalse` | `false` | 130 | | Substitute captures by `true` | `p -> totrue` | `true` | 131 | | Substitute captures by `{}` | `p -> toemptytable` | `{}` | 132 | | Substitute a capture by a number | `p -> tonumber` | Corresponding number of the captured | 133 | | Substitute a capture by a character byte | `p -> tochar` | Corresponding byte of the captured number | 134 | 
Substitute a capture by UTF-8 byte sequence | `p -> toutf8char` | Corresponding UTF-8 bytes of the captured number | 135 | 136 | ## Captured node fields 137 | 138 | By default when capturing a node with `<==` syntax, LPegRex will set the following 3 fields: 139 | 140 | * `tag` Name of the node (its type) 141 | * `pos` Initial position of the node match 142 | * `endpos` Final position of the node match (usually includes following SKIP) 143 | 144 | The user can customize and change these field names or disable them by 145 | setting its corresponding name in the `defs.__options` table when compiling the grammar, 146 | for example: 147 | 148 | ```lua 149 | local mypatt = rex.compile(mygrammar, {__options = { 150 | tag = 'name', -- 'tag' field renamed to 'name' 151 | pos = 'init', -- 'pos' field renamed to 'init' 152 | endpos = false, -- don't capture node final position 153 | }}) 154 | ``` 155 | 156 | The fields `pos` and `endpos` are useful to generate error messages with precise location 157 | when analyzing the AST and the `tag` field is used to distinguish the node type. 158 | 159 | ## Captured node action 160 | 161 | In case `defs.__options.tag` is a function, then it's called and the user will be responsible for 162 | setting the tag field and returning the node. This flexibility exists in case 163 | specific actions are required to be executed on node creation, for example: 164 | 165 | ```lua 166 | local mypatt = rex.compile(mygrammar, {__options = { 167 | tag = function(tag, node) 168 | print('new node', tag) 169 | node.tag = tag 170 | return node 171 | end 172 | }}) 173 | ``` 174 | 175 | Note that when this function is called the node children may be incomplete 176 | in case the node is being folded. 177 | 178 | ## Matching keywords and tokens 179 | 180 | When using the back tick syntax (e.g. 
`` `something` ``), 181 | LPegRex will register its contents as a **keyword** in case it begins with a letter (or `_`), 182 | or as a **token** in case it contains only punctuation characters (except `_`). 183 | 184 | Both keywords and tokens always match the `SKIP` rule immediately to 185 | skip spaces, thus the rule `SKIP` must always be defined when using the back tick syntax. 186 | 187 | Token matches are always unique in case of common characters, that is, 188 | in case both `.` and `..` tokens are defined, the rule `` `.` `` will match 189 | `.` but not `..`. 190 | 191 | In case a **token** is found, the rule `TOKEN` will be automatically generated, 192 | this rule will match any token plus `SKIP`. 193 | 194 | In case a **keyword** is found, 195 | the rule `NAME_SUFFIX` also needs to be defined, it's used 196 | to differentiate keywords from identifier names. 197 | 198 | In most cases the user will need to define something like: 199 | 200 | ``` 201 | NAME_SUFFIX <- [_%w]+ 202 | SKIP <- %s+ 203 | ``` 204 | 205 | You may want to edit the `SKIP` rule to consider comments if your grammar supports them. 206 | Tokens and keywords will not capture the `SKIP` rule when using the syntax ``{`keyword`}``. 207 | 208 | ## Capturing identifier names 209 | 210 | Often we need to create a rule that captures identifier names while ignoring grammar keywords; let's call this rule `NAME`. 
211 | To assist doing this the `KEYWORD` rule is automatically generated based on all defined keywords in 212 | the grammar, the user can then use it to define the `NAME` rule, in most cases something like: 213 | 214 | ``` 215 | NAME <-- !KEYWORD {NAME_PREFIX NAME_SUFFIX?} SKIP 216 | NAME_PREFIX <-- [_%a] 217 | NAME_SUFFIX <-- [_%w]+ 218 | SKIP <- %s+ 219 | ``` 220 | 221 | ## Handling syntax errors 222 | 223 | Any rule name, keyword, token or string pattern can be preceded by the token `@`, 224 | marking it as an expected match, in case the match is not fulfilled an error 225 | label will be thrown using the name `Expected_name`, where `name` is the 226 | token, keyword or rule name. 227 | 228 | Once an error label is found, the user can generate pretty syntax error 229 | messages using the function `lpegrex.calcline` to gather line information, 230 | for example: 231 | 232 | ```lua 233 | local patt = lpegrex.compile(PEG) 234 | local ast, errlabel, errpos = patt:match(source) 235 | if not ast then 236 | local lineno, colno, line = lpegrex.calcline(source, errpos) 237 | local colhelp = string.rep(' ', colno-1)..'^' 238 | error('syntax error: '..filename..':'..lineno..':'..colno..': '..errlabel.. 239 | '\n'..line..'\n'..colhelp) 240 | end 241 | ``` 242 | 243 | ## Usage Example 244 | 245 | Here is a small example parsing JSON into an AST in 12 lines of PEG rules: 246 | 247 | ```lua 248 | local lpegrex = require 'lpegrex' 249 | 250 | local patt = lpegrex.compile([[ 251 | Json <-- SKIP (Object / Array) (!.)^UnexpectedSyntax 252 | Object <== `{` (Member (`,` @Member)*)? @`}` 253 | Array <== `[` (Value (`,` @Value)*)? @`]` 254 | Member <== String `:` @Value 255 | Value <-- String / Number / Object / Array / Boolean / Null 256 | String <-- '"' {~ ('\' -> '' @ESCAPE / !'"' .)* ~} @'"' SKIP 257 | Number <-- {[+-]? (%d+ '.'? %d+? / '.' %d+) ([eE] [+-]? 
%d+)?} -> tonumber SKIP 258 | Boolean <-- `false` -> tofalse / `true` -> totrue 259 | Null <-- `null` -> tonil 260 | ESCAPE <-- [\/"] / ('b' $8 / 't' $9 / 'n' $10 / 'f' $12 / 'r' $13 / 'u' {%x^4} $16) -> tochar 261 | SKIP <-- %s* 262 | NAME_SUFFIX <-- [_%w]+ 263 | ]]) 264 | 265 | local source = '[{"string":"some\\ntext", "boolean":true, "number":-1.5e+2, "null":null}]' 266 | 267 | local ast, errlabel, errpos = patt:match(source) 268 | if not ast then 269 | local lineno, colno, line = lpegrex.calcline(source, errpos) 270 | local colhelp = string.rep(' ', colno-1)..'^' 271 | error('syntax error: '..lineno..':'..colno..': '..errlabel.. 272 | '\n'..line..'\n'..colhelp) 273 | end 274 | -- `ast` should be a table with the JSON 275 | print('JSON parsed with success!') 276 | ``` 277 | 278 | The above should parse into the following equivalent AST table: 279 | ```lua 280 | local ast = { tag = "Array", pos = 1, endpos = 73, 281 | { tag = "Object", pos = 2, endpos = 72, 282 | { tag = "Member", pos = 3, endpos = 24, 283 | "string","some\ntext" }, 284 | { tag = "Member", pos = 26, endpos = 40, 285 | "boolean", true }, 286 | { tag = "Member", pos = 42, endpos = 58, 287 | "number", -150.0 }, 288 | { tag = "Member", pos = 60, endpos = 71, 289 | "null", nil } 290 | } 291 | } 292 | ``` 293 | 294 | A JSON parser similar to this example can be found in 295 | [parsers/json.lua](https://github.com/edubart/lpegrex/blob/main/parsers/json.lua). 296 | 297 | ## Debugging rule entry and exit 298 | 299 | When prototyping complex grammars you may want to debug the rules that 300 | the parser is trying to match and the ones that were successfully matched. 301 | You can enable LPegRex debug mode for this 302 | by setting `lpegrex.debug = true` globally. 303 | When debug is enabled all compiled grammars will be compiled in debug mode. 
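As a minimal sketch of the above, the helper below turns tracing on only while one grammar is compiled and then restores the previous setting. The helper name `match_with_trace` and the small `List` grammar are illustrative assumptions, not part of LPegRex. The sketch also assumes the debug flag is read when the grammar is compiled (which is what the library source appears to do), so the compiled pattern keeps tracing after the flag is restored.

```lua
-- Hypothetical helper (not part of LPegRex): compile `grammar` with debug
-- tracing enabled, restore the previous setting, then match `subject`.
-- Assumption: the debug flag is read at compile time only.
local function match_with_trace(lpegrex, grammar, subject)
  local previous = lpegrex.debug
  lpegrex.debug = true
  local ok, patt = pcall(lpegrex.compile, grammar)
  lpegrex.debug = previous -- grammars compiled later are not traced
  if not ok then
    error(patt, 2) -- re-raise the compile error to the caller
  end
  return patt:match(subject)
end

-- Usage: runs only when LPegRex (and LPegLabel) can be loaded.
local found, lpegrex = pcall(require, 'lpegrex')
if found then
  -- Illustrative grammar: a whitespace-separated list of integers.
  print(match_with_trace(lpegrex, [[
    List <-- SKIP Item+
    Item <-- {%d+} SKIP
    SKIP <-- %s*
  ]], '1 2'))
end
```

Since the trace is written to `io.stderr`, it can be kept apart from the program's normal output with a shell redirect such as `2> trace.log`.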
304 | 305 | When debugging is enabled every attempt to match a rule will print 306 | `ENTER <rule> (<line>:<column>)` to `io.stderr`, 307 | and every rule successfully matched will print 308 | `LEAVE <rule> (<line>:<column>)` to `io.stderr`. 309 | Notice that rules failing to match will not print `LEAVE`. 310 | 311 | The following is an example of parsing the `[{"string":` JSON chunk 312 | using the JSON parser shown above with debugging enabled: 313 | 314 | ``` 315 | ENTER Json (1:1) 316 | ENTER SKIP (1:1) 317 | LEAVE SKIP (1:1) 318 | ENTER Object (1:1) 319 | ENTER { (1:1) 320 | ENTER Array (1:1) 321 | ENTER [ (1:1) 322 | ENTER SKIP (1:2) 323 | LEAVE SKIP (1:2) 324 | LEAVE [ (1:2) 325 | ENTER Value (1:2) 326 | ENTER String (1:2) 327 | ENTER Number (1:2) 328 | ENTER Object (1:2) 329 | ENTER { (1:2) 330 | ENTER SKIP (1:3) 331 | LEAVE SKIP (1:3) 332 | LEAVE { (1:3) 333 | ENTER Member (1:3) 334 | ENTER String (1:3) 335 | ENTER SKIP (1:11) 336 | LEAVE SKIP (1:11) 337 | LEAVE String (1:11) 338 | ENTER : (1:11) 339 | ENTER SKIP (1:12) 340 | LEAVE SKIP (1:12) 341 | LEAVE : (1:12) 342 | ``` 343 | 344 | Notice `String` ENTER at `1:3` and LEAVE at `1:11`, 345 | this means that we have matched the rule `String` in that range. 346 | Notice `Number` ENTER at `1:2` while no LEAVE is shown for `Number`, 347 | this means that we attempted to match `Number` 348 | but it failed since no LEAVE was shown afterwards. 349 | 350 | ## Installing 351 | 352 | To use LPegRex you need [LPegLabel](https://github.com/sqmedeiros/lpeglabel) 353 | to be properly installed. 354 | If you have it already installed you can just copy the 355 | [lpegrex.lua](https://github.com/edubart/lpegrex/blob/main/lpegrex.lua) file. 356 | 357 | You can also install it using the 358 | [LuaRocks](https://luarocks.org/) package manager, 359 | with the following command: 360 | 361 | ```shell 362 | luarocks install lpegrex 363 | ``` 364 | 365 | The library should work with Lua 5.x versions (and also LuaJIT). 
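A quick way to confirm the installation is to check that both modules can be loaded. The sketch below is illustrative only: `check_module` is a hypothetical helper, and it assumes LPegLabel is loadable under the module name `lpeglabel`, which is the name `lpegrex.lua` itself requires.

```lua
-- Hypothetical helper: report whether a module can be loaded,
-- without raising an error when it is missing.
local function check_module(name)
  local ok, result = pcall(require, name)
  if ok then
    return true, name .. ': found'
  end
  -- Keep only the first line of the loader's error message.
  local reason = tostring(result):match('^[^\n]*')
  return false, name .. ': missing (' .. reason .. ')'
end

-- LPegRex depends on LPegLabel, so check the dependency first.
for _, name in ipairs({'lpeglabel', 'lpegrex'}) do
  local _, message = check_module(name)
  print(message)
end
```

If `lpeglabel` is reported missing, `lpegrex` will fail to load as well, because it requires LPegLabel when it is first loaded.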
366 | 367 | ## Complete Lua Example 368 | 369 | A Lua 5.4 parser is defined in 370 | [parsers/lua.lua](https://github.com/edubart/lpegrex/blob/main/parsers/lua.lua), 371 | it serves as a good example of how to define a full language grammar 372 | in a single PEG that generates an AST suitable to be analyzed by a compiler, 373 | while also handling source syntax errors. 374 | 375 | A Lua AST printer using it is available in 376 | [examples/lua-ast.lua](https://github.com/edubart/lpegrex/blob/main/examples/lua-ast.lua). 377 | You can run it to parse any Lua file and print its AST. 378 | 379 | For example, by running `lua examples/lua-ast.lua inputs/fact.lua` you should 380 | get the following AST output: 381 | 382 | ``` 383 | Block 384 | | FuncDecl 385 | | | Id 386 | | | | "fact" 387 | | | - 388 | | | | Id 389 | | | | | "n" 390 | | | Block 391 | | | | If 392 | | | | | BinaryOp 393 | | | | | | Id 394 | | | | | | | "n" 395 | | | | | | "eq" 396 | | | | | | Number 397 | | | | | | | 0 398 | | | | | Block 399 | | | | | | Return 400 | | | | | | | - 401 | | | | | | | | Number 402 | | | | | | | | | 1 403 | | | | | Block 404 | | | | | | Return 405 | | | | | | | - 406 | | | | | | | | BinaryOp 407 | | | | | | | | | Id 408 | | | | | | | | | | "n" 409 | | | | | | | | | "mul" 410 | | | | | | | | | Call 411 | | | | | | | | | | - 412 | | | | | | | | | | | BinaryOp 413 | | | | | | | | | | | | Id 414 | | | | | | | | | | | | | "n" 415 | | | | | | | | | | | | "sub" 416 | | | | | | | | | | | | Number 417 | | | | | | | | | | | | | 1 418 | | | | | | | | | | Id 419 | | | | | | | | | | | "fact" 420 | | Call 421 | | | - 422 | | | | Call 423 | | | | | - 424 | | | | | | Number 425 | | | | | | | 10 426 | | | | | Id 427 | | | | | | "fact" 428 | | | Id 429 | | | | "print" 430 | ``` 431 | 432 | ## Complete C11 example 433 | 434 | A complete C11 parser has been implemented and is available in 435 | [parsers/c11.lua](https://github.com/edubart/lpegrex/blob/main/parsers/c11.lua), 436 | it's experimental but 
it was verified to parse hundreds of preprocessed C source files. 437 | 438 | A C11 AST printer using it is available in 439 | [examples/c11-ast.lua](https://github.com/edubart/lpegrex/blob/main/examples/c11-ast.lua). 440 | 441 | Note that the C file must be preprocessed, you can generate a preprocessed C file 442 | with GCC/Clang by running `gcc -E file.c > file_preprocessed.c`. 443 | 444 | For example, by running `lua examples/c11-ast.lua inputs/fact.c` you should 445 | get the following AST output: 446 | 447 | ``` 448 | translation-unit 449 | | declaration 450 | | | type-declaration 451 | | | | declaration-specifiers 452 | | | | | storage-class-specifier 453 | | | | | | "extern" 454 | | | | | type-specifier 455 | | | | | | "int" 456 | | | | init-declarator-list 457 | | | | | init-declarator 458 | | | | | | declarator 459 | | | | | | | declarator-parameters 460 | | | | | | | | identifier 461 | | | | | | | | | "printf" 462 | | | | | | | | parameter-type-list 463 | | | | | | | | | parameter-declaration 464 | | | | | | | | | | declaration-specifiers 465 | | | | | | | | | | | type-qualifier 466 | | | | | | | | | | | | "const" 467 | | | | | | | | | | | type-specifier 468 | | | | | | | | | | | | "char" 469 | | | | | | | | | | declarator 470 | | | | | | | | | | | pointer 471 | | | | | | | | | | | | identifier 472 | | | | | | | | | | | | | "format" 473 | | | | | | | | | parameter-varargs 474 | | function-definition 475 | | | declaration-specifiers 476 | | | | storage-class-specifier 477 | | | | | "static" 478 | | | | type-specifier 479 | | | | | "int" 480 | | | declarator 481 | | | | declarator-parameters 482 | | | | | identifier 483 | | | | | | "fact" 484 | | | | | parameter-type-list 485 | | | | | | parameter-declaration 486 | | | | | | | declaration-specifiers 487 | | | | | | | | type-specifier 488 | | | | | | | | | "int" 489 | | | | | | | declarator 490 | | | | | | | | identifier 491 | | | | | | | | | "n" 492 | | | declaration-list 493 | | | compound-statement 494 | | | 
| if-statement 495 | | | | | expression 496 | | | | | | binary-op 497 | | | | | | | identifier 498 | | | | | | | | "n" 499 | | | | | | | "==" 500 | | | | | | | integer-constant 501 | | | | | | | | "0" 502 | | | | | return-statement 503 | | | | | | expression 504 | | | | | | | integer-constant 505 | | | | | | | | "1" 506 | | | | | return-statement 507 | | | | | | expression 508 | | | | | | | binary-op 509 | | | | | | | | identifier 510 | | | | | | | | | "n" 511 | | | | | | | | "*" 512 | | | | | | | | argument-expression 513 | | | | | | | | | argument-expression-list 514 | | | | | | | | | | binary-op 515 | | | | | | | | | | | identifier 516 | | | | | | | | | | | | "n" 517 | | | | | | | | | | | "-" 518 | | | | | | | | | | | integer-constant 519 | | | | | | | | | | | | "1" 520 | | | | | | | | | identifier 521 | | | | | | | | | | "fact" 522 | | function-definition 523 | | | declaration-specifiers 524 | | | | type-specifier 525 | | | | | "int" 526 | | | declarator 527 | | | | declarator-parameters 528 | | | | | identifier 529 | | | | | | "main" 530 | | | declaration-list 531 | | | compound-statement 532 | | | | expression-statement 533 | | | | | expression 534 | | | | | | argument-expression 535 | | | | | | | argument-expression-list 536 | | | | | | | | string-literal 537 | | | | | | | | | "%d\\n" 538 | | | | | | | | argument-expression 539 | | | | | | | | | argument-expression-list 540 | | | | | | | | | | integer-constant 541 | | | | | | | | | | | "10" 542 | | | | | | | | | identifier 543 | | | | | | | | | | "fact" 544 | | | | | | | identifier 545 | | | | | | | | "printf" 546 | | | | return-statement 547 | | | | | expression 548 | | | | | | integer-constant 549 | | | | | | | "0" 550 | ``` 551 | 552 | ## Successful use case 553 | 554 | LPegRex is successfully used as the parsing engine in the Nelua programming 555 | language compiler, you can see the complete syntax defined in a single 556 | PEG grammar [in this 
file](https://github.com/edubart/nelua-lang/blob/master/nelua/syntaxdefs.lua). 557 | 558 | ## Try it online 559 | 560 | You can test and prototype grammars with LPegRex live in the browser using the cool [lua-wasm-playground](https://mingodad.github.io/lua-wasm-playground/) tool created by [@mingodad](https://github.com/mingodad/lua-wasm-playground). There are C11 and Lua parsers as examples there. 561 | 562 | ## Tests 563 | 564 | Most LPeg/LPegLabel tests were migrated into `tests/lpegrex-test.lua` 565 | and new tests for the additional extensions were added. 566 | 567 | To run the tests just run `lua tests/test.lua`. 568 | 569 | ## License 570 | 571 | MIT, see LICENSE file. 572 | -------------------------------------------------------------------------------- /examples/c11-ast.lua: -------------------------------------------------------------------------------- 1 | local parse_c11 = require 'parsers.c11' 2 | local lpegrex = require 'lpegrex' 3 | 4 | -- Read input file contents 5 | local filename = arg[1] 6 | local file = io.open(filename) 7 | if not file then 8 | print('failed to open file: '..filename) 9 | os.exit(false) 10 | end 11 | local source = file:read('*a') 12 | file:close() 13 | 14 | -- Parse C11 source 15 | local ast = parse_c11(source, filename) 16 | 17 | -- Print AST 18 | print(lpegrex.prettyast(ast)) 19 | -------------------------------------------------------------------------------- /examples/json-ast.lua: -------------------------------------------------------------------------------- 1 | local parse_json = require 'parsers.json' 2 | local lpegrex = require 'lpegrex' 3 | 4 | -- Read input file contents 5 | local filename = arg[1] 6 | local file = io.open(filename) 7 | if not file then 8 | print('failed to open file: '..filename) 9 | os.exit(false) 10 | end 11 | local source = file:read('*a') 12 | file:close() 13 | 14 | -- Parse JSON source 15 | local ast = parse_json(source, filename) 16 | 17 | -- Print AST 18 | print(lpegrex.prettyast(ast)) 19 | 
-------------------------------------------------------------------------------- /examples/lua-ast.lua: -------------------------------------------------------------------------------- 1 | local parse_lua = require 'parsers.lua' 2 | local lpegrex = require 'lpegrex' 3 | 4 | -- Read input file contents 5 | local filename = arg[1] 6 | local file = io.open(filename) 7 | if not file then 8 | print('failed to open file: '..filename) 9 | os.exit(false) 10 | end 11 | local source = file:read('*a') 12 | file:close() 13 | 14 | -- Parse Lua source 15 | local ast = parse_lua(source, filename) 16 | 17 | -- Print AST 18 | print(lpegrex.prettyast(ast)) 19 | -------------------------------------------------------------------------------- /inputs/fact.c: -------------------------------------------------------------------------------- 1 | /* Used as an input example for the C11 parser. */ 2 | extern int printf(const char* format, ...); 3 | static int fact(int n) { 4 | if(n == 0) 5 | return 1; 6 | else 7 | return n * fact(n-1); 8 | } 9 | int main() { 10 | printf("%d\n", fact(10)); 11 | return 0; 12 | } 13 | -------------------------------------------------------------------------------- /inputs/fact.lua: -------------------------------------------------------------------------------- 1 | -- Used as an input example for the Lua parser. 
2 | local function fact(n) 3 | if n == 0 then 4 | return 1 5 | else 6 | return n * fact(n-1) 7 | end 8 | end 9 | print(fact(10)) 10 | -------------------------------------------------------------------------------- /inputs/sample.json: -------------------------------------------------------------------------------- 1 | [{ 2 | "string": "some\ntext", 3 | "boolean": true, 4 | "number": -1.5e+2, 5 | "null": null 6 | }] 7 | -------------------------------------------------------------------------------- /lpegrex.lua: -------------------------------------------------------------------------------- 1 | --[[ 2 | LPegRex - LPeg Regular Expression eXtended 3 | v0.2.2 - 3/Jun/2021 4 | Eduardo Bart - edub4rt@gmail.com 5 | https://github.com/edubart/lpegrex 6 | 7 | Check the project page for documentation on how to use. 8 | 9 | See end of file for LICENSE. 10 | ]] 11 | 12 | -- LPegRex depends on LPegLabel. 13 | local lpeg = require 'lpeglabel' 14 | 15 | -- Increase LPEG max stack, because the default is too low to use with complex grammars. 16 | lpeg.setmaxstack(1024) 17 | 18 | -- The LPegRex module table. 19 | local lpegrex = {} 20 | 21 | -- Cache tables for `match`, `find` and `gsub`. 22 | local mcache, fcache, gcache 23 | 24 | -- Global LPegRex options. 25 | local defrexoptions = { 26 | tag = 'tag', 27 | pos = 'pos', 28 | endpos = 'endpos', 29 | SKIP = 'SKIP', 30 | NAME_SUFFIX = 'NAME_SUFFIX', 31 | } 32 | local rexoptions 33 | 34 | -- LPeGRex syntax errors. 
35 | local ErrorInfo = { 36 | NoPatt = "no pattern found", 37 | ExtraChars = "unexpected characters after the pattern", 38 | 39 | ExpPatt1 = "expected a pattern after '/'", 40 | ExpPatt2 = "expected a pattern after '&'", 41 | ExpPatt3 = "expected a pattern after '!'", 42 | ExpPatt4 = "expected a pattern after '('", 43 | ExpPatt5 = "expected a pattern after ':'", 44 | ExpPatt6 = "expected a pattern after '{~'", 45 | ExpPatt7 = "expected a pattern after '{|'", 46 | ExpPatt8 = "expected a pattern after '<-'", 47 | 48 | ExpPattOrClose = "expected a pattern or closing '}' after '{'", 49 | 50 | ExpNumName = "expected a number, '+', '-' or a name (no space) after '^'", 51 | ExpCap = "expected a string, number, '{}' or name after '->'", 52 | 53 | ExpName1 = "expected the name of a rule after '=>'", 54 | ExpName2 = "expected the name of a rule after '=' (no space)", 55 | ExpName3 = "expected the name of a rule after '<' (no space)", 56 | ExpName4 = "expected a name, number or string rule after '$' (no space)", 57 | ExpName5 = "expected a name or string rule after '@' (no space)", 58 | 59 | ExpLab1 = "expected a label after '{'", 60 | 61 | ExpTokOrKey = "expected a keyword or token string after '`'", 62 | ExpNameOrLab = "expected a name or label after '%' (no space)", 63 | 64 | ExpItem = "expected at least one item after '[' or '^'", 65 | 66 | MisClose1 = "missing closing ')'", 67 | MisClose2 = "missing closing ':}'", 68 | MisClose3 = "missing closing '~}'", 69 | MisClose4 = "missing closing '|}'", 70 | MisClose5 = "missing closing '}'", -- for the captures 71 | MisClose6 = "missing closing '>'", 72 | MisClose7 = "missing closing '}'", -- for the labels 73 | MisClose8 = "missing closing ']'", 74 | 75 | MisTerm1 = "missing terminating single quote", 76 | MisTerm2 = "missing terminating double quote", 77 | MisTerm3 = "missing terminating backtick quote", 78 | } 79 | 80 | -- Localize some functions used in compiled PEGs. 
81 | local char = string.char 82 | local utf8char = utf8 and utf8.char 83 | local select, tonumber = select, tonumber 84 | local insert = table.insert 85 | 86 | -- Pattern matching any character. 87 | local Any = lpeg.P(1) 88 | 89 | -- Predefined patterns. 90 | local Predef = { 91 | nl = lpeg.P"\n", -- new line 92 | ca = lpeg.P"\a", -- audible bell 93 | cb = lpeg.P"\b", -- backspace 94 | ct = lpeg.P"\t", -- horizontal tab 95 | cn = lpeg.P"\n", -- new line 96 | cv = lpeg.P"\v", -- vertical tab 97 | cf = lpeg.P"\f", -- form feed 98 | cr = lpeg.P"\r", -- carriage return 99 | sp = lpeg.S" \n\r\t\f\v", 100 | utf8 = lpeg.R("\0\x7F", "\xC2\xFD") * lpeg.R("\x80\xBF")^0, 101 | utf8seq = lpeg.R("\xC2\xFD") * lpeg.R("\x80\xBF")^0, 102 | ascii = lpeg.R("\0\x7F"), 103 | tonil = function() return nil end, 104 | totrue = function() return true end, 105 | tofalse = function() return false end, 106 | toemptytable = function() return {} end, 107 | tochar = function(s, base) return char(tonumber(s, base)) end, 108 | toutf8char = function(s, base) return utf8char(tonumber(s, base)) end, 109 | tonumber = tonumber, 110 | } 111 | 112 | -- Fold tables to the left (use only with `~>`). 113 | -- Example: ({1}, {2}, {3}) -> {{{1}, 2}, 3} 114 | function Predef.foldleft(lhs, rhs) 115 | insert(rhs, 1, lhs) 116 | return rhs 117 | end 118 | 119 | -- Fold tables to the right (use only with `->`). 120 | -- Example: ({1}, {2}, {3}) -> {1, {2, {3}}} 121 | function Predef.foldright(first, ...) 122 | if ... then 123 | local lhs = first 124 | for i=1,select('#', ...) do 125 | local rhs = select(i, ...) 126 | lhs[#lhs+1] = rhs 127 | lhs = rhs 128 | end 129 | end 130 | return first 131 | end 132 | 133 | -- Fold tables to the left in reverse order (use only with `->`). 134 | -- Example: ({1}, {2}, {3}) -> {{{3}, 2}, 1} 135 | function Predef.rfoldleft(first, ...) 136 | if ... then 137 | local rhs = first 138 | for i=1,select('#', ...) do 139 | local lhs = select(i, ...) 
140 | insert(rhs, 1, lhs) 141 | rhs = lhs 142 | end 143 | end 144 | return first 145 | end 146 | 147 | -- Fold tables to the right in reverse order (use only with `~>`). 148 | -- Example: ({1}, {2}, {3}) -> {3, {2, {1}}} 149 | function Predef.rfoldright(lhs, rhs) 150 | rhs[#rhs+1] = lhs 151 | return rhs 152 | end 153 | 154 | -- Updates the pre-defined character classes to the current locale. 155 | function lpegrex.updatelocale() 156 | lpeg.locale(Predef) 157 | -- fill default pattern classes 158 | Predef.a = Predef.alpha 159 | Predef.c = Predef.cntrl 160 | Predef.d = Predef.digit 161 | Predef.g = Predef.graph 162 | Predef.l = Predef.lower 163 | Predef.p = Predef.punct 164 | Predef.s = Predef.space 165 | Predef.u = Predef.upper 166 | Predef.w = Predef.alnum 167 | Predef.x = Predef.xdigit 168 | Predef.A = Any - Predef.a 169 | Predef.C = Any - Predef.c 170 | Predef.D = Any - Predef.d 171 | Predef.G = Any - Predef.g 172 | Predef.L = Any - Predef.l 173 | Predef.P = Any - Predef.p 174 | Predef.S = Any - Predef.s 175 | Predef.U = Any - Predef.u 176 | Predef.W = Any - Predef.w 177 | Predef.X = Any - Predef.x 178 | -- clear the cache because the locale changed 179 | mcache, fcache, gcache = {}, {}, {} 180 | -- don't hold references in cached patterns 181 | local weakmt = {__mode = "v"} 182 | setmetatable(mcache, weakmt) 183 | setmetatable(fcache, weakmt) 184 | setmetatable(gcache, weakmt) 185 | end 186 | 187 | -- Fill predefined classes using the default locale. 188 | lpegrex.updatelocale() 189 | 190 | -- Create LPegRex syntax pattern.
191 | local function mkrex() 192 | local l = lpeg 193 | local lmt = getmetatable(Any) 194 | 195 | local function expect(pattern, label) 196 | return pattern + l.T(label) 197 | end 198 | 199 | local function mult(p, n) 200 | local np = l.P(true) 201 | while n >= 1 do 202 | if n % 2 >= 1 then 203 | np = np * p 204 | end 205 | p = p * p 206 | n = n / 2 207 | end 208 | return np 209 | end 210 | 211 | local function equalcap(s, i, c) 212 | local e = #c + i 213 | if s:sub(i, e - 1) == c then 214 | return e 215 | end 216 | end 217 | 218 | local function getuserdef(id, defs) 219 | local v = defs and defs[id] or Predef[id] 220 | if not v then 221 | error("name '" .. id .. "' undefined") 222 | end 223 | return v 224 | end 225 | 226 | local function getopt(id) 227 | if rexoptions and rexoptions[id] ~= nil then 228 | return rexoptions[id] 229 | end 230 | return defrexoptions[id] 231 | end 232 | 233 | -- current grammar being generated 234 | local G, Gkeywords, Gtokens 235 | 236 | local function begindef() 237 | G, Gkeywords, Gtokens = {}, {}, {} 238 | return G 239 | end 240 | 241 | local function enddef(t) 242 | -- generate TOKEN rule 243 | if Gtokens and #Gtokens > 0 then 244 | local TOKEN = Gtokens[Gtokens[1]] 245 | for i=2,#Gtokens do 246 | TOKEN = TOKEN + Gtokens[Gtokens[i]] 247 | end 248 | G.TOKEN = TOKEN 249 | end 250 | if lpegrex.debug then 251 | for k, patt in pairs(G) do 252 | if k ~= 1 then 253 | local enter = lpeg.Cmt(lpeg.P(true), function(s, p) 254 | local lineno, colno = lpegrex.calcline(s, p) 255 | io.stderr:write(string.format('ENTER %s (%d:%d)\n', k, lineno, colno)) 256 | return true 257 | end) 258 | local leave = lpeg.Cmt(lpeg.P(true), function(s, p) 259 | local lineno, colno = lpegrex.calcline(s, p) 260 | io.stderr:write(string.format('LEAVE %s (%d:%d)\n', k, lineno, colno)) 261 | return true 262 | end) 263 | G[k] = enter * patt * leave 264 | end 265 | end 266 | end 267 | -- cleanup grammar context 268 | G, Gkeywords, Gtokens = nil, nil, nil 269 | return 
l.P(t) 270 | end 271 | 272 | local function adddef(t, k, exp) 273 | if t[k] then 274 | error("'"..k.."' already defined as a rule") 275 | else 276 | t[k] = exp 277 | end 278 | return t 279 | end 280 | 281 | local function firstdef(t, n, r) 282 | t[1] = n 283 | return adddef(t, n, r) 284 | end 285 | 286 | local function NT(n, b) 287 | if not b then 288 | error("rule '"..n.."' used outside a grammar") 289 | end 290 | return l.V(n) 291 | end 292 | 293 | local S = (Predef.space + "--" * (Any - Predef.nl)^0)^0 294 | local NamePrefix = l.R("AZ", "az", "__") 295 | local WordSuffix = l.R("AZ", "az", "__", "09") 296 | local NameSuffix = (WordSuffix + (l.P"-" * #WordSuffix))^0 297 | local Name = l.C(NamePrefix * NameSuffix) 298 | local TokenDigit = Predef.punct - "_" 299 | local NodeArrow = S * "<==" 300 | local TableArrow = S * "<-|" 301 | local RuleArrow = S * (l.P"<--" + "<-") 302 | local Arrow = NodeArrow + TableArrow + RuleArrow 303 | local Num = l.C(l.R"09"^1) * S / tonumber 304 | local SignedNum = l.C(l.P"-"^-1 * l.R"09"^1) * S / tonumber 305 | local String = "'" * l.C((Any - "'")^0) * expect("'", "MisTerm1") 306 | + '"' * l.C((Any - '"')^0) * expect('"', "MisTerm2") 307 | local Token = "`" * l.C(TokenDigit * (TokenDigit - '`')^0) * expect("`", "MisTerm3") 308 | local Keyword = "`" * l.C(NamePrefix * (Any - "`")^0) * expect('`', "MisTerm3") 309 | local Range = l.Cs(Any * (l.P"-"/"") * (Any - "]")) / l.R 310 | local Defs = l.Carg(1) 311 | local NamedDef = Name * Defs -- a defined name only have meaning in a given environment 312 | local Defined = "%" * NamedDef / getuserdef 313 | local Item = (Defined + Range + l.C(Any)) / l.P 314 | local Class = 315 | "[" 316 | * (l.C(l.P"^"^-1)) -- optional complement symbol 317 | * l.Cf(expect(Item, "ExpItem") * (Item - "]")^0, lmt.__add) 318 | / function(c, p) return c == "^" and Any - p or p end 319 | * expect("]", "MisClose8") 320 | 321 | local function defwithfunc(f) 322 | return l.Cg(NamedDef / getuserdef * l.Cc(f)) 323 | end 
324 | 325 | local function updatetokens(s) 326 | for _,toks in ipairs(Gtokens) do 327 | if toks ~= s then 328 | if toks:find(s, 1, true) == 1 then 329 | G[s] = -G[toks] * G[s] 330 | elseif s:find(toks, 1, true) == 1 then 331 | G[toks] = -G[s] * G[toks] 332 | end 333 | end 334 | end 335 | end 336 | 337 | local function maketoken(s, cap) 338 | local p = Gtokens[s] 339 | if not p then 340 | p = l.V(s) 341 | Gtokens[s] = p 342 | Gtokens[#Gtokens+1] = s 343 | G[s] = l.P(s) * l.V(getopt("SKIP")) 344 | updatetokens(s) 345 | end 346 | if cap then 347 | p = p * l.Cc(s) 348 | end 349 | return p 350 | end 351 | 352 | local function updatekeywords(kp) 353 | local p = G.KEYWORD 354 | if not p then 355 | p = kp 356 | else 357 | p = p + kp 358 | end 359 | G.KEYWORD = p 360 | end 361 | 362 | local function makekeyword(s, cap) 363 | local p = Gkeywords[s] 364 | if not p then 365 | p = l.P(s) * -l.V(getopt("NAME_SUFFIX")) * l.V(getopt("SKIP")) 366 | Gkeywords[s] = p 367 | updatekeywords(p) 368 | end 369 | if cap then 370 | p = p * l.Cc(s) 371 | end 372 | return p 373 | end 374 | 375 | local function makenode(n, tag, p) 376 | local tagfield, posfield, endposfield = getopt('tag'), getopt('pos'), getopt('endpos') 377 | local istagfunc = type(tagfield) == 'function' 378 | if tagfield and not istagfunc then 379 | p = l.Cg(l.Cc(tag), tagfield) * p 380 | end 381 | if posfield then 382 | p = l.Cg(l.Cp(), posfield) * p 383 | end 384 | if endposfield then 385 | p = p * l.Cg(l.Cp(), endposfield) 386 | end 387 | local rp = l.Ct(p) 388 | if istagfunc then 389 | rp = l.Cc(tag) * rp / tagfield 390 | end 391 | return n, rp 392 | end 393 | 394 | local exp = l.P{ "Exp", 395 | Exp = S * ( l.V"Grammar" 396 | + l.Cf(l.V"Seq" * (S * "/" * expect(S * l.V"Seq", "ExpPatt1"))^0, lmt.__add) ); 397 | Seq = l.Cf(l.Cc(l.P"") * l.V"Prefix" * (S * l.V"Prefix")^0, lmt.__mul); 398 | Prefix = "&" * expect(S * l.V"Prefix", "ExpPatt2") / lmt.__len 399 | + "!" 
* expect(S * l.V"Prefix", "ExpPatt3") / lmt.__unm 400 | + l.V"Suffix"; 401 | Suffix = l.Cf(l.V"Primary" * 402 | ( S * ( l.P"+" * l.Cc(1, lmt.__pow) 403 | + l.P"*" * l.Cc(0, lmt.__pow) 404 | + l.P"?" * l.Cc(-1, lmt.__pow) 405 | + l.P"~?" * l.Cc(l.Cc(false), lmt.__add) 406 | + "^" * expect( l.Cg(Num * l.Cc(mult)) 407 | + l.Cg(l.C(l.S"+-" * l.R"09"^1) * l.Cc(lmt.__pow) 408 | + Name * l.Cc"lab" 409 | ), 410 | "ExpNumName") 411 | + "->" * expect(S * ( l.Cg((String + Num) * l.Cc(lmt.__div)) 412 | + l.P"{}" * l.Cc(nil, l.Ct) 413 | + defwithfunc(lmt.__div) 414 | ), 415 | "ExpCap") 416 | + "=>" * expect(S * defwithfunc(l.Cmt), 417 | "ExpName1") 418 | + "~>" * S * defwithfunc(l.Cf) 419 | ) --* S 420 | )^0, function(a,b,f) if f == "lab" then return a + l.T(b) end return f(a,b) end ); 421 | Primary = "(" * expect(l.V"Exp", "ExpPatt4") * expect(S * ")", "MisClose1") 422 | + String / l.P 423 | + #l.P'`' * expect( 424 | Token / maketoken 425 | + Keyword / makekeyword 426 | , "ExpTokOrKey") 427 | + Class 428 | + Defined 429 | + "%" * expect(l.P"{", "ExpNameOrLab") 430 | * expect(S * l.V"Label", "ExpLab1") 431 | * expect(S * "}", "MisClose7") / l.T 432 | + "{:" * (Name * ":" + l.Cc(nil)) * expect(l.V"Exp", "ExpPatt5") 433 | * expect(S * ":}", "MisClose2") 434 | / function(n, p) return l.Cg(p, n) end 435 | + "=" * expect(Name, "ExpName2") 436 | / function(n) return l.Cmt(l.Cb(n), equalcap) end 437 | + l.P"{}" / l.Cp 438 | + l.P"$" * expect( 439 | l.P"nil" / function() return l.Cc(nil) end 440 | + l.P"false" / function() return l.Cc(false) end 441 | + l.P"true" / function() return l.Cc(true) end 442 | + l.P"{}" / function() return l.Cc({}) end 443 | + SignedNum / function(s) return l.Cc(tonumber(s)) end 444 | + String / function(s) return l.Cc(s) end 445 | + (NamedDef / getuserdef) / l.Cc, 446 | "ExpName4") 447 | + l.P"@" * expect( 448 | String / function(s) return l.P(s) + l.T('Expected_'..s) end 449 | + Token / function(s) 450 | return maketoken(s) + l.T('Expected_'..s) 451 | end 
452 | + Keyword / function(s) 453 | return makekeyword(s) + l.T('Expected_'..s) 454 | end 455 | + Name * l.Cb("G") / function(n, b) 456 | return NT(n, b) + l.T('Expected_'..n) 457 | end, 458 | "ExpName5") 459 | + "{~" * expect(l.V"Exp", "ExpPatt6") * expect(S * "~}", "MisClose3") / l.Cs 460 | + "{|" * expect(l.V"Exp", "ExpPatt7") * expect(S * "|}", "MisClose4") / l.Ct 461 | + "{" * #l.P'`' * expect( 462 | Token * l.Cc(true) / maketoken 463 | + Keyword * l.Cc(true) / makekeyword 464 | , "ExpTokOrKey") * expect(S * "}", "MisClose5") 465 | + "{" * expect(l.V"Exp", "ExpPattOrClose") * expect(S * "}", "MisClose5") / l.C 466 | + l.P"." * l.Cc(Any) 467 | + (Name * -(Arrow + (S * ":" * S * Name * Arrow)) + "<" * expect(Name, "ExpName3") 468 | * expect(">", "MisClose6")) * l.Cb("G") / NT; 469 | Label = Num + Name; 470 | RuleDefinition = Name * RuleArrow * expect(l.V"Exp", "ExpPatt8"); 471 | TableDefinition = Name * TableArrow * expect(l.V"Exp", "ExpPatt8") / 472 | function(n, p) return n, l.Ct(p) end; 473 | NodeDefinition = Name * NodeArrow * expect(l.V"Exp", "ExpPatt8") / 474 | function(n, p) return makenode(n, n, p) end; 475 | TaggedNodeDefinition = Name * S * l.P":" * S * Name * NodeArrow * expect(l.V"Exp", "ExpPatt8") / makenode; 476 | Definition = l.V"TaggedNodeDefinition" + l.V"NodeDefinition" + l.V"TableDefinition" + l.V"RuleDefinition"; 477 | Grammar = l.Cg(l.Cc(true), "G") 478 | * l.Cf(l.P"" / begindef 479 | * (l.V"Definition") / firstdef 480 | * (S * (l.Cg(l.V"Definition")))^0, adddef) / enddef; 481 | } 482 | 483 | return S * l.Cg(l.Cc(false), "G") * expect(exp, "NoPatt") / l.P 484 | * S * expect(-Any, "ExtraChars") 485 | end 486 | 487 | 488 | local rexpatt = mkrex() 489 | 490 | --[[ 491 | Compiles the given `pattern` string and returns an equivalent LPeg pattern. 492 | 493 | The given string may define either an expression or a grammar. 494 | The optional `defs` table provides extra Lua values to be used by the pattern. 
495 | The optional `defs.__options` table can provide the following options for node captures: 496 | * `tag` name of the node tag field, if `false` it's omitted (default "tag"). 497 | * `pos` name of the node initial position field, if `false` it's omitted (default "pos"). 498 | * `endpos` name of the node final position field, if `false` it's omitted (default "endpos"). 499 | ]] 500 | function lpegrex.compile(pattern, defs) 501 | if lpeg.type(pattern) == 'pattern' then -- already compiled 502 | return pattern 503 | end 504 | rexoptions = defs and defs.__options 505 | local ok, cp, errlabel, errpos = pcall(function() 506 | return rexpatt:match(pattern, 1, defs) 507 | end) 508 | rexoptions = nil 509 | if not ok and cp then 510 | if type(cp) == "string" then 511 | cp = cp:gsub("^[^:]+:[^:]+: ", "") 512 | end 513 | error(cp, 3) 514 | end 515 | if not cp then 516 | local lineno, colno, line, linepos = lpegrex.calcline(pattern, errpos) 517 | local err = {"syntax error(s) in pattern\n"} 518 | table.insert(err, "L"..lineno..":C"..colno..": "..ErrorInfo[errlabel]) 519 | table.insert(err, line) 520 | table.insert(err, (" "):rep(colno-1)..'^') 521 | error(table.concat(err, "\n"), 3) 522 | end 523 | return cp 524 | end 525 | 526 | --[[ 527 | Matches the given `pattern` against the `subject` string. 528 | 529 | If the match succeeds, returns the index in the `subject` of the first character after the match, 530 | or the captured values (if the pattern captured any value). 531 | 532 | An optional numeric argument `init` makes the match start at that position in the subject string. 533 | ]] 534 | function lpegrex.match(subject, pattern, init) 535 | local cp = mcache[pattern] 536 | if not cp then 537 | cp = lpegrex.compile(pattern) 538 | mcache[pattern] = cp 539 | end 540 | return cp:match(subject, init or 1) 541 | end 542 | 543 | --[[ 544 | Searches the given `pattern` in the given `subject`.
545 | 546 | If it finds a match, returns the index where this occurrence starts and the index where it ends. 547 | Otherwise, returns nil. 548 | 549 | An optional numeric argument `init` makes the search start at that position in the `subject` string. 550 | ]] 551 | function lpegrex.find(subject, pattern, init) 552 | local cp = fcache[pattern] 553 | if not cp then 554 | cp = lpegrex.compile(pattern) 555 | cp = cp / 0 556 | cp = lpeg.P{lpeg.Cp() * cp * lpeg.Cp() + 1 * lpeg.V(1)} 557 | fcache[pattern] = cp 558 | end 559 | local i, e = cp:match(subject, init or 1) 560 | if i then 561 | return i, e - 1 562 | else 563 | return i 564 | end 565 | end 566 | 567 | --[[ 568 | Does a global substitution, 569 | replacing all occurrences of `pattern` in the given `subject` by `replacement`. 570 | ]] 571 | function lpegrex.gsub(subject, pattern, replacement) 572 | local cache = gcache[pattern] or {} 573 | gcache[pattern] = cache 574 | local cp = cache[replacement] 575 | if not cp then 576 | cp = lpegrex.compile(pattern) 577 | cp = lpeg.Cs((cp / replacement + 1)^0) 578 | cache[replacement] = cp 579 | end 580 | return cp:match(subject) 581 | end 582 | 583 | local calclinepatt = lpeg.Ct(((Any - Predef.nl)^0 * lpeg.Cp() * Predef.nl)^0) 584 | 585 | --[[ 586 | Extracts line information from `position` in `subject`. 587 | Returns line number, column number, line content, line start position and line end position.
588 | ]] 589 | function lpegrex.calcline(subject, position) 590 | if position < 0 then error 'invalid position' end 591 | local sublen = #subject 592 | if position > sublen then position = sublen end 593 | local caps = calclinepatt:match(subject:sub(1,position)) 594 | local ncaps = #caps 595 | local lineno = ncaps + 1 596 | local lastpos = caps[ncaps] or 0 597 | local linestart = lastpos + 1 598 | local colno = position - lastpos 599 | local lineend = subject:find("\n", position+1, true) 600 | lineend = lineend and lineend-1 or #subject 601 | local line = subject:sub(linestart, lineend) 602 | return lineno, colno, line, linestart, lineend 603 | end 604 | 605 | -- Auxiliary function for `prettyast` 606 | local function ast2string(node, indent, ss) 607 | if node.tag then 608 | ss[#ss+1] = indent..node.tag 609 | else 610 | ss[#ss+1] = indent..'-' 611 | end 612 | indent = indent..'| ' 613 | for i=1,#node do 614 | local child = node[i] 615 | local ty = type(child) 616 | if ty == 'table' then 617 | ast2string(child, indent, ss) 618 | elseif ty == 'string' then 619 | local escaped = child 620 | :gsub([[\]], [[\\]]) 621 | :gsub([["]], [[\"]]) 622 | :gsub('\n', '\\n') 623 | :gsub('\t', '\\t') 624 | :gsub('\r', '\\r') 625 | :gsub('[^ %w%p]', function(s) 626 | return string.format('\\x%02x', string.byte(s)) 627 | end) 628 | ss[#ss+1] = indent..'"'..escaped..'"' 629 | else 630 | ss[#ss+1] = indent..tostring(child) 631 | end 632 | end 633 | end 634 | 635 | -- Convert an AST into a human readable string. 636 | function lpegrex.prettyast(node) 637 | local ss = {} 638 | ast2string(node, '', ss) 639 | return table.concat(ss, '\n') 640 | end 641 | 642 | return lpegrex 643 | 644 | --[[ 645 | The MIT License (MIT) 646 | 647 | Copyright (c) 2021 Eduardo Bart 648 | Copyright (c) 2014-2020 Sérgio Medeiros 649 | Copyright (c) 2007-2019 Lua.org, PUC-Rio. 
650 | 651 | Permission is hereby granted, free of charge, to any person obtaining a copy 652 | of this software and associated documentation files (the "Software"), to deal 653 | in the Software without restriction, including without limitation the rights 654 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 655 | copies of the Software, and to permit persons to whom the Software is 656 | furnished to do so, subject to the following conditions: 657 | 658 | The above copyright notice and this permission notice shall be included in all 659 | copies or substantial portions of the Software. 660 | 661 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 662 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 663 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 664 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 665 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 666 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 667 | SOFTWARE. 668 | ]] 669 | -------------------------------------------------------------------------------- /parsers/c11.lua: -------------------------------------------------------------------------------- 1 | --[[ 2 | This grammar is based on the C11 specification. 3 | As seen in https://port70.net/~nsz/c/c11/n1570.html#A.1 4 | Support for parsing some new C2x syntax was also added. 5 | Support for some extensions to use with GCC/Clang was also added. 6 | ]] 7 | local Grammar = [==[ 8 | chunk <- SHEBANG? SKIP translation-unit (!.)^UnexpectedSyntax 9 | 10 | SHEBANG <-- '#!' (!LINEBREAK .)* LINEBREAK? 11 | 12 | COMMENT <-- LONG_COMMENT / SHRT_COMMENT 13 | LONG_COMMENT <-- '/*' (!'*/' .)* '*/' 14 | SHRT_COMMENT <-- '//' (!LINEBREAK .)* LINEBREAK?
15 | DIRECTIVE <-- '#' ('\' LINEBREAK / !LINEBREAK .)* 16 | 17 | SKIP <-- (%s+ / COMMENT / DIRECTIVE / `__extension__`)* 18 | LINEBREAK <-- %nl %cr / %cr %nl / %nl / %cr 19 | 20 | NAME_SUFFIX <-- identifier-suffix 21 | 22 | -------------------------------------------------------------------------------- 23 | -- Identifiers 24 | 25 | identifier <== identifier-word 26 | 27 | identifier-word <-- 28 | !KEYWORD identifier-anyword 29 | 30 | identifier-anyword <-- 31 | {identifier-nondigit identifier-suffix?} SKIP 32 | 33 | free-identifier:identifier <== 34 | identifier-word 35 | 36 | identifier-suffix <- (identifier-nondigit / digit)+ 37 | identifier-nondigit <- [a-zA-Z_] / universal-character-name 38 | 39 | digit <- [0-9] 40 | 41 | -------------------------------------------------------------------------------- 42 | -- Universal character names 43 | 44 | universal-character-name <-- 45 | '\u' hex-quad / 46 | '\U' hex-quad^2 47 | hex-quad <-- hexadecimal-digit^4 48 | 49 | -------------------------------------------------------------------------------- 50 | -- Constants 51 | 52 | constant <-- ( 53 | floating-constant / 54 | integer-constant / 55 | enumeration-constant / 56 | character-constant 57 | ) SKIP 58 | 59 | integer-constant <== 60 | {octal-constant integer-suffix?} / 61 | {hexadecimal-constant integer-suffix?} / 62 | {decimal-constant integer-suffix?} 63 | 64 | decimal-constant <-- digit+ 65 | octal-constant <-- '0' octal-digit+ 66 | hexadecimal-constant <-- hexadecimal-prefix hexadecimal-digit+ 67 | hexadecimal-prefix <-- '0' [xX] 68 | octal-digit <-- [0-7] 69 | hexadecimal-digit <-- [0-9a-fA-F] 70 | 71 | integer-suffix <-- 72 | unsigned-suffix (long-suffix long-suffix?)? / 73 | (long-suffix long-suffix?) unsigned-suffix? 
74 | 75 | unsigned-suffix <-- [uU] 76 | long-suffix <-- [lL] 77 | 78 | floating-constant <== 79 | {decimal-floating-constant} / 80 | {hexadecimal-floating-constant} 81 | 82 | decimal-floating-constant <-- 83 | ( 84 | fractional-constant exponent-part? / 85 | digit-sequence exponent-part 86 | ) floating-suffix? 87 | 88 | hexadecimal-floating-constant <-- 89 | hexadecimal-prefix 90 | (hexadecimal-fractional-constant / hexadecimal-digit-sequence) 91 | binary-exponent-part floating-suffix? 92 | 93 | fractional-constant <-- 94 | digit-sequence? '.' digit-sequence / 95 | digit-sequence '.' 96 | 97 | exponent-part <--[eE] sign? digit-sequence 98 | sign <-- [+-] 99 | digit-sequence <-- digit+ 100 | 101 | hexadecimal-fractional-constant <-- 102 | hexadecimal-digit-sequence? '.' hexadecimal-digit-sequence / 103 | hexadecimal-digit-sequence '.' 104 | 105 | binary-exponent-part <-- [pP] sign? digit-sequence 106 | hexadecimal-digit-sequence <-- hexadecimal-digit+ 107 | floating-suffix <-- [fFlLqQ] 108 | 109 | enumeration-constant <-- 110 | identifier 111 | 112 | character-constant <== 113 | [LUu]? "'" {~c-char-sequence~} "'" 114 | 115 | c-char-sequence <-- c-char+ 116 | c-char <-- 117 | [^'\%cn%cr] / 118 | escape-sequence 119 | 120 | escape-sequence <-- 121 | simple-escape-sequence / 122 | octal-escape-sequence / 123 | hexadecimal-escape-sequence / 124 | universal-character-name 125 | 126 | simple-escape-sequence <-- 127 | "\"->'' simple-escape-sequence-suffix 128 | 129 | simple-escape-sequence-suffix <- 130 | [\'"?] / 131 | ("a" $7 / "b" $8 / "f" $12 / "n" $10 / "r" $13 / "t" $9 / "v" $11) ->tochar / 132 | (LINEBREAK $10)->tochar 133 | 134 | octal-escape-sequence <-- ('\' {octal-digit octal-digit^-2} $8)->tochar 135 | hexadecimal-escape-sequence <-- ('\x' {hexadecimal-digit+} $16)->tochar 136 | 137 | -------------------------------------------------------------------------------- 138 | -- String literals 139 | 140 | string-literal <== 141 | encoding-prefix? 
string-suffix+ 142 | string-suffix <-- '"' {~s-char-sequence?~} '"' SKIP 143 | encoding-prefix <-- 'u8' / [uUL] 144 | s-char-sequence <-- s-char+ 145 | s-char <- [^"\%cn%cr] / escape-sequence 146 | 147 | -------------------------------------------------------------------------------- 148 | -- Expressions 149 | 150 | primary-expression <-- 151 | string-literal / 152 | type-name / 153 | identifier / 154 | constant / 155 | statement-expression / 156 | `(` expression `)` / 157 | generic-selection 158 | 159 | statement-expression <== 160 | '({'SKIP (label-statement / declaration / statement)* '})'SKIP 161 | 162 | generic-selection <== 163 | `_Generic` @`(` @assignment-expression @`,` @generic-assoc-list @`)` 164 | 165 | generic-assoc-list <== 166 | generic-association (`,` @generic-association)* 167 | 168 | generic-association <== 169 | type-name `:` @assignment-expression / 170 | {`default`} `:` @assignment-expression 171 | 172 | postfix-expression <-- 173 | (postfix-expression-prefix postfix-expression-suffix*) ~> rfoldright 174 | 175 | postfix-expression-prefix <-- 176 | type-initializer / 177 | primary-expression 178 | 179 | type-initializer <== 180 | `(` type-name `)` `{` initializer-list? `,`? `}` 181 | 182 | postfix-expression-suffix <-- 183 | array-subscript / 184 | argument-expression / 185 | struct-or-union-member / 186 | pointer-member / 187 | post-increment / 188 | post-decrement 189 | 190 | array-subscript <== `[` expression `]` 191 | argument-expression <== `(` argument-expression-list `)` 192 | struct-or-union-member <== `.` identifier-word 193 | pointer-member <== `->` identifier-word 194 | post-increment <== `++` 195 | post-decrement <== `--` 196 | 197 | argument-expression-list <== 198 | (assignment-expression (`,` assignment-expression)*)? 
199 | 200 | unary-expression <-- 201 | unary-op / 202 | postfix-expression 203 | unary-op <== 204 | ({`++`} / {`--`}) @unary-expression / 205 | ({`sizeof`}) unary-expression / 206 | ({`&`} / {`+`} / {`-`} / {`~`} / {`!`}) @cast-expression / 207 | {`*`} cast-expression / 208 | ({`sizeof`} / {`_Alignof`}) `(` type-name `)` 209 | 210 | cast-expression <-- 211 | op-cast / 212 | unary-expression 213 | op-cast:binary-op <== 214 | `(` type-name `)` $'cast' cast-expression 215 | 216 | multiplicative-expression <-- 217 | (cast-expression op-multiplicative*) ~> foldleft 218 | op-multiplicative:binary-op <== 219 | ({`/`} / {`%`}) @cast-expression / 220 | {`*`} cast-expression 221 | 222 | additive-expression <-- 223 | (multiplicative-expression op-additive*) ~> foldleft 224 | op-additive:binary-op <== 225 | ({`+`} / {`-`}) @multiplicative-expression 226 | 227 | shift-expression <-- 228 | (additive-expression op-shift*) ~> foldleft 229 | op-shift:binary-op <== 230 | ({`<<`} / {`>>`}) @additive-expression 231 | 232 | relational-expression <-- 233 | (shift-expression op-relational*) ~> foldleft 234 | op-relational:binary-op <== 235 | ({`<=`} / {`>=`} / {`<`} / {`>`}) @shift-expression 236 | 237 | equality-expression <-- 238 | (relational-expression op-equality*) ~> foldleft 239 | op-equality:binary-op <== 240 | ({`==`} / {`!=`}) @relational-expression 241 | 242 | AND-expression <-- 243 | (equality-expression op-AND*) ~> foldleft 244 | op-AND:binary-op <== 245 | {`&`} @equality-expression 246 | 247 | exclusive-OR-expression <-- 248 | (AND-expression op-OR*) ~> foldleft 249 | op-OR:binary-op <== 250 | {`^`} @AND-expression 251 | 252 | inclusive-OR-expression <-- 253 | (exclusive-OR-expression op-inclusive-OR*) ~> foldleft 254 | op-inclusive-OR:binary-op <== 255 | {`|`} @exclusive-OR-expression 256 | 257 | logical-AND-expression <-- 258 | (inclusive-OR-expression op-logical-AND*) ~> foldleft 259 | op-logical-AND:binary-op <== 260 | {`&&`} @inclusive-OR-expression 261 | 262 | 
logical-OR-expression <-- 263 | (logical-AND-expression op-logical-OR*) ~> foldleft 264 | op-logical-OR:binary-op <== 265 | {`||`} @logical-AND-expression 266 | 267 | conditional-expression <-- 268 | (logical-OR-expression op-conditional?) ~> foldleft 269 | op-conditional:ternary-op <== 270 | {`?`} @expression @`:` @conditional-expression 271 | 272 | assignment-expression <-- 273 | conditional-expression !assignment-operator / 274 | (unary-expression op-assignment+) ~> foldleft 275 | op-assignment:binary-op <== 276 | assignment-operator @assignment-expression 277 | assignment-operator <-- 278 | {`=`} / 279 | {`*=`} / 280 | {`/=`} / 281 | {`%=`} / 282 | {`+=`} / 283 | {`-=`} / 284 | {`<<=`} / 285 | {`>>=`} / 286 | {`&=`} / 287 | {`^=`} / 288 | {`|=`} 289 | 290 | expression <== 291 | assignment-expression (`,` @assignment-expression)* 292 | 293 | constant-expression <-- 294 | conditional-expression 295 | 296 | -------------------------------------------------------------------------------- 297 | -- Declarations 298 | 299 | declaration <== 300 | ( 301 | typedef-declaration / 302 | type-declaration / 303 | static_assert-declaration 304 | ) 305 | @`;` 306 | 307 | extension-specifiers <== 308 | extension-specifier+ 309 | 310 | extension-specifier <== 311 | attribute / asm / tg-promote 312 | 313 | attribute <== 314 | (`__attribute__` / `__attribute`) `(` @`(` attribute-list @`)` @`)` / 315 | `[` `[` attribute-list @`]` @`]` 316 | 317 | attribute-list <-- 318 | attribute-item (`,` attribute-item)* 319 | 320 | tg-promote <== 321 | `__tg_promote` @`(` (expression / parameter-varargs) @`)` 322 | 323 | attribute-item <== 324 | identifier-anyword (`(` expression `)`)? 325 | 326 | asm <== 327 | (`__asm` / `__asm__`) 328 | (`__volatile__` / `volatile`)~? 
329 | `(` asm-argument (`,` asm-argument)* @`)` 330 | 331 | asm-argument <-- ( 332 | string-literal / 333 | {`:`} / 334 | {`,`} / 335 | `[` expression @`]` / 336 | `(` expression @`)` / 337 | expression 338 | )+ 339 | 340 | typedef-declaration <== 341 | `typedef` @declaration-specifiers (typedef-declarator (`,` @typedef-declarator)*)? 342 | 343 | type-declaration <== 344 | declaration-specifiers init-declarator-list? 345 | 346 | declaration-specifiers <== 347 | ((type-specifier-width / declaration-specifiers-aux)* type-specifier / 348 | declaration-specifiers-aux* type-specifier-width 349 | ) (type-specifier-width / declaration-specifiers-aux)* 350 | 351 | declaration-specifiers-aux <-- 352 | storage-class-specifier / 353 | type-qualifier / 354 | function-specifier / 355 | alignment-specifier 356 | 357 | init-declarator-list <== 358 | init-declarator (`,` init-declarator)* 359 | 360 | init-declarator <== 361 | declarator (`=` initializer)? 362 | 363 | storage-class-specifier <== 364 | {`extern`} / 365 | {`static`} / 366 | {`auto`} / 367 | {`register`} / 368 | (`_Thread_local` / `__thread`)->'_Thread_local' 369 | 370 | type-specifier <== 371 | {`void`} / 372 | {`char`} / 373 | {`int`} / 374 | {`float`} / 375 | {`double`} / 376 | {`_Bool`} / 377 | atomic-type-specifier / 378 | struct-or-union-specifier / 379 | enum-specifier / 380 | typedef-name / 381 | typeof 382 | 383 | type-specifier-width : type-specifier <== 384 | {`short`} / 385 | (`signed` / `__signed__`)->'signed' / 386 | {`unsigned`} / 387 | (`long` `long`)->'long long' / 388 | {`long`} / 389 | {`_Complex`} / 390 | {`_Imaginary`} 391 | 392 | typeof <== 393 | (`typeof` / `__typeof` / `__typeof__`) @argument-expression 394 | 395 | struct-or-union-specifier <== 396 | struct-or-union extension-specifiers~? 397 | (identifier-word struct-declaration-list? 
/ $false struct-declaration-list) 398 | 399 | struct-or-union <-- 400 | {`struct`} / {`union`} 401 | 402 | struct-declaration-list <== 403 | `{` (struct-declaration / static_assert-declaration)* @`}` 404 | 405 | struct-declaration <== 406 | specifier-qualifier-list struct-declarator-list? @`;` 407 | 408 | specifier-qualifier-list <== 409 | ((type-specifier-width / specifier-qualifier-aux)* type-specifier / 410 | specifier-qualifier-aux* type-specifier-width 411 | ) (type-specifier-width / specifier-qualifier-aux)* 412 | 413 | specifier-qualifier-aux <-- 414 | type-qualifier / 415 | alignment-specifier 416 | 417 | struct-declarator-list <== 418 | struct-declarator (`,` struct-declarator)* 419 | 420 | struct-declarator <== 421 | declarator (`:` @constant-expression)? / 422 | `:` $false @constant-expression 423 | 424 | enum-specifier <== 425 | `enum` extension-specifiers~? (identifier-word~? `{` @enumerator-list `,`? @`}` / @identifier-word) 426 | 427 | enumerator-list <== 428 | enumerator (`,` enumerator)* 429 | 430 | enumerator <== 431 | enumeration-constant extension-specifiers~? (`=` @constant-expression)? 432 | 433 | atomic-type-specifier <== 434 | `_Atomic` `(` type-name `)` 435 | 436 | type-qualifier <== 437 | {`const`} / 438 | (`restrict` / `__restrict` / `__restrict__`)->'restrict' / 439 | {`volatile`} / 440 | {`_Atomic`} !`(` / 441 | extension-specifier 442 | 443 | function-specifier <== 444 | (`inline` / `__inline` / `__inline__`)->'inline' / 445 | {`_Noreturn`} 446 | 447 | alignment-specifier <== 448 | `_Alignas` `(` (type-name / constant-expression) `)` 449 | 450 | declarator <== 451 | (pointer* direct-declarator) -> foldright 452 | extension-specifiers? 453 | 454 | typedef-declarator:declarator <== 455 | (pointer* typedef-direct-declarator) -> foldright 456 | extension-specifiers? 
457 | 458 | direct-declarator <-- 459 | ((identifier / `(` declarator `)`) direct-declarator-suffix*) ~> foldleft 460 | 461 | typedef-direct-declarator <-- 462 | ((typedef-identifier / `(` typedef-declarator `)`) direct-declarator-suffix*) ~> foldleft 463 | 464 | direct-declarator-suffix <-- 465 | declarator-subscript / 466 | declarator-parameters 467 | 468 | declarator-subscript <== 469 | `[` subscript-qualifier-list~? (assignment-expression / pointer)~? @`]` 470 | 471 | subscript-qualifier-list <== 472 | (type-qualifier / &`static` storage-class-specifier)+ 473 | 474 | declarator-parameters <== 475 | `(` parameter-type-list `)` / 476 | `(` identifier-list? `)` 477 | 478 | pointer <== 479 | extension-specifiers~? `*` type-qualifier-list~? 480 | 481 | type-qualifier-list <== 482 | type-qualifier+ 483 | 484 | parameter-type-list <== 485 | parameter-list (`,` parameter-varargs)? 486 | 487 | parameter-varargs <== 488 | `...` 489 | 490 | parameter-list <-- 491 | parameter-declaration (`,` parameter-declaration)* 492 | 493 | parameter-declaration <== 494 | declaration-specifiers (declarator / abstract-declarator?) 495 | 496 | identifier-list <== 497 | identifier-list-item (`,` @identifier-list-item)* 498 | 499 | identifier-list-item <-- 500 | identifier / `(` type-name @`)` 501 | 502 | type-name <== 503 | specifier-qualifier-list abstract-declarator? 504 | 505 | abstract-declarator:declarator <== 506 | ( 507 | (pointer+ direct-abstract-declarator?) -> foldright / 508 | direct-abstract-declarator 509 | ) extension-specifiers? 510 | 511 | direct-abstract-declarator <-- 512 | ( 513 | `(` abstract-declarator `)` direct-declarator-suffix* / 514 | direct-declarator-suffix+ 515 | ) ~> foldleft 516 | 517 | typedef-name <== 518 | &(identifier => is_typedef) identifier 519 | 520 | typedef-identifier <== 521 | &(identifier => set_typedef) identifier 522 | 523 | initializer <== 524 | assignment-expression / 525 | `{` initializer-list? `,`? 
@`}` 526 | 527 | initializer-list <== 528 | initializer-item (`,` initializer-item)* 529 | 530 | initializer-item <-- 531 | designation / 532 | initializer 533 | 534 | designation <== 535 | designator-list `=` @initializer 536 | 537 | designator-list <== 538 | designator+ 539 | 540 | designator <-- 541 | subscript-designator / 542 | member-designator 543 | 544 | subscript-designator <== 545 | `[` @constant-expression @`]` 546 | 547 | member-designator <== 548 | `.` @identifier-word 549 | 550 | static_assert-declaration <== 551 | `_Static_assert` @`(` @constant-expression (`,` @string-literal)? @`)` 552 | 553 | -------------------------------------------------------------------------------- 554 | -- Statements 555 | 556 | statement <-- 557 | label-statement / 558 | case-statement / 559 | default-statement / 560 | compound-statement / 561 | expression-statement / 562 | if-statement / 563 | switch-statement / 564 | while-statement / 565 | do-while-statement / 566 | for-statement / 567 | goto-statement / 568 | continue-statement / 569 | break-statement / 570 | return-statement / 571 | asm-statement / 572 | attribute / 573 | `;` 574 | 575 | label-statement <== 576 | identifier `:` 577 | 578 | case-statement <== 579 | `case` @constant-expression @`:` statement? 580 | 581 | default-statement <== 582 | `default` @`:` statement? 583 | 584 | compound-statement <== 585 | `{` (label-statement / declaration / statement)* @`}` 586 | 587 | expression-statement <== 588 | expression @`;` 589 | 590 | if-statement <== 591 | `if` @`(` @expression @`)` @statement (`else` @statement)? 592 | 593 | switch-statement <== 594 | `switch` @`(` @expression @`)` @statement 595 | 596 | while-statement <== 597 | `while` @`(` @expression @`)` @statement 598 | 599 | do-while-statement <== 600 | `do` @statement @`while` @`(` @expression @`)` @`;` 601 | 602 | for-statement <== 603 | `for` @`(` (declaration / expression~? @`;`) expression~? @`;` expression~? 
@`)` @statement 604 | 605 | goto-statement <== 606 | `goto` constant-expression @`;` 607 | 608 | continue-statement <== 609 | `continue` @`;` 610 | 611 | break-statement <== 612 | `break` @`;` 613 | 614 | return-statement <== 615 | `return` expression? @`;` 616 | 617 | asm-statement <== 618 | asm @`;` 619 | 620 | -------------------------------------------------------------------------------- 621 | -- External definitions 622 | 623 | translation-unit <== 624 | external-declaration* 625 | 626 | external-declaration <-- 627 | function-definition / 628 | declaration / 629 | `;` 630 | 631 | function-definition <== 632 | declaration-specifiers declarator declaration-list compound-statement 633 | 634 | declaration-list <== 635 | declaration* 636 | ]==] 637 | 638 | -- List of syntax errors 639 | local SyntaxErrorLabels = { 640 | ["UnexpectedSyntax"] = "unexpected syntax", 641 | } 642 | 643 | -- Extra builtin types (in GCC/Clang). 644 | local builtin_typedefs = { 645 | __builtin_va_list = true, 646 | __auto_type = true, 647 | __int128 = true, __int128_t = true, 648 | _Float32 = true, _Float32x = true, 649 | _Float64 = true, _Float64x = true, 650 | __float128 = true, _Float128 = true, 651 | } 652 | 653 | -- Parsing typedefs identifiers in C11 requires context information. 654 | local typedefs 655 | 656 | -- Clear typedefs. 657 | local function init_typedefs() 658 | typedefs = {} 659 | for k in pairs(builtin_typedefs) do 660 | typedefs[k] = true 661 | end 662 | end 663 | 664 | local Defs = {} 665 | 666 | -- Checks whether an identifier node is a typedef. 667 | function Defs.is_typedef(_, _, node) 668 | return typedefs[node[1]] == true 669 | end 670 | 671 | -- Set an identifier as a typedef. 672 | function Defs.set_typedef(_, _, node) 673 | typedefs[node[1]] = true 674 | return true 675 | end 676 | 677 | -- Compile grammar. 678 | local lpegrex = require 'lpegrex' 679 | local patt = lpegrex.compile(Grammar, Defs) 680 | 681 | --[[ 682 | Parse C11 source code into an AST. 
683 | The source code must be already preprocessed (preprocessor directives will be ignored). 684 | ]] 685 | local function parse(source, name) 686 | init_typedefs() 687 | local ast, errlabel, errpos = patt:match(source) 688 | typedefs = nil 689 | if not ast then 690 | name = name or '' 691 | local lineno, colno, line = lpegrex.calcline(source, errpos) 692 | local colhelp = string.rep(' ', colno-1)..'^' 693 | local errmsg = SyntaxErrorLabels[errlabel] or errlabel 694 | error('syntax error: '..name..':'..lineno..':'..colno..': '..errmsg.. 695 | '\n'..line..'\n'..colhelp) 696 | end 697 | return ast 698 | end 699 | 700 | return parse 701 | -------------------------------------------------------------------------------- /parsers/csv.lua: -------------------------------------------------------------------------------- 1 | --[[ 2 | This grammar is based on the CSV specification. 3 | As seen in https://en.wikipedia.org/wiki/Comma-separated_values 4 | ]] 5 | local Grammar = [==[ 6 | Csv <-- rows (!.)^UnexpectedSyntax 7 | rows <-| Row (%nl Row)* 8 | Row <-| Column (',' Column)+ 9 | Column <-- Number / QuotedString / String 10 | QuotedString <-- '"' {~ ('""' -> '"' / !'"' .)* ~} '"' COLUMN_END 11 | Number <-- {[+-]? (%d+ '.'? %d+? / '.' %d+) ([eE] [+-]? %d+)?} -> tonumber COLUMN_END 12 | String <-- {[^%nl,]*} 13 | COLUMN_END <-- ![^%nl,] / !. 14 | ]==] 15 | 16 | -- List of syntax errors. 17 | local SyntaxErrorLabels = { 18 | ["UnexpectedSyntax"] = "unexpected syntax", 19 | } 20 | 21 | -- Compile grammar. 22 | local lpegrex = require 'lpegrex' 23 | local patt = lpegrex.compile(Grammar) 24 | 25 | -- Parse CSV source into an AST. 
26 | local function parse(source, name) 27 | local ast, errlabel, errpos = patt:match(source) 28 | if not ast then 29 | name = name or '' 30 | local lineno, colno, line = lpegrex.calcline(source, errpos) 31 | local colhelp = string.rep(' ', colno-1)..'^' 32 | local errmsg = SyntaxErrorLabels[errlabel] or errlabel 33 | error('syntax error: '..name..':'..lineno..':'..colno..': '..errmsg.. 34 | '\n'..line..'\n'..colhelp) 35 | end 36 | return ast 37 | end 38 | 39 | return parse 40 | -------------------------------------------------------------------------------- /parsers/json.lua: -------------------------------------------------------------------------------- 1 | --[[ 2 | This grammar is based on the JSON specification. 3 | As seen in https://www.json.org/json-en.html 4 | ]] 5 | local Grammar = [==[ 6 | Json <-- SKIP (Object / Array) (!.)^UnexpectedSyntax 7 | Object <== `{` (Member (`,` @Member)*)? @`}` 8 | Array <== `[` (Value (`,` @Value)*)? @`]` 9 | Member <== String `:` @Value 10 | Value <-- String / Number / Object / Array / Boolean / Null 11 | String <-- '"' {~ ('\' -> '' @ESCAPE / !'"' .)* ~} @'"' SKIP 12 | Number <-- {[+-]? (%d+ '.'? %d+? / '.' %d+) ([eE] [+-]? %d+)?} -> tonumber SKIP 13 | Boolean <-- `false` -> tofalse / `true` -> totrue 14 | Null <-- `null` -> tonil 15 | ESCAPE <-- [\/"] / ('b' $8 / 't' $9 / 'n' $10 / 'f' $12 / 'r' $13 / 'u' {%x^4} $16) -> tochar 16 | SKIP <-- %s* 17 | NAME_SUFFIX <-- [_%w]+ 18 | ]==] 19 | 20 | -- List of syntax errors. 21 | local SyntaxErrorLabels = { 22 | ["UnexpectedSyntax"] = "unexpected syntax", 23 | ["Expected_Member"] = "expected an object member", 24 | ["Expected_Value"] = "expected a value", 25 | ["Expected_ESCAPE"] = "expected a valid escape sequence", 26 | ["Expected_}"] = "unclosed curly bracket `}`", 27 | ["Expected_]"] = "unclosed square bracket `]`", 28 | ['Expected_"'] = 'unclosed string quotes `"`', 29 | } 30 | 31 | -- Compile grammar. 
32 | local lpegrex = require 'lpegrex' 33 | local patt = lpegrex.compile(Grammar) 34 | 35 | -- Parse JSON source into an AST. 36 | local function parse(source, name) 37 | local ast, errlabel, errpos = patt:match(source) 38 | if not ast then 39 | name = name or '' 40 | local lineno, colno, line = lpegrex.calcline(source, errpos) 41 | local colhelp = string.rep(' ', colno-1)..'^' 42 | local errmsg = SyntaxErrorLabels[errlabel] or errlabel 43 | error('syntax error: '..name..':'..lineno..':'..colno..': '..errmsg.. 44 | '\n'..line..'\n'..colhelp) 45 | end 46 | return ast 47 | end 48 | 49 | return parse 50 | -------------------------------------------------------------------------------- /parsers/lua.lua: -------------------------------------------------------------------------------- 1 | --[[ 2 | This grammar is based on Lua 5.4 3 | As seen in https://www.lua.org/manual/5.4/manual.html#9 4 | ]] 5 | local Grammar = [==[ 6 | chunk <-- SHEBANG? SKIP Block (!.)^UnexpectedSyntax 7 | 8 | Block <== ( Label / Return / Break / Goto / Do / While / Repeat / If / ForNum / ForIn 9 | / FuncDef / FuncDecl / VarDecl / Assign / call / `;`)* 10 | Label <== `::` @NAME @`::` 11 | Return <== `return` exprlist? 12 | Break <== `break` 13 | Goto <== `goto` @NAME 14 | Do <== `do` Block @`end` 15 | While <== `while` @expr @`do` Block @`end` 16 | Repeat <== `repeat` Block @`until` @expr 17 | If <== `if` @expr @`then` Block (`elseif` @expr @`then` Block)* (`else` Block)? @`end` 18 | ForNum <== `for` Id `=` @expr @`,` @expr (`,` @expr)? @`do` Block @`end` 19 | ForIn <== `for` @idlist `in` @exprlist @`do` Block @`end` 20 | FuncDef <== `function` @funcname funcbody 21 | FuncDecl <== `local` `function` @Id funcbody 22 | VarDecl <== `local` @iddecllist (`=` @exprlist)? 
23 | Assign <== varlist `=` @exprlist 24 | 25 | Number <== NUMBER->tonumber SKIP 26 | String <== STRING SKIP 27 | Boolean <== `false`->tofalse / `true`->totrue 28 | Nil <== `nil` 29 | Varargs <== `...` 30 | Id <== NAME 31 | IdDecl <== NAME (`<` @NAME @`>`)? 32 | Function <== `function` funcbody 33 | Table <== `{` (field (fieldsep field)* fieldsep?)? @`}` 34 | Paren <== `(` @expr @`)` 35 | Pair <== `[` @expr @`]` @`=` @expr / NAME `=` @expr 36 | 37 | Call <== callargs 38 | CallMethod <== `:` @NAME @callargs 39 | DotIndex <== `.` @NAME 40 | ColonIndex <== `:` @NAME 41 | KeyIndex <== `[` @expr @`]` 42 | 43 | indexsuffix <-- DotIndex / KeyIndex 44 | callsuffix <-- Call / CallMethod 45 | 46 | var <-- (exprprimary (callsuffix+ indexsuffix / indexsuffix)+)~>rfoldright / Id 47 | call <-- (exprprimary (indexsuffix+ callsuffix / callsuffix)+)~>rfoldright 48 | exprsuffixed <-- (exprprimary (indexsuffix / callsuffix)*)~>rfoldright 49 | funcname <-- (Id DotIndex* ColonIndex?)~>rfoldright 50 | 51 | funcbody <-- @`(` funcargs @`)` Block @`end` 52 | field <-- Pair / expr 53 | fieldsep <-- `,` / `;` 54 | 55 | callargs <-| `(` (expr (`,` @expr)*)? @`)` / Table / String 56 | idlist <-| Id (`,` @Id)* 57 | iddecllist <-| IdDecl (`,` @IdDecl)* 58 | funcargs <-| (Id (`,` Id)* (`,` Varargs)? / Varargs)? 
59 | exprlist <-| expr (`,` @expr)* 60 | varlist <-| var (`,` @var)* 61 | 62 | opor :BinaryOp <== `or`->'or' @exprand 63 | opand :BinaryOp <== `and`->'and' @exprcmp 64 | opcmp :BinaryOp <== (`==`->'eq' / `~=`->'ne' / `<=`->'le' / `>=`->'ge' / `<`->'lt' / `>`->'gt') @exprbor 65 | opbor :BinaryOp <== `|`->'bor' @exprbxor 66 | opbxor :BinaryOp <== `~`->'bxor' @exprband 67 | opband :BinaryOp <== `&`->'band' @exprbshift 68 | opbshift :BinaryOp <== (`<<`->'shl' / `>>`->'shr') @exprconcat 69 | opconcat :BinaryOp <== `..`->'concat' @exprconcat 70 | oparit :BinaryOp <== (`+`->'add' / `-`->'sub') @exprfact 71 | opfact :BinaryOp <== (`*`->'mul' / `//`->'idiv' / `/`->'div' / `%`->'mod') @exprunary 72 | oppow :BinaryOp <== `^`->'pow' @exprunary 73 | opunary :UnaryOp <== (`not`->'not' / `#`->'len' / `-`->'unm' / `~`->'bnot') @exprunary 74 | 75 | expr <-- expror 76 | expror <-- (exprand opor*)~>foldleft 77 | exprand <-- (exprcmp opand*)~>foldleft 78 | exprcmp <-- (exprbor opcmp*)~>foldleft 79 | exprbor <-- (exprbxor opbor*)~>foldleft 80 | exprbxor <-- (exprband opbxor*)~>foldleft 81 | exprband <-- (exprbshift opband*)~>foldleft 82 | exprbshift <-- (exprconcat opbshift*)~>foldleft 83 | exprconcat <-- (exprarit opconcat*)~>foldleft 84 | exprarit <-- (exprfact oparit*)~>foldleft 85 | exprfact <-- (exprunary opfact*)~>foldleft 86 | exprunary <-- opunary / exprpow 87 | exprpow <-- (exprsimple oppow*)~>foldleft 88 | exprsimple <-- Nil / Boolean / Number / String / Varargs / Function / Table / exprsuffixed 89 | exprprimary <-- Id / Paren 90 | 91 | STRING <-- STRING_SHRT / STRING_LONG 92 | STRING_LONG <-- {:LONG_OPEN {LONG_CONTENT} @LONG_CLOSE:} 93 | STRING_SHRT <-- {:QUOTE_OPEN {~QUOTE_CONTENT~} @QUOTE_CLOSE:} 94 | QUOTE_OPEN <-- {:qe: ['"] :} 95 | QUOTE_CONTENT <-- (ESCAPE_SEQ / !(QUOTE_CLOSE / LINEBREAK) .)* 96 | QUOTE_CLOSE <-- =qe 97 | ESCAPE_SEQ <-- '\'->'' @ESCAPE 98 | ESCAPE <-- [\'"] / 99 | ('n' $10 / 't' $9 / 'r' $13 / 'a' $7 / 'b' $8 / 'v' $11 / 'f' $12)->tochar / 100 | ('x' 
{HEX_DIGIT^2} $16)->tochar / 101 | ('u' '{' {HEX_DIGIT^+1} '}' $16)->toutf8char / 102 | ('z' SPACE*)->'' / 103 | (DEC_DIGIT DEC_DIGIT^-1 !DEC_DIGIT / [012] DEC_DIGIT^2)->tochar / 104 | (LINEBREAK $10)->tochar 105 | 106 | NUMBER <-- {HEX_NUMBER / DEC_NUMBER} 107 | HEX_NUMBER <-- '0' [xX] @HEX_PREFIX ([pP] @EXP_DIGITS)? 108 | DEC_NUMBER <-- DEC_PREFIX ([eE] @EXP_DIGITS)? 109 | HEX_PREFIX <-- HEX_DIGIT+ ('.' HEX_DIGIT*)? / '.' HEX_DIGIT+ 110 | DEC_PREFIX <-- DEC_DIGIT+ ('.' DEC_DIGIT*)? / '.' DEC_DIGIT+ 111 | EXP_DIGITS <-- [+-]? DEC_DIGIT+ 112 | 113 | COMMENT <-- '--' (COMMENT_LONG / COMMENT_SHRT) 114 | COMMENT_LONG <-- (LONG_OPEN LONG_CONTENT @LONG_CLOSE)->0 115 | COMMENT_SHRT <-- (!LINEBREAK .)* 116 | 117 | LONG_CONTENT <-- (!LONG_CLOSE .)* 118 | LONG_OPEN <-- '[' {:eq: '='*:} '[' LINEBREAK? 119 | LONG_CLOSE <-- ']' =eq ']' 120 | 121 | NAME <-- !KEYWORD {NAME_PREFIX NAME_SUFFIX?} SKIP 122 | NAME_PREFIX <-- [_a-zA-Z] 123 | NAME_SUFFIX <-- [_a-zA-Z0-9]+ 124 | 125 | SHEBANG <-- '#!' (!LINEBREAK .)* LINEBREAK? 
126 | SKIP <-- (SPACE+ / COMMENT)* 127 | LINEBREAK <-- %cn %cr / %cr %cn / %cn / %cr 128 | SPACE <-- %sp 129 | HEX_DIGIT <-- [0-9a-fA-F] 130 | DEC_DIGIT <-- [0-9] 131 | EXTRA_TOKENS <-- `[[` `[=` `--` -- unused rule, here just to force defining these tokens 132 | ]==] 133 | 134 | -- List of syntax errors 135 | local SyntaxErrorLabels = { 136 | ["Expected_::"] = "unclosed label, did you forget `::`?", 137 | ["Expected_)"] = "unclosed parenthesis, did you forget a `)`?", 138 | ["Expected_>"] = "unclosed angle bracket, did you forget a `>`?", 139 | ["Expected_]"] = "unclosed square bracket, did you forget a `]`?", 140 | ["Expected_}"] = "unclosed curly brace, did you forget a `}`?", 141 | ["Expected_LONG_CLOSE"] = "unclosed long string or comment, did your forget a ']]'?", 142 | ["Expected_QUOTE_CLOSE"]= "unclosed short string or comment, did your forget a quote?", 143 | ["Expected_("] = "expected parenthesis token `(`", 144 | ["Expected_,"] = "expected comma token `,`", 145 | ["Expected_="] = "expected equals token `=`", 146 | ["Expected_callargs"] = "expected arguments", 147 | ["Expected_expr"] = "expected an expression", 148 | ["Expected_exprand"] = "expected an expression after operator", 149 | ["Expected_exprcmp"] = "expected an expression after operator", 150 | ["Expected_exprbor"] = "expected an expression after operator", 151 | ["Expected_exprbxor"] = "expected an expression after operator", 152 | ["Expected_exprband"] = "expected an expression after operator", 153 | ["Expected_exprbshift"] = "expected an expression after operator", 154 | ["Expected_exprconcat"] = "expected an expression after operator", 155 | ["Expected_exprfact"] = "expected an expression after operator", 156 | ["Expected_exprunary"] = "expected an expression after operator", 157 | ["Expected_exprlist"] = "expected expressions", 158 | ["Expected_funcname"] = "expected a function name", 159 | ["Expected_do"] = "expected `do` keyword to begin a statement block", 160 | ["Expected_end"] = 
"expected `end` keyword to close a statement block", 161 | ["Expected_then"] = "expected `then` keyword to begin a statement block", 162 | ["Expected_until"] = "expected `until` keyword to close repeat statement", 163 | ["Expected_ESCAPE"] = "malformed escape sequence", 164 | ["Expected_EXP_DIGITS"] = "malformed exponential number", 165 | ["Expected_HEX_PREFIX"] = "malformed hexadecimal number", 166 | ["Expected_Id"] = "expected an identifier name", 167 | ["Expected_NAME"] = "expected an identifier name", 168 | ["Expected_IdDecl"] = "expected an identifier name declaration", 169 | ["Expected_iddecllist"] = "expected identifiers names declaration", 170 | ["Expected_idlist"] = "expected identifiers names", 171 | ["Expected_var"] = "expected a variable", 172 | ["UnexpectedSyntax"] = "unexpected syntax", 173 | } 174 | 175 | -- Compile grammar 176 | local lpegrex = require 'lpegrex' 177 | local patt = lpegrex.compile(Grammar) 178 | 179 | -- Parse Lua source into an AST. 180 | local function parse(source, name) 181 | local ast, errlabel, errpos = patt:match(source) 182 | if not ast then 183 | name = name or '' 184 | local lineno, colno, line = lpegrex.calcline(source, errpos) 185 | local colhelp = string.rep(' ', colno-1)..'^' 186 | local errmsg = SyntaxErrorLabels[errlabel] or errlabel 187 | error('syntax error: '..name..':'..lineno..':'..colno..': '..errmsg.. 
188 | '\n'..line..'\n'..colhelp) 189 | end 190 | return ast 191 | end 192 | 193 | return parse 194 | -------------------------------------------------------------------------------- /rockspecs/lpegrex-0.2.2-1.rockspec: -------------------------------------------------------------------------------- 1 | package = "lpegrex" 2 | version = "0.2.2-1" 3 | source = { 4 | url = "git://github.com/edubart/lpegrex.git", 5 | tag = "v0.2.2" 6 | } 7 | description = { 8 | summary = "LPeg Regular Expression eXtended", 9 | homepage = "https://github.com/edubart/lpegrex", 10 | license = "MIT" 11 | } 12 | dependencies = { 13 | "lua >= 5.1", 14 | 'lpeglabel >= 1.6.0', 15 | } 16 | build = { 17 | type = "builtin", 18 | modules = { 19 | ['lpegrex'] = 'lpegrex.lua' 20 | } 21 | } 22 | -------------------------------------------------------------------------------- /tests/c11-test.lua: -------------------------------------------------------------------------------- 1 | local parse_c11 = require 'parsers.c11' 2 | local lpegrex = require 'lpegrex' 3 | local lester = require 'tests.lester' 4 | 5 | local describe, it = lester.describe, lester.it 6 | 7 | local function eqast(source, expected) 8 | local aststr = lpegrex.prettyast(parse_c11(source)) 9 | expected = expected:gsub('^%s+', ''):gsub('%s+$', '') 10 | if not aststr:find(expected, 1, true) then 11 | error('expected to match second in first value\nfirst value:\n'..aststr..'\nsecond value:\n'..expected) 12 | end 13 | end 14 | 15 | describe('c11 parser', function() 16 | 17 | it("basic", function() 18 | eqast([[]], [[-]]) 19 | eqast([[/* comment */]], [[-]]) 20 | eqast([[// comment]], [[-]]) 21 | end) 22 | 23 | it("escape sequence", function() 24 | eqast([[const char* s = "\'\"\?\a\b\f\n\r\t\v\\\000\xff";]], 25 | [["'\"?\x07\x08\x0c\n\r\t\x0b\\\x00\xff"]]) 26 | end) 27 | 28 | it("external declaration", function() 29 | eqast([[int a;]], [[ 30 | translation-unit 31 | | declaration 32 | | | type-declaration 33 | | | | declaration-specifiers 34 | 
| | | | type-specifier 35 | | | | | | "int" 36 | | | | init-declarator-list 37 | | | | | init-declarator 38 | | | | | | declarator 39 | | | | | | | identifier 40 | | | | | | | | "a" 41 | ]]) 42 | eqast([[void main(){}]], [[ 43 | translation-unit 44 | | function-definition 45 | | | declaration-specifiers 46 | | | | type-specifier 47 | | | | | "void" 48 | | | declarator 49 | | | | declarator-parameters 50 | | | | | identifier 51 | | | | | | "main" 52 | | | declaration-list 53 | | | compound-statement 54 | ]]) 55 | eqast([[_Static_assert(x, "x");]], [[ 56 | translation-unit 57 | | declaration 58 | | | static_assert-declaration 59 | | | | identifier 60 | | | | | "x" 61 | | | | string-literal 62 | | | | | "x" 63 | ]]) 64 | end) 65 | 66 | it("expression statement", function() 67 | eqast([[void main() {a;;}]], [[ 68 | | | | expression-statement 69 | | | | | expression 70 | | | | | | identifier 71 | | | | | | | "a" 72 | ]]) 73 | end) 74 | 75 | it("selection statement", function() 76 | eqast([[void main() {if(a) {} else if(b) {} else {}}]], [[ 77 | | | | if-statement 78 | | | | | expression 79 | | | | | | identifier 80 | | | | | | | "a" 81 | | | | | compound-statement 82 | | | | | if-statement 83 | | | | | | expression 84 | | | | | | | identifier 85 | | | | | | | | "b" 86 | | | | | | compound-statement 87 | | | | | | compound-statement 88 | ]]) 89 | eqast([[void main() {switch(a) {case A: default: break;}}]], [[ 90 | | | | switch-statement 91 | | | | | expression 92 | | | | | | identifier 93 | | | | | | | "a" 94 | | | | | compound-statement 95 | | | | | | case-statement 96 | | | | | | | identifier 97 | | | | | | | | "A" 98 | | | | | | | default-statement 99 | | | | | | | | break-statement 100 | ]]) 101 | end) 102 | 103 | it("iteration statement", function() 104 | eqast([[void main() {while(a) {};}]], [[ 105 | | | | while-statement 106 | | | | | expression 107 | | | | | | identifier 108 | | | | | | | "a" 109 | | | | | compound-statement 110 | ]]) 111 | eqast([[void main() 
{do{} while(a);}]], [[ 112 | | | | do-while-statement 113 | | | | | compound-statement 114 | | | | | expression 115 | | | | | | identifier 116 | | | | | | | "a" 117 | ]]) 118 | eqast([[void main() {for(;;) {}}]], [[ 119 | | | | for-statement 120 | | | | | false 121 | | | | | false 122 | | | | | false 123 | | | | | compound-statement 124 | ]]) 125 | eqast([[void main() {for(i=10;i;i--) {}}]], [[ 126 | | | | for-statement 127 | | | | | expression 128 | | | | | | binary-op 129 | | | | | | | identifier 130 | | | | | | | | "i" 131 | | | | | | | "=" 132 | | | | | | | integer-constant 133 | | | | | | | | "10" 134 | | | | | expression 135 | | | | | | identifier 136 | | | | | | | "i" 137 | | | | | expression 138 | | | | | | post-decrement 139 | | | | | | | identifier 140 | | | | | | | | "i" 141 | | | | | compound-statement 142 | ]]) 143 | end) 144 | 145 | it("jump statement", function() 146 | eqast([[void main() {continue;}]], "continue-statement") 147 | eqast([[void main() {break;}]], "break-statement") 148 | eqast([[void main() {return;}]], "return-statement") 149 | eqast([[void main() {return a;}]], [[ 150 | | | | return-statement 151 | | | | | expression 152 | | | | | | identifier 153 | | | | | | | "a" 154 | ]]) 155 | eqast([[void main() {a: goto a;}]], [[ 156 | | | | label-statement 157 | | | | | identifier 158 | | | | | | "a" 159 | | | | goto-statement 160 | | | | | identifier 161 | | | | | | "a" 162 | ]]) 163 | 164 | end) 165 | 166 | it("label with typedefs", function() 167 | eqast([[ 168 | // namespaces.c 169 | typedef int S, T, U; 170 | struct S { int T; }; 171 | union U { int x; }; 172 | void f(void) { 173 | // The following uses of S, T, U are correct, and have no 174 | // effect on the visibility of S, T, U as typedef names. 
175 | struct S s = { .T = 1 }; 176 | T: s.T = 2; 177 | union U u = { 1 }; 178 | goto T; 179 | // S, T and U are still typedef names: 180 | S ss = 1; T tt = 1; U uu = 1; 181 | } 182 | ]], [[ 183 | | | | label-statement 184 | | | | | identifier 185 | | | | | | "T" 186 | ]]) 187 | 188 | end) 189 | 190 | end) 191 | -------------------------------------------------------------------------------- /tests/csv-test.lua: -------------------------------------------------------------------------------- 1 | local parse_csv = require 'parsers.csv' 2 | local lester = require 'tests.lester' 3 | 4 | local describe, it, expect = lester.describe, lester.it, lester.expect 5 | 6 | describe("csv", function() 7 | 8 | it("simple", function() 9 | local source = [[name,age 10 | John,20 11 | Maria,23]] 12 | local expected_csv = { 13 | {'name', 'age'}, 14 | {'John', 20}, 15 | {'Maria', 23}, 16 | } 17 | expect.equal(parse_csv(source), expected_csv) 18 | end) 19 | 20 | it("quoted strings", function() 21 | local source = [[name,age 22 | "John",20 23 | "Maria "" Maria 24 | Maria",23 25 | Paul "Paul",24]] 26 | local expected_csv = { 27 | {'name', 'age'}, 28 | {'John', 20}, 29 | {'Maria " Maria\nMaria', 23}, 30 | {'Paul "Paul"', 24}, 31 | } 32 | expect.equal(parse_csv(source), expected_csv) 33 | end) 34 | 35 | it("complex", function() 36 | local source = [[Year,Make,Model,Description,Price 37 | 1997,Ford,E350,"ac, abs, moon",3000.00 38 | 1999,Chevy,"Venture ""Extended Edition""","",4900.00 39 | 1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00 40 | 1996,Jeep,Grand Cherokee,"MUST SELL! 
41 | air, moon roof, loaded",4799.00]] 42 | local expected_csv = { 43 | { 'Year', 'Make', 'Model', 'Description', 'Price' }, 44 | { 1997, 'Ford', 'E350', 'ac, abs, moon', 3000.0 }, 45 | { 1999, 'Chevy', 'Venture "Extended Edition"', '', 4900.0 }, 46 | { 1999, 'Chevy', 'Venture "Extended Edition, Very Large"', '', 5000.0 }, 47 | { 1996, 'Jeep', 'Grand Cherokee', 'MUST SELL!\nair, moon roof, loaded', 4799.0 } 48 | } 49 | expect.equal(parse_csv(source), expected_csv) 50 | end) 51 | 52 | end) 53 | -------------------------------------------------------------------------------- /tests/json-test.lua: -------------------------------------------------------------------------------- 1 | local parse_json = require 'parsers.json' 2 | local lester = require 'tests.lester' 3 | 4 | local describe, it, expect = lester.describe, lester.it, lester.expect 5 | 6 | describe("json", function() 7 | 8 | it("simple", function() 9 | local source = '[{"string":"some\\ntext", "boolean":true, "number":-1.5e+2, "null":null}]' 10 | local expected_json = 11 | { tag = "Array", pos = 1, endpos = 73, 12 | { tag = "Object", pos = 2, endpos = 72, 13 | { tag = "Member", pos = 3, endpos = 24, 14 | "string","some\ntext" }, 15 | { tag = "Member", pos = 26, endpos = 40, 16 | "boolean", true }, 17 | { tag = "Member", pos = 42, endpos = 58, 18 | "number", -150.0 }, 19 | { tag = "Member", pos = 60, endpos = 71, 20 | "null", nil } 21 | } 22 | } 23 | expect.equal(parse_json(source), expected_json) 24 | end) 25 | 26 | end) 27 | -------------------------------------------------------------------------------- /tests/lester.lua: -------------------------------------------------------------------------------- 1 | --[[ 2 | Minimal test framework for Lua. 3 | lester - v0.1.2 - 15/Feb/2021 4 | Eduardo Bart - edub4rt@gmail.com 5 | https://github.com/edubart/lester 6 | Minimal Lua test framework. 7 | See end of file for LICENSE. 
8 | ]] 9 | 10 | --[[-- 11 | Lester is a minimal unit testing framework for Lua with a focus on being simple to use. 12 | 13 | ## Features 14 | 15 | * Minimal, just one file. 16 | * Self contained, no external dependencies. 17 | * Simple and hackable when needed. 18 | * Use `describe` and `it` blocks to describe tests. 19 | * Supports `before` and `after` handlers. 20 | * Colored output. 21 | * Configurable via the script or with environment variables. 22 | * Quiet mode, to use in live development. 23 | * Optionally filter tests by name. 24 | * Show traceback on errors. 25 | * Show time to complete tests. 26 | * Works with Lua 5.1+. 27 | * Efficient. 28 | 29 | ## Usage 30 | 31 | Copy `lester.lua` file to a project and require it, 32 | which returns a table that includes all of the functionality: 33 | 34 | ```lua 35 | local lester = require 'lester' 36 | local describe, it, expect = lester.describe, lester.it, lester.expect 37 | 38 | -- Customize lester configuration. 39 | lester.show_traceback = false 40 | 41 | describe('my project', function() 42 | lester.before(function() 43 | -- This function is run before every test. 44 | end) 45 | 46 | describe('module1', function() -- Describe blocks can be nested. 47 | it('feature1', function() 48 | expect.equal('something', 'something') -- Pass. 49 | end) 50 | 51 | it('feature2', function() 52 | expect.truthy(false) -- Fail. 53 | end) 54 | end) 55 | end) 56 | 57 | lester.report() -- Print overall statistic of the tests run. 58 | lester.exit() -- Exit with success if all tests passed. 59 | ``` 60 | 61 | ## Customizing output with environment variables 62 | 63 | To customize the output of lester externally, 64 | you can set the following environment variables before running a test suite: 65 | 66 | * `LESTER_QUIET="true"`, omit print of passed tests. 67 | * `LESTER_COLORED="false"`, disable colored output. 68 | * `LESTER_SHOW_TRACEBACK="false"`, disable traceback on test failures. 
69 | * `LESTER_SHOW_ERROR="false"`, omit print of error description of failed tests. 70 | * `LESTER_STOP_ON_FAIL="true"`, stop on first test failure. 71 | * `LESTER_UTF8TERM="false"`, disable printing of UTF-8 characters. 72 | * `LESTER_FILTER="some text"`, filter the tests that should be run. 73 | 74 | Note that these configurations can be changed via script too; check the documentation. 75 | 76 | ]] 77 | 78 | -- Returns whether the terminal supports UTF-8 characters. 79 | local function is_utf8term() 80 | local lang = os.getenv('LANG') 81 | return (lang and lang:lower():match('utf%-?8$')) and true or false 82 | end 83 | 84 | -- Returns the boolean value of an environment variable set to "true" or "false", or `default` otherwise. 85 | local function getboolenv(varname, default) 86 | local val = os.getenv(varname) 87 | if val == 'true' then 88 | return true 89 | elseif val == 'false' then 90 | return false 91 | end 92 | return default 93 | end 94 | 95 | -- The lester module. 96 | local lester = { 97 | --- Whether lines of passed tests should not be printed. False by default. 98 | quiet = getboolenv('LESTER_QUIET', false), 99 | --- Whether the output should be colorized. True by default. 100 | colored = getboolenv('LESTER_COLORED', true), 101 | --- Whether a traceback must be shown on test failures. True by default. 102 | show_traceback = getboolenv('LESTER_SHOW_TRACEBACK', true), 103 | --- Whether the error description of a test failure should be shown. True by default. 104 | show_error = getboolenv('LESTER_SHOW_ERROR', true), 105 | --- Whether the test suite should exit on first test failure. False by default. 106 | stop_on_fail = getboolenv('LESTER_STOP_ON_FAIL', false), 107 | --- Whether we can print UTF-8 characters to the terminal. True by default when supported. 108 | utf8term = getboolenv('LESTER_UTF8TERM', is_utf8term()), 109 | --- A string with a Lua pattern to filter tests. Nil by default. 
110 | filter = os.getenv('LESTER_FILTER'), 111 | --- Function to retrieve time in seconds with millisecond precision, `os.clock` by default. 112 | seconds = os.clock, 113 | } 114 | 115 | -- Variables used internally for the lester state. 116 | local lester_start = nil 117 | local last_succeeded = false 118 | local level = 0 119 | local successes = 0 120 | local total_successes = 0 121 | local failures = 0 122 | local total_failures = 0 123 | local start = 0 124 | local befores = {} 125 | local afters = {} 126 | local names = {} 127 | 128 | -- Color codes. 129 | local color_codes = { 130 | reset = string.char(27) .. '[0m', 131 | bright = string.char(27) .. '[1m', 132 | red = string.char(27) .. '[31m', 133 | green = string.char(27) .. '[32m', 134 | blue = string.char(27) .. '[34m', 135 | magenta = string.char(27) .. '[35m', 136 | } 137 | 138 | -- Colors table, returning the proper color code if colored mode is enabled. 139 | local colors = setmetatable({}, { __index = function(_, key) 140 | return lester.colored and color_codes[key] or '' 141 | end}) 142 | 143 | --- Table of terminal color codes, can be customized. 144 | lester.colors = colors 145 | 146 | --- Describe a block of tests, which consists of a set of tests. 147 | -- Describes can be nested. 148 | -- @param name A string used to describe the block. 149 | -- @param func A function containing all the tests or other describes. 150 | function lester.describe(name, func) 151 | if level == 0 then -- Get start time for top level describe blocks. 152 | start = lester.seconds() 153 | if not lester_start then 154 | lester_start = start 155 | end 156 | end 157 | -- Setup describe block variables. 158 | failures = 0 159 | successes = 0 160 | level = level + 1 161 | names[level] = name 162 | -- Run the describe block. 163 | func() 164 | -- Cleanup describe block. 
165 | afters[level] = nil 166 | befores[level] = nil 167 | names[level] = nil 168 | level = level - 1 169 | -- Pretty print statistics for top level describe block. 170 | if level == 0 and not lester.quiet and (successes > 0 or failures > 0) then 171 | local io_write = io.write 172 | local colors_reset, colors_green = colors.reset, colors.green 173 | io_write(failures == 0 and colors_green or colors.red, '[====] ', 174 | colors.magenta, name, colors_reset, ' | ', 175 | colors_green, successes, colors_reset, ' successes / ') 176 | if failures > 0 then 177 | io_write(colors.red, failures, colors_reset, ' failures / ') 178 | end 179 | io_write(colors.bright, string.format('%.6f', lester.seconds() - start), colors_reset, ' seconds\n') 180 | end 181 | end 182 | 183 | -- Error handler used to get traceback for errors. 184 | local function xpcall_error_handler(err) 185 | return debug.traceback(tostring(err), 2) 186 | end 187 | 188 | -- Pretty print the line on the test file where an error happened. 189 | local function show_error_line(err) 190 | local info = debug.getinfo(3) 191 | local io_write = io.write 192 | local colors_reset = colors.reset 193 | local short_src, currentline = info.short_src, info.currentline 194 | io_write(' (', colors.blue, short_src, colors_reset, 195 | ':', colors.bright, currentline, colors_reset) 196 | if err and lester.show_traceback then 197 | local fnsrc = short_src..':'..currentline 198 | for cap1, cap2 in err:gmatch('\t[^\n:]+:(%d+): in function <([^>]+)>\n') do 199 | if cap2 == fnsrc then 200 | io_write('/', colors.bright, cap1, colors_reset) 201 | break 202 | end 203 | end 204 | end 205 | io_write(')') 206 | end 207 | 208 | -- Pretty print the test name, with breadcrumb for the describe blocks. 
209 | local function show_test_name(name) 210 | local io_write = io.write 211 | local colors_reset = colors.reset 212 | for _,descname in ipairs(names) do 213 | io_write(colors.magenta, descname, colors_reset, ' | ') 214 | end 215 | io_write(colors.bright, name, colors_reset) 216 | end 217 | 218 | --- Declare a test, which consists of a set of assertions. 219 | -- @param name A name for the test. 220 | -- @param func The function containing all assertions. 221 | function lester.it(name, func) 222 | -- Skip the test if it does not match the filter. 223 | if lester.filter then 224 | local fullname = table.concat(names, ' | ')..' | '..name 225 | if not fullname:match(lester.filter) then 226 | return 227 | end 228 | end 229 | -- Execute before handlers. 230 | for _,levelbefores in ipairs(befores) do 231 | for _,beforefn in ipairs(levelbefores) do 232 | beforefn(name) 233 | end 234 | end 235 | -- Run the test, capturing errors if any. 236 | local success, err 237 | if lester.show_traceback then 238 | success, err = xpcall(func, xpcall_error_handler) 239 | else 240 | success, err = pcall(func) 241 | if not success and err then 242 | err = tostring(err) 243 | end 244 | end 245 | -- Count successes and failures. 246 | if success then 247 | successes = successes + 1 248 | total_successes = total_successes + 1 249 | else 250 | failures = failures + 1 251 | total_failures = total_failures + 1 252 | end 253 | local io_write = io.write 254 | local colors_reset = colors.reset 255 | -- Print the test run. 256 | if not lester.quiet then -- Show test status and complete test name. 257 | if success then 258 | io_write(colors.green, '[PASS] ', colors_reset) 259 | else 260 | io_write(colors.red, '[FAIL] ', colors_reset) 261 | end 262 | show_test_name(name) 263 | if not success then 264 | show_error_line(err) 265 | end 266 | io_write('\n') 267 | else 268 | if success then -- Show just a character hinting that the test succeeded. 
269 | local o = (lester.utf8term and lester.colored) and 270 | string.char(226, 151, 143) or 'o' 271 | io_write(colors.green, o, colors_reset) 272 | else -- Show complete test name on failure. 273 | io_write(last_succeeded and '\n' or '', 274 | colors.red, '[FAIL] ', colors_reset) 275 | show_test_name(name) 276 | show_error_line(err) 277 | io_write('\n') 278 | end 279 | end 280 | -- Print error message, colorizing its output if possible. 281 | if err and lester.show_error then 282 | if lester.colored then 283 | local errfile, errline, errmsg, rest = err:match('^([^:\n]+):(%d+): ([^\n]+)(.*)') 284 | if errfile and errline and errmsg and rest then 285 | io_write(colors.blue, errfile, colors_reset, 286 | ':', colors.bright, errline, colors_reset, ': ') 287 | if errmsg:match('^%w([^:]*)$') then 288 | io_write(colors.red, errmsg, colors_reset) 289 | else 290 | io_write(errmsg) 291 | end 292 | err = rest 293 | end 294 | end 295 | io_write(err, '\n\n') 296 | end 297 | io.flush() 298 | -- Stop on failure. 299 | if not success and lester.stop_on_fail then 300 | if lester.quiet then 301 | io_write('\n') 302 | io.flush() 303 | end 304 | lester.exit() 305 | end 306 | -- Execute after handlers. 307 | for _,levelafters in ipairs(afters) do 308 | for _,afterfn in ipairs(levelafters) do 309 | afterfn(name) 310 | end 311 | end 312 | last_succeeded = success 313 | end 314 | 315 | --- Set a function that is called before every test inside a describe block. 316 | -- A single string containing the name of the test about to be run will be passed to `func`. 317 | function lester.before(func) 318 | local levelbefores = befores[level] 319 | if not levelbefores then 320 | levelbefores = {} 321 | befores[level] = levelbefores 322 | end 323 | levelbefores[#levelbefores+1] = func 324 | end 325 | 326 | --- Set a function that is called after every test inside a describe block. 327 | -- A single string containing the name of the test that was finished will be passed to `func`. 
328 | -- The function is executed regardless of whether the test passed or failed. 329 | function lester.after(func) 330 | local levelafters = afters[level] 331 | if not levelafters then 332 | levelafters = {} 333 | afters[level] = levelafters 334 | end 335 | levelafters[#levelafters+1] = func 336 | end 337 | 338 | --- Pretty print statistics of all test runs. 339 | -- Reports total successes, total failures and run time in seconds. 340 | function lester.report() 341 | local now = lester.seconds() 342 | local colors_reset = colors.reset 343 | io.write(lester.quiet and '\n' or '', 344 | colors.green, total_successes, colors_reset, ' successes / ', 345 | colors.red, total_failures, colors_reset, ' failures / ', 346 | colors.bright, string.format('%.6f', now - (lester_start or now)), colors_reset, ' seconds\n') 347 | io.flush() 348 | return total_failures == 0 349 | end 350 | 351 | --- Exit the application with success code if all tests passed, or failure code otherwise. 352 | function lester.exit() 353 | os.exit(total_failures == 0) 354 | end 355 | 356 | local expect = {} 357 | --- Expect module, containing utility functions for doing assertions inside a test. 358 | lester.expect = expect 359 | 360 | --- Check if a function fails with an error. 361 | -- If `expected` is nil then any error is accepted. 362 | -- If `expected` is a string then we check if the error contains that string. 363 | -- If `expected` is anything else then we check if both are equal. 
364 | function expect.fail(func, expected) 365 | local ok, err = pcall(func) 366 | if ok then 367 | error('expected function to fail', 2) 368 | elseif expected ~= nil then 369 | local found = expected == err 370 | if not found and type(expected) == 'string' then 371 | found = string.find(tostring(err), expected, 1, true) 372 | end 373 | if not found then 374 | error('expected function to fail\nexpected:\n'..tostring(expected)..'\ngot:\n'..tostring(err), 2) 375 | end 376 | end 377 | end 378 | 379 | --- Check if a function does not fail with an error. 380 | function expect.not_fail(func) 381 | local ok, err = pcall(func) 382 | if not ok then 383 | error('expected function to not fail\ngot error:\n'..tostring(err), 2) 384 | end 385 | end 386 | 387 | --- Check if a value is not `nil`. 388 | function expect.exist(v) 389 | if v == nil then 390 | error('expected value to exist\ngot:\n'..tostring(v), 2) 391 | end 392 | end 393 | 394 | --- Check if a value is `nil`. 395 | function expect.not_exist(v) 396 | if v ~= nil then 397 | error('expected value to not exist\ngot:\n'..tostring(v), 2) 398 | end 399 | end 400 | 401 | --- Check if an expression evaluates to `true`. 402 | function expect.truthy(v) 403 | if not v then 404 | error('expected expression to be true\ngot:\n'..tostring(v), 2) 405 | end 406 | end 407 | 408 | --- Check if an expression evaluates to `false`. 409 | function expect.falsy(v) 410 | if v then 411 | error('expected expression to be false\ngot:\n'..tostring(v), 2) 412 | end 413 | end 414 | 415 | --- Compare if two values are equal, considering nested tables. 
416 | local function strict_eq(t1, t2) 417 | if rawequal(t1, t2) then return true end 418 | if type(t1) ~= type(t2) then return false end 419 | if type(t1) ~= 'table' then return t1 == t2 end 420 | if getmetatable(t1) ~= getmetatable(t2) then return false end 421 | for k,v1 in pairs(t1) do 422 | if not strict_eq(v1, t2[k]) then return false end 423 | end 424 | for k,v2 in pairs(t2) do 425 | if not strict_eq(v2, t1[k]) then return false end 426 | end 427 | return true 428 | end 429 | 430 | --- Check if two values are equal. 431 | function expect.equal(v1, v2) 432 | if not strict_eq(v1, v2) then 433 | error('expected values to be equal\nfirst value:\n'..tostring(v1)..'\nsecond value:\n'..tostring(v2), 2) 434 | end 435 | end 436 | 437 | --- Check if two values are not equal. 438 | function expect.not_equal(v1, v2) 439 | if strict_eq(v1, v2) then 440 | error('expected values to be not equal\nfirst value:\n'..tostring(v1)..'\nsecond value:\n'..tostring(v2), 2) 441 | end 442 | end 443 | 444 | return lester 445 | 446 | --[[ 447 | The MIT License (MIT) 448 | 449 | Copyright (c) 2021 Eduardo Bart (https://github.com/edubart) 450 | 451 | Permission is hereby granted, free of charge, to any person obtaining a copy 452 | of this software and associated documentation files (the "Software"), to deal 453 | in the Software without restriction, including without limitation the rights 454 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 455 | copies of the Software, and to permit persons to whom the Software is 456 | furnished to do so, subject to the following conditions: 457 | 458 | The above copyright notice and this permission notice shall be included in all 459 | copies or substantial portions of the Software. 460 | 461 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 462 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 463 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 464 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 465 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 466 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 467 | SOFTWARE. 468 | ]] 469 | -------------------------------------------------------------------------------- /tests/lpegrex-test.lua: -------------------------------------------------------------------------------- 1 | -- Most of these tests are taken from lpeg and lpeglabel projects. 2 | 3 | local lpegrex = require 'lpegrex' 4 | local lpeg = require 'lpeglabel' 5 | local lester = require 'tests.lester' 6 | 7 | local describe, it, expect = lester.describe, lester.it, lester.expect 8 | local eq, truthy, falsy = expect.equal, expect.truthy, expect.falsy 9 | local compile, match, find, gsub = lpegrex.compile, lpegrex.match, lpegrex.find, lpegrex.gsub 10 | 11 | local function genallchar() 12 | local allchar = {} 13 | for i=0,255 do 14 | allchar[i + 1] = i 15 | end 16 | local unpack = table.unpack or unpack 17 | allchar = string.char(unpack(allchar)) 18 | assert(#allchar == 256) 19 | return allchar 20 | end 21 | 22 | local allchar = genallchar() 23 | 24 | local function cs2str(c) 25 | return lpeg.match(lpeg.Cs((c + lpeg.P(1)/"")^0), allchar) 26 | end 27 | 28 | describe('lpeg patterns', function() 29 | 30 | it("basic", function() 31 | eq(match("a", "."), 2) 32 | eq(match("a", "''"), 1) 33 | eq(match("", " ! . "), 1) 34 | falsy(match("a", " ! . ")) 35 | eq(match("abcde", " ( . . ) * "), 5) 36 | eq(match("abbcde", " [a-c] +"), 5) 37 | eq(match("0abbc1de", "'0' [a-c]+ '1'"), 7) 38 | eq(match("0zz1dda", "'0' [^a-c]+ 'a'"), 8) 39 | eq(match("abbc--", " [a-c] + +"), 5) 40 | eq(match("abbc--", " [ac-] +"), 2) 41 | eq(match("abbc--", " [-acb] + "), 7) 42 | falsy(match("abbcde", " [b-z] + ")) 43 | eq(match("abb\"de", '"abb"["]"de"'), 7) 44 | eq(match("abceeef", "'ac' ? 
'ab' * 'c' { 'e' * } / 'abceeef' "), "eee") 45 | eq(match("abceeef", "'ac'? 'ab'* 'c' { 'f'+ } / 'abceeef' "), 8) 46 | eq(match("aaand", "[a]^2"), 3) 47 | end) 48 | 49 | it("predicates", function() 50 | eq({match("abceefe", "( ( & 'e' {} ) ? . ) * ")}, {4, 5, 7}) 51 | eq({match("abceefe", "((&&'e' {})? .)*")}, {4, 5, 7}) 52 | eq({match("abceefe", "( ( ! ! 'e' {} ) ? . ) *")}, {4, 5, 7}) 53 | eq({match("abceefe", "(( & ! & ! 'e' {})? .)*")}, {4, 5, 7}) 54 | end) 55 | 56 | it("ordered choice", function() 57 | eq(match("cccx" , "'ab'? ('ccc' / ('cde' / 'cd'*)? / 'ccc') 'x'+"), 5) 58 | eq(match("cdx" , "'ab'? ('ccc' / ('cde' / 'cd'*)? / 'ccc') 'x'+"), 4) 59 | eq(match("abcdcdx" , "'ab'? ('ccc' / ('cde' / 'cd'*)? / 'ccc') 'x'+"), 8) 60 | 61 | eq(match("abc", "a <- (. a)?"), 4) 62 | 63 | local p = "balanced <- '(' ([^()] / balanced)* ')'" 64 | truthy(match("(abc)", p)) 65 | truthy(match("(a(b)((c) (d)))", p)) 66 | falsy(match("(a(b ((c) (d)))", p)) 67 | 68 | local c = compile[[ balanced <- "(" ([^()] / balanced)* ")" ]] 69 | eq(c, lpeg.P(c)) 70 | truthy(c:match"((((a))(b)))") 71 | 72 | c = [[ 73 | S <- "0" B / "1" A / "" -- balanced strings 74 | A <- "0" S / "1" A A -- one more 0 75 | B <- "1" S / "0" B B -- one more 1 76 | ]] 77 | eq(match("00011011", c), 9) 78 | 79 | c = [[ 80 | S <- ("0" B / "1" A)* 81 | A <- "0" / "1" A A 82 | B <- "1" / "0" B B 83 | ]] 84 | eq(match("00011011", c), 9) 85 | eq(match("000110110", c), 9) 86 | eq(match("011110110", c), 3) 87 | eq(match("000110010", c), 1) 88 | end) 89 | 90 | it("repetitions", function() 91 | local s = "aaaaaaaaaaaaaaaaaaaaaaaa" 92 | eq(match(s, "'a'^3"), 4) 93 | eq(match(s, "'a'^0"), 1) 94 | eq(match(s, "'a'^+3"), s:len() + 1) 95 | falsy(match(s, "'a'^+30")) 96 | eq(match(s, "'a'^-30"), s:len() + 1) 97 | eq(match(s, "'a'^-5"), 6) 98 | for i = 1, s:len() do 99 | truthy(match(s, string.format("'a'^+%d", i)) >= i + 1) 100 | truthy(match(s, string.format("'a'^-%d", i)) <= i + 1) 101 | truthy(match(s, string.format("'a'^%d", 
i)) == i + 1) 102 | end 103 | 104 | eq(match("01234567890123456789", "[0-9]^3+"), 19) 105 | end) 106 | 107 | it("substitutions", function() 108 | eq(match("01234567890123456789", "({....}{...}) -> '%2%1'"), "4560123") 109 | eq(match("0123456789", "{| {.}* |}"), {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}) 110 | eq(match("012345", "{| (..) -> '%0%0' |}")[1], "0101") 111 | 112 | eq(match("abcdef", "( {.} {.} {.} {.} {.} ) -> 3"), "c") 113 | eq(match("abcdef", "( {:x: . :} {.} {.} {.} {.} ) -> 3"), "d") 114 | eq(match("abcdef", "( {:x: . :} {.} {.} {.} {.} ) -> 0"), 6) 115 | 116 | falsy(match("abcdef", "{:x: ({.} {.} {.}) -> 2 :} =x")) 117 | truthy(match("abcbef", "{:x: ({.} {.} {.}) -> 2 :} =x")) 118 | end) 119 | 120 | it("sets", function() 121 | local function eqcharset(c1, c2) 122 | eq(cs2str(c1), cs2str(c2)) 123 | end 124 | 125 | eqcharset(compile"[]]", "]") 126 | eqcharset(compile"[][]", lpeg.S"[]") 127 | eqcharset(compile"[]-]", lpeg.S"-]") 128 | eqcharset(compile"[-]", lpeg.S"-") 129 | eqcharset(compile"[az-]", lpeg.S"a-z") 130 | eqcharset(compile"[-az]", lpeg.S"a-z") 131 | eqcharset(compile"[a-z]", lpeg.R"az") 132 | eqcharset(compile"[]['\"]", lpeg.S[[]['"]]) 133 | 134 | local any = lpeg.P(1) 135 | eqcharset(compile"[^]]", any - "]") 136 | eqcharset(compile"[^][]", any - lpeg.S"[]") 137 | eqcharset(compile"[^]-]", any - lpeg.S"-]") 138 | eqcharset(compile"[^]-]", any - lpeg.S"-]") 139 | eqcharset(compile"[^-]", any - lpeg.S"-") 140 | eqcharset(compile"[^az-]", any - lpeg.S"a-z") 141 | eqcharset(compile"[^-az]", any - lpeg.S"a-z") 142 | eqcharset(compile"[^a-z]", any - lpeg.R"az") 143 | eqcharset(compile"[^]['\"]", any - lpeg.S[[]['"]]) 144 | end) 145 | 146 | it("predefined names", function() 147 | eq(os.setlocale("C"), "C") 148 | 149 | local function eqlpeggsub(p1, p2) 150 | eq(cs2str(compile(p1)), allchar:gsub("[^" .. p2 .. 
"]", "")) 151 | end 152 | 153 | eqlpeggsub("%w", "%w") 154 | eqlpeggsub("%a", "%a") 155 | eqlpeggsub("%l", "%l") 156 | eqlpeggsub("%u", "%u") 157 | eqlpeggsub("%p", "%p") 158 | eqlpeggsub("%d", "%d") 159 | eqlpeggsub("%x", "%x") 160 | eqlpeggsub("%s", "%s") 161 | eqlpeggsub("%c", "%c") 162 | 163 | eqlpeggsub("%W", "%W") 164 | eqlpeggsub("%A", "%A") 165 | eqlpeggsub("%L", "%L") 166 | eqlpeggsub("%U", "%U") 167 | eqlpeggsub("%P", "%P") 168 | eqlpeggsub("%D", "%D") 169 | eqlpeggsub("%X", "%X") 170 | eqlpeggsub("%S", "%S") 171 | eqlpeggsub("%C", "%C") 172 | 173 | eqlpeggsub("[%w]", "%w") 174 | eqlpeggsub("[_%w]", "_%w") 175 | eqlpeggsub("[^%w]", "%W") 176 | eqlpeggsub("[%W%S]", "%W%S") 177 | 178 | lpegrex.updatelocale() 179 | end) 180 | 181 | it("comments", function() 182 | local c = compile[[ 183 | A <- _B -- \t \n %nl .<> <- -> -- 184 | _B <- 'x' --]] 185 | eq(c:match'xy', 2) 186 | end) 187 | 188 | it("pre-definitions", function() 189 | local defs = {digits = lpeg.R"09", letters = lpeg.R"az", _=lpeg.P"__"} 190 | local c = compile("%letters (%letters / %digits)*", defs) 191 | eq(c:match"x123", 5) 192 | c = compile("%_", defs) 193 | eq(c:match"__", 3) 194 | 195 | c = compile([[ 196 | S <- A+ 197 | A <- %letters+ B 198 | B <- %digits+ 199 | ]], defs) 200 | eq(c:match("abcd1234"), 9) 201 | 202 | c = compile("{[0-9]+'.'?[0-9]*} -> sin", math) 203 | eq(c:match("2.34"), math.sin(2.34)) 204 | end) 205 | 206 | it("back reference", function() 207 | local c = compile([[ 208 | longstring <- '[' {:init: '='* :} '[' close 209 | close <- ']' =init ']' / . close 210 | ]]) 211 | 212 | eq(c:match'[==[]]===]]]]==]===[]', 17) 213 | eq(c:match'[[]=]====]=]]]==]===[]', 14) 214 | falsy(c:match'[[]=]====]=]=]==]===[]') 215 | 216 | c = compile" '[' {:init: '='* :} '[' (!(']' =init ']') .)* ']' =init ']' !. " 217 | 218 | truthy(c:match'[==[]]===]]]]==]') 219 | truthy(c:match'[[]=]====]=][]==]===[]]') 220 | falsy(c:match'[[]=]====]=]=]==]===[]') 221 | 222 | c = compile([[ 223 | doc <- block !. 
224 | block <- (start {| (block / { [^<]+ })* |} end?) => addtag 225 | start <- '<' {:tag: [a-z]+ :} '>' 226 | end <- '</' { =tag } '>' 227 | ]], {addtag = function(_, i, t, tag) t.tag = tag return i, t end}) 228 | 229 | local t = c:match[[<x>hi<b>hello</b>but<b>totheend</x>]] 230 | eq(t, {tag='x', 'hi', {tag = 'b', 'hello'}, 'but', {'totheend'}}) 231 | end) 232 | 233 | it("find", function() 234 | eq(find("hi alalo", "{:x:..:} =x"), 4) 235 | eq(find("hi alalo", "{:x:..:} =x", 4), 4) 236 | falsy(find("hi alalo", "{:x:..:} =x", 5)) 237 | eq(find("hi alalo", "{'al'}", 5), 6) 238 | eq(find("hi aloalolo", "{:x:..:} =x"), 8) 239 | eq(find("alo alohi x x", "{:word:%w+:}%W*(=word)!%w"), 11) 240 | eq(find("", "!."), 1) 241 | eq(find("alo", "!."), 4) 242 | 243 | -- find discards any captures 244 | eq({2,3,nil}, {find("alo", "{.}{'o'}")}) 245 | 246 | local function fmatch(s, p) 247 | local i,e = find(s,p) 248 | if i then return s:sub(i, e) end 249 | end 250 | eq(fmatch("alo alo", '[a-z]+'), "alo") 251 | eq(fmatch("alo alo", '{:x: [a-z]+ :} =x'), nil) 252 | eq(fmatch("alo alo", "{:x: [a-z]+ :} ' ' =x"), "alo alo") 253 | end) 254 | 255 | it("gsub", function() 256 | eq(gsub("alo alo", "[abc]", "x"), "xlo xlo") 257 | eq(gsub("alo alo", "%w+", "."), ". 
.") 258 | eq(gsub("hi, how are you", "[aeiou]", string.upper), "hI, hOw ArE yOU") 259 | local s = 'hi [[a comment[=]=] ending here]] and [=[another]]=]]' 260 | local c = compile" '[' {:i: '='* :} '[' (!(']' =i ']') .)* ']' { =i } ']' " 261 | eq(gsub(s, c, "%2"), 'hi and =]') 262 | eq(gsub(s, c, "%0"), s) 263 | eq(gsub('[=[hi]=]', c, "%2"), '=') 264 | end) 265 | 266 | it("folding captures", function() 267 | local c = compile([[ 268 | S <- (number (%s+ number)*) ~> add 269 | number <- digit->tonumber 270 | digit <- %d+ 271 | ]], {tonumber = tonumber, add = function(a,b) return a + b end}) 272 | eq(c:match("3 401 50"), 3 + 401 + 50) 273 | end) 274 | 275 | it("look-ahead captures", function() 276 | eq({match("alo", "&(&{.}) !{'b'} {&(...)} &{..} {...} {!.}")}, 277 | {"", "alo", ""}) 278 | eq(match("aloalo", "{~ (((&'al' {.}) -> 'A%1' / (&%l {.}) -> '%1%1') / .)* ~}"), 279 | "AallooAalloo") 280 | eq(match("alo alo", [[ {~ (&(. ([a-z]* -> '*')) ([a-z]+ -> '+') ' '*)* ~} ]]), 281 | "+ +") 282 | eq(match("hello aloaLo aloalo xuxu", [[S <- &({:two: .. :} . =two) {[a-z]+} / . S]]), 283 | "aloalo") 284 | 285 | local c = compile[[ 286 | block <- {| {:ident:space*:} line 287 | ((=ident !space line) / &(=ident space) block)* |} 288 | line <- {[^%nl]*} %nl 289 | space <- '_' -- should be ' ', but '_' is simpler for editors 290 | ]] 291 | local t = c:match[[ 292 | 1 293 | __1.1 294 | __1.2 295 | ____1.2.1 296 | ____ 297 | 2 298 | __2.1 299 | ]] 300 | eq(t, {"1", {"1.1", "1.2", {"1.2.1", "", ident = "____"}, ident = "__"}, 301 | "2", {"2.1", ident = "__"}, ident = ""}) 302 | end) 303 | 304 | it("nested grammars", function() 305 | local c = compile[[ 306 | s <- a b !. 307 | b <- ( x <- ('b' x)? ) 308 | a <- ( x <- 'a' x? 
) 309 | ]] 310 | truthy(c:match'aaabbb') 311 | truthy(c:match'aaa') 312 | falsy(c:match'bbb') 313 | falsy(c:match'aaabbba') 314 | end) 315 | 316 | it("groups", function() 317 | eq({match("abc", "{:S <- {:.:} {S} / '':}")}, {"a", "bc", "b", "c", "c", ""}) 318 | eq(match("1234", "{| {:a:.:} {:b:.:} {:c:.{.}:} |}"), {a="1", b="2", c="4"}) 319 | eq(match("1234", "{|{:a:.:} {:b:{.}{.}:} {:c:{.}:}|}"), {a="1", b="2", c="4"}) 320 | eq(match("12345", "{| {:.:} {:b:{.}{.}:} {:{.}{.}:} |}"), {"1", b="2", "4", "5"}) 321 | eq(match("12345", "{| {:.:} {:{:b:{.}{.}:}:} {:{.}{.}:} |}"), {"1", "23", "4", "5"}) 322 | eq(match("12345", "{| {:.:} {{:b:{.}{.}:}} {:{.}{.}:} |}"), {"1", "23", "4", "5"}) 323 | end) 324 | 325 | it("nested substitutions", function() 326 | local c = compile[[ 327 | text <- {~ item* ~} 328 | item <- macro / [^()] / '(' item* ')' 329 | arg <- ' '* {~ (!',' item)* ~} 330 | args <- '(' arg (',' arg)* ')' 331 | macro <- ('apply' args) -> '%1(%2)' 332 | / ('add' args) -> '%1 + %2' 333 | / ('mul' args) -> '%1 * %2' 334 | ]] 335 | eq(c:match"add(mul(a,b), apply(f,x))", "a * b + f(x)") 336 | 337 | c = compile[[ R <- (!.) 
-> '' / ({.} R) -> '%2%1']] 338 | eq(c:match"0123456789", "9876543210") 339 | end) 340 | 341 | it("error labels", function() 342 | local c = compile[['a' / %{l1}]] 343 | eq(c:match("a"), 2) 344 | eq({nil, 'l1', 1}, {c:match("b")}) 345 | 346 | c = compile[['a'^l1]] 347 | eq({nil, 'l1', 1}, {c:match("b")}) 348 | 349 | c = compile[[ 350 | A <- 'a'^B 351 | B <- [a-f]^C 352 | C <- [a-z] 353 | ]] 354 | eq(c:match("a"), 2) 355 | eq(c:match("a"), 2) 356 | eq(c:match("f"), 2) 357 | eq(c:match("g"), 2) 358 | eq(c:match("z"), 2) 359 | eq({nil, 'fail', 1}, {c:match("A")}) 360 | 361 | c = compile[[ 362 | A <- %{C} 363 | B <- [a-z] 364 | ]] 365 | eq({nil, 'C', 1}, {c:match("a")}) 366 | 367 | c = compile[[A <- %{B} 368 | B <- [a-z] 369 | ]] 370 | eq(c:match("a"), 2) 371 | eq({nil, 'fail', 1}, {c:match("U")}) 372 | 373 | c = compile[[ 374 | A <- [a-f] %{B} 375 | B <- [a-c] %{C} 376 | C <- [a-z] 377 | ]] 378 | eq({nil, 'fail', 2}, {c:match("a")}) 379 | eq({nil, 'fail', 3}, {c:match("aa")}) 380 | eq(c:match("aaa"), 4) 381 | eq({nil, 'fail', 2}, {c:match("ad")}) 382 | eq({nil, 'fail', 1}, {c:match("g")}) 383 | 384 | --[[ grammar based on Figure 8 of paper submitted to SCP (using the recovery operator) 385 | S -> S0 //{1} ID //{2} ID '=' Exp //{3} 'unsigned'* 'int' ID //{4} 'unsigned'* ID ID / %error 386 | S0 -> S1 / S2 / &'int' %3 387 | S1 -> &(ID '=') %2 / &(ID !.) %1 / &ID %4 388 | S2 -> &('unsigned'+ ID) %4 / & ('unsigned'+ 'int') %3 389 | ]] 390 | 391 | c = compile([[ 392 | S <- S0 / %{L5} 393 | S0 <- S1 / S2 / &Int %{L3} 394 | S1 <- &(ID %s* '=') %{L2} / &(ID !.) 
%{L1} / &ID %{L4} 395 | S2 <- &(U+ ID) %{L4} / &(U+ Int) %{L3} 396 | ID <- %s* 'a' 397 | U <- %s* 'unsigned' 398 | Int <- %s* 'int' 399 | Exp <- %s* 'E' 400 | L1 <- ID 401 | L2 <- ID %s* '=' Exp 402 | L3 <- U* Int ID 403 | L4 <- U ID ID 404 | ]]) 405 | local s = "a" 406 | eq(c:match(s), #s + 1) 407 | s = "a = E" 408 | eq(c:match(s), #s + 1) 409 | s = "int a" 410 | eq(c:match(s), #s + 1) 411 | s = "unsigned int a" 412 | eq(c:match(s), #s + 1) 413 | s = "unsigned a a" 414 | eq(c:match(s), #s + 1) 415 | s = "b" 416 | eq({nil, 'L5', 1}, {c:match(s)}) 417 | s = "unsigned" 418 | eq({nil, 'L5', 1}, {c:match(s)}) 419 | s = "unsigned a" 420 | eq({nil, 'L5', 1}, {c:match(s)}) 421 | s = "unsigned int" 422 | eq({nil, 'L5', 1}, {c:match(s)}) 423 | end) 424 | 425 | it("error labels with captures", function() 426 | local c = compile[[ 427 | S <- ( %s* &. {A} )* 428 | A <- [0-9]+ / %{5} 429 | ]] 430 | eq({"523", "624", "346", "888"} , {c:match("523 624 346\n888")}) 431 | eq({nil, 5, 4}, {c:match("44 a 123")}) 432 | 433 | c = compile[[ 434 | S <- ( %s* &. {A} )* 435 | A <- [0-9]+ / %{Rec} 436 | Rec <- ((![0-9] .)*) -> "58" 437 | ]] 438 | eq({"523", "624", "346", "888"} , {c:match("523 624 346\n888")}) 439 | eq({"44", "a ", "58", "123"}, {c:match("44 a 123")}) 440 | 441 | c = compile[[ 442 | S <- ( %s* &. A )* 443 | A <- {[0-9]+} / %{5} 444 | ]] 445 | eq({"523", "624", "346", "888"} , {c:match("523 624 346\n888")}) 446 | eq({nil, 5, 4}, {c:match("44 a 123")}) 447 | 448 | c = compile[[ 449 | S <- ( %s* &. 
A )* 450 | A <- {[0-9]+} / %{Rec} 451 | Rec <- ((![0-9] .)*) -> "58" 452 | ]] 453 | eq({"523", "624", "346", "888"} , {c:match("523 624 346\n888")}) 454 | eq({"44", "58", "123"}, {c:match("44 a 123")}) 455 | end) 456 | 457 | it("grammar syntax errors", function() 458 | expect.fail(function() compile('aaaa') end, "rule 'aaaa'") 459 | expect.fail(function() compile('a') end, 'outside') 460 | expect.fail(function() compile('b <- a') end, 'undefined') 461 | expect.fail(function() compile('b <- %invalid') end, 'undefined') 462 | expect.fail(function() compile("x <- 'a' x <- 'b'") end, 'already defined') 463 | expect.fail(function() compile("'a' -") end, 'unexpected characters') 464 | end) 465 | 466 | it("some syntax errors", function() 467 | expect.fail(function() compile([[~]]) end, [[L1:C1: no pattern found 468 | ~ 469 | ^]]) 470 | expect.fail(function() compile([['p'~]]) end, [[L1:C4: unexpected characters after the pattern 471 | 'p'~ 472 | ^]]) 473 | expect.fail(function() compile([['p' /]]) end, [[L1:C5: expected a pattern after '/' 474 | 'p' / 475 | ^]]) 476 | expect.fail(function() compile([[&]]) end, [[L1:C1: expected a pattern after '&' 477 | & 478 | ^]]) 479 | expect.fail(function() compile([[ A <- %nosuch ('error']]) end, [[L1:C22: missing closing ')' 480 | A <- %nosuch ('error' 481 | ^]]) 482 | end) 483 | 484 | it("non syntax errors", function() 485 | expect.fail(function() compile([[A <- %nosuch %def]]) end, [[name 'nosuch' undefined]]) 486 | expect.fail(function() compile([[names not in grammar]]) end, [[rule 'names' used outside a grammar]]) 487 | expect.fail(function() compile([[A<-. 
A<-.]]) end, [['A' already defined as a rule]]) 488 | end) 489 | 490 | end) 491 | 492 | describe("lpegrex extensions", function() 493 | 494 | it("control characters patterns", function() 495 | eq(match('\a', "%ca"), 2) 496 | eq(match('\b', "%cb"), 2) 497 | eq(match('\f', "%cf"), 2) 498 | eq(match('\n', "%cn"), 2) 499 | eq(match('\r', "%cr"), 2) 500 | eq(match('\t', "%ct"), 2) 501 | eq(match('\v', "%cv"), 2) 502 | eq(match('\a\b\f\n\r\t\v', "%ca"), 2) 503 | 504 | end) 505 | 506 | it("utf8", function() 507 | eq(match('AA', "%utf8"), 2) 508 | eq(match('AA', "%utf8+"), 3) 509 | eq(match('AA', "%utf8seq"), nil) 510 | eq(match('AA', "%ascii"), 2) 511 | local pi = string.char(0xCF, 0x80) 512 | eq(match(pi..'A', "%utf8"), 3) 513 | eq(match(pi..'A', "%utf8+"), 4) 514 | eq(match(pi..'A', "%utf8seq"), 3) 515 | eq(match(pi..'A', "%ascii"), nil) 516 | end) 517 | 518 | it("arbitrary captures", function() 519 | local c = compile([[$nil $true $false ${} $myvar]], { myvar = 'hello'}) 520 | eq({nil, true, false, {}, 'hello'}, {c:match('')}) 521 | 522 | c = compile([[$'text' $"something" $0 $-1]]) 523 | eq({'text', "something", 0, -1}, {c:match('')}) 524 | end) 525 | 526 | it("optional match with false capture", function() 527 | eq(match('b', [[{'a'}~?]]), false) 528 | eq(match('a', [[{'a'}~?]]), 'a') 529 | end) 530 | 531 | it("token and keywords literals", function() 532 | eq(match('a', [[A <- `a` NAME_SUFFIX<-[_%w]+ SKIP<-%s*]]), 2) 533 | eq(match('{', [[A <- `{` SKIP <- %s*]]), 2) 534 | eq(match('`', [[A <- ``` SKIP <- %s*]]), 2) 535 | eq(match('a : c', [[G <- `a` `:` `c` NAME_SUFFIX<-[_%w]+ SKIP<-%s*]]), 6) 536 | eq(match('local function()', 537 | [[A <- `local` `function` `(` `)` NAME_SUFFIX<-[_%w]+ SKIP<-%s*]]), 17) 538 | 539 | eq({match('{ a \n', [[A <- {`{`} {`a`} SKIP<-%s* NAME_SUFFIX<-[_%w]+]])}, {'{', 'a'}) 540 | end) 541 | 542 | it("matching unique tokens", function() 543 | local c = compile([[ 544 | chunk <-| (Dot1 / Dot2 / Dot3)* 545 | Dot1 <== `.` NAME 546 | Dot3 <== 
`...` NAME 547 | Dot2 <== `..` NAME 548 | NAME <- {%w+} SKIP 549 | SKIP <- %s* 550 | ]],{__options={pos=false, endpos=false}}) 551 | eq({{tag="Dot1", "1"}, {tag="Dot2", "2"}, {tag="Dot3", "3"}}, c:match('.1 ..2 ...3')) 552 | eq({{tag="Dot3", "3"}, {tag="Dot2", "2"}, {tag="Dot1", "1"}}, c:match('...3 ..2 .1')) 553 | 554 | c = compile([[ 555 | chunk <- `.` `..` `...` TOKEN TOKEN TOKEN 556 | SKIP <- %s* 557 | ]]) 558 | eq(c:match('. .. ... ... . .. '), 19) 559 | end) 560 | 561 | it("matching unique keywords", function() 562 | local c = compile([[ 563 | chunk <-| (NAME / Else / ElseIf)* 564 | Else <== `else` 565 | ElseIf <== `elseif` 566 | NAME <-- !KEYWORD {NAME_PREFIX NAME_SUFFIX?} SKIP 567 | NAME_PREFIX <-- [_%a] 568 | NAME_SUFFIX <- [_%w]+ 569 | SKIP <- %s* 570 | ]], {__options={pos=false, endpos=false}}) 571 | eq({{tag="Else"}, {tag="ElseIf"}, 'elsedummy'}, c:match('else elseif elsedummy')) 572 | eq({'elsedummy', {tag="ElseIf"}, {tag="Else"}}, c:match('elsedummy elseif else')) 573 | end) 574 | 575 | it("auxiliary functions", function() 576 | eq(match('dummy', '%a+ -> tonil'), nil) 577 | eq(match('dummy', '%a+ -> tofalse'), false) 578 | eq(match('dummy', '%a+ -> totrue'), true) 579 | eq(match('dummy', '%a+ -> toemptytable'), {}) 580 | eq(match('1234', '%d+ -> tonumber'), 1234) 581 | eq(match('ff', '({%x+} $16) -> tonumber'), 0xff) 582 | eq(match('65', '%d+ -> tochar'), string.char(65)) 583 | eq(match('255', '%d+ -> tochar'), string.char(255)) 584 | eq(match('41', '({%x+} $16) -> tochar'), string.char(0x41)) 585 | 586 | if utf8 and utf8.char then -- support utf8.char 587 | eq(match('65', '%d+ -> toutf8char'), string.char(65)) 588 | eq(match('41', '({%x+} $16) -> toutf8char'), string.char(0x41)) 589 | eq(match('03C0', '({%x+} $16) -> toutf8char'), "\xCF\x80") 590 | end 591 | 592 | local c = compile "{| {%d+} %s* |}+ ~> foldleft" 593 | eq({"1"}, c:match("1")) 594 | eq({{"1"},"2"}, c:match("1 2")) 595 | eq({{{"1"},"2"},"3"}, c:match("1 2 3")) 596 | 597 | c = compile 
"{| {%d+} %s* |}+ -> foldright" 598 | eq({"1"}, c:match("1")) 599 | eq({"1",{"2"}}, c:match("1 2")) 600 | eq({"1",{"2",{"3"}}}, c:match("1 2 3")) 601 | 602 | c = compile "{| {%d+} %s* |}+ -> rfoldleft" 603 | eq({"1"}, c:match("1")) 604 | eq({{"2"},"1"}, c:match("1 2")) 605 | eq({{{"3"},"2"},"1"}, c:match("1 2 3")) 606 | 607 | c = compile "{| {%d+} %s* |}+ ~> rfoldright" 608 | eq({"1"}, c:match("1")) 609 | eq({"2",{"1"}}, c:match("1 2")) 610 | eq({"3",{"2",{"1"}}}, c:match("1 2 3")) 611 | end) 612 | 613 | it("expected matches", function() 614 | local c = compile[[@'test' %s* @"aaaa"]] 615 | eq(c:match'test aaaa', 10) 616 | eq({nil, 'Expected_test', 1}, {c:match'tesi aaaa'}) 617 | eq({nil, 'Expected_aaaa', 6}, {c:match'test aaab'}) 618 | 619 | c = compile[[ 620 | rules <- @`test` @`=` 621 | NAME_SUFFIX <- [_%w]+ 622 | SKIP <- %s* 623 | ]] 624 | eq(c:match'test =', 7) 625 | eq({nil, 'Expected_test', 1}, {c:match'tesi aaaa'}) 626 | eq({nil, 'Expected_=', 6}, {c:match'test !'}) 627 | 628 | c = compile[[ 629 | rules <- @test %s* @aaaa 630 | test <- 'test' 631 | aaaa <- 'aaaa' 632 | ]] 633 | eq({nil, 'Expected_test', 1}, {c:match'tesi aaaa'}) 634 | eq({nil, 'Expected_aaaa', 6}, {c:match'test aaab'}) 635 | 636 | end) 637 | 638 | it("table capture", function() 639 | local c = compile[[ 640 | chunk <-- Numbers 641 | Numbers <-| NUMBER NUMBER 642 | NUMBER <-- {%d+} SKIP 643 | SKIP <-- %s* 644 | ]] 645 | eq({"1234", "5678"}, c:match('1234 5678')) 646 | end) 647 | 648 | it("node capture", function() 649 | local c = compile[[ 650 | chunk <-- Number 651 | Number <== {%d+} %s* 652 | ]] 653 | eq({tag="Number", pos=1, endpos=5, "1234"}, c:match('1234')) 654 | 655 | c = compile[[ 656 | chunk <-- num 657 | num:Number <== {%d+} %s* 658 | ]] 659 | eq({tag="Number", pos=1, endpos=5, "1234"}, c:match('1234')) 660 | 661 | c = compile([[ 662 | chunk <-- num 663 | num:Number <== {%d+} %s* 664 | ]], {__options={tag=function(name, node) 665 | node.mytag = name 666 | return node 667 | end}}) 
  eq({mytag="Number", pos=1, endpos=5, "1234"}, c:match('1234'))
end)

it("quick ref examples", function()
  eq({match('x',[[
    name <-- patt
    patt <- .
  ]])}, {match('x',[[
    name <- patt
    patt <- .
  ]])})

  eq({match('x',[[
    Node <== patt
    patt <- .
  ]])}, {match('x',[[
    Node <- {| {:pos:{}:} {:tag:''->'Node':} patt {:endpos:{}:} |}
    patt <- .
  ]])})

  eq({match('x',[[
    name : Node <== patt
    patt <- .
  ]])}, {match('x',[[
    name <- {| {:pos:{}:} {:tag:''->'Node':} patt {:endpos:{}:} |}
    patt <- .
  ]])})

  eq({match('keyword ',[[
    G <- `keyword`
    NAME_SUFFIX <- [_%w]+
    SKIP <- %s*
  ]])}, {match('keyword ',[[
    G <- 'keyword' !NAME_SUFFIX SKIP
    NAME_SUFFIX <- [_%w]+
    SKIP <- %s*
  ]])})

  eq({match('. .. ',[[
    G <- `.` `..`
    SKIP <- %s*
  ]])}, {match('. .. ',[[
    G <- !('..' SKIP) '.' SKIP '..' SKIP
    SKIP <- %s*
  ]])})

  eq({match('. ',[[
    G <- {`,`}
    SKIP <- %s*
  ]])}, {match('. ',[[
    G <- {`,`} SKIP
    SKIP <- %s*
  ]])})

  eq({match('0',[[
    G <- {patt}~?
    patt <- %d
  ]])}, {match('0',[[
    G <- {patt} / ''->tofalse
    patt <- %d
  ]])})

  eq({match('a',[[
    G <- {patt}~?
    patt <- %d
  ]])}, {match('a',[[
    G <- {patt} / ''->tofalse
    patt <- %d
  ]])})

  eq({match('\n',[[
    %cn
  ]])}, {match('\n',[[
    %nl
  ]])})

  eq({match('',[[
    $'string'
  ]])}, {match('',[[
    ''->'string'
  ]])})

  eq({match('x',[[
    G <- @'string' @rule
    rule <- .
  ]])}, {match('x',[[
    G <- 'string'^Expected_string rule^Expected_rule
    rule <- .
  ]])})
end)

it("calcline", function()
  expect.fail(function() lpegrex.calcline("a", -1) end, "invalid position")

  eq({1, 0, "a", 1, 1}, {lpegrex.calcline("a", 0)})
  eq({1, 1, "a", 1, 1}, {lpegrex.calcline("a", 1)})
  eq({1, 1, "a", 1, 1}, {lpegrex.calcline("a", 2)})

  eq({1, 0, "ab", 1, 2}, {lpegrex.calcline("ab", 0)})
  eq({1, 1, "ab", 1, 2}, {lpegrex.calcline("ab", 1)})
  eq({1, 2, "ab", 1, 2}, {lpegrex.calcline("ab", 2)})
  eq({1, 2, "ab", 1, 2}, {lpegrex.calcline("ab", 3)})

  eq({1, 0, "a", 1, 1}, {lpegrex.calcline("a\n", 0)})
  eq({1, 1, "a", 1, 1}, {lpegrex.calcline("a\n", 1)})
  eq({2, 0, "", 3, 2}, {lpegrex.calcline("a\n", 2)})

  eq({1, 0, "a", 1, 1}, {lpegrex.calcline("a\nb", 0)})
  eq({1, 1, "a", 1, 1}, {lpegrex.calcline("a\nb", 1)})
  eq({2, 0, "b", 3, 3}, {lpegrex.calcline("a\nb", 2)})
  eq({2, 1, "b", 3, 3}, {lpegrex.calcline("a\nb", 3)})

  eq({1, 0, "", 1, 0}, {lpegrex.calcline("\n", 0)})
  eq({2, 0, "", 2, 1}, {lpegrex.calcline("\n", 1)})
  eq({2, 0, "", 2, 1}, {lpegrex.calcline("\n", 2)})

  eq({1, 0, "", 1, 0}, {lpegrex.calcline("\n\n", 0)})
  eq({2, 0, "", 2, 1}, {lpegrex.calcline("\n\n", 1)})
  eq({3, 0, "", 3, 2}, {lpegrex.calcline("\n\n", 2)})
  eq({3, 0, "", 3, 2}, {lpegrex.calcline("\n\n", 3)})

  local text = [[some
long

text
]]
  eq({ 1, 0, "some", 1, 4 }, {lpegrex.calcline(text, 0)})
  eq({ 1, 1, "some", 1, 4 }, {lpegrex.calcline(text, 1)})
  eq({ 1, 2, "some", 1, 4 }, {lpegrex.calcline(text, 2)})
  eq({ 1, 3, "some", 1, 4 }, {lpegrex.calcline(text, 3)})
  eq({ 1, 4, "some", 1, 4 }, {lpegrex.calcline(text, 4)})
  eq({ 2, 0, "long", 6, 9 }, {lpegrex.calcline(text, 5)})
  eq({ 2, 1, "long", 6, 9 }, {lpegrex.calcline(text, 6)})
  eq({ 2, 2, "long", 6, 9 }, {lpegrex.calcline(text, 7)})
  eq({ 2, 3, "long", 6, 9 }, {lpegrex.calcline(text, 8)})
  eq({ 2, 4, "long", 6, 9 }, {lpegrex.calcline(text, 9)})
  eq({ 3, 0, "", 11, 10 }, {lpegrex.calcline(text, 10)})
  eq({ 4, 0, "text", 12, 15 }, {lpegrex.calcline(text, 11)})
  eq({ 4, 1, "text", 12, 15 }, {lpegrex.calcline(text, 12)})
  eq({ 4, 2, "text", 12, 15 }, {lpegrex.calcline(text, 13)})
  eq({ 4, 3, "text", 12, 15 }, {lpegrex.calcline(text, 14)})
  eq({ 4, 4, "text", 12, 15 }, {lpegrex.calcline(text, 15)})
  eq({ 5, 0, "", 17, 16 }, {lpegrex.calcline(text, 16)})
  eq({ 5, 0, "", 17, 16 }, {lpegrex.calcline(text, 17)})
end)

end)

--------------------------------------------------------------------------------
/tests/test.lua:
--------------------------------------------------------------------------------
local lester = require 'tests.lester'

require 'tests.lpegrex-test'
require 'tests.c11-test'
require 'tests.json-test'

lester.report()
--------------------------------------------------------------------------------