Description
This module defines a handful of operations which can be applied to LPeg patterns and grammars in general.

Dependencies

- LPeg.
Operations
Piping

Pattern matching dissociates the notion of matching from the notion of capturing: matching checks if a given string follows a certain pattern, and capturing generates values according to the match made. This division allows interesting possibilities:

- different problems can be solved by applying different captures to the same grammar;
- captures may be defined separately;
- captures may be done on top of other captures.
Completing

With piping, several levels of captures can be chained together up to the most appropriate for the task at hand. Yet some levels might require extra rules, and modifications to existing ones, to ensure proper matching.

To avoid manual copying, the new grammar should redefine only the necessary rules, copying the rest from the older grammar. This action is dubbed completing.

Applying

Once a new rule set is created and completed, and all captures are correctly piped, all that's left is to put them together, a process called applying. The result is a grammar ready for lpeg.P consumption, whose pattern will return the intended result when a match is made.

Example
Let's consider the problem of documenting a Lua module. In this case, comments must be captured before every function declaration when in the outermost scope:

    -- the code to parse
    subject = [[
    -- Calculates the sum a+b.
    -- An extra line.
    function sum (a, b)
      -- code
    end

    -- f1: assume a variable assignment is not a proper declaration for an
    -- exported function
    f1 = function ()
      -- code
    end

    while true do
      -- this function is not in the outermost scope
      function aux() end
    end

    function something:other(a, ...)
      -- a global function without comments
    end
    ]]

In the code above only sum and something:other should be documented, as f1 isn't properly (by our standards) declared and aux is not in the outermost scope.
By combining LPeg and the modules parser and grammar, this specific problem can be solved as follows:

    -- ye olde imports
    local parser, grammar = require 'leg.parser', require 'leg.grammar'
    local lpeg = require 'lpeg'

    -- a little aliasing never hurt anyone
    local P, V = lpeg.P, lpeg.V

    -- change only the initial rule and make no captures
    patt = grammar.apply(parser.rules, parser.COMMENT^-1 * V'GlobalFunction', nil)

    -- transform the new grammar into an LPeg pattern
    patt = P(patt)

    -- making a pattern that matches any Lua statement, also without captures
    Stat = P( grammar.apply(parser.rules, V'Stat', nil) )

    -- a pattern which matches function declarations and skips statements in
    -- inner scopes or undesired tokens
    patt = (patt + Stat + parser.ANY)^0

    -- matching a string
    patt:match(subject)

These are the relevant rules in the grammar:

    GlobalFunction = 'function' * FuncName * FuncBody
    FuncName       = ID * ('.' * ID)^0 * (':' * ID)^-1
    FuncBody       = '(' * (ParList + EPSILON) * ')' * Block * 'end'
    ParList        = NameList * (',' * '...')^-1
    NameList       = ID * (',' * ID)^0
    ID             = parser.IDENTIFIER
    EPSILON        = P(true)

It may seem that ParList + EPSILON could be replaced by ParList^-1 (optionally match ParList), but then no captures would be made for empty parameter lists, and GlobalFunction would get all strings matched by FuncBody. The EPSILON rule thus acts as a placeholder in the argument list, avoiding any argument list processing in the capture function.
Since no captures are being made, lpeg.match doesn't return anything interesting. Here are some possible captures:

    -- some interesting captures bundled up in a table. Note that the table keys
    -- match the grammar rules we want to add captures to. Whatever rules aren't in
    -- the rules table below will come from parser.rules .
    captures = {
      [1] = function (...) -- the initial rule
        return '<function>'..table.concat{...}..'</function>'
      end,

      GlobalFunction = function (name, parlist)
        return '<name>'..name..'</name><parlist>'..(parlist or '')..'</parlist>'
      end,

      FuncName = grammar.C, -- capture the raw text
      ParList  = grammar.C, -- capture the raw text
      COMMENT  = parser.comment2text, -- remove the comment trappings
    }

    -- spacing rule
    local S = parser.SPACE ^ 0

    -- rules table
    rules = {
      [1]     = ((V'COMMENT' *S) ^ 0) *S* V'GlobalFunction',
      COMMENT = parser.COMMENT,
    }

    -- building the new grammar and adding the captures
    patt = P( grammar.apply(parser.rules, rules, captures) )

    -- a pattern that matches a sequence of patts and concatenates the results
    patt = (patt + Stat + parser.ANY)^0 / function(...)
      return table.concat({...}, '\n\n') -- some line breaks for easier reading
    end

    -- finally, matching a string
    print(patt:match(subject))

FuncBody needs no captures, as Block and all its non-terminals have none; it just needs to pass along any captures made by ParList. NameList and ID also have no captures, and the whole subject string is passed further.

The printed result is:

    <function>Calculates the sum a+b. An extra line.<name>sum</name><parlist>a, b</parlist></function>

    <function><name>something:other</name><parlist>a, ...</parlist></function>
Functions

- anyOf (t): Returns a pattern which matches any of the patterns in t.
- anywhere (patt): Returns a pattern which searches for the pattern patt anywhere in a string.
- apply (grammar, rules, captures): Completes rules with grammar and then applies captures.
- C (): A capture function, made so that patt / C is equivalent to m.C(patt). It's intended to be used in capture tables, such as those required by pipe and apply.
- complete (dest, orig): Completes dest with orig.
- copy (grammar): Creates a shallow copy of grammar.
- Ct (): A capture function, made so that patt / Ct is equivalent to m.Ct(patt). It's intended to be used in capture tables, such as those required by pipe and apply.
- listOf (patt, sep): Returns a pattern which matches a list of patts, separated by sep.
- oneOf (list): Returns a pattern which matches any of the patterns in list.
- pipe (dest, orig): Pipes the captures in orig to the ones in dest.
- pmatch (patt): Returns a pattern which simply fails to match if an error is thrown during the matching.

anyOf (t)

Returns a pattern which matches any of the patterns in t.

The iterator pairs is used to traverse t, so no particular traversal order is guaranteed. Use oneOf to ensure sequential matching attempts.

Example:

    local g, p, m = require 'leg.grammar', require 'leg.parser', require 'lpeg'

    -- match numbers or operators, capture the numbers
    print( (g.anyOf { '+', '-', '*', '/', m.C(p.NUMBER) }):match '34.5@23 * 56 / 45 - 45' )
    --> prints 34.5

Parameters:

- t: a table with LPeg patterns as values. The keys are ignored.

Return value:

- a pattern which matches any of the patterns received.

anywhere (patt)

Returns a pattern which searches for the pattern patt anywhere in a string.

This code was extracted from the LPeg home page, in the examples section.

Parameters:

- patt: an LPeg pattern.

Return value:

- an LPeg pattern which searches for patt anywhere in the string.
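
As a rough illustration of how such a search can be built, the sketch below follows the recursive-grammar idiom from the LPeg manual: try patt at the current position and, failing that, skip one character and try again. It assumes the lpeg module is installed. The local function anywhere is a stand-in written for this example and is not claimed to be identical to the code in leg.grammar.

```lua
local lpeg = require 'lpeg'

-- Stand-in for grammar.anywhere (an assumption, not the module's actual source):
-- rule 1 either matches patt here or consumes one character and recurses.
local function anywhere(patt)
  return lpeg.P { lpeg.P(patt) + 1 * lpeg.V(1) }
end

-- With no captures, match returns the position right after the match.
local found = anywhere('world'):match('hello world')
local missing = anywhere('xyz'):match('hello world')
print(found, missing)  -- prints 12 and nil
```

Because the grammar has no captures, a successful search only reports where the match ended; wrapping patt in a capture such as lpeg.C would return the matched text instead.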

apply (grammar, rules, captures)

Completes rules with grammar and then applies captures.

rules can either be:

- a single pattern, which is taken to be the new initial rule,
- a possibly incomplete LPeg grammar table, as per complete, or
- nil, which means no new rules are added.

captures can either be:

- a capture table, as per pipe, or
- nil, which means no captures are applied.

Parameters:

- grammar: the old grammar. It stays unmodified.
- rules: optional, the new rules.
- captures: optional, the final capture table.

Return value:

- rules, suitably augmented by grammar and captures.

C ()

A capture function, made so that patt / C is equivalent to m.C(patt). It's intended to be used in capture tables, such as those required by pipe and apply.

complete (dest, orig)

Completes dest with orig.

Parameters:

- dest: the new grammar. Must be a table.
- orig: the original grammar. Must be a table.

Return value:

- dest, with new rules inherited from orig.
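
The behaviour described for complete can be sketched in plain Lua. This is a minimal illustration under stated assumptions, not the module's source: the local function complete is a hypothetical stand-in, and strings take the place of LPeg patterns so the example needs no dependencies.

```lua
-- Hypothetical stand-in for grammar.complete: copy into dest every rule of
-- orig that dest does not already define, then return dest.
local function complete(dest, orig)
  for rule, patt in pairs(orig) do
    if dest[rule] == nil then
      dest[rule] = patt
    end
  end
  return dest
end

-- Strings stand in for LPeg patterns to keep the sketch dependency-free.
local old = { [1] = 'Chunk', Stat = 'old Stat', Exp = 'old Exp' }
local new = complete({ Stat = 'new Stat' }, old)

-- The redefined rule wins; everything else is inherited from the old grammar.
print(new[1], new.Stat, new.Exp)
```

Rules already present in dest are left alone, so only the rules that actually change need to be written out.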

copy (grammar)

Creates a shallow copy of grammar.

Parameters:

- grammar: a regular table.

Return value:

- a newly created table, with grammar's keys and values.

Ct ()

A capture function, made so that patt / Ct is equivalent to m.Ct(patt). It's intended to be used in capture tables, such as those required by pipe and apply.

listOf (patt, sep)

Returns a pattern which matches a list of patts, separated by sep.

Example: matching comma-separated values:

    local g, m = require 'leg.grammar', require 'lpeg'

    -- separator
    local sep = m.P',' + m.P'\n'

    -- element: anything but sep, capture it
    local elem = m.C((1 - sep)^0)

    -- pattern
    local patt = g.listOf(elem, sep)

    -- matching
    print( patt:match [[a, b, 'christmas eve'
    d, evening; mate!
    f]])
    --> prints out "a b 'christmas eve' d evening; mate! f"

Parameters:

- patt: an LPeg pattern.
- sep: an LPeg pattern.

Return value:

- the following pattern: patt * (sep * patt)^0

oneOf (list)

Returns a pattern which matches any of the patterns in list.

Unlike anyOf, this function ensures sequential traversal.

Parameters:

- list: a list of LPeg patterns.

Return value:

- a pattern which matches any of the patterns received.
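
Traversal order matters because LPeg's choice operator is ordered: the first alternative that matches is the one taken. The sketch below shows this with a hypothetical stand-in for oneOf that folds a list into an ordered choice. It assumes the lpeg module is installed, and the shape of the function is an assumption, not the module's source.

```lua
local lpeg = require 'lpeg'

-- Stand-in for grammar.oneOf (an assumption about its shape): fold the list
-- into an ordered choice, starting from a pattern that never matches.
local function oneOf(list)
  local patt = lpeg.P(false)
  for _, p in ipairs(list) do
    patt = patt + lpeg.P(p)
  end
  return patt
end

-- Ordered choice commits to the first alternative that matches.
local longFirst  = oneOf { 'ab', 'a' }:match('abc')
local shortFirst = oneOf { 'a', 'ab' }:match('abc')
print(longFirst, shortFirst)  -- prints 3 and 2
```

With the same two alternatives, only their order decides whether one or two characters are consumed, which is why an unordered traversal such as the one in anyOf is best kept to alternatives that cannot overlap.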

pipe (dest, orig)

Pipes the captures in orig to the ones in dest.

dest and orig should be tables, with each key storing a capture function. Each capture in dest will be altered to use the results of the matching one in orig as input, using function composition. Should orig possess keys not in dest, dest will copy them.

Parameters:

- dest: a capture table.
- orig: a capture table.

Return value:

- dest, suitably modified.
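
The composition described above can be sketched in plain Lua. This is a minimal illustration, assuming capture functions are ordinary Lua functions. The local function pipe is a hypothetical stand-in, not the module's source, and the Number and Name keys are made up for the example.

```lua
-- Hypothetical stand-in for grammar.pipe, following the description above.
local function pipe(dest, orig)
  for key, origCapture in pairs(orig) do
    local destCapture = dest[key]
    if destCapture then
      -- compose: dest's capture consumes whatever orig's capture returns
      dest[key] = function(...)
        return destCapture(origCapture(...))
      end
    else
      -- keys found only in orig are copied over unchanged
      dest[key] = origCapture
    end
  end
  return dest
end

local orig = { Number = tonumber, Name = string.upper }
local dest = pipe({ Number = function(n) return n * 2 end }, orig)

-- Number is now "convert, then double"; Name was inherited from orig.
print(dest.Number('21'), dest.Name('sum'))  -- prints 42 and SUM
```

Each level of captures therefore only needs to know the output of the level beneath it, not how that output was produced.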

pmatch (patt)

Returns a pattern which simply fails to match if an error is thrown during the matching.

One usage example is parser.NUMBER. Originally it threw an error when trying to match a malformed number (such as 1e23e4), since in this case the input is obviously invalid and the pattern would be part of the Lua grammar. So pmatch is used to catch the error and return nil (signalling a non-match) and the error message.

Parameters:

- patt: an LPeg pattern.

Return value:

- a pattern which catches any errors thrown during the matching and simply doesn't match instead of propagating the error.
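
The underlying idea can be sketched without LPeg by wrapping a matching function in pcall. Everything in this example is hypothetical: protect is not part of leg.grammar, and the toy number matcher only imitates the malformed-number case mentioned above.

```lua
-- Hypothetical helper, not part of leg.grammar: wraps a matcher function so
-- that an error raised while matching becomes a plain non-match.
local function protect(matcher)
  return function(subject, init)
    local ok, result = pcall(matcher, subject, init)
    if ok then
      return result
    end
    return nil, result  -- non-match plus the error message
  end
end

-- Toy matcher: returns the position after a number like 1e23, and raises an
-- error when a second exponent follows (as in 1e23e4).
local function number(subject, init)
  local first, last = subject:find('^%d+e%d+', init or 1)
  if not first then
    return nil
  end
  if subject:sub(last + 1, last + 1) == 'e' then
    error('malformed number', 0)
  end
  return last + 1
end

local safeNumber = protect(number)
print(safeNumber('1e23'))    -- prints 5
print(safeNumber('1e23e4'))  -- prints nil and "malformed number"
```

The caller sees an ordinary failed match and can try another alternative, while the error message remains available for reporting.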