PEG.js is currently maintained by Futago-za Ryuu. Since its inception in 2010, PEG.js was maintained by David Majda (@dmajda) until May 2017.

The Bower package is maintained by Michel Krämer (@michelkraemer).

You are welcome to contribute code. Unless your contribution is really trivial, you should get in touch with me first; this can prevent wasted effort on both sides. You can send code either as a patch or as a GitHub pull request.

Note that PEG.js is still very much a work in progress. There are no compatibility guarantees until version 1.0.

To use the pegjs command, install PEG.js globally:

$ npm install -g pegjs

To use the JavaScript API, install PEG.js locally:

$ npm install pegjs

If you need both the pegjs command and the JavaScript API, install PEG.js both ways.

To use PEG.js in the browser, download the PEG.js library (regular or minified version) or install it using Bower:

$ bower install pegjs

PEG.js generates a parser from a grammar that describes the expected input and can specify what the parser returns (using semantic actions on matched parts of the input). The generated parser itself is a JavaScript object with a simple API.

To generate a parser from your grammar, use the pegjs command:

$ pegjs arithmetics.pegjs

This writes the parser source code into a file with the same name as the grammar file but with a “.js” extension. You can also specify the output file explicitly:

$ pegjs -o arithmetics-parser.js arithmetics.pegjs

If you omit both the input and output files, standard input and output are used.

By default, the generated parser is in the Node.js module format. You can override this using the --format option.

You can tweak the generated parser with several options:
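
--allowed-start-rules
    Comma-separated list of rules the generated parser will be allowed to start parsing from (default: the first rule in the grammar).
--cache
    Makes the generated parser cache results.
--dependency
    Makes the generated parser require a specified dependency (can be specified multiple times).
--export-var
    Name of a global variable into which the parser object is assigned when no module loader is detected.
--extra-options
    Additional options (in JSON format as an object) to pass to peg.generate.
--extra-options-file
    File with additional options (in JSON format as an object) to pass to peg.generate.
--format
    Format of the generated parser: amd, commonjs, globals, umd (default: commonjs).
--optimize
    Selects between optimizing the generated parser for parsing speed (speed) or code size (size) (default: speed).
--plugin
    Makes PEG.js use a specified plugin (can be specified multiple times).
--trace
    Makes the parser trace its progress.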

In Node.js, require the PEG.js parser generator module:

var peg = require("pegjs");

In the browser, include the PEG.js library in your web page or application using the <script> tag. If PEG.js detects an AMD loader, it will define itself as a module; otherwise, the API will be available in the peg global object.

To generate a parser, call the peg.generate method and pass your grammar as a parameter:

var parser = peg.generate("start = ('a' / 'b')+");

The method will return the generated parser object or its source code as a string (depending on the value of the output option; see below). It will throw an exception if the grammar is invalid. The exception will contain a message property with more details about the error.

You can tweak the generated parser by passing a second parameter with an options object to peg.generate. The following options are supported:

allowedStartRules
    Rules the parser will be allowed to start parsing from (default: the first rule in the grammar).
cache
    If true, makes the parser cache results, avoiding exponential parsing time in pathological cases but making the parser slower (default: false).
dependencies
    Parser dependencies. The value is an object that maps variables used to access the dependencies in the parser to module IDs used to load them. Valid only when format is set to "amd", "commonjs", or "umd" (default: {}).
exportVar
    Name of a global variable into which the parser object is assigned when no module loader is detected. Valid only when format is set to "globals" or "umd" (default: null).
format
    Format of the generated parser ("amd", "bare", "commonjs", "globals", or "umd"). Valid only when output is set to "source" (default: "bare").
optimize
    Whether to optimize the generated parser for parsing speed ("speed") or code size ("size") (default: "speed").
output
    If set to "parser", the method will return the generated parser object; if set to "source", it will return the parser source code as a string (default: "parser").
plugins
    Plugins to use.
trace
    Makes the parser trace its progress (default: false).

Using the generated parser is simple: just call its parse method and pass an input string as a parameter. The method will return a parse result (the exact value depends on the grammar used to generate the parser) or throw an exception if the input is invalid. The exception will contain location, expected, found and message properties with more details about the error.

parser.parse("abba"); // returns ["a", "b", "b", "a"]

parser.parse("abcd"); // throws an exception
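
Because the exception carries a location property, it is easy to point at the offending part of the input. The sketch below is a hypothetical helper, formatParseError, that is not part of PEG.js. It assumes that location has start.line and start.column fields (one-based), like the location objects described later in this document, and that the input uses "\n" line endings.

```javascript
// Hypothetical helper (not part of PEG.js): format an exception thrown by a
// generated parser's parse method as "line:column: message", followed by the
// offending input line and a caret. Assumes "\n" line endings and a location
// object of the form { start: { offset, line, column }, end: { ... } }.
function formatParseError(error, input) {
  const { line, column } = error.location.start; // both one-based
  const lineText = input.split("\n")[line - 1] || "";
  const caret = " ".repeat(column - 1) + "^";
  return line + ":" + column + ": " + error.message + "\n" + lineText + "\n" + caret;
}

// Typical use, assuming `parser` was created with peg.generate:
//
//   try {
//     parser.parse(input);
//   } catch (e) {
//     if (e.location) console.error(formatParseError(e, input));
//     else throw e;
//   }
```

The helper only reads the documented message and location properties, so it works the same way whatever grammar the parser was generated from.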

You can tweak parser behavior by passing a second parameter with an options object to the parse method. The following options are supported:

startRule
    Name of the rule to start parsing from.
tracer
    Tracer to use.

Parsers can also support their own custom options.

The grammar syntax is similar to JavaScript in that it is not line-oriented and ignores whitespace between tokens. You can also use JavaScript-style comments (// ... and /* ... */).

Let's look at an example grammar that recognizes simple arithmetic expressions like 2*(3+4). A parser generated from this grammar computes their values.

start
  = additive

additive
  = left:multiplicative "+" right:additive { return left + right; }
  / multiplicative

multiplicative
  = left:primary "*" right:multiplicative { return left * right; }
  / primary

primary
  = integer
  / "(" additive:additive ")" { return additive; }

integer "integer"
  = digits:[0-9]+ { return parseInt(digits.join(""), 10); }

On the top level, the grammar consists of rules (in our example, there are five of them). Each rule has a name (e.g. integer) that identifies the rule, and a parsing expression (e.g. digits:[0-9]+ { return parseInt(digits.join(""), 10); }) that defines a pattern to match against the input text and possibly contains some JavaScript code that determines what happens when the pattern matches successfully. A rule can also contain a human-readable name that is used in error messages (in our example, only the integer rule has a human-readable name). The parsing starts at the first rule, which is also called the start rule.
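
To make it concrete what a parser generated from this grammar computes, here is a hand-written recursive-descent evaluator that mirrors the five rules one function per rule. It is a hypothetical illustration, not code produced by PEG.js. The names evaluate and FAIL are made up for this sketch. Each ordered choice is modeled by saving the position, trying the first alternative, and rewinding before trying the next one.

```javascript
// Hypothetical hand-written recursive-descent evaluator that mirrors the
// arithmetic grammar rule by rule. It is NOT code generated by PEG.js.
const FAIL = Symbol("fail"); // sentinel for "this expression did not match"

function evaluate(input) {
  let pos = 0; // current parse position (zero-based offset)

  // "literal": match an exact string; on failure leave pos unchanged
  function literal(s) {
    if (input.startsWith(s, pos)) {
      pos += s.length;
      return s;
    }
    return FAIL;
  }

  // additive = left:multiplicative "+" right:additive { return left + right; }
  //          / multiplicative
  function additive() {
    const saved = pos;
    const left = multiplicative();
    if (left !== FAIL && literal("+") !== FAIL) {
      const right = additive();
      if (right !== FAIL) return left + right; // the action
    }
    pos = saved; // first alternative failed: rewind, try the second one
    return multiplicative();
  }

  // multiplicative = left:primary "*" right:multiplicative { return left * right; }
  //                / primary
  function multiplicative() {
    const saved = pos;
    const left = primary();
    if (left !== FAIL && literal("*") !== FAIL) {
      const right = multiplicative();
      if (right !== FAIL) return left * right;
    }
    pos = saved;
    return primary();
  }

  // primary = integer / "(" additive:additive ")" { return additive; }
  function primary() {
    const value = integer();
    if (value !== FAIL) return value;
    const saved = pos;
    if (literal("(") !== FAIL) {
      const inner = additive();
      if (inner !== FAIL && literal(")") !== FAIL) return inner;
    }
    pos = saved;
    return FAIL;
  }

  // integer "integer" = digits:[0-9]+ { return parseInt(digits.join(""), 10); }
  function integer() {
    const digits = [];
    while (pos < input.length && input[pos] >= "0" && input[pos] <= "9") {
      digits.push(input[pos]);
      pos += 1;
    }
    return digits.length > 0 ? parseInt(digits.join(""), 10) : FAIL;
  }

  // start = additive; a successful parse must also consume the whole input
  const result = additive();
  if (result === FAIL || pos !== input.length) {
    throw new SyntaxError("Unexpected input at offset " + pos);
  }
  return result;
}

console.log(evaluate("2*(3+4)")); // 14
console.log(evaluate("2+3*4")); // 14
```

Note that whenever the "+" or "*" branch fails, the sketch parses the same rule again from the saved position. Repeated work of this kind is what the cache option described above is meant to avoid, at the cost of a slower parser in the common case.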

A rule name must be a JavaScript identifier. It is followed by an equality sign (“=”) and a parsing expression. If the rule has a human-readable name, it is written as a JavaScript string between the name and the separating equality sign. Rules need to be separated only by whitespace (their beginning is easily recognizable), but a semicolon (“;”) after the parsing expression is allowed.

The first rule can be preceded by an initializer: a piece of JavaScript code in curly braces (“{” and “}”). This code is executed before the generated parser starts parsing. All variables and functions defined in the initializer are accessible in rule actions and semantic predicates. The code inside the initializer can access options passed to the parser using the options variable. Curly braces in the initializer code must be balanced. Let's look at the example grammar from above, rewritten to use a simple initializer.

{
  function makeInteger(o) {
    return parseInt(o.join(""), 10);
  }
}

start
  = additive

additive
  = left:multiplicative "+" right:additive { return left + right; }
  / multiplicative

multiplicative
  = left:primary "*" right:multiplicative { return left * right; }
  / primary

primary
  = integer
  / "(" additive:additive ")" { return additive; }

integer "integer"
  = digits:[0-9]+ { return makeInteger(digits); }

The parsing expressions of the rules are used to match the input text to the grammar. There are various types of expressions: matching characters or character classes, indicating optional parts and repetition, etc. Expressions can also contain references to other rules. See the detailed description below.
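
If an expression successfully matches a part of the text when running the generated parser, it produces a match result, which is a JavaScript value. For example, matching a literal string produces a JavaScript string containing the matched text, and matching repeated occurrences of some subexpression produces a JavaScript array with all the matches.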

The match results propagate through the rules when the rule names are used in expressions, up to the start rule. The generated parser returns the start rule's match result when parsing is successful.

One special case of parser expression is a parser action: a piece of JavaScript code inside curly braces (“{” and “}”) that takes the match results of some of the preceding expressions and returns a JavaScript value. This value is considered the match result of the preceding expression (in other words, the parser action is a match result transformer).

In our arithmetic example, there are many parser actions. Consider the action in the expression digits:[0-9]+ { return parseInt(digits.join(""), 10); }. It takes the match result of the expression [0-9]+, which is an array of strings containing digits, as its parameter. It joins the digits together into a single string and converts it to a JavaScript number.
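
The transformation performed by this action can be reproduced outside any parser. The snippet below is a small sketch that hard-codes the array [0-9]+ would produce for the input "123". The variable names are illustrative only.

```javascript
// What the action in digits:[0-9]+ { ... } computes, outside any parser.
// For the input "123", [0-9]+ yields one single-character string per digit.
const digits = ["1", "2", "3"];

const joined = digits.join(""); // "123", still a string
const value = parseInt(joined, 10); // 123, a primitive number

console.log(typeof joined, typeof value); // string number
```

Passing the radix 10 explicitly keeps parseInt from guessing the base of the digit string.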

There are several types of parsing expressions, some of them containing subexpressions and thus forming a recursive structure:

"literal"
'literal'
    Match the exact literal string and return it. The string syntax is the same as in JavaScript. Appending i right after the literal makes the match case-insensitive.

. (dot)
    Match exactly one character and return it as a string.

[characters]
    Match one character from a set and return it as a string. The characters in the list can be escaped in exactly the same way as in a JavaScript string. The list of characters can also contain ranges (e.g. [a-z] means “all lowercase letters”). Preceding the characters with ^ inverts the matched set (e.g. [^a-z] means “all characters but lowercase letters”). Appending i right after the closing bracket makes the match case-insensitive.

rule
    Match a parsing expression of a rule recursively and return its match result.

( expression )
    Match a subexpression and return its match result.

expression *
    Match zero or more repetitions of the expression and return their match results in an array. The matching is greedy, i.e. the parser tries to match the expression as many times as possible. Unlike in regular expressions, there is no backtracking.

expression +
    Match one or more repetitions of the expression and return their match results in an array. The matching is greedy, i.e. the parser tries to match the expression as many times as possible. Unlike in regular expressions, there is no backtracking.

expression ?
    Try to match the expression. If the match succeeds, return its match result; otherwise return null. Unlike in regular expressions, there is no backtracking.

& expression
    Try to match the expression. If the match succeeds, just return undefined and do not consume any input; otherwise consider the match failed.

! expression
    Try to match the expression. If the match does not succeed, just return undefined and do not consume any input; otherwise consider the match failed.

& { predicate }
    The predicate is a piece of JavaScript code that is executed as if it were inside a function. It gets the match results of labeled expressions in the preceding expression as its arguments. It should return some JavaScript value using the return statement. If the returned value evaluates to true in a boolean context, just return undefined and do not consume any input; otherwise consider the match failed.

    The code inside the predicate can access all variables and functions defined in the initializer at the beginning of the grammar.

    The code inside the predicate can also access location information using the location function. It returns an object like this:

        {
          start: { offset: 23, line: 5, column: 6 },
          end: { offset: 23, line: 5, column: 6 }
        }

    The start and end properties both refer to the current parse position. The offset property contains an offset as a zero-based index, and the line and column properties contain a line and a column as one-based indices.

    The code inside the predicate can also access options passed to the parser using the options variable.

    Note that curly braces in the predicate code must be balanced.

! { predicate }
    The predicate is a piece of JavaScript code that is executed as if it were inside a function. It gets the match results of labeled expressions in the preceding expression as its arguments. It should return some JavaScript value using the return statement. If the returned value evaluates to false in a boolean context, just return undefined and do not consume any input; otherwise consider the match failed.

    The code inside the predicate can access all variables and functions defined in the initializer at the beginning of the grammar.

    The code inside the predicate can also access location information using the location function. It returns an object like this:

        {
          start: { offset: 23, line: 5, column: 6 },
          end: { offset: 23, line: 5, column: 6 }
        }

    The start and end properties both refer to the current parse position. The offset property contains an offset as a zero-based index, and the line and column properties contain a line and a column as one-based indices.

    The code inside the predicate can also access options passed to the parser using the options variable.

    Note that curly braces in the predicate code must be balanced.

$ expression
    Try to match the expression. If the match succeeds, return the matched text instead of the match result.

label : expression
    Match the expression and remember its match result under the given label. The label must be a JavaScript identifier.

    Labeled expressions are useful together with actions, where saved match results can be accessed by the action's JavaScript code.

expression1 expression2 ... expressionN
    Match a sequence of expressions and return their match results in an array.

expression { action }
    Match the expression. If the match is successful, run the action; otherwise consider the match failed.

    The action is a piece of JavaScript code that is executed as if it were inside a function. It gets the match results of labeled expressions in the preceding expression as its arguments. The action should return some JavaScript value using the return statement. This value is considered the match result of the preceding expression.

    To indicate an error, the code inside the action can invoke the expected function, which makes the parser throw an exception. The function takes two parameters: a description of what was expected at the current position and optional location information (the default is what location would return; see below). The description will be used as part of the message of the thrown exception.

    The code inside an action can also invoke the error function, which also makes the parser throw an exception. The function takes two parameters: an error message and optional location information (the default is what location would return; see below). The message will be used by the thrown exception.

    The code inside the action can access all variables and functions defined in the initializer at the beginning of the grammar. Curly braces in the action code must be balanced.

    The code inside the action can also access the text matched by the expression using the text function.

    The code inside the action can also access location information using the location function. It returns an object like this:

        {
          start: { offset: 23, line: 5, column: 6 },
          end: { offset: 25, line: 5, column: 8 }
        }

    The start property refers to the position at the beginning of the expression, and the end property refers to the position after the end of the expression. The offset property contains an offset as a zero-based index, and the line and column properties contain a line and a column as one-based indices.

    The code inside the action can also access options passed to the parser using the options variable.

    Note that curly braces in the action code must be balanced.

expression1 / expression2 / ... / expressionN
    Try to match the first expression; if it does not succeed, try the second one, etc. Return the match result of the first successfully matched expression. If no expression matches, consider the match failed.
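
The no-backtracking, ordered-choice and lookahead rules above differ from regular expressions in ways that are easiest to see in running code. Below is a minimal sketch of PEG-style parser combinators in plain JavaScript. It is a hypothetical illustration, not how PEG.js implements or generates parsers. Every name (lit, any, seq, choice, star, opt, and, not, text, run) is made up here, and each parser is assumed to be a function of (input, pos) that returns { value, pos } on success or null on failure.

```javascript
// Hypothetical PEG-style combinators. A parser is (input, pos) => result,
// where result is { value, pos } on success or null on failure.

// "literal": exact string match
const lit = (s) => (input, pos) =>
  input.startsWith(s, pos) ? { value: s, pos: pos + s.length } : null;

// . : any single character (one UTF-16 code unit in this sketch)
const any = (input, pos) =>
  pos < input.length ? { value: input[pos], pos: pos + 1 } : null;

// expression1 expression2 ... : every part must match; results in an array
const seq = (...parsers) => (input, pos) => {
  const values = [];
  for (const p of parsers) {
    const r = p(input, pos);
    if (r === null) return null;
    values.push(r.value);
    pos = r.pos;
  }
  return { value: values, pos };
};

// expression1 / expression2 / ... : the first success wins; later
// alternatives are never tried once one has matched
const choice = (...parsers) => (input, pos) => {
  for (const p of parsers) {
    const r = p(input, pos);
    if (r !== null) return r;
  }
  return null;
};

// expression * : greedy, and it never gives back a repetition it consumed
const star = (p) => (input, pos) => {
  const values = [];
  for (;;) {
    const r = p(input, pos);
    if (r === null || r.pos === pos) break; // stop on failure or empty match
    values.push(r.value);
    pos = r.pos;
  }
  return { value: values, pos };
};

// expression ? : the match result, or null without consuming input
const opt = (p) => (input, pos) => p(input, pos) || { value: null, pos };

// & expression and ! expression : lookahead that never consumes input
const and = (p) => (input, pos) =>
  p(input, pos) !== null ? { value: undefined, pos } : null;
const not = (p) => (input, pos) =>
  p(input, pos) === null ? { value: undefined, pos } : null;

// $ expression : the matched text instead of the match result
const text = (p) => (input, pos) => {
  const r = p(input, pos);
  return r === null ? null : { value: input.slice(pos, r.pos), pos: r.pos };
};

// Succeed only if the parser consumes the whole input.
const run = (p, input) => {
  const r = p(input, 0);
  return r !== null && r.pos === input.length ? r : null;
};

// "a"* "a" can never match: the star takes every "a" and keeps it.
console.log(run(seq(star(lit("a")), lit("a")), "aaa")); // null

// "a" / "ab" commits to "a", so the input "ab" is not fully consumed ...
console.log(run(choice(lit("a"), lit("ab")), "ab")); // null
// ... while trying the longer alternative first succeeds.
console.log(run(choice(lit("ab"), lit("a")), "ab").value); // ab

// !"b" . matches any one character except "b".
console.log(run(seq(not(lit("b")), any), "b")); // null

// $("a" "a"*) returns the matched text rather than nested arrays.
console.log(run(text(seq(lit("a"), star(lit("a")))), "aaa").value); // aaa
```

One simplification: star stops on a match that consumes nothing, so that the loop always terminates. The sketch makes no attempt to reproduce how PEG.js itself treats such expressions.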

Both the parser generator and generated parsers should run well in the following environments:

PEG.js is a simple parser generator for JavaScript that produces fast parsers with excellent error reporting. You can use it to process complex data or computer languages and build transformers, interpreters, compilers and other tools easily.