├── .circleci
│   └── config.yml
├── .clj-kondo
│   └── config.edn
├── .gitattributes
├── .gitignore
├── CHANGES.md
├── LICENSE
├── README.md
├── docs
│   ├── ABNF.md
│   ├── ExperimentalFeatures.md
│   ├── Performance.md
│   └── Tracing.md
├── images
│   └── vizexample1.png
├── project.clj
├── resources
│   └── clj-kondo.exports
│       └── instaparse
│           └── config.edn
├── runner
│   └── cljs
│       └── runner
│           └── runner.cljs
├── src
│   └── instaparse
│       ├── abnf.cljc
│       ├── auto_flatten_seq.cljc
│       ├── cfg.cljc
│       ├── combinators.cljc
│       ├── combinators_source.cljc
│       ├── core.cljc
│       ├── failure.cljc
│       ├── gll.cljc
│       ├── line_col.cljc
│       ├── macros.clj
│       ├── print.cljc
│       ├── reduction.cljc
│       ├── repeat.cljc
│       ├── transform.cljc
│       ├── util.cljc
│       ├── viz.clj
│       └── viz.cljs
└── test
    ├── data
    │   ├── abnf_uri.txt
    │   ├── defparser_grammar.txt
    │   └── phone_uri.txt
    └── instaparse
        ├── abnf_test.cljc
        ├── auto_flatten_seq_test.cljc
        ├── core_test.cljc
        ├── defparser_test.cljc
        ├── failure_test.cljc
        ├── grammars.cljc
        ├── namespaced_nts_test.cljc
        ├── repeat_test.cljc
        ├── specs.cljc
        └── viz_test.clj
/.circleci/config.yml:
--------------------------------------------------------------------------------
1 | version: 2
2 |
3 | workflows:
4 | version: 2
5 | build:
6 | jobs:
7 | - test-clj
8 | - test-cljs
9 |
10 | jobs:
11 | test-clj:
12 | working_directory: ~/project
13 | docker:
14 | - image: circleci/clojure:lein-2.8.1
15 | steps:
16 | - checkout
17 | - run: lein check
18 | - run: lein test-all
19 | test-cljs:
20 | working_directory: ~/project
21 | docker:
22 | - image: circleci/clojure:lein-2.8.1-node
23 | steps:
24 | - checkout
25 | - run: lein test-cljs-all
--------------------------------------------------------------------------------
/.clj-kondo/config.edn:
--------------------------------------------------------------------------------
1 | {:config-paths ["../resources/clj-kondo.exports/instaparse"]}
2 |
--------------------------------------------------------------------------------
/.gitattributes:
--------------------------------------------------------------------------------
1 | * text=auto
2 | *.clj text
3 | *.md text
4 | *.png binary
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | /target
2 | /lib
3 | /classes
4 | /checkouts
5 | /bin
6 | /out
7 | deps.edn
8 | .cpcache
9 | .project
10 | .classpath
11 | pom.xml
12 | deps.edn
13 | *.jar
14 | *.class
15 | .lein-deps-sum
16 | .lein-failures
17 | .lein-plugins
18 | ideas.txt
19 | benchmarks.txt
20 | todo.txt
21 | /.settings
22 | .nrepl-port
23 | .lein-repl-history
24 | *~
25 | *#*#
26 | .cljs_node_repl/
27 | .idea/
28 | *.iml
29 | *.asc
30 | .nrepl-history
31 | /.clj-kondo
32 | !/.clj-kondo/config.edn
33 |
--------------------------------------------------------------------------------
/CHANGES.md:
--------------------------------------------------------------------------------
1 | # Instaparse Change Log
2 |
3 | ## 1.5.0
4 |
5 | ### Enhancements
6 |
7 | * instaparse.core/parser now accepts an optional keyword argument `:allow-namespaced-nts true` which accepts namespaced non-terminals in the parser's grammar, thus building a parser that will tag the output with the corresponding namespaced keywords.
8 |
9 | ## 1.4.14
10 |
11 | ### Enhancements
12 |
13 | * Now leverages clojurescript's implicit sugar for :require-macros, :include-macros, and :refer-macros in namespace declaration. Thanks to sumbach for the pull request!
14 |
15 | ## 1.4.13
16 |
17 | ### Enhancements
18 |
19 | * Added clj-kondo resource file. Thanks to toniz4 for the pull request!
20 | * Added new arity to add-line-and-column-info-to-metadata that supports starting-line and starting-column. Thanks to mainej for the pull request!
21 |
22 | ## 1.4.12
23 |
24 | ### Bugfixes
25 |
26 | * Instaparse error messages weren't pointing the caret at the right character when the text had tab characters. Thanks to ema-fox and seltzer1717 for the pull request.
27 |
28 | ## 1.4.11
29 |
30 | ### Bugfixes
31 |
32 | * Fixed problem where `:start` option wasn't being respected when grammar was provided as a file.
33 |
34 | ## 1.4.10
35 |
36 | ### Enhancements
37 |
38 | * Change to remove warning caused by latest version of Clojurescript, which warned about use of private var from tools.reader.
39 |
40 | * Added type hints to support native compilation under Graal.
41 |
42 | * Removed test case broken by Clojure 1.10.
43 |
44 | ## 1.4.9
45 |
46 | ### Enhancements
47 |
48 | * ABNF parsers' string case-insensitivity can now be disabled by setting `:string-ci false`.
49 |
50 | * `ebnf` and `abnf` combinators now support an optional `:string-ci` argument, which overrides the default case-insensitivity behavior for that input format.
51 |
52 | ### Bugfixes
53 |
54 | * Case-insensitive regexp flag on Clojurescript
55 |
56 | * Better handling for when rhizome is present in compilation environment, but not at runtime.
57 |
58 | ## 1.4.8
59 |
60 | ### Updates
61 |
62 | * Update to support Clojurescript 1.9.854 and above, due to a breaking change in Clojurescript to use tools.reader.
63 |
64 | ## 1.4.7
65 |
66 | ### Enhancements
67 |
68 | * `visualize` now supports `:output-file :buffered-image`, which returns a java.awt.image.BufferedImage object.
69 |
70 | ### Bugfixes
71 |
72 | * Fixed problem where `visualize` with `:output-file` didn't work on rootless trees.
73 |
74 | ## 1.4.6
75 |
76 | ### Performance improvements
77 |
78 | * Better performance for ABNF grammars in Clojurescript.
79 |
80 | ## 1.4.5
81 |
82 | ### Bugfixes
83 |
84 | * Fixed regression in 1.4.4 involving parsers based off of URIs.
85 |
86 | * defparser now supports the full range of relevant parser options.
87 |
88 | ## 1.4.4
89 |
90 | ### Enhancements
91 |
92 | * Instaparse is now cross-platform compatible between Clojure and Clojurescript.
93 |
94 | ### Features
95 |
96 | * defparser - builds parser at compile time
97 |
98 | ## 1.4.3
99 |
100 | ### Bugfixes
101 |
102 | * Fixed bug with insta/transform on tree with hidden root tag and strings at the top level of the tree.
103 |
104 | ## 1.4.2
105 |
106 | ### Bugfixes
107 |
108 | * Fixed problem with counted repetitions in ABNF.
109 |
110 | ## 1.4.1
111 |
112 | ### Features
113 |
114 | * New function `add-line-and-column-info-to-metadata` in the instaparse.core namespace.
115 |
116 | ### Enhancements
117 |
118 | * Added new combinators for unicode character ranges, for better portability to Clojurescript.
119 |
120 | ### Bugfixes
121 |
122 | * Improved compatibility with boot, which allows having multiple versions of Clojure on the classpath, by making change to string-reader which needs to
123 | be aware of what version of Clojure it is running due to a breaking change in Clojure 1.7.
124 |
125 | * Fixed bug with the way failure messages were printed in certain cases.
126 |
127 | ## 1.4.0
128 |
129 | ### Bugfixes
130 |
131 | * In 1.3.6, parsing of any CharSequence was introduced, however, the error messages
132 | for failed parses weren't printing properly. This has been fixed.
133 |
134 | * 1.4.0 uses a more robust algorithm for handling nested negative lookaheads, in
135 | response to a bug report where the existing mechanism produced incorrect parses
136 | (in addition to the correct parse) for a very unusual case.
137 |
138 | ### Enhancements
139 |
140 | * New support for tracing the steps the parser goes through. Call your parser with
141 | the optional flag `:trace true`. The first time you use this flag, it triggers a
142 | recompilation of the code with additional tracing and profiling steps.
143 | To restore the code to its non-instrumented form, call `(insta/disable-tracing!)`.
144 |
145 | ## 1.3.6
146 |
147 | ### Enhancements
148 |
149 | * Modified for compatibility with Clojure 1.7.0-alpha6
150 | * Instaparse now can parse anything supporting the CharSequence interface, not just strings.
151 | Specifically, this allows instaparse to operate on StringBuilder objects.
152 |
153 | ## 1.3.5
154 |
155 | ### Bugfixes
156 |
157 | * Fixed bug with `transform` on hiccup data structures with numbers or other atomic data as leaves.
158 |
159 | * Fixed bug with character concatenation support in ABNF grammar
160 |
161 | ### Enhancements
162 |
163 | * Added support for Unicode characters to ABNF.
164 |
165 | ## 1.3.4
166 |
167 | ### Enhancements
168 |
169 | * Modified for compatibility with Clojure 1.7.0-alpha2.
170 |
171 | ## 1.3.3
172 |
173 | ### Enhancements
174 |
175 | Made two changes to make it possible to use instaparse on Google App Engine.
176 |
177 | * Removed dependency on javax.swing.text.Segment class.
178 | * Added `:no-slurp true` keyword option to `insta/parser` to disable URI slurping behavior, since GAE does not support slurp.
179 |
180 | ## 1.3.2
181 |
182 | ### Bugfixes
183 |
184 | * Regular expressions on empty strings weren't properly returning a failure.
185 |
186 | ## 1.3.1
187 |
188 | ### Enhancements
189 |
190 | * Updated tests to use Clojure 1.6.0's final release.
191 | * Added `:string-ci true` flag to `insta/parser`.
192 |
193 | ## 1.3.0
194 |
195 | ### Compatibility with Clojure 1.6
196 |
197 | ## 1.2.16
198 |
199 | ### Bugfixes
200 |
201 | * Calling `empty` on a FlattenOnDemandVector now returns [].
202 |
203 | ## 1.2.15
204 |
205 | ### Enhancements
206 |
207 | * :auto-whitespace can now take the keyword :standard or :comma to access one of the predefined whitespace parsers.
208 |
209 | ### Bugfixes
210 |
211 | * Fixed newline problem visualizing parse trees on Linux.
212 | * Fixed problem with visualizing rootless trees.
213 |
214 | ## 1.2.11
215 |
216 | ### Minor enhancements
217 |
218 | * Further refinements to the way ordered choice interacts with epsilon parsers.
219 |
220 | ## 1.2.10
221 |
222 | ### Bugfixes
223 |
224 | * Fixed bug introduced by 1.2.9 affecting ordered choice.
225 |
226 | ## 1.2.9
227 |
228 | ### Bugfixes
229 |
230 | * Fixed bug where ordered choice was ignoring epsilon parser.
231 |
232 | ## 1.2.8
233 |
234 | ### Bugfixes
235 |
236 | * Fixed bug introduced by 1.2.7, affecting printing of grammars with regexes.
237 |
238 | ### Enhancements
239 |
240 | * Parser printing format now includes <> hidden information and tags.
241 |
242 | ## 1.2.7
243 |
244 | ### Bugfixes
245 |
246 | * Fixed bug when regular expression contains | character.
247 |
248 | ## 1.2.6
249 |
250 | ### Bugfixes
251 |
252 | * Changed pre-condition assertion for auto-whitespace option which was causing a problem with "lein jar".
253 |
254 | ## 1.2.5
255 |
256 | ### Bugfixes
257 |
258 | * Improved handling of unusual characters in ABNF grammars.
259 |
260 | ## 1.2.4
261 |
262 | ### Bugfixes
263 |
264 | * When parsing in :total mode with :enlive as the output format, changed the content of failure node from vector to list to match the rest of the enlive output.
265 |
266 | ## 1.2.3
267 |
268 | ### Bugfixes
269 |
270 | * Fixed problem when epsilon was the only thing in a nonterminal, e.g., "S = epsilon"
271 |
272 | ### Features
273 |
274 | * Added experimental `:auto-whitespace` feature. See the [Experimental Features Document](docs/ExperimentalFeatures.md) for more details.
275 |
276 | ## 1.2.2
277 |
278 | ### Bugfixes
279 |
280 | * Fixed reflection warning.
281 |
282 | ## 1.2.1
283 |
284 | ### Bugfixes
285 |
286 | * I had accidentally left a dependency on tools.trace in the repeat.clj file, used while I was debugging that namespace. Removed it.
287 |
288 | ## 1.2.0
289 |
290 | ### New Features
291 |
292 | * `span` function returns substring indexes into the parsed text for a portion of the parse tree.
293 | * `visualize` function draws the parse tree, using rhizome and graphviz if installed.
294 | * `:optimize :memory` flag that, for suitable parsers, will perform the parsing in discrete chunks, using less memory.
295 | * New parsing flag to undo the effect of the <> hide notation.
296 | + `(my-parser text :unhide :tags)` - reveals tags, i.e., `<>` applied on the left-hand sides of rules.
297 | + `(my-parser text :unhide :content)` - reveals content hidden on the right-hand side of rules with `<>`
298 | + `(my-parser text :unhide :all)` - reveals both tags and content.
299 |
300 | ### Notable Performance Improvements
301 |
302 | * Dramatic performance improvement (quadratic time reduced to linear) when repetition parsers (+ or *) operate on text whose parse tree contains a large number of repetitions.
303 | * Performance improvement for regular expressions.
304 |
305 | ### Minor Enhancements
306 |
307 | * Added more support to IncrementalVector for a wider variety of vector operations, including subvec, nth, and vec.
308 |
309 | ## 1.1.0
310 |
311 | ### Breaking Changes
312 |
313 | * When you run a parser in "total" mode, the failure node is no longer tagged with `:failure`, but instead is tagged with `:instaparse/failure`.
314 |
315 | ### New Features
316 |
317 | * Comments now supported in CFGs. Use (* and *) notation.
318 | * Added `ebnf` combinator to the `instaparse/combinators` namespace. This new combinator converts string specifications to the combinator-built equivalent. See combinator section of the updated tutorial for details.
319 | * ABNF: can now create a parser from a specification using `:input-format :abnf` for ABNF parser syntax.
320 | * New combinators related to ABNF:
321 | 1. `abnf` -- converts ABNF string fragments to combinators.
322 | 2. `string-ci` -- case-insensitive strings.
323 | 3. `rep` -- between m and n repetitions.
324 | * New core function related to ABNF:
325 | `set-default-input-format!` -- initially defaults to :ebnf
326 |
327 | ### Minor Enhancements
328 |
329 | * Added comments to regexes used by the parser that processes the context-free grammar syntax, improving the readability of error messages if you have a faulty grammar specification.
330 |
331 | ### Bug Fixes
332 |
333 | * Backslashes in front of quotation mark were escaping the quotation mark, even if the backslash itself was escaped.
334 | * Unescaped double-quote marks weren't properly handled, e.g., (parser "A = '\"'").
335 | * Nullable Plus: ((parser "S = ('a'?)+") "") previously returned a failure, now returns [:S]
336 | * Fixed problem with failure reporting that would occur if parse failed on an input that ended with a newline character.
337 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | THE ACCOMPANYING PROGRAM IS PROVIDED UNDER THE TERMS OF THIS ECLIPSE PUBLIC
2 | LICENSE ("AGREEMENT"). ANY USE, REPRODUCTION OR DISTRIBUTION OF THE PROGRAM
3 | CONSTITUTES RECIPIENT'S ACCEPTANCE OF THIS AGREEMENT.
4 |
5 | 1. DEFINITIONS
6 |
7 | "Contribution" means:
8 |
9 | a) in the case of the initial Contributor, the initial code and
10 | documentation distributed under this Agreement, and
11 |
12 | b) in the case of each subsequent Contributor:
13 |
14 | i) changes to the Program, and
15 |
16 | ii) additions to the Program;
17 |
18 | where such changes and/or additions to the Program originate from and are
19 | distributed by that particular Contributor. A Contribution 'originates' from
20 | a Contributor if it was added to the Program by such Contributor itself or
21 | anyone acting on such Contributor's behalf. Contributions do not include
22 | additions to the Program which: (i) are separate modules of software
23 | distributed in conjunction with the Program under their own license
24 | agreement, and (ii) are not derivative works of the Program.
25 |
26 | "Contributor" means any person or entity that distributes the Program.
27 |
28 | "Licensed Patents" mean patent claims licensable by a Contributor which are
29 | necessarily infringed by the use or sale of its Contribution alone or when
30 | combined with the Program.
31 |
32 | "Program" means the Contributions distributed in accordance with this
33 | Agreement.
34 |
35 | "Recipient" means anyone who receives the Program under this Agreement,
36 | including all Contributors.
37 |
38 | 2. GRANT OF RIGHTS
39 |
40 | a) Subject to the terms of this Agreement, each Contributor hereby grants
41 | Recipient a non-exclusive, worldwide, royalty-free copyright license to
42 | reproduce, prepare derivative works of, publicly display, publicly perform,
43 | distribute and sublicense the Contribution of such Contributor, if any, and
44 | such derivative works, in source code and object code form.
45 |
46 | b) Subject to the terms of this Agreement, each Contributor hereby grants
47 | Recipient a non-exclusive, worldwide, royalty-free patent license under
48 | Licensed Patents to make, use, sell, offer to sell, import and otherwise
49 | transfer the Contribution of such Contributor, if any, in source code and
50 | object code form. This patent license shall apply to the combination of the
51 | Contribution and the Program if, at the time the Contribution is added by the
52 | Contributor, such addition of the Contribution causes such combination to be
53 | covered by the Licensed Patents. The patent license shall not apply to any
54 | other combinations which include the Contribution. No hardware per se is
55 | licensed hereunder.
56 |
57 | c) Recipient understands that although each Contributor grants the licenses
58 | to its Contributions set forth herein, no assurances are provided by any
59 | Contributor that the Program does not infringe the patent or other
60 | intellectual property rights of any other entity. Each Contributor disclaims
61 | any liability to Recipient for claims brought by any other entity based on
62 | infringement of intellectual property rights or otherwise. As a condition to
63 | exercising the rights and licenses granted hereunder, each Recipient hereby
64 | assumes sole responsibility to secure any other intellectual property rights
65 | needed, if any. For example, if a third party patent license is required to
66 | allow Recipient to distribute the Program, it is Recipient's responsibility
67 | to acquire that license before distributing the Program.
68 |
69 | d) Each Contributor represents that to its knowledge it has sufficient
70 | copyright rights in its Contribution, if any, to grant the copyright license
71 | set forth in this Agreement.
72 |
73 | 3. REQUIREMENTS
74 |
75 | A Contributor may choose to distribute the Program in object code form under
76 | its own license agreement, provided that:
77 |
78 | a) it complies with the terms and conditions of this Agreement; and
79 |
80 | b) its license agreement:
81 |
82 | i) effectively disclaims on behalf of all Contributors all warranties and
83 | conditions, express and implied, including warranties or conditions of title
84 | and non-infringement, and implied warranties or conditions of merchantability
85 | and fitness for a particular purpose;
86 |
87 | ii) effectively excludes on behalf of all Contributors all liability for
88 | damages, including direct, indirect, special, incidental and consequential
89 | damages, such as lost profits;
90 |
91 | iii) states that any provisions which differ from this Agreement are offered
92 | by that Contributor alone and not by any other party; and
93 |
94 | iv) states that source code for the Program is available from such
95 | Contributor, and informs licensees how to obtain it in a reasonable manner on
96 | or through a medium customarily used for software exchange.
97 |
98 | When the Program is made available in source code form:
99 |
100 | a) it must be made available under this Agreement; and
101 |
102 | b) a copy of this Agreement must be included with each copy of the Program.
103 |
104 | Contributors may not remove or alter any copyright notices contained within
105 | the Program.
106 |
107 | Each Contributor must identify itself as the originator of its Contribution,
108 | if any, in a manner that reasonably allows subsequent Recipients to identify
109 | the originator of the Contribution.
110 |
111 | 4. COMMERCIAL DISTRIBUTION
112 |
113 | Commercial distributors of software may accept certain responsibilities with
114 | respect to end users, business partners and the like. While this license is
115 | intended to facilitate the commercial use of the Program, the Contributor who
116 | includes the Program in a commercial product offering should do so in a
117 | manner which does not create potential liability for other Contributors.
118 | Therefore, if a Contributor includes the Program in a commercial product
119 | offering, such Contributor ("Commercial Contributor") hereby agrees to defend
120 | and indemnify every other Contributor ("Indemnified Contributor") against any
121 | losses, damages and costs (collectively "Losses") arising from claims,
122 | lawsuits and other legal actions brought by a third party against the
123 | Indemnified Contributor to the extent caused by the acts or omissions of such
124 | Commercial Contributor in connection with its distribution of the Program in
125 | a commercial product offering. The obligations in this section do not apply
126 | to any claims or Losses relating to any actual or alleged intellectual
127 | property infringement. In order to qualify, an Indemnified Contributor must:
128 | a) promptly notify the Commercial Contributor in writing of such claim, and
129 | b) allow the Commercial Contributor to control, and cooperate with the
130 | Commercial Contributor in, the defense and any related settlement
131 | negotiations. The Indemnified Contributor may participate in any such claim
132 | at its own expense.
133 |
134 | For example, a Contributor might include the Program in a commercial product
135 | offering, Product X. That Contributor is then a Commercial Contributor. If
136 | that Commercial Contributor then makes performance claims, or offers
137 | warranties related to Product X, those performance claims and warranties are
138 | such Commercial Contributor's responsibility alone. Under this section, the
139 | Commercial Contributor would have to defend claims against the other
140 | Contributors related to those performance claims and warranties, and if a
141 | court requires any other Contributor to pay any damages as a result, the
142 | Commercial Contributor must pay those damages.
143 |
144 | 5. NO WARRANTY
145 |
146 | EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, THE PROGRAM IS PROVIDED ON
147 | AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER
148 | EXPRESS OR IMPLIED INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OR
149 | CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A
150 | PARTICULAR PURPOSE. Each Recipient is solely responsible for determining the
151 | appropriateness of using and distributing the Program and assumes all risks
152 | associated with its exercise of rights under this Agreement , including but
153 | not limited to the risks and costs of program errors, compliance with
154 | applicable laws, damage to or loss of data, programs or equipment, and
155 | unavailability or interruption of operations.
156 |
157 | 6. DISCLAIMER OF LIABILITY
158 |
159 | EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, NEITHER RECIPIENT NOR ANY
160 | CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY DIRECT, INDIRECT, INCIDENTAL,
161 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING WITHOUT LIMITATION
162 | LOST PROFITS), HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
163 | CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
164 | ARISING IN ANY WAY OUT OF THE USE OR DISTRIBUTION OF THE PROGRAM OR THE
165 | EXERCISE OF ANY RIGHTS GRANTED HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY
166 | OF SUCH DAMAGES.
167 |
168 | 7. GENERAL
169 |
170 | If any provision of this Agreement is invalid or unenforceable under
171 | applicable law, it shall not affect the validity or enforceability of the
172 | remainder of the terms of this Agreement, and without further action by the
173 | parties hereto, such provision shall be reformed to the minimum extent
174 | necessary to make such provision valid and enforceable.
175 |
176 | If Recipient institutes patent litigation against any entity (including a
177 | cross-claim or counterclaim in a lawsuit) alleging that the Program itself
178 | (excluding combinations of the Program with other software or hardware)
179 | infringes such Recipient's patent(s), then such Recipient's rights granted
180 | under Section 2(b) shall terminate as of the date such litigation is filed.
181 |
182 | All Recipient's rights under this Agreement shall terminate if it fails to
183 | comply with any of the material terms or conditions of this Agreement and
184 | does not cure such failure in a reasonable period of time after becoming
185 | aware of such noncompliance. If all Recipient's rights under this Agreement
186 | terminate, Recipient agrees to cease use and distribution of the Program as
187 | soon as reasonably practicable. However, Recipient's obligations under this
188 | Agreement and any licenses granted by Recipient relating to the Program shall
189 | continue and survive.
190 |
191 | Everyone is permitted to copy and distribute copies of this Agreement, but in
192 | order to avoid inconsistency the Agreement is copyrighted and may only be
193 | modified in the following manner. The Agreement Steward reserves the right to
194 | publish new versions (including revisions) of this Agreement from time to
195 | time. No one other than the Agreement Steward has the right to modify this
196 | Agreement. The Eclipse Foundation is the initial Agreement Steward. The
197 | Eclipse Foundation may assign the responsibility to serve as the Agreement
198 | Steward to a suitable separate entity. Each new version of the Agreement will
199 | be given a distinguishing version number. The Program (including
200 | Contributions) may always be distributed subject to the version of the
201 | Agreement under which it was received. In addition, after a new version of
202 | the Agreement is published, Contributor may elect to distribute the Program
203 | (including its Contributions) under the new version. Except as expressly
204 | stated in Sections 2(a) and 2(b) above, Recipient receives no rights or
205 | licenses to the intellectual property of any Contributor under this
206 | Agreement, whether expressly, by implication, estoppel or otherwise. All
207 | rights in the Program not expressly granted under this Agreement are
208 | reserved.
209 |
210 | This Agreement is governed by the laws of the State of New York and the
211 | intellectual property laws of the United States of America. No party to this
212 | Agreement will bring a legal action under this Agreement more than one year
213 | after the cause of action arose. Each party waives its rights to a jury trial
214 | in any resulting litigation.
--------------------------------------------------------------------------------
/docs/ABNF.md:
--------------------------------------------------------------------------------
1 | # ABNF Input Format
2 |
3 | ABNF is an alternative input format for instaparse grammar specifications. ABNF does not provide any additional expressive power over instaparse's default EBNF-based syntax, so if you are new to instaparse and parsing, you do not need to read this document -- stick with the syntax described in [the tutorial](https://github.com/Engelberg/instaparse/blob/master/README.md).
4 |
5 | ABNF's main virtue is that it is precisely specified and commonly used in protocol specifications. If you use such protocols, instaparse's ABNF input format is a simple way to turn the ABNF specification into an executable parser. However, unless you are working with such specifications, you do not need the ABNF input format.
6 |
7 | ## EBNF vs ABNF
8 |
9 | ### EBNF
10 |
11 | The most common notation for expressing context-free grammars is [Backus-Naur Form](http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form), or BNF for short. BNF, however, is a little too simplistic. People wanted more convenient notation for expressing repetitions, so [EBNF](http://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form), or *Extended* Backus-Naur Form was developed.
12 |
13 | There is a hodge-podge of various syntax extensions that all fall under the umbrella of EBNF. For example, one standard specifies that repetitions should be specified with `{}`, but regular expression operators such as `+`, `*`, and `?` are far more popular.
14 |
15 | When creating the primary input format for instaparse, I based the syntax on EBNF. I consulted various standards I found on the internet, and filtered them through my own experience of what I've seen in various textbooks and specs over the years. I included the official repetition operators as well as the ones derived from regular expressions. I also incorporated PEG-like syntax extensions.
16 |
17 | What I ended up with was a slightly tweaked version of EBNF, making it relatively easy to turn any EBNF-specified grammar into an executable parser. However, with multiple competing standards and actively-used variations, there's no guarantee that an EBNF grammar that you find will perfectly align with instaparse's syntax. You may need to make a few tweaks to get it to work.
18 |
19 | ### ABNF
20 |
21 | From what I can tell, the purpose of [ABNF](http://en.wikipedia.org/wiki/Augmented_Backus%E2%80%93Naur_Form), or *Augmented* Backus-Naur Form, was to create a grammar syntax that would have a single, well-defined, formal standard, so that all ABNF grammars would look exactly the same.
22 |
23 | For this reason, ABNF seems to be a more popular grammar syntax in the world of specifications and protocols. For example, if you want to know the formal definition of what constitutes a valid URI, there's an ABNF grammar for that.
24 |
25 | After instaparse's initial release, I received a couple requests to support ABNF as an alternative input format. Since ABNF is so precisely defined, in theory, any ABNF grammar should work without modification. In practice, I've found that many ABNF specifications have one or two small typos; nevertheless, applying instaparse to ABNF is mostly a trivial copy-paste exercise.
26 |
27 | I included whatever further extensions and extra instaparse goodies I could safely include, but omitted any extension that would conflict with the ABNF standard and jeopardize the ability to use ABNF grammar specifications without modification.
28 |
29 | Aside from just wanting to adhere to the ABNF specification, I can think of a few niceties that ABNF provides over EBNF:
30 |
31 | 1. ABNF has a convenient syntax for specifying bounded repetitions, for example, something like "between 3 and 5 repetitions of the letter a".
32 |
33 | 2. Convenient syntax for expressing characters and ranges of characters.
34 |
35 | 3. ABNF comes with a "standard library" of a dozen or so common token rules.
36 |
37 | ## Usage
38 |
39 | To get a feeling for what ABNF syntax looks like, first check out this [ABNF specification for phone URIs](https://raw.githubusercontent.com/Engelberg/instaparse/master/test/data/phone_uri.txt). I copied and pasted it directly from the formal spec -- I found one typo, which I fixed.
40 |
41 | (def phone-uri-parser
42 | (insta/parser "https://raw.githubusercontent.com/Engelberg/instaparse/master/test/data/phone_uri.txt"
43 | :input-format :abnf))
44 |
45 | => (phone-uri-parser "tel:+1-201-555-0123")
46 | [:telephone-uri
47 | "tel:"
48 | [:telephone-subscriber
49 | [:global-number
50 | [:global-number-digits
51 | "+"
52 | [:DIGIT "1"]
53 | [:phonedigit [:visual-separator "-"]]
54 | [:phonedigit [:DIGIT "2"]]
55 | [:phonedigit [:DIGIT "0"]]
56 | [:phonedigit [:DIGIT "1"]]
57 | [:phonedigit [:visual-separator "-"]]
58 | [:phonedigit [:DIGIT "5"]]
59 | [:phonedigit [:DIGIT "5"]]
60 | [:phonedigit [:DIGIT "5"]]
61 | [:phonedigit [:visual-separator "-"]]
62 | [:phonedigit [:DIGIT "0"]]
63 | [:phonedigit [:DIGIT "1"]]
64 | [:phonedigit [:DIGIT "2"]]
65 | [:phonedigit [:DIGIT "3"]]]]]]
66 |
67 | The usage, as you can see, is almost identical to the way you build parsers using the `insta/parser` constructor. The only difference is the additional keyword argument `:input-format :abnf`.
68 |
69 | If you find yourself working with a whole series of ABNF parser specifications, you may find it more convenient to call
70 |
71 | (insta/set-default-input-format! :abnf)
72 |
73 | to alter the default input format. Changing the default makes it unnecessary to specify `:input-format :abnf` with each call to the parser constructor.
74 |
75 | Here is the doc string:
76 |
77 | => (doc insta/set-default-input-format!)
78 | -------------------------
79 | instaparse.core/set-default-input-format!
80 | ([type])
81 | Changes the default input format. Input should be :abnf or :ebnf
82 |
83 | ## ABNF Syntax Guide
84 |
85 |
86 | Category | Notations | Example | Notes |
87 | Rule | = =/ | S = A | =/ is usually used to extend an already-defined rule |
88 | Alternation | / | A / B | Despite the use of /, this is unordered choice |
89 | Concatenation | whitespace | A B | |
90 | Grouping | () | (A / B) C | |
91 | Bounded Repetition | * | 3*5 A | In ABNF, repetition precedes the element |
92 | Optional | *1 | *1 A | |
93 | One or more | 1* | 1* A | |
94 | Zero or more | * | *A | |
95 | String terminal | "" '' | 'a' "a" | Single-quoted strings are an instaparse extension |
96 | Regex terminal | #"" #'' | #'a' #"a" | Regexes are an instaparse extension |
97 | Character terminal | %d %b %x | %x30-37 | |
98 | Comment | ; | ; comment to the end of the line | |
99 | Lookahead | & | &A | Lookahead is an instaparse extension |
100 | Negative lookahead | ! | !A | Negative lookahead is an instaparse extension |
101 |
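A few of the notations in the table can be tried directly at the REPL. The following is a minimal sketch rather than part of the original guide: the grammar and the name `rep-parser` are made up for illustration, and it assumes instaparse is available on the classpath.

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical grammar: between 3 and 5 repetitions of the letter a.
;; In ABNF the repetition count precedes the element it applies to.
(def rep-parser
  (insta/parser "S = 3*5 'a'" :input-format :abnf))

;; Four repetitions fall inside the 3..5 bound, so this should parse.
(rep-parser "aaaa")

;; Two repetitions fall outside the bound; insta/failure? reports
;; whether the result is a failed parse.
(insta/failure? (rep-parser "aa"))
```

Swapping `3*5` for `1*`, `*1`, or a bare `*` in the grammar string is a quick way to compare the other repetition rows of the table.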
102 |
103 | Some important things to be aware of:
104 |
105 | + According to the ABNF standard, all strings are *case-insensitive*.
106 | + ABNF strings do not support any kind of escape characters. Use ABNF's character notation to specify unusual characters.
107 | + In ABNF, there is one repetition operator, `*`, and it *precedes* the thing that it is operating on. So, for example, `3*5` means "between 3 and 5 repetitions". The first number defaults to 0 and the second defaults to infinity, so you can omit one or both numbers to get effects comparable to EBNF's `+`, `*`, and `?`. `4*4` could just be written as `4`.
108 | + Use `;` for comments to the end of the line. The ABNF specification has rigid definitions about where comments can be, but in instaparse the rules for comment placement are a bit more flexible and intuitive.
109 | + ABNF uses `/` for the ordinary alternative operator with no order implied.
110 | + ABNF allows the restatement of a rule name to specify multiple alternatives. The custom is to use `=/` in definitions that are adding alternatives, for example `S = 'a' / 'b'` could be written as:
111 |
112 |