├── LICENSE
├── README.md
├── Syntax.rules.txt
├── _config.yml
├── bnf2html.perl.txt
├── bnf2html.pl
├── bnf2yacc.perl.txt
├── bnf2yacc.pl
├── index.html
├── outer-joins.html
├── sql-2003-1.bnf
├── sql-2003-1.bnf.html
├── sql-2003-2.bnf
├── sql-2003-2.bnf.html
├── sql-2003-2.ebnf
├── sql-2003-2.ebnf.readme
├── sql-2003-core-features.html
├── sql-2003-noncore-features.html
├── sql-2016.ebnf
├── sql-2016.ebnf.readme
├── sql-92.bnf
├── sql-92.bnf.html
├── sql-99.bnf
├── sql-99.bnf.html
├── sql-bnf.mk
└── webcode-1.09.tgz
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2017 Ron Savage
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # BNF Grammars for SQL-92, SQL-99 and SQL-2003
2 |
3 | This repository contains the BNF (Backus-Naur Form) grammars for three versions of standard SQL — SQL-92, SQL-99 and SQL-2003.
4 |
5 | You should be able to find a version of this site with 'active HTML' at:
6 |
7 | * https://ronsavage.github.io/SQL/
8 |
9 | It may not be the most recent release, but the technical content is mostly valid.
10 | The download link is not functional — you can obtain the material for the latest
11 | release from https://github.com/ronsavage/SQL/releases/latest.
12 |
13 | ## !! Syntax Rules
14 |
15 | Regarding the text '!! See the Syntax Rules': That is literally what it says in the PDF
16 | containing the standard.
17 |
18 | For an extract of the standard about these rules see the file 'Syntax.rules.txt'.
19 |
20 | *This project is still in transition to GitHub.
21 | The links in this README.md file lead to the pages in the GitHub source tree.
22 | Most of them will display the HTML source — not a rendered HTML image.
23 | There probably are ways around that; we're learning GitHub as we go.*
24 |
25 | For a long time, this material was hosted by Ron Savage at
26 | [http://savage.net.au/SQL](http://savage.net.au) — many thanks, Ron! —
27 | but that site now points to here.
28 |
29 | At the moment, the suggested method of operation is:
30 |
31 | * Clone this repository to your machine — e.g. into the `/home/somebody/SQL` directory
32 | * Point your browser to `file:///home/somebody/SQL/index.html`.
33 |
34 | This should give you full HTML access to the material.
35 | Alternatively, you can download the latest release of this material
36 | (instead of cloning the repo), and then extract that into a directory
37 | and point your browser to the `index.html` file in that directory.
38 |
39 | Yes: it is sub-optimal.
40 | Yes: we'll fix it when we know how to fix it.
41 |
42 | ## SQL-92
43 |
44 | The file [`sql-92.bnf.html`](sql-92.bnf.html) is a heavily hyperlinked HTML
45 | version of the BNF grammar for SQL-92 (ISO/IEC 9075:1992 - Database Language -
46 | SQL).
47 |
48 | The plain text file [`sql-92.bnf`](sql-92.bnf), from which it was
49 | automatically converted, is more useful (read: legible) for reading
50 | without a browser.
51 |
52 | ## SQL-99
53 |
54 | The file [`sql-99.bnf.html`](sql-99.bnf.html) is a heavily hyperlinked HTML
55 | version of the BNF grammar for SQL-99 (ISO/IEC 9075-2:1999 - Database
56 | Languages - SQL - Part 2: Foundation (SQL/Foundation)).
57 |
58 | The plain text file [`sql-99.bnf`](sql-99.bnf), from which it was
59 | automatically converted, is more useful (read: legible) for reading
60 | without a browser.
61 |
62 | ## SQL-2003
63 |
64 | The file [`sql-2003-2.bnf.html`](sql-2003-2.bnf.html) is a heavily hyperlinked HTML
65 | version of the BNF grammar for SQL-2003 (ISO/IEC 9075-2:2003 - Database
66 | Languages - SQL - Part 2: Foundation (SQL/Foundation)).
67 |
68 | The plain text file [`sql-2003-2.bnf`](sql-2003-2.bnf), from which it was
69 | automatically converted, is more useful (read: legible) for reading
70 | without a browser.
71 |
72 |
73 | There is a separate file [`sql-2003-1.bnf.html`](sql-2003-1.bnf.html) for
74 | the information from ISO/IEC 9075-1:2003 - Database Languages - SQL - Part
75 | 1: Framework (SQL/Framework).
76 |
77 | It was automatically converted from the plain text file [`sql-2003-1.bnf`](sql-2003-1.bnf),
78 | which is more useful (read: legible) for reading without a browser.
79 |
80 |
81 | Also available:
82 |
83 | * [SQL 2003 Core Features](sql-2003-core-features.html)
84 | * [SQL 2003 Non-Core Features](sql-2003-noncore-features.html)
85 |
86 |
87 | ## Informix OUTER Join Syntax
88 |
89 | The file [`outer-joins.html`](outer-joins.html) is an explanation of the
90 | non-standard Informix OUTER join syntax and semantics.
91 |
92 | ## Conversion tools
93 |
94 |
95 | The plain text was converted to HTML by the Perl script
96 | [`bnf2html`](bnf2html.perl.txt) which you may use if you wish.
97 | The `bnf2html` script also uses the C program
98 | WEBCODE version 1.09
99 | which you can download as a [gzipped tar file](webcode-1.09.tgz).
100 |
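
As a rough illustration of the cross-referencing idea behind the hyperlinked HTML, the Perl sketch below reads a few BNF rules and records, for each non-terminal on a right-hand side, which rules use it. It is not the `bnf2html` script itself: the sample grammar and the `%used_by` hash are made up for this example, and it works on plain text rather than on web-encoded input.

```perl
use strict;
use warnings;

# Hypothetical sample grammar in the style of the .bnf files.
my @bnf = (
    '<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9',
    '<unsigned integer> ::= <digit>...',
    '<signed integer> ::= [ <sign> ] <unsigned integer>',
    '<sign> ::= <plus sign> | <minus sign>',
);

# %used_by maps a non-terminal to the set of rules whose RHS mentions it.
my %used_by;

for my $line (@bnf) {
    # Split "<name> ::= tail" into the rule name and its right-hand side.
    my ($lhs, $rhs) = $line =~ m{^<([^>]+)>\s+::=\s*(.*)$} or next;
    # Every <...> on the right-hand side is a reference to another rule.
    while ($rhs =~ m{<([-:/\w\s]+)>}g) {
        $used_by{$1}{$lhs} = 1;
    }
}

# Print the cross-reference table, sorted case-insensitively.
for my $rule (sort { uc $a cmp uc $b } keys %used_by) {
    my @users = sort { uc $a cmp uc $b } keys %{ $used_by{$rule} };
    print "<$rule> is used by: ", join(', ', map { "<$_>" } @users), "\n";
}
```

Keeping the users of each rule as hash keys rather than as a list means a rule that mentions the same non-terminal twice is still recorded only once.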
101 | See also [`bnf2yacc`](bnf2yacc.perl.txt), an experimental
102 | script to convert BNF into an outline Yacc grammar.
103 | The generated grammar typically includes some unacceptable tokens, such
104 | as _`%token 0`_, that should be handled by the lexical analyzer
105 | rather than the grammar.
106 | The SQL standard includes such rules as grammar rules; consequently, you won't
107 | get a clean Yacc grammar from the SQL BNF files.
108 |
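
To make the `%token 0` problem concrete, here is a small Perl sketch that sorts terminals into those usable as Yacc token names and those better left to the lexical analyzer. The identifier test is a simplifying assumption (letters, digits and underscores, not starting with a digit), and the helper name `is_yacc_token_name` and the sample terminal list are invented for illustration; `bnf2yacc` may well decide differently.

```perl
use strict;
use warnings;

# Assumption: a usable token name looks like a C identifier.
sub is_yacc_token_name {
    my ($terminal) = @_;
    return $terminal =~ /\A[A-Za-z_][A-Za-z0-9_]*\z/ ? 1 : 0;
}

# Hypothetical terminals as they might appear on the RHS of SQL BNF rules.
my @terminals = ('SELECT', 'FROM', '0', '9', 'NOT', '+');

my @tokens  = grep {  is_yacc_token_name($_) } @terminals;
my @lexical = grep { !is_yacc_token_name($_) } @terminals;

print "%token $_\n" for @tokens;
print "# leave to the lexer: $_\n" for @lexical;
```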
109 | _(The Perl scripts should normally be renamed after downloading.)_
110 |
111 | ## Download
112 |
113 | You should be able to get the downloadable version of the latest release of this
114 | repository from the releases area:
115 |
116 | * https://github.com/ronsavage/SQL/releases/latest
117 |
118 | ## SQL 2016 Released
119 |
120 | [ISO/IEC JTC 1/SC 32 Publishes Updated SQL Database Language Standard](https://www.ansi.org/news_publications/news_story?menuid=7&articleid=753a952d-1244-415b-bb92-0010750bb8cd) — SQL 2016.
121 |
122 |
123 |
124 | Please send feedback to Jonathan Leffler
125 | ( jonathan.leffler@gmail.com ) _and_
126 | Ron Savage ( ron@savage.net.au ).
127 |
128 | Last modified:
129 | 13th March 2017
130 |
--------------------------------------------------------------------------------
/Syntax.rules.txt:
--------------------------------------------------------------------------------
1 | That (!! See the Syntax Rules) is literally what it says in the PDF
2 | containing the standard. And the Syntax Rules are one part of the verbiage
3 | in the standard supporting the grammar — specifying what it means. The
4 | first such place where it occurs is:
5 |
6 | <#xref-space> ::= !! See the Syntax Rules.
7 |
8 | And if I go to the full PDF, I find 5.1 says:
9 |
10 | Information technology — Database languages — SQL — Part 2: Foundation
11 | (SQL/Foundation)
12 |
13 | Syntax Rules
14 |
15 | 1) Every character set shall contain a character that is equivalent
16 | to U+0020.
17 |
18 | Access Rules
19 |
20 | None.
21 |
22 | General Rules
23 |
24 | 1) There is a one-to-one correspondence between the symbols contained in
25 | and the symbols contained in such that, for all i, the symbol defined as the i-th
27 | alternative for corresponds to the symbol
28 | defined as the i-th alternative for .
29 |
30 | Conformance Rules
31 |
32 | None.
33 | And, in this case, that's all it has to say. Each section of the standard
34 | has sub-headings 'Function', 'Format' (containing the BNF), 'Syntax Rules',
35 | 'Access Rules', 'General Rules' (usually the biggest section), and
36 | 'Conformance Rules'. The next pair of occurrences are:
37 |
38 | ::=
39 |
40 |
41 |
42 | |
43 |
44 | ::= !! See the Syntax Rules
45 |
46 | ::= !! See the Syntax Rules
47 |
48 |
49 |
50 | That's pulled from the PDF, not the HTML. This time, we find:
51 |
52 | Syntax Rules
53 |
54 | 1) An is any character in the Unicode General Category
55 | classes "Lu", "Ll", "Lt", "Lm",
56 |
57 | "Lo", or "Nl".
58 |
59 | NOTE 58 — The Unicode General Category classes "Lu", "Ll", "Lt", "Lm",
60 | "Lo", and "Nl" are assigned to Unicode characters
61 |
62 | that are, respectively, upper-case letters, lower-case letters, title-case
63 | letters, modifier letters, other letters, and letter numbers.
64 |
65 | 2) An is U+00B7, "Middle Dot", or any character in the
66 | Unicode General Category classes
67 |
68 | "Mn", "Mc", "Nd", "Pc", or "Cf".
69 |
70 | NOTE 59 — The Unicode General Category classes "Mn", "Mc", "Nd", "Pc", and
71 | "Cf" are assigned to Unicode characters that
72 |
73 | are, respectively, nonspacing marks, spacing combining marks, decimal
74 | numbers, connector punctuations, and formatting codes.
75 |
76 | Very detailed specification stuff — but not something you can easily put
77 | into the BNF. It belongs in the lexical analyzer, probably.
78 |
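
As a hedged illustration of pushing those two rules into a lexical analyzer, the Perl sketch below tests single characters against the Unicode General Category classes quoted above. The helper names is_ident_start and is_ident_extend are invented for this example; they are not taken from the standard or from any script in this repository.

```perl
use strict;
use warnings;

# Syntax Rule 1: classes Lu, Ll, Lt, Lm, Lo and Nl.
sub is_ident_start {
    my ($ch) = @_;
    return $ch =~ /\A[\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}]\z/ ? 1 : 0;
}

# Syntax Rule 2: U+00B7 (Middle Dot) or classes Mn, Mc, Nd, Pc and Cf.
sub is_ident_extend {
    my ($ch) = @_;
    return $ch =~ /\A[\x{00B7}\p{Mn}\p{Mc}\p{Nd}\p{Pc}\p{Cf}]\z/ ? 1 : 0;
}

# Report both tests for a handful of sample characters.
for my $ch ('A', 'z', "\x{03A9}", '7', '_', "\x{00B7}", ' ', '+') {
    printf "U+%04X start=%d extend=%d\n",
        ord($ch), is_ident_start($ch), is_ident_extend($ch);
}
```

Perl's \p{...} properties do the table lookup, so the lexer needs no hand-written character lists.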
79 | Another example — not to do with characters this time:
80 | <#xref-preparable
81 | implementation-defined statement> ::= !! See the Syntax Rules.
82 |
83 | Here the further information is:
84 |
85 | 3) The Format and Syntax Rules for are implementation-defined.
87 |
88 | And another pair of them:
89 |
90 | <#xref-SQLSTATE class value> ::=
91 | <#SQLSTATE char> <#SQLSTATE char> !! See the Syntax Rules.
92 |
93 | <#xref-SQLSTATE subclass value> ::= <#SQLSTATE char> <#SQLSTATE char>
95 | <#SQLSTATE char> !! See the Syntax Rules.
96 | The Syntax Rules say:
97 |
98 | 3) In the values of and ,
99 | there shall be no
100 |
101 | between the s.
102 |
103 | 4) The values of and shall
104 | correspond to class values
105 |
106 | and subclass values, respectively, specified in Table 32, "SQLSTATE class
107 | and subclass values".
108 |
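
A minimal Perl sketch of the format side of those rules follows. It assumes that an SQLSTATE character is a digit or a simple Latin upper-case letter (the excerpt above does not show that rule), and the helper name split_sqlstate is hypothetical. Checking the values against Table 32 is left out.

```perl
use strict;
use warnings;

# Split a 5-character SQLSTATE into its 2-character class value and
# 3-character subclass value; returns an empty list if it is malformed.
sub split_sqlstate {
    my ($sqlstate) = @_;
    # Nothing may come between the characters, so one anchored match suffices.
    return $sqlstate =~ /\A([0-9A-Z]{2})([0-9A-Z]{3})\z/ ? ($1, $2) : ();
}

# Sample values, two well-formed and two malformed.
for my $value ('23000', '01004', '2 300', '0100') {
    if (my ($class, $subclass) = split_sqlstate($value)) {
        print "'$value': class $class, subclass $subclass\n";
    }
    else {
        print "'$value': not a well-formed SQLSTATE value\n";
    }
}
```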
109 | Expanding on this last example, here is the copy'n'paste of the Syntax
110 | Rules through the end of the section:
111 |
112 | Syntax Rules
113 |
114 | 1) SQLWARNING, NOT FOUND, and SQLEXCEPTION correspond to SQLSTATE class
115 | values corresponding
116 |
117 | to categories W, N, and X in Table 32, "SQLSTATE class and subclass values",
118 | respectively.
119 |
120 | ©ISO/IEC 2003 – All rights reserved Embedded SQL 1001
121 |
122 | ISO/IEC 9075-2:2003 (E)
123 |
124 | 20.2
125 |
126 | 2) An contained in an applies to an contained in that if and
130 | only if the appears after the that has
133 | condition C in the text sequence
134 |
135 | of the and no other E that satisfies one
137 |
138 | of the following conditions appears between the and the in the text sequence of the .
142 |
143 | Let D be the contained in E.
144 |
145 | a) D is the same as C.
146 |
147 | b) D is a and belongs to the same class to which C belongs.
148 |
149 | c) D contains an , but does not contain an , and
151 |
152 | E contains the same that C contains.
153 |
154 | d) D contains the that corresponds to integrity
155 | constraint violation and C
156 |
157 | contains CONSTRAINT.
158 |
159 | 3) In the values of and ,
160 | there shall be no
161 |
162 | between the s.
163 |
164 | 4) The values of and shall
165 | correspond to class values
166 |
167 | and subclass values, respectively, specified in Table 32, "SQLSTATE class
168 | and subclass values".
169 |
170 | 5) If an specifies a , then the
171 | , , or of the shall be such that a
174 | host language GO TO statement
175 |
176 | specifying that , , or
177 | is valid at every
178 |
179 | to which the
180 | applies.
181 |
182 | NOTE 445 —
183 |
184 | If an is contained in an , then the of a should specify a that is a label_name in the
188 | containing .
189 |
190 | If an is contained in an , then the of a
192 |
193 | should specify a that is a label in the containing
194 | .
195 |
196 | If an is contained in an , then the of a
198 |
199 | should specify a that is a section-name or
200 | an unqualified paragraph-name in the containing
201 |
202 | .
203 |
204 | If an is contained in an , then the of a should be an that is the statement label of an
208 | executable statement that appears in the same program
209 |
210 | unit as the .
211 |
212 | If an is contained in an , then the of a
214 |
215 | should be a gotoargument that is the statement label of an
216 | executable statement that appears in the same .
219 |
220 | If an is contained in an , then the of a should be an that is a label.
224 |
225 | If an is contained in an , then the of a should specify either a or a .
230 |
231 | Case:
232 |
233 | — If is specified, then the
234 | should be a label constant in the containing
235 |
236 | .
237 |
244 | — If is specified, then the should be a PL/I label variable declared in
246 |
247 | the containing .
248 |
249 | Access Rules
250 |
251 | None.
252 |
253 | General Rules
254 |
255 | 1) Immediately after the execution of an STMT in
256 | an
257 |
258 | that returns an SQLSTATE value other than successful completion:
259 |
260 | a) Let E be the set of s that are contained
261 | in the containing STMT, that applies to STMT, and that specifies a
264 | that is .
267 |
268 | b) Let CV and SCV be respectively the values of the class and subclass of
269 | the SQLSTATE value that
270 |
271 | indicates the result of the .
272 |
273 | c) If the execution of the caused the violation
274 | of one or more constraints or
275 |
276 | assertions, then:
277 |
278 | i) Let ECN be the set of s in E that
279 | specify CONSTRAINT and
280 |
281 | the of a constraint that was violated by execution of
282 | STMT.
283 |
284 | ii) If ECN contains more than one , then an
285 | implementation-dependent
286 |
287 | is chosen from ECN; otherwise, the single
288 |
289 | in ECN is chosen.
290 |
291 | iii) A GO TO statement of the host language is performed, specifying the
292 | ,
293 |
294 | , or of the specified
295 | in the chosen from ECN.
298 |
299 | d) Otherwise:
300 |
301 | i) Let ECS be the set of s in E that
302 | specify SQLSTATE, an
303 |
304 | , and an .
305 |
306 | ii) If ECS contains an EY that specifies
307 | an identical to CV and an identical to SCV,
310 | then a GO TO
311 |
312 | statement of the host language is performed, specifying the , , or of the specified in the
316 | EY.
319 |
320 | iii) Otherwise:
321 |
322 | 1) Let EC be the set of s in E that specify
323 | SQLSTATE and
324 |
325 | an without an .
326 |
327 | 2) If EC contains an EY that specifies an
328 | identical to CV, then a GO TO statement of the host language
331 | is performed,
332 |
339 | specifying the , , or
340 | of
341 |
342 | the specified in the EY.
343 |
344 | 3) Otherwise:
345 |
346 | A) Let EX be the set of s in E that specify
347 | SQLEXCEPTION.
348 |
349 | B) If EX contains an EY and CV belongs to
350 | Category
351 |
352 | X in Table 32, "SQLSTATE class and subclass values", then a GO TO statement
353 | of the
354 |
355 | host language is performed, specifying the , , or of the specified in the EY.
362 |
363 | C) Otherwise:
364 |
365 | I) Let EW be the set of s in E that specify
366 | SQLWARNING.
367 |
368 | II) If EW contains an EY and CV belongs to
369 |
370 | Category W in Table 32, "SQLSTATE class and subclass values", then a GO
371 |
372 | TO statement of the host language is performed, specifying the , , or of the
376 |
377 | specified in the EY.
378 |
379 | III) Otherwise, let ENF be the set of s in
380 | E that
381 |
382 | specify NOT FOUND. If ENF contains an
383 |
384 | EY and CV belongs to Category N in Table 32, "SQLSTATE class and subclass
385 |
386 | values", then a GO TO statement of the host language is performed,
387 | specifying
388 |
389 | the , , or of
391 |
392 | the specified in the EY.
393 |
394 | Conformance Rules
395 |
396 | 1) Without Feature B041, "Extensions to embedded SQL exception
397 | declarations", conforming SQL language
398 |
399 | shall not contain an that contains either SQLSTATE or
400 | CONSTRAINT.
401 |
402 | 2) Without Feature F491, "Constraint management", conforming SQL language
403 | shall not contain an that contains a .
406 |
412 |
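
The order of precedence in General Rule 1) d) can be sketched in Perl as follows. This is an illustration only, under several assumptions: the constraint case 1) c) is left out; class '00' is taken to be success, '01' category W, '02' category N and every other class category X; and the declaration records, target labels and helper names are all invented for the example.

```perl
use strict;
use warnings;

# Assumed mapping of SQLSTATE class values to Table 32 categories.
sub category {
    my ($class) = @_;
    return $class eq '00' ? 'S'
         : $class eq '01' ? 'W'
         : $class eq '02' ? 'N'
         :                  'X';
}

# Return the GO TO target chosen for class $cv and subclass $scv,
# or nothing if no declaration applies.
sub choose_target {
    my ($cv, $scv, @decls) = @_;
    my $cat = category($cv);
    # The tests are tried in the order the General Rules give them.
    my @tests = (
        # SQLSTATE with matching class and subclass.
        sub { $_[0]{condition} eq 'SQLSTATE' && defined $_[0]{subclass}
              && $_[0]{class} eq $cv && $_[0]{subclass} eq $scv },
        # SQLSTATE with matching class and no subclass.
        sub { $_[0]{condition} eq 'SQLSTATE' && !defined $_[0]{subclass}
              && $_[0]{class} eq $cv },
        sub { $_[0]{condition} eq 'SQLEXCEPTION' && $cat eq 'X' },
        sub { $_[0]{condition} eq 'SQLWARNING'   && $cat eq 'W' },
        sub { $_[0]{condition} eq 'NOT FOUND'    && $cat eq 'N' },
    );
    for my $test (@tests) {
        for my $decl (@decls) {
            return $decl->{target} if $test->($decl);
        }
    }
    return;
}

# Hypothetical declarations that apply to the statement.
my @decls = (
    { condition => 'SQLEXCEPTION', target => 'generic_error' },
    { condition => 'SQLSTATE', class => '23', target => 'class_23' },
    { condition => 'SQLSTATE', class => '23', subclass => '505',
      target => 'class_23_sub_505' },
    { condition => 'NOT FOUND', target => 'no_more_rows' },
);

for my $state (['23', '505'], ['23', '000'], ['42', '000'],
               ['02', '000'], ['01', '004']) {
    my $target = choose_target(@$state, @decls);
    printf "%s%s -> %s\n", @$state, defined $target ? $target : '(none)';
}
```

Listing the tests in an array keeps the precedence visible in one place: the most specific match (class and subclass) is tried first, and the category-wide declarations only apply when nothing more specific does.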
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | theme: jekyll-theme-merlot
--------------------------------------------------------------------------------
/bnf2html.perl.txt:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env perl
2 | #
3 | # @(#)$Id: bnf2html.pl,v 3.16 2017/11/14 06:53:22 jleffler Exp $
4 | #
5 | # Convert SQL-92, SQL-99 BNF plain text file into hyperlinked HTML.
6 |
7 | use strict;
8 | use warnings;
9 | use POSIX qw(strftime);
10 | #use Data::Dumper;
11 |
12 | use constant debug => 0;
13 |
14 | my(%rules); # Indexed by rule names w/o angle-brackets; each entry is a ref to a hash.
15 | my(%keywords); # Indexed by keywords; each entry is a ref to a hash.
16 | my(%names); # Indexed by rule names w/o angle-brackets; each entry is a ref to an array of line numbers
17 |
18 | sub top
19 | {
20 | print "Top\n\n";
21 | }
22 |
23 | # Usage: add_rule_name(\%names, $rulename, $.);
24 | sub add_rule_name
25 | {
26 | my($reflist, $lhs, $line) = @_;
27 | #print "\nrulename = $lhs; line = $line\n";
28 | if (defined ${$reflist}{$lhs})
29 | {
30 | #print Data::Dumper->Dump([ ${$reflist}{$lhs} ], qw[ ${$reflist}{$lhs} ]);
31 | #print Data::Dumper->Dump([ \@{${$reflist}{$lhs}} ], qw[ \@{${$reflist}{$lhs}} ]);
32 | my @lines = @{${$reflist}{$lhs}};
33 | print STDERR "\n$0: Rule <$lhs> at line $line already seen at line(s) ", join(", ", @lines), "\n\n";
34 | }
35 | else
36 | {
37 | ${$reflist}{$lhs} = [];
38 | }
39 | push @{${$reflist}{$lhs}}, $line;
40 | }
41 |
42 | # Usage: add_entry(\%keywords, $keyword, $rule);
43 | # Usage: add_entry(\%rules, $rhs, $rule);
44 | sub add_entry
45 | {
46 | my($reflist, $lhs, $rhs) = @_;
47 | ${$reflist}{$lhs} = {} unless defined ${$reflist}{$lhs};
48 | ${$reflist}{$lhs}{$rhs} = 1;
49 | }
50 |
51 | sub add_refs
52 | {
53 | my($def, $tail) = @_;
54 | print "\n\n" if debug;
55 | return if $tail =~ m/^!!/;
56 | return if $tail =~ m/^&(?:lt|gt|amp);$/;
57 | while ($tail)
58 | {
59 | $tail =~ s/^\s*//;
60 | if ($tail =~ m%^\<([-:/\w\s]+)\>%)
61 | {
62 | print "\n" if debug;
63 | add_entry(\%rules, $1, $def);
64 | $tail =~ s%^\<([-:/\w\s]+)\>%%;
65 | }
66 | elsif ($tail =~ m%^([-:/\w]+)%)
67 | {
68 | my($token) = $1;
69 | print "\n" if debug;
70 | add_entry(\%keywords, $token, $def) if $token =~ m%[[:alpha:]][[:alpha:]]% || $token eq 'C';
71 | $tail =~ s%^[-:/\w]+%%;
72 | }
73 | else
74 | {
75 | # Otherwise, it is punctuation (such as the BNF metacharacters).
76 | $tail =~ s%^[^-:/\w]%%;
77 | }
78 | }
79 | }
80 |
81 | # NB: webcode replaces tabs with blanks!
82 | open( my $WEBCODE, "-|", "webcode @ARGV") or die "$!";
83 |
84 | # Read first line of file - use as title in head and in H1 heading in body
85 | $_ = <$WEBCODE>;
86 | exit 0 unless defined($_);
87 | chomp;
88 |
89 | # Is it wicked to use double quoting with single quotes, as in qq'text'?
90 | # It is used quite extensively in this script - beware!
91 | print qq'\n';
92 | print "\n";
93 | print "\n\n";
94 | print " $_ \n\n\n\n";
95 | print " $_ \n\n";
96 | print qq' \n';
97 |
98 | print " \n";
99 | print qq' Cross-Reference: rules \n';
100 | print " \n";
101 | print qq' Cross-Reference: keywords \n';
102 | print " \n";
103 |
104 | sub rcs_id
105 | {
106 | my($id) = @_;
107 | $id =~ s%^(@\(#\))?\$[I]d: %%o;
108 | $id =~ s% \$$%%o;
109 | $id =~ s%,v % %o;
110 | $id =~ s%\w+ Exp( \w+)?$%%o;
111 | my(@words) = split / /, $id;
112 | my($version) = "file $words[0] version $words[1] dated $words[2] $words[3]";
113 | return $version;
114 | }
115 |
116 | sub iso8601_format
117 | {
118 | my($tm) = @_;
119 | my $today = strftime("%Y-%m-%d %H:%M:%S+00:00", gmtime($tm));
120 | return($today);
121 | }
122 |
123 | # Print hrefs for non-terminals and keywords.
124 | # Also substitute /* Nothing */ for an absence of productions between alternatives.
125 | sub print_tail
126 | {
127 | my($tail, $tcount) = @_;
128 | while ($tail)
129 | {
130 | my($newtail);
131 | if ($tail =~ m%^\s+%)
132 | {
133 | my($spaces) = $&;
134 | $newtail = $';
135 | print "\n" if debug;
136 | $spaces =~ s% {4,8}% %g;
137 | print $spaces;
138 | # Spaces are not a token - don't count them!
139 | }
140 | elsif ($tail =~ m%^'[^']*'% || $tail =~ m%^"[^"]*"% || $tail =~ m%^!!.*$%)
141 | {
142 | # Quoted literal - print and ignore.
143 | # Or meta-expression...
144 | my($quote) = $&;
145 | $newtail = $';
146 | print "\n" if debug;
147 | $quote =~ s%!!.*% $quote %;
148 | print $quote;
149 | $tcount++;
150 | }
151 | elsif ($tail =~ m%^\<([-:/\w\s]+)\>%)
152 | {
153 | my($nonterm) = $&;
154 | $newtail = $';
155 | print "\n" if debug;
156 | $nonterm =~ s%\<([-:/\w\s]+)\>%\<$1\> %;
157 | print " $nonterm";
158 | $tcount++;
159 | }
160 | elsif ($tail =~ m%^[\w_]([-._\w]*[\w_])?%)
161 | {
162 | # Keyword
163 | my($keyword) = $&;
164 | $newtail = $';
165 | print "\n" if debug;
166 | print(($keyword =~ m/^\d\d+$/) ? $keyword : qq' $keyword ');
167 | $tcount++;
168 | }
169 | else
170 | {
171 | # Metacharacter, string literal, etc.
172 | $tail =~ m%\S+%;
173 | my($symbol) = $&;
174 | $newtail = $';
175 | print "\n" if debug;
176 | if ($symbol eq '|')
177 | {
178 | print "/* Nothing */ " if $tcount == 0;
179 | $tcount = 0;
180 | }
181 | else
182 | {
183 | $symbol =~ s%...omitted...%/* $& */ %i;
184 | $tcount++;
185 | }
186 | print " $symbol";
187 | }
188 | $tail = $newtail;
189 | }
190 | return($tcount);
191 | }
192 |
193 | sub undo_web_coding
194 | {
195 | my($line) = @_;
196 | $line =~ s%&gt;%>%g;
197 | $line =~ s%&lt;%<%g;
198 | $line =~ s%&amp;%&%g;
199 | return $line;
200 | }
201 |
202 | my $hr_count = 0;
203 | my $tcount = 0; # Ick!
204 | my $def; # Current rule
205 |
206 | # Don't forget - the input has been web-encoded!
207 |
208 | while (<$WEBCODE>)
209 | {
210 | chomp;
211 | next if /^===*$/o;
212 | s/\s+$//o; # Remove trailing white space
213 | if (/^$/)
214 | {
215 | print "\n";
216 | }
217 | elsif (/^---*$/)
218 | {
219 | print " \n";
220 | }
221 | elsif (/^--@@\s*(.*)$/)
222 | {
223 | my $comment = undo_web_coding($1);
224 | print "\n";
225 | }
226 | elsif (/^@.#..Id:/)
227 | {
228 | # Convert what(1) string identifier into version information
229 | my $id = '$Id: bnf2html.pl,v 3.16 2017/11/14 06:53:22 jleffler Exp $';
230 | my($v1) = rcs_id($_);
231 | my $v2 = rcs_id($id);
232 | print "\n";
233 | print "Derived from $v1\n";
234 | my $today = iso8601_format(time);
235 | print " \n";
236 | print "Generated on $today by $v2\n";
237 | print "\n";
238 | }
239 | elsif (/\s+::=/)
240 | {
241 | # Definition line
242 | $def = $_;
243 | $def =~ s%\<([-:/()\w\s]+)\>.*%$1%;
244 | my($tail) = $_;
245 | $tail =~ s%.*::=\s*%%;
246 | print qq' <$def> ::=';
247 | $tcount = 0;
248 | add_rule_name(\%names, $def, $.);
249 | if ($def eq "vertical bar")
250 | {
251 | # Needs special case attention to avoid a /* Nothing */ comment appearing.
252 | # Problem pointed out by Jens Odborg (jho1965us@gmail.com) 2016-04-14.
253 | # This builds knowledge of the SQL language definition into this script;
254 | # ugly, but trying to fix it in the print_tail function is probably worse.
255 | print " |";
256 | }
257 | elsif ($tail)
258 | {
259 | add_refs($def, $tail);
260 | print " ";
261 | $tcount = print_tail($tail, $tcount);
262 | }
263 | print "\n";
264 | }
265 | elsif (/^\s/)
266 | {
267 | # Expansion line
268 | add_refs($def, $_);
269 | print " ";
270 | $tcount = print_tail($_, $tcount);
271 | }
272 | elsif (m/^--[\/]?(\w+)/)
273 | {
274 | # Pseudo-directive line in lower-case
275 | # Print a 'Top' link before <hr> tags except first.
276 | top if /--hr/ && $hr_count++ > 0;
277 | s%--(/?[a-z][a-z\d]*)%<$1>%;
278 | s%\<([-:/\w\s]+)\>%\<$1\> %g;
279 | print "$_\n";
280 | }
281 | elsif (m%^--##%)
282 | {
283 | $_ = undo_web_coding($_);
284 | s%^--##\s*%%;
285 | print "$_\n";
286 | }
287 | elsif (m/^--%start\s+(\w+)/)
288 | {
289 | # Designated start symbol
290 | my $start = $1;
291 | print qq'Start symbol: $start\n';
292 | }
293 | else
294 | {
295 | # Anything unrecognized passed through unchanged!
296 | print "$_\n";
297 | }
298 | }
299 |
300 | close $WEBCODE;
301 |
302 | # Print index of initial letters for keywords.
303 | sub print_index_key
304 | {
305 | my($prefix, @keys) = @_;
306 | my %letters = ();
307 | foreach my $keyword (@keys)
308 | {
309 | my $initial = uc substr $keyword, 0, 1;
310 | $letters{$initial} = 1;
311 | }
312 | foreach my $letter ('A' .. 'Z')
313 | {
314 | if (defined($letters{$letter}))
315 | {
316 | print qq' $letter \n';
317 | }
318 | else
319 | {
320 | print qq'$letter\n';
321 | }
322 | }
323 | print "\n";
324 | }
325 |
326 | ### Generate cross-reference tables
327 |
328 | {
329 | print " \n\n";
330 | print " \n";
331 | print qq' \n';
332 | print " Cross-Reference Table: Rules \n";
333 |
334 | print_index_key("rules", keys %rules);
335 |
336 | print "\n";
337 | print " Rule (non-terminal) Rules using it \n";
338 | my %letters = ();
339 |
340 | foreach my $rule (sort { uc $a cmp uc $b } keys %rules)
341 | {
342 | my $initial = uc substr $rule, 0, 1;
343 | my $label = "";
344 | if (!defined($letters{$initial}))
345 | {
346 | $letters{$initial} = 1;
347 | $label = qq' ';
348 | }
349 | print qq' $label $rule \n ';
350 | my $pad = "";
351 | foreach my $ref (sort { uc $a cmp uc $b } keys %{$rules{$rule}})
352 | {
353 | print qq'$pad <$ref> \n';
354 | $pad = " ";
355 | }
356 | print " \n \n";
357 | }
358 | print "\n";
359 | print " \n";
360 | top;
361 | }
362 |
363 | {
364 | print " \n";
365 | print qq' \n';
366 | print " Cross-Reference Table: Keywords \n";
367 |
368 | print_index_key("keywords", keys %keywords);
369 |
370 | print "\n";
371 | print " Keyword Rules using it \n";
372 | my %letters = ();
373 | foreach my $keyword (sort { uc $a cmp uc $b } keys %keywords)
374 | {
375 | my $initial = uc substr $keyword, 0, 1;
376 | my $label = "";
377 | if (!defined($letters{$initial}))
378 | {
379 | $letters{$initial} = 1;
380 | $label = qq' ';
381 | }
382 | print qq' $label $keyword \n ';
383 | my $pad = "";
384 | foreach my $ref (sort { uc $a cmp uc $b } keys %{$keywords{$keyword}})
385 | {
386 | print qq'$pad <$ref> \n';
387 | $pad = " ";
388 | }
389 | print " \n \n";
390 | }
391 | print "\n";
392 | print " \n";
393 | top;
394 | print " \n";
395 | }
396 |
397 | printf "%s\n", q'Please send feedback to Jonathan Leffler:';
398 | printf "%s\n", q' jonathan.leffler@gmail.com .';
399 |
400 | print "\n\n\n";
401 |
402 | __END__
403 |
404 | =pod
405 |
406 | =head1 PROGRAM
407 |
408 | bnf2html - Convert (ISO SQL) BNF Notation to Hyperlinked HTML
409 |
410 | =head1 SYNTAX
411 |
412 | bnf2html [file ...]
413 |
414 | =head1 DESCRIPTION
415 |
416 | The bnf2html script filters the annotated BNF (Backus-Naur Form) from its input
417 | files and converts it into HTML on standard output.
418 |
419 | The HTML is heavily hyperlinked.
420 | Each rule (LHS) links to a table of other rules where it is used on the
421 | RHS.
422 | Similarly, each symbol on the RHS is linked to the rule that defines it.
423 | Thus, it is possible to find where items are used and defined quite
424 | easily.
425 |
426 | =head1 INPUT FORMAT
427 |
428 | This script is adapted to the BNF notation used in the SQL standard
429 | (ISO/IEC 9075:2003, for example).
430 | It also takes various forms of annotations.
431 |
432 | The first line of the file is used as the title in the head section.
433 | It is also used as the text for a H1 header at the top of the body.
434 |
435 | Lines consisting of two or more equal signs are ignored.
436 |
437 | Lines consisting of two or more dashes are converted to a horizontal
438 | rule.
439 |
440 | Lines starting with the SCCS identification string '@(#)' are used to
441 | print version information about the file converted and the script doing
442 | the converting.
443 |
444 | Lines containing space, colon, colon, equals are treated as rules.
445 |
446 | Lines starting with white space are treated as continuations of a rule.
447 |
448 | Lines starting dash, dash, (optionally a slash) and then one or more tag
449 | letters are converted into an HTML start or end tag.
450 |
451 | Any line starting dash, dash, hash, hash has any HTML entities
452 | introduced by the WEBCODE program removed.
453 |
454 | There should be at most one line starting '--%start'; this indicates the
455 | start symbol for the bnf2yacc converter, but is effectively ignored by
456 | bnf2html.
457 |
458 | Any other line is passed through verbatim.
459 |
460 | =head1 AUTHOR
461 |
462 | Jonathan Leffler
463 |
464 | =cut
465 |
--------------------------------------------------------------------------------
/bnf2html.pl:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env perl
2 | #
3 | # @(#)$Id: bnf2html.pl,v 3.16 2017/11/14 06:53:22 jleffler Exp $
4 | #
5 | # Convert SQL-92, SQL-99 BNF plain text file into hyperlinked HTML.
6 |
7 | use strict;
8 | use warnings;
9 | use POSIX qw(strftime);
10 | #use Data::Dumper;
11 |
12 | use constant debug => 0;
13 |
14 | my(%rules); # Indexed by rule names w/o angle-brackets; each entry is a ref to a hash.
15 | my(%keywords); # Indexed by keywords; each entry is a ref to a hash.
16 | my(%names); # Indexed by rule names w/o angle-brackets; each entry is a ref to an array of line numbers
17 |
18 | sub top
19 | {
20 | print "Top\n\n";
21 | }
22 |
23 | # Usage: add_rule_name(\%names, $rulename, $.);
24 | sub add_rule_name
25 | {
26 | my($reflist, $lhs, $line) = @_;
27 | #print "\nrulename = $lhs; line = $line\n";
28 | if (defined ${$reflist}{$lhs})
29 | {
30 | #print Data::Dumper->Dump([ ${$reflist}{$lhs} ], qw[ ${$reflist}{$lhs} ]);
31 | #print Data::Dumper->Dump([ \@{${$reflist}{$lhs}} ], qw[ \@{${$reflist}{$lhs}} ]);
32 | my @lines = @{${$reflist}{$lhs}};
33 | print STDERR "\n$0: Rule <$lhs> at line $line already seen at line(s) ", join(", ", @lines), "\n\n";
34 | }
35 | else
36 | {
37 | ${$reflist}{$lhs} = [];
38 | }
39 | push @{${$reflist}{$lhs}}, $line;
40 | }
41 |
42 | # Usage: add_entry(\%keywords, $keyword, $rule);
43 | # Usage: add_entry(\%rules, $rhs, $rule);
44 | sub add_entry
45 | {
46 | my($reflist, $lhs, $rhs) = @_;
47 | ${$reflist}{$lhs} = {} unless defined ${$reflist}{$lhs};
48 | ${$reflist}{$lhs}{$rhs} = 1;
49 | }
50 |
51 | sub add_refs
52 | {
53 | my($def, $tail) = @_;
54 | print "\n\n" if debug;
55 | return if $tail =~ m/^!!/;
56 | return if $tail =~ m/^&(?:lt|gt|amp);$/;
57 | while ($tail)
58 | {
59 | $tail =~ s/^\s*//;
60 | if ($tail =~ m%^\<([-:/\w\s]+)\>%)
61 | {
62 | print "\n" if debug;
63 | add_entry(\%rules, $1, $def);
64 | $tail =~ s%^\<([-:/\w\s]+)\>%%;
65 | }
66 | elsif ($tail =~ m%^([-:/\w]+)%)
67 | {
68 | my($token) = $1;
69 | print "\n" if debug;
70 | add_entry(\%keywords, $token, $def) if $token =~ m%[[:alpha:]][[:alpha:]]% || $token eq 'C';
71 | $tail =~ s%^[-:/\w]+%%;
72 | }
73 | else
74 | {
75 | # Otherwise, it is punctuation (such as the BNF metacharacters).
76 | $tail =~ s%^[^-:/\w]%%;
77 | }
78 | }
79 | }
80 |
81 | # NB: webcode replaces tabs with blanks!
82 | open( my $WEBCODE, "-|", "webcode @ARGV") or die "$!";
83 |
84 | # Read first line of file - use as title in head and in H1 heading in body
85 | $_ = <$WEBCODE>;
86 | exit 0 unless defined($_);
87 | chomp;
88 |
89 | # Is it wicked to use double quoting with single quotes, as in qq'text'?
90 | # It is used quite extensively in this script - beware!
91 | print qq'\n';
92 | print "\n";
93 | print "\n\n";
94 | print " $_ \n\n\n\n";
95 | print " $_ \n\n";
96 | print qq' \n';
97 |
98 | print " \n";
99 | print qq' Cross-Reference: rules \n';
100 | print " \n";
101 | print qq' Cross-Reference: keywords \n';
102 | print " \n";
103 |
104 | sub rcs_id
105 | {
106 | my($id) = @_;
107 | $id =~ s%^(@\(#\))?\$[I]d: %%o;
108 | $id =~ s% \$$%%o;
109 | $id =~ s%,v % %o;
110 | $id =~ s%\w+ Exp( \w+)?$%%o;
111 | my(@words) = split / /, $id;
112 | my($version) = "file $words[0] version $words[1] dated $words[2] $words[3]";
113 | return $version;
114 | }
115 |
116 | sub iso8601_format
117 | {
118 | my($tm) = @_;
119 | my $today = strftime("%Y-%m-%d %H:%M:%S+00:00", gmtime($tm));
120 | return($today);
121 | }
122 |
123 | # Print hrefs for non-terminals and keywords.
124 | # Also substitute /* Nothing */ for an absence of productions between alternatives.
125 | sub print_tail
126 | {
127 | my($tail, $tcount) = @_;
128 | while ($tail)
129 | {
130 | my($newtail);
131 | if ($tail =~ m%^\s+%)
132 | {
133 | my($spaces) = $&;
134 | $newtail = $';
135 | print "\n" if debug;
136 | $spaces =~ s% {4,8}% %g;
137 | print $spaces;
138 | # Spaces are not a token - don't count them!
139 | }
140 | elsif ($tail =~ m%^'[^']*'% || $tail =~ m%^"[^"]*"% || $tail =~ m%^!!.*$%)
141 | {
142 | # Quoted literal - print and ignore.
143 | # Or meta-expression...
144 | my($quote) = $&;
145 | $newtail = $';
146 | print "\n" if debug;
147 | $quote =~ s%!!.*% $quote %;
148 | print $quote;
149 | $tcount++;
150 | }
151 | elsif ($tail =~ m%^\<([-:/\w\s]+)\>%)
152 | {
153 | my($nonterm) = $&;
154 | $newtail = $';
155 | print "\n" if debug;
156 | $nonterm =~ s%\<([-:/\w\s]+)\>%\<$1\> %;
157 | print " $nonterm";
158 | $tcount++;
159 | }
160 | elsif ($tail =~ m%^[\w_]([-._\w]*[\w_])?%)
161 | {
162 | # Keyword
163 | my($keyword) = $&;
164 | $newtail = $';
165 | print "\n" if debug;
166 | print(($keyword =~ m/^\d\d+$/) ? $keyword : qq' $keyword ');
167 | $tcount++;
168 | }
169 | else
170 | {
171 | # Metacharacter, string literal, etc.
172 | $tail =~ m%\S+%;
173 | my($symbol) = $&;
174 | $newtail = $';
175 | print "\n" if debug;
176 | if ($symbol eq '|')
177 | {
178 | print "/* Nothing */ " if $tcount == 0;
179 | $tcount = 0;
180 | }
181 | else
182 | {
183 | $symbol =~ s%...omitted...%/* $& */ %i;
184 | $tcount++;
185 | }
186 | print " $symbol";
187 | }
188 | $tail = $newtail;
189 | }
190 | return($tcount);
191 | }
192 |
193 | sub undo_web_coding
194 | {
195 | my($line) = @_;
196 | $line =~ s%&gt;%>%g;
197 | $line =~ s%&lt;%<%g;
198 | $line =~ s%&amp;%&%g;
199 | return $line;
200 | }
201 |
202 | my $hr_count = 0;
203 | my $tcount = 0; # Ick!
204 | my $def; # Current rule
205 |
206 | # Don't forget - the input has been web-encoded!
207 |
208 | while (<$WEBCODE>)
209 | {
210 | chomp;
211 | next if /^===*$/o;
212 | s/\s+$//o; # Remove trailing white space
213 | if (/^$/)
214 | {
215 | print "\n";
216 | }
217 | elsif (/^---*$/)
218 | {
219 | print " \n";
220 | }
221 | elsif (/^--@@\s*(.*)$/)
222 | {
223 | my $comment = undo_web_coding($1);
224 | print "\n";
225 | }
226 | elsif (/^@.#..Id:/)
227 | {
228 | # Convert what(1) string identifier into version information
229 | my $id = '$Id: bnf2html.pl,v 3.16 2017/11/14 06:53:22 jleffler Exp $';
230 | my($v1) = rcs_id($_);
231 | my $v2 = rcs_id($id);
232 | print "\n";
233 | print "Derived from $v1\n";
234 | my $today = iso8601_format(time);
235 | print " \n";
236 | print "Generated on $today by $v2\n";
237 | print "
\n";
238 | }
239 | elsif (/\s+::=/)
240 | {
241 | # Definition line
242 | $def = $_;
243 | $def =~ s%\<([-:/()\w\s]+)\>.*%$1%;
244 | my($tail) = $_;
245 | $tail =~ s%.*::=\s*%%;
246 | print qq' <$def> ::=';
247 | $tcount = 0;
248 | add_rule_name(\%names, $def, $.);
249 | if ($def eq "vertical bar")
250 | {
251 | # Needs special case attention to avoid a /* Nothing */ comment appearing.
252 | # Problem pointed out by Jens Odborg (jho1965us@gmail.com) 2016-04-14.
253 | # This builds knowledge of the SQL language definition into this script;
254 | # ugly, but trying to fix it in the print_tail function is probably worse.
255 | print " |";
256 | }
257 | elsif ($tail)
258 | {
259 | add_refs($def, $tail);
260 | print " ";
261 | $tcount = print_tail($tail, $tcount);
262 | }
263 | print "\n";
264 | }
265 | elsif (/^\s/)
266 | {
267 | # Expansion line
268 | add_refs($def, $_);
269 | print " ";
270 | $tcount = print_tail($_, $tcount);
271 | }
272 | elsif (m/^--[\/]?(\w+)/)
273 | {
274 | # Pseudo-directive line in lower-case
275 | # Print a 'Top' link before <hr> tags except first.
276 | top if /--hr/ && $hr_count++ > 0;
277 | s%--(/?[a-z][a-z\d]*)%<$1>%;
278 | s%\<([-:/\w\s]+)\>%\<$1\> %g;
279 | print "$_\n";
280 | }
281 | elsif (m%^--##%)
282 | {
283 | $_ = undo_web_coding($_);
284 | s%^--##\s*%%;
285 | print "$_\n";
286 | }
287 | elsif (m/^--%start\s+(\w+)/)
288 | {
289 | # Designated start symbol
290 | my $start = $1;
291 | print qq'Start symbol: $start
\n';
292 | }
293 | else
294 | {
295 | # Anything unrecognized passed through unchanged!
296 | print "$_\n";
297 | }
298 | }
299 |
300 | close $WEBCODE;
301 |
302 | # Print index of initial letters for keywords.
303 | sub print_index_key
304 | {
305 | my($prefix, @keys) = @_;
306 | my %letters = ();
307 | foreach my $keyword (@keys)
308 | {
309 | my $initial = uc substr $keyword, 0, 1;
310 | $letters{$initial} = 1;
311 | }
312 | foreach my $letter ('A' .. 'Z')
313 | {
314 | if (defined($letters{$letter}))
315 | {
316 | print qq' $letter \n';
317 | }
318 | else
319 | {
320 | print qq'$letter\n';
321 | }
322 | }
323 | print "\n";
324 | }
325 |
326 | ### Generate cross-reference tables
327 |
328 | {
329 | print " \n\n";
330 | print " \n";
331 | print qq' \n';
332 | print " Cross-Reference Table: Rules \n";
333 |
334 | print_index_key("rules", keys %rules);
335 |
336 | print "\n";
337 | print " Rule (non-terminal) Rules using it \n";
338 | my %letters = ();
339 |
340 | foreach my $rule (sort { uc $a cmp uc $b } keys %rules)
341 | {
342 | my $initial = uc substr $rule, 0, 1;
343 | my $label = "";
344 | if (!defined($letters{$initial}))
345 | {
346 | $letters{$initial} = 1;
347 | $label = qq' ';
348 | }
349 | print qq' $label $rule \n ';
350 | my $pad = "";
351 | foreach my $ref (sort { uc $a cmp uc $b } keys %{$rules{$rule}})
352 | {
353 | print qq'$pad <$ref> \n';
354 | $pad = " ";
355 | }
356 | print " \n \n";
357 | }
358 | print "
\n";
359 | print " \n";
360 | top;
361 | }
362 |
363 | {
364 | print " \n";
365 | print qq' \n';
366 | print " Cross-Reference Table: Keywords \n";
367 |
368 | print_index_key("keywords", keys %keywords);
369 |
370 | print "\n";
371 | print " Keyword Rules using it \n";
372 | my %letters = ();
373 | foreach my $keyword (sort { uc $a cmp uc $b } keys %keywords)
374 | {
375 | my $initial = uc substr $keyword, 0, 1;
376 | my $label = "";
377 | if (!defined($letters{$initial}))
378 | {
379 | $letters{$initial} = 1;
380 | $label = qq' ';
381 | }
382 | print qq' $label $keyword \n ';
383 | my $pad = "";
384 | foreach my $ref (sort { uc $a cmp uc $b } keys %{$keywords{$keyword}})
385 | {
386 | print qq'$pad <$ref> \n';
387 | $pad = " ";
388 | }
389 | print " \n \n";
390 | }
391 | print "
\n";
392 | print " \n";
393 | top;
394 | print " \n";
395 | }
396 |
397 | printf "%s\n", q'Please send feedback to Jonathan Leffler:';
398 | printf "%s\n", q' jonathan.leffler@gmail.com .';
399 |
400 | print "\n\n\n";
401 |
402 | __END__
403 |
404 | =pod
405 |
406 | =head1 PROGRAM
407 |
408 | bnf2html - Convert (ISO SQL) BNF Notation to Hyperlinked HTML
409 |
410 | =head1 SYNTAX
411 |
412 | bnf2html [file ...]
413 |
414 | =head1 DESCRIPTION
415 |
416 | The bnf2html script filters the annotated BNF (Backus-Naur Form) from its input
417 | files and converts it into HTML on standard output.
418 |
419 | The HTML is heavily hyperlinked.
420 | Each rule (LHS) links to a table of other rules where it is used on the
421 | RHS.
422 | Similarly, each symbol on the RHS is linked to the rule that defines it.
423 | Thus, it is possible to find where items are used and defined quite
424 | easily.
425 |
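The cross-referencing described above can be sketched as follows. This is a
simplified illustration rather than the code used by bnf2html: the two-rule
grammar is made up for the example (the second rule is hypothetical), and the
non-terminals are matched with a plain <...> pattern on text that has not been
web-encoded.

```perl
use strict;
use warnings;

my %rules;      # Non-terminal name => { name of rule that uses it => 1 }

# Same shape as add_entry in the script: record that rule $user refers to $name.
sub add_entry
{
    my($reflist, $name, $user) = @_;
    $reflist->{$name}{$user} = 1;
}

# Illustrative grammar only; the second rule is hypothetical.
my %grammar = (
    'select list'    => '<asterisk> | <select sublist> [ { <comma> <select sublist> }... ]',
    'select sublist' => '<derived column> | <qualified asterisk>',
);

# Record every non-terminal on each RHS against the rule that uses it.
foreach my $def (sort keys %grammar)
{
    my $tail = $grammar{$def};
    while ($tail =~ m/<([^>]+)>/g)
    {
        add_entry(\%rules, $1, $def);
    }
}

# One cross-reference line per non-terminal, sorted as in the HTML tables.
foreach my $name (sort keys %rules)
{
    my @users = map { "<$_>" } sort keys %{$rules{$name}};
    print "<$name> is used by: ", join(", ", @users), "\n";
}
```

Storing the users of each symbol as hash keys means that repeated references
from the same rule collapse into a single cross-reference entry.
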
426 | =head1 INPUT FORMAT
427 |
428 | This script is adapted to the BNF notation used in the SQL standard
429 | (ISO/IEC 9075:2003, for example).
430 | It also takes various forms of annotations.
431 |
432 | The first line of the file is used as the title in the head section.
433 | It is also used as the text for a H1 header at the top of the body.
434 |
435 | Lines consisting of two or more equal signs are ignored.
436 |
437 | Lines consisting of two or more dashes are converted to a horizontal
438 | rule.
439 |
440 | Lines starting with the SCCS identification string '@(#)' are used to
441 | print version information about the file converted and the script doing
442 | the converting.
443 |
444 | Lines containing space, colon, colon, equals are treated as rules.
445 |
446 | Lines starting with white space are treated as continuations of a rule.
447 |
448 | Lines starting with dash, dash, optionally a slash, and then one or more tag
449 | letters are converted into an HTML start or end tag.
450 |
451 | Any line starting with dash, dash, hash, hash has the leading marker removed
452 | and any HTML entities introduced by the webcode program decoded.
453 |
454 | There should be at most one line starting '--%start'; this indicates the
455 | start symbol for the bnf2yacc converter, but is effectively ignored by
456 | bnf2html.
457 |
458 | Any other line is passed through verbatim.
459 |
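The line classification listed above can be sketched as a small stand-alone
function. This is an illustration only and not part of bnf2html: the function
name classify_line and the labels it returns are hypothetical, and the input
is assumed to be chomped text that has not been web-encoded.

```perl
use strict;
use warnings;

# Hypothetical helper: label a line according to the rules under INPUT FORMAT,
# testing the patterns in the same order as the main loop of the script.
sub classify_line
{
    my($line) = @_;
    return "ignored"      if $line =~ /^===*$/;     # Two or more equal signs
    $line =~ s/\s+$//;                              # Remove trailing white space
    return "blank"        if $line =~ /^$/;
    return "hrule"        if $line =~ /^---*$/;     # Two or more dashes
    return "version"      if $line =~ /^@\(#\)/;    # SCCS identification string
    return "rule"         if $line =~ /\s+::=/;     # Definition line
    return "continuation" if $line =~ /^\s/;        # Expansion line
    return "html-tag"     if $line =~ m{^--/?\w+};  # Pseudo-directive such as --hr
    return "decode"       if $line =~ /^--##/;      # Entities are decoded
    return "start"        if $line =~ /^--%start\s+\w+/;
    return "verbatim";                              # Passed through unchanged
}

foreach my $line ('<a> ::= <b> | <c>', '    | <d>', '--hr', '--%start a', '-----', 'Plain text')
{
    printf "%-14s %s\n", classify_line($line), $line;
}
```

The order of the tests matters: a rule line is recognized before a
continuation line, and a line of dashes is recognized before the
pseudo-directives that also begin with two dashes.
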
460 | =head1 AUTHOR
461 |
462 | Jonathan Leffler
463 |
464 | =cut
465 |
--------------------------------------------------------------------------------
/bnf2yacc.perl.txt:
--------------------------------------------------------------------------------
1 | #!/usr/bin/perl -w
2 | #
3 | # @(#)$Id: bnf2yacc.pl,v 1.16 2017/11/14 06:53:22 jleffler Exp $
4 | #
5 | # Convert SQL-92, SQL-99 BNF plain text file into YACC grammar.
6 |
7 | use strict;
8 | $| = 1;
9 |
10 | use constant debug => 0;
11 |
12 | my $heading = "";
13 | my %tokens;
14 | my %nonterminals;
15 | my %rules;
16 | my %used;
17 | my $start;
18 | my @grammar;
19 |
20 | my $nt_number = 0;
21 |
22 | # Generate a new non-terminal identifier
23 | sub new_non_terminal
24 | {
25 | my($prefix) = @_;
26 | $prefix = "" unless defined $prefix;
27 | return sprintf "${prefix}nt_%03d", ++$nt_number;
28 | }
29 |
30 | # map_non_terminal converts names that are not acceptable to Yacc into names that are.
31 | # Non-identifier characters are converted to underscores.
32 | # If the first character is not alphabetic, prefix 'j_'.
33 | # Case-convert to lower case.
34 | sub map_non_terminal
35 | {
36 | my($nt) = @_;
37 | $nt =~ s/\W+/_/go;
38 | $nt = "j_$nt" unless $nt =~ m/^[a-zA-Z]/o;
39 | $nt =~ tr/[A-Z]/[a-z]/;
40 | $nt =~ s/__+/_/go;
41 | return $nt;
42 | }
43 |
44 | # scan_rhs breaks up the RHS of a rule into a token stream
45 | # Keywords (terminals) are prefixed with a '#' marker.
46 | sub scan_rhs
47 | {
48 | my($tail) = @_;
49 | my(@rhs);
50 | while ($tail)
51 | {
52 | print "RHS: $tail\n" if debug;
53 | my $name;
54 | if ($tail =~ m%^(\s*<([-:/()_\w\s]+)>\s*)%o)
55 | {
56 | # Simpler regex for non-terminal: <[^>]+>
57 | # Non-terminal
58 | my $n = $2;
59 | print "N: $n\n" if debug;
60 | $tail = substr $tail, length($1);
61 | $name = map_non_terminal($n);
62 | $nonterminals{$name} = 1;
63 | $used{$name} = 1;
64 | push @rhs, $name;
65 | }
66 | elsif ($tail =~ m%^(\s*(\w[-\w\d_.]*)\s*)%o)
67 | {
68 | # Terminal (keyword)
69 | # Dot '.' is used in Interfaces.SQL in Ada syntax
70 | # Dash '-' is used in EXEC-SQL in the keywords.
71 | my $t = $2;
72 | print "T: $t\n" if debug;
73 | $tail = substr $tail, length($1);
74 | $name = $t;
75 | $tokens{$name} = 1;
76 | push @rhs, "#$name";
77 | }
78 | elsif ($tail =~ m%^\s*(\.\.\.omitted\.\.\.)\s*%o)
79 | {
80 | # Something omitted from the grammar.
81 | # Triple punctuation detected before double.
82 | my $str = "/* $1 */";
83 | push @rhs, $str;
84 | last;
85 | }
86 | elsif ($tail =~ m{^(\s*([-.<=>|]{2})\s*)$}o)
87 | {
88 | # Double-punctuation (non-metacharacters)
89 | # .., <=, >=, <>, ||, ->
90 | my $p = $2;
91 | print "DP: $p\n" if debug;
92 | $tail = substr $tail, length($1);
93 | $name = "'$p'";
94 | push @rhs, $name;
95 | }
96 | elsif ($tail =~ m{^(\s*([][{}"'%&()*+,-./:;<=>?^_|])\s*)$}o)
97 | {
98 | # Punctuation (non-metacharacters)
99 | # Note that none of '@', '~', '!' or '\' have any significance in SQL
100 | my $p = $2;
101 | print "P: $p\n" if debug;
102 | $tail = substr $tail, length($1);
103 | $p = "\\'" if $p eq "'";
104 | $name = "'$p'";
105 | push @rhs, $name;
106 | }
107 | elsif ($tail =~ m%^(\s*('[^']*'))\s*%o ||
108 | $tail =~ m%^(\s*("[^"]*"))\s*%o)
109 | {
110 | # Terminal in quotes - single or double.
111 | # (Possibly a multi-character string).
112 | my $q = $2;
113 | print "Q: $q\n" if debug;
114 | $tail = substr $tail, length($1);
115 | $q =~ m%^(['"])(.+)['"]$%o;
116 | # Expand multi-character string constants.
117 | # into repeated single-character constants.
118 | my($o) = $1;
119 | my($l) = $2;
120 | while (length($l))
121 | {
122 | my($c) = substr $l, 0, 1;
123 | $name = "$o$c$o";
124 | $l = substr $l, 1, length($l)-1;
125 | push @rhs, $name;
126 | }
127 | }
128 | elsif ($tail =~ m%^(\s*([{}\|\[\]]|\.\.\.)\s*)%o)
129 | {
130 | # Punctuation (metacharacters)
131 | my $p = $2;
132 | print "M: $p\n" if debug;
133 | $tail = substr $tail, length($1);
134 | $name = $p;
135 | push @rhs, $name;
136 | }
137 | elsif ($tail =~ m%^\s*!!%o)
138 | {
139 | # Exhortation to see the syntax rules - usually.
140 | my $str = "/* $tail */";
141 | push @rhs, $str;
142 | last;
143 | }
144 | else
145 | {
146 | # Unknown!
147 | print "/* UNK: $tail */\n";
148 | print STDERR "UNK:$.: $tail\n";
149 | last;
150 | }
151 | }
152 | return(@rhs);
153 | }
154 |
155 | # Format a Yacc rule given LHS and RHS array
156 | sub record_rule
157 | {
158 | my($lhs, $comment, @rule) = @_;
159 | my($production) = "";
160 | print "==>> record_rule ($lhs : @rule)\n" if debug;
161 | $production .= "/*\n" if $comment;
162 | $production .= "$lhs\n\t:\t";
163 | my $pad = "";
164 | my $br_count = 0;
165 | for (my $i = 0; $i <= $#rule; $i++)
166 | {
167 | my $item = $rule[$i];
168 | print "==== item $item\n" if debug;
169 | if ($item eq "|" && $br_count == 0)
170 | {
171 | $production .= "\n\t|\t";
172 | $pad = "";
173 | }
174 | else
175 | {
176 | $production .= "$pad$item";
177 | $pad = " ";
178 | $br_count++ if ($item eq '[' or $item eq '{');
179 | $br_count-- if ($item eq ']' or $item eq '}');
180 | }
181 | }
182 | $production .= "\n\t;\n";
183 | $production .= "*/\n" if $comment;
184 | $production .= "\n";
185 | print "$production" if debug;
186 | push @grammar, $production;
187 | print "<<== record_rule\n" if debug;
188 | }
189 |
190 | sub print_iterator
191 | {
192 | my($lhs,$rhs) = @_;
193 | my($production) = "";
194 | print "==>> print_iterator ($lhs $rhs)\n" if debug;
195 | $production .= "$lhs\n\t:\t$rhs\n\t|\t$lhs $rhs\n\t;\n\n";
196 | print "<<== print_iterator\n" if debug;
197 | push @grammar, $production;
198 | }
199 |
200 | # Process an optional item enclosed in square brackets
201 | sub find_balanced_bracket
202 | {
203 | my($lhs,@rhs) = @_;
204 | my(@rule) = ( "/* Nothing */", "|");
205 | print "==>> find_balanced_bracket ($lhs : @rhs)\n" if debug;
206 | while (my $name = shift @rhs)
207 | {
208 | print " name = $name\n" if debug;
209 | if ($name eq ']')
210 | {
211 | # Found closing bracket
212 | # Terminate search
213 | last;
214 | }
215 | elsif ($name eq '[')
216 | {
217 | # Found nested optional clause
218 | my $tag = new_non_terminal('opt_');
219 | @rhs = find_balanced_bracket($tag, @rhs);
220 | push @rule, $tag;
221 | }
222 | elsif ($name eq '{')
223 | {
224 | # Found start of sequence
225 | my $tag = new_non_terminal('seq_');
226 | @rhs = find_balanced_brace($tag, @rhs);
227 | push @rule, $tag;
228 | }
229 | elsif ($name eq '}')
230 | {
231 | # Found unbalanced close brace.
232 | # Error!
233 | }
234 | elsif ($name eq '...')
235 | {
236 | # Found iteration.
237 | my $tag = new_non_terminal('lst_');
238 | print "==== find_balanced_bracket: iterator (@rule)\n" if debug;
239 | my($old) = pop @rule;
240 | push @rule, $tag;
241 | print "==== find_balanced_bracket: iterator ($tag/$old - @rule)\n" if debug;
242 | print_iterator($tag, $old);
243 | }
244 | else
245 | {
246 | $name =~ s/^#//;
247 | push @rule, $name;
248 | $used{$name} = 1;
249 | }
250 | }
251 | record_rule($lhs, 0, @rule);
252 | print "<<== find_balanced_bracket: @rhs)\n" if debug;
253 | return(@rhs);
254 | }
255 |
256 | # Process a sequence item enclosed in curly braces
257 | sub find_balanced_brace
258 | {
259 | my($lhs,@rhs) = @_;
260 | my(@rule);
261 | print "==>> find_balanced_brace ($lhs : @rhs)\n" if debug;
262 | while (my $name = shift @rhs)
263 | {
264 | print " name = $name\n" if debug;
265 | if ($name eq '}')
266 | {
267 | # Found closing brace
268 | # Terminate search
269 | last;
270 | }
271 | elsif ($name eq '[')
272 | {
273 | # Found nested optional clause
274 | my $tag = new_non_terminal('opt_');
275 | @rhs = find_balanced_bracket($tag, @rhs);
276 | push @rule, $tag;
277 | }
278 | elsif ($name eq '{')
279 | {
280 | # Found start of sequence
281 | my $tag = new_non_terminal('seq_');
282 | @rhs = find_balanced_brace($tag, @rhs);
283 | push @rule, $tag;
284 | }
285 | elsif ($name eq ']')
286 | {
287 | # Found unbalanced close bracket.
288 | # Error!
289 | }
290 | elsif ($name eq '...')
291 | {
292 | # Found iteration.
293 | my $tag = new_non_terminal('lst_');
294 | print "==== find_balanced_brace: iterator (@rule)\n" if debug;
295 | my($old) = pop @rule;
296 | push @rule, $tag;
297 | print "==== find_balanced_brace: iterator ($tag/$old - @rule)\n" if debug;
298 | print_iterator($tag, $old);
299 | }
300 | else
301 | {
302 | $name =~ s/^#//;
303 | push @rule, $name;
304 | $used{$name} = 1;
305 | }
306 | }
307 | record_rule($lhs, 0, @rule);
308 | print "<<== find_balanced_brace: @rhs)\n" if debug;
309 | return(@rhs);
310 | }
311 |
312 | # Note that the [ and { parts are nice and easy because they are
313 | # balanced operators. The iteration operator ... is much harder to
314 | # process because it is a trailing modifier. When processing the list
315 | # of symbols, you need to establish whether there is a trailing iterator
316 | # after the current symbol, and modify the behaviour appropriately.
317 | sub process_rhs
318 | {
319 | my($lhs, $tail) = @_;
320 | my(@rhs) = scan_rhs($tail);
321 | print "==>> process_rhs ($lhs : @rhs)\n" if debug;
322 | # List parsed rule in output only if debugging.
323 | record_rule($lhs, 1, @rhs) if debug;
324 | my(@rule);
325 | while (my $name = shift @rhs)
326 | {
327 | print "name = $name\n" if debug;
328 | if ($name eq '[')
329 | {
330 | my $tag = new_non_terminal('opt_');
331 | @rhs = find_balanced_bracket($tag, @rhs);
332 | push @rule, $tag;
333 | }
334 | elsif ($name eq ']')
335 | {
336 | # Found a close bracket for something unbalanced.
337 | # Error!
338 | }
339 | elsif ($name eq '{')
340 | {
341 | # Start of mandatory sequence of items, possibly containing alternatives.
342 | my $tag = new_non_terminal('seq_');
343 | @rhs = find_balanced_brace($tag, @rhs);
344 | push @rule, $tag;
345 | }
346 | elsif ($name eq '}')
347 | {
348 | # Found a close brace for something unbalanced.
349 | # Error!
350 | }
351 | elsif ($name eq '|')
352 | {
353 | # End of one alternative and start of a new one.
354 | print "==== process_rhs: alternative $name\n" if debug;
355 | push @rule, $name;
356 | }
357 | elsif ($name eq '...')
358 | {
359 | # Found iteration.
360 | my $tag = new_non_terminal('lst_');
361 | my($old) = pop @rule;
362 | push @rule, $tag;
363 | print "==== process_rhs: iterator\n" if debug;
364 | print_iterator($tag, $old);
365 | }
366 | elsif ($name =~ m/^#/)
367 | {
368 | # Keyword token
369 | print "==== process_rhs: token $name\n" if debug;
370 | $name =~ s/^#//;
371 | push @rule, $name;
372 | }
373 | else
374 | {
375 | # Non-terminal (or comment)
376 | print "==== process_rhs: non-terminal $name\n" if debug;
377 | push @rule, $name;
378 | }
379 | }
380 | print "==== process_rhs: @rule\n" if debug;
381 | record_rule($lhs, 0, @rule);
382 | print "<<== process_rhs\n" if debug;
383 | }
384 |
385 | sub count_unmatched_keys
386 | {
387 | my($ref1, $ref2) = @_;
388 | my(%keys) = %$ref1;
389 | my(%match) = %$ref2;
390 | my($count) = 0;
391 | foreach my $key (keys %keys)
392 | {
393 | $count++ unless defined $match{$key};
394 | }
395 | return $count;
396 | }
397 |
398 | # ------------------------------------------------------------
399 |
400 | open INPUT, "cat @ARGV |" or die "$!";
401 | $_ = <INPUT>;
402 | exit 0 unless defined($_);
403 | chomp;
404 | $heading = "%{\n/*\n** $_\n*/\n%}\n\n" unless m/^\s*$/;
405 |
406 | # Commentary appears in column 1.
407 | # Continuations of rules have a blank in column 1.
408 | # Blank lines, dash lines and equals lines separate rules (are not embedded within them).
409 |
410 | while (<INPUT>)
411 | {
412 | chomp;
413 | print "DBG:$.: $_\n" if debug;
414 | next if /^===*$/o;
415 | next if /^\s*$/o; # Blank lines
416 | next if /^---*$/o; # Horizontal lines
417 | if (/^--/o)
418 | {
419 | # Various HTML pseudo-directives
420 | if (m%^--/?\w+\b%)
421 | {
422 | print "/* $' */\n" if $';
423 | }
424 | elsif (/^--%start (\w+)/)
425 | {
426 | $start = $1;
427 | print "/* Start symbol - $start */\n";
428 | }
429 | elsif (/^--##/)
430 | {
431 | print "/* $_ */\n";
432 | }
433 | else
434 | {
435 | print "/* Unrecognized 2: $_ */\n";
436 | }
437 | }
438 | elsif (/^@.#..Id:/)
439 | {
440 | # Convert what(1) string identifier into version information
441 | s%^@.#..Id: %%;
442 | s% \$$%%;
443 | s%,v % %;
444 | s%\w+ Exp( \w+)?$%%;
445 | my @words = split;
446 | print "/*\n";
447 | print "** Derived from file $words[0] version $words[1] dated $words[2] $words[3]\n";
448 | print "*/\n";
449 | }
450 | elsif (/ ::=/)
451 | {
452 | # Definition line
453 | my $def = $_;
454 | $def =~ s%<([-:/()\w\s]+)>.*%$1%o;
455 | $def = map_non_terminal($def);
456 | $rules{$def} = 1;
457 | $nonterminals{$def} = 1;
458 | my $tail = $_;
459 | $tail =~ s%.*::=\s*%%; # Remove LHS of statement
460 | while (<INPUT>)
461 | {
462 | chomp;
463 | last unless /^\s/;
464 | $tail .= $_;
465 | }
466 | process_rhs($def, $tail);
467 | }
468 | else
469 | {
470 | # Anything unrecognized passed through as a comment!
471 | print "/* $_ */\n";
472 | }
473 | }
474 |
475 | close INPUT;
476 |
477 | print "==== End of input phase ====\n" if debug;
478 |
479 | print $heading if $heading;
480 |
481 | # List of tokens
482 | foreach my $token (sort keys %tokens)
483 | {
484 | print "\%token $token\n";
485 | }
486 | print "\n";
487 |
488 | # Undefined non-terminals might need to be treated as tokens
489 | if (count_unmatched_keys(\%nonterminals, \%rules) > 0)
490 | {
491 | print "/* The following non-terminals were not defined */\n";
492 | foreach my $nt (sort keys %nonterminals)
493 | {
494 | print "%token $nt\n" unless defined $rules{$nt};
495 | }
496 | print "/* End of undefined non-terminals */\n\n";
497 | }
498 |
499 | # List the rules that are defined in the original grammar.
500 | # Do not list the rules defined by this conversion process.
501 | print "/*\n";
502 | foreach my $nt (sort keys %nonterminals)
503 | {
504 | print "\%rule $nt\n";
505 | }
506 | print "*/\n\n";
507 |
508 |
509 | if (defined $start)
510 | {
511 | print "%start $start\n\n";
512 | print "%%\n\n";
513 | }
514 | else
515 | {
516 | # No start symbol defined - let's see if we can work out what to use.
517 | # If there's more than one unused non-terminal, then treat them
518 | # all as simple alternatives to a list of statements.
519 | my $count = count_unmatched_keys(\%nonterminals, \%used);
520 |
521 | if ($count > 1)
522 | {
523 | my $prog = "bnf_program";
524 | my $stmt = "bnf_statement";
525 | print "%start $prog\n\n";
526 | print "%%\n\n";
527 | print "$prog\n\t:\t$stmt\n\t|\t$prog $stmt\n\t;\n\n";
528 | print "$stmt\n";
529 | my $pad = "\t:\t";
530 | foreach my $nt (sort keys %nonterminals)
531 | {
532 | unless (defined $used{$nt})
533 | {
534 | print "$pad$nt\n";
535 | $pad = "\t|\t";
536 | }
537 | }
538 | print "\t;\n\n";
539 | }
540 | elsif ($count == 1)
541 | {
542 | foreach my $nt (sort keys %nonterminals)
543 | {
544 | print "%start $nt\n\n" unless defined $used{$nt};
545 | }
546 | print "%%\n\n";
547 | }
548 | else
549 | {
550 | # No single start symbol - loop?
551 | # Error!
552 | print STDERR "$0: no start symbol recognized!\n";
553 | print "%%\n\n";
554 | }
555 | }
556 |
557 | # Output the complete grammar
558 | while (my $line = shift @grammar)
559 | {
560 | print $line;
561 | }
562 |
563 | print "\n%%\n\n";
564 |
565 | __END__
566 |
567 | =pod
568 |
569 | Given a rule:
570 |
571 | abc: def ghi jkl
572 |
573 | The Yacc output is:
574 |
575 | abc
576 | : def ghi jkl
577 | ;
578 |
579 | Given a rule:
580 |
581 | abc: def [ ghi ] jkl
582 |
583 | The Yacc output is:
584 |
585 | abc
586 | : def opt_nt_0001 jkl
587 | ;
588 |
589 | opt_nt_0001
590 | : /* Nothing */
591 | | ghi
592 | ;
593 |
594 | Given a rule:
595 |
596 | abc: def { ghi } jkl
597 |
598 | The Yacc output is:
599 |
600 | abc
601 | : def seq_nt_0002 jkl
602 | ;
603 |
604 | seq_nt_0002
605 | : ghi
606 | ;
607 |
608 | Note that such rules are seldom used in isolation; either the contents
609 | of the '{' to '}' contains alternatives, or the construct as a whole is
610 | followed by a repetition.
611 |
612 | Given a rule:
613 |
614 | abc: def | ghi
615 |
616 | The Yacc output is:
617 |
618 | abc
619 | : def
620 | | ghi
621 | ;
622 |
623 | Given a rule:
624 |
625 | abc: def ghi... jkl
626 |
627 | The Yacc output is:
628 |
629 | abc
630 | : def lst_nt_0003 jkl
631 | ;
632 |
633 | lst_nt_0003
634 | : ghi
635 | | lst_nt_0003 ghi
636 | ;
637 |
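The optional and iteration expansions shown above can be sketched in a few
lines. This is a simplified illustration, not the code used by bnf2yacc: the
helpers expand_optional and expand_iteration are hypothetical, and they take
the generated rule name as an argument instead of allocating it from a
counter.

```perl
use strict;
use warnings;

# Hypothetical helper: Yacc rule for an optional item, '[ item ... ]'.
sub expand_optional
{
    my($tag, @items) = @_;
    return "$tag\n\t:\t/* Nothing */\n\t|\t@items\n\t;\n";
}

# Hypothetical helper: left-recursive Yacc rule for a repeated item, 'item...'.
sub expand_iteration
{
    my($tag, $item) = @_;
    return "$tag\n\t:\t$item\n\t|\t$tag $item\n\t;\n";
}

print expand_optional('opt_nt_0001', 'ghi'), "\n";
print expand_iteration('lst_nt_0003', 'ghi');
```

The list rule is written with left recursion, which is the usual choice for
Yacc because the parser can reduce each item as it is read instead of stacking
the whole list first.
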
638 | These rules can be, and often are, combined. The following examples
639 | come from the SQL-99 grammar which is the target of this effort. The
640 | target of this program is to produce Yacc rules equivalent to those
641 | which follow each fragment. Note that keywords (equivalently,
642 | terminals) are in upper case only; mixed case or lower case symbols are
643 | non-terminals.
644 |
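645 | <SQL-client module definition> ::=
646 | <module name clause>
647 | <language clause>
648 | <module authorization clause>
649 | [ <module path specification> ]
650 | [ <module transform group specification> ]
651 | [ <temporary table declaration>... ]
652 | <module contents>...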
653 |
654 | SQL_client_module_definition
655 | : module_name_clause language_clause module_authorization_clause opt_nt_0001 opt_nt_0002 opt_nt_0003 lst_nt_0004
656 | ;
657 | opt_nt_0001
658 | : /* Nothing */
659 | | module_path_specification
660 | ;
661 | opt_nt_0002
662 | : /* Nothing */
663 | | module_transform_group_specification
664 | ;
665 | opt_nt_0003
666 | : /* Nothing */
667 | | lst_nt_0005
668 | ;
669 | lst_nt_0004
670 | : module_contents
671 | | lst_nt_0004 module_contents
672 | ;
673 | lst_nt_0005
674 | : temporary_table_declaration
675 | | lst_nt_0005 temporary_table_declaration
676 | ;
677 |
678 | The next example is interesting - it is fairly typical of the grammar,
679 | but is not minimal. The rule could be written '<identifier body> ::=
680 | <identifier start> [ <identifier part>... ]' without altering the
681 | meaning. It is not clear whether this program should apply this
682 | transformation automatically.
683 |
684 | <identifier body> ::= <identifier start> [ { <identifier part> }... ]
685 |
686 | identifier_body
687 | : identifier_start opt_nt_0006
688 | ;
689 | opt_nt_0006
690 | : /* Nothing */
691 | | lst_nt_0007
692 | ;
693 | lst_nt_0007
694 | : seq_nt_0008
695 | | lst_nt_0007 seq_nt_0008
696 | ;
697 | seq_nt_0008
698 | : identifier_part
699 | ;
700 |
701 | /* Optimized alternative to lst_nt_0007 */
702 | lst_nt_0007
703 | : identifier_part
704 | | lst_nt_0007 identifier_part
705 | ;
706 |
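707 | <SQL language identifier> ::=
708 | <SQL language identifier start> [ { <underscore> | <SQL language identifier part> }... ]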
709 |
710 | sql_language_identifier
711 | : sql_language_identifier_start opt_nt_0009
712 | ;
713 | opt_nt_0009
714 | : /* Nothing */
715 | | lst_nt_0010
716 | ;
717 | lst_nt_0010
718 | : seq_nt_0011
719 | | lst_nt_0010 seq_nt_0011
720 | ;
721 | seq_nt_0011
722 | : underscore
723 | | sql_language_identifier_part
724 | ;
725 |
726 | The next rule is the first example with keywords.
727 |
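728 | <module authorization clause> ::=
729 | SCHEMA <schema name>
730 | | AUTHORIZATION <module authorization identifier>
731 | | SCHEMA <schema name> AUTHORIZATION <module authorization identifier>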
732 |
733 | module_authorization_clause
734 | : SCHEMA schema_name
735 | | AUTHORIZATION module_authorization_identifier
736 | | SCHEMA schema_name AUTHORIZATION module_authorization_identifier
737 | ;
738 |
739 | <transform group specification> ::=
740 | TRANSFORM GROUP { <single group specification> | <multiple group specification> }
741 |
742 | transform_group_specification
743 | : TRANSFORM GROUP seq_nt_0012
744 | ;
745 | seq_nt_0012
746 | : single_group_specification
747 | | multiple_group_specification
748 | ;
749 |
750 | <multiple group specification> ::= <group specification> [ { <comma> <group specification> }... ]
751 |
752 | multiple_group_specification
753 | : group_specification opt_nt_0013
754 | ;
755 | opt_nt_0013
756 | : /* Nothing */
757 | | lst_nt_0014
758 | ;
759 | lst_nt_0014
760 | : seq_nt_0015
761 | | lst_nt_0014 seq_nt_0015
762 | ;
763 | seq_nt_0015
764 | : comma group_specification
765 | ;
766 |
767 | Except for the presence of a token (<right paren>) after the optional
768 | list, the next example is equivalent to the previous one. It does show,
769 | however, that there is an element of lookahead required to tell whether
770 | an optional item contains a list or a sequence or a simple list of
771 | terminals and non-terminals.
772 |
773 | <table element list> ::=
774 | <left paren> <table element> [ { <comma> <table element> }... ] <right paren>
775 |
776 | table_element_list
777 | : left_paren table_element opt_nt_0016 right_paren
778 | ;
779 | opt_nt_0016
780 | : /* Nothing */
781 | | lst_nt_0017
782 | ;
783 | lst_nt_0017
784 | : seq_nt_0018
785 | | lst_nt_0017 seq_nt_0018
786 | ;
787 | seq_nt_0018
788 | : comma table_element
789 | ;
790 |
791 | The next example is interesting because the sequence item contains
792 | alternatives with no optionality or iteration. It suggests that the
793 | term 'sequence' is not necessarily the 'mot juste'.
794 |
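795 | <column definition> ::=
796 | <column name>
797 | { <data type> | <domain name> }
798 | [ <reference scope check> ]
799 | [ <default clause> ]
800 | [ <column constraint definition>... ]
801 | [ <collate clause> ]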
802 |
803 | column_definition
804 | : column_name seq_nt_0019 opt_nt_0020 opt_nt_0021 opt_nt_0022 opt_nt_0023
805 | ;
806 | seq_nt_0019
807 | : data_type
808 | | domain_name
809 | ;
810 | opt_nt_0020
811 | : /* Nothing */
812 | | reference_scope_check
813 | ;
814 | opt_nt_0021
815 | : /* Nothing */
816 | | default_clause
817 | ;
818 | opt_nt_0022
819 | : /* Nothing */
820 | | lst_nt_0024
821 | ;
822 | opt_nt_0023
823 | : /* Nothing */
824 | | collate_clause
825 | ;
826 | lst_nt_0024
827 | : column_constraint_definition
828 | | lst_nt_0024 column_constraint_definition
829 | ;
830 |
831 |
832 | <select list> ::= <asterisk> | <select sublist> [ { <comma> <select sublist> }... ]
833 |
834 | select_list
835 | : asterisk
836 | | select_sublist opt_nt_0025
837 | ;
838 | opt_nt_0025
839 | : /* Nothing */
840 | | lst_nt_0026
841 | ;
842 | lst_nt_0026
843 | : seq_nt_0027
844 | | lst_nt_0026 seq_nt_0027
845 | ;
846 | seq_nt_0027
847 | : comma select_sublist
848 | ;
849 |
850 | The next statement does not introduce any new grammatical features. It
851 | does, however, trigger a shift/reduce conflict because an LALR(1)
852 | grammar cannot resolve with one lookahead token whether the token WITH
853 | is part of the WITH HIERARCHY OPTION or part of the WITH GRANT OPTION.
854 | Note that should use a non-terminal such as , but such structural changes cannot readily be done by this
856 | program.
857 |
858 | <grant privilege statement> ::=
859 | GRANT <privileges> TO <grantee> [ { <comma> <grantee> }... ]
860 | [ WITH HIERARCHY OPTION ] [ WITH GRANT OPTION ] [ GRANTED BY <grantor> ]
861 |
862 | grant_privilege_statement
863 | : GRANT privileges TO grantee opt_nt_0028 opt_nt_0029 opt_nt_0030 opt_nt_0031
864 | ;
865 | opt_nt_0028
866 | : /* Nothing */
867 | | lst_nt_0032
868 | ;
869 | opt_nt_0029
870 | : /* Nothing */
871 | | WITH HIERARCHY OPTION
872 | ;
873 | opt_nt_0030
874 | : /* Nothing */
875 | | WITH GRANT OPTION
876 | ;
877 | opt_nt_0031
878 | : /* Nothing */
879 | | GRANTED BY grantor
880 | ;
881 | lst_nt_0032
882 | : seq_nt_0033
883 | | lst_nt_0032 seq_nt_0033
884 | ;
885 | seq_nt_0033
886 | : comma grantee
887 | ;
888 |
889 | The next statement reuses material introduced previously, but in a
890 | slightly more complex manner.
891 |
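892 | <set descriptor information> ::=
893 | <set header information> [ { <comma> <set header information> }... ]
894 | | VALUE <item number> <set item information>
[ { <comma> <set item information> }... ]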
895 |
896 | set_descriptor_information
897 | : set_header_information opt_nt_0034
898 | | VALUE item_number set_item_information opt_nt_0035
899 | ;
900 | opt_nt_0034
901 | : /* Nothing */
902 | | lst_nt_0036
903 | ;
904 | opt_nt_0035
905 | : /* Nothing */
906 | | lst_nt_0037
907 | ;
908 | lst_nt_0036
909 | : seq_nt_0038
910 | | lst_nt_0036 seq_nt_0038
911 | ;
912 | lst_nt_0037
913 | : seq_nt_0039
914 | | lst_nt_0037 seq_nt_0039
915 | ;
916 | seq_nt_0038
917 | : comma set_header_information
918 | ;
919 | seq_nt_0039
920 | : comma set_item_information
921 | ;
922 |
923 | The next statement introduces deeper nesting than any of the previous
924 | ones. The expansion produces two rules (opt_nt_0040 and opt_nt_0044)
925 | that are identical. This is indicative of problems with the grammar on
926 | which it is working, which would be better written with a couple of new
927 | non-terminals, and . However, this
929 | is a stylistic change that should also be made in many other places in
930 | the grammar.
931 |
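932 | <C BLOB locator variable> ::=
933 | SQL TYPE IS CLOB AS LOCATOR
934 | <C host identifier> [ <C initial value> ] [ { <comma> <C host identifier> [ <C initial value> ] } ... ]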
935 |
936 | c_blob_locator_variable
937 | : SQL TYPE IS CLOB AS LOCATOR c_host_identifier opt_nt_0040 opt_nt_0041
938 | ;
939 | opt_nt_0040
940 | : /* Nothing */
941 | | c_initial_value
942 | ;
943 | opt_nt_0041
944 | : /* Nothing */
945 | | lst_nt_0042
946 | ;
947 | lst_nt_0042
948 | : seq_nt_0043
949 | | lst_nt_0042 seq_nt_0043
950 | ;
951 | seq_nt_0043
952 | : comma c_host_identifier opt_nt_0044
953 | ;
954 | opt_nt_0044
955 | : /* Nothing */
956 | | c_initial_value
957 | ;
958 |
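Identical generated rules such as opt_nt_0040 and opt_nt_0044 could in
principle be found by comparing rule bodies after the expansion. The
Perl sketch below shows one possible approach; it is not something this
program does. The function name find_duplicate_rules and the input
format (rule name mapped to a list of alternatives) are assumptions
made for illustration.

```perl
use strict;
use warnings;

# Hypothetical post-processing step (not part of bnf2yacc.pl): given a
# hash mapping each generated rule name to a reference to its list of
# alternatives, group the names whose alternatives are identical.
sub find_duplicate_rules
{
    my(%rules) = @_;
    my %by_body;
    foreach my $name (sort keys %rules)
    {
        my $body = join "\n|", @{$rules{$name}};
        push @{$by_body{$body}}, $name;
    }
    # Keep only the bodies shared by more than one rule name
    return grep { @$_ > 1 } map { $by_body{$_} } sort keys %by_body;
}

my @dups = find_duplicate_rules(
    opt_nt_0040 => [ '/* Nothing */', 'c_initial_value' ],
    opt_nt_0041 => [ '/* Nothing */', 'lst_nt_0042' ],
    opt_nt_0044 => [ '/* Nothing */', 'c_initial_value' ],
);
print "identical: @$_\n" foreach @dups;
```

Merging the duplicates would only tidy the generated output; the
underlying grammar would still benefit from the new non-terminals
discussed above.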
959 | =cut
960 |
--------------------------------------------------------------------------------
/bnf2yacc.pl:
--------------------------------------------------------------------------------
1 | #!/usr/bin/perl -w
2 | #
3 | # @(#)$Id: bnf2yacc.pl,v 1.16 2017/11/14 06:53:22 jleffler Exp $
4 | #
5 | # Convert SQL-92, SQL-99 BNF plain text file into YACC grammar.
6 |
7 | use strict;
8 | $| = 1;
9 |
10 | use constant debug => 0;
11 |
12 | my $heading = "";
13 | my %tokens;
14 | my %nonterminals;
15 | my %rules;
16 | my %used;
17 | my $start;
18 | my @grammar;
19 |
20 | my $nt_number = 0;
21 |
22 | # Generate a new non-terminal identifier
23 | sub new_non_terminal
24 | {
25 | my($prefix) = @_;
26 | $prefix = "" unless defined $prefix;
27 | return sprintf "${prefix}nt_%03d", ++$nt_number;
28 | }
29 |
30 | # map_non_terminal converts names that are not acceptable to Yacc into names that are.
31 | # Non-identifier characters are converted to underscores.
32 | # If the first character is not alphabetic, prefix 'j_'.
33 | # Case-convert to lower case.
34 | sub map_non_terminal
35 | {
36 | my($nt) = @_;
37 | $nt =~ s/\W+/_/go;
38 | $nt = "j_$nt" unless $nt =~ m/^[a-zA-Z]/o;
39 | $nt =~ tr/A-Z/a-z/;
40 | $nt =~ s/__+/_/go;
41 | return $nt;
42 | }
43 |
44 | # scan_rhs breaks up the RHS of a rule into a token stream
45 | # Keywords (terminals) are prefixed with a '#' marker.
46 | sub scan_rhs
47 | {
48 | my($tail) = @_;
49 | my(@rhs);
50 | while ($tail)
51 | {
52 | print "RHS: $tail\n" if debug;
53 | my $name;
54 | if ($tail =~ m%^(\s*<([-:/()_\w\s]+)>\s*)%o)
55 | {
56 | # Simpler regex for non-terminal: <[^>]+>
57 | # Non-terminal
58 | my $n = $2;
59 | print "N: $n\n" if debug;
60 | $tail = substr $tail, length($1);
61 | $name = map_non_terminal($n);
62 | $nonterminals{$name} = 1;
63 | $used{$name} = 1;
64 | push @rhs, $name;
65 | }
66 | elsif ($tail =~ m%^(\s*(\w[-\w\d_.]*)\s*)%o)
67 | {
68 | # Terminal (keyword)
69 | # Dot '.' is used in Interfaces.SQL in Ada syntax
70 | # Dash '-' is used in EXEC-SQL in the keywords.
71 | my $t = $2;
72 | print "T: $t\n" if debug;
73 | $tail = substr $tail, length($1);
74 | $name = $t;
75 | $tokens{$name} = 1;
76 | push @rhs, "#$name";
77 | }
78 | elsif ($tail =~ m%^\s*(\.\.\.omitted\.\.\.)\s*%o)
79 | {
80 | # Something omitted from the grammar.
81 | # Triple punctuation detected before double.
82 | my $str = "/* $1 */";
83 | push @rhs, $str;
84 | last;
85 | }
86 | elsif ($tail =~ m{^(\s*([-.<=>|]{2})\s*)$}o)
87 | {
88 | # Double-punctuation (non-metacharacters)
89 | # .., <=, >=, <>, ||, ->
90 | my $p = $2;
91 | print "DP: $p\n" if debug;
92 | $tail = substr $tail, length($1);
93 | $name = "'$p'";
94 | push @rhs, $name;
95 | }
96 | elsif ($tail =~ m{^(\s*([][{}"'%&()*+,-./:;<=>?^_|])\s*)$}o)
97 | {
98 | # Punctuation (non-metacharacters)
99 | # Note that none of '@', '~', '!' or '\' have any significance in SQL
100 | my $p = $2;
101 | print "P: $p\n" if debug;
102 | $tail = substr $tail, length($1);
103 | $p = "\\'" if $p eq "'";
104 | $name = "'$p'";
105 | push @rhs, $name;
106 | }
107 | elsif ($tail =~ m%^(\s*('[^']*'))\s*%o ||
108 | $tail =~ m%^(\s*("[^"]*"))\s*%o)
109 | {
110 | # Terminal in quotes - single or double.
111 | # (Possibly a multi-character string).
112 | my $q = $2;
113 | print "Q: $q\n" if debug;
114 | $tail = substr $tail, length($1);
115 | $q =~ m%^(['"])(.+)['"]$%o;
116 | # Expand multi-character string constants.
117 | # into repeated single-character constants.
118 | my($o) = $1;
119 | my($l) = $2;
120 | while (length($l))
121 | {
122 | my($c) = substr $l, 0, 1;
123 | $name = "$o$c$o";
124 | $l = substr $l, 1, length($l)-1;
125 | push @rhs, $name;
126 | }
127 | }
128 | elsif ($tail =~ m%^(\s*([{}\|\[\]]|\.\.\.)\s*)%o)
129 | {
130 | # Punctuation (metacharacters)
131 | my $p = $2;
132 | print "M: $p\n" if debug;
133 | $tail = substr $tail, length($1);
134 | $name = $p;
135 | push @rhs, $name;
136 | }
137 | elsif ($tail =~ m%^\s*!!%o)
138 | {
139 | # Exhortation to see the syntax rules - usually.
140 | my $str = "/* $tail */";
141 | push @rhs, $str;
142 | last;
143 | }
144 | else
145 | {
146 | # Unknown!
147 | print "/* UNK: $tail */\n";
148 | print STDERR "UNK:$.: $tail\n";
149 | last;
150 | }
151 | }
152 | return(@rhs);
153 | }
154 |
155 | # Format a Yacc rule given LHS and RHS array
156 | sub record_rule
157 | {
158 | my($lhs, $comment, @rule) = @_;
159 | my($production) = "";
160 | print "==>> record_rule ($lhs : @rule)\n" if debug;
161 | $production .= "/*\n" if $comment;
162 | $production .= "$lhs\n\t:\t";
163 | my $pad = "";
164 | my $br_count = 0;
165 | for (my $i = 0; $i <= $#rule; $i++)
166 | {
167 | my $item = $rule[$i];
168 | print "==== item $item\n" if debug;
169 | if ($item eq "|" && $br_count == 0)
170 | {
171 | $production .= "\n\t|\t";
172 | $pad = "";
173 | }
174 | else
175 | {
176 | $production .= "$pad$item";
177 | $pad = " ";
178 | $br_count++ if ($item eq '[' or $item eq '{');
179 | $br_count-- if ($item eq ']' or $item eq '}');
180 | }
181 | }
182 | $production .= "\n\t;\n";
183 | $production .= "*/\n" if $comment;
184 | $production .= "\n";
185 | print "$production" if debug;
186 | push @grammar, $production;
187 | print "<<== record_rule\n" if debug;
188 | }
189 |
190 | sub print_iterator
191 | {
192 | my($lhs,$rhs) = @_;
193 | my($production) = "";
194 | print "==>> print_iterator ($lhs $rhs)\n" if debug;
195 | $production .= "$lhs\n\t:\t$rhs\n\t|\t$lhs $rhs\n\t;\n\n";
196 | print "<<== print_iterator\n" if debug;
197 | push @grammar, $production;
198 | }
199 |
200 | # Process an optional item enclosed in square brackets
201 | sub find_balanced_bracket
202 | {
203 | my($lhs,@rhs) = @_;
204 | my(@rule) = ( "/* Nothing */", "|");
205 | print "==>> find_balanced_bracket ($lhs : @rhs)\n" if debug;
206 | while (my $name = shift @rhs)
207 | {
208 | print " name = $name\n" if debug;
209 | if ($name eq ']')
210 | {
211 | # Found closing bracket
212 | # Terminate search
213 | last;
214 | }
215 | elsif ($name eq '[')
216 | {
217 | # Found nested optional clause
218 | my $tag = new_non_terminal('opt_');
219 | @rhs = find_balanced_bracket($tag, @rhs);
220 | push @rule, $tag;
221 | }
222 | elsif ($name eq '{')
223 | {
224 | # Found start of sequence
225 | my $tag = new_non_terminal('seq_');
226 | @rhs = find_balanced_brace($tag, @rhs);
227 | push @rule, $tag;
228 | }
229 | elsif ($name eq '}')
230 | {
231 | # Found unbalanced close brace.
232 | # Error!
233 | }
234 | elsif ($name eq '...')
235 | {
236 | # Found iteration.
237 | my $tag = new_non_terminal('lst_');
238 | print "==== find_balanced_bracket: iterator (@rule)\n" if debug;
239 | my($old) = pop @rule;
240 | push @rule, $tag;
241 | print "==== find_balanced_bracket: iterator ($tag/$old - @rule)\n" if debug;
242 | print_iterator($tag, $old);
243 | }
244 | else
245 | {
246 | $name =~ s/^#//;
247 | push @rule, $name;
248 | $used{$name} = 1;
249 | }
250 | }
251 | record_rule($lhs, 0, @rule);
252 | print "<<== find_balanced_bracket: @rhs)\n" if debug;
253 | return(@rhs);
254 | }
255 |
256 | # Process a sequence item enclosed in curly braces
257 | sub find_balanced_brace
258 | {
259 | my($lhs,@rhs) = @_;
260 | my(@rule);
261 | print "==>> find_balanced_brace ($lhs : @rhs)\n" if debug;
262 | while (my $name = shift @rhs)
263 | {
264 | print " name = $name\n" if debug;
265 | if ($name eq '}')
266 | {
267 | # Found closing brace
268 | # Terminate search
269 | last;
270 | }
271 | elsif ($name eq '[')
272 | {
273 | # Found nested optional clause
274 | my $tag = new_non_terminal('opt_');
275 | @rhs = find_balanced_bracket($tag, @rhs);
276 | push @rule, $tag;
277 | }
278 | elsif ($name eq '{')
279 | {
280 | # Found start of sequence
281 | my $tag = new_non_terminal('seq_');
282 | @rhs = find_balanced_brace($tag, @rhs);
283 | push @rule, $tag;
284 | }
285 | elsif ($name eq ']')
286 | {
287 | # Found unbalanced close bracket.
288 | # Error!
289 | }
290 | elsif ($name eq '...')
291 | {
292 | # Found iteration.
293 | my $tag = new_non_terminal('lst_');
294 | print "==== find_balanced_brace: iterator (@rule)\n" if debug;
295 | my($old) = pop @rule;
296 | push @rule, $tag;
297 | print "==== find_balanced_brace: iterator ($tag/$old - @rule)\n" if debug;
298 | print_iterator($tag, $old);
299 | }
300 | else
301 | {
302 | $name =~ s/^#//;
303 | push @rule, $name;
304 | $used{$name} = 1;
305 | }
306 | }
307 | record_rule($lhs, 0, @rule);
308 | print "<<== find_balanced_brace: @rhs)\n" if debug;
309 | return(@rhs);
310 | }
311 |
312 | # Note that the [ and { parts are nice and easy because they are
313 | # balanced operators. The iteration operator ... is much harder to
314 | # process because it is a trailing modifier. When processing the list
315 | # of symbols, you need to establish whether there is a trailing iterator
316 | # after the current symbol, and modify the behaviour appropriately.
317 | sub process_rhs
318 | {
319 | my($lhs, $tail) = @_;
320 | my(@rhs) = scan_rhs($tail);
321 | print "==>> process_rhs ($lhs : @rhs)\n" if debug;
322 | # List parsed rule in output only if debugging.
323 | record_rule($lhs, 1, @rhs) if debug;
324 | my(@rule);
325 | while (my $name = shift @rhs)
326 | {
327 | print "name = $name\n" if debug;
328 | if ($name eq '[')
329 | {
330 | my $tag = new_non_terminal('opt_');
331 | @rhs = find_balanced_bracket($tag, @rhs);
332 | push @rule, $tag;
333 | }
334 | elsif ($name eq ']')
335 | {
336 | # Found a close bracket for something unbalanced.
337 | # Error!
338 | }
339 | elsif ($name eq '{')
340 | {
341 | # Start of mandatory sequence of items, possibly containing alternatives.
342 | my $tag = new_non_terminal('seq_');
343 | @rhs = find_balanced_brace($tag, @rhs);
344 | push @rule, $tag;
345 | }
346 | elsif ($name eq '}')
347 | {
348 | # Found a close brace for something unbalanced.
349 | # Error!
350 | }
351 | elsif ($name eq '|')
352 | {
353 | # End of one alternative and start of a new one.
354 | print "==== process_rhs: alternative $name\n" if debug;
355 | push @rule, $name;
356 | }
357 | elsif ($name eq '...')
358 | {
359 | # Found iteration.
360 | my $tag = new_non_terminal('lst_');
361 | my($old) = pop @rule;
362 | push @rule, $tag;
363 | print "==== process_rhs: iterator\n" if debug;
364 | print_iterator($tag, $old);
365 | }
366 | elsif ($name =~ m/^#/)
367 | {
368 | # Keyword token
369 | print "==== process_rhs: token $name\n" if debug;
370 | $name =~ s/^#//;
371 | push @rule, $name;
372 | }
373 | else
374 | {
375 | # Non-terminal (or comment)
376 | print "==== process_rhs: non-terminal $name\n" if debug;
377 | push @rule, $name;
378 | }
379 | }
380 | print "==== process_rhs: @rule\n" if debug;
381 | record_rule($lhs, 0, @rule);
382 | print "<<== process_rhs\n" if debug;
383 | }
384 |
385 | sub count_unmatched_keys
386 | {
387 | my($ref1, $ref2) = @_;
388 | my(%keys) = %$ref1;
389 | my(%match) = %$ref2;
390 | my($count) = 0;
391 | foreach my $key (keys %keys)
392 | {
393 | $count++ unless defined $match{$key};
394 | }
395 | return $count;
396 | }
397 |
398 | # ------------------------------------------------------------
399 |
400 | open INPUT, "cat @ARGV |" or die "$!";
401 | $_ = <INPUT>;
402 | exit 0 unless defined($_);
403 | chomp;
404 | $heading = "%{\n/*\n** $_\n*/\n%}\n\n" unless m/^\s*$/;
405 |
406 | # Commentary appears in column 1.
407 | # Continuations of rules have a blank in column 1.
408 | # Blank lines, dash lines and equals lines separate rules (and are not embedded within them).
409 |
410 | while (<INPUT>)
411 | {
412 | chomp;
413 | print "DBG:$.: $_\n" if debug;
414 | next if /^===*$/o;
415 | next if /^\s*$/o; # Blank lines
416 | next if /^---*$/o; # Horizontal lines
417 | if (/^--/o)
418 | {
419 | # Various HTML pseudo-directives
420 | if (m%^--/?\w+\b%)
421 | {
422 | print "/* $' */\n" if $';
423 | }
424 | elsif (/^--%start (\w+)/)
425 | {
426 | $start = $1;
427 | print "/* Start symbol - $start */\n";
428 | }
429 | elsif (/^--##/)
430 | {
431 | print "/* $_ */\n";
432 | }
433 | else
434 | {
435 | print "/* Unrecognized 2: $_ */\n";
436 | }
437 | }
438 | elsif (/^@.#..Id:/)
439 | {
440 | # Convert what(1) string identifier into version information
441 | s%^@.#..Id: %%;
442 | s% \$$%%;
443 | s%,v % %;
444 | s%\w+ Exp( \w+)?$%%;
445 | my @words = split;
446 | print "/*\n";
447 | print "** Derived from file $words[0] version $words[1] dated $words[2] $words[3]\n";
448 | print "*/\n";
449 | }
450 | elsif (/ ::=/)
451 | {
452 | # Definition line
453 | my $def = $_;
454 | $def =~ s%<([-:/()\w\s]+)>.*%$1%o;
455 | $def = map_non_terminal($def);
456 | $rules{$def} = 1;
457 | $nonterminals{$def} = 1;
458 | my $tail = $_;
459 | $tail =~ s%.*::=\s*%%; # Remove LHS of statement
460 | while (<INPUT>)
461 | {
462 | chomp;
463 | last unless /^\s/;
464 | $tail .= $_;
465 | }
466 | process_rhs($def, $tail);
467 | }
468 | else
469 | {
470 | # Anything unrecognized passed through as a comment!
471 | print "/* $_ */\n";
472 | }
473 | }
474 |
475 | close INPUT;
476 |
477 | print "==== End of input phase ====\n" if debug;
478 |
479 | print $heading if $heading;
480 |
481 | # List of tokens
482 | foreach my $token (sort keys %tokens)
483 | {
484 | print "\%token $token\n";
485 | }
486 | print "\n";
487 |
488 | # Undefined non-terminals might need to be treated as tokens
489 | if (count_unmatched_keys(\%nonterminals, \%rules) > 0)
490 | {
491 | print "/* The following non-terminals were not defined */\n";
492 | foreach my $nt (sort keys %nonterminals)
493 | {
494 | print "%token $nt\n" unless defined $rules{$nt};
495 | }
496 | print "/* End of undefined non-terminals */\n\n";
497 | }
498 |
499 | # List the rules that are defined in the original grammar.
500 | # Do not list the rules defined by this conversion process.
501 | print "/*\n";
502 | foreach my $nt (sort keys %nonterminals)
503 | {
504 | print "\%rule $nt\n";
505 | }
506 | print "*/\n\n";
507 |
508 |
509 | if (defined $start)
510 | {
511 | print "%start $start\n\n";
512 | print "%%\n\n";
513 | }
514 | else
515 | {
516 | # No start symbol defined - let's see if we can work out what to use.
517 | # If there's more than one unused non-terminal, then treat them
518 | # all as simple alternatives to a list of statements.
519 | my $count = count_unmatched_keys(\%nonterminals, \%used);
520 |
521 | if ($count > 1)
522 | {
523 | my $prog = "bnf_program";
524 | my $stmt = "bnf_statement";
525 | print "%start $prog\n\n";
526 | print "%%\n\n";
527 | print "$prog\n\t:\t$stmt\n\t|\t$prog $stmt\n\t;\n\n";
528 | print "$stmt\n";
529 | my $pad = "\t:\t";
530 | foreach my $nt (sort keys %nonterminals)
531 | {
532 | unless (defined $used{$nt})
533 | {
534 | print "$pad$nt\n";
535 | $pad = "\t|\t";
536 | }
537 | }
538 | print "\t;\n\n";
539 | }
540 | elsif ($count == 1)
541 | {
542 | foreach my $nt (sort keys %nonterminals)
543 | {
544 | print "%start $nt\n\n" unless defined $used{$nt};
545 | }
546 | print "%%\n\n";
547 | }
548 | else
549 | {
550 | # No single start symbol - loop?
551 | # Error!
552 | print STDERR "$0: no start symbol recognized!\n";
553 | print "%%\n\n";
554 | }
555 | }
556 |
557 | # Output the complete grammar
558 | while (my $line = shift @grammar)
559 | {
560 | print $line;
561 | }
562 |
563 | print "\n%%\n\n";
564 |
565 | __END__
566 |
567 | =pod
568 |
569 | Given a rule:
570 |
571 | abc: def ghi jkl
572 |
573 | The Yacc output is:
574 |
575 | abc
576 | : def ghi jkl
577 | ;
578 |
579 | Given a rule:
580 |
581 | abc: def [ ghi ] jkl
582 |
583 | The Yacc output is:
584 |
585 | abc
586 | : def opt_nt_0001 jkl
587 | ;
588 |
589 | opt_nt_0001
590 | : /* Nothing */
591 | | ghi
592 | ;
593 |
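The optional-item expansion above can be sketched in a few lines of
Perl. This is a simplified illustration rather than the code used by
this program: the helper name expand_optional is hypothetical, it
handles only a single unnested '[ ... ]' group, and it takes the
generated name as a parameter instead of numbering it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bnf2yacc.pl): replace one unnested
# '[ ... ]' group in a list of symbols with the non-terminal $tag, and
# return the rewritten symbol list plus the text of the new rule.
sub expand_optional
{
    my($tag, @symbols) = @_;
    my(@outer, @inner);
    my $inside = 0;
    foreach my $sym (@symbols)
    {
        if    ($sym eq '[') { $inside = 1; push @outer, $tag; }
        elsif ($sym eq ']') { $inside = 0; }
        elsif ($inside)     { push @inner, $sym; }
        else                { push @outer, $sym; }
    }
    # The empty alternative comes first, as in the output shown above
    my $rule = "$tag\n\t:\t/* Nothing */\n\t|\t@inner\n\t;\n";
    return(\@outer, $rule);
}

my($outer, $rule) = expand_optional('opt_nt_0001', qw(def [ ghi ] jkl));
print "abc\n\t:\t@$outer\n\t;\n\n$rule";
```

The real conversion has to recurse, because optional items may contain
further brackets, braces and iterations.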
594 | Given a rule:
595 |
596 | abc: def { ghi } jkl
597 |
598 | The Yacc output is:
599 |
600 | abc
601 | : def seq_nt_0002 jkl
602 | ;
603 |
604 | seq_nt_0002
605 | : ghi
606 | ;
607 |
608 | Note that such rules are seldom used in isolation; either the contents
609 | between '{' and '}' contain alternatives, or the construct as a whole is
610 | followed by a repetition.
611 |
612 | Given a rule:
613 |
614 | abc: def | ghi
615 |
616 | The Yacc output is:
617 |
618 | abc
619 | : def
620 | | ghi
621 | ;
622 |
623 | Given a rule:
624 |
625 | abc: def ghi... jkl
626 |
627 | The Yacc output is:
628 |
629 | abc
630 | : def lst_nt_0003 jkl
631 | ;
632 |
633 | lst_nt_0003
634 | : ghi
635 | | lst_nt_0003 ghi
636 | ;
637 |
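A trailing '...' applies to the symbol immediately before it, so the
conversion replaces that symbol with a new list non-terminal and emits
a left-recursive rule for it. The Perl sketch below illustrates this
for a flat symbol list in which '...' has already been split off as a
separate token. The helper name expand_iteration and the way the
counter is passed in are assumptions made for the example, and
bracketed groups are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bnf2yacc.pl): rewrite each 'X ...'
# pair in a flat symbol list as a generated list non-terminal, and
# collect one left-recursive rule per iteration.  $counter is the last
# number already used for a generated name.
sub expand_iteration
{
    my($counter, @symbols) = @_;
    my(@outer, @rules);
    foreach my $sym (@symbols)
    {
        if ($sym eq '...')
        {
            my $item = pop @outer;    # Symbol being repeated
            my $tag  = sprintf "lst_nt_%04d", ++$counter;
            push @outer, $tag;
            push @rules, "$tag\n\t:\t$item\n\t|\t$tag $item\n\t;\n";
        }
        else
        {
            push @outer, $sym;
        }
    }
    return(\@outer, \@rules);
}

my($outer, $rules) = expand_iteration(2, qw(def ghi ... jkl));
print "abc\n\t:\t@$outer\n\t;\n\n", @$rules;
```

Left recursion is the usual choice for lists in Yacc because the parser
can reduce after each item instead of stacking the whole list first.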
638 | These rules can be, and often are, combined. The following examples
639 | come from the SQL-99 grammar, which is the target of this effort. The
640 | aim of this program is to produce Yacc rules equivalent to those
641 | which follow each fragment. Note that keywords (equivalently,
642 | terminals) are in upper case only; mixed case or lower case symbols are
643 | non-terminals.
644 |
645 | ::=
646 |
647 |