├── Discussion.md
├── README.md
├── Specification.md
├── TestSuite.md
└── jaxn.abnf


/Discussion.md:
--------------------------------------------------------------------------------
  1 | # Discussion
  2 | 
  3 | The following sections discuss the syntax and semantics of the extensions that JAXN brings to JSON, as well as rejected extensions that will not be added to JAXN.
  4 | 
  5 | * [Data Model](#data-model)
  6 | * [Unicode](#unicode)
  7 | * [White-Space](#white-space)
  8 | * [Newline](#newline)
  9 | * [Source Character Set](#source-character-set)
 10 | * [Comments](#comments)
 11 | * [Numbers](#numbers)
 12 | * [Strings](#strings)
 13 | * [Binary Data](#binary-data)
 14 | * [Unquoted Object Keys](#unquoted-object-keys)
 15 | * [Trailing Comma](#trailing-comma)
 16 | * [Conversion to JSON](#conversion-to-json)
 17 | 
 18 | ## Data Model
 19 | 
 20 | Most "relaxed JSON" extensions focus on the syntax of the string representation.
 21 | They sometimes do extend the data model, but they don't say so clearly or are even unaware of it.
 22 | JAXN goes further, by clearly specifying which additional values and data types a library should support.
 23 | This allows users to know what to expect from a JAXN-compatible library, or, looking at it from the other side, search for a JAXN-compatible library when they know that they need these extensions to the data model.
 24 | 
 25 | JAXN extends the JSON data model in two places.
 26 | 
 27 | 1. Allow `NaN`, `Infinity` and `-Infinity` for numeric values.
 28 | 2. Add a binary data type.
 29 | 
 30 | ## Unicode
 31 | 
 32 | JAXN does not require Unicode support beyond what is required by JSON.
 33 | 
 34 | A JAXN parser parses a sequence of bytes, the input data. The parser is...
 35 | 
 36 | * ...required to accept (correctly encoded) UTF-8 input data.
 37 |   * This is the recommended and only interoperable representation.
 38 | * ...allowed to accept (correctly encoded) UTF-16 or UTF-32 input data.
 39 | * ...allowed to accept a byte order marker (BOM) at the beginning of the input data.
 40 |   * This is only for UTF-16 or UTF-32, or other endian-dependent encodings.
 41 | * ...allowed to accept other encodings, provided that they are correctly identified (no guessing!) and unambiguously mapped to a sequence of Unicode code points.
 42 | * ...required to signal an error if it encounters an (encoding) error in the input data.
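
The two required behaviours above (accept correctly encoded UTF-8, signal encoding errors) can be sketched as follows. This is only an illustration in Python and not part of JAXN; the function name `decode_input` is hypothetical, and the optional UTF-16/UTF-32 and BOM handling is deliberately left out.

```python
def decode_input(data: bytes) -> str:
    """Decode JAXN input data as strict UTF-8 (hypothetical helper)."""
    try:
        # "strict" rejects truncated sequences, overlong forms and
        # UTF-8 encoded surrogates instead of silently replacing them.
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"encoding error in input data: {exc}") from exc
```

A caller would pass the raw bytes of a document and treat the `ValueError` as a parse error.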
 43 | 
 44 | ## White-Space
 45 | 
 46 | JAXN does not allow white-space characters beyond those defined by JSON.
 47 | 
 48 | Some other libraries allow additional white-space characters, but we do not see a real-world use-case for those.
 49 | We believe users often add them by mistake and this is not a good-enough reason for us to allow them.
 50 | 
 51 | ## Newline
 52 | 
 53 | The JAXN grammar allows well-formed documents to contain any sequence of 0x0A (line feed or new line) and 0x0D (carriage return) characters, mixed in any way.
 54 | A JAXN parser is allowed (even expected) to further restrict the accepted end-of-line markers, for example to the system-native 0x0A (as is common on Unix and macOS systems) or to require the sequence 0x0D, 0x0A (on Windows systems).
 55 | This is necessary to report sensible position information in case of parse errors, as the line number in which an error occurs depends on the specific end-of-line markers allowed/expected for the input data.
 56 | 
 57 | ## Source Character Set
 58 | 
 59 | The source character set (i.e., the Unicode code points that may be contained in the input data) consists of HTAB (0x09), the end-of-line characters (0x0A, 0x0D), and all code points starting with space (0x20), except for `delete` (0x7F), i.e. (0x20-0x7E, 0x80-0x10FFFF).
 60 | 
 61 | JSON allows 0x7F although it is a control character (and all other control characters are explicitly excluded).
 62 | We consider this a mistake in the JSON specification, and do not allow 0x7F in JAXN.
 63 | 
 64 | If a JAXN parser encounters a code point outside of the source character set, it must report an error.
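
As a rough sketch, the source character set described above could be checked like this. The function names are hypothetical and the code assumes the input has already been decoded to Unicode code points.

```python
def in_source_character_set(cp: int) -> bool:
    """Return True if code point cp may appear in JAXN input data."""
    if cp in (0x09, 0x0A, 0x0D):  # HTAB and the end-of-line characters
        return True
    # Everything from space upwards, except delete (0x7F).
    return 0x20 <= cp <= 0x10FFFF and cp != 0x7F


def check_source(text: str) -> None:
    """Raise ValueError at the first code point outside the set."""
    for offset, ch in enumerate(text):
        if not in_source_character_set(ord(ch)):
            raise ValueError(
                f"code point U+{ord(ch):04X} at offset {offset} "
                "is outside the source character set"
            )
```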
 65 | 
 66 | JAXN does not *require* any non-ASCII characters in the input data.
 67 | All Unicode code points in the string values in the data model can be written in an escaped form in the input data.
 68 | JAXN documents can therefore, like JSON documents, be restricted to ASCII without losing expressiveness.
 69 | 
 70 | ## Comments
 71 | 
 72 | JAXN allows comments; however, they are a presentation detail and must not have any effect on the serialization tree, representation graph or events generated.
 73 | In particular, comments are not associated with a particular node.
 74 | This improves interoperability and ensures that the main concern why comments are not part of JSON is taken care of.
 75 | The usual purpose of a comment is to communicate between the human maintainers of a file.
 76 | A typical example is comments in a configuration file.
 77 | 
 78 | Michael Bolin writes:
 79 | 
 80 | > Because JSON is more concise than XML, JSON is often a better format for data files that are maintained by hand. Examples include configuration files, as well as blobs of test data for web applications. For files such as these, it is convenient to be able to temporarily comment out bits of information (such as a configuration option, or an old test value in lieu of a new one). Further, if the file is to be maintained by humans, it is desirable to be able to include comments so that maintainers may communicate amongst one another without interfering with the data in the file.
 81 | >
 82 | > [...]
 83 | >
 84 | > This begs the question: why aren't comments officially supported in JSON? Interestingly, when Douglas Crockford originally introduced JSON, there was explicit support for C-style comments. He [later dropped support for them in the specification](http://tech.groups.yahoo.com/group/json/message/156), but also [declared that a JSON decoder that accepts comments should be considered a valid JSON decoder](http://tech.groups.yahoo.com/group/json/message/152).
 85 | 
 86 | (Source: http://bolinfest.com/essays/json.html)
 87 | 
 88 | Note that comments sometimes don't interact nicely with strings.
 89 | If you try to comment out parts of a document that contain strings, and if those strings contain the character sequence `*/`, using a block comment will fail.
 90 | This problem of block comments existed long before JAXN.
 91 | As JSON already allows escaping the slash with a backslash in strings, you might consider converting `*/` into `*\/` within the string in question; you will then be able to comment the string out (and in again) without problems.
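
The effect of the `\/` escape can be checked with any JSON parser. The snippet below uses Python's standard `json` module purely as a stand-in (it is not a JAXN parser) to show that both spellings represent the same string, while only the escaped spelling keeps `*/` out of the text.

```python
import json

plain_text = '"a */ b"'      # contains the sequence */
escaped_text = r'"a *\/ b"'  # same string, slash escaped

# Both texts decode to the same string value.
assert json.loads(plain_text) == json.loads(escaped_text) == "a */ b"

# Only the escaped form is safe to wrap in a block comment.
assert "*/" in plain_text
assert "*/" not in escaped_text
```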
 92 | 
 93 | The restrictions on the source character set also apply within comments.
 94 | 
 95 | ## Numbers
 96 | 
 97 | JAXN allows non-finite floating point values.
 98 | NaN and Infinity (as well as -Infinity) are well known, non-finite values from IEEE 754.
 99 | Real-world use cases often require dealing with those values, and providing a clear way to handle them improves interoperability.
100 | A JAXN-compatible library is required to accept NaN and Infinity as valid numeric values for its internal data model.
101 | 
102 | ## Strings
103 | 
104 | JAXN keeps the JSON string data model intact.
105 | String values in JSON are required to be valid Unicode strings in order to be interoperable.
106 | RFC 8259 explains in section 8.2 why this is the case.
107 | JAXN does *not* change this.
108 | Unlike some other libraries that allow escape sequences like `\xXX` for normal strings without specifying the semantics (properly), JAXN does not create ambiguity and confusion and does not require storing non-Unicode strings.
109 | 
110 | The sequence of represented Unicode code points is obtained from the sequence of representation code points by replacing escape sequences with the escaped code points (or, in the case of UTF-16 surrogates, temporary code units), and by merging the code units of subsequent high and low UTF-16 surrogates into a single code point.
111 | 
112 | > (RFC 8259 specifies how to encode code points not in the BMP with a 12-character encoding consisting of two `\uXXXX` escape sequences using UTF-16 surrogate pairs, but does not mandate a specific behaviour when the merging of surrogates fails, noting only that it could be "unpredictable" including "fatal".)
113 | 
114 | JAXN allows only complete UTF-16 surrogate pairs, which may occur as escape sequences in strings.
115 | When the input character set is UTF-16, complete surrogate pairs are also allowed (unescaped) anywhere in the input; however, escaped and unescaped surrogates cannot be paired to form a surrogate pair.
116 | Other occurrences of surrogates are not allowed.
117 | 
118 | Merging of surrogate pairs, and the decision of whether a string contains unpaired surrogates, MUST be performed before concatenation of strings.
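
A minimal sketch of the merging step, assuming the escape sequences of one string part have already been turned into a sequence of integer code units. The function name `merge_surrogates` is hypothetical.

```python
from typing import Sequence


def merge_surrogates(units: Sequence[int]) -> str:
    """Merge UTF-16 surrogate pairs; reject unpaired surrogates."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high surrogate: needs a low one next
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            low = units[i + 1]
            out.append(chr(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:  # low surrogate without a high one
            raise ValueError("unpaired low surrogate")
        else:
            out.append(chr(u))
            i += 1
    return "".join(out)
```

Because the check runs on each string part before concatenation, a text such as `"\uD834" + "\uDD1E"` is rejected: the first part on its own already contains an unpaired high surrogate.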
119 | 
120 | ## Binary Data
121 | 
122 | In real-world uses, one often needs to handle binary data.
123 | Representing this kind of data as strings requires, for example, hex- or base64-encoding.
124 | As JAXN recognizes the importance of binary data, we extend the data model of a JAXN-compatible library by an explicit binary type.
125 | For the representation in string form, we have chosen hex notation as base64 is human-unfriendly and adds additional implementation complexity.
126 | Having a binary type and a more direct representation allows for a more concise and reasonable representation.
127 | 
128 | Implementations must treat binary data as a separate data type.
129 | This increases interoperability with binary protocols like CBOR as well as providing a clear separation of readable strings from binary data.
130 | The latter is helpful when you are dumping data to, say, a log-file.
131 | 
132 | ## Unquoted Object Keys
133 | 
134 | Quoting Michael Bolin:
135 | 
136 | > Unfortunately, even though quoting is the minority case, JSON requires that all keys in maps must be double-quoted, regardless of whether they would need to be in ordinary JavaScript. Presumably this was done because it was the simplest way to guarantee that JSON would be a strict subset of ES3. (Fortunately, ES5 has evolved to allow JavaScript keywords to serve as unquoted property names in object literals.)
137 | >
138 | > Similar to the situation with trailing commas, if the design of JSON were not encumbered by the shortcomings of ES3, then I imagine that JSON keys would not have to be quoted. [...] In either case, demoting quoting from a requirement to an option would save most developers two bytes per key, which would be a win for both humans and machines. (I also think that it would make JSON more readable, though that may be a personal preference.)
139 | 
140 | (Source: http://bolinfest.com/essays/json.html)
141 | 
142 | ## Trailing Comma
143 | 
144 | Again, Michael Bolin provides a good rationale for trailing commas:
145 | 
146 | > Most modern browsers allow for a trailing comma in array and object literals in JavaScript. Although support for the trailing comma was not mandated until ES5, browsers such as Chrome and Firefox have supported it for a long time
147 | >
148 | > [...]
149 | >
150 | > Using the trailing comma is particularly convenient for developers who may modify the map in the course of development. As shown in the following example, commenting out the last entry in a map can inadvertently transform it into an object literal with a trailing comma:
151 | 
152 | ```json
153 | // Commenting out the last line produces an object literal with a
154 | // trailing comma.
155 | {
156 |   "margin": "2px",
157 |   // "padding": "3px"
158 | }
159 | ```
160 | 
161 | (Source: http://bolinfest.com/essays/json.html)
162 | 
163 | ## Conversion to JSON
164 | 
165 | A JAXN data value may contain values that have no direct representation in JSON.
166 | Those are the non-finite numeric values and binary data.
167 | A library may choose to report an error when conversion to JSON string representation is requested.
168 | It may also choose to replace those values with strings.
169 | A JAXN-compatible library should use the following strings:
170 | 
171 | * `"NaN"` for a NaN. No other strings should be used, e.g. `"nan"`, `"+NaN"` or `"-NaN"`.
172 | * `"Infinity"` and `"-Infinity"`. No other strings should be used, e.g. `"Inf"`, `"+Infinity"`, etc.
173 | * Binary data should be represented as a string containing the hex encoded data, e.g. `"496E66696E697479"`.
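
The replacement rules listed above can be sketched in Python as follows. The function name `to_json_compatible` is hypothetical, Python's `bytes` stands in for the JAXN binary type, and the upper-case hex digits are an assumption taken from the example string above rather than a stated requirement.

```python
import json
import math


def to_json_compatible(value):
    """Replace non-finite numbers and binary data with strings."""
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"
        if math.isinf(value):
            return "Infinity" if value > 0 else "-Infinity"
        return value
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).hex().upper()  # hex encoded binary data
    if isinstance(value, list):
        return [to_json_compatible(v) for v in value]
    if isinstance(value, dict):
        return {k: to_json_compatible(v) for k, v in value.items()}
    return value  # null, booleans, integers, strings


print(json.dumps(to_json_compatible([float("nan"), b"Infinity"])))
# prints ["NaN", "496E66696E697479"]
```

Note that the conversion is lossy: after the round trip, a reader cannot tell the string `"NaN"` from a former non-finite number.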
174 | 
175 | Copyright (c) 2017-2018 Daniel Frey and Dr. Colin Hirsch
176 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Welcome to JAXN
 2 | 
 3 | JAXN (pronounced "Jackson") is a standard that carefully extends [JSON](https://tools.ietf.org/html/rfc8259) with a few often-required additions to the data model, and with new syntax that makes it more human friendly.
 4 | 
 5 | > :exclamation: **JAXN IS CURRENTLY WORK-IN-PROGRESS** :exclamation:
 6 | >
 7 | > Until version 1.0 of JAXN is published, everything is considered work-in-progress, and anything might still change. Ideas, feedback and other input are welcome and appreciated. Please feel free to open an issue, or write to [`jaxn@icemx.net`](mailto:jaxn@icemx.net).
 8 | 
 9 | ## The JAXN Data Model
10 | 
11 | JAXN extends the JSON data model with the following points:
12 | 
13 | * Allows non-finite values `NaN`, `Infinity` and `-Infinity` for numbers.
14 | * Adds a new primitive type for values representing binary data.
15 | 
16 | ## The JAXN Text Representation
17 | 
18 | JAXN text representation extends the JSON text representation with the following points:
19 | 
20 | * [Comments](#comments)
21 | * [Numbers](#numbers)
22 | * [Strings](#strings)
23 | * [Binary Data](#binary-data)
24 | * [Unquoted Object Keys](#unquoted-object-keys)
25 | * [Trailing Comma](#trailing-comma)
26 | 
27 | #### Comments
28 | 
29 | * `# single-line comment`
30 | * `// single-line comment`
31 | * `/* block comment */`
32 | 
33 | #### Numbers
34 | 
35 | * Allow a leading `+` sign.
36 | * Allow omission of leading or trailing zeros, e.g. `.5`, or `42.`.
37 | * Add non-finite values `NaN` and `Infinity`.
38 | * Add hexadecimal integer values, e.g. `0xDEADBEEF`.
39 | 
40 | #### Strings
41 | 
42 | * Add single-quoted strings, e.g. `'This is a "single-quote" string. No really, it is!'`.
43 | * Add new escape sequences `\'`, `\v`, `\0` and `\u{X...}`.
44 | * Add multiline strings with no escape sequences.
45 | * Add concatenation of strings, e.g. `"Hello," + " world!"`.
46 | 
47 | #### Binary Data
48 | 
49 | * New primitive type that can represent arbitrary byte sequences.
50 | * Two syntactical variants that can be concatenated with each other.
51 | * Hexdumped binary, e.g. `$48656c6c6f2c20776f726c6421`.
52 |   * Allows optional dots, e.g. `$48.65.6c.6c.6f.2c.20.77.6f.72.6c.64.21`.
53 | * Binary strings, e.g. `$"Hello, \x77orld!"`.
54 |   * Only printable ASCII characters allowed, no control characters.
55 |   * No `\uXXXX` or `\u{...}` escape sequences allowed, instead:
56 |   * Add `\xXX` for arbitrary byte values.
57 | 
58 | #### Unquoted Object Keys
59 | 
60 | * Allow unquoted object keys, e.g. `{ foo: "Hello", bar: 42 }`.
61 | 
62 | #### Trailing Comma
63 | 
64 | * Allow `[1,2,3,]` and `{ foo: "Hello", bar: 42, }`.
65 | 
66 | ## More information
67 | 
68 | * [Specification](Specification.md)
69 | * [Discussion](Discussion.md)
70 | * [ABNF grammar](jaxn.abnf)
71 | 
72 | ## Libraries implementing JAXN
73 | 
74 | * [taocpp/json](https://github.com/taocpp/json)
75 | * ...
76 | 
77 | Copyright (c) 2017-2018 Daniel Frey and Dr. Colin Hirsch
78 | 


--------------------------------------------------------------------------------
/Specification.md:
--------------------------------------------------------------------------------
  1 | # Specification
  2 | 
  3 | This document is the normative specification of JAXN.
  4 | 
  5 | JAXN is a data representation and interchange format based on JSON.
  6 | 
  7 | Only the differences between JAXN and JSON are specified.
  8 | 
  9 | JSON is to be understood as the version defined in [RFC 8259](https://tools.ietf.org/html/rfc8259).
 10 | 
 11 | * [Restrictions](#restrictions)
 12 | * [Comments](#comments)
 13 | * [Numbers](#numbers)
 14 | * [Strings](#strings)
 15 | * [Binary Data](#binary-data)
 16 | * [Unquoted Names in Objects](#unquoted-names-in-objects)
 17 | * [Trailing Comma](#trailing-comma)
 18 | 
 19 | Note: The grammar rules given below are excerpts from the complete [JAXN grammar](jaxn.abnf).
 20 | The JAXN grammar is based on the JSON grammar, and both are in ABNF syntax as defined in [RFC 5234](https://tools.ietf.org/html/rfc5234).
 21 | 
 22 | ## Restrictions
 23 | 
 24 | JAXN is mostly a superset of JSON in that every JSON text is a JAXN text that represents the same value; however, the following points restrict which JSON texts are also JAXN:
 25 | 
 26 | * A document is considered valid when it validates against the JAXN grammar *and* when all additional restrictions are met.
 27 | * Duplicate names are not allowed in objects; the behaviour in the presence of duplicate names is implementation defined.
 28 | * The ASCII control character `%x7F` MUST NOT appear in a JAXN text (it may be part of a string or binary value when quoted appropriately).
 29 | 
 30 | ## Comments
 31 | 
 32 | #### Examples
 33 | 
 34 | * `# single-line comment`
 35 | * `// single-line comment`
 36 | * `/* block comment */`
 37 | 
 38 | #### Grammar
 39 | 
 40 | ```abnf
 41 | comment = c-line / c-block
 42 | 
 43 | c-line = c-begin-line *( c-char )
 44 | 
 45 | c-begin-line = %x23 / %x2F.2F ; # or //
 46 | 
 47 | c-char = %x09 / %x20-7E / %x80-10FFFF
 48 |                               ; Any HTAB or printable character
 49 | 
 50 | c-block = c-begin-block
 51 |           *( c-no-star / ( 1*c-star c-no-star-or-slash ) )
 52 |           c-end-block
 53 | 
 54 | c-begin-block = c-slash c-star
 55 | c-end-block = 1*c-star c-slash
 56 | 
 57 | c-slash = %x2F                ; /
 58 | c-star = %x2A                 ; *
 59 | 
 60 | c-no-star = %x09 / %x0A / %x0D /
 61 |             %x20-29 / %x2B-7E / %x80-10FFFF
 62 | 
 63 | c-no-star-or-slash = %x09 / %x0A / %x0D /
 64 |                      %x20-29 / %x2B-2E / %x30-7E / %x80-10FFFF
 65 | 
 66 | ws = *( %x20 /                ; Space
 67 |         %x09 /                ; Horizontal tab
 68 |         %x0A /                ; Line feed or New line
 69 |         %x0D /                ; Carriage return
 70 |         comment )             ; Comment
 71 | ```
 72 | 
 73 | #### Semantics
 74 | 
 75 | Comments change the representation of data but have no effect on which data is represented.
 76 | 
 77 | #### Notes
 78 | 
 79 | Single-line comments may not contain additional control characters.
 80 | A single-line comment ends at either the end of the line, or at the end of the input, whichever is encountered first.
 81 | 
 82 | Block comments do not nest.
 83 | In other words, occurrences of `/*` within a block comment are not interpreted as anything other than part of the comment.
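
As an illustration of the `c-block` rule and of the non-nesting behaviour, the following Python regular expression mirrors the structure of the grammar. It is a simplified sketch: it does not enforce the character set restrictions of `c-no-star` and `c-no-star-or-slash`, and the name `BLOCK_COMMENT` is hypothetical.

```python
import re

# c-begin-block, then any number of "no star" characters or runs of
# stars followed by something other than a star or slash, then c-end-block.
BLOCK_COMMENT = re.compile(r"/\*(?:[^*]|\*+[^*/])*\*+/")

text = "/* a /* b */ c */"
match = BLOCK_COMMENT.match(text)

# The comment ends at the first */, so the inner /* does not nest.
print(match.group(0))  # prints /* a /* b */
```

The remainder ` c */` is left for the parser to handle as regular input, where it will typically produce an error.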
 84 | 
 85 | ## Numbers
 86 | 
 87 | #### Synopsis
 88 | 
 89 | Allow non-finite values, hexadecimal notation of integer values, an optional leading plus sign, and relax the rules for redundant zeros.
 90 | 
 91 | #### Examples
 92 | 
 93 | * `42.`
 94 | * `+.5`
 95 | * `NaN`
 96 | * `Infinity`
 97 | * `-Infinity`
 98 | * `0xDEADBEEF`
 99 | 
100 | #### Grammar
101 | 
102 | ```abnf
103 | number = [ plus / minus ] ( nan / inf / hex / dec )
104 | 
105 | nan = %x4E.61.4E              ; NaN
106 | 
107 | inf = %x49.6E.66.69.6E.69.74.79
108 |                               ; Infinity
109 | 
110 | hex = zero x 1*HEXDIG         ; 0xXXX...
111 | 
112 | dec = ( int [ frac0 ] / frac1 ) [ exp ]
113 | 
114 | decimal-point = %x2E          ; .
115 | 
116 | digit1-9 = %x31-39            ; 1-9
117 | 
118 | e = %x65 / %x45               ; e E
119 | x = %x78 / %x58               ; x X
120 | 
121 | exp = e [ plus / minus ] 1*DIGIT
122 | 
123 | frac0 = decimal-point *DIGIT
124 | frac1 = decimal-point 1*DIGIT
125 | 
126 | int = zero / ( digit1-9 *DIGIT )
127 | 
128 | plus = %x2B                   ; +
129 | minus = %x2D                  ; -
130 | zero = %x30                   ; 0
131 | ```
132 | 
133 | #### Notes
134 | 
135 | JAXN adds non-finite values to the data model that cannot be represented in JSON.
136 | The spelling of the identifiers is case-sensitive.
137 | JAXN allows `+NaN` and `-NaN` as alternatives for `NaN`, as well as `+Infinity` as an alternative for `Infinity`.
138 | 
139 | All other extensions are a presentation detail and must not have any effect on the serialization tree, representation graph or events generated.
140 | 
141 | The permissible magnitude and precision of numbers is implementation defined.
142 | It must allow at least IEEE 754 double-precision floating point numbers.
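
The number grammar above can be approximated with a regular expression. The following Python sketch is only one possible reading of the grammar: the names `NUMBER` and `parse_number` are hypothetical, and mapping decimals without fraction or exponent to `int` and all other decimals to `float` is a choice of this sketch, not something the specification prescribes.

```python
import math
import re

NUMBER = re.compile(
    r"""
    (?P<sign>[+-])?
    (?:
        (?P<nan>NaN)
      | (?P<inf>Infinity)
      | 0[xX](?P<hex>[0-9A-Fa-f]+)
      | (?P<dec>(?:(?:0|[1-9][0-9]*)(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?)
    )
    """,
    re.VERBOSE,
)


def parse_number(text: str):
    """Parse one JAXN number token into an int or float."""
    m = NUMBER.fullmatch(text)
    if m is None:
        raise ValueError(f"not a JAXN number: {text!r}")
    negative = m.group("sign") == "-"
    if m.group("nan"):
        return math.nan  # +NaN and -NaN are alternatives for NaN
    if m.group("inf"):
        return -math.inf if negative else math.inf
    if m.group("hex") is not None:
        value = int(m.group("hex"), 16)
    else:
        dec = m.group("dec")
        is_float = any(c in dec for c in ".eE")
        value = float(dec) if is_float else int(dec)
    return -value if negative else value
```

With this sketch, `parse_number("42.")` yields `42.0`, `parse_number("+.5")` yields `0.5`, and `parse_number("0xDEADBEEF")` yields `3735928559`, while `01`, `.` and `nan` are rejected.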
143 | 
144 | ## Strings
145 | 
146 | #### Synopsis
147 | 
148 | Allow single-quoted strings, additional escape sequences, multiline strings, and string concatenation.
149 | 
150 | #### Examples
151 | 
152 | * `"Add \0 or \v, even \' is allowed in a string."`
153 | * `'That\'s right, you need to escape single-quotes in a single-quoted string.'`
154 | * `'Oh, and \" is allowed even in a single-quote string.'`
155 | * `"\u{1D11E} was my first love " + "and it will be my last."`
156 | * `"""String with a \ and " characters - no escape sequences,`<br>`may contain line breaks"""`
157 | 
158 | #### Grammar
159 | 
160 | ```abnf
161 | string = string-part *( value-concat string-part )
162 | 
163 | string-part = m-d-string / m-s-string / d-string / s-string
164 | 
165 | m-d-string = 3d-quote *( m-d-char ) 3d-quote
166 | m-s-string = 3s-quote *( m-s-char ) 3s-quote
167 | 
168 | m-d-char = *2d-quote ( %x09 / %x0A / %x0D / %x20-21 / %x23-7E / %x80-10FFFF )
169 | m-s-char = *2s-quote ( %x09 / %x0A / %x0D / %x20-26 / %x28-7E / %x80-10FFFF )
170 | 
171 | d-string = d-quote *( s-char / s-quote ) d-quote
172 | s-string = s-quote *( s-char / d-quote ) s-quote
173 | 
174 | s-char = unescaped /
175 |          escape (
176 |              %x22 /           ; "    double quote    U+0022
177 |              %x27 /           ; '    single quote    U+0027
178 |              %x5C /           ; \    reverse solidus U+005C
179 |              %x2F /           ; /    solidus         U+002F
180 |              %x30 /           ; 0    nul             U+0000
181 |              %x62 /           ; b    backspace       U+0008
182 |              %x66 /           ; f    form feed       U+000C
183 |              %x6E /           ; n    line feed       U+000A
184 |              %x72 /           ; r    carriage return U+000D
185 |              %x74 /           ; t    tab             U+0009
186 |              %x76 /           ; v    vtab            U+000B
187 |              %x75 4HEXDIG /   ; uXXXX                U+XXXX
188 |              %x75 %x7B 1*HEXDIG %x7D )
189 |                               ; u{X...}              U+X...
190 | 
191 | escape = %x5C                 ; \
192 | d-quote = %x22                ; "
193 | s-quote = %x27                ; '
194 | 
195 | unescaped = %x20-21 / %x23-26 / %x28-5B / %x5D-7E / %x80-10FFFF
196 | ```
197 | 
198 | #### Notes
199 | 
200 | * Each string (in a concatenation: individually) **MUST** be a sequence of Unicode characters.
201 | * `\uXXXX` with UTF-16 surrogates **MUST** be handled before concatenation.
202 | * Unpaired UTF-16 surrogates **MUST NOT** appear in the string representation.
203 | * `\u{X...}` **MUST NOT** encode surrogates (i.e. the represented string is not allowed to contain UTF-16 surrogates).
204 | * Multiline strings:
205 |   * Are surrounded by three quotation marks (single or double quote) on each side and allow newline.
206 |   * Can contain any sequence of non-control characters except for three matching closing quotation marks.
207 |   * Do not interpret escape sequences, a backslash `\` is just a literal backslash.
208 |   * A newline immediately following the opening delimiter is trimmed.
209 |   * All other characters remain intact.
210 | * Concatenations can mix single- and double-quoted strings as well as single- or multiline-strings.
211 | * Concatenation is a presentation detail and must not have any effect on the serialization tree, representation graph or events generated.
212 |   It happens before the final string is passed on from the parser.
213 | 
214 | ## Binary Data
215 | 
216 | #### Synopsis
217 | 
218 | Allow binary data as a separate type, in two forms: Hexdump or string.
219 | 
220 | #### Examples
221 | 
222 | * `$"Hello, \x77orld!"` (binary string)
223 | * `$48656c6c6f2c20776f726c6421` (binary hex)
224 | * `$48656c6c6f.2c20.776f726c64.21`
225 | * `$48.65.6c.6c.6f.2c.20.77.6f.72.6c.64.21`
226 | 
227 | #### Grammar
228 | 
229 | ```abnf
230 | binary = b-value *( value-concat b-value )
231 | 
232 | b-value = dollar [ b-string / b-direct ]
233 | 
234 | b-string = b-s-string / b-d-string
235 | 
236 | b-d-string = d-quote *( b-char / s-quote ) d-quote
237 | b-s-string = s-quote *( b-char / d-quote ) s-quote
238 | 
239 | b-char = b-unescaped /
240 |          escape (
241 |              %x22 /           ; "    double quote    0x22
242 |              %x27 /           ; '    single quote    0x27
243 |              %x5C /           ; \    reverse solidus 0x5C
244 |              %x2F /           ; /    solidus         0x2F
245 |              %x30 /           ; 0    nul             0x00
246 |              %x62 /           ; b    backspace       0x08
247 |              %x66 /           ; f    form feed       0x0C
248 |              %x6E /           ; n    line feed       0x0A
249 |              %x72 /           ; r    carriage return 0x0D
250 |              %x74 /           ; t    tab             0x09
251 |              %x76 /           ; v    vtab            0x0B
252 |              %x78 2HEXDIG )   ; xXX                  0xXX
253 | 
254 | b-unescaped = %x20-21 / %x23-26 / %x28-5B / %x5D-7E
255 | 
256 | b-direct = b-part *( dot b-part )
257 | 
258 | b-part = 1*b-byte
259 | 
260 | b-byte = 2HEXDIG
261 | 
262 | dollar = %x24                 ; $
263 | dot = %x2E                    ; .
264 | ```
265 | 
266 | #### Notes
267 | 
268 | * Binary data represents arbitrary byte sequences (not Unicode strings).
269 | * Binary strings allow only "printable" ASCII characters, no control characters.
270 | * No `\uXXXX` or `\u{...}` escape sequences allowed, instead:
271 | * Escape sequence `\xXX` for arbitrary byte values.
272 | * Concatenations can mix single- and double-quoted binary strings as well as hexdumped data.
273 | * Concatenation is a presentation detail and must not have any effect on the serialization tree, representation graph or events generated.
274 |   It happens before the final binary value is passed on from the parser.
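
A minimal Python sketch of the hexdump form (`dollar` followed by an optional `b-direct`) is shown below. The names `B_DIRECT` and `parse_hexdump` are hypothetical; binary strings and concatenation are not covered.

```python
import re

# dollar [ b-part *( dot b-part ) ], where b-part = 1*( 2HEXDIG )
B_DIRECT = re.compile(
    r"\$(?:(?:[0-9A-Fa-f]{2})+(?:\.(?:[0-9A-Fa-f]{2})+)*)?"
)


def parse_hexdump(text: str) -> bytes:
    """Parse a hexdumped binary value such as $48.65.6c into bytes."""
    if B_DIRECT.fullmatch(text) is None:
        raise ValueError(f"not a hexdumped binary value: {text!r}")
    # Dots are a presentation detail; drop them before decoding.
    return bytes.fromhex(text[1:].replace(".", ""))


print(parse_hexdump("$48656c6c6f.2c20.776f726c64.21"))
# prints b'Hello, world!'
```

Since dots may only appear between complete bytes, inputs such as `$4.8`, `$48.` or `$486` are rejected, while a lone `$` yields an empty byte sequence.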
275 | 
276 | ## Unquoted Names in Objects
277 | 
278 | #### Synopsis
279 | 
280 | Allow identifiers as unquoted names in objects.
281 | 
282 | #### Example
283 | 
284 | `{ foo: "Hello", bar: 42 }`
285 | 
286 | #### Grammar
287 | 
288 | ```abnf
289 | member = key name-separator value
290 | 
291 | key = string / identifier
292 | 
293 | identifier = i-begin *i-continue
294 | 
295 | i-begin = ALPHA / %x5F
296 | i-continue = i-begin / DIGIT
297 | ```
298 | 
299 | #### Notes
300 | 
301 | Names in objects are strings; the tokens `true`, `null`, and `false` are unambiguous shortcuts for `"true"`, `"null"`, and `"false"` when used within an object where a name is expected.
302 | Strings in their role as names in objects can use the extended syntax for strings including the single-quoted variant, additional escape sequences, and string concatenation; however, unquoted names in objects **cannot** be concatenated.
303 | 
304 | Unquoted names in objects are a presentation detail and must not have any effect on the serialization tree, representation graph or events generated, i.e. they are passed on as normal strings from the parser.
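
The `identifier` rule translates directly into a regular expression. The Python sketch below uses the hypothetical names `IDENTIFIER` and `key_to_string` to show that an unquoted name is passed on as an ordinary string, including the tokens `true`, `null` and `false`.

```python
import re

# identifier = i-begin *i-continue
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def key_to_string(token: str) -> str:
    """Return the object name represented by an unquoted key token."""
    if IDENTIFIER.fullmatch(token) is None:
        raise ValueError(f"not a valid unquoted name: {token!r}")
    return token  # passed on as a normal string


print(key_to_string("true"))  # prints true (the string, not the boolean)
```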
305 | 
306 | ## Trailing Comma
307 | 
308 | #### Synopsis
309 | 
310 | Allow trailing commas in arrays and objects.
311 | 
312 | #### Examples
313 | 
314 | * `[ 1, 2, 3, ]`
315 | * `{ foo: "Hello", bar: 42, }`
316 | 
317 | #### Grammar
318 | 
319 | ```abnf
320 | value-sep-opt = [ value-separator ]
321 | 
322 | array = begin-array
323 |         [ value *( value-separator value ) value-sep-opt ]
324 |         end-array
325 | 
326 | object = begin-object
327 |          [ member *( value-separator member ) value-sep-opt ]
328 |          end-object
329 | ```
330 | 
331 | #### Semantics
332 | 
333 | The additional commas have no semantics.
334 | 
335 | #### Notes
336 | 
337 | The above grammar does not allow for adjacent commas (`[1,,2]`), a leading comma (`[,1]`), or placing a comma in an empty array or object (`[,]`).
338 | 
339 | Trailing commas are a presentation detail and must not have any effect on the serialization tree, representation graph or events generated.
340 | 
341 | Copyright (c) 2017-2018 Daniel Frey and Dr. Colin Hirsch
342 | 


--------------------------------------------------------------------------------
/TestSuite.md:
--------------------------------------------------------------------------------
 1 | # JAXN Test Suite
 2 | 
 3 | > **Please note that this test suite does not yet exist**; this section is to discuss its development...
 4 | 
 5 | The JAXN test suite is intended to cover all aspects and details of the JAXN encoding.
 6 | A library that passes all tests can be considered JAXN compliant.
 7 | The test suite contains different categories of testcases, each consisting of one or more data files.
 8 | 
 9 | For the purpose of this document, two JSON files, or two JAXN files, are considered equivalent if they describe the same data.
10 | This can be checked by first parsing and then printing the two files with *TBD tools in taocpp/json* and checking whether the outputs are equal.
11 | 
12 | ## Invalid JAXN
13 | 
14 | Each testcase consists of one input file that does not conform to the JAXN standard.
15 | Reading such a file must generate an error.
16 | 
17 | ## JAXN encoded JSON
18 | 
19 | Each testcase consists of one valid JAXN input file, and one valid JSON reference file.
20 | The JAXN input files in this category remain within the JSON data model.
21 | The JAXN input file must be parsed and then printed to a JSON output file.
22 | The test passes if the JSON output file and the JSON reference file are equivalent.
23 | 
24 | ## JAXN beyond JSON
25 | 
26 | Each testcase consists of one valid JAXN input file, and one valid JAXN reference file.
27 | The JAXN input file must be parsed and then printed to a JAXN output file.
28 | The test passes if the JAXN output file and the JAXN reference file are equivalent.
29 | 
30 | Copyright (c) 2017-2018 Daniel Frey and Dr. Colin Hirsch
31 | 


--------------------------------------------------------------------------------
/jaxn.abnf:
--------------------------------------------------------------------------------
  1 | ; The JAXN grammar (https://github.com/stand-art/jaxn/)
  2 | 
  3 | ; Based on the JSON grammar, see RFC 8259.
  4 | ; See RFC 5234 for interpretation and core rules.
  5 | 
  6 | comment = c-line / c-block
  7 | 
  8 | c-line = c-begin-line *( c-char )
  9 | 
 10 | c-begin-line = %x23 / %x2F.2F ; # or //
 11 | 
 12 | c-char = %x09 / %x20-7E / %x80-10FFFF
 13 |                               ; Any HTAB or printable character
 14 | 
 15 | c-block = c-begin-block
 16 |           *( c-no-star / ( 1*c-star c-no-star-or-slash ) )
 17 |           c-end-block
 18 | 
 19 | c-begin-block = c-slash c-star
 20 | c-end-block = 1*c-star c-slash
 21 | 
 22 | c-slash = %x2F                ; /
 23 | c-star = %x2A                 ; *
 24 | 
 25 | c-no-star = %x09 / %x0A / %x0D /
 26 |             %x20-29 / %x2B-7E / %x80-10FFFF
 27 | 
 28 | c-no-star-or-slash = %x09 / %x0A / %x0D /
 29 |                      %x20-29 / %x2B-2E / %x30-7E / %x80-10FFFF
 30 | 
 31 | ws = *( %x20 /                ; Space
 32 |         %x09 /                ; Horizontal tab
 33 |         %x0A /                ; Line feed or New line
 34 |         %x0D /                ; Carriage return
 35 |         comment )             ; Comment
 36 | 
 37 | begin-array     = ws %x5B ws  ; [ left square bracket
 38 | begin-object    = ws %x7B ws  ; { left curly bracket
 39 | end-array       = ws %x5D ws  ; ] right square bracket
 40 | end-object      = ws %x7D ws  ; } right curly bracket
 41 | name-separator  = ws %x3A ws  ; : colon
 42 | value-separator = ws %x2C ws  ; , comma
 43 | value-concat    = ws %x2B ws  ; + plus
 44 | 
 45 | value-sep-opt = [ value-separator ]
 46 | 
 47 | null  = %x6E.75.6C.6C         ; null
 48 | true  = %x74.72.75.65         ; true
 49 | false = %x66.61.6C.73.65      ; false
 50 | 
 51 | number = [ plus / minus ] ( nan / inf / hex / dec )
 52 | 
 53 | nan = %x4E.61.4E              ; NaN
 54 | 
 55 | inf = %x49.6E.66.69.6E.69.74.79
 56 |                               ; Infinity
 57 | 
 58 | hex = zero x 1*HEXDIG         ; 0xXXX...
 59 | 
 60 | dec = ( int [ frac0 ] / frac1 ) [ exp ]
 61 | 
 62 | decimal-point = %x2E          ; .
 63 | 
 64 | digit1-9 = %x31-39            ; 1-9
 65 | 
 66 | e = %x65 / %x45               ; e E
 67 | x = %x78 / %x58               ; x X
 68 | 
 69 | exp = e [ plus / minus ] 1*DIGIT
 70 | 
 71 | frac0 = decimal-point *DIGIT
 72 | frac1 = decimal-point 1*DIGIT
 73 | 
 74 | int = zero / ( digit1-9 *DIGIT )
 75 | 
 76 | plus = %x2B                   ; +
 77 | minus = %x2D                  ; -
 78 | zero = %x30                   ; 0
 79 | 
 80 | string = string-part *( value-concat string-part )
 81 | 
 82 | string-part = m-d-string / m-s-string / d-string / s-string
 83 | 
 84 | m-d-string = 3d-quote *( m-d-char ) 3d-quote
 85 | m-s-string = 3s-quote *( m-s-char ) 3s-quote
 86 | 
 87 | m-d-char = *2d-quote ( %x09 / %x0A / %x0D / %x20-21 / %x23-7E / %x80-10FFFF )
 88 | m-s-char = *2s-quote ( %x09 / %x0A / %x0D / %x20-26 / %x28-7E / %x80-10FFFF )
 89 | 
 90 | d-string = d-quote *( s-char / s-quote ) d-quote
 91 | s-string = s-quote *( s-char / d-quote ) s-quote
 92 | 
 93 | s-char = unescaped /
 94 |          escape (
 95 |              %x22 /           ; "    double quote    U+0022
 96 |              %x27 /           ; '    single quote    U+0027
 97 |              %x5C /           ; \    reverse solidus U+005C
 98 |              %x2F /           ; /    solidus         U+002F
 99 |              %x30 /           ; 0    nul             U+0000
100 |              %x62 /           ; b    backspace       U+0008
101 |              %x66 /           ; f    form feed       U+000C
102 |              %x6E /           ; n    line feed       U+000A
103 |              %x72 /           ; r    carriage return U+000D
104 |              %x74 /           ; t    tab             U+0009
105 |              %x76 /           ; v    vtab            U+000B
106 |              %x75 4HEXDIG /   ; uXXXX                U+XXXX
107 |              %x75 %x7B 1*HEXDIG %x7D )
108 |                               ; u{X...}              U+X...
109 | 
110 | escape = %x5C                 ; \
111 | d-quote = %x22                ; "
112 | s-quote = %x27                ; '
113 | 
114 | unescaped = %x20-21 / %x23-26 / %x28-5B / %x5D-7E / %x80-10FFFF
115 | 
116 | binary = b-value *( value-concat b-value )
117 | 
118 | b-value = dollar [ b-string / b-direct ]
119 | 
120 | b-string = b-s-string / b-d-string
121 | 
122 | b-d-string = d-quote *( b-char / s-quote ) d-quote
123 | b-s-string = s-quote *( b-char / d-quote ) s-quote
124 | 
125 | b-char = b-unescaped /
126 |          escape (
127 |              %x22 /           ; "    double quote    0x22
128 |              %x27 /           ; '    single quote    0x27
129 |              %x5C /           ; \    reverse solidus 0x5C
130 |              %x2F /           ; /    solidus         0x2F
131 |              %x30 /           ; 0    nul             0x00
132 |              %x62 /           ; b    backspace       0x08
133 |              %x66 /           ; f    form feed       0x0C
134 |              %x6E /           ; n    line feed       0x0A
135 |              %x72 /           ; r    carriage return 0x0D
136 |              %x74 /           ; t    tab             0x09
137 |              %x76 /           ; v    vtab            0x0B
138 |              %x78 2HEXDIG )   ; xXX                  0xXX
139 | 
140 | b-unescaped = %x20-21 / %x23-26 / %x28-5B / %x5D-7E
141 | 
142 | b-direct = b-part *( dot b-part )
143 | 
144 | b-part = 1*b-byte
145 | 
146 | b-byte = 2HEXDIG
147 | 
148 | dollar = %x24                 ; $
149 | dot = %x2E                    ; .
150 | 
151 | array = begin-array
152 |         [ value *( value-separator value ) value-sep-opt ]
153 |         end-array
154 | 
155 | object = begin-object
156 |          [ member *( value-separator member ) value-sep-opt ]
157 |          end-object
158 | 
159 | member = key name-separator value
160 | 
161 | key = string / identifier
162 | 
163 | identifier = i-begin *i-continue
164 | 
165 | i-begin = ALPHA / %x5F
166 | i-continue = i-begin / DIGIT
167 | 
168 | value = false / null / true / object / array / number / string / binary
169 | 
170 | JAXN-text = ws value ws
171 | 
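As an illustration of how the `number` rule in jaxn.abnf composes, the following sketch transcribes it into a Python regular expression. The name `is_jaxn_number` is hypothetical and the pattern is a hand-written reading of the rules `nan`, `inf`, `hex`, `dec`, `int`, `frac0`, `frac1` and `exp`, not a generated or official recogniser.

```python
import re

# Hand transcription of the ABNF `number` rule (an assumption, not
# derived mechanically from the grammar):
#   number = [ plus / minus ] ( nan / inf / hex / dec )
_NUMBER = re.compile(
    r"[+-]?"                           # optional sign
    r"(?:"
    r"NaN"                             # nan (case-sensitive)
    r"|Infinity"                       # inf (case-sensitive)
    r"|0[xX][0-9A-Fa-f]+"              # hex = zero x 1*HEXDIG
    r"|(?:"
    r"(?:0|[1-9][0-9]*)(?:\.[0-9]*)?"  # int [ frac0 ]
    r"|\.[0-9]+"                       # frac1
    r")(?:[eE][+-]?[0-9]+)?"           # [ exp ]
    r")"
)


def is_jaxn_number(token: str) -> bool:
    """Return True if the whole token matches the `number` rule."""
    return _NUMBER.fullmatch(token) is not None
```

Under this reading, `1.` and `.5` are accepted because of `frac0` and `frac1`, while `01` and a lone `.` are rejected.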

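The direct form of binary data (`b-direct` in jaxn.abnf) can be decoded along these lines. This is a sketch only: `decode_b_direct` is a hypothetical helper, it covers a single `b-value` in direct form (or a bare `$`), and it does not handle the quoted `b-string` form or concatenation with `+`.

```python
import re

# b-value  = dollar [ b-string / b-direct ]   (b-string not handled here)
# b-direct = b-part *( dot b-part )
# b-part   = 1*b-byte
# b-byte   = 2HEXDIG
_B_DIRECT = re.compile(
    r"\$(?:(?:[0-9A-Fa-f]{2})+(?:\.(?:[0-9A-Fa-f]{2})+)*)?"
)


def decode_b_direct(token: str) -> bytes:
    """Decode tokens such as `$48656C6C6F` or `$48.65.6C.6C.6F` to bytes."""
    if _B_DIRECT.fullmatch(token) is None:
        raise ValueError(f"not a direct binary value: {token!r}")
    # Dots only separate groups of whole bytes, so they can be dropped.
    return bytes.fromhex(token[1:].replace(".", ""))
```

For example, `decode_b_direct("$48.65.6C6C6F")` yields the five bytes of `Hello`, and a bare `$` yields an empty byte sequence.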

--------------------------------------------------------------------------------