├── Discussion.md ├── README.md ├── Specification.md ├── TestSuite.md └── jaxn.abnf /Discussion.md: -------------------------------------------------------------------------------- 1 | # Discussion 2 | 3 | The following sections discuss the syntax and semantics of the extensions that JAXN brings to JSON, as well as rejected extensions that will not be added to JAXN. 4 | 5 | * [Data Model](#data-model) 6 | * [Unicode](#unicode) 7 | * [White-Space](#white-space) 8 | * [Newline](#newline) 9 | * [Source Character Set](#source-character-set) 10 | * [Comments](#comments) 11 | * [Numbers](#numbers) 12 | * [Strings](#strings) 13 | * [Binary Data](#binary-data) 14 | * [Unquoted Object Keys](#unquoted-object-keys) 15 | * [Trailing Comma](#trailing-comma) 16 | * [Conversion to JSON](#conversion-to-json) 17 | 18 | ## Data Model 19 | 20 | Most "relaxed JSON" extensions focus on the syntax of the string representation. 21 | They sometimes do extend the data model, but they don't say so clearly or are even unaware of it. 22 | JAXN goes further, by clearly specifying which additional values and data types a library should support. 23 | This allows users to know what to expect from a JAXN-compatible library, or, looking at it from the other side, search for a JAXN-compatible library when they know that they need these extensions to the data model. 24 | 25 | JAXN extends the JSON data model in two places. 26 | 27 | 1. Allow `NaN`, `Infinity` and `-Infinity` for numeric values. 28 | 2. Add a binary data type. 29 | 30 | ## Unicode 31 | 32 | JAXN does not require Unicode support beyond what is required by JSON. 33 | 34 | A JAXN parser parses a sequence of bytes, the input data. The parser is... 35 | 36 | * ...required to accept (correctly encoded) UTF-8 input data. 37 | * This is the recommended and only interoperable representation. 38 | * ...allowed to accept (correctly encoded) UTF-16 or UTF-32 input data. 
39 | * ...allowed to accept a byte order mark (BOM) at the beginning of the input data. 40 | * This is only for UTF-16 or UTF-32, or other endian-dependent encodings. 41 | * ...allowed to accept other encodings, provided that they are correctly identified (no guessing!) and unambiguously mapped to a sequence of Unicode code points. 42 | * ...required to signal an error if it encounters an (encoding) error in the input data. 43 | 44 | ## White-Space 45 | 46 | JAXN does not allow white-space characters beyond those defined by JSON. 47 | 48 | Some other libraries allow additional white-space characters, but we do not see a real-world use-case for those. 49 | We believe users often add them by mistake and this is not a good-enough reason for us to allow them. 50 | 51 | ## Newline 52 | 53 | The JAXN grammar allows well-formed documents to contain any sequence of 0x0A (Line feed or New line) and 0x0D (Carriage return) characters, mixed in any way. 54 | A JAXN parser is allowed (even expected) to further restrict the accepted end-of-line markers, for example to the system-native 0x0A (as is common on Unix- and macOS-systems) or to require the sequence 0x0D, 0x0A (on Windows-systems). 55 | This is necessary to report sensible position information in case of parse errors, as the line number in which an error occurs depends on the specific end-of-line markers allowed/expected for the input data. 56 | 57 | ## Source Character Set 58 | 59 | The source character set (i.e., the Unicode code points that may be contained in the input data) consists of HTAB (0x09), the end-of-line characters (0x0A, 0x0D), and all code points starting with space (0x20), except for `delete` (0x7F), i.e. (0x20-0x7E, 0x80-0x10FFFF). 60 | 61 | JSON allows 0x7F although it is a control character (and all other control characters are explicitly excluded). 62 | We consider this a mistake in the JSON specification, and do not allow 0x7F in JAXN. 
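The source character set described above can be checked with a few comparisons. The following is a minimal Python sketch, not part of JAXN itself; the function names `in_source_character_set` and `first_invalid_offset` are hypothetical, and the input is assumed to be already decoded into Unicode code points.

```python
def in_source_character_set(cp: int) -> bool:
    """Return True if code point cp may appear in JAXN input data."""
    # HTAB and the two end-of-line characters are the only allowed controls.
    if cp in (0x09, 0x0A, 0x0D):
        return True
    # Everything from space upwards is allowed, except for delete (0x7F).
    return 0x20 <= cp <= 0x10FFFF and cp != 0x7F


def first_invalid_offset(text: str) -> int:
    """Return the index of the first disallowed code point, or -1 if none."""
    for index, char in enumerate(text):
        if not in_source_character_set(ord(char)):
            return index
    return -1
```

A parser could call something like `first_invalid_offset` before or during tokenization and report the returned position as the error location.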
63 | 64 | If a JAXN parser encounters a code point outside of the source character set, it must report an error. 65 | 66 | JAXN does not *require* any non-ASCII characters in the input data. 67 | All Unicode code points in the string values in the data model can be written in an escaped form in the input data. 68 | JAXN documents can therefore, like JSON documents, be restricted to ASCII without losing expressiveness. 69 | 70 | ## Comments 71 | 72 | JAXN allows comments; however, they are a presentation detail and must not have any effect on the serialization tree, representation graph or events generated. 73 | In particular, comments are not associated with a particular node. 74 | This improves interoperability and ensures that the main concern why comments are not part of JSON is taken care of. 75 | The usual purpose of a comment is to communicate between the human maintainers of a file. 76 | A typical example is comments in a configuration file. 77 | 78 | Michael Bolin writes: 79 | 80 | > Because JSON is more concise than XML, JSON is often a better format for data files that are maintained by hand. Examples include configuration files, as well as blobs of test data for web applications. For files such as these, it is convenient to be able to temporarily comment out bits of information (such as a configuration option, or an old test value in lieu of a new one). Further, if the file is to be maintained by humans, it is desirable to be able to include comments so that maintainers may communicate amongst one another without interfering with the data in the file. 81 | > 82 | > [...] 83 | > 84 | > This begs the question: why aren't comments officially supported in JSON? Interestingly, when Douglas Crockford originally introduced JSON, there was explicit support for C-style comments. 
He [later dropped support for them in the specification](http://tech.groups.yahoo.com/group/json/message/156), but also [declared that a JSON decoder that accepts comments should be considered a valid JSON decoder](http://tech.groups.yahoo.com/group/json/message/152). 85 | 86 | (Source: http://bolinfest.com/essays/json.html) 87 | 88 | Note that comments sometimes don't interact nicely with strings. 89 | If you try to comment out a part of a document that contains strings, and if those strings contain the character sequence `*/`, using a block comment will fail. 90 | This problem of block comments existed long before JAXN. 91 | As JSON already allows escaping the slash with a backslash in strings, you might consider converting `*/` into `*\/` within the string in question; you will then be able to comment the string out (and in again) without problems. 92 | 93 | The restrictions on the source character set also apply within comments. 94 | 95 | ## Numbers 96 | 97 | JAXN allows non-finite floating point values. 98 | NaN and Infinity (as well as -Infinity) are well known, non-finite values from IEEE 754. 99 | Real-world use-cases often require dealing with those values, and providing a clear way to handle those non-finite values improves interoperability. 100 | A JAXN-compatible library is required to accept NaN and Infinity as valid numeric values for its internal data model. 101 | 102 | ## Strings 103 | 104 | JAXN keeps the JSON string data model intact. 105 | String values in JSON are required to be valid Unicode strings in order to be interoperable. 106 | The JSON RFC 8259 explains in section 8.2 why this is the case. 107 | JAXN does *not* change this. 108 | Unlike some other libraries that allow escape sequences like `\xXX` for normal strings without specifying the semantics (properly), JAXN does not create ambiguity and confusion and does not require storing non-Unicode strings. 
109 | 110 | The sequence of represented Unicode code points is obtained from the sequence of representation code points by replacing escape sequences with the escaped code points (or, in the case of UTF-16 surrogates, temporary code units), and by merging the code units of subsequent high and low UTF-16 surrogates into a single code point. 111 | 112 | > (RFC 8259 specifies how to encode code points not in the BMP with a 12-character encoding consisting of two `\uXXXX` escape sequences using UTF-16 surrogate pairs, but does not mandate a specific behaviour when the merging of surrogates fails, noting only that it could be "unpredictable" including "fatal".) 113 | 114 | JAXN only allows complete UTF-16 surrogate pairs, which may occur as escape sequences in strings. 115 | When the input character set is UTF-16, complete surrogate pairs are also allowed (unescaped) anywhere in the input; however, escaped and unescaped surrogates cannot be paired to form a surrogate pair. 116 | Other occurrences of surrogates are not allowed. 117 | 118 | Merging of surrogate pairs, and the decision of whether a string contains unpaired surrogates, MUST be performed before concatenation of strings. 119 | 120 | ## Binary Data 121 | 122 | In real-world uses, one often needs to handle binary data. 123 | Representing this kind of data as strings requires, for example, hex- or base64-encoding. 124 | As JAXN recognizes the importance of binary data, we extend the data model of a JAXN-compatible library by an explicit binary type. 125 | For the representation in string form, we have chosen hex notation as base64 is human-unfriendly and adds additional implementation complexity. 126 | Having a binary type and a more direct representation allows for a more concise and reasonable representation. 127 | 128 | Implementations must treat binary data as a separate data type. 
129 | This increases interoperability with binary protocols like CBOR as well as providing a clear separation of readable strings from binary data. 130 | The latter is helpful when you are dumping data to, say, a log-file. 131 | 132 | ## Unquoted Object Keys 133 | 134 | Quoting Michael Bolin: 135 | 136 | > Unfortunately, even though quoting is the minority case, JSON requires that all keys in maps must be double-quoted, regardless of whether they would need to be in ordinary JavaScript. Presumably this was done because it was the simplest way to guarantee that JSON would be a strict subset of ES3. (Fortunately, ES5 has evolved to allow JavaScript keywords to serve as unquoted property names in object literals.) 137 | > 138 | > Similar to the situation with trailing commas, if the design of JSON were not encumbered by the shortcomings of ES3, then I imagine that JSON keys would not have to be quoted. [...] In either case, demoting quoting from a requirement to an option would save most developers two bytes per key, which would be a win for both humans and machines. (I also think that it would make JSON more readable, though that may be a personal preference.) 139 | 140 | (Source: http://bolinfest.com/essays/json.html) 141 | 142 | ## Trailing Comma 143 | 144 | Again, Michael Bolin provides a good rationale for trailing commas: 145 | 146 | > Most modern browsers allow for a trailing comma in array and object literals in JavaScript. Although support for the trailing comma was not mandated until ES5, browsers such as Chrome and Firefox have supported it for a long time 147 | > 148 | > [...] 149 | > 150 | > Using the trailing comma is particularly convenient for developers who may modify the map in the course of development. 
As shown in the following example, commenting out the last entry in a map can inadvertently transform it into an object literal with a trailing comma: 151 | 152 | ```json 153 | // Commenting out the last line produces an object literal with a 154 | // trailing comma. 155 | { 156 | "margin": "2px", 157 | // "padding": "3px" 158 | } 159 | ``` 160 | 161 | (Source: http://bolinfest.com/essays/json.html) 162 | 163 | ## Conversion to JSON 164 | 165 | A JAXN data value may contain values that have no direct representation in JSON. 166 | Those are the non-finite numeric values and binary data. 167 | A library may choose to report an error when conversion to JSON string representation is requested. 168 | It may also choose to replace those values with strings. 169 | A JAXN-compatible library should use the following strings: 170 | 171 | * `"NaN"` for a NaN. No other strings should be used, e.g. `"nan"`, `"+NaN"` or `"-NaN"`. 172 | * `"Infinity"` and `"-Infinity"`. No other strings should be used, e.g. `"Inf"`, `"+Infinity"`, etc. 173 | * Binary data should be represented as a string containing the hex encoded data, e.g. `"496E66696E697479"`. 174 | 175 | Copyright (c) 2017-2018 Daniel Frey and Dr. Colin Hirsch 176 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Welcome to JAXN 2 | 3 | JAXN (pronounced "Jackson") is a standard that carefully extends [JSON](https://tools.ietf.org/html/rfc8259) with a few often-required additions to the data model, and with new syntax that makes it more human friendly. 4 | 5 | > :exclamation: **JAXN IS CURRENTLY WORK-IN-PROGRESS** :exclamation: 6 | > 7 | > Until version 1.0 of JAXN is published, everything is considered work-in-progress, and anything might still change. Ideas, feedback and other input are welcome and appreciated. 
Please feel free to open an issue, or write to [`jaxn@icemx.net`](mailto:jaxn@icemx.net). 8 | 9 | ## The JAXN Data Model 10 | 11 | JAXN extends the JSON data model with the following points: 12 | 13 | * Allows non-finite values `NaN`, `Infinity` and `-Infinity` for numbers. 14 | * Adds a new primitive type for values representing binary data. 15 | 16 | ## The JAXN Text Representation 17 | 18 | JAXN text representation extends the JSON text representation with the following points: 19 | 20 | * [Comments](#comments) 21 | * [Numbers](#numbers) 22 | * [Strings](#strings) 23 | * [Binary Data](#binary-data) 24 | * [Unquoted Object Keys](#unquoted-object-keys) 25 | * [Trailing Comma](#trailing-comma) 26 | 27 | #### Comments 28 | 29 | * `# single-line comment` 30 | * `// single-line comment` 31 | * `/* block comment */` 32 | 33 | #### Numbers 34 | 35 | * Allow a leading `+` sign. 36 | * Allow omission of leading or trailing zeros, e.g. `.5`, or `42.`. 37 | * Add non-finite values `NaN` and `Infinity`. 38 | * Add hexadecimal integer values, e.g. `0xDEADBEEF`. 39 | 40 | #### Strings 41 | 42 | * Add single-quoted strings, e.g. `'This is a "single-quote" string. No really, it is!'`. 43 | * Add new escape sequences `\'`, `\v`, `\0` and `\u{X...}`. 44 | * Add multiline strings with no escape sequences. 45 | * Add concatenation of strings, e.g. `"Hello," + " world!"`. 46 | 47 | #### Binary Data 48 | 49 | * New primitive type that can represent arbitrary byte sequences. 50 | * Two syntactical variants that can be concatenated with each other. 51 | * Hexdumped binary, e.g. `$48656c6c6f2c20776f726c6421`. 52 | * Allows optional dots, e.g. `$48.65.6c.6c.6f.2c.20.77.6f.72.6c.64.21`. 53 | * Binary strings, e.g. `$"Hello, \x77orld!"`. 54 | * Only printable ASCII characters allowed, no control characters. 55 | * No `\uXXXX` or `\u{...}` escape sequences allowed, instead: 56 | * Add `\xXX` for arbitrary byte values. 
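To illustrate the hexdumped variant listed above, here is a minimal Python sketch that turns a single hexdump token such as `$48.65.6c` into bytes. It is an illustration only, not the reference implementation; the name `decode_hexdump` is hypothetical, and the regular expression is one reading of the `b-direct` rule in the JAXN grammar (groups of whole bytes separated by single dots, with a bare `$` denoting empty binary data).

```python
import re

# '$' followed by optional groups of two-digit hex bytes, separated by single dots.
_B_DIRECT = re.compile(
    r"\$(?:(?:[0-9A-Fa-f]{2})+(?:\.(?:[0-9A-Fa-f]{2})+)*)?"
)


def decode_hexdump(token: str) -> bytes:
    """Decode one hexdumped binary token into the bytes it represents."""
    if _B_DIRECT.fullmatch(token) is None:
        raise ValueError(f"not a valid hexdumped binary value: {token!r}")
    # Dots are purely visual separators, so drop them before decoding.
    return bytes.fromhex(token[1:].replace(".", ""))
```

Under these assumptions, `$48656c6c6f2c20776f726c6421` and `$48.65.6c.6c.6f.2c.20.77.6f.72.6c.64.21` both decode to the same thirteen bytes, while tokens with an odd number of hex digits, or with leading, trailing or doubled dots, are rejected.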
57 | 58 | #### Unquoted Object Keys 59 | 60 | * Allow unquoted object keys, e.g. `{ foo: "Hello", bar: 42 }`. 61 | 62 | #### Trailing Comma 63 | 64 | * Allow `[1,2,3,]` and `{ foo: "Hello", bar: 42, }`. 65 | 66 | ## More information 67 | 68 | * [Specification](Specification.md) 69 | * [Discussion](Discussion.md) 70 | * [ABNF grammar](jaxn.abnf) 71 | 72 | ## Libraries implementing JAXN 73 | 74 | * [taocpp/json](https://github.com/taocpp/json) 75 | * ... 76 | 77 | Copyright (c) 2017-2018 Daniel Frey and Dr. Colin Hirsch 78 | -------------------------------------------------------------------------------- /Specification.md: -------------------------------------------------------------------------------- 1 | # Specification 2 | 3 | This document is the normative specification of JAXN. 4 | 5 | JAXN is a data representation and interchange format based on JSON. 6 | 7 | Only the differences between JAXN and JSON are specified. 8 | 9 | JSON is to be understood as the version defined in [RFC 8259](https://tools.ietf.org/html/rfc8259). 10 | 11 | * [Restrictions](#restrictions) 12 | * [Comments](#comments) 13 | * [Numbers](#numbers) 14 | * [Strings](#strings) 15 | * [Binary Data](#binary-data) 16 | * [Unquoted Names in Objects](#unquoted-names-in-objects) 17 | * [Trailing Comma](#trailing-comma) 18 | 19 | Note: The grammar rules given below are excerpts from the complete [JAXN grammar](jaxn.abnf). 20 | The JAXN grammar is based on the JSON grammar, and both are in ABNF syntax as defined in [RFC 5234](https://tools.ietf.org/html/rfc5234). 21 | 22 | ## Restrictions 23 | 24 | JAXN is mostly a superset of JSON in that every JSON text is a JAXN text that represents the same value, however the following points restrict which JSON texts are also JAXN: 25 | 26 | * A document is considered valid when it validates against the JAXN grammar *and* when all additional restrictions are met. 
27 | * Duplicate names are not allowed in objects; the behaviour in the presence of duplicate names is implementation defined. 28 | * The ASCII control character `%x7F` MUST NOT appear in a JAXN text (it may be part of a string or binary value when quoted appropriately). 29 | 30 | ## Comments 31 | 32 | #### Examples 33 | 34 | * `# single-line comment` 35 | * `// single-line comment` 36 | * `/* block comment */` 37 | 38 | #### Grammar 39 | 40 | ```abnf 41 | comment = c-line / c-block 42 | 43 | c-line = c-begin-line *( c-char ) 44 | 45 | c-begin-line = %x23 / %x2F.2F ; # or // 46 | 47 | c-char = %x09 / %x20-7E / %x80-10FFFF 48 | ; Any HTAB or printable character 49 | 50 | c-block = c-begin-block 51 | *( c-no-star / ( 1*c-star c-no-star-or-slash ) ) 52 | c-end-block 53 | 54 | c-begin-block = c-slash c-star 55 | c-end-block = 1*c-star c-slash 56 | 57 | c-slash = %x2F ; / 58 | c-star = %x2A ; * 59 | 60 | c-no-star = %x09 / %x0A / %x0D / 61 | %x20-29 / %x2B-7E / %x80-10FFFF 62 | 63 | c-no-star-or-slash = %x09 / %x0A / %x0D / 64 | %x20-29 / %x2B-2E / %x30-7E / %x80-10FFFF 65 | 66 | ws = *( %x20 / ; Space 67 | %x09 / ; Horizontal tab 68 | %x0A / ; Line feed or New line 69 | %x0D / ; Carriage return 70 | comment ) ; Comment 71 | ``` 72 | 73 | #### Semantics 74 | 75 | Comments change the representation of data but have no effect on which data is represented. 76 | 77 | #### Notes 78 | 79 | Single-line comments may not contain additional control characters. 80 | A single-line comment ends at either the end of the line, or at the end of the input, whichever is encountered first. 81 | 82 | Block comments do not nest. 83 | In other words, occurrences of `/*` within a block comment are not interpreted as anything else other than part of the comment. 84 | 85 | ## Numbers 86 | 87 | #### Synopsis 88 | 89 | Allow non-finite values, hexadecimal notation of integer values, an optional leading plus sign, and relax the rules for redundant zeros. 
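The number extensions named in the synopsis can be sketched in Python as follows. This is a hedged illustration under stated assumptions: `parse_number` is a hypothetical helper, the regular expression is a transcription of the `number` rule from the JAXN grammar, and mapping hexadecimal and plain integer tokens to `int` and everything else to `float` is a choice made for this example, not something the specification prescribes.

```python
import math
import re

# Optional sign, then NaN, Infinity, a hexadecimal integer, or a decimal number.
_NUMBER = re.compile(
    r"(?P<sign>[+-])?(?:"
    r"(?P<nan>NaN)|(?P<inf>Infinity)|"
    r"0[xX](?P<hex>[0-9A-Fa-f]+)|"
    r"(?P<dec>(?:(?:0|[1-9][0-9]*)(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?)"
    r")"
)


def parse_number(token: str):
    """Convert one JAXN number token into a Python int or float."""
    match = _NUMBER.fullmatch(token)
    if match is None:
        raise ValueError(f"not a valid JAXN number: {token!r}")
    negative = match.group("sign") == "-"
    if match.group("nan"):
        return math.nan  # +NaN and -NaN are alternatives for NaN
    if match.group("inf"):
        return -math.inf if negative else math.inf
    if match.group("hex"):
        value = int(match.group("hex"), 16)
    else:
        dec = match.group("dec")
        # Python's float() accepts "42." and ".5" directly.
        value = float(dec) if any(c in dec for c in ".eE") else int(dec)
    return -value if negative else value
```

Note that the identifiers are matched case-sensitively, so `nan` or `infinity` are rejected, and that redundant leading zeros such as `007` remain invalid because the `int` rule is unchanged from JSON.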
90 | 91 | #### Examples 92 | 93 | * `42.` 94 | * `+.5` 95 | * `NaN` 96 | * `Infinity` 97 | * `-Infinity` 98 | * `0xDEADBEEF` 99 | 100 | #### Grammar 101 | 102 | ```abnf 103 | number = [ plus / minus ] ( nan / inf / hex / dec ) 104 | 105 | nan = %x4E.61.4E ; NaN 106 | 107 | inf = %x49.6E.66.69.6E.69.74.79 108 | ; Infinity 109 | 110 | hex = zero x 1*HEXDIG ; 0xXXX... 111 | 112 | dec = ( int [ frac0 ] / frac1 ) [ exp ] 113 | 114 | decimal-point = %x2E ; . 115 | 116 | digit1-9 = %x31-39 ; 1-9 117 | 118 | e = %x65 / %x45 ; e E 119 | x = %x78 / %x58 ; x X 120 | 121 | exp = e [ plus / minus ] 1*DIGIT 122 | 123 | frac0 = decimal-point *DIGIT 124 | frac1 = decimal-point 1*DIGIT 125 | 126 | int = zero / ( digit1-9 *DIGIT ) 127 | 128 | plus = %x2B ; + 129 | minus = %x2D ; - 130 | zero = %x30 ; 0 131 | ``` 132 | 133 | #### Notes 134 | 135 | JAXN adds non-finite values to the data model that can not be represented in JSON. 136 | The spelling of the identifiers is case-sensitive. 137 | JAXN allows `+NaN` and `-NaN` as alternatives for `NaN`, as well as `+Infinity` as an alternative for `Infinity`. 138 | 139 | All other extensions are a presentation detail and must not have any effect on the serialization tree, representation graph or events generated. 140 | 141 | The permissible magnitude and precision of numbers is implementation defined. 142 | It must allow at least IEEE 754 double-precision floating point numbers. 143 | 144 | ## Strings 145 | 146 | #### Synopsis 147 | 148 | Allow single-quoted strings, additional escape sequences, multiline strings, and string concatenation. 149 | 150 | #### Examples 151 | 152 | * `"Add \0 or \v, even \' is allowed in a string."` 153 | * `'That\'s right, you need to escape single-quotes in a single-quoted string.'` 154 | * `'Oh, and \" is allowed even in a single-quote string.'` 155 | * `"\u{1D11E} was my first love " + "and it will be my last."` 156 | * `"""String with a \ and " characters - no escape sequences,`
`may contain line breaks"""` 157 | 158 | #### Grammar 159 | 160 | ```abnf 161 | string = string-part *( value-concat string-part ) 162 | 163 | string-part = m-d-string / m-s-string / d-string / s-string 164 | 165 | m-d-string = 3d-quote *( m-d-char ) 3d-quote 166 | m-s-string = 3s-quote *( m-s-char ) 3s-quote 167 | 168 | m-d-char = *2d-quote ( %x09 / %x0A / %x0D / %x20-21 / %x23-7E / %x80-10FFFF ) 169 | m-s-char = *2s-quote ( %x09 / %x0A / %x0D / %x20-26 / %x28-7E / %x80-10FFFF ) 170 | 171 | d-string = d-quote *( s-char / s-quote ) d-quote 172 | s-string = s-quote *( s-char / d-quote ) s-quote 173 | 174 | s-char = unescaped / 175 | escape ( 176 | %x22 / ; " double quote U+0022 177 | %x27 / ; ' single quote U+0027 178 | %x5C / ; \ reverse solidus U+005C 179 | %x2F / ; / solidus U+002F 180 | %x30 / ; 0 nul U+0000 181 | %x62 / ; b backspace U+0008 182 | %x66 / ; f form feed U+000C 183 | %x6E / ; n line feed U+000A 184 | %x72 / ; r carriage return U+000D 185 | %x74 / ; t tab U+0009 186 | %x76 / ; v vtab U+000B 187 | %x75 4HEXDIG / ; uXXXX U+XXXX 188 | %x75 %x7B 1*HEXDIG %x7D ) 189 | ; u{X...} U+X... 190 | 191 | escape = %x5C ; \ 192 | d-quote = %x22 ; " 193 | s-quote = %x27 ; ' 194 | 195 | unescaped = %x20-21 / %x23-26 / %x28-5B / %x5D-7E / %x80-10FFFF 196 | ``` 197 | 198 | #### Notes 199 | 200 | * Each string (in a concatenation: individually) **MUST** be a sequence of Unicode characters. 201 | * `\uXXXX` with UTF-16 surrogates **MUST** be handled before concatenation. 202 | * Unpaired UTF-16 surrogates **MUST NOT** appear in the string representation. 203 | * `\u{X...}` **MUST NOT** encode surrogates (i.e. the represented string is not allowed to contain UTF-16 surrogates). 204 | * Multiline strings: 205 | * Are surrounded by three quotation marks (single or double quote) on each side and allow newline. 206 | * Can contain any sequence of non-control characters except for three matching closing quotation marks. 
207 | * Do not interpret escape sequences, a backslash `\` is just a literal backslash. 208 | * A newline immediately following the opening delimiter is trimmed. 209 | * All other characters remain intact. 210 | * Concatenations can mix single- and double-quoted strings as well as single- or multiline-strings. 211 | * Concatenation is a presentation detail and must not have any effect on the serialization tree, representation graph or events generated. 212 | It happens before the final string is passed on from the parser. 213 | 214 | ## Binary Data 215 | 216 | #### Synopsis 217 | 218 | Allow binary data as a separate type, in two forms: Hexdump or string. 219 | 220 | #### Examples 221 | 222 | * `$"Hello, \x77orld!"` (binary string) 223 | * `$48656c6c6f2c20776f726c6421` (binary hex) 224 | * `$48656c6c6f.2c20.776f726c64.21` 225 | * `$48.65.6c.6c.6f.2c.20.77.6f.72.6c.64.21` 226 | 227 | #### Grammar 228 | 229 | ```abnf 230 | binary = b-value *( value-concat b-value ) 231 | 232 | b-value = dollar [ b-string / b-direct ] 233 | 234 | b-string = b-s-string / b-d-string 235 | 236 | b-d-string = d-quote *( b-char / s-quote ) d-quote 237 | b-s-string = s-quote *( b-char / d-quote ) s-quote 238 | 239 | b-char = b-unescaped / 240 | escape ( 241 | %x22 / ; " double quote 0x22 242 | %x27 / ; ' single quote 0x27 243 | %x5C / ; \ reverse solidus 0x5C 244 | %x2F / ; / solidus 0x2F 245 | %x30 / ; 0 nul 0x00 246 | %x62 / ; b backspace 0x08 247 | %x66 / ; f form feed 0x0C 248 | %x6E / ; n line feed 0x0A 249 | %x72 / ; r carriage return 0x0D 250 | %x74 / ; t tab 0x09 251 | %x76 / ; v vtab 0x0B 252 | %x78 2HEXDIG ) ; xXX 0xXX 253 | 254 | b-unescaped = %x20-21 / %x23-26 / %x28-5B / %x5D-7E 255 | 256 | b-direct = b-part *( dot b-part ) 257 | 258 | b-part = 1*b-byte 259 | 260 | b-byte = 2HEXDIG 261 | 262 | dollar = %x24 ; $ 263 | dot = %x2E ; . 264 | ``` 265 | 266 | #### Notes 267 | 268 | * Binary data represents arbitrary byte sequences (not Unicode strings). 
269 | * Binary strings allow only "printable" ASCII characters, no control characters. 270 | * No `\uXXXX` or `\u{...}` escape sequences allowed, instead: 271 | * Escape sequence `\xXX` for arbitrary byte values. 272 | * Concatenations can mix single- and double-quoted binary strings as well as hexdumped data. 273 | * Concatenation is a presentation detail and must not have any effect on the serialization tree, representation graph or events generated. 274 | It happens before the final binary value is passed on from the parser. 275 | 276 | ## Unquoted Names in Objects 277 | 278 | #### Synopsis 279 | 280 | Allow identifiers as unquoted names in objects. 281 | 282 | #### Example 283 | 284 | `{ foo: "Hello", bar: 42 }` 285 | 286 | #### Grammar 287 | 288 | ```abnf 289 | member = key name-separator value 290 | 291 | key = string / identifier 292 | 293 | identifier = i-begin *i-continue 294 | 295 | i-begin = ALPHA / %x5F 296 | i-continue = i-begin / DIGIT 297 | ``` 298 | 299 | #### Notes 300 | 301 | Names in objects are strings; the tokens `true`, `null`, and `false` are unambiguous shortcuts for `"true"`, `"null"`, and `"false"` when used within an object where a name is expected. 302 | Strings in their role as names in objects can use the extended syntax for strings including the single-quoted variant, additional escape sequences, and string concatenation, however unquoted names in objects can **not** be concatenated. 303 | 304 | Unquoted names in objects are a presentation detail and must not have any effect on the serialization tree, representation graph or events generated, i.e. they are passed on as normal strings from the parser. 305 | 306 | ## Trailing Comma 307 | 308 | #### Synopsis 309 | 310 | Allow trailing commas in arrays and objects. 
311 | 312 | #### Examples 313 | 314 | * `[ 1, 2, 3, ]` 315 | * `{ foo: "Hello", bar: 42, }` 316 | 317 | #### Grammar 318 | 319 | ```abnf 320 | value-sep-opt = [ value-separator ] 321 | 322 | array = begin-array 323 | [ value *( value-separator value ) value-sep-opt ] 324 | end-array 325 | 326 | object = begin-object 327 | [ member *( value-separator member ) value-sep-opt ] 328 | end-object 329 | ``` 330 | 331 | #### Semantics 332 | 333 | The additional commas have no semantics. 334 | 335 | #### Notes 336 | 337 | The above grammar does not allow for adjacent commas (`[1,,2]`), a leading comma (`[,1]`), or placing a comma in an empty array or object (`[,]`). 338 | 339 | Trailing commas are a presentation detail and must not have any effect on the serialization tree, representation graph or events generated. 340 | 341 | Copyright (c) 2017-2018 Daniel Frey and Dr. Colin Hirsch 342 | -------------------------------------------------------------------------------- /TestSuite.md: -------------------------------------------------------------------------------- 1 | # JAXN Test Suite 2 | 3 | > **Please note that this test suite does not yet exist**, this section is to discuss its development... 4 | 5 | The JAXN test suite is intended to cover all aspects and details of the JAXN encoding. 6 | A library that passes all tests can be considered JAXN compliant. 7 | The test suite contains different categories of testcases, each consisting of one or more data files. 8 | 9 | For the purpose of this document, two JSON files, or two JAXN files, are considered equivalent if they describe the same data. 10 | This can be checked by first parsing and then printing the two files with *TBD tools in taocpp/json* and checking whether the outputs are equal. 11 | 12 | ## Invalid JAXN 13 | 14 | Each testcase consists of one input file that does not conform to the JAXN standard. 15 | Reading such a file must generate an error. 
16 | 17 | ## JAXN encoded JSON 18 | 19 | Each testcase consists of one valid JAXN input file, and one valid JSON reference file. 20 | The JAXN input files in this category remain within the JSON data model. 21 | The JAXN input file must be parsed and then printed to a JSON output file. 22 | The test passes if the JSON output file and the JSON reference file are equivalent. 23 | 24 | ## JAXN beyond JSON 25 | 26 | Each testcase consists of one valid JAXN input file, and one valid JAXN reference file. 27 | The JAXN input file must be parsed and then printed to a JAXN output file. 28 | The test passes if the JAXN output file and the JAXN reference file are equivalent. 29 | 30 | Copyright (c) 2017-2018 Daniel Frey and Dr. Colin Hirsch 31 | -------------------------------------------------------------------------------- /jaxn.abnf: -------------------------------------------------------------------------------- 1 | ; The JAXN grammar (https://github.com/stand-art/jaxn/) 2 | 3 | ; Based on the JSON grammar, see RFC 8259. 4 | ; See RFC 5234 for interpretation and core rules. 
5 | 6 | comment = c-line / c-block 7 | 8 | c-line = c-begin-line *( c-char ) 9 | 10 | c-begin-line = %x23 / %x2F.2F ; # or // 11 | 12 | c-char = %x09 / %x20-7E / %x80-10FFFF 13 | ; Any HTAB or printable character 14 | 15 | c-block = c-begin-block 16 | *( c-no-star / ( 1*c-star c-no-star-or-slash ) ) 17 | c-end-block 18 | 19 | c-begin-block = c-slash c-star 20 | c-end-block = 1*c-star c-slash 21 | 22 | c-slash = %x2F ; / 23 | c-star = %x2A ; * 24 | 25 | c-no-star = %x09 / %x0A / %x0D / 26 | %x20-29 / %x2B-7E / %x80-10FFFF 27 | 28 | c-no-star-or-slash = %x09 / %x0A / %x0D / 29 | %x20-29 / %x2B-2E / %x30-7E / %x80-10FFFF 30 | 31 | ws = *( %x20 / ; Space 32 | %x09 / ; Horizontal tab 33 | %x0A / ; Line feed or New line 34 | %x0D / ; Carriage return 35 | comment ) ; Comment 36 | 37 | begin-array = ws %x5B ws ; [ left square bracket 38 | begin-object = ws %x7B ws ; { left curly bracket 39 | end-array = ws %x5D ws ; ] right square bracket 40 | end-object = ws %x7D ws ; } right curly bracket 41 | name-separator = ws %x3A ws ; : colon 42 | value-separator = ws %x2C ws ; , comma 43 | value-concat = ws %x2B ws ; + plus 44 | 45 | value-sep-opt = [ value-separator ] 46 | 47 | null = %x6E.75.6C.6C ; null 48 | true = %x74.72.75.65 ; true 49 | false = %x66.61.6C.73.65 ; false 50 | 51 | number = [ plus / minus ] ( nan / inf / hex / dec ) 52 | 53 | nan = %x4E.61.4E ; NaN 54 | 55 | inf = %x49.6E.66.69.6E.69.74.79 56 | ; Infinity 57 | 58 | hex = zero x 1*HEXDIG ; 0xXXX... 59 | 60 | dec = ( int [ frac0 ] / frac1 ) [ exp ] 61 | 62 | decimal-point = %x2E ; . 
63 | 64 | digit1-9 = %x31-39 ; 1-9 65 | 66 | e = %x65 / %x45 ; e E 67 | x = %x78 / %x58 ; x X 68 | 69 | exp = e [ plus / minus ] 1*DIGIT 70 | 71 | frac0 = decimal-point *DIGIT 72 | frac1 = decimal-point 1*DIGIT 73 | 74 | int = zero / ( digit1-9 *DIGIT ) 75 | 76 | plus = %x2B ; + 77 | minus = %x2D ; - 78 | zero = %x30 ; 0 79 | 80 | string = string-part *( value-concat string-part ) 81 | 82 | string-part = m-d-string / m-s-string / d-string / s-string 83 | 84 | m-d-string = 3d-quote *( m-d-char ) 3d-quote 85 | m-s-string = 3s-quote *( m-s-char ) 3s-quote 86 | 87 | m-d-char = *2d-quote ( %x09 / %x0A / %x0D / %x20-21 / %x23-7E / %x80-10FFFF ) 88 | m-s-char = *2s-quote ( %x09 / %x0A / %x0D / %x20-26 / %x28-7E / %x80-10FFFF ) 89 | 90 | d-string = d-quote *( s-char / s-quote ) d-quote 91 | s-string = s-quote *( s-char / d-quote ) s-quote 92 | 93 | s-char = unescaped / 94 | escape ( 95 | %x22 / ; " double quote U+0022 96 | %x27 / ; ' single quote U+0027 97 | %x5C / ; \ reverse solidus U+005C 98 | %x2F / ; / solidus U+002F 99 | %x30 / ; 0 nul U+0000 100 | %x62 / ; b backspace U+0008 101 | %x66 / ; f form feed U+000C 102 | %x6E / ; n line feed U+000A 103 | %x72 / ; r carriage return U+000D 104 | %x74 / ; t tab U+0009 105 | %x76 / ; v vtab U+000B 106 | %x75 4HEXDIG / ; uXXXX U+XXXX 107 | %x75 %x7B 1*HEXDIG %x7D ) 108 | ; u{X...} U+X... 
109 | 110 | escape = %x5C ; \ 111 | d-quote = %x22 ; " 112 | s-quote = %x27 ; ' 113 | 114 | unescaped = %x20-21 / %x23-26 / %x28-5B / %x5D-7E / %x80-10FFFF 115 | 116 | binary = b-value *( value-concat b-value ) 117 | 118 | b-value = dollar [ b-string / b-direct ] 119 | 120 | b-string = b-s-string / b-d-string 121 | 122 | b-d-string = d-quote *( b-char / s-quote ) d-quote 123 | b-s-string = s-quote *( b-char / d-quote ) s-quote 124 | 125 | b-char = b-unescaped / 126 | escape ( 127 | %x22 / ; " double quote 0x22 128 | %x27 / ; ' single quote 0x27 129 | %x5C / ; \ reverse solidus 0x5C 130 | %x2F / ; / solidus 0x2F 131 | %x30 / ; 0 nul 0x00 132 | %x62 / ; b backspace 0x08 133 | %x66 / ; f form feed 0x0C 134 | %x6E / ; n line feed 0x0A 135 | %x72 / ; r carriage return 0x0D 136 | %x74 / ; t tab 0x09 137 | %x76 / ; v vtab 0x0B 138 | %x78 2HEXDIG ) ; xXX 0xXX 139 | 140 | b-unescaped = %x20-21 / %x23-26 / %x28-5B / %x5D-7E 141 | 142 | b-direct = b-part *( dot b-part ) 143 | 144 | b-part = 1*b-byte 145 | 146 | b-byte = 2HEXDIG 147 | 148 | dollar = %x24 ; $ 149 | dot = %x2E ; . 150 | 151 | array = begin-array 152 | [ value *( value-separator value ) value-sep-opt ] 153 | end-array 154 | 155 | object = begin-object 156 | [ member *( value-separator member ) value-sep-opt ] 157 | end-object 158 | 159 | member = key name-separator value 160 | 161 | key = string / identifier 162 | 163 | identifier = i-begin *i-continue 164 | 165 | i-begin = ALPHA / %x5F 166 | i-continue = i-begin / DIGIT 167 | 168 | value = false / null / true / object / array / number / string / binary 169 | 170 | JAXN-text = ws value ws 171 | --------------------------------------------------------------------------------