├── README.md
├── newspec_notes
├── spec12
│   ├── binary-data.html
│   ├── contact.html
│   ├── container-types.html
│   ├── developer-resources.html
│   ├── index.html
│   ├── libraries.html
│   ├── thanks.html
│   ├── type-reference.html
│   └── value-types.html
└── spec8
    ├── Makefile
    ├── _static
    │   └── .keep
    ├── conf.py
    ├── index.html
    ├── index.rst
    ├── libraries.rst
    ├── make.bat
    ├── spec.rst
    ├── tests
    │   ├── CouchDB4k.compact.json
    │   ├── CouchDB4k.formatted.json
    │   ├── MediaContent.compact.json
    │   ├── MediaContent.formatted.json
    │   ├── TwitterTimeline.compact.json
    │   └── TwitterTimeline.formatted.json
    ├── thanks.rst
    └── type_reference.rst
/README.md:
--------------------------------------------------------------------------------
1 | Universal Binary JSON
2 | =====================
3 |
4 | Community workspace for the [Universal Binary JSON Specification][ubjson].
5 |
6 | Introduction
7 | ============
8 |
9 | [JSON][json] has become a ubiquitous text-based file format for
10 | data interchange. Its simplicity, ease of processing and (relatively) rich data
11 | typing made it a natural choice for many developers needing to store or shuffle
12 | data between systems quickly and easily.
13 |
14 | Unfortunately, marshaling native programming language constructs in and out of
15 | a text-based representation does have a measurable processing cost associated
16 | with it.
17 |
18 | In high-performance applications, avoiding the text-processing step of JSON can
19 | net big wins in both processing time and size reduction of stored information,
20 | which is where a binary JSON format becomes helpful.
21 |
22 | Why
23 | ===
24 |
25 | Attempts to make using JSON faster through binary specifications like
26 | [BSON][bson], [BJSON][bjson] or [Smile][smile] exist, but have been rejected
27 | from mass-adoption for two reasons:
28 |
29 | * Custom (Binary-Only) Data Types:
30 | Inclusion of custom data types that have no analog in the original JSON
31 | spec, leaving room for incompatibilities to exist as different implementations
32 | of the spec handle the binary-only data types differently.
33 | * Complexity: Some specifications provide higher performance or smaller
34 | representations at the cost of a much more complex specification, making
35 | implementations more difficult which can slow or block adoption. One of the key
36 | reasons JSON became as popular as it did was because of its ease of use.
37 |
38 | Goals
39 | =====
40 |
41 | The Universal Binary JSON specification has 3 goals:
42 |
43 | 1. **Universal Compatibility**
44 |
45 | Meaning absolute compatibility with the JSON spec itself as well as only
46 | utilizing data types that are natively supported in all popular programming
47 | languages.
48 |
49 | This allows 1:1 transforms between standard JSON and Universal Binary JSON as
50 | well as efficient representation in all popular programming languages without
51 | requiring parser developers to account for strange data types that their
52 | language may not support.
53 |
54 | 2. **Ease of Use**
55 |
56 | The Universal Binary JSON specification is intentionally defined using a
57 | single core data structure to build up the entire specification.
58 |
59 | This accomplishes two things: it allows the spec to be understood quickly and
60 | allows developers to write trivially simple code to take advantage of it or
61 | interchange data with another system utilizing it.
62 |
63 | 3. **Speed / Efficiency**
64 |
65 | Typically the motivation for using a binary specification over a text-based
66 | one is speed and/or efficiency, so strict attention was paid to selecting data
67 | constructs and representations that are (roughly) 30% smaller than their
68 | compacted JSON counterparts and optimized for fast parsing.
69 |
70 | Interested? Find out more at [http://ubjson.org][ubjson]
71 |
72 | [ubjson]: http://ubjson.org
73 | [json]: http://json.org
74 | [bson]: http://bsonspec.org
75 | [bjson]: http://bjson.org
76 | [smile]: http://wiki.fasterxml.com/SmileFormat
77 |
--------------------------------------------------------------------------------
/newspec_notes:
--------------------------------------------------------------------------------
1 |
2 |
3 | "alpha" spec: type+count,object+array,
4 |
5 | T,F
6 | Z,N
7 | C (0-127 byte)
8 | H (string)
9 | S (string)
10 | i,U
11 | I
12 | l
13 | L
14 | d
15 | D
16 | [
17 | {
18 |
19 |
20 | https://realtimelogic.com/ba/doc/en/C/reference/html/ubjson_8h_source.html "alpha"...possibly TC swap?
21 | https://github.com/Steve132/ubj "alpha" +array draft
22 | https://bitbucket.org/tsieprawski/ubjsc "alpha"..TC swap?
23 | https://github.com/WhiZTiM/UbjsonCpp "alpha"
24 | https://github.com/dinocore1/ubjson "alpha"
25 | http://iso2mesh.sourceforge.net/cgi-bin/index.cgi?jsonlab "alpha" Incorrectly separates #$ as independent
26 | http://libgdx.badlogicgames.com/nightlies/docs/api/com/badlogic/gdx/utils/UBJsonReader.html "alpha" +support for "a" which is wrong
27 | https://sourceforge.net/p/protoc/wiki/Home/ "pre-alpha" "Draft-9"
28 | https://github.com/Iotic-Labs/py-ubjson "pre-alpha" "Draft-9"
29 | https://github.com/dizews/php-ubjson "pre-alpha" "Draft-9"
30 | https://code.google.com/archive/p/simpleubjson/source/default/source "pre-alpha" "Draft-8/9"
31 | https://github.com/artcompiler/L16 "pre-alpha" "Draft-8"
32 | https://github.com/adilbaig/ubjsond "pre-alpha" "Draft-8"
33 | https://github.com/ubjson/universal-binary-json-java "pre-alpha" "Draft-8"
34 | http://ubjsonnet.codeplex.com/ "pre-alpha" "Draft-8"
35 | https://github.com/Sannis/node-ubjson "pre-alpha" "Draft-8"
36 |
--------------------------------------------------------------------------------
/spec12/binary-data.html:
--------------------------------------------------------------------------------
1 | Page not organized well and under development, but here are the highlights...
2 |
Overview
3 | Support for binary data in the Universal Binary JSON specification was in discussion for 2 years before it was finalized. Many, many different approaches were considered and discarded all in the name of maintaining compatibility with JSON while keeping an eye on performance.
4 |
5 | The result is a surprisingly simple and binary-efficient construct that is also easily translated to JSON and back to UBJSON again with the help of a good library, namely: a strongly-typed array of uint8 values.
6 |
Compatibility with JSON
7 | Representing binary data efficiently in Universal Binary JSON while still maintaining compatibility with JSON is deceptively simple: leverage a strongly-typed array of uint8 values -- essentially a list of integers.
8 |
9 | There is no explicit binary type, but instead the ability to represent binary inside of Universal Binary JSON in a very optimized and JSON-compatible construct.
10 |
11 | The #1 goal of Universal Binary JSON is compatibility with JSON. Compatibility is defined as:
12 |
if
13 | A.ubjson -> translated to -> B.json
14 | &&
15 | B.json -> translated to -> C.ubjson
16 | then
17 | A.ubjson == C.ubjson
18 | All of the Universal Binary JSON value and container types are 1:1 compatible with JSON. The only semantically (but not structurally) incompatible construct in UBJSON is strongly-typed containers in that once the container is converted to JSON the typing of the container is lost. Converting the container back to UBJSON and re-enabling the strong-typing does require assistance from the encoding library.
19 |
20 | Since JSON has no direct support for binary data or this style of strongly-typed container, the translation to JSON converts the strongly-typed array to an array of simple JSON types - in the case of binary data, it would be an array of number values (In the example above this is the translation step from A.ubjson to B.json).
21 |
22 | Going from JSON back to UBJSON (B.json -> C.ubjson) has the potential for losing the strongly-typed container information and has to be handled with care to re-enable the optimized representation of that information back in the UBJSON format.
23 |
Library Implementation Recommendation
24 | The library implementors are encouraged to provide this functionality in the form of two optional settings that can be turned on during generation:
25 |
26 |
[x] Automatically use strongly typed containers when possible
27 |
[x] Force use of strongly typed containers based on first element type
28 |
29 | [box type="info"]Specific naming and implementation is up to the developer. This is merely a suggestion on how to handle this situation as elegantly as possible for the client.[/box]
30 |
31 | The idea is that the library can either make an automated attempt at reconstructing the strongly typed containers OR, if you have a lot of knowledge of your data, you can force the library to reconstitute what looks to be a strongly typed container based on the first element type.
32 |
33 | [box type="alert"]If Force is used the library should take care to detect and fail if a different type of value is found in the container during generation. More specifically, the library should remember the first element type and continue checking types as it is generating UBJSON to ensure the type continues to stay consistent.[/box]
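
A generator might expose the two settings roughly as follows. This is a minimal sketch under stated assumptions: the names `marker_for` and `pick_container_type`, the `mode` values and the simplified marker mapping are hypothetical and do not come from any existing UBJSON library.

```python
from typing import Optional, Sequence


def marker_for(value) -> str:
    """Map a Python value to a (simplified) UBJSON type marker."""
    if value is None:
        return "Z"
    if value is True:
        return "T"
    if value is False:
        return "F"
    if isinstance(value, int):
        # Simplification: only uint8 is distinguished from wider integers.
        return "U" if 0 <= value <= 255 else "L"
    if isinstance(value, float):
        return "D"
    if isinstance(value, str):
        return "S"
    raise TypeError(f"unsupported value: {value!r}")


def pick_container_type(values: Sequence, mode: str = "auto") -> Optional[str]:
    """Return the marker to use for a strongly typed array, or None.

    mode="auto":  use a typed container only if every element agrees.
    mode="force": take the first element's type and fail on any mismatch.
    """
    if not values:
        return None
    first = marker_for(values[0])
    for index, value in enumerate(values):
        found = marker_for(value)
        if found != first:
            if mode == "force":
                raise ValueError(
                    f"element {index} is {found!r}, expected {first!r}"
                )
            return None  # auto mode falls back to an untyped container
    return first
```

In "force" mode the check runs over every element, so a mismatch late in the container still fails generation instead of silently producing invalid UBJSON.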
34 |
35 | Still under development...
36 |
Performance Considerations
37 | Something to be aware of when converting UBJSON containing a large amount of binary data is that each strongly-typed container of uint8 values will convert to a JSON array of number values. Because this translation writes each byte as decimal digits and introduces a ',' character between every value in the array, it at least doubles the size of the binary data.
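
The size cost can be measured directly. The sketch below is only an illustration: the sample payload is made up, and Python's standard `json` module stands in for whatever JSON writer a UBJSON library would use.

```python
import json

# 1024 bytes of sample binary data covering every byte value four times.
payload = bytes(range(256)) * 4

# UBJSON -> JSON: the uint8 array becomes a compact JSON array of numbers.
as_json = json.dumps(list(payload), separators=(",", ":"))

ratio = len(as_json) / len(payload)
print(len(payload), len(as_json), round(ratio, 2))
```

The growth factor depends on the byte values: single-digit values cost about two characters each (digit plus comma), while values of 100 and above cost about four.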
--------------------------------------------------------------------------------
/spec12/contact.html:
--------------------------------------------------------------------------------
1 | Please use the form below, email rkalla@gmail.com, post on the Google Group or file an issue on GitHub! I really would like to get any comments, questions or feedback on the specification you think is important to share. UBJSON will only be successful through the passion of many.
2 |
3 | If you are using the Universal Binary JSON format in an application, we'd love to hear about it. If you wrote a library that adds support for it to your favorite language, please let us know and we'll add it to the site!
4 |
--------------------------------------------------------------------------------
/spec12/container-types.html:
--------------------------------------------------------------------------------
1 | The Universal Binary JSON Specification defines a total of 2 container types matching JSON's container types:
2 |
3 |
5 |
6 | Ignoring special-case optimizations, the design of the Universal Binary JSON containers is intentionally identical to JSON (the same start/end markers) and is streaming-friendly; more specifically, containers can be written out on-demand without knowing the size of the container ahead of time.
7 |
Optimized Format
8 | Both array and object container types in UBJSON support being represented in a more optimized format that can increase parsing performance as well as shrink data size in most cases (without compression).
9 |
10 | Please see Optimized Format below for details on how to leverage this support.
11 |
114 | [box type="info"]NOTE: The [S] (string) marker is omitted from each of the names in the name/value pairings inside the object. The JSON specification does not allow non-string name values, therefore the [S] marker is redundant and must not be used.[/box]
115 |
126 | While the basic specification for the array and object types is identical to the JSON specification (i.e. simple beginning and end markers), both containers support optional parameters that can help optimize the container for better parsing performance and smaller size.
127 |
128 | At a very high level, the optimized format for both array and object container types is built around two optional parameters: type and count.
129 |
159 | The effect on the container when specifying one or both parameters is as follows:
160 |
161 |
type [$] - when a type is specified, all value types stored in the container (either array or object) are considered to be of that singular type and as a result, type markers are omitted for each value in the container. This can be thought of as providing the ability to create a strongly typed container in UBJSON.
162 |
163 |
If a type is specified, it must be done so before a count.
164 |
If a type is specified, a count must be specified as well (otherwise it is impossible to tell when a container is ending; e.g., did you just parse ']' or the int8 value of 93?)
165 |
166 |
167 |
count [#] - when a count is specified, the parser is able to know ahead of time how many child elements will be parsed. This allows the parser to pre-size any internal construct used for parsing, verify that the promised number of child values were found and avoid scanning for any terminating bytes while parsing.
168 |
169 |
A count can be specified without a type.
170 |
171 |
172 |
173 | [box type="info"]NOTE: Yes it is possible for an array or object to define their type as '[' or '{' to signal that they themselves contain additional containers![/box]
174 |
175 | [box type="download"]BONUS: Parsers can provide highly-optimized implementations for strongly typed containers of non-variable-length types (e.g. numeric, boolean, etc.) because the exact byte-length of the data is known![/box]
176 |
177 | Some rules that generators and parsers need to be aware of when dealing with these optional parameters are as follows:
178 |
179 |
[count] A count must be >= 0.
180 |
[count] A count can be specified by itself.
181 |
[count] If a count is specified the container must not specify an end-marker.
182 |
[count] A container that specifies a count must contain the specified number of child elements.
183 |
[type] If a type is specified, it must be done so before count.
184 |
[type] If a type is specified, a count must also be specified. A type cannot be specified by itself.
185 |
[type] A container that specifies a type must not contain any additional type markers for any contained value.
186 |
[type] The type cannot be No-op. Indeed, creating a container whose type is “nothing” (which is what No-op actually is) does not really mean anything.
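
The rules above translate fairly directly into a header parser. The following is a minimal sketch, not a reference implementation: the function names are hypothetical, the integer markers follow the spec's value types, and big-endian byte order is assumed.

```python
import struct
from typing import Optional, Tuple

# Big-endian struct formats for the integer markers that may carry a count.
INT_FORMATS = {b"i": ">b", b"U": ">B", b"I": ">h", b"l": ">i", b"L": ">q"}


def read_int(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a marker-prefixed integer at pos; return (value, new_pos)."""
    fmt = INT_FORMATS.get(buf[pos:pos + 1])
    if fmt is None:
        raise ValueError("count must be an integer type")
    (value,) = struct.unpack_from(fmt, buf, pos + 1)
    return value, pos + 1 + struct.calcsize(fmt)


def read_container_params(
    buf: bytes, pos: int
) -> Tuple[Optional[bytes], Optional[int], int]:
    """Parse the optional [$][type] and [#][count] that follow '[' or '{'.

    Returns (type_marker, count, new_pos); type and count may be None.
    """
    type_marker = None
    count = None
    if buf[pos:pos + 1] == b"$":
        type_marker = buf[pos + 1:pos + 2]
        if type_marker == b"N":
            raise ValueError("container type cannot be No-op")
        pos += 2
        # A type must come before the count and cannot stand alone.
        if buf[pos:pos + 1] != b"#":
            raise ValueError("a type must be followed by a count")
    if buf[pos:pos + 1] == b"#":
        count, pos = read_int(buf, pos + 1)
        if count < 0:
            raise ValueError("count must be >= 0")
    return type_marker, count, pos
```

A caller that receives a count would then read exactly that many children and must not expect an end marker; without a count it scans for `]` or `}` as usual.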
187 |
188 |
189 |
Array Example
190 | Below are examples of incrementally more optimized representations of an array in UBJSON.
191 |
[[][#][i][5] // An array of 5 elements.
202 | [d][29.97]
203 | [d][31.13]
204 | [d][67.0]
205 | [d][2.113]
206 | [d][23.8889]
207 | // No end marker since a count was specified.
208 |
Optimized with type & count
209 |
[[][$][d][#][i][5] // An array of 5 float32 elements.
210 | [29.97] // Value type is known, so type markers are omitted.
211 | [31.13]
212 | [67.0]
213 | [2.113]
214 | [23.8889]
215 | // No end marker since a count was specified.
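
To make the difference concrete, the two optimized array forms can be built byte by byte. This sketch assumes float32 values are written as 4-byte big-endian IEEE 754 and uses Python's `struct` module; the variable names are illustrative only.

```python
import struct

values = [29.97, 31.13, 67.0, 2.113, 23.8889]

# Count only: each value keeps its own [d] marker.
count_only = b"[#i" + struct.pack(">b", len(values))
count_only += b"".join(b"d" + struct.pack(">f", v) for v in values)

# Type and count: the [d] marker appears once, in the header.
typed = b"[$d#i" + struct.pack(">b", len(values))
typed += b"".join(struct.pack(">f", v) for v in values)

print(len(count_only), len(typed))  # 29 26
```

The typed header is two bytes longer, but every element is one byte shorter, so the typed form wins as soon as the array holds more than two values.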
216 |
217 |
Object Example
218 | Below are examples of incrementally more optimized representations of an object in UBJSON.
219 |
220 | [box type="info"]Remember, in UBJSON the string markers ([S]) are omitted from the names in the name-value pairs of an Object because JSON only allows names of type string.[/box]
221 |
[{][#][i][3] // An object of 3 name:value pairs.
230 | [i][3][lat][d][29.976]
231 | [i][4][long][d][31.131]
232 | [i][3][alt][d][67.0]
233 | // No end marker since a count was specified.
234 |
Optimized with type & count
235 |
[{][$][d][#][i][3] // An object of 3 name:float32-value pairs.
236 | [i][3][lat][29.976] // Value type is known, so type markers are omitted.
237 | [i][4][long][31.131]
238 | [i][3][alt][67.0]
239 | // No end marker since a count was specified.
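
The typed object form can be sketched the same way. The helper names below are hypothetical, names are limited to 127 bytes so that an int8 length suffices, and big-endian float32 encoding is assumed.

```python
import struct


def encode_name(name: str) -> bytes:
    """Encode an object name: int8 length, then UTF-8 bytes, no [S] marker."""
    raw = name.encode("utf-8")
    if len(raw) > 127:
        raise ValueError("this sketch only handles names up to 127 bytes")
    return b"i" + struct.pack(">b", len(raw)) + raw


def encode_float32_object(pairs: dict) -> bytes:
    """Encode {name: float} as a strongly typed object of float32 values."""
    if len(pairs) > 127:
        raise ValueError("this sketch only handles up to 127 pairs")
    out = b"{$d#i" + struct.pack(">b", len(pairs))
    for name, value in pairs.items():
        # The value type is fixed by the header, so no [d] marker is written.
        out += encode_name(name) + struct.pack(">f", value)
    return out


encoded = encode_float32_object({"lat": 29.976, "long": 31.131, "alt": 67.0})
print(len(encoded))  # 34
```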
240 |
241 |
Special Cases (Null and Boolean)
242 | Up until now all the examples of leveraging type and count have illustrated the benefit of optimizing out the markers from value types that have a data payload (e.g. numeric values, strings, etc.); since the type of all the values are known, the markers are easily omitted. There are, however, a few special value types that have no data payload and the markers themselves represent the value, specifically: null and boolean (no-op is not a valid type for a container).
243 |
244 | This section will take a look at how those types behave when used with strongly-typed containers.
245 |
246 | At a high level, placing these values in a strongly-typed container provides the basic behavior of essentially pre-defining the value for every element in the container. In the case of an array, this applies to all the values contained in it. In the case of an object, it applies to all the values associated with the names in the name-value pairs.
247 |
Array
248 |
[[][$][F][#][I][512] // 512 'false' values.
249 | The example above is a strongly typed array of type false with a count of 512.
250 |
251 | This simple declaration is equivalent to a 514-byte array containing 512 [F] markers; instead this single line is 7 bytes (five marker bytes plus a 2-byte int16 count), providing a roughly 99% size reduction.
252 |
253 | Admittedly this is a selective example of leveraging this feature, but the point is that there are potentially very large performance and size optimizations available if your data can take advantage of this shorthand.
254 |
255 | [box type="info"]Strongly-typed arrays of null and boolean must have an empty body. The header itself defines the container's contents.[/box]
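
The size comparison can be checked by building both forms. This is a small sketch assuming the int16 count is written big-endian; the variable names are illustrative.

```python
import struct

# Plain array: '[' + 512 'F' markers + ']'.
plain = b"[" + b"F" * 512 + b"]"

# Strongly typed array: the header alone describes all 512 values.
# 512 does not fit in int8 or uint8, so the count uses int16 ('I', 2 bytes).
typed = b"[$F#I" + struct.pack(">h", 512)

saving = 1 - len(typed) / len(plain)
print(len(plain), len(typed), f"{saving:.1%}")
```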
256 |
Object
257 |
[{][$][Z][#][i][3]
258 | [i][4][name] // name only, no value specified.
259 | [i][8][password]
260 | [i][5][email]
261 | The example above is a strongly typed object of type null with a count of 3.
262 |
263 | When used in the context of an object, specifying one of these special-case values as a type has the effect of setting the default value for every name-value pair in the object; therefore the object only contains the names of all the pairs.
264 |
265 | In the case of objects the space-savings is typically a little less drastic than in the array case depending on the size of the names; in the case of small names, it could be significant, approaching a 50% reduction.
266 |
267 | [box type="info"]Strongly-typed objects of null and boolean must not have any values specified in the body, just the name portions of the name-value pairs. The header itself defines the value for every name-value pair.[/box]
268 |
269 |
270 |
276 | The benefits realized by leveraging the optimized container types in UBJSON depend heavily on the data being stored and the implementation of the generator or parser. Barring the frustration of "it depends" as an answer, the benefits can be viewed at a very high level as the following:
277 |
278 |
Optimized for Parsing
279 | By specifying a count, you are hinting to the parser about the number of elements to expect. The performance gains are primarily around allowing the parser to pre-size its internal data structures to exactly the right size to hold pointers to the parsed values.
280 |
281 | By specifying a type and count, the parser not only knows how many child elements to expect; it also has less data to parse and fewer conditions to check (no marker checks), and in the case of fixed-length values it knows the exact byte length of the payload!
282 |
283 | For example, consider:
284 |
290 | After the parser parses the container's header, it knows the byte length of the entire payload is 4096 and in a single read operation can read all the values in and quickly break them up into their int32 representations.
291 |
292 | The real performance gains come when you are able to leverage the type and count together to help the parser understand the payload in more detail.
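
A single-read parse of a fixed-length payload might look like the sketch below. The header used here, a strongly typed array of 1024 int32 values (1024 × 4 = 4096 bytes), is an assumption chosen to match the byte length mentioned above, and big-endian encoding is assumed.

```python
import struct

# Assumed container for this sketch: [[][$][l][#][I][1024]
count = 1024
header = b"[$l#I" + struct.pack(">h", count)
payload = struct.pack(f">{count}i", *range(count))
stream = header + payload

# Once the header is parsed, the payload length is simply count * 4 bytes,
# so the whole body can be sliced out and decoded in one call.
payload_len = count * struct.calcsize(">i")
body = stream[len(header):len(header) + payload_len]
values = struct.unpack(f">{count}i", body)
```

No per-element marker check is needed, which is where the parsing speed-up comes from.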
293 |
294 |
Simple Validation Mechanism
295 | By specifying a count parameter, you are telling the parser the number of child elements it should find in the container. In the case where the parser is unable to find the specified number of child elements it can quickly report a format error to the caller.
296 |
297 | This is a very simple version of verification and not as robust as say a checksum-based approach, but it still provides benefit in addition to a performance gain.
298 |
299 |
Reduce Size up to 50%
300 | This is a 1-byte-per-value reduction in any container where strong typing is used.
301 |
302 | In the case of containers holding large amounts of fairly compact data (small numbers, chars, small strings or value-types like null), removing the type marker from the beginning of each of the values in the container can almost cut the size requirements for the data in half.
303 |
304 | The smaller the containers and bigger the individual values are (large numbers, large strings) the less size benefit this optimization will have, but it still provides a potentially significant opportunity for the parser to optimize its code paths for parsing large chunks of same-type values (and not needing to worry about type changes mid-container). This is covered in more detail in the previous section: Optimized for Parsing.
305 |
306 |
Binary Data Support
307 | This section is here for referential convenience; please see Binary Data for information on storing binary data in UBJSON.
308 |
--------------------------------------------------------------------------------
/spec12/developer-resources.html:
--------------------------------------------------------------------------------
1 | This page contains information for developers looking to develop a Universal Binary JSON library.
2 |
8 | Libraries implementing the Universal Binary JSON spec must adhere to the following guidelines:
9 |
10 |
Parsers must follow a "writer-makes-right" policy - more specifically, if a parser encounters unexpected or invalid data (e.g. negative container length value) an exception should be thrown and parsing stopped.
18 | Through work with the community, feedback from others and our own experience with the specification, below are some of the best-practices collected into one place making it easy for folks working with the format to find answers to the more flexible portions of the spec.
19 |
Optimizing Container Performance
20 | [box type="tick"]Why: (Potentially large) data size reduction and parsing performance increase.
21 | How: Homogeneous data type in a container.[/box]
22 |
23 | Very large performance advantages are available when writing out ARRAY or OBJECT containers that contain same-type values. Be sure to read through the optimized container format that can be leveraged in these cases.
24 |
25 | A typical level of optimization is being able to omit all the marker characters for all same-typed values in a container, making the sizes of all typical value types 1-byte smaller.
26 |
27 | An atypical level of optimization, which leads to the biggest reduction, applies to 1-byte value types (e.g. NULL, TRUE, FALSE); when used in conjunction with the optimized container format, the values themselves can be omitted from the container entirely, leading to a space savings that approaches 100% as the size of the container grows.
28 |
Using Smallest Number Representation
29 | [box type="tick"]Why: ~50% size reduction for numbers > 5 digits and < 20 digits.
30 | How: Always use the most compact numeric type possible when writing UBJSON.[/box]
31 |
32 | Numeric values can be represented in a number of ways in UBJSON; you can reduce the size of your UBJSON by inspecting the stored value and ensuring it is represented in the most-compact numeric representation possible when storing the UBJSON blob.
33 |
34 | Keep in mind that varying the type of values inside of a container may impact your ability to use the type parameter to optimize container storage.
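
Choosing the most compact integer type can be sketched as a range lookup. The function name `encode_int` is hypothetical, the markers and widths follow the spec's integer value types, and big-endian byte order is assumed.

```python
import struct

# (marker, struct format, min, max), ordered from smallest to largest.
INT_TYPES = [
    (b"i", ">b", -2**7, 2**7 - 1),      # int8
    (b"U", ">B", 0, 2**8 - 1),          # uint8
    (b"I", ">h", -2**15, 2**15 - 1),    # int16
    (b"l", ">i", -2**31, 2**31 - 1),    # int32
    (b"L", ">q", -2**63, 2**63 - 1),    # int64
]


def encode_int(value: int) -> bytes:
    """Encode an integer using the first (smallest) type that can hold it."""
    for marker, fmt, lo, hi in INT_TYPES:
        if lo <= value <= hi:
            return marker + struct.pack(fmt, value)
    raise OverflowError("value does not fit in 64 bits; use high-precision")
```

A generator that also wants strongly typed containers would instead pick one type wide enough for every element, trading a few bytes per value for the omitted markers.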
35 |
Handling High-Precision Numbers on Unsupported Platforms
36 | [box type="tick"]Why: Cleanly handle > 64-bit numbers on platforms that don't support them.
37 | How: By using the high-precision type.[/box]
38 |
39 | Not every language supports arbitrarily long numbers and some not even numbers greater than 64-bits in size. In order to safely allow the transport and handling of > 64-bit numbers across every platform, UBJSON provides the high-precision numeric type.
40 |
41 | The high-precision type is a string-based type (identical in format to the string type) that provides a universally compatible mechanism by which arbitrarily large or precise numbers can be handled.
42 |
43 | For platforms with arbitrarily large/precise number support, they are free to parse the high-precision value into a native type; for platforms without support, the high-precision value can be safely passed on, persisted to storage or handled in other non-numeric ways while still allowing the client to handle the request and not overflow or otherwise balk at the unsupported numeric type.
44 |
45 | That said, for libraries written to support platforms that do not natively support arbitrarily large or precise values, the following guidance can be employed to provide a safe and consistent behavior when encountering them:
46 |
47 |
[Default] Exception/Error: Throw an exception (or return an error) when an unsupported high-precision value is encountered during parsing. The platform doesn't support them so allow the client a chance to be aware of the fact that it is receiving data it won't know how to parse into a native type.
48 |
[Optional] Handle as a String: (must be user-enabled) In the case where the client doesn't need to do any processing of the value and is just doing pass-through like persisting it to a data store, treat the high-precision value as a string and return it to the caller.
49 |
[Optional] Skip: (must be user-enabled) Provide the ability for the parser to optionally skip unsupported values during parsing. Be aware that this is a dangerous approach and will likely lead to data loss (skipped values won't be visible to the client), but in the case where a client must be able to parse any and all UBJSON it received even if it doesn't support arbitrarily large or precise numbers, then this has to be considered.
50 |
51 | These guidelines should provide the most functional experience for a client to work with UBJSON on their platform of choice.
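
The three behaviors could be exposed as a parser option along these lines. Everything named here (`HighPrecisionPolicy`, `handle_high_precision`, `SKIPPED`, the `native_support` flag) is hypothetical, and `decimal.Decimal` merely stands in for a platform's arbitrary-precision type.

```python
from decimal import Decimal
from enum import Enum


class HighPrecisionPolicy(Enum):
    ERROR = "error"    # default: refuse values the platform cannot represent
    STRING = "string"  # opt-in: pass the raw text through untouched
    SKIP = "skip"      # opt-in: drop the value (risks data loss)


SKIPPED = object()  # sentinel the caller can filter out


def handle_high_precision(
    text: str,
    native_support: bool,
    policy: HighPrecisionPolicy = HighPrecisionPolicy.ERROR,
):
    """Decide what a parser returns for a high-precision payload."""
    if native_support:
        return Decimal(text)
    if policy is HighPrecisionPolicy.STRING:
        return text
    if policy is HighPrecisionPolicy.SKIP:
        return SKIPPED
    raise ValueError(f"unsupported high-precision value: {text}")
```

Keeping ERROR as the default means a client only loses or reinterprets data after explicitly opting in.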
52 |
Example Files
53 | [box type="alert"]Example files below only support Draft 8[/box]
54 |
55 | You can find files to test your implementation with here. There are formatted-json, compacted-json and UBJ versions of each of the testing files contained in the repository.
56 |
57 | The simple Java classes that have matching names to the UBJ files are Java class representations of the files (for Java testing) and the Marshaller classes are the hand-coded serialization and deserialization code used to write out and read in those test files from UBJ format.
58 |
59 | Even if you are not working in Java, you can use those classes as a high level guide if you are curious or ignore them completely and just test against the raw file resources.
--------------------------------------------------------------------------------
/spec12/index.html:
--------------------------------------------------------------------------------
1 |
2 |
31 |
32 |
33 |
34 | The Universal Binary JSON Specification is licensed under the Apache 2.0 License.
35 |
36 | Use of the spec, either as-defined or a customized extension of it, is intended to be commercial-friendly.
37 |
38 | The ultimate purpose of this specification is to provide a useful tool for software developers to leverage in any way they see fit.
39 |
Why
40 |
41 |
42 |
43 | JSON has become a ubiquitous text-based file format for data interchange. Its simplicity, ease of processing and (relatively) rich data typing made it a natural choice for many developers needing to store or shuffle data between systems quickly and easily.
44 |
45 | Unfortunately, marshalling native programming language constructs in and out of a text-based representation does have a measurable processing cost associated with it.
46 |
47 | In high-performance applications, avoiding the text-processing step of JSON can net big wins in both processing time and size reduction of stored information, which is where a binary JSON format becomes helpful.
48 |
49 | Attempts to make using JSON faster through binary specifications like BSON, BJSON or Smile exist, but have been rejected from mass-adoption for two reasons:
50 |
51 |
Custom (Binary-Only) Data Types: Inclusion of custom data types that have no analog in the original JSON spec, leaving room for incompatibilities to exist as different implementations of the spec handle the binary-only data types differently.
52 |
Complexity: Some specifications provide higher performance or smaller representations at the cost of a much more complex specification, making implementations more difficult which can slow or block adoption. One of the key reasons JSON became as popular as it did was because of its ease of use.
53 |
54 | BSON, for example, defines types for binary data, regular expressions, JavaScript code blocks and other constructs that have no equivalent data type in JSON. BJSON defines a binary data type as well, again leaving the door wide open to interpretation that can potentially lead to incompatibilities between two implementations of the spec. Smile, while the closest, defines more complex data constructs and generation/parsing rules in the name of absolute space efficiency. These are not shortcomings, just trade-offs the different specs made in order to service specific use-cases.
55 |
56 | The existing binary JSON specifications all define incompatibilities or complexities that undo the singular tenet that made JSON so successful: simplicity.
57 |
58 | JSON's simplicity made it accessible to anyone, made implementations in every language available and made explaining it to anyone consuming your data immediate.
59 |
60 | Any successful binary JSON specification must carry these properties forward for it to be genuinely helpful to the community at large.
61 |
62 | This specification is defined around a singular marker-based construct used to build up and represent JSON values and objects. Reading and writing the format is trivial, designed with the goal of being understood in under 10 minutes (likely less if you are very comfortable with JSON already).
63 |
64 | [box type="info"]TIP: UBJSON is built exclusively out of marker-characters like 'C' (for CHAR), 'S' (for STRING), etc. followed by either the payload itself, or a length and then the payload... that's it![/box]
65 |
66 | Fortunately, while the Universal Binary JSON specification carries these tenets of simplicity forward, it is also able to take advantage of optimized binary data structures that are (on average) 30% smaller than compacted JSON and specified for ultimate read performance; bringing simplicity, size and performance all together into a single specification that is 100% compatible with JSON.
67 |
Why not JSON+gzip?
68 | On the surface simply gzipping your compacted JSON may seem like a valid (and smaller) alternative to using the Universal Binary JSON specification, but there are two significant costs associated with this approach that you should be aware of:
69 |
70 |
Lack of data clarity and inability to inspect it directly.
72 |
73 | While gzipping your JSON will give you great compression, about 75% on average, the overhead required to read/write the data becomes significantly higher.
74 |
75 | Additionally, because the binary data is now in a compressed format you can no longer open it directly in an editor and scan the human-readable portions of it easily; which can be important during debugging, testing or data verification and recovery.
76 |
77 | Utilizing the Universal Binary JSON format will typically provide a 30% reduction in size and store your data in an optimized format offering you much higher performance while still allowing you to open the file directly and read through it.
78 |
79 | If you had a usage scenario where your data is put into long-term cold storage and pulled out in large chunks for processing, you might even consider gzipping your Universal Binary JSON files, storing those, and when they are pulled out and unzipped, you can then process them with all the speed advantages of UBJSON.
80 |
81 | As always, deciding which approach is right for your project depends heavily on what you need.
82 |
Meaning absolute compatibility with the JSON spec itself as well as only utilizing data types that are natively supported in all popular programming languages.
90 |
This allows 1:1 transforms between standard JSON and Universal Binary JSON as well as efficient representation in all popular programming languages without requiring parser developers to account for strange data types that their language may not support.
91 | 2. Ease of Use
92 |
The Universal Binary JSON specification is intentionally defined using a single core data structure to build up the entire specification.
93 |
This accomplishes two things: it allows the spec to be understood quickly and allows developers to write trivially simple code to take advantage of it or interchange data with another system utilizing it.
94 | 3. Speed / Efficiency
95 |
Typically the motivation for using a binary specification over a text-based one is speed and/or efficiency, so strict attention was paid to selecting data constructs and representations that are (roughly) 30% smaller than their compacted JSON counterparts and optimized for fast parsing.
96 |
97 |
Data Format
98 |
99 |
100 |
101 | The Universal Binary JSON specification utilizes a single construct with two optional segments (length and data) for all types:
102 |
103 | Each element in the tuple is defined as:
104 |
105 |
type
106 |
107 |
A 1-byte ASCII char used to indicate the type of the data following it.
108 |
109 |
110 |
length (OPTIONAL)
111 |
112 |
A positive, integer numeric type (int8, uint8, int16, int32, int64) specifying the length of the following data payload.
113 |
114 |
115 |
data (OPTIONAL)
116 |
117 |
A run of bytes representing the actual binary data for this type of value.
118 |
119 |
120 |
121 | Some values are simple enough that writing the 1-byte ASCII marker into the stream is enough to represent them (e.g. null). Others have a type specific enough that no length is needed, because the length is implied by the type (e.g. int32). Others still require both a type and a length to communicate their value (e.g. string).
122 |
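The three cases above (marker only, marker plus fixed-size data, marker plus length plus data) can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, the 'Z' marker for null comes from the wider UBJSON type reference rather than from this page, and the string encoder only handles payloads whose length fits in an int8.

```python
import struct


def encode_null() -> bytes:
    # Marker only: no length segment, no data segment.
    # 'Z' is assumed here as the null marker (not stated on this page).
    return b"Z"


def encode_int32(value: int) -> bytes:
    # Marker + data: the 4-byte length is implied by the type.
    return b"l" + struct.pack(">i", value)


def encode_string(text: str) -> bytes:
    # Marker + length + data: the length is itself an integer value,
    # written here as an int8 ('i' marker followed by one byte).
    data = text.encode("utf-8")
    if len(data) > 127:
        raise ValueError("this sketch only handles lengths that fit in int8")
    return b"S" + b"i" + struct.pack(">b", len(data)) + data
```

Under these assumptions, `encode_string("hello")` produces the 8-byte sequence `[S][i][5][hello]`.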
Types
123 | Universal Binary JSON defines a number of Value Types and Container Types that map directly to JSON's types. For the most part the correlation is 1:1 except in the case of numeric types where UBJSON defines many more specific types of number storage and representation than JSON's single number type.
124 |
133 |
134 |
135 |
136 | The Universal Binary JSON specification tries to strike the perfect balance between space savings, simplicity and performance.
137 |
138 | Data stored using the Universal Binary JSON format are on average 30% smaller as a rule of thumb. As you can see from some of the examples in this document though, it is not uncommon to see the binary representation of some data lead to a 50% or 60% size reduction without compression.
139 |
140 | The size reduction of your data depends heavily on the type of data you are storing. It is best to do your own benchmarking with a comprehensive sampling of your own data.
141 |
142 | [box type="note"]The Universal Binary JSON specification does not use compression algorithms to achieve smaller storage sizes. The size reduction is a side effect of the efficient binary storage format.[/box]
143 |
Size Reduction Tips
144 | The amount of storage size reduction you'll experience with the Universal Binary JSON format will depend heavily on the type of data you are encoding.
145 |
146 | Some data shrinks considerably, some mildly and some not at all, but in every case your data will be stored in a much more efficient format that is faster to read and write.
147 |
148 | Below are pointers to give you an idea of how certain data may shrink in this format:
149 |
150 |
null, true and false values will be 75% smaller (80% in the case of false)
151 |
Large numeric values (> 5 digits < 20 digits) will be 50% smaller.
152 |
array and object containers will be 1-byte-per-value smaller.
153 |
Leveraging the optimized container format can lead to a significant size reduction in environments where container data is of the same type.
154 |
string values are 2-10 bytes bigger per string (depending on the length of the string being represented by the smaller integer numeric type).
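The percentages quoted for null, true, false and large numbers follow from simple byte counts, which the sketch below reproduces. The helper name `percent_smaller` is hypothetical, and the UBJSON sizes passed in are assumptions based on the sizes listed in this specification (1 byte for a marker-only value, 5 bytes for an int32).

```python
def percent_smaller(json_text: str, ubjson_size: int) -> float:
    """Size reduction, in percent, of a UBJSON value relative to its JSON text."""
    json_size = len(json_text.encode("utf-8"))
    return 100.0 * (json_size - ubjson_size) / json_size


# Marker-only values occupy a single byte in UBJSON.
print(percent_smaller("null", 1))
print(percent_smaller("false", 1))
# A 10-digit number that fits in an int32 takes 1 marker byte + 4 data bytes.
print(percent_smaller("1234567890", 5))
```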
155 |
156 | One of the great things about the Universal Binary JSON format is that, beyond representing most of your data in a smaller footprint, it gives you two big wins:
157 |
158 |
A smaller data format means faster writes and smaller reads. It also means less data to process when parsing.
159 |
Binary format means no encoding/decoding primitive values to text and no parsing primitive values from text.
160 |
161 |
Endianness
162 |
163 |
164 |
165 | The Universal Binary JSON specification requires that all numeric values be written in Big-Endian order.
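As a small illustration of the Big-Endian requirement, the following Python sketch writes and reads an int16 value with the most significant byte first. The function names are hypothetical; the `>` prefix in the `struct` format string is what selects Big-Endian byte order.

```python
import struct


def encode_int16(value: int) -> bytes:
    # 'I' is the int16 marker; ">h" packs a signed 16-bit integer, Big-Endian.
    return b"I" + struct.pack(">h", value)


def decode_int16(buf: bytes) -> int:
    if buf[:1] != b"I" or len(buf) < 3:
        raise ValueError("not a complete int16 value")
    (value,) = struct.unpack(">h", buf[1:3])
    return value
```

For example, `encode_int16(1024)` yields the bytes `I`, `0x04`, `0x00`, whereas a Little-Endian writer would have emitted `0x00`, `0x04` for the payload.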
166 |
MIME Type
167 |
168 |
169 |
170 | The Universal Binary JSON specification is a binary format and recommends using the following mime type:
171 | [box type="info" border="full" icon="none"]application/ubjson[/box]
172 |
173 | This was added directly to the specification in hopes of avoiding similar confusion with JSON.
174 |
File Extension
175 |
176 |
177 |
178 | "ubj" is the recommended file extension when writing out files using the Universal Binary JSON format (e.g. "user.ubj").
179 |
180 | The extension stands for "Universal Binary JSON" and has no known conflicting mappings to other file formats.
181 |
Requests for Enhancement (RFE)
182 |
183 |
184 |
185 | All (proposed) changes to the specification are being tracked in GitHub.
--------------------------------------------------------------------------------
/spec12/libraries.html:
--------------------------------------------------------------------------------
1 | Below is a list of libraries, by language, that implement the Universal Binary JSON Specification.
2 |
3 |
4 |
5 |
58 |
--------------------------------------------------------------------------------
/spec12/thanks.html:
--------------------------------------------------------------------------------
1 | Universal Binary JSON was originally motivated by a desire to provide an on-disk & over-the-wire format that required no parsing or marshalling in CouchDB (inspiration). In its original draft form, UBJSON was far too simple a spec with too many holes; only over the following years, and only with the help of the following people (among many others), did the spec grow up.
2 |
3 | I want to express my personal thanks to each one of you for all the help you lent at the different stages of UBJSON's development (and continue to provide in some cases).
4 |
5 | Sincerely, Riyad Kalla
6 |
7 |
8 |
9 | Adil Baig
10 |
Adil has been very involved in the in-depth and multi-year long discussions surrounding a more optimized container specification as well as binary data support. Adil also provided a very compelling, diff-typing proposal for an optimized container format that provided a lot of good guidance around elegant alternatives to consider.
Helped catch a number of specification errors around UTF-8 encoding in the original draft of the specification that would have been confusing/nasty to release. He also provided great feedback about the size and performance metrics for the specification.
Alex is both the author of the UBJSON Python library and a valued collaborator on the Universal Binary JSON spec as it matured. Alex provided instrumental insight into the modifications made between Draft 8 and Draft 9 of the spec to help simplify the spec by removing all the duplicate (compact) type representations, simplifying the length-arguments for STRING and HUGE as well as being the one to point out that the length arguments for the ARRAY and OBJECT container types are effectively useless once the streaming-format support was added (and do not make generator code or parsing code any easier or more performant).
Bjørn has been involved in most of the binary data support discussions that have taken place since 2012. His detail-oriented contributions helped move the discussion forward.
John was the one that recommended using UTF-8 string-encoded values (or huge) for arbitrarily huge numbers after seeing my desire to avoid including any non-portable constructs into the binary format.
19 |
Given that the discussion on numeric formats had been a very active one with lots of feelings on all sides, it was a boon to have John step up with such a simple suggestion that allowed for maximum compatibility and portability. It was a win-win all the way around.
Michael is the author behind the Ubjson.NET library and contributor of the int16 and float numeric types to the specification. For numeric-heavy (e.g. scientific) data, the inclusion of the int16 and float types can lead to significant space savings when writing out values in the Universal Binary JSON format.
22 |
Michael has also gone to great lengths to make the .NET implementation of UBJSON as tight and performant as possible; collaborating on benchmark design and testing data as well as compatibility testing between implementations to ensure a great Universal Binary JSON experience for .NET developers.
23 |
In addition to development, Michael has helped contribute to the growth of the Universal Binary JSON community with articles about the specification.
While approaching the CouchDB team for feedback on the Universal Binary JSON spec, I met Paul who was willing to spend a significant amount of time reviewing the specification and recommending suggestions, changes and improvements from everything the CouchDB team has learned by dealing closely with JSON for years.
26 |
Paul pointed out the shortcomings of prefixing the length to the two container types if the specification were ever to be used easily with services or apps that stream UBJ format for huge runs of data that the server cannot load, buffer and count ahead of time before responding to the client. To support streaming more easily, unknown-length container types had to be added.
27 |
Paul also pointed out the importance of a NO_OP/SKIP/IGNORE type that can be useful during a long-lived streaming operation where the server may be waiting on something (like a DB) and you need to keep the connection alive between client/server and avoid the client timing out, but you need the client to know the data it is receiving is just meant as a "Hang on" message from the server and not actual data. This is where the NO_OP command comes in handy.
Stephan helped quite a bit with understanding the implications of a >= 64-bit numeric format and the implications of portability across a number of popular platforms.
I would like to personally thank everyone in the JSON Specification Group. The amount of feedback and help with the specification has been wonderful, constructive and creative. It also led to one of the busiest conversations in the last year!
--------------------------------------------------------------------------------
/spec12/type-reference.html:
--------------------------------------------------------------------------------
1 | The table below is a quick reference for folks working closely with the Universal Binary JSON format who want all the information at their fingertips:
2 |
135 | Below is an example of what a common JSON response would look like in UBJSON. This particular example was taken from the GitHub developer docs.
136 |
137 | JSON Response
138 |
--------------------------------------------------------------------------------
/spec12/value-types.html:
--------------------------------------------------------------------------------
1 | The Universal Binary JSON Specification defines a total of 13 value types (to JSON's 5 value types).
2 |
3 | The reason for the increased number of value types is because UBJSON defines 8 numeric value types (to JSON's 1) allowing for highly optimized storage/retrieval of numeric values depending on the necessary precision; in addition to a number of other more optimized representations of JSON values.
4 |
5 | The specifications for each of the Universal Binary JSON Specification value types are below.
6 |
7 |
63 | The no-op value in Universal Binary JSON is defined as:
64 |
65 |
66 |
67 |
| Type | Size | Marker | Length | Data Payload |
| --- | --- | --- | --- | --- |
| noop | 1-byte | N | No | No |
81 |
82 |
83 |
84 |
Usage
85 | The intended usage of the no-op value is as a valueless signal between a producer (most likely a server) and a consumer (most likely a client) to indicate activity; for example, as a keep-alive signal so a client knows a server is still working and hasn't hung or timed out.
86 |
87 | There is no equivalent to the no-op value in the original JSON specification.
88 |
89 | The no-op value is meant to be a valueless value, meaning it can be added to the elements of a container and, when parsed by the receiver, the no-op values are simply skipped and carry no meaningful value with them.
90 |
91 | For example, the two following array elements are considered equal (using JSON format for readability):
92 |
95 | There are a number of interesting advantages to having a valueless-value defined directly in the spec.
96 |
Example
97 | Consider a web service that performs an expensive operation that can take quite a while (let's say 5 minutes):
98 |
<start response>
99 | [N]
100 | <10 second delay>
101 | [N]
102 | <10 second delay>
103 | [N]
104 | <10 second delay>
105 | <...receiving data...>
106 | <10 second delay>
107 | [N]
108 | <10 second delay>
109 | [N]
110 | <...receiving remainder of data...>
111 | <end response>
112 | Most clients by default will timeout after 60 seconds and more aggressive clients will timeout even faster. To help let clients know that the server has not hung, is still alive and is still processing the request the server can reply at some determined interval (e.g. every X seconds) with the no-op value and the client can parse it, acknowledge it and reset its timeout-disconnect timer as a result.
113 |
114 | Another example of leveraging no-op in an interesting way is modeling an efficient delete operation for UBJSON on-disk when elements of a container are removed. Instead of reading the entire container, removing the elements and writing the whole thing out again, no-op bytes can simply be written over the records that were removed from the containers. When the record is parsed, it is semantically identical to a container without the values.
115 |
116 | These are just a few examples of how you can leverage the no-op value.
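Both uses described above (skipping no-op values while parsing, and deleting a record by overwriting it with no-op bytes) can be sketched in Python. This is a simplified illustration under stated assumptions: the payload is a flat run of int8 values with no container markers, and the names `read_int8_values` and `delete_in_place` are hypothetical.

```python
import struct
from typing import List

NOOP = b"N"  # no-op marker defined in this section
INT8 = b"i"  # int8 marker


def read_int8_values(payload: bytes) -> List[int]:
    """Return the int8 values in payload, skipping any no-op markers."""
    values = []
    pos = 0
    while pos < len(payload):
        marker = payload[pos:pos + 1]
        if marker == NOOP:
            pos += 1  # valueless: nothing to read and nothing to emit
            continue
        if marker != INT8 or pos + 2 > len(payload):
            raise ValueError(f"unexpected or truncated value at offset {pos}")
        values.append(struct.unpack(">b", payload[pos + 1:pos + 2])[0])
        pos += 2  # one marker byte plus one data byte
    return values


def delete_in_place(buf: bytearray, offset: int, size: int) -> None:
    """Overwrite size bytes starting at offset with no-op markers."""
    buf[offset:offset + size] = NOOP * size


record = bytearray(b"i\x01i\x02i\x03")  # the values 1, 2, 3
delete_in_place(record, 2, 2)           # blank out the second value
print(read_int8_values(bytes(record)))  # [1, 3]
```

The record keeps its original byte length after the delete, so nothing after it on disk has to move.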
117 |
181 | There are 8 numeric types in Universal Binary JSON, defined as:
182 |
183 |
184 |
185 |
| Type | Size | Marker | Length | Data Payload |
| --- | --- | --- | --- | --- |
| int8 | 2-bytes | i | No | Yes |
| uint8 | 2-bytes | U | No | Yes |
| int16 | 3-bytes | I | No | Yes |
| int32 | 5-bytes | l | No | Yes |
| int64 | 9-bytes | L | No | Yes |
| float32 | 5-bytes | d | No | Yes |
| float64 | 9-bytes | D | No | Yes |
| high-precision number | 1-byte + int num val + string byte len | H | Yes | Yes (if non-empty) |
248 |
249 |
250 |
251 | In JavaScript (and JSON) the Number type can represent any numeric value, while in most other languages multiple (discrete) numeric types exist to describe different sizes and types of numeric values; this allows the runtime to handle numeric operations more efficiently.
252 |
253 | In order for the Universal Binary JSON specification to be a performant alternative to JSON, support for these most common numeric types had to be added to allow for more efficient reading and writing of numeric values.
254 |
255 | Trying to maintain a single numeric type in UBJSON would have led to parsing complexity, requiring each language to further inspect the numeric value and marshal it down to the most appropriate internal type. Pre-defining these different numeric types directly in UBJSON allows for either a direct conversion into a native language type (e.g. Java) or a straightforward marshaling into the nearest-supported language type (e.g. Erlang).
256 |
Usage
257 | The intended usage of the different numeric types is to efficiently store numbers in a space and encoding-optimized format.
258 |
259 | [box type="info"]It is always recommended to use the smallest numeric type that fits your needs. For data with a large amount of numeric data, this can cut down the size of the payloads significantly (on average a 50% reduction in size).[/box]
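One way to follow this recommendation for integers is to try each integer type from smallest to largest and use the first one whose range contains the value. The Python sketch below does this with the markers and sizes from the numeric types table; the function name `encode_smallest_int` and the table constant are hypothetical.

```python
import struct

# (marker, struct format, minimum, maximum), ordered from smallest to largest.
_INT_TYPES = [
    (b"i", ">b", -2**7, 2**7 - 1),     # int8
    (b"U", ">B", 0, 2**8 - 1),         # uint8
    (b"I", ">h", -2**15, 2**15 - 1),   # int16
    (b"l", ">i", -2**31, 2**31 - 1),   # int32
    (b"L", ">q", -2**63, 2**63 - 1),   # int64
]


def encode_smallest_int(value: int) -> bytes:
    """Encode value using the smallest integer type whose range holds it."""
    for marker, fmt, low, high in _INT_TYPES:
        if low <= value <= high:
            return marker + struct.pack(fmt, value)
    raise OverflowError("value needs the high-precision number type")
```

With this ordering, 5 is written as an int8 (2 bytes), 200 as a uint8 (2 bytes) and 300 as an int16 (3 bytes), rather than always paying 9 bytes for an int64.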
260 |
351 | While almost all languages natively support 64-bit integers, not all do (e.g. C89 and JavaScript (yet)), and care must be taken when encoding 64-bit integer values into binary JSON and then attempting to decode them on a platform that doesn’t support them.
352 |
353 | If you are fully aware of the platforms and runtime environments your binary JSON is being used on and know they all support 64-bit integers, then you are fine.
354 |
355 | If you are trying to deserialize 64-bit integers in a client’s browser in JavaScript or another environment that does not support 64-bit integers, then you will want to take care to skip them in the input or have the client producing them encode them as double or high-precision values if that is easier to handle.
356 |
357 | Alternatively you might consider encoding your 64-bit values as doubles if you know you are going from the server to a client JavaScript environment with the binary-encoded information.
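A producer that knows its consumer is a JavaScript environment could apply the advice above roughly as follows. This sketch assumes the consumer stores numbers as IEEE 754 doubles, which represent integers exactly only up to 2^53 - 1 in magnitude; values inside that range are written as float64, and larger ones fall back to the high-precision type. The function name and the int8-only length field are assumptions made for brevity.

```python
import struct

MAX_SAFE_INTEGER = 2**53 - 1  # largest magnitude a double holds exactly


def encode_int64_for_js(value: int) -> bytes:
    """Encode a 64-bit integer so a double-only consumer can read it."""
    if -MAX_SAFE_INTEGER <= value <= MAX_SAFE_INTEGER:
        # Exactly representable as a double: use float64 ('D').
        return b"D" + struct.pack(">d", float(value))
    # Otherwise write the decimal digits as a high-precision number ('H'),
    # using an int8 length since a 64-bit integer has at most 20 characters.
    digits = str(value).encode("utf-8")
    return b"H" + b"i" + struct.pack(">b", len(digits)) + digits
```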
358 |
High-Precision Numbers (Larger than 64-bit)
359 | The high-precision number type is an ultra-portable mechanism by which arbitrarily large (or precise) numbers, greater than 64-bit in size, are encoded as a UTF-8 string and passed between systems that support them. This allows high-precision number values to degrade gracefully on systems that do not have a built-in type to support numeric values larger than 64-bit. Please refer to the Best Practices page for techniques on working around the lack of larger-than-64-bit numeric types on certain platforms if you need them.
360 |
361 |
362 | high-precision number values must be written out in accordance with the original JSON number type specification.
363 |
379 | The size of the high-precision number type "on-disk" follows the same structure and sizing of the string type (see Storage Size section).
380 |
381 | The storage size of all other numeric types is listed at the beginning of this section as well as in the Type Reference table.
382 |
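Because a high-precision number is laid out like a string (marker, length, UTF-8 text), encoding and decoding it can be sketched with Python's `decimal` module. The function names are hypothetical, and the sketch makes two simplifying assumptions: the length always fits in an int8, and the text is written in plain decimal notation, which is one valid form under the JSON number grammar.

```python
import struct
from decimal import Decimal


def encode_high_precision(number: Decimal) -> bytes:
    if not number.is_finite():
        # NaN and Infinity have no representation in the JSON number grammar.
        raise ValueError("only finite numbers can be encoded")
    text = format(number, "f").encode("utf-8")  # plain digits, no exponent
    if len(text) > 127:
        raise ValueError("this sketch only handles lengths that fit in int8")
    return b"H" + b"i" + struct.pack(">b", len(text)) + text


def decode_high_precision(buf: bytes) -> Decimal:
    if buf[:2] != b"Hi":
        raise ValueError("expected 'H' followed by an int8 length")
    (length,) = struct.unpack(">b", buf[2:3])
    return Decimal(buf[3:3 + length].decode("utf-8"))
```

A platform without an arbitrary-precision type could stop after the `decode("utf-8")` step and keep the value as a string, which is the graceful degradation described above.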
392 | The char type in Universal Binary JSON is defined as:
393 |
394 |
395 |
396 |
| Type | Size | Marker | Length | Data Payload |
| --- | --- | --- | --- | --- |
| char | 2-bytes | C | No | Yes |
410 |
411 |
412 |
413 |
Usage
414 | The char type in Universal Binary JSON is an unsigned byte meant to represent a single printable ASCII character (decimal values 0-127). Put another way, the char type represents a single-byte UTF-8 encoded character.
415 |
416 | [box type="note"]The char type is synonymous with a 1-byte, UTF-8 encoded value (decimal values 0-127). A char value must not have a decimal value larger than 127.[/box]
417 |
418 | The char type is functionally identical to the uint8 type, but semantically is meant to represent a character and not a numeric value.
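The 0-127 restriction is easy to enforce at encode time. The following Python sketch (with a hypothetical function name) writes the 'C' marker followed by the single byte and rejects anything that would need more than one UTF-8 byte.

```python
def encode_char(ch: str) -> bytes:
    """Encode a single character with a code point in the range 0-127."""
    if len(ch) != 1 or ord(ch) > 127:
        raise ValueError("char must be exactly one character with value 0-127")
    return b"C" + ch.encode("ascii")
```

A character such as "é" is rejected by this check; it needs two bytes in UTF-8 and would have to be written as a string instead.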
419 |
475 | The JSON specification does not dictate a specific required encoding; it does, however, use UTF-8 as the default encoding.
476 |
477 | The Universal Binary JSON specification dictates UTF-8 as the required string encoding (this includes the high-precision number type as it is a string-encoded value). This will allow you to easily exchange binary JSON between open systems that all support and follow this encoding requirement as well as providing a number of advantages and optimizations.
478 |
Storage Size
479 | The size of the string type varies depending on two things:
480 |
481 |
The integral numeric type used to describe the length of the string (e.g. int8, int16, int32 or int64)
482 |
The UTF-8 encoded size, in bytes, of the string.
483 |
484 | For example, English typically uses 1-byte per character, so the string “hello” has a length of 5. The same string in Russian is “привет” with a byte length of 12 and in Arabic the text becomes “مرحبا” with a byte length of 10.
485 |
486 | Here are some examples of what different string values look like to illustrate the point:
487 |
488 |
489 |
490 |
| Binary Representation | Description |
| --- | --- |
| [S][i][5][hello] | 8 bytes, string UTF-8 "hello" (English) |
| [S][i][12][привет] | 15 bytes, string UTF-8 "hello" (Russian) |
| [S][i][10][مرحبا] | 13 bytes, string UTF-8 "hello" (Arabic) |
| [S][I][1024][...1k long string...] | 1 + 3 + 1024 bytes = 1028 bytes total |
510 |
511 |
512 |
513 |
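The sizes in the examples above can be reproduced with a short calculation: one byte for the 'S' marker, the size of the length value, and the UTF-8 byte count of the text. The Python sketch below assumes the writer picks the smallest of the signed length types named in this section (int8, int16, int32, int64); the function name is hypothetical.

```python
def string_storage_size(text: str) -> int:
    """Total bytes needed to store text as a UBJSON string value."""
    payload = len(text.encode("utf-8"))
    if payload <= 2**7 - 1:
        length_field = 2   # 'i' marker + 1 byte
    elif payload <= 2**15 - 1:
        length_field = 3   # 'I' marker + 2 bytes
    elif payload <= 2**31 - 1:
        length_field = 5   # 'l' marker + 4 bytes
    else:
        length_field = 9   # 'L' marker + 8 bytes
    return 1 + length_field + payload  # 'S' marker + length + data


for sample in ["hello", "привет", "مرحبا", "x" * 1024]:
    print(string_storage_size(sample))
```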
Binary Data
514 |
515 |
516 |
517 | Please see the Binary Data page...
--------------------------------------------------------------------------------
/spec8/Makefile:
--------------------------------------------------------------------------------
1 | # Makefile for Sphinx documentation
2 | #
3 |
4 | # You can set these variables from the command line.
5 | SPHINXOPTS =
6 | SPHINXBUILD = sphinx-build
7 | PAPER =
8 | BUILDDIR = _build
9 |
10 | # Internal variables.
11 | PAPEROPT_a4 = -D latex_paper_size=a4
12 | PAPEROPT_letter = -D latex_paper_size=letter
13 | ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
14 | # the i18n builder cannot share the environment and doctrees with the others
15 | I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
16 |
17 | .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext
18 |
19 | help:
20 | @echo "Please use \`make <target>' where <target> is one of"
21 | @echo " html to make standalone HTML files"
22 | @echo " dirhtml to make HTML files named index.html in directories"
23 | @echo " singlehtml to make a single large HTML file"
24 | @echo " pickle to make pickle files"
25 | @echo " json to make JSON files"
26 | @echo " htmlhelp to make HTML files and a HTML help project"
27 | @echo " qthelp to make HTML files and a qthelp project"
28 | @echo " devhelp to make HTML files and a Devhelp project"
29 | @echo " epub to make an epub"
30 | @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
31 | @echo " latexpdf to make LaTeX files and run them through pdflatex"
32 | @echo " text to make text files"
33 | @echo " man to make manual pages"
34 | @echo " texinfo to make Texinfo files"
35 | @echo " info to make Texinfo files and run them through makeinfo"
36 | @echo " gettext to make PO message catalogs"
37 | @echo " changes to make an overview of all changed/added/deprecated items"
38 | @echo " linkcheck to check all external links for integrity"
39 | @echo " doctest to run all doctests embedded in the documentation (if enabled)"
40 |
41 | clean:
42 | -rm -rf $(BUILDDIR)/*
43 |
44 | html:
45 | $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
46 | @echo
47 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
48 |
49 | dirhtml:
50 | $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
51 | @echo
52 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
53 |
54 | singlehtml:
55 | $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
56 | @echo
57 | @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
58 |
59 | pickle:
60 | $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
61 | @echo
62 | @echo "Build finished; now you can process the pickle files."
63 |
64 | json:
65 | $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
66 | @echo
67 | @echo "Build finished; now you can process the JSON files."
68 |
69 | htmlhelp:
70 | $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
71 | @echo
72 | @echo "Build finished; now you can run HTML Help Workshop with the" \
73 | ".hhp project file in $(BUILDDIR)/htmlhelp."
74 |
75 | qthelp:
76 | $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
77 | @echo
78 | @echo "Build finished; now you can run "qcollectiongenerator" with the" \
79 | ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
80 | @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/UniversalBinaryJSON.qhcp"
81 | @echo "To view the help file:"
82 | @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/UniversalBinaryJSON.qhc"
83 |
84 | devhelp:
85 | $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
86 | @echo
87 | @echo "Build finished."
88 | @echo "To view the help file:"
89 | @echo "# mkdir -p $$HOME/.local/share/devhelp/UniversalBinaryJSON"
90 | @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/UniversalBinaryJSON"
91 | @echo "# devhelp"
92 |
93 | epub:
94 | $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
95 | @echo
96 | @echo "Build finished. The epub file is in $(BUILDDIR)/epub."
97 |
98 | latex:
99 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
100 | @echo
101 | @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
102 | @echo "Run \`make' in that directory to run these through (pdf)latex" \
103 | "(use \`make latexpdf' here to do that automatically)."
104 |
105 | latexpdf:
106 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
107 | @echo "Running LaTeX files through pdflatex..."
108 | $(MAKE) -C $(BUILDDIR)/latex all-pdf
109 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
110 |
111 | text:
112 | $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
113 | @echo
114 | @echo "Build finished. The text files are in $(BUILDDIR)/text."
115 |
116 | man:
117 | $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
118 | @echo
119 | @echo "Build finished. The manual pages are in $(BUILDDIR)/man."
120 |
121 | texinfo:
122 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
123 | @echo
124 | @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
125 | @echo "Run \`make' in that directory to run these through makeinfo" \
126 | "(use \`make info' here to do that automatically)."
127 |
128 | info:
129 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
130 | @echo "Running Texinfo files through makeinfo..."
131 | make -C $(BUILDDIR)/texinfo info
132 | @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
133 |
134 | gettext:
135 | $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
136 | @echo
137 | @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
138 |
139 | changes:
140 | $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
141 | @echo
142 | @echo "The overview file is in $(BUILDDIR)/changes."
143 |
144 | linkcheck:
145 | $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
146 | @echo
147 | @echo "Link check complete; look for any errors in the above output " \
148 | "or in $(BUILDDIR)/linkcheck/output.txt."
149 |
150 | doctest:
151 | $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
152 | @echo "Testing of doctests in the sources finished, look at the " \
153 | "results in $(BUILDDIR)/doctest/output.txt."
154 |
--------------------------------------------------------------------------------
/spec8/_static/.keep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ubjson/universal-binary-json/b3037c84600d6d34f505f6175716f10f5274538e/spec8/_static/.keep
--------------------------------------------------------------------------------
/spec8/conf.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | #
3 | # Universal Binary JSON documentation build configuration file, created by
4 | # sphinx-quickstart on Sat Aug 4 16:26:00 2012.
5 | #
6 | # This file is execfile()d with the current directory set to its containing dir.
7 | #
8 | # Note that not all possible configuration values are present in this
9 | # autogenerated file.
10 | #
11 | # All configuration values have a default; values that are commented out
12 | # serve to show the default.
13 |
14 | import sys, os
15 |
16 | # If extensions (or modules to document with autodoc) are in another directory,
17 | # add these directories to sys.path here. If the directory is relative to the
18 | # documentation root, use os.path.abspath to make it absolute, like shown here.
19 | #sys.path.insert(0, os.path.abspath('.'))
20 |
21 | # -- General configuration -----------------------------------------------------
22 |
23 | # If your documentation needs a minimal Sphinx version, state it here.
24 | #needs_sphinx = '1.0'
25 |
26 | # Add any Sphinx extension module names here, as strings. They can be extensions
27 | # coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
28 | extensions = []
29 |
30 | # Add any paths that contain templates here, relative to this directory.
31 | templates_path = ['_templates']
32 |
33 | # The suffix of source filenames.
34 | source_suffix = '.rst'
35 |
36 | # The encoding of source files.
37 | #source_encoding = 'utf-8-sig'
38 |
39 | # The master toctree document.
40 | master_doc = 'index'
41 |
42 | # General information about the project.
43 | project = u'Universal Binary JSON'
44 | copyright = u'2012, UBJSON Community'
45 |
46 | # The version info for the project you're documenting, acts as replacement for
47 | # |version| and |release|, also used in various other places throughout the
48 | # built documents.
49 | #
50 | # The short X.Y version.
51 | version = '0.9'
52 | # The full version, including alpha/beta/rc tags.
53 | release = '0.9-dev'
54 |
55 | # The language for content autogenerated by Sphinx. Refer to documentation
56 | # for a list of supported languages.
57 | #language = None
58 |
59 | # There are two options for replacing |today|: either, you set today to some
60 | # non-false value, then it is used:
61 | #today = ''
62 | # Else, today_fmt is used as the format for a strftime call.
63 | #today_fmt = '%B %d, %Y'
64 |
65 | # List of patterns, relative to source directory, that match files and
66 | # directories to ignore when looking for source files.
67 | exclude_patterns = ['_build']
68 |
69 | # The reST default role (used for this markup: `text`) to use for all documents.
70 | #default_role = None
71 |
72 | # If true, '()' will be appended to :func: etc. cross-reference text.
73 | #add_function_parentheses = True
74 |
75 | # If true, the current module name will be prepended to all description
76 | # unit titles (such as .. function::).
77 | #add_module_names = True
78 |
79 | # If true, sectionauthor and moduleauthor directives will be shown in the
80 | # output. They are ignored by default.
81 | #show_authors = False
82 |
83 | # The name of the Pygments (syntax highlighting) style to use.
84 | pygments_style = 'sphinx'
85 |
86 | # A list of ignored prefixes for module index sorting.
87 | #modindex_common_prefix = []
88 |
89 |
90 | # -- Options for HTML output ---------------------------------------------------
91 |
92 | # The theme to use for HTML and HTML Help pages. See the documentation for
93 | # a list of builtin themes.
94 | html_theme = 'haiku'
95 |
96 | # Theme options are theme-specific and customize the look and feel of a theme
97 | # further. For a list of options available for each theme, see the
98 | # documentation.
99 | #html_theme_options = {}
100 |
101 | # Add any paths that contain custom themes here, relative to this directory.
102 | #html_theme_path = []
103 |
104 | # The name for this set of Sphinx documents. If None, it defaults to
105 | # "<project> v<release> documentation".
106 | #html_title = None
107 |
108 | # A shorter title for the navigation bar. Default is the same as html_title.
109 | #html_short_title = None
110 |
111 | # The name of an image file (relative to this directory) to place at the top
112 | # of the sidebar.
113 | #html_logo = None
114 |
115 | # The name of an image file (within the static path) to use as favicon of the
116 | # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
117 | # pixels large.
118 | #html_favicon = None
119 |
120 | # Add any paths that contain custom static files (such as style sheets) here,
121 | # relative to this directory. They are copied after the builtin static files,
122 | # so a file named "default.css" will overwrite the builtin "default.css".
123 | html_static_path = ['_static']
124 |
125 | # If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
126 | # using the given strftime format.
127 | #html_last_updated_fmt = '%b %d, %Y'
128 |
129 | # If true, SmartyPants will be used to convert quotes and dashes to
130 | # typographically correct entities.
131 | #html_use_smartypants = True
132 |
133 | # Custom sidebar templates, maps document names to template names.
134 | #html_sidebars = {}
135 |
136 | # Additional templates that should be rendered to pages, maps page names to
137 | # template names.
138 | #html_additional_pages = {}
139 |
140 | # If false, no module index is generated.
141 | #html_domain_indices = True
142 |
143 | # If false, no index is generated.
144 | #html_use_index = True
145 |
146 | # If true, the index is split into individual pages for each letter.
147 | #html_split_index = False
148 |
149 | # If true, links to the reST sources are added to the pages.
150 | #html_show_sourcelink = True
151 |
152 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
153 | #html_show_sphinx = True
154 |
155 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
156 | #html_show_copyright = True
157 |
158 | # If true, an OpenSearch description file will be output, and all pages will
159 | # contain a <link> tag referring to it. The value of this option must be the
160 | # base URL from which the finished HTML is served.
161 | #html_use_opensearch = ''
162 |
163 | # This is the file name suffix for HTML files (e.g. ".xhtml").
164 | #html_file_suffix = None
165 |
166 | # Output file base name for HTML help builder.
167 | htmlhelp_basename = 'UniversalBinaryJSONdoc'
168 |
169 |
170 | # -- Options for LaTeX output --------------------------------------------------
171 |
172 | latex_elements = {
173 | # The paper size ('letterpaper' or 'a4paper').
174 | #'papersize': 'letterpaper',
175 |
176 | # The font size ('10pt', '11pt' or '12pt').
177 | #'pointsize': '10pt',
178 |
179 | # Additional stuff for the LaTeX preamble.
180 | #'preamble': '',
181 | }
182 |
183 | # Grouping the document tree into LaTeX files. List of tuples
184 | # (source start file, target name, title, author, documentclass [howto/manual]).
185 | latex_documents = [
186 | ('index', 'UniversalBinaryJSON.tex', u'Universal Binary JSON Documentation',
187 | u'UBJSON Community', 'manual'),
188 | ]
189 |
190 | # The name of an image file (relative to this directory) to place at the top of
191 | # the title page.
192 | #latex_logo = None
193 |
194 | # For "manual" documents, if this is true, then toplevel headings are parts,
195 | # not chapters.
196 | #latex_use_parts = False
197 |
198 | # If true, show page references after internal links.
199 | #latex_show_pagerefs = False
200 |
201 | # If true, show URL addresses after external links.
202 | #latex_show_urls = False
203 |
204 | # Documents to append as an appendix to all manuals.
205 | #latex_appendices = []
206 |
207 | # If false, no module index is generated.
208 | #latex_domain_indices = True
209 |
210 |
211 | # -- Options for manual page output --------------------------------------------
212 |
213 | # One entry per manual page. List of tuples
214 | # (source start file, name, description, authors, manual section).
215 | man_pages = [
216 | ('index', 'universalbinaryjson', u'Universal Binary JSON Documentation',
217 | [u'UBJSON Community'], 1)
218 | ]
219 |
220 | # If true, show URL addresses after external links.
221 | #man_show_urls = False
222 |
223 |
224 | # -- Options for Texinfo output ------------------------------------------------
225 |
226 | # Grouping the document tree into Texinfo files. List of tuples
227 | # (source start file, target name, title, author,
228 | # dir menu entry, description, category)
229 | texinfo_documents = [
230 | ('index', 'UniversalBinaryJSON', u'Universal Binary JSON Documentation',
231 | u'UBJSON Community', 'UniversalBinaryJSON', 'One line description of project.',
232 | 'Miscellaneous'),
233 | ]
234 |
235 | # Documents to append as an appendix to all manuals.
236 | #texinfo_appendices = []
237 |
238 | # If false, no module index is generated.
239 | #texinfo_domain_indices = True
240 |
241 | # How to display URL addresses: 'footnote', 'no', or 'inline'.
242 | #texinfo_show_urls = 'footnote'
243 |
--------------------------------------------------------------------------------
/spec8/index.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
315 |
316 |
317 |
318 |
319 |
320 |
324 |
325 |
Universal Binary JSON
326 |
JSON has become a ubiquitous text-based file format for
327 | data interchange. Its simplicity, ease of processing and (relatively) rich data
328 | typing made it a natural choice for many developers needing to store or shuffle
329 | data between systems quickly and easily.
330 |
Unfortunately, marshaling native programming language constructs in and out of
331 | a text-based representation does have a measurable processing cost associated
332 | with it.
333 |
In high-performance applications, avoiding the text-processing step of JSON can
334 | net big wins in both processing time and size reduction of stored information,
335 | which is where a binary JSON format becomes helpful.
336 |
337 |
Why
338 |
Attempts to make using JSON faster through binary specifications like
339 | BSON, BJSON or Smile exist, but have been rejected
340 | from mass-adoption for two reasons:
341 |
342 |
Custom (Binary-Only) Data Types:
343 | Inclusion of custom data types that have no equivalent in the original JSON
344 | spec, leaving room for incompatibilities to exist as different implementations
345 | of the spec handle the binary-only data types differently.
346 |
Complexity: Some specifications provide higher performance or smaller
347 | representations at the cost of a much more complex specification, making
348 | implementations more difficult which can slow or block adoption. One of the key
349 | reasons JSON became as popular as it did was because of its ease of use.
350 |
351 |
352 |
353 |
Goals
354 |
The Universal Binary JSON specification has 3 goals:
355 |
356 |
Universal Compatibility
357 |
358 |
359 |
Meaning absolute compatibility with the JSON spec itself as well as only
360 | utilizing data types that are natively supported in all popular programming
361 | languages.
362 |
This allows 1:1 transforms between standard JSON and Universal Binary JSON as
363 | well as efficient representation in all popular programming languages without
364 | requiring parser developers to account for strange data types that their
365 | language may not support.
366 |
367 |
368 |
Ease of Use
369 |
370 |
371 |
The Universal Binary JSON specification is intentionally defined using a
372 | single core data structure to build up the entire specification.
373 |
This accomplishes two things: it allows the spec to be understood quickly and
374 | allows developers to write trivially simple code to take advantage of it or
375 | interchange data with another system utilizing it.
376 |
377 |
378 |
Speed / Efficiency
379 |
380 |
381 | Typically the motivation for using a binary specification over a text-based
382 | one is speed and/or efficiency, so strict attention was paid to selecting data
383 | constructs and representations that are (roughly) 30% smaller than their
384 | compacted JSON counterparts and optimized for fast parsing.
421 |
422 |
423 |
424 |
425 |
426 |
427 |
428 |
--------------------------------------------------------------------------------
/spec8/index.rst:
--------------------------------------------------------------------------------
1 | .. Universal Binary JSON documentation master file, created by
2 | sphinx-quickstart on Sat Aug 4 16:26:00 2012.
3 | You can adapt this file completely to your liking, but it should at least
4 | contain the root `toctree` directive.
5 |
6 | Universal Binary JSON
7 | =====================
8 |
9 | `JSON`_ has become a ubiquitous text-based file format for data interchange.
10 | Its simplicity, ease of processing and (relatively) rich data typing made it a
11 | natural choice for many developers needing to store or shuffle data between
12 | systems quickly and easily.
13 |
14 | Unfortunately, marshaling native programming language constructs in and out of
15 | a text-based representation does have a measurable processing cost associated
16 | with it.
17 |
18 | In high-performance applications, avoiding the text-processing step of JSON can
19 | net big wins in both processing time and size reduction of stored information,
20 | which is where a binary JSON format becomes helpful.
21 |
22 | .. toctree::
23 | :maxdepth: 3
24 |
25 | spec.rst
26 | type_reference.rst
27 | libraries.rst
28 | thanks.rst
29 |
30 | Why UBJSON?
31 | -----------
32 |
33 | Attempts to make using JSON faster through binary specifications like
34 | `BSON`_, `BJSON`_ or `Smile`_ exist, but have been `rejected`_
35 | from `mass-adoption`_ for two reasons:
36 |
37 | * Custom (Binary-Only) Data Types:
38 | Inclusion of custom data types that have no equivalent in the original JSON
39 | spec, leaving room for incompatibilities to exist as different implementations
40 | of the spec handle the binary-only data types differently.
41 |
42 | * Complexity: Some specifications provide higher performance or smaller
43 | representations at the cost of a `much more complex specification`_,
44 | making implementations more difficult which can slow or block adoption. One of
45 | the key reasons JSON became as popular as it did was because of its ease of
46 | use.
47 |
48 | BSON, for example, defines types for binary data, regular expressions,
49 | JavaScript code blocks and other constructs that have no equivalent data type in
50 | JSON. BJSON defines a binary data type as well, again leaving the door wide open
51 | to interpretation that can potentially lead to incompatibilities between two
52 | implementations of the spec. Smile, while the closest, defines more complex
53 | data constructs and generation/parsing rules in the name of absolute space
54 | efficiency.
55 |
56 | The existing binary JSON specifications all define incompatibilities or
57 | complexities that undo the singular tenet that made JSON so successful:
58 | **simplicity**.
59 |
60 | JSON’s simplicity made it accessible to anyone, made implementations in every
61 | language available and made explaining it to anyone consuming your data
62 | immediate.
63 |
64 | Any successful binary JSON specification must carry these properties forward for
65 | it to be genuinely helpful to the community at large.
66 |
67 | This specification is defined around a singular construct used to build up and
68 | represent JSON values and objects. Reading and writing the format is trivial,
69 | designed with the goal of being understood in under 10 minutes (likely less if
70 | you are very comfortable with JSON already).
71 |
72 | Fortunately, while the Universal Binary JSON specification carries these
73 | tenets of simplicity forward, it is also able to take advantage of optimized
74 | binary data structures that are (on average) 30% smaller than compacted JSON and
75 | specified for ultimate read performance; bringing **simplicity**, **size** and
76 | **performance** all together into a single specification that is 100% compatible
77 | with JSON.
78 |
79 | Why not JSON+gzip?
80 | ------------------
81 |
82 | On the surface simply gzipping your compacted JSON may seem like a valid (and
83 | smaller) alternative to using the Universal Binary JSON specification, but there
84 | are two significant costs associated with this approach that you should be aware
85 | of:
86 |
87 | #. At least a `50% performance overhead`_ for processing the data.
88 | #. Lack of data clarity and inability to inspect it directly.
89 |
90 | While gzipping your JSON will give you great compression, about 75% on average,
91 | the overhead required to read/write the data becomes significantly higher.
92 | Additionally, because the binary data is now in a compressed format you can no
93 | longer open it directly in an editor and scan the human-readable portions of it
94 | easily; which can be important during debugging, testing or data verification
95 | and recovery.
96 |
97 | Utilizing the Universal Binary JSON format will typically provide a
98 | 30% reduction in size and store your data in a read-optimized format offering
99 | you much higher performance than even compacted JSON. If you had a usage
100 | scenario where your data is put into long-term cold storage and pulled out in
101 | large chunks for processing, you might even consider gzipping your
102 | Universal Binary JSON files, storing those, and when they are pulled out and
103 | unzipped, you can then process them with all the speed advantages of UBJ.
104 |
105 | As always, deciding which approach is right for your project depends heavily on
106 | what you need.
107 |
108 | Goals
109 | -----
110 |
111 | The `Universal Binary JSON`_ specification has 3 goals:
112 |
113 | #. **Universal Compatibility**
114 |
115 | Meaning absolute compatibility with the JSON spec itself as well as only
116 | utilizing data types that are natively supported in all popular programming
117 | languages.
118 |
119 | This allows 1:1 transforms between standard JSON and Universal Binary JSON as
120 | well as efficient representation in all popular programming languages without
121 | requiring parser developers to account for strange data types that their
122 | language may not support.
123 |
124 | #. **Ease of Use**
125 |
126 | The Universal Binary JSON specification is intentionally defined using a
127 | single core data structure to build up the entire specification.
128 |
129 | This accomplishes two things: it allows the spec to be understood quickly and
130 | allows developers to write trivially simple code to take advantage of it or
131 | interchange data with another system utilizing it.
132 |
133 | #. **Speed / Efficiency**
134 |
135 | Typically the motivation for using a binary specification over a text-based
136 | one is speed and/or efficiency, so strict attention was paid to selecting
137 | data constructs and representations that are (roughly) 30% smaller than their
138 | compacted JSON counterparts and optimized for fast parsing.
139 |
140 | Indices and tables
141 | ==================
142 |
143 | * :ref:`genindex`
144 | * :ref:`search`
145 |
146 | .. _JSON: http://json.org
147 | .. _UBJSON: http://ubjson.org
148 | .. _Universal Binary JSON: http://ubjson.org
149 | .. _BSON: http://bsonspec.org
150 | .. _BJSON: http://bjson.org
151 | .. _Smile: http://wiki.fasterxml.com/SmileFormat
152 | .. _rejected: https://issues.apache.org/jira/browse/COUCHDB-702
153 | .. _mass-adoption: http://bsonspec.org/#/implementation
154 | .. _much more complex specification: http://wiki.fasterxml.com/SmileFormatSpec
155 | .. _50% performance overhead: http://www.cowtowncoder.com/blog/archives/2009/05/entry_263.html
156 |
--------------------------------------------------------------------------------
/spec8/libraries.rst:
--------------------------------------------------------------------------------
1 |
2 | Libraries
3 | =========
4 |
5 | Below is a list of libraries, by language, that implement the Universal Binary
6 | JSON Specification.
7 |
8 | D
9 | ----
10 |
11 | * `UBJSON for D `_
12 |
13 | Java
14 | ----
15 |
16 | * `Universal Binary JSON Java Library `_
17 |
18 | .NET
19 | ----
20 |
21 | * `Ubjson.NET `_
22 |
23 | Node.js
24 | -------
25 |
26 | * `node-ubjson `_
27 |
28 | Python
29 | ------
30 |
31 | * `simpleubjson `_
32 |
33 |
--------------------------------------------------------------------------------
/spec8/make.bat:
--------------------------------------------------------------------------------
1 | @ECHO OFF
2 |
3 | REM Command file for Sphinx documentation
4 |
5 | if "%SPHINXBUILD%" == "" (
6 | set SPHINXBUILD=sphinx-build
7 | )
8 | set BUILDDIR=_build
9 | set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% .
10 | set I18NSPHINXOPTS=%SPHINXOPTS% .
11 | if NOT "%PAPER%" == "" (
12 | set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%
13 | set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS%
14 | )
15 |
16 | if "%1" == "" goto help
17 |
18 | if "%1" == "help" (
19 | :help
20 | echo.Please use `make ^<target^>` where ^<target^> is one of
21 | echo. html to make standalone HTML files
22 | echo. dirhtml to make HTML files named index.html in directories
23 | echo. singlehtml to make a single large HTML file
24 | echo. pickle to make pickle files
25 | echo. json to make JSON files
26 | echo. htmlhelp to make HTML files and a HTML help project
27 | echo. qthelp to make HTML files and a qthelp project
28 | echo. devhelp to make HTML files and a Devhelp project
29 | echo. epub to make an epub
30 | echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter
31 | echo. text to make text files
32 | echo. man to make manual pages
33 | echo. texinfo to make Texinfo files
34 | echo. gettext to make PO message catalogs
35 | echo. changes to make an overview over all changed/added/deprecated items
36 | echo. linkcheck to check all external links for integrity
37 | echo. doctest to run all doctests embedded in the documentation if enabled
38 | goto end
39 | )
40 |
41 | if "%1" == "clean" (
42 | for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i
43 | del /q /s %BUILDDIR%\*
44 | goto end
45 | )
46 |
47 | if "%1" == "html" (
48 | %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html
49 | if errorlevel 1 exit /b 1
50 | echo.
51 | echo.Build finished. The HTML pages are in %BUILDDIR%/html.
52 | goto end
53 | )
54 |
55 | if "%1" == "dirhtml" (
56 | %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml
57 | if errorlevel 1 exit /b 1
58 | echo.
59 | echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.
60 | goto end
61 | )
62 |
63 | if "%1" == "singlehtml" (
64 | %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml
65 | if errorlevel 1 exit /b 1
66 | echo.
67 | echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml.
68 | goto end
69 | )
70 |
71 | if "%1" == "pickle" (
72 | %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle
73 | if errorlevel 1 exit /b 1
74 | echo.
75 | echo.Build finished; now you can process the pickle files.
76 | goto end
77 | )
78 |
79 | if "%1" == "json" (
80 | %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json
81 | if errorlevel 1 exit /b 1
82 | echo.
83 | echo.Build finished; now you can process the JSON files.
84 | goto end
85 | )
86 |
87 | if "%1" == "htmlhelp" (
88 | %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp
89 | if errorlevel 1 exit /b 1
90 | echo.
91 | echo.Build finished; now you can run HTML Help Workshop with the ^
92 | .hhp project file in %BUILDDIR%/htmlhelp.
93 | goto end
94 | )
95 |
96 | if "%1" == "qthelp" (
97 | %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp
98 | if errorlevel 1 exit /b 1
99 | echo.
100 | echo.Build finished; now you can run "qcollectiongenerator" with the ^
101 | .qhcp project file in %BUILDDIR%/qthelp, like this:
102 | echo.^> qcollectiongenerator %BUILDDIR%\qthelp\UniversalBinaryJSON.qhcp
103 | echo.To view the help file:
104 | echo.^> assistant -collectionFile %BUILDDIR%\qthelp\UniversalBinaryJSON.qhc
105 | goto end
106 | )
107 |
108 | if "%1" == "devhelp" (
109 | %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp
110 | if errorlevel 1 exit /b 1
111 | echo.
112 | echo.Build finished.
113 | goto end
114 | )
115 |
116 | if "%1" == "epub" (
117 | %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub
118 | if errorlevel 1 exit /b 1
119 | echo.
120 | echo.Build finished. The epub file is in %BUILDDIR%/epub.
121 | goto end
122 | )
123 |
124 | if "%1" == "latex" (
125 | %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
126 | if errorlevel 1 exit /b 1
127 | echo.
128 | echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.
129 | goto end
130 | )
131 |
132 | if "%1" == "text" (
133 | %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text
134 | if errorlevel 1 exit /b 1
135 | echo.
136 | echo.Build finished. The text files are in %BUILDDIR%/text.
137 | goto end
138 | )
139 |
140 | if "%1" == "man" (
141 | %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man
142 | if errorlevel 1 exit /b 1
143 | echo.
144 | echo.Build finished. The manual pages are in %BUILDDIR%/man.
145 | goto end
146 | )
147 |
148 | if "%1" == "texinfo" (
149 | %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo
150 | if errorlevel 1 exit /b 1
151 | echo.
152 | echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo.
153 | goto end
154 | )
155 |
156 | if "%1" == "gettext" (
157 | %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale
158 | if errorlevel 1 exit /b 1
159 | echo.
160 | echo.Build finished. The message catalogs are in %BUILDDIR%/locale.
161 | goto end
162 | )
163 |
164 | if "%1" == "changes" (
165 | %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes
166 | if errorlevel 1 exit /b 1
167 | echo.
168 | echo.The overview file is in %BUILDDIR%/changes.
169 | goto end
170 | )
171 |
172 | if "%1" == "linkcheck" (
173 | %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck
174 | if errorlevel 1 exit /b 1
175 | echo.
176 | echo.Link check complete; look for any errors in the above output ^
177 | or in %BUILDDIR%/linkcheck/output.txt.
178 | goto end
179 | )
180 |
181 | if "%1" == "doctest" (
182 | %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest
183 | if errorlevel 1 exit /b 1
184 | echo.
185 | echo.Testing of doctests in the sources finished, look at the ^
186 | results in %BUILDDIR%/doctest/output.txt.
187 | goto end
188 | )
189 |
190 | :end
191 |
--------------------------------------------------------------------------------
/spec8/spec.rst:
--------------------------------------------------------------------------------
1 |
2 | Specification
3 | +++++++++++++
4 |
5 | Data Format
6 | ===========
7 |
8 | The Universal Binary JSON specification utilizes a single binary tuple to
9 | represent all JSON data types (both value and container types)::
10 |
11 | <type, 1-byte char>[<length, 1 or 4-byte integer>][<data>]
12 |
13 | Each element in the tuple is defined as:
14 |
15 | * **type**
16 |
17 | * A 1-byte ASCII char used to indicate the type of the data following it.
18 | * A single ASCII char was chosen to make manually walking and debugging
19 | data stored in the Universal Binary JSON format as easy as possible
20 | (e.g. making the data relatively readable in a hex editor).
21 | * **length** (OPTIONAL)
22 | 1-byte or 4-byte length value based on the type specified. This allows
23 | for more aggressive compression and space-optimization when dealing with
24 | a lot of small values.
25 |
26 | * 1-byte: An unsigned byte value (``0`` to ``254``) used to indicate the
27 | length of the data payload following it. Useful for small items.
28 | * 4-byte: An unsigned integer value (``0`` to ``2,147,483,647``) used to
29 | indicate the length of the data payload following it. Useful for larger
30 | items.
31 | * **data** (OPTIONAL)
32 | A run of bytes representing the actual binary data for this type of value.
33 |
34 | In the name of efficiency, the length and data fields are optional depending on
35 | the type of value being encoded. Some values are simple enough that just writing
36 | the 1-byte ASCII marker into the stream is enough to represent the value
37 | (e.g. `null`) while others have a type that is specific enough that no length is
38 | needed as the length is implied by the type (e.g. `int32`).
39 |
40 | The specifics of each data type will be spelled out down below for more clarity.
41 |
42 | The basic organization provided by this tuple (`type-length-data`) allows each
43 | JSON construct to be represented in a binary format that is simple to read and
44 | write without the need for complex/custom encodings, ``null``-terminating bytes
45 | that have to be scanned for, or references that have to be resolved.
46 |
47 | .. _value_types:
48 |
49 | Value Types
50 | ===========
51 |
52 | This section describes the mapping of the 5 discrete value types from the
53 | JSON specification into the Universal Binary JSON format.
54 |
55 | JSON
56 | ----
57 |
58 | The JSON specification defines 7 value types:
59 |
60 | * string
61 | * number
62 | * object (container)
63 | * array (container)
64 | * true
65 | * false
66 | * null
67 |
68 | Of those 7 values, 2 of them are types describing containers that hold the 5
69 | basic values. We have a separate section below for looking at the 2 container
70 | types specifically, so for the time being let’s only consider the following 5
71 | discrete value types:
72 |
73 | * string
74 | * number
75 | * true
76 | * false
77 | * null
78 |
79 | Most of these types have a ``1:1`` mapping to a primitive type in most popular
80 | programming languages (Java, C, Python, PHP, Erlang, etc.) except for `number`.
81 | This makes defining the types for the other 4 easy, but let’s take a closer look at
82 | how we might deconstruct `number` into its core representations.
83 |
84 | Number Type
85 | ^^^^^^^^^^^
86 |
87 | In JavaScript, the `Number`_ type can represent any numeric value, whereas many
88 | other languages define numbers using 3-6 discrete numeric types depending on the
89 | type and length of the value being stored. This allows the runtime to handle
90 | numeric operations more efficiently.
91 |
92 | In order for the Universal Binary JSON specification to be a performant
93 | alternative to JSON, support for these most common numeric types had to be added
94 | to allow for more efficient reading and writing of numeric values.
95 |
96 | `number` is deconstructed in the Universal Binary JSON specification and defined
97 | by the following **signed** numeric types:
98 |
99 | * byte (8-bits, 1-byte)
100 | * int16 (16-bits, 2-bytes)
101 | * int32 (32-bits, 4-bytes)
102 | * int64 (64-bits, 8-bytes)
103 | * float (32-bits, 4-bytes)
104 | * double (64-bits, 8-bytes)
105 | * huge (arbitrarily long, UTF-8 string-encoded numeric value)
106 |
107 | Trying to maintain a single `number` type represented in binary form would have
108 | led to parsing complexity and slow-downs as the processing language would have
109 | to further inspect the value and map it to the optimal type.
110 | By pre-defining these different numeric types directly in binary, in most
111 | languages numbers can stay in their optimal form on disk and be deserialized
112 | back into their native representation with very little overhead.
113 |
114 | When working on a platform like JavaScript that has a singular type for numbers,
115 | all of these data types (with the exception of `huge`, and of `int64` values
116 | beyond ±2^53, which a 64-bit float cannot hold exactly) can simply be mapped
117 | back to the `number` type with ease and no loss of precision.
117 |
118 | When converting these formats back to JSON, all of the numeric types can simply
119 | be rendered as the singular number type defined by the JSON spec without issue;
120 | there is total compatibility!
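
As an illustration of how a writer might pick among the signed integer types
listed above, here is a minimal sketch. The helper name
``smallest_integer_type`` is hypothetical and not part of the specification;
the ranges are the usual two's-complement ranges for each bit width, and
anything wider than 64 bits falls back to `huge`.

```python
# Hypothetical helper, not part of the specification: choose the narrowest
# signed integer type from the list above that holds a value without loss.
INTEGER_TYPES = [
    ("byte", 8),
    ("int16", 16),
    ("int32", 32),
    ("int64", 64),
]


def smallest_integer_type(value: int) -> str:
    """Return the name of the narrowest signed type that fits `value`."""
    for name, bits in INTEGER_TYPES:
        low = -(1 << (bits - 1))        # e.g. -128 for 8 bits
        high = (1 << (bits - 1)) - 1    # e.g. 127 for 8 bits
        if low <= value <= high:
            return name
    # Wider than 64 bits: only the string-encoded `huge` type can hold it.
    return "huge"
```

A writer could call this once per integer and then emit the matching type, so
small values do not pay for 8 bytes of storage.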
121 |
122 | Value Type Summary
123 | ^^^^^^^^^^^^^^^^^^
124 |
125 | Now that we have clearly defined all of our (signed) numeric types and mapped
126 | the 4 remaining simple types to Universal Binary JSON, we have our final list of
127 | discrete value types:
128 |
129 | * null
130 | * false
131 | * true
132 | * byte
133 | * int16
134 | * int32
135 | * int64
136 | * float
137 | * double
138 | * huge
139 | * string
140 |
141 | Now that we have defined all the types we need, let’s see how these are actually
142 | represented in binary in the next section.
143 |
144 | Universal Binary JSON
145 | ---------------------
146 |
147 | The Universal Binary JSON specification defines a total of 13 discrete value
148 | types (the 11 from the last section, with `huge` and `string` each available in
149 | two length forms), all delimited in the binary file by a specific 1-byte ASCII
150 | character (optionally) followed by a length and (optionally) a data payload.
151 |
152 | Some of the values (`null`, `true` and `false`) are specific enough that
153 | the single 1-byte ASCII character is enough to represent the value in the format
154 | and they will have no `length` or `data` section.
155 |
156 | All of the numeric values (except `huge`) automatically imply a length by
157 | virtue of the type of number they are. For example, a 4-byte `int32` always
158 | has a length of 4-bytes; an 8-byte `double` always requires 8 bytes of data.
159 |
160 | In these cases the ASCII marker for these types is immediately followed by the
161 | data representing the number with no `length` value in between.
162 |
163 | Because `string` and `huge` are potentially variable length, they contain all 3
164 | elements of the tuple: `type-length-data`.
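
The three cases described above (marker only, marker plus data, and marker
plus length plus data) can be sketched as a small reader. This is an
illustration rather than a reference implementation: the function name
``decode_value`` is hypothetical, the marker characters are the ones listed in
the table of value types in this section, and big-endian byte order is an
assumption of the sketch because this section does not state one.

```python
import struct

# Fixed-size numeric markers mapped to a struct format string.
# Big-endian (">") byte order is an assumption of this sketch.
FIXED = {
    b"B": ">b",  # byte
    b"i": ">h",  # int16
    b"I": ">i",  # int32
    b"L": ">q",  # int64
    b"d": ">f",  # float
    b"D": ">d",  # double
}
# Markers that carry their whole value in the marker itself.
SIMPLE = {b"Z": None, b"T": True, b"F": False}


def decode_value(buf: bytes, pos: int = 0):
    """Decode one value starting at `pos`; return (value, next_pos)."""
    marker = buf[pos:pos + 1]
    pos += 1
    if marker in SIMPLE:                 # case 1: marker only
        return SIMPLE[marker], pos
    if marker in FIXED:                  # case 2: marker + data
        fmt = FIXED[marker]
        (value,) = struct.unpack_from(fmt, buf, pos)
        return value, pos + struct.calcsize(fmt)
    if marker in (b"s", b"h"):           # case 3: 1-byte length + data
        length = buf[pos]
        pos += 1
    elif marker in (b"S", b"H"):         # case 3: 4-byte length + data
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4
    else:
        raise ValueError(f"unknown marker {marker!r}")
    # `huge` values are returned in their string form in this sketch.
    text = buf[pos:pos + length].decode("utf-8")
    return text, pos + length
```

Because every value starts with its marker, the reader never has to scan ahead
for a terminator; it always knows how many bytes to consume next.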
165 |
166 | This table shows the official definitions of the discrete value types:
167 |
168 | +-----------------+--------------------------+--------+---------+--------------+
169 | | Type            | Size                     | Marker | Length? | Data?        |
170 | +=================+==========================+========+=========+==============+
171 | | null            | 1-byte                   | Z      | No      | No           |
172 | +-----------------+--------------------------+--------+---------+--------------+
173 | | true            | 1-byte                   | T      | No      | No           |
174 | +-----------------+--------------------------+--------+---------+--------------+
175 | | false           | 1-byte                   | F      | No      | No           |
176 | +-----------------+--------------------------+--------+---------+--------------+
177 | | byte            | 2-bytes                  | B      | No      | Yes          |
178 | +-----------------+--------------------------+--------+---------+--------------+
179 | | int16           | 3-bytes                  | i      | No      | Yes          |
180 | +-----------------+--------------------------+--------+---------+--------------+
181 | | int32           | 5-bytes                  | I      | No      | Yes          |
182 | +-----------------+--------------------------+--------+---------+--------------+
183 | | int64           | 9-bytes                  | L      | No      | Yes          |
184 | +-----------------+--------------------------+--------+---------+--------------+
185 | | float (32-bit)  | 5-bytes                  | d      | No      | Yes          |
186 | +-----------------+--------------------------+--------+---------+--------------+
187 | | double (64-bit) | 9-bytes                  | D      | No      | Yes          |
188 | +-----------------+--------------------------+--------+---------+--------------+
189 | | huge (number)   | 2-bytes                  | h      | Yes     | Yes,         |
190 | |                 | + byte length of string  |        |         | if non-empty |
191 | +-----------------+--------------------------+--------+---------+--------------+
192 | | huge (number)   | 5-bytes                  | H      | Yes     | Yes,         |
193 | |                 | + byte length of string  |        |         | if non-empty |
194 | +-----------------+--------------------------+--------+---------+--------------+
195 | | string          | 2-bytes                  | s      | Yes     | Yes,         |
196 | |                 | + byte length of string  |        |         | if non-empty |
197 | +-----------------+--------------------------+--------+---------+--------------+
198 | | string          | 5-bytes                  | S      | Yes     | Yes,         |
199 | |                 | + byte length of string  |        |         | if non-empty |
200 | +-----------------+--------------------------+--------+---------+--------------+
201 |
202 | .. note::
203 |
204 | The duplicate (lowercased) ``h`` and ``s`` types are just versions of those
205 | types that allow for a 1-byte length (instead of 4-byte length) to be used for
206 | more compact storage when length is ``<= 254``.
207 |
208 | With each field of the table described as:
209 |
210 | * **Type**
211 |
212 | * The binary value data type defined by the spec.
213 |
214 | * **Size**
215 |
216 | * The byte-size of the construct, as stored in the binary format. This is not
217 | the value of the `length` field, just an indicator to you (the reader) of
218 | approximately how much space writing out a value of this type will take.
219 |
220 | * **Marker**
221 |
222 | * The single ASCII character marker used to delimit the different types of
223 | values in the binary format. When reading in bytes from a file stored in
224 | this format, you can simply check the decimal value of the byte
225 | (e.g. ``'A' = 65``) and switch on that value for processing.
226 |
227 | * **Length?**
228 |
229 | * Indicates if the data type provides a length value between the ASCII marker
230 | and the data payload.
231 | * Many of the data types represented in the binary format either don’t have a
232 | length (`null`, `true` or `false`) or their types (e.g. the numeric
233 | values) are specific enough that the length is implied.
234 | * When specifying the length for a string or huge value (UTF-8 encoded), the
235 | length **must represent the number of bytes** of the UTF-8 string and not
236 | the number of characters in the string.
237 |
238 | .. note::
239 |
240 | For example, English typically uses 1-byte per character, so the string
241 | “hello” has a length of 5. The same string in Russian is “привет” with a
242 | byte length of 12 and in Arabic the text becomes “مرحبا” with a byte length
243 | of 10.
244 |
245 | * **Data?**
246 |
247 | * Indicates if the data type provides a data payload representing the value.
248 | * Most types except for `null`, `true` and `false` provide a data payload
249 | indicating their value.
250 | * Variable-length types like `string` and `huge` **do not** provide a data
251 | payload when they are empty (i.e. length of 0). More specifically, if you are
252 | writing a parser for the Universal Binary JSON format and you encounter a
253 | `string` of length 0, you know the very next byte is an ASCII marker for
254 | another value since the `string` has no data payload.
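
The length rules above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `encode_string` is hypothetical, and the 4-byte length is assumed to be a big-endian signed 32-bit integer, in line with the numeric rules elsewhere in this document.

```python
import struct


def encode_string(text: str) -> bytes:
    """Encode text as a string value (hypothetical helper)."""
    payload = text.encode("utf-8")
    length = len(payload)  # byte count, not character count
    if length <= 254:
        # Compact form: 's' marker followed by a 1-byte length.
        header = b"s" + bytes([length])
    else:
        # Long form: 'S' marker followed by a 4-byte length
        # (assumed here to be a big-endian signed int32).
        header = b"S" + struct.pack(">i", length)
    # An empty string has no data payload, so only the header is written.
    return header + payload


print(encode_string("hello"))        # 7 bytes in total
print(len(encode_string("привет")))  # 14: 2 header bytes + 12 UTF-8 bytes
print(encode_string(""))             # header only, no payload
```

A parser that reads a length of 0 can therefore treat the very next byte as the marker of the following value.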
255 |
256 | .. note::
257 |
258 | **Using Numeric Types**
259 |
260 | It is always recommended to use the smallest numeric type that fits your
261 | needs. For payloads dominated by numeric values, this can cut down the
262 | size significantly (on average a **50% reduction in size**).
263 |
264 | All numeric types are **signed**.
265 |
266 | Numeric values of `infinity` are encoded as a `null` (``Z``) value.
267 | (see `ECMA`_ and the `JSON presentation`_)
268 |
269 | **64-bit Integers**
270 |
271 | While almost all languages natively support 64-bit integers, not all do
272 | (e.g. C89 and JavaScript (`yet`_)), so care must be taken when encoding 64-bit
273 | integer values into binary JSON and then attempting to decode them on a
274 | platform that doesn't support them.
275 |
276 | If you are fully aware of the platforms and runtime environments your binary
277 | JSON is being used on and know they all support 64-bit integers, then you are
278 | fine.
279 |
280 | If you are trying to deserialize 64-bit integers in a client’s browser in
281 | JavaScript or another environment that does not support 64-bit integers, then
282 | you will want to take care to skip them in the input or have the client
283 | producing them encode them as `double` or `huge` values if that is easier to
284 | handle.
285 |
286 | Alternatively you might consider encoding your 64-bit values as doubles if you
287 | know you are going from the server to a client JavaScript environment with the
288 | binary-encoded information.
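
One caveat when falling back to `double`: an IEEE 754 double represents integers exactly only up to 2^53 in magnitude, so larger 64-bit values may be rounded. Below is a minimal sketch of a check a producer could run before choosing that fallback; the function name `fits_in_double` is hypothetical.

```python
def fits_in_double(value: int) -> bool:
    """Return True if converting value to a double and back is lossless."""
    return int(float(value)) == value


print(fits_in_double(2**53))      # True
print(fits_in_double(2**53 + 1))  # False: rounds to 2**53
print(fits_in_double(2**63 - 1))  # False: rounds up to 2**63
```

Values that fail the check are better sent as `int64` (when the consumer supports it) or as `huge`.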
289 |
290 | **32-bit Floats**
291 |
292 | All 32-bit float values are written into the binary format using the
293 | `IEEE 754 single precision floating point format`_, which is the following
294 | structure:
295 |
296 | * Bit 31 (1 bit) – sign
297 | * Bit 30-23 (8 bits) – exponent
298 | * Bit 22-0 (23 bits) – fraction (significand)
299 |
300 | **64-bit Doubles**
301 |
302 | All 64-bit double values are written into the binary format using the
303 | `IEEE 754 double precision floating point format`_, which is the following
304 | structure:
305 |
306 | * Bit 63 (1 bit) – sign
307 | * Bit 62-52 (11 bits) – exponent
308 | * Bit 51-0 (52 bits) – fraction (significand)
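
Both layouts can be inspected with Python's standard `struct` module. The sketch below packs values in big-endian order, as this specification requires for numeric values, and splits a double into the three fields listed above; the helper name `double_fields` is hypothetical.

```python
import struct


def double_fields(value: float) -> tuple:
    """Split a double into its (sign, exponent, fraction) bit fields."""
    # '>d' packs an IEEE 754 double in big-endian byte order.
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    sign = bits >> 63                   # bit 63
    exponent = (bits >> 52) & 0x7FF     # bits 62-52, biased by 1023
    fraction = bits & ((1 << 52) - 1)   # bits 51-0
    return sign, exponent, fraction


print(struct.pack(">f", 1.0).hex())  # 3f800000
print(struct.pack(">d", 1.0).hex())  # 3ff0000000000000
print(double_fields(-2.0))           # (1, 1024, 0)
```

Prefixing the packed bytes with the ``d`` or ``D`` marker gives the 5-byte and 9-byte constructs from the table.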
309 |
310 | **huge Numeric Type**
311 |
312 | The huge numeric type is a safe and portable way of representing
313 | **values > 64-bit** by way of a UTF-8 encoded string. The format of this
314 | string **must adhere** to the `JSON number specification`_.
315 |
316 | This allows `huge` numbers to be portable across all platforms and easily
317 | converted to/from JSON as well as more robust handling on platforms that may
318 | not support arbitrarily large numbers.
319 |
320 | If you are working on a platform that has no support for huge numbers, please
321 | see our :ref:`Best Practices <best_practices>` recommendation on how to handle
322 | them.
323 |
324 | It is considered a violation of this specification to store numeric
325 | **values <= 64-bit** as a `huge`.
326 |
327 | This decision was made in order to simplify the parsing logic required to
328 | process the Universal Binary JSON specification; there is no need to
329 | introspect `huge` values to see if they contain smaller numeric values when
330 | mapping UBJSON types to native types of the runtime environment.
331 |
332 | The `huge` type should only be used when you need to (safely and portably)
333 | represent **values > 64-bit**.
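
Taken together, the numeric rules suggest a simple selection routine: pick the smallest signed type that holds the value, and fall back to `huge` only beyond 64 bits. The following is a sketch under those assumptions; `encode_integer` is a hypothetical name, the `huge` payload is written as a plain decimal string (one valid form of a JSON number), and the 4-byte length is assumed to be a big-endian signed int32.

```python
import struct

# (marker, struct format, bit width) for the signed fixed-width types,
# ordered from smallest to largest.
_INT_TYPES = [(b"B", ">b", 8), (b"i", ">h", 16), (b"I", ">i", 32), (b"L", ">q", 64)]


def encode_integer(value: int) -> bytes:
    """Encode an integer using the smallest type that can hold it."""
    for marker, fmt, bits in _INT_TYPES:
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return marker + struct.pack(fmt, value)
    # Only values wider than 64 bits reach this point and become a huge.
    digits = str(value).encode("utf-8")
    if len(digits) <= 254:
        return b"h" + bytes([len(digits)]) + digits
    return b"H" + struct.pack(">i", len(digits)) + digits


print(encode_integer(127))    # 2 bytes, byte type
print(encode_integer(70000))  # 5 bytes, int32 type
print(encode_integer(2**64))  # huge: marker, 1-byte length, 20 digits
```

Because the fixed-width types are tried first, a value that fits in 64 bits can never end up stored as a `huge`.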
334 |
335 | **UTF-8 Encoding**
336 |
337 | The JSON specification does not dictate a specific required encoding; it does,
338 | however, use UTF-8 as the default encoding.
339 |
340 | The Universal Binary JSON specification dictates `UTF-8`_ as the
341 | **required string encoding**. This will allow you to easily exchange binary
342 | JSON between open systems that all follow this encoding requirement.
343 |
344 | Fortunately, this is ideal for `a multitude of reasons`_ like space efficiency
345 | and compatibility across systems and alternative formats.
346 |
347 | To further clarify the binary layout of these data types, below are some visual
348 | examples of what the bytes would look like inside of a binary JSON file.
349 |
350 | NOTE: ``[ ]``-block notation is used for readability, the ``[ ]`` characters
351 | **are not** actually written out in the binary format.
352 |
353 | +----------------------------------+-------------------------------------------+
354 | | Binary Representation | Description |
355 | +==================================+===========================================+
356 | | ``[Z]`` | 1-byte, null value |
357 | +----------------------------------+-------------------------------------------+
358 | | ``[T]`` | 1-byte, true value |
359 | +----------------------------------+-------------------------------------------+
360 | | ``[F]`` | 1-byte, false value |
361 | +----------------------------------+-------------------------------------------+
362 | | ``[B][127]`` | 2-bytes, 8-bit byte value of 127 |
363 | +----------------------------------+-------------------------------------------+
364 | | ``[I][32427]`` | 5-bytes, 32-bit integer value of 32,427 |
365 | +----------------------------------+-------------------------------------------+
366 | | ``[L][12147483647]`` | 9-bytes, 64-bit integer value of |
367 | | | 12,147,483,647 |
368 | +----------------------------------+-------------------------------------------+
369 | | ``[d][3.14159]`` | 5-bytes, 32-bit float value of 3.14159 |
370 | +----------------------------------+-------------------------------------------+
371 | | ``[D][72.38138221]`` | 9-bytes, 64-bit double value of |
372 | | | 72.38138221 |
373 | +----------------------------------+-------------------------------------------+
374 | | ``[s][5][hello]`` | 7 bytes, string UTF-8 “hello” (English) |
375 | +----------------------------------+-------------------------------------------+
376 | | ``[s][12][привет]`` | 14 bytes, string UTF-8 “hello” (Russian) |
377 | +----------------------------------+-------------------------------------------+
378 | | ``[s][10][مرحبا]`` | 12 bytes, string UTF-8 “hello” (Arabic) |
379 | +----------------------------------+-------------------------------------------+
380 | | ``[S][1024][...long string...]`` | 5 bytes + 1024 bytes for the long string |
381 | +----------------------------------+-------------------------------------------+
382 | | ``[s][4][name][s][3][bob]`` | 6 + 5 bytes, equivalent of “name”: “bob” |
383 | +----------------------------------+-------------------------------------------+
384 |
385 | Now that we have seen how the JSON data value types map to the binary format,
386 | in the next section we will see how we can combine these values together into
387 | the two container types (objects and arrays) to create complex object
388 | hierarchies using the Universal Binary JSON format.
389 |
390 | .. _container_types:
391 |
392 | Container Types
393 | ===============
394 |
395 | In this section we will look at the 2 remaining JSON value types that we are
396 | referring to as “container types”, namely object and array.
397 |
398 | JSON
399 | ----
400 |
401 | The two JSON container types are described as follows:
402 |
403 | * **object**
404 |
405 | * A construct containing 0 or more name-value pairings, where the name is
406 | always a string and the value can be any valid value type including
407 | container types themselves.
408 |
409 | * **array**
410 |
411 | * A flat list of values only, where the values can be any valid value type
412 | including container types themselves.
413 | * The JSON specification does not make it a requirement that the values in an
414 | array are all of the same type and neither does the Universal Binary JSON
415 | specification.
416 |
417 | .. note::
418 | **Advanced**: This can actually be to your benefit. Take for example an array
419 | of `int64` values: as you write them out to a file or a stream, you can
420 | check the actual value of each `int64` and, depending on the value, encode
421 | each one into the smallest possible numeric type (e.g. `byte`, `int32`, etc.).
422 |
423 | This can lead to a significant size reduction (say **50% smaller**) when
424 | most of the values are small!
425 |
426 | Given these two constructs, let’s see how they are modeled in the Universal
427 | Binary JSON format.
428 |
429 | Universal Binary JSON
430 | ---------------------
431 |
432 | The two container types defined by JSON are modeled using the same tuple that
433 | defines all of our other data structures in this specification so far with a
434 | minor modification: the length value is considered a count of the child elements
435 | the container holds. It does not mean the byte length of the child elements.
436 |
437 | .. note::
438 | Exactly what *child element* means depends on the container. In an `object`, a
439 | single child element is a name-value pair; in an `array`, a child element is a
440 | single value.
441 |
442 | More specifically, the tuple stays exactly the same, it is just the meaning of
443 | the center `length` element that changes::
444 |
445 | [type][length][data]
446 |
447 | All the code used to process the constructs defined by this specification stays
448 | the same, but when an `object` or `array` construct is encountered, the code
449 | needs to be aware that the `length` value is the **child element count** so it
450 | can know when the scope of the container ends.
451 |
452 | For example, if you have an object that contains 4 arrays of `length` 50, the
453 | `length` argument for the object is 4 (because it contains the four arrays)
454 | while the `length` argument for each array is 50 (because they each hold
455 | 50 elements).
456 |
457 | .. note::
458 | Unknown-length container types are also supported by the Universal Binary JSON
459 | specification and are covered in detail in the :ref:`Streaming <streaming>`
460 | section of this document.
461 |
462 | Additionally, the only optional field in the tuple for container types is
463 | `data`: if the container is empty and contains no elements
464 | (i.e. the `length` is 0) then there is no `data` segment.
465 |
466 | All together, the definitions for the `object` and `array` container types look
467 | like this:
468 |
469 | +-----------------+--------------------------+--------+---------+--------------+
470 | | Type            | Size                     | Marker | Length? | Data?        |
471 | +=================+==========================+========+=========+==============+
472 | | array           | 2-bytes                  | a      | Yes     | Yes,         |
473 | |                 | + size of child elements |        |         | if non-empty |
474 | +-----------------+--------------------------+--------+---------+--------------+
475 | | array           | 5-bytes                  | A      | Yes     | Yes,         |
476 | |                 | + size of child elements |        |         | if non-empty |
477 | +-----------------+--------------------------+--------+---------+--------------+
478 | | object          | 2-bytes                  | o      | Yes     | Yes,         |
479 | |                 | + size of child elements |        |         | if non-empty |
480 | +-----------------+--------------------------+--------+---------+--------------+
481 | | object          | 5-bytes                  | O      | Yes     | Yes,         |
482 | |                 | + size of child elements |        |         | if non-empty |
483 | +-----------------+--------------------------+--------+---------+--------------+
484 |
485 | .. note::
486 | `array` and `object` can also be specified in a more compact manner using
487 | 1-byte for the `length` when it is ``<= 254``. Specifying a `length` of
488 | ``255`` for the 1-byte variants has a special meaning of **length unknown**
489 | and is covered in more detail in the :ref:`Streaming <streaming>` section of
490 | the spec.
491 |
492 | The details for each field are the same as described for the non-container
493 | values in the previous section with the one caveat that `length` is a count of
494 | child elements and **not** the number of bytes representing the contents of the
495 | container.
496 |
497 | Let’s look at a quick example of encoding an object, again using the handy
498 | ``[ ]``-notation we used before simply for readability (the ``[ ]`` chars are
499 | not written out or part of the file format).
500 |
501 | Consider the following JSON (30-bytes compacted)::
502 |
503 | {
504 | "id": 1234567890,
505 | "name": "bob"
506 | }
507 |
508 | Storing that object in the Universal Binary JSON format would look like this
509 | (whitespace added for readability)::
510 |
511 | [o][2] 2 bytes
512 | [s][2][id][I][1234567890] 4 + 5 = 9 bytes
513 | [s][4][name][s][3][bob] 6 + 5 = 11 bytes
514 |
515 | Our Universal Binary JSON format is 22 bytes, **27% smaller** than our compacted
516 | JSON!
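
The byte counts above can be checked with a few lines of Python. This is only a sketch that hard-codes the example; it is not a general encoder.

```python
import json
import struct

doc = {"id": 1234567890, "name": "bob"}

# Compacted JSON: no whitespace between tokens.
compact_json = json.dumps(doc, separators=(",", ":")).encode("utf-8")

ubjson = b"".join([
    b"o", bytes([2]),                     # object holding 2 name-value pairs
    b"s", bytes([2]), b"id",              # name
    b"I", struct.pack(">i", 1234567890),  # int32 value, big-endian
    b"s", bytes([4]), b"name",            # name
    b"s", bytes([3]), b"bob",             # string value
])

print(len(compact_json), len(ubjson))  # 30 22
saving = 1 - len(ubjson) / len(compact_json)
print(f"{saving:.0%}")                 # 27%
```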
517 |
518 | Walking through the example above step by step, this is what a parser might see and do:
519 |
520 | #. I see an ``o``, so I know I am parsing an `object` and that the next byte is
521 | the `length` (or count) for this object.
522 | #. I see a ``2``, so I know the object contains 2 elements that I must account
523 | for to know when the `object` scope is closed (because we don’t use the
524 | ``{ }`` brackets like JSON).
525 | #. I see an ``s``, knowing how the name-value pairings inside of an object work,
526 | I know this is the `name` portion of some upcoming value.
527 | #. I see an ``I``, I know this is an `int32` value and that it belongs to the
528 | `name` I parsed in the previous step.
529 | #. I see another ``s``, I know this is a new name-value pair and this is the
530 | `name` portion.
531 | #. I see another ``s`` and know this is the value belonging to the `name` I just
532 | processed.
533 | #. I have just parsed 2 name-value pairs, so now I know the `object` scope is closed.
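
The same walk-through can be written as a small recursive reader. The sketch below is deliberately limited to the markers that appear in this example (``o``, ``s`` and ``I``) and uses the hypothetical function name `read_value`; a real parser would handle every type in the specification, including the unknown-length count of 255.

```python
import struct


def read_value(buf: bytes, pos: int):
    """Read one value starting at pos; return (value, next_pos)."""
    marker = buf[pos:pos + 1]
    pos += 1
    if marker == b"s":  # string with a 1-byte length
        length = buf[pos]
        pos += 1
        return buf[pos:pos + length].decode("utf-8"), pos + length
    if marker == b"I":  # big-endian int32
        return struct.unpack_from(">i", buf, pos)[0], pos + 4
    if marker == b"o":  # object with a 1-byte pair count
        count = buf[pos]
        pos += 1
        result = {}
        for _ in range(count):
            name, pos = read_value(buf, pos)   # the name portion
            value, pos = read_value(buf, pos)  # the value belonging to it
            result[name] = value
        return result, pos  # count exhausted: the object scope is closed
    raise ValueError(f"unsupported marker {marker!r}")


data = (b"o" + bytes([2])
        + b"s" + bytes([2]) + b"id" + b"I" + struct.pack(">i", 1234567890)
        + b"s" + bytes([4]) + b"name" + b"s" + bytes([3]) + b"bob")
print(read_value(data, 0))  # ({'id': 1234567890, 'name': 'bob'}, 22)
```

Note that no closing delimiter is ever searched for: the pair count alone tells the reader when the object ends.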
534 |
535 | Encoding objects containing other `objects` would work identically except we would
536 | have encountered another ``o`` or ``O`` marker and descended a level further
537 | into a new object.
538 |
539 | Let’s look at another example, this time a simple JSON array construct
540 | (remember, they only contain values and not name-value pairs like `objects`).
541 |
542 | This array is 48-bytes in compacted JSON::
543 |
544 | [
545 | null,
546 | true,
547 | false,
548 | 4782345193,
549 | 153.132417549,
550 | "ham"
551 | ]
552 |
553 | Storing the array in the Universal Binary JSON format would look like this
554 | (whitespace added for readability)::
555 |
556 | [a][6] - 2 bytes
557 | [Z] - 1 byte
558 | [T] - 1 byte
559 | [F] - 1 byte
560 | [L][4782345193] - 9 bytes (too large for a signed int32)
561 | [D][153.132417549] - 9 bytes
562 | [s][3][ham] - 5 bytes
563 |
564 | Our Universal Binary JSON format is 28 bytes or **42% smaller** than the
565 | compacted JSON!
566 |
567 | Because the container types specify their total child element count, it is
568 | easier and faster for parsers to know when the scope of a container has closed
569 | or is still open waiting for more children (e.g. in the case of streaming over
570 | the network). This is not unlike the high-performance `Redis protocol`_.
571 |
572 | This also has the added benefit of not needing any terminating values in the
573 | binary that need to be scanned for to know when a container-scope is closed.
574 | This way data can be read in chunks and not read-and-scanned byte-by-byte.
575 |
576 | As was mentioned previously though, there are some cases where having an
577 | unbounded container is important (for example, streaming content from a server
578 | as it generates it on-the-fly).
579 |
580 | In the next section we will take a look at the Universal Binary JSON constructs
581 | that are optimized for streaming. Fortunately, there are only 3 and they are
582 | just as easy as the constructs we have covered so far!
583 |
584 |
585 | .. _streaming:
586 |
587 | Streaming Types
588 | ===============
589 |
590 | The Universal Binary JSON specification is optimized for fast read-speed by
591 | prefixing the length of every construct to the front of it. This allows
592 | parsers to digest entire chunks of the data stream at a time without scanning
593 | for terminating byte values.
594 |
595 | Unfortunately, this model of data becomes very expensive (and sometimes
596 | impossible) to adhere to in a streaming-friendly environment where a server may
597 | be generating `UBJ` formatted data on-the-fly and streaming it back in real time
598 | to the client.
599 |
600 | If the server had to adhere to the prefixed-length requirement of this
601 | specification up until now, it would have to generate, buffer and count all the
602 | elements in its reply before writing out the Universal Binary JSON so it could
603 | correctly prefix the lengths to all the containers.
604 |
605 | In this section of the specification we look at one additional type in the
606 | Universal Binary JSON specification that complements our streaming scenario and
607 | then two minor changes to the existing **container types** to enable easy and
608 | efficient streaming with unknown-length support for our `array` and `object`
609 | containers.
610 |
611 | .. _noop:
612 |
613 | No-Op Type
614 | ----------
615 |
616 | The noop value stands for `No Op` or `No Operation`. It is a specific value
617 | (like ``Z`` for `null`, ``T`` for `true` and ``F`` for `false`) that is useful
618 | in streaming scenarios where an acknowledgement of life needs to be sent between
619 | two end points, but the confirmation being sent cannot change the meaning of the
620 | data it is sent within.
621 |
622 | The most common use for such a value type is as a `keep-alive` signal from a
623 | server to the client; letting the client know the server is possibly operating
624 | on a long-running job and is still alive, but just isn’t ready to send more data
625 | yet.
626 |
627 | The `noop` type is defined as follows:
628 |
629 | +-----------------+--------------------------+--------+---------+--------------+
630 | | Type | Size | Marker | Length? | Data? |
631 | +=================+==========================+========+=========+==============+
632 | | noop | 1-byte | N | No | No |
633 | +-----------------+--------------------------+--------+---------+--------------+
634 |
635 | Any parser code written to load the Universal Binary JSON format needs to be
636 | aware that encountering the ``N`` marker in files of any kind is valid. It is
637 | merely a signal mechanism from producer to consumer to say “Hey, I am still
638 | alive.” The marker is intended to be safely ignored if the server or client
639 | doesn’t need the acknowledgement.
640 |
641 | In order for this keep-alive-esque construct to work, the specification had to
642 | define a single byte value that the server and client can exchange if needed
643 | but that carries no meaning of its own and does not modify the meaning of the
644 | data they are exchanging.
645 |
646 | In code that handles streaming from a server, your handler for the `noop` type
647 | might just reset a disconnect timer. In code that handles UBJ files, you would
648 | simply ignore the noop marker when you encountered it in the file because it
649 | would mean nothing.
650 |
651 | .. warning::
652 |
653 | The `noop` type is only defined to be used inside of an
654 | :ref:`unknown-length container <unsized_container>`. If you have a
655 | container that clearly defines a child element count (`length`) it should not
656 | contain a `noop` marker element.
657 |
658 | Also, the `noop` type **should never** be sent inside of a value (e.g.
659 | embedded inside of a `string` being streamed); it must only be written to the
660 | stream between declared values.
661 |
662 | If your interaction with the Universal Binary JSON format is primarily as a file
663 | format, it is likely that you may never need to use the `noop` type; its value
664 | becomes more apparent in long-lived, client-server, data-streaming scenarios.
665 |
666 | .. _unsized_container:
667 |
668 | Unknown-Length Containers
669 | -------------------------
670 |
671 | The Universal Binary JSON specification allows containers (`array` and
672 | `object`) of unknown length to be specified when the producer of the binary data
673 | cannot (efficiently) know in advance how many elements it is going to write out.
674 |
675 | In these cases, the lowercased, 1-byte-length versions of array or object must
676 | be used (``a`` or ``o`` markers) with a length value of ``0xFF`` (255) as well
677 | as specifying an ``E`` terminator character after the last element in the
678 | container.
679 |
680 | The ``E`` type used to delimit the end of unknown-length containers is defined as
681 | follows:
682 |
683 | +-----------------+--------------------------+--------+---------+--------------+
684 | | Type | Size | Marker | Length? | Data? |
685 | +=================+==========================+========+=========+==============+
686 | | end | 1-byte | E | No | No |
687 | +-----------------+--------------------------+--------+---------+--------------+
688 |
689 | .. warning::
690 |
691 | Using a length of ``0xFF`` with the uppercase, 4-byte-length versions of array
692 | (``A``) and object (``O``) is not valid according to this specification.
693 | You must use the 1-byte-length variants of the container types when specifying
694 | an unknown `length`.
695 |
696 | An example would look like this::
697 |
698 | [a][255]
699 | [S][3][bob]
700 | [I][1024]
701 | [T]
702 | [F]
703 | [S][4][ham!]
704 | [E]
705 |
706 | The three key elements being the lowercased ``a`` marker, the length of ``0xFF``
707 | (255) and the ``E`` marker at the end of the container.
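
On the producing side, an unknown-length array can be written without buffering or counting the elements first. The sketch below streams values from any iterable to a binary file object; `write_unknown_length_array` and `encode_value` are hypothetical names, and only a few value types are covered.

```python
import io
import struct


def encode_value(value) -> bytes:
    """Encode a few simple value types (illustrative subset only)."""
    if value is True:
        return b"T"
    if value is False:
        return b"F"
    if isinstance(value, int):
        return b"I" + struct.pack(">i", value)  # assumes the value fits in int32
    if isinstance(value, str):
        payload = value.encode("utf-8")
        return b"S" + struct.pack(">i", len(payload)) + payload
    raise TypeError(f"unsupported type {type(value).__name__}")


def write_unknown_length_array(out, values) -> None:
    """Stream values as an array whose element count is not known up front."""
    out.write(b"a" + bytes([0xFF]))  # lowercase marker plus length 0xFF
    for value in values:             # values may be a generator
        out.write(encode_value(value))
    out.write(b"E")                  # terminator after the last element


stream = io.BytesIO()
write_unknown_length_array(stream, iter(["bob", 1024, True, False, "ham!"]))
print(stream.getvalue())
```

Each element is written as soon as it is produced, which is what makes this form suitable for content generated on-the-fly.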
708 |
709 | Another example might look like this::
710 |
711 | [a][255]
712 | [B][4]
713 | [D][21.786]
714 | [N]
715 | [Z]
716 | [h][27][131.098412283059e2371293452]
717 | [E]
718 |
719 | You might notice in the example above we injected a `noop` instruction right in
720 | the middle, before the `null`. As mentioned in the :ref:`No-Op Type <noop>`
721 | section, this is valid and can occur at any time while parsing the contents of
722 | an `unknown-length` container.
723 |
724 | If your parser has no need for recognizing the `noop` code (e.g. listening for
725 | a keep-alive) then it can just be safely ignored and parsing continued
726 | (hence the name “no-op”). It is up to the implementation to decide what to do
727 | with the `noop` type.
728 |
729 | You might be wondering how using a 1-byte ``E`` as a terminator to an unbounded
730 | container can work and not get confused with, say, another ``E`` inside of a
731 | `string`. The reason this works is that none of the discrete value types
732 | (numeric, string, etc.) are of unknown `length`.
733 |
734 | The lengths of all the values contained inside of the container are known and
735 | must be read completely. Doing so guarantees that the ``E`` is only ever
736 | encountered by itself as an element marker, which is easily handled by parsing
737 | code to know the scope of the container has been closed.
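
The terminator logic, together with the `noop` handling described earlier, can be sketched as a loop that reads complete values until it meets a bare ``E`` marker. The names below are hypothetical and only a handful of value types are decoded; `on_noop` stands in for whatever keep-alive handling an application needs.

```python
def read_unknown_length_array(buf: bytes, pos: int = 0, on_noop=None):
    """Read an [a][255] ... [E] array; return (values, next_pos)."""
    if buf[pos:pos + 2] != b"a" + bytes([0xFF]):
        raise ValueError("expected an unknown-length array")
    pos += 2
    values = []
    while True:
        marker = buf[pos:pos + 1]
        pos += 1
        if marker == b"E":      # end of the container
            return values, pos
        if marker == b"N":      # noop: carries no value
            if on_noop is not None:
                on_noop()       # e.g. reset a disconnect timer
        elif marker == b"Z":
            values.append(None)
        elif marker == b"T":
            values.append(True)
        elif marker == b"F":
            values.append(False)
        elif marker == b"s":
            # The whole payload is consumed here, so an 'E' byte inside
            # the string is never mistaken for the terminator.
            length = buf[pos]
            pos += 1
            values.append(buf[pos:pos + length].decode("utf-8"))
            pos += length
        else:
            raise ValueError(f"unsupported marker {marker!r}")


data = b"a" + bytes([0xFF]) + b"s" + bytes([3]) + b"END" + b"N" + b"Z" + b"T" + b"E"
print(read_unknown_length_array(data))  # (['END', None, True], 11)
```

The string ``END`` contains an ``E`` byte, yet the array still terminates only at the final marker.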
738 |
739 |
740 | .. _size:
741 |
742 | Size Requirements
743 | =================
744 |
745 | The Universal Binary JSON specification tries to strike the perfect balance
746 | between space savings, simplicity and performance.
747 |
748 | Data stored using the Universal Binary JSON format are on average
749 | **30% smaller** than the equivalent compacted JSON as a rule of thumb. For data
750 | dominated by numeric values, though, it is not uncommon to see the binary
751 | representation lead to a **50% or 60% reduction in size**.
752 |
753 | The size reduction of your data depends heavily on the type of data you are
754 | storing. It is best to do your own benchmarking with a comprehensive sampling
755 | of your own data.
756 |
757 | .. warning::
758 |
759 | The Universal Binary JSON specification does not use compression algorithms to
760 | achieve smaller storage sizes. The size reduction is a side effect of the
761 | efficient binary storage format.
762 |
763 | Size Reduction Tips
764 | -------------------
765 |
766 | The amount of storage size reduction you’ll experience with the Universal Binary
767 | JSON format will depend heavily on the type of data you are encoding.
768 |
769 | Some data shrinks considerably, some mildly and some not at all, but in every
770 | case your data will be stored in a much more efficient format that is faster to
771 | read and write.
772 |
773 | Below are pointers to give you an idea of how certain data may shrink in this
774 | format:
775 |
776 | * `null`, `true` and `false` values shrink by 75%
777 | (80% in the case of `false`).
778 | * Large `numeric` values (> 5 digits and < 20 digits) shrink on average by 50%.
779 | * `string` values
780 |   * of length <= 254 stay the same size.
781 |   * of length > 254 are 3 bytes bigger per string.
782 | * `object` and `array` values shrink by about 1 byte per element.
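
The first few figures follow from simple byte counting, which a short script can reproduce. This is only arithmetic on the JSON text lengths and is not a benchmark; the helper name `saving` is hypothetical.

```python
import json


def saving(json_text: str, binary_size: int) -> float:
    """Fraction of bytes saved relative to the JSON text."""
    return 1 - binary_size / len(json_text.encode("utf-8"))


# null, true and false each become a single marker byte.
for literal in ("null", "true", "false"):
    print(literal, f"{saving(literal, 1):.0%}")    # 75%, 75%, 80%

# A short string keeps its size: 2 quotes in JSON vs marker + 1-byte length.
text = json.dumps("ham")
print(len(text), 1 + 1 + len("ham"))               # 5 5

# A string longer than 254 bytes grows by 3: marker + 4-byte length vs 2 quotes.
long_text = "x" * 300
print((1 + 4 + 300) - len(json.dumps(long_text)))  # 3
```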
783 |
784 | One of the great things about the Universal Binary JSON format is that, on top
785 | of representing most of your data in a smaller footprint, it gives you two big
786 | wins:
787 |
788 | #. A smaller data format means faster writes and smaller reads. It also means
789 | less data to process when parsing.
790 | #. Binary format means no encoding/decoding primitive values to text and no
791 | parsing primitive values from text.
792 |
793 | Endianness
794 | ==========
795 |
796 | The Universal Binary JSON specification requires that all numeric values be
797 | written in `Big-Endian`_ order.
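
In Python's `struct` module, big-endian order is selected with the ``>`` prefix. A brief illustration of the difference for a 32-bit integer:

```python
import struct

value = 1024
big = struct.pack(">i", value)     # byte order required by this specification
little = struct.pack("<i", value)  # shown only for contrast

print(big.hex())     # 00000400
print(little.hex())  # 00040000
```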
798 |
799 | MIME Type
800 | =========
801 |
802 | The Universal Binary JSON specification is a binary format and recommends using
803 | the following MIME type::
804 |
805 | application/ubjson
806 |
807 | This was added directly to the specification in hopes of avoiding
808 | `similar confusion with JSON`_.
809 |
810 | File Extension
811 | ==============
812 |
813 | ``ubj`` is the `recommended file extension`_ when writing out files using the
814 | Universal Binary JSON format (e.g. ``user.ubj``).
815 |
816 | The extension stands for `“Universal Binary JSON”` and has no known conflicting
817 | mappings to other file formats.
818 |
819 |
820 | .. _best_practices:
821 |
822 | Best Practices
823 | ==============
824 |
825 | Through work with the community, feedback from others and our own experience
826 | with the specification, we have collected some best practices into one place,
827 | making it easy for folks working with the format to find answers to the
828 | more flexible portions of the spec.
829 |
830 | Handling `huge` Numbers
831 | -----------------------
832 |
833 | Not every language supports arbitrarily long numbers greater than 64-bits
834 | (represented by the `huge` data type), but many do.
835 |
836 | If you are writing a library to read/write Universal Binary JSON and the
837 | platform you are working with does not support them, we recommend throwing an
838 | exception or returning an error to the caller, letting them know unsupported
839 | data is contained in the file they are trying to parse.
840 |
841 | If the library you are writing is meant to be a general-purpose parser and needs
842 | to be more resilient than that, we recommend the following:
843 |
844 | #. Make the default behavior to throw an exception or return an error when the
845 | unsupported huge data type is encountered.
846 | #. Provide an optional behavior to the parser (that must be specifically enabled
847 | by the caller) that treats the huge value as a simple string and returns it
848 | to the caller to handle (e.g. insert in a database) if they need it.
849 | #. Provide an optional behavior to the parser (again, that must be specifically
850 | enabled by the caller) to simply skip unsupported values.
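
As a sketch of how such parser options might look, the example below models the three behaviors as an explicit policy. The enum, function and exception names are hypothetical, and since Python itself supports arbitrarily large integers, the example only simulates a platform that does not.

```python
from enum import Enum


class HugePolicy(Enum):
    ERROR = "error"    # default: refuse unsupported data
    STRING = "string"  # opt-in: hand the raw text to the caller
    SKIP = "skip"      # opt-in: drop the value


class UnsupportedTypeError(ValueError):
    """Raised when a huge value is met and no fallback was enabled."""


SKIPPED = object()  # sentinel the caller can test for


def handle_huge(payload: bytes, policy: HugePolicy = HugePolicy.ERROR):
    """Apply the configured policy to the UTF-8 payload of a huge value."""
    if policy is HugePolicy.STRING:
        return payload.decode("utf-8")
    if policy is HugePolicy.SKIP:
        return SKIPPED
    raise UnsupportedTypeError("huge values are not supported on this platform")


print(handle_huge(b"18446744073709551616", HugePolicy.STRING))
```

Keeping the error as the default means a caller has to opt in explicitly before any value is returned as text or dropped.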
851 |
852 | This implementation should give the user the most functional experience working
853 | with your library and the Universal Binary JSON format while making it clear that
854 | some data types could cause trouble on their particular platform; this is
855 | preferred to making the default operation ignore the unsupported values.
856 |
857 |
858 |
859 | .. _Number: http://people.mozilla.org/~jorendorff/es5.html#sec-8.5
860 | .. _JSON presentation: http://json.org/json.ppt
861 | .. _ECMA: http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf
862 | .. _yet: http://wiki.ecmascript.org/doku.php?id=harmony:binary_data_discussion&s=int64
863 | .. _IEEE 754 single precision floating point format: http://en.wikipedia.org/wiki/IEEE_754-1985
864 | .. _IEEE 754 double precision floating point format: http://en.wikipedia.org/wiki/Double_precision_floating-point_format#Double_precision_binary_floating-point_format
865 | .. _JSON number specification: http://json.org
866 | .. _UTF-8: http://en.wikipedia.org/wiki/UTF-8
867 | .. _a multitude of reasons: http://en.wikipedia.org/wiki/UTF-8#Advantages
868 | .. _Redis protocol: http://redis.io/topics/protocol
869 | .. _Big-Endian: http://en.wikipedia.org/wiki/Endianness
870 | .. _similar confusion with JSON: http://stackoverflow.com/questions/477816/the-right-json-content-type
871 | .. _recommended file extension: http://www.fileinfo.com/extension/ubj
872 |
--------------------------------------------------------------------------------
/spec8/tests/CouchDB4k.compact.json:
--------------------------------------------------------------------------------
1 | {"data3":"ColreUHAtuYoUOx1N4ZloouQt2o6ugnUT6eYtS10gu7niM8i0vEiNufpk1RlMQXaHXlIwQBDsMFDFUQcFeg2vW5eD259Xm","data4":"zCxriJhL726WNNTdJJzurgSA8vKT6rHA0cFCb9koZcLUMXg4rmoXVPqIHWYaCV0ovl2t6xm7I1Hm36jXpLlXEb8fRfbwBeTW2V0OAsVqYH8eAT","data0":"9EVqHm5ARqcEB5jq2D14U2bCJPyBY0JWDr1Tjh8gTB0sWUNjqYiWDxFzlx6S","data7":"Bi1ujcgEvfADfBeyZudE7nwxc3Ik8qpYjsJIfKmwOMEbV2L3Bi0x2tcRpGuf4fiyvIbypDvJN1PPdQtfQW1Gv6zccXHwwZwKzUq6","data5":{"integers":[756509,116117,776378,275045,703447,50156,685803,147958,941747,905651,57367,530248,312888,740951,988947,450154],"float1":76.572,"float2":83.5299,"nested1":{"integers":[756509,116117,776378,275045,703447,50156,685803,147958,941747,905651,57367,530248,312888,740951,988947,450154],"floats":[43121609.5543,99454976.3019,32945584.756,18122905.9212,45893183.44,63052200.6225,69032152.6897,3748217.6946,75449850.474,37111527.415,84852536.859,32906366.487,27027600.417,63874310.5614,39440408.51,97176857.1716,97438252.1171,49728043.5056,40818570.245,41415831.8949,24796297.4251,2819085.3449,84263963.4848,74503228.6878,67925677.403,4758851.9417,75227407.9214,76946667.8403,72518275.9469,94167085.9588,75883067.8321,27389831.6101,57987075.5053,1298995.2674,14590614.6939,45292214.2242,3332166.364,53784167.729,25193846.1867,81456965.477,68532032.39,73820009.7952,57736110.5717,37304166.7363,20054244.864,29746392.7397,86467624.6,45192685.8793,44008816.5186,1861872.8736,14595859.467,87795257.6703,57768720.8303,18290154.3126,45893183.44,63052200.6225,69032152.6897,3748217.6946,75449850.474,37111527.415,84852536.859,32906366.487,27027600.417,63874310.5614,39440408.51,97176857.1716,97438252.1171,49728043.5056,40818570.245,41415831.8949,24796297.4251,2819085.3449,84263963.4848,74503228.6878,67925677.403,4758851.9417,75227407.9214,76946667.8403,72518275.9469,94167085.9588,75883067.8321,27389831.6101,57987075.5053,1298995.2674,80858801.2712,98262252.4656,51612877.944,33397812.7835,36089655.3049,50164685.8153,16852105.5192,61171929.752,86376339.7175,73009014.5521,7397302
.331,34345128.9589,98343269.4418,95039116.9058,44833102.5752,51052997.8873,22719195.6783,64883244.8699]},"nested2":{"integers":[756509,116117,776378,275045,703447,50156,685803,147958,941747,905651,57367,530248,312888,740951,988947,450154],"float1":76.572,"float2":83.5299}},"strings":["edx5XzRkPVeEW2MBQzQMcUSuMI4FntjhlJ9VGhQaBHKPEazAaT","2fQUbzRUax4A","jURcBZ0vrJcmf2roZUMzZJQoTsKZDIdj7KhO7itskKvM80jBU9","8jKLmo3N2zYdKyTyfTczfr2x6bPaarorlnTNJ7r8lIkiZyBvrP","jbUeAVOdBSPzYmYhH0sabUHUH39O5e","I8yAQKZsyZhMfpzWjArQU9pQ6PfU6b14q2eWvQjtCUdgAUxFjg","97N8ZmGcxRZO4ZabzRRcY4KVHqxJwQ8qY","0DtY1aWXmUfJENt9rYW9","DtpBUEppPwMnWexi8eIIxlXRO3GUpPgeNFG9ONpWJYvk8xBkVj","YsX8V2xOrTw6LhNIMMhO4F4VXFyXUXFr66L3sTkLWgFA9NZuBV","fKYYthv8iFvaYoFoYZyB","zGuLsPXoJqMbO4PcePteZfDMYFXdWtvNF8WvaplXypsd6"],"data1":"9EVqHm5ARqcEB5jq21v2g0jVcG9CXB0Abk7uAF4NHYyTzeF3TnHhpZBECD14U2bCJPyBY0JWDr1Tjh8gTB0sWUNjqYiWDxFzlx6S","integers":[756509,116117,776378,275045,703447,50156,685803,147958,941747,905651,57367,530248,312888,740951,988947,450154],"more_nested":{"integers":[756509,116117,776378,275045,703447,50156,685803,147958,941747,905651,57367,530248,312888,740951,988947,450154],"float1":76.572,"float2":83.5299,"nested1":{"integers":[756509,116117,776378,275045,703447,50156,685803,147958,941747,905651,57367,530248,312888,740951,988947,450154]},"nested2":{"strings":["2fQUbzRUax4A","jURcBZ0vrJcmf2roZUMzZJQoTsKZDIdj7KhO7itskKvM80jBU9","8jKLmo3N2zYdKyTyfTczfr2x6bPaarorlnTNJ7r8lIkiZyBvrP","jbUeAVOdBSPzYmYhH0sabUHUH39O5e","I8yAQKZsyZhMfpzWjArQU9pQ6PfU6b14q2eWvQjtCUdgAUxFjg","97N8ZmGcxRZO4ZabzRRcY4KVHqxJwQ8qY","0DtY1aWXmUfJENt9rYW9","DtpBUEppPwMnWexi8eIIxlXRO3GUpPgeNFG9ONpWJYvk8xBkVj","YsX8V2xOrTw6LhNIMMhO4F4VXFyXUXFr66L3sTkLWgFA9NZuBV","fKYYthv8iFvaYoFoYZyB","zGuLsPXoJqMbO4PcePteZfDMYFXdWtvNF8WvaplXypsd6"],"integers":[756509,116117,776378,57367,530248,312888,740951,988947,450154]}}}
--------------------------------------------------------------------------------
/spec8/tests/CouchDB4k.formatted.json:
--------------------------------------------------------------------------------
1 | {
2 | "data3":"ColreUHAtuYoUOx1N4ZloouQt2o6ugnUT6eYtS10gu7niM8i0vEiNufpk1RlMQXaHXlIwQBDsMFDFUQcFeg2vW5eD259Xm",
3 | "data4":"zCxriJhL726WNNTdJJzurgSA8vKT6rHA0cFCb9koZcLUMXg4rmoXVPqIHWYaCV0ovl2t6xm7I1Hm36jXpLlXEb8fRfbwBeTW2V0OAsVqYH8eAT",
4 | "data0":"9EVqHm5ARqcEB5jq2D14U2bCJPyBY0JWDr1Tjh8gTB0sWUNjqYiWDxFzlx6S",
5 | "data7":"Bi1ujcgEvfADfBeyZudE7nwxc3Ik8qpYjsJIfKmwOMEbV2L3Bi0x2tcRpGuf4fiyvIbypDvJN1PPdQtfQW1Gv6zccXHwwZwKzUq6",
6 | "data5":{
7 | "integers":[
8 | 756509,
9 | 116117,
10 | 776378,
11 | 275045,
12 | 703447,
13 | 50156,
14 | 685803,
15 | 147958,
16 | 941747,
17 | 905651,
18 | 57367,
19 | 530248,
20 | 312888,
21 | 740951,
22 | 988947,
23 | 450154
24 | ],
25 | "float1":76.572,
26 | "float2":83.5299,
27 | "nested1":{
28 | "integers":[
29 | 756509,
30 | 116117,
31 | 776378,
32 | 275045,
33 | 703447,
34 | 50156,
35 | 685803,
36 | 147958,
37 | 941747,
38 | 905651,
39 | 57367,
40 | 530248,
41 | 312888,
42 | 740951,
43 | 988947,
44 | 450154
45 | ],
46 | "floats":[
47 | 43121609.5543,
48 | 99454976.3019,
49 | 32945584.756,
50 | 18122905.9212,
51 | 45893183.44,
52 | 63052200.6225,
53 | 69032152.6897,
54 | 3748217.6946,
55 | 75449850.474,
56 | 37111527.415,
57 | 84852536.859,
58 | 32906366.487,
59 | 27027600.417,
60 | 63874310.5614,
61 | 39440408.51,
62 | 97176857.1716,
63 | 97438252.1171,
64 | 49728043.5056,
65 | 40818570.245,
66 | 41415831.8949,
67 | 24796297.4251,
68 | 2819085.3449,
69 | 84263963.4848,
70 | 74503228.6878,
71 | 67925677.403,
72 | 4758851.9417,
73 | 75227407.9214,
74 | 76946667.8403,
75 | 72518275.9469,
76 | 94167085.9588,
77 | 75883067.8321,
78 | 27389831.6101,
79 | 57987075.5053,
80 | 1298995.2674,
81 | 14590614.6939,
82 | 45292214.2242,
83 | 3332166.364,
84 | 53784167.729,
85 | 25193846.1867,
86 | 81456965.477,
87 | 68532032.39,
88 | 73820009.7952,
89 | 57736110.5717,
90 | 37304166.7363,
91 | 20054244.864,
92 | 29746392.7397,
93 | 86467624.6,
94 | 45192685.8793,
95 | 44008816.5186,
96 | 1861872.8736,
97 | 14595859.467,
98 | 87795257.6703,
99 | 57768720.8303,
100 | 18290154.3126,
101 | 45893183.44,
102 | 63052200.6225,
103 | 69032152.6897,
104 | 3748217.6946,
105 | 75449850.474,
106 | 37111527.415,
107 | 84852536.859,
108 | 32906366.487,
109 | 27027600.417,
110 | 63874310.5614,
111 | 39440408.51,
112 | 97176857.1716,
113 | 97438252.1171,
114 | 49728043.5056,
115 | 40818570.245,
116 | 41415831.8949,
117 | 24796297.4251,
118 | 2819085.3449,
119 | 84263963.4848,
120 | 74503228.6878,
121 | 67925677.403,
122 | 4758851.9417,
123 | 75227407.9214,
124 | 76946667.8403,
125 | 72518275.9469,
126 | 94167085.9588,
127 | 75883067.8321,
128 | 27389831.6101,
129 | 57987075.5053,
130 | 1298995.2674,
131 | 80858801.2712,
132 | 98262252.4656,
133 | 51612877.944,
134 | 33397812.7835,
135 | 36089655.3049,
136 | 50164685.8153,
137 | 16852105.5192,
138 | 61171929.752,
139 | 86376339.7175,
140 | 73009014.5521,
141 | 7397302.331,
142 | 34345128.9589,
143 | 98343269.4418,
144 | 95039116.9058,
145 | 44833102.5752,
146 | 51052997.8873,
147 | 22719195.6783,
148 | 64883244.8699
149 | ]
150 | },
151 | "nested2":{
152 | "integers":[
153 | 756509,
154 | 116117,
155 | 776378,
156 | 275045,
157 | 703447,
158 | 50156,
159 | 685803,
160 | 147958,
161 | 941747,
162 | 905651,
163 | 57367,
164 | 530248,
165 | 312888,
166 | 740951,
167 | 988947,
168 | 450154
169 | ],
170 | "float1":76.572,
171 | "float2":83.5299
172 | }
173 | },
174 | "strings":[
175 | "edx5XzRkPVeEW2MBQzQMcUSuMI4FntjhlJ9VGhQaBHKPEazAaT",
176 | "2fQUbzRUax4A",
177 | "jURcBZ0vrJcmf2roZUMzZJQoTsKZDIdj7KhO7itskKvM80jBU9",
178 | "8jKLmo3N2zYdKyTyfTczfr2x6bPaarorlnTNJ7r8lIkiZyBvrP",
179 | "jbUeAVOdBSPzYmYhH0sabUHUH39O5e",
180 | "I8yAQKZsyZhMfpzWjArQU9pQ6PfU6b14q2eWvQjtCUdgAUxFjg",
181 | "97N8ZmGcxRZO4ZabzRRcY4KVHqxJwQ8qY",
182 | "0DtY1aWXmUfJENt9rYW9",
183 | "DtpBUEppPwMnWexi8eIIxlXRO3GUpPgeNFG9ONpWJYvk8xBkVj",
184 | "YsX8V2xOrTw6LhNIMMhO4F4VXFyXUXFr66L3sTkLWgFA9NZuBV",
185 | "fKYYthv8iFvaYoFoYZyB",
186 | "zGuLsPXoJqMbO4PcePteZfDMYFXdWtvNF8WvaplXypsd6"
187 | ],
188 | "data1":"9EVqHm5ARqcEB5jq21v2g0jVcG9CXB0Abk7uAF4NHYyTzeF3TnHhpZBECD14U2bCJPyBY0JWDr1Tjh8gTB0sWUNjqYiWDxFzlx6S",
189 | "integers":[
190 | 756509,
191 | 116117,
192 | 776378,
193 | 275045,
194 | 703447,
195 | 50156,
196 | 685803,
197 | 147958,
198 | 941747,
199 | 905651,
200 | 57367,
201 | 530248,
202 | 312888,
203 | 740951,
204 | 988947,
205 | 450154
206 | ],
207 | "more_nested":{
208 | "integers":[
209 | 756509,
210 | 116117,
211 | 776378,
212 | 275045,
213 | 703447,
214 | 50156,
215 | 685803,
216 | 147958,
217 | 941747,
218 | 905651,
219 | 57367,
220 | 530248,
221 | 312888,
222 | 740951,
223 | 988947,
224 | 450154
225 | ],
226 | "float1":76.572,
227 | "float2":83.5299,
228 | "nested1":{
229 | "integers":[
230 | 756509,
231 | 116117,
232 | 776378,
233 | 275045,
234 | 703447,
235 | 50156,
236 | 685803,
237 | 147958,
238 | 941747,
239 | 905651,
240 | 57367,
241 | 530248,
242 | 312888,
243 | 740951,
244 | 988947,
245 | 450154
246 | ]
247 | },
248 | "nested2":{
249 | "strings":[
250 | "2fQUbzRUax4A",
251 | "jURcBZ0vrJcmf2roZUMzZJQoTsKZDIdj7KhO7itskKvM80jBU9",
252 | "8jKLmo3N2zYdKyTyfTczfr2x6bPaarorlnTNJ7r8lIkiZyBvrP",
253 | "jbUeAVOdBSPzYmYhH0sabUHUH39O5e",
254 | "I8yAQKZsyZhMfpzWjArQU9pQ6PfU6b14q2eWvQjtCUdgAUxFjg",
255 | "97N8ZmGcxRZO4ZabzRRcY4KVHqxJwQ8qY",
256 | "0DtY1aWXmUfJENt9rYW9",
257 | "DtpBUEppPwMnWexi8eIIxlXRO3GUpPgeNFG9ONpWJYvk8xBkVj",
258 | "YsX8V2xOrTw6LhNIMMhO4F4VXFyXUXFr66L3sTkLWgFA9NZuBV",
259 | "fKYYthv8iFvaYoFoYZyB",
260 | "zGuLsPXoJqMbO4PcePteZfDMYFXdWtvNF8WvaplXypsd6"
261 | ],
262 | "integers":[
263 | 756509,
264 | 116117,
265 | 776378,
266 | 57367,
267 | 530248,
268 | 312888,
269 | 740951,
270 | 988947,
271 | 450154
272 | ]
273 | }
274 | }
275 | }
--------------------------------------------------------------------------------
/spec8/tests/MediaContent.compact.json:
--------------------------------------------------------------------------------
1 | {"Media":{"uri":"http://javaone.com/keynote.mpg","title":"Javaone Keynote","width":640,"height":480,"format":"video/mpg4","duration":18000000,"size":58982400,"bitrate":262144,"persons":["Bill Gates","Steve Jobs"],"player":"JAVA","copyright":null},"Images":[{"uri":"http://javaone.com/keynote_large.jpg","title":"Javaone Keynote","width":1024,"height":768,"size":"LARGE"},{"uri":"http://javaone.com/keynote_small.jpg","title":"Javaone Keynote","width":320,"height":240,"size":"SMALL"}]}
--------------------------------------------------------------------------------
/spec8/tests/MediaContent.formatted.json:
--------------------------------------------------------------------------------
1 | {
2 | "Media":{
3 | "uri":"http://javaone.com/keynote.mpg",
4 | "title":"Javaone Keynote",
5 | "width":640,
6 | "height":480,
7 | "format":"video/mpg4",
8 | "duration":18000000,
9 | "size":58982400,
10 | "bitrate":262144,
11 | "persons":[
12 | "Bill Gates",
13 | "Steve Jobs"
14 | ],
15 | "player":"JAVA",
16 | "copyright":null
17 | },
18 | "Images":[
19 | {
20 | "uri":"http://javaone.com/keynote_large.jpg",
21 | "title":"Javaone Keynote",
22 | "width":1024,
23 | "height":768,
24 | "size":"LARGE"
25 | },
26 | {
27 | "uri":"http://javaone.com/keynote_small.jpg",
28 | "title":"Javaone Keynote",
29 | "width":320,
30 | "height":240,
31 | "size":"SMALL"
32 | }
33 | ]
34 | }
--------------------------------------------------------------------------------
/spec8/tests/TwitterTimeline.compact.json:
--------------------------------------------------------------------------------
1 | {"id_str":"121769183821312000","retweet_count":0,"in_reply_to_screen_name":null,"in_reply_to_user_id":null,"truncated":false,"retweeted":false,"possibly_sensitive":false,"in_reply_to_status_id_str":null,"entities":{"urls":[{"url":"http:\/\/t.co\/wtioKkFS","display_url":"dlvr.it\/pWQy2","indices":[33,53],"expanded_url":"http:\/\/dlvr.it\/pWQy2"}],"hashtags":[],"user_mentions":[]},"geo":null,"place":null,"coordinates":null,"created_at":"Thu Oct 06 02:10:10 +0000 2011","in_reply_to_user_id_str":null,"user":{"id_str":"77029015","profile_link_color":"009999","protected":false,"url":"http:\/\/www.techday.co.nz\/","screen_name":"techdaynz","statuses_count":5144,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/1479058408\/techday_48_normal.jpg","name":"TechDay","default_profile_image":false,"default_profile":false,"profile_background_color":"131516","lang":"en","profile_background_tile":false,"utc_offset":43200,"description":"","is_translator":false,"show_all_inline_media":false,"contributors_enabled":false,"profile_background_image_url_https":"https:\/\/si0.twimg.com\/profile_background_images\/75893948\/Techday_Background.jpg","created_at":"Thu Sep 24 20:02:01 +0000 2009","profile_sidebar_fill_color":"efefef","follow_request_sent":false,"friends_count":3215,"followers_count":3149,"time_zone":"Auckland","favourites_count":0,"profile_sidebar_border_color":"eeeeee","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/1479058408\/techday_48_normal.jpg","following":false,"geo_enabled":false,"notifications":false,"profile_use_background_image":true,"listed_count":151,"verified":false,"profile_text_color":"333333","location":"Ponsonby, Auckland, NZ","id":77029015,"profile_background_image_url":"http:\/\/a0.twimg.com\/profile_background_images\/75893948\/Techday_Background.jpg"},"contributors":null,"source":"\u003Ca href=\"http:\/\/dlvr.it\" 
rel=\"nofollow\"\u003Edlvr.it\u003C\/a\u003E","in_reply_to_status_id":null,"favorited":false,"id":121769183821312000,"text":"Apple CEO's message to employees http:\/\/t.co\/wtioKkFS"}
--------------------------------------------------------------------------------
/spec8/tests/TwitterTimeline.formatted.json:
--------------------------------------------------------------------------------
1 | {
2 | "id_str":"121769183821312000",
3 | "retweet_count":0,
4 | "in_reply_to_screen_name":null,
5 | "in_reply_to_user_id":null,
6 | "truncated":false,
7 | "retweeted":false,
8 | "possibly_sensitive":false,
9 | "in_reply_to_status_id_str":null,
10 | "entities":{
11 | "urls":[
12 | {
13 | "url":"http:\/\/t.co\/wtioKkFS",
14 | "display_url":"dlvr.it\/pWQy2",
15 | "indices":[
16 | 33,
17 | 53
18 | ],
19 | "expanded_url":"http:\/\/dlvr.it\/pWQy2"
20 | }
21 | ],
22 | "hashtags":[
23 |
24 | ],
25 | "user_mentions":[
26 |
27 | ]
28 | },
29 | "geo":null,
30 | "place":null,
31 | "coordinates":null,
32 | "created_at":"Thu Oct 06 02:10:10 +0000 2011",
33 | "in_reply_to_user_id_str":null,
34 | "user":{
35 | "id_str":"77029015",
36 | "profile_link_color":"009999",
37 | "protected":false,
38 | "url":"http:\/\/www.techday.co.nz\/",
39 | "screen_name":"techdaynz",
40 | "statuses_count":5144,
41 | "profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/1479058408\/techday_48_normal.jpg",
42 | "name":"TechDay",
43 | "default_profile_image":false,
44 | "default_profile":false,
45 | "profile_background_color":"131516",
46 | "lang":"en",
47 | "profile_background_tile":false,
48 | "utc_offset":43200,
49 | "description":"",
50 | "is_translator":false,
51 | "show_all_inline_media":false,
52 | "contributors_enabled":false,
53 | "profile_background_image_url_https":"https:\/\/si0.twimg.com\/profile_background_images\/75893948\/Techday_Background.jpg",
54 | "created_at":"Thu Sep 24 20:02:01 +0000 2009",
55 | "profile_sidebar_fill_color":"efefef",
56 | "follow_request_sent":false,
57 | "friends_count":3215,
58 | "followers_count":3149,
59 | "time_zone":"Auckland",
60 | "favourites_count":0,
61 | "profile_sidebar_border_color":"eeeeee",
62 | "profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/1479058408\/techday_48_normal.jpg",
63 | "following":false,
64 | "geo_enabled":false,
65 | "notifications":false,
66 | "profile_use_background_image":true,
67 | "listed_count":151,
68 | "verified":false,
69 | "profile_text_color":"333333",
70 | "location":"Ponsonby, Auckland, NZ",
71 | "id":77029015,
72 | "profile_background_image_url":"http:\/\/a0.twimg.com\/profile_background_images\/75893948\/Techday_Background.jpg"
73 | },
74 | "contributors":null,
75 | "source":"\u003Ca href=\"http:\/\/dlvr.it\" rel=\"nofollow\"\u003Edlvr.it\u003C\/a\u003E",
76 | "in_reply_to_status_id":null,
77 | "favorited":false,
78 | "id":121769183821312000,
79 | "text":"Apple CEO's message to employees http:\/\/t.co\/wtioKkFS"
80 | }
--------------------------------------------------------------------------------
/spec8/thanks.rst:
--------------------------------------------------------------------------------
1 |
2 | Thanks
3 | ======
4 |
5 | Below is a list of people who have submitted specific contributions,
6 | corrections and implementations to help make the Universal Binary JSON
7 | specification better.
8 |
9 | Thank you all!
10 |
11 | * `Alex Blewitt `_
12 |
13 | Helped catch a number of specification errors around UTF-8 encoding in the
14 | original draft of the specification that would have been confusing/nasty to
15 | release. He also provided great feedback about the size and performance
16 | metrics for the specification.
17 |
18 | * `Alexander Shorin `_
19 |
20 | Alex is both the author of the Python library and a valued collaborator on the
21 | Universal Binary JSON spec as it matured. Alex provided instrumental insight
22 | into the modifications made between Draft 8 and Draft 9 of the spec to help
23 | simplify the spec by removing all the duplicate (compact) type
24 | representations, simplifying the length-arguments for `STRING` and `HUGE` as
25 | well as being the one to point out that the length arguments for the `ARRAY`
26 | and `OBJECT` container types are effectively useless once the streaming-format
27 | support was added (and do not make generator code or parsing code any easier
28 | or more performant).
29 |
30 | * `John Cowan `_
31 |
32 | John was the one that recommended using UTF-8 string-encoded values
33 | (or `huge`) for arbitrarily huge numbers after seeing my desire to avoid
34 | including any non-portable constructs into the binary format.
35 |
36 | Given that the discussion on numeric formats had been a very active one with
37 | lots of feelings on all sides, it was a boon to have John step up with such a
38 | simple suggestion that allowed for maximum compatibility and portability.
39 | It was a win-win all the way around.
40 |
41 | * `Michael Makarenko `_ (aka “M1xA”)
42 |
43 | Michael is the author behind the `Ubjson.NET `_
44 | library and contributor of the `int16` and `float` numeric types to the
45 | specification. For numeric-heavy (e.g. scientific) data, the inclusion of the
46 | `int16` and `float` types can lead to significant space savings when writing
47 | out values in the Universal Binary JSON format.
48 |
49 | Michael has also gone to great lengths to make the .NET implementation of
50 | UBJSON as tight and performant as possible; collaborating on benchmark design
51 | and testing data as well as compatibility testing between implementations to
52 | ensure a great Universal Binary JSON experience for .NET developers.
53 |
54 | In addition to development, Michael has helped contribute to the growth of the
55 | Universal Binary JSON community with
56 | `articles about the specification `_.
57 |
58 | * `Paul Davis `_
59 |
60 | While approaching the CouchDB team for feedback on the Universal Binary JSON
61 | spec, I met Paul, who was willing to spend a significant amount of time
62 | reviewing the specification and suggesting changes and improvements based
63 | on everything the CouchDB team has learned by dealing closely with JSON
64 | for years.
65 |
66 | Paul was the brains behind the compact type representation
67 | (``s``, ``h``, ``a`` and ``o``), which uses a single byte instead of 4 bytes
68 | to represent the length of an entity. The spec had originally avoided this
69 | due to complexity, but as Paul clarified, at scale (e.g. a huge CouchDB data
70 | store) those few bytes can really add up in data sets that work with very
71 | small values (like string keywords).
72 |
73 | Paul also pointed out the shortcomings of prefixing the length to the two
74 | container types: the specification could not be used easily with services
75 | or apps that stream UBJ format for huge runs of data that the server
76 | cannot load, buffer and count ahead of time before responding to the client.
77 | To support streaming more easily, unknown-length container types had
78 | to be added.
79 |
80 | Paul also pointed out the importance of a ``NO_OP``/``SKIP``/``IGNORE`` type
81 | that can be useful during a long-lived streaming operation where the server
82 | may be waiting on something (like a DB) and you need to keep the connection
83 | alive between client/server and avoid the client timing out, but you need the
84 | client to know the data it is receiving is just meant as a “Hang on” message
85 | from the server and not actual data. This is where the ``NO_OP`` command comes
86 | in handy.
87 |
88 | * `Stephan Beal `_
89 |
90 | Stephan helped quite a bit with understanding the implications of a >= 64-bit
91 | numeric format and the implications of portability across a number of popular
92 | platforms.
93 |
94 | * `JSON Specification Group `_
95 |
96 | I would like to personally thank everyone in the JSON Specification Group.
97 | The amount of feedback and help with the specification has been wonderful,
98 | constructive and creative. It also led to one of the busiest conversations
99 | in the last year!
100 |
--------------------------------------------------------------------------------
/spec8/type_reference.rst:
--------------------------------------------------------------------------------
1 |
2 | Type reference
3 | ++++++++++++++
4 |
5 | The table below is a quick reference for folks working closely with the
6 | Universal Binary JSON format who want all the information at their fingertips:
7 |
8 | +-----------------+--------------------------+--------+---------+--------------+
9 | | Type            | Size                     | Marker | Length? | Data?        |
10 | +=================+==========================+========+=========+==============+
11 | | null            | 1-byte                   | Z      | No      | No           |
12 | +-----------------+--------------------------+--------+---------+--------------+
13 | | true            | 1-byte                   | T      | No      | No           |
14 | +-----------------+--------------------------+--------+---------+--------------+
15 | | false           | 1-byte                   | F      | No      | No           |
16 | +-----------------+--------------------------+--------+---------+--------------+
17 | | byte            | 2-bytes                  | B      | No      | Yes          |
18 | +-----------------+--------------------------+--------+---------+--------------+
19 | | int16           | 3-bytes                  | i      | No      | Yes          |
20 | +-----------------+--------------------------+--------+---------+--------------+
21 | | int32           | 5-bytes                  | I      | No      | Yes          |
22 | +-----------------+--------------------------+--------+---------+--------------+
23 | | int64           | 9-bytes                  | L      | No      | Yes          |
24 | +-----------------+--------------------------+--------+---------+--------------+
25 | | float (32-bit)  | 5-bytes                  | d      | No      | Yes          |
26 | +-----------------+--------------------------+--------+---------+--------------+
27 | | double (64-bit) | 9-bytes                  | D      | No      | Yes          |
28 | +-----------------+--------------------------+--------+---------+--------------+
29 | | huge (number)   | 2-bytes                  | h      | Yes     | Yes,         |
30 | |                 | + byte length of string  |        |         | if non-empty |
31 | +-----------------+--------------------------+--------+---------+--------------+
32 | | huge (number)   | 5-bytes                  | H      | Yes     | Yes,         |
33 | |                 | + byte length of string  |        |         | if non-empty |
34 | +-----------------+--------------------------+--------+---------+--------------+
35 | | string          | 2-bytes                  | s      | Yes     | Yes,         |
36 | |                 | + byte length of string  |        |         | if non-empty |
37 | +-----------------+--------------------------+--------+---------+--------------+
38 | | string          | 5-bytes                  | S      | Yes     | Yes,         |
39 | |                 | + byte length of string  |        |         | if non-empty |
40 | +-----------------+--------------------------+--------+---------+--------------+
41 | | array           | 2-bytes                  | a      | Yes     | Yes,         |
42 | |                 | + size of child elements |        |         | if non-empty |
43 | +-----------------+--------------------------+--------+---------+--------------+
44 | | array           | 5-bytes                  | A      | Yes     | Yes,         |
45 | |                 | + size of child elements |        |         | if non-empty |
46 | +-----------------+--------------------------+--------+---------+--------------+
47 | | object          | 2-bytes                  | o      | Yes     | Yes,         |
48 | |                 | + size of child elements |        |         | if non-empty |
49 | +-----------------+--------------------------+--------+---------+--------------+
50 | | object          | 5-bytes                  | O      | Yes     | Yes,         |
51 | |                 | + size of child elements |        |         | if non-empty |
52 | +-----------------+--------------------------+--------+---------+--------------+
53 | | noop            | 1-byte                   | N      | No      | No           |
54 | +-----------------+--------------------------+--------+---------+--------------+
55 | | end             | 1-byte                   | E      | No      | No           |
56 | +-----------------+--------------------------+--------+---------+--------------+
57 |
58 | Numeric Types
59 | =============
60 |
61 | All numeric types are signed.
62 |
63 | floats (32-bit)
64 | ---------------
65 |
66 | All 32-bit float values are written into the binary format using the
67 | `IEEE 754 single precision floating point format`_, which is the following
68 | structure:
69 |
70 | * Bit 31 (1 bit) – sign
71 | * Bit 30-23 (8 bits) – exponent
72 | * Bit 22-0 (23 bits) – fraction (significand)
73 |
74 | doubles (64-bit)
75 | ----------------
76 |
77 | All 64-bit double values are written into the binary format using the
78 | `IEEE 754 double precision floating point format`_, which is the following
79 | structure:
80 |
81 | * Bit 63 (1 bit) – sign
82 | * Bit 62-52 (11 bits) – exponent
83 | * Bit 51-0 (52 bits) – fraction (significand)
84 |
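As a rough illustration of the two bit layouts above, the following Python sketch splits a value into its sign, exponent and fraction fields using the standard `struct` module. The helper names `float32_fields` and `float64_fields` are hypothetical and exist only for this example; the exponent is returned in its stored (biased) form.

```python
import struct


def float32_fields(value):
    """Return (sign, exponent, fraction) of the IEEE 754 single-precision encoding."""
    # Pack as a 32-bit float, then reinterpret the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31               # bit 31
    exponent = (bits >> 23) & 0xFF  # bits 30-23
    fraction = bits & 0x7FFFFF      # bits 22-0
    return sign, exponent, fraction


def float64_fields(value):
    """Return (sign, exponent, fraction) of the IEEE 754 double-precision encoding."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    sign = bits >> 63                  # bit 63
    exponent = (bits >> 52) & 0x7FF    # bits 62-52
    fraction = bits & 0xFFFFFFFFFFFFF  # bits 51-0
    return sign, exponent, fraction


print(float32_fields(-1.5))  # (1, 127, 4194304)
print(float64_fields(-1.5))  # (1, 1023, 2251799813685248)
```

For -1.5 the sign bit is set, the stored exponent equals the bias (127 or 1023) and only the top fraction bit is set, which matches the field widths listed above.
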
85 | String Encoding
86 | ===============
87 |
88 | All `string` values (which includes `huge` values since they are string-encoded)
89 | must be `UTF-8`_ encoded.
90 |
91 | This provides a `number of advantages`_ and inter-compatibility across systems and
92 | alternative data formats.
93 |
94 | Arrays & Objects
95 | ================
96 |
97 | The `length` argument specified is the `number of child elements` the parent
98 | container contains. A `child element` is defined as:
99 |
100 | * in an `object`, a single name-value pair.
101 | * in an `array`, a single value.
102 |
103 | For example:
104 |
105 | * if an array contains 4 integers, the `length` of that array is 4.
106 | * if an object contains 4 name-value pairs, the `length` of that object is 4.
107 | * if an array contains 13 `User objects`, the `length` of the array is 13.
108 | * if an object contains 7 arrays, the `length` of the object is 7.
109 |
110 | .. note::
111 |
112 | Universal Binary JSON is a :ref:`streaming-friendly ` specification
113 | and supports the use of :ref:`unknown-length container `
114 | types if you need them!
115 |
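The counting rule above can be sketched in Python as follows. This is a minimal illustration, not a reference encoder: the function names are hypothetical, only the compact ``a``/``o``/``s`` markers and `int32` values are handled, and big-endian byte order for the `int32` payload is an assumption that this page does not state.

```python
import struct


def encode_int32(number):
    # Marker "I" followed by 4 bytes (big-endian is assumed here).
    return b"I" + struct.pack(">i", number)


def encode_short_string(text):
    data = text.encode("utf-8")
    if len(data) > 254:
        raise ValueError("this sketch only handles the compact 's' form")
    return b"s" + bytes([len(data)]) + data


def encode_array(values):
    # The length is the number of values in the array.
    if len(values) > 254:
        raise ValueError("this sketch only handles the compact 'a' form")
    body = b"".join(encode_int32(v) for v in values)
    return b"a" + bytes([len(values)]) + body


def encode_object(pairs):
    # The length is the number of name-value pairs, not names plus values.
    if len(pairs) > 254:
        raise ValueError("this sketch only handles the compact 'o' form")
    body = b"".join(
        encode_short_string(name) + encode_int32(value)
        for name, value in pairs.items()
    )
    return b"o" + bytes([len(pairs)]) + body


array_bytes = encode_array([1, 2, 3, 4])
object_bytes = encode_object({"w": 640, "h": 480})
print(array_bytes[:2])   # b'a\x04'
print(object_bytes[:2])  # b'o\x02'
```

An object with two name-value pairs writes a length of 2 even though four encoded values follow the header.
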
116 | Support for ‘huge’ Numeric Type
117 | ===============================
118 |
119 | The huge data type is an ultra-portable mechanism by which arbitrarily long
120 | numbers ``> 64-bit`` in size (integer or decimal) can be passed between systems
121 | that support them and degraded gracefully in systems that do not support them.
122 |
123 | .. note::
124 |
125 | `huge` values are **only** meant to store values ``> 64-bit`` in size.
126 | It is in violation of the Universal Binary JSON specification to store a value
127 | ``<= 64-bit`` in size as a huge.
128 |
129 | This design was chosen intentionally as it greatly simplifies (and optimizes)
130 | the generation and parsing code for the UBJ format as no introspection of the
131 | `huge` value is necessary for a platform to try and marshal them into a
132 | smaller format.
133 |
134 | This keeps the parsing code simple: it either creates an arbitrarily large
135 | number out of the value (e.g. `BigDecimal`_ in Java), returns an error to the
136 | caller because of an unsupported type, or optionally skips the unsupported data
137 | and continues parsing.
138 |
139 | `huge` values must be written out in accordance with the original
140 | `JSON number specification`_.
141 |
142 | Many programming languages have native support for arbitrarily large numbers,
143 | but many do not. If you are working in an environment that does not support
144 | numbers > 64-bit in size, please see our recommendation on handling them in the
145 | :ref:`Best Practices ` section.
146 |
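A writer following the rule above might decide between `int64` and `huge` roughly as in this Python sketch. The function names are hypothetical, big-endian byte order for the fixed-width payloads is an assumption, and every in-range integer is written as `int64` purely to keep the example short.

```python
import struct
from decimal import Decimal

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def encode_integer(number):
    # Values that fit in 64 bits must not be written as a huge.
    if INT64_MIN <= number <= INT64_MAX:
        return b"L" + struct.pack(">q", number)
    # Anything larger is written as its UTF-8 encoded decimal text.
    digits = str(number).encode("utf-8")
    if len(digits) <= 254:
        return b"h" + bytes([len(digits)]) + digits
    return b"H" + struct.pack(">I", len(digits)) + digits


def decode_huge(payload):
    # Python has arbitrary-precision numbers, so no value is rejected here;
    # a platform without them could raise an error or skip the bytes instead.
    return Decimal(payload.decode("utf-8"))


encoded = encode_integer(2 ** 63)
print(encoded)                   # b'h\x139223372036854775808'
print(decode_huge(encoded[2:]))  # 9223372036854775808
```

Because the reader never has to inspect a `huge` to see whether it would fit a smaller type, `decode_huge` stays a single conversion call.
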
147 | Optimized Storage Size
148 | ======================
149 |
150 | All variable-length value types (`string`, `huge`, `array`, `object`) have a
151 | more compact representation using 1-byte (instead of 4-bytes) for the `length`
152 | argument when the `length` value is <= 254.
153 |
154 | These more compact types always use the lowercased version of the `marker`
155 | ASCII char. For example, ``a`` for `array`, ``s`` for `string` and so on.
156 |
157 | .. warning::
158 |
159 | When using the compact representations of these different types, remember that
160 | the `length` must be ``<= 254`` because the `length` of 255 (``0xFF``) has a
161 | special meaning when it comes to `array` and `object` types.
162 |
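Choosing between the compact and the full representation could look like the Python sketch below, shown for `string` only. The name `encode_string` is hypothetical, and big-endian byte order for the 4-byte length is an assumption that this page does not state.

```python
import struct


def encode_string(text):
    data = text.encode("utf-8")  # the length counts bytes, not characters
    if len(data) <= 254:
        # Compact form: lowercase marker and a 1-byte length.
        return b"s" + bytes([len(data)]) + data
    # Full form: uppercase marker and a 4-byte length.
    return b"S" + struct.pack(">I", len(data)) + data


print(encode_string("Hello World"))  # b's\x0bHello World'
print(encode_string("x" * 255)[:5])  # b'S\x00\x00\x00\xff'
```

A 255-byte string already switches to the ``S`` form, so the 1-byte length never takes the reserved value ``0xFF``.
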
163 | noop and Streaming Support
164 | ==========================
165 |
166 | The :ref:`noop type ` is a general purpose type that has no meaning, but
167 | is most commonly used in streaming scenarios where a server must send a client
168 | a `keep alive` message.
169 |
170 | To support this use-case, the specification needed to support a special type
171 | that meant nothing, so a server and client could make use of it without
172 | polluting the actual data that was being exchanged.
173 |
174 | .. warning::
175 |
176 | The `noop` type can be used for other purposes or signals as well, but it is
177 | defined to have no value and no effect on the data it may be included in.
178 |
179 | The `noop` type is meant to be sent between discrete values in a streaming
180 | scenario and can never be sent inside of the byte-data that makes up a single
181 | value.
182 |
183 | For example, if a server is writing a string “Hello World” back to the client,
184 | the server must write the entire ``[s][11][Hello World]`` sequence back to the
185 | client unbroken; a `noop` marker cannot be sent inside of that value.
186 |
187 | `noop` markers must only be written between values being transmitted (e.g.
188 | between values in an `array` or between the name and value pair inside of an
189 | `object`).
190 |
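On the reading side, the rule that `noop` only appears between values can be sketched in Python as below. This is a deliberately small, hypothetical reader (`read_values` is not part of any library) that understands only the ``Z``, ``T``, ``F``, ``s`` and ``N`` markers.

```python
def read_values(data):
    """Decode a flat run of values, ignoring noop markers between them."""
    values = []
    pos = 0
    while pos < len(data):
        marker = data[pos:pos + 1]
        pos += 1
        if marker == b"N":
            continue  # keep-alive: no value and no effect on the data
        if marker == b"Z":
            values.append(None)
        elif marker == b"T":
            values.append(True)
        elif marker == b"F":
            values.append(False)
        elif marker == b"s":
            length = data[pos]
            pos += 1
            # The string bytes are consumed as one unbroken unit, so an "N"
            # byte inside them is ordinary text, never a noop marker.
            values.append(data[pos:pos + length].decode("utf-8"))
            pos += length
        else:
            raise ValueError("marker not handled by this sketch: %r" % marker)
    return values


stream = b"N" + b"s\x0bHello World" + b"NN" + b"T"
print(read_values(stream))       # ['Hello World', True]
print(read_values(b"s\x02NN"))   # ['NN']
```

The second call shows why a `noop` cannot be sent inside a value: once the length has been read, the reader treats the next bytes as payload regardless of their content.
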
191 | Examples
192 | ========
193 |
194 | Please see the :ref:`value_types` and :ref:`container_types` sections of the
195 | specification for examples.
196 |
197 | .. _IEEE 754 single precision floating point format: http://en.wikipedia.org/wiki/IEEE_754-1985
198 | .. _IEEE 754 double precision floating point format: http://en.wikipedia.org/wiki/Double_precision_floating-point_format#Double_precision_binary_floating-point_format
199 | .. _UTF-8: http://en.wikipedia.org/wiki/UTF-8
200 | .. _number of advantages: http://en.wikipedia.org/wiki/UTF-8#Advantages
201 | .. _BigDecimal: http://download.oracle.com/javase/6/docs/api/java/math/BigDecimal.html
202 | .. _JSON number specification: http://json.org
203 |
--------------------------------------------------------------------------------