├── README.md ├── newspec_notes ├── spec12 ├── binary-data.html ├── contact.html ├── container-types.html ├── developer-resources.html ├── index.html ├── libraries.html ├── thanks.html ├── type-reference.html └── value-types.html └── spec8 ├── Makefile ├── _static └── .keep ├── conf.py ├── index.html ├── index.rst ├── libraries.rst ├── make.bat ├── spec.rst ├── tests ├── CouchDB4k.compact.json ├── CouchDB4k.formatted.json ├── MediaContent.compact.json ├── MediaContent.formatted.json ├── TwitterTimeline.compact.json └── TwitterTimeline.formatted.json ├── thanks.rst └── type_reference.rst /README.md: -------------------------------------------------------------------------------- 1 | Universal Binary JSON 2 | ===================== 3 | 4 | Community workspace for the [Universal Binary JSON Specification][ubjson]. 5 | 6 | Introduction 7 | ============ 8 | 9 | [JSON][json] has become a ubiquitous text-based file format for 10 | data interchange. Its simplicity, ease of processing and (relatively) rich data 11 | typing made it a natural choice for many developers needing to store or shuffle 12 | data between systems quickly and easy. 13 | 14 | Unfortunately, marshaling native programming language constructs in and out of 15 | a text-based representations does have a measurable processing cost associated 16 | with it. 17 | 18 | In high-performance applications, avoiding the text-processing step of JSON can 19 | net big wins in both processing time and size reduction of stored information, 20 | which is where a binary JSON format becomes helpful. 
21 | 22 | Why 23 | === 24 | 25 | Attempts to make using JSON faster through binary specifications like 26 | [BSON][bson], [BJSON][bjson] or [Smile][smile] exist, but have been rejected 27 | from mass-adoption for two reasons: 28 | 29 | * Custom (Binary-Only) Data Types: 30 | Inclusion of custom data types that have no ancillary in the original JSON 31 | spec, leaving room for incompatibilities to exist as different implementations 32 | of the spec handle the binary-only data types differently. 33 | * Complexity: Some specifications provide higher performance or smaller 34 | representations at the cost of a much more complex specification, making 35 | implementations more difficult which can slow or block adoption. One of the key 36 | reasons JSON became as popular as it did was because of its ease of use. 37 | 38 | Goals 39 | ===== 40 | 41 | The Universal Binary JSON specification has 3 goals: 42 | 43 | 1. **Universal Compatibility** 44 | 45 | Meaning absolute compatibility with the JSON spec itself as well as only 46 | utilizing data types that are natively supported in all popular programming 47 | languages. 48 | 49 | This allows 1:1 transforms between standard JSON and Universal Binary JSON as 50 | well as efficient representation in all popular programming languages without 51 | requiring parser developers to account for strange data types that their 52 | language may not support. 53 | 54 | 2. **Ease of Use** 55 | 56 | The Universal Binary JSON specification is intentionally defined using a 57 | single core data structure to build up the entire specification. 58 | 59 | This accomplishes two things: it allows the spec to be understood quickly and 60 | allows developers to write trivially simple code to take advantage of it or 61 | interchange data with another system utilizing it. 62 | 63 | 3. 
**Speed / Efficiency** 64 | 65 | Typically the motivation for using a binary specification over a text-based 66 | one is speed and/or efficiency, so strict attention was paid to selecting data 67 | constructs and representations that are (roughly) 30% smaller than their 68 | compacted JSON counterparts and optimized for fast parsing. 69 | 70 | Got interested? Find more at [http://ubjson.org][ubjson] 71 | 72 | [ubjson]: http://ubjson.org 73 | [json]: http://json.org 74 | [bson]: http://bsonspec.org 75 | [bjson]: http://bjson.org 76 | [smile]: http://wiki.fasterxml.com/SmileFormat 77 | -------------------------------------------------------------------------------- /newspec_notes: -------------------------------------------------------------------------------- 1 | 2 | 3 | "alpha" spec: type+count,object+array, 4 | 5 | T,F 6 | Z,N 7 | C (0-127 byte) 8 | H (string) 9 | S (string) 10 | i,U 11 | I 12 | l 13 | L 14 | d 15 | D 16 | [ 17 | { 18 | 19 | 20 | https://realtimelogic.com/ba/doc/en/C/reference/html/ubjson_8h_source.html "alpha"...possibly TC swap? 21 | https://github.com/Steve132/ubj "alpha" +array draft 22 | https://bitbucket.org/tsieprawski/ubjsc "alpha"..TC swap? 
23 | https://github.com/WhiZTiM/UbjsonCpp "alpha" 24 | https://github.com/dinocore1/ubjson "alpha" 25 | http://iso2mesh.sourceforge.net/cgi-bin/index.cgi?jsonlab "alpha" Incorrectly seperates #$ as independent 26 | http://libgdx.badlogicgames.com/nightlies/docs/api/com/badlogic/gdx/utils/UBJsonReader.html "alpha" +support for "a" which is wrong 27 | https://sourceforge.net/p/protoc/wiki/Home/ "pre-alpha" "Draft-9" 28 | https://github.com/Iotic-Labs/py-ubjson "pre-alpha" "Draft-9" 29 | https://github.com/dizews/php-ubjson "pre-alpha" "Draft-9" 30 | https://code.google.com/archive/p/simpleubjson/source/default/source "pre-alpha" "Draft-8/9" 31 | https://github.com/artcompiler/L16 "pre-alpha" "Draft-8" 32 | https://github.com/adilbaig/ubjsond "pre-alpha" "Draft-8" 33 | https://github.com/ubjson/universal-binary-json-java "pre-alpha" "Draft-8" 34 | http://ubjsonnet.codeplex.com/ "pre-alpha" "Draft-8" 35 | https://github.com/Sannis/node-ubjson "pre-alpha" "Draft-8" 36 | -------------------------------------------------------------------------------- /spec12/binary-data.html: -------------------------------------------------------------------------------- 1 | Page not organized well and under development, but here are the highlights... 2 |

Overview

3 | Support for binary data in the Universal Binary JSON specification was in discussion for 2 years before it was finalized. Many, many different approaches were considered and discarded all in the name of maintaining compatibility with JSON while keeping an eye on performance. 4 | 5 | The result is a surprisingly simple and binary-efficient construct that is also easily translated to JSON and back to UBJSON again with the help of a good library, namely: a strongly-typed array of uint8 values. 6 |

Compatibility with JSON

7 | Representing binary data efficiently in Universal Binary JSON while still maintaining compatibility with JSON is deceptively simple: leverage a strongly-typed array of uint8 values -- essentially a list of integers. 8 | 9 | There is no explicit binary type, but instead the ability to represent binary inside of Universal Binary JSON in a very optimized and JSON-compatible construct. 10 | 11 | The #1 goal of Universal Binary JSON is compatibility with JSON. Compatibility is defined as: 12 |
if 
13 |     A.ubjson -> translated to -> B.json
14 |     &&
15 |     B.json -> translated to -> C.ubjson
16 | then
17 |     A.ubjson == C.ubjson
18 | All of the Universal Binary JSON value and container types are 1:1 compatible with JSON. The only semantically (but not structurally) incompatible construct in UBJSON is strongly-typed containers in that once the container is converted to JSON the typing of the container is lost. Converting the container back to UBJSON and re-enabling the strong-typing does require assistance from the encoding library. 19 | 20 | Since JSON has no direct support for binary data or this style of strongly-typed container, the translation to JSON converts the strongly-typed array to an array of simple JSON types - in the case of binary data, it would be an array of number values (In the example above this is the translation step from A.ubjson to B.json). 21 | 22 | Going from JSON back to UBJSON (B.json -> C.ubjson) has the potential for losing the strongly-typed container information and has to be handled with care to re-enable the optimized representation of that information back in the UBJSON format. 23 |

Library Implementation Recommendation

24 | The library implementors are encouraged to provide this functionality in the form of two optional settings that can be turned on during generation: 25 | 29 | [box type="info"]Specific naming and implementation is up to the developer. This is merely a suggestion on how to handle this situation as elegantly as possible for the client.[/box] 30 | 31 | The idea being that the library can either make an automated attempt at reconstructing the strongly typed containers OR if you have a lot of knowledge of your data, you can force the library to reconstitute what looks to be a strongly typed container based on the fist element type. 32 | 33 | [box type="alert"]If Force is used the library should take care to detect and fail if a different type of value is found in the container during generation. More specifically, the library should remember the first element type and continue checking types as it is generating UBJSON to ensure the type continues to stay consistent.[/box] 34 | 35 | Still under development... 36 |

Performance Considerations

37 | Something to be aware of when converting UBJSON containing a large amount of binary data is that each strongly-typed container of uint8 values will convert to a JSON array of number values, because this translation also introduces a ',' character between every value in the array, this effectively doubles the size of the binary data. -------------------------------------------------------------------------------- /spec12/contact.html: -------------------------------------------------------------------------------- 1 | Please use the form below, email rkalla@gmail.com, post on the Google Group or file an issue on GitHub! I really would like to get any comments, questions or feedback on the specification you think is important to share. UBJSON will only be successful through the passion of many. 2 | 3 | If you are using the Universal Binary JSON format in an application we'd love to hear about it or if you wrote a library to add support for it to your favorite language please let us know and we'll add it to the site! 4 | -------------------------------------------------------------------------------- /spec12/container-types.html: -------------------------------------------------------------------------------- 1 | The Universal Binary JSON Specification defines a total of 2 container types matching JSON's container types: 2 |
    3 |
  1. Array Type
  2. 4 |
  3. Object Type
  4. 5 |
6 | Ignoring special-case optimizations, the design of the Universal Binary JSON containers is intentionally identical to JSON (the same start/end markers) and are streaming-friendly; more specifically they can be written out on-demand without knowing the size of the container ahead of time. 7 |

Optimized Format

8 | Both array and object container types in UBJSON support being represented in a more optimized format that can increase parsing performance as well as shrink data size in most cases (without compression). 9 | 10 | Please see Optimized Format below for details on how to leverage this support. 11 |

Array Type

12 | 13 |
14 | 15 | 19 | The array type in Universal Binary JSON is defined as: 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 |
TypeSizeMarkerLengthData Payload
array 2+ bytes** [ and ] Optional Yes (if non-empty)
40 | ** See Optimized Format below. 41 |

Usage

42 | The array type in Universal Binary JSON is equivalent to the array type from the JSON specification. 43 |

Example

44 | JSON snippet (42 bytes compacted): 45 |
[
 46 |     null,
 47 |     true,
 48 |     false,
 49 |     4782345193,
 50 |     153.132,
 51 |     "ham"
 52 | ]
53 | UBJSON snippet (21 bytes, 50% smaller): 54 |
[[]
 55 |     [Z]
 56 |     [T]
 57 |     [F]
 58 |     [l][4782345193]
 59 |     [d][153.132]
 60 |     [S][i][3][ham]
 61 | []]
62 | [box type="tick"]Universal Binary JSON format is 50% smaller than the compacted JSON.[/box] 63 |

Object Type

64 | 65 |
66 | 67 | 71 | The object type in Universal Binary JSON is defined as: 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 |
TypeSizeMarkerLengthData Payload
object 2+ bytes** { and } Optional Yes (if non-empty)
92 | ** See Optimized Format below. 93 |

Usage

94 | The object type in Universal Binary JSON is equivalent to the object type from the JSON specification. 95 |

Example

96 | JSON snippet (90 bytes compacted): 97 |
{
 98 |     "post": {
 99 |         "id": 1137,
100 |         "author": "rkalla",
101 |         "timestamp": 1364482090592,
102 |         "body": "I totally agree!"
103 |     }
104 | }
105 | UBJSON snippet (82 bytes, 9% smaller): 106 |
[{]
107 |     [i][4][post][{]
108 |         [i][2][id][I][1137]
109 |         [i][6][author][S][i][5][rkalla]
110 |         [i][9][timestamp][L][1364482090592]
111 |         [i][4][body][S][i][16][I totally agree!]
112 |     [}]
113 | [}]
114 | [box type="info"]NOTE: The [S] (string) marker is omitted from each of the names in the name/value pairings inside the object. The JSON specification does not allow non-string name values, therefore the [S] marker is redundant and must not be used.[/box] 115 |

Optimized Format

116 | 117 |
118 | 119 | 126 | While the basic specification for the array and object types are identical to the JSON specification (i.e. simple beginning and end markers), both containers support optional parameters that can help optimize the container for better parsing performance and smaller size. 127 | 128 | At a very high level, the optimized format for both array and object container types are built around two optional parameters: type and count 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | 157 | 158 |
TypeSizeMarkerArg. TypeExampleDesc
type 1-byte $ Value Type or Container Type Marker [$][S] string type
count 1-byte # Integer Numeric Value [#][i][64] count of 64
159 | The effect on the container when specifying one or both parameters is as follows: 160 | 173 | [box type="info"]NOTE: Yes it is possible for an array or object to define their type as '[' or '{' to signal that they themselves contain additional containers![/box] 174 | 175 | [box type="download"]BONUS: Parsers can provide highly-optimized implementations for strongly typed containers of non-variable-length types (e.g. numeric, boolean, etc.) because the exact byte-length of the data is known![/box] 176 | 177 | Some rules that generators and parsers need to be aware of when dealing with these optional parameters is as follows: 178 | 188 | 189 |

Array Example

190 | Below are examples of incrementally more optimized representations of an array in UBJSON. 191 |

No Optimization 192 |

193 |
[[]
194 |     [d][29.97]
195 |     [d][31.13]
196 |     [d][67.0]
197 |     [d][2.113]
198 |     [d][23.888]
199 | []]
200 |

Optimized with count

201 |
[[][#][i][5] // An array of 5 elements.
202 |     [d][29.97]
203 |     [d][31.13]
204 |     [d][67.0]
205 |     [d][2.113]
206 |     [d][23.8889]
207 | // No end marker since a count was specified.
208 |

Optimized with type & count

209 |
[[][$][d][#][i][5] // An array of 5 float32 elements.
210 |     [29.97] // Value type is known, so type markers are omitted.
211 |     [31.13]
212 |     [67.0]
213 |     [2.113]
214 |     [23.8889]
215 | // No end marker since a count was specified.
216 | 217 |

Object Example

218 | Below are examples of incrementally more optimized representations of an object in UBJSON. 219 | 220 | [box type="info"]Remember, in UBJSON the string markers ([S]) are omitted from the names in the name-value pairs of an Object because JSON only allows names of type string.[/box] 221 |

No Optimization 222 |

223 |
[{]
224 |     [i][3][lat][d][29.976]
225 |     [i][4][long][d][31.131]
226 |     [i][3][alt][d][67.0]
227 | [}]
228 |

Optimized with count

229 |
[{][#][i][3] // An object of 3 name:value pairs.
230 |     [i][3][lat][d][29.976]
231 |     [i][4][long][d][31.131]
232 |     [i][3][alt][d][67.0]
233 | // No end marker since a count was specified.
234 |

Optimized with type & count

235 |
[{][$][d][#][i][3] // An object of 3 name:float32-value pairs.
236 |     [i][3][lat][29.976] // Value type is known, so type markers are omitted.
237 |     [i][4][long][31.131] 
238 |     [i][3][alt][67.0] 
239 | // No end marker since a count was specified.
240 | 241 |

Special Cases (Null and Boolean)

242 | Up until now all the examples of leveraging type and count have illustrated the benefit of optimizing out the markers from value types that have a data payload (e.g. numeric values, strings, etc.); since the type of all the values are known, the markers are easily omitted. There are, however, a few special value types that have no data payload and the markers themselves represent the value, specifically: null and boolean (no-op is not a valid type for a container). 243 | 244 | This section will take a look at how those types behave when used with strongly-typed containers. 245 | 246 | At a high level, placing these values in a strongly-typed container provides the basic behavior of essentially pre-defining the value for every element in the container. In the case of and array, all the values contained in it. In the case of an object, all the values associated with all the names in the name-value pairs. 247 |

Array

248 |
[[][$][F][#][I][512] // 512 'false' values.
249 | The example above is a strongly typed array of type false and with a count of 512. 250 | 251 | This simple declaration is equivalent to a 514-byte array containing 512 [F] markers; instead this single line is 6-bytes providing a 99% size reduction. 252 | 253 | Admittedly this is a selective example of leveraging this feature, but the point is that there are potentially very large performance and size optimizations available if your data can take advantage of this shorthand. 254 | 255 | [box type="info"]Strongly-typed arrays of null and boolean must have an empty body. The header itself defines the container's contents.[/box] 256 |

Object

257 |
[{][$][Z][#][i][3]
258 |     [i][4][name] // name only, no value specified.
259 |     [i][8][password]
260 |     [i][5][email]
261 | The example above is a strongly typed object of type null and with a count of 3. 262 | 263 | When used in the context of an object, specifying one of these special-case values as a type has the effect of setting the default value for every name-value pair in the object; therefore the object only contains the names of all the pairs. 264 | 265 | In the case of objects the space-savings is typically a little less drastic than in the array case depending on the size of the names; in the case of small names, it could be significant, approaching a 50% reduction. 266 | 267 | [box type="info"]Strongly-typed objects of null and boolean must not have any values specified in the body, just the name portions of the name-value pairs. The header itself defines the value for every name-value pair.[/box] 268 | 269 | 270 |

Size & Performance Benefits

271 | 276 | The benefits realized by leveraging the optimized container types in UBJSON depend heavily on the data being stored and the implementation of the generator or parser. Baring the frustration of "it depends" as an answer, the benefits can be viewed at a very high level as the following: 277 | 278 |

Optimized for Parsing

279 | By specifying a count, you are hinting to the parser about the number of elements to expect. The performance gains are primarily around allowing the parser to pre-size its internal data structures to exactly the right size to hold pointers to the parsed values. 280 | 281 | By specifying a type and count, the parser not only knows how many child elements to expect, as well as less data to parse and less conditions to run (no marker checks), but in the cases of fixed-length values, the parser knows the exact byte length of the payload! 282 | 283 | For example, consider: 284 |
[[][$][l][#][I][1024] // 1,024 int32 values
285 |     [32]
286 |     [2147483647]
287 |     [101231]
288 |     [77832823]
289 |     ... 1,000 more int32 values ...
290 | After the parser parses the container's header, it knows the byte length of the entire payload is 4096 and in a single read operation can read all the values in and quickly break them up into their int32 representations. 291 | 292 | When you are able to leverage the type and count together to help the parser understand the payload in more detail is where the real performance gains come from. 293 | 294 |

Simple Validation Mechanism

295 | By specifying a count parameter, you are telling the parser the number of child elements it should find in the container. In the case where the parser is unable to find the specified number of child elements it can quickly report a format error to the caller. 296 | 297 | This is a very simple version of verification and not as robust as say a checksum-based approach, but it still provides benefit in addition to a performance gain. 298 | 299 |

Reduce Size up to 50%

300 | This is a 1-byte-per-value reduction in any container where strong typing is used. 301 | 302 | In the case of containers holding large amounts of fairly compact data (small numbers, chars, small strings or value-types like null), removing the type marker from the beginning of each of the values in the container can almost cut the size requirements for the data in half. 303 | 304 | The smaller the containers and bigger the individual values are (large numbers, large strings) the less size benefit this optimization will have, but it still provides a potentially significant opportunity to the parser to optimize it's code paths for parsing large chunks of same-type values (and not needing to worry about type changes mid-container). This is covered in more detail in the previous section: Optimized for Parsing 305 | 306 |

Binary Data Support

307 | This section is here for referential convenience; please see Binary Data for information on storing binary data in UBJSON. 308 | -------------------------------------------------------------------------------- /spec12/developer-resources.html: -------------------------------------------------------------------------------- 1 | This page contains information for developers looking to develop a Universal Binary JSON library. 2 | 7 |

Library Implementation Requirements

8 | Libraries implementing the Universal Binary JSON spec must adhere to the following guidelines: 9 | 12 |

Best Practices

13 | 18 | Through work with the community, feedback from others and our own experience with the specification, below are some of the best-practices collected into one place making it easy for folks working with the format to find answers to the more flexible portions of the spec. 19 |

Optimizing Container Performance

20 | [box type="tick"]Why: (Potentially large) data size reduction and parsing performance increase. 21 | How: Homogeneous data type in a container.[/box] 22 | 23 | Very large performance advantages are available when writing out ARRAY or OBJECT containers that contain same-type values. Be sure to read through the optimized container format that can be leveraged in these cases. 24 | 25 | A typical level of optimization is being able to omit all the marker characters for all same-typed values in a container, making the sizes of all typical value types 1-byte smaller. 26 | 27 | An a-typical level of optimization, that leads to the biggest reduction, is for all 1-byte value types (e.g. NO-OP, NULL, etc); when used in conjunction with the optimized container format, the values themselves can be omitted from the container entirely leading to a space savings that approaches 100% as the size of the container grows. 28 |

Using Smallest Number Representation

29 | [box type="tick"]Why: ~50% size reduction for numbers > 5 digits and < 20 digits. 30 | How: Always use the most compact numeric type possible when writing UBJSON.[/box] 31 | 32 | Numeric values can be represented in a number of ways in UBJSON; you can reduce the size of your UBJSON by inspecting the stored value and ensuring it is represented in the most-compact numeric representation possible when storing the UBJSON blob. 33 | 34 | Keep in mind that varying the type of values inside of a container may impact your ability to use the type parameter to optimize container storage. 35 |

Handling High-Precision Numbers on Unsupported Platforms

36 | [box type="tick"]Why: Cleanly handle > 64-bit numbers on platforms that don't support them. 37 | How: By using the high-precision type.[/box] 38 | 39 | Not every language supports arbitrarily long numbers and some not even numbers greater than 64-bits in size. In order to safely allow the transport and handling of > 64-bit numbers across every platform, UBJSON provides the high-precision numeric type. 40 | 41 | The high-precision type is a string-based type (identical in format to the string type) that provides a universally compatible mechanism by which arbitrarily large or precise numbers can be handled. 42 | 43 | For platforms with arbitrarily large/precise number support, they are free to parse the high-precision value into a native type; for platforms without support, the high-precision value can be safely passed on, persisted to storage or handled in other non-numeric ways while still allowing the client to handle the request and not overflow or otherwise balk at the unsupported numeric type. 44 | 45 | That said, for libraries written to support platforms that do not natively support arbitrarily large or precise values, the following guidance can be employed to provide a safe and consistent behavior when encountering them: 46 |
    47 |
  1. [Default] Exception/Error: Throw an exception(or return an error) when an unsupported high-precision value is encountered during parsing. The platform doesn't support them so allow the client a chance to be aware of the fact that it is receiving data it won't know how to parse into a native type.
  2. 48 |
  3. [Optional] Handle as a String: (must be user-enabled) In the case where the client doesn't need to do any processing of the value and is just doing pass-through like persisting it to a data store, treat the high-precision value as a string and return it to the caller.
  4. 49 |
  5. [Optional] Skip: (must be user-enabled) Provide the ability for the parser to optionally skip unsupported values during parsing. Be aware that this is a dangerous approach and will likely lead to data loss (skipped values won't be visible to the client), but in the case where a client must be able to parse any and all UBJSON it received even if it doesn't support arbitrarily large or precise numbers, then this has to be considered.
  6. 50 |
51 | These guidelines should provide the most functional experience for a client to work with UBJSON on their platform of choice. 52 |

Example Files

53 | [box type="alert"]Example files below only support Draft 8[/box] 54 | 55 | You can find files to test your implementation with here. There are formatted-json, compacted-json and UBJ versions of each of the testing files contained in the repository. 56 | 57 | The simple Java classes that have matching names to the UBJ files are Java class representations of the files (for Java testing) and the Marshaller classes are the hand-coded serialization and deserialization code used to write out and read in those test files from UBJ format. 58 | 59 | Even if you are not working in Java, you can use those classes as a high level guide if you are curious or ignore them completely and just test against the raw file resources. -------------------------------------------------------------------------------- /spec12/index.html: -------------------------------------------------------------------------------- 1 |
    2 |
  1. Quick Start
  2. 3 |
  3. License
  4. 4 |
  5. Why
  6. 5 |
  7. Goals
  8. 6 |
  9. Data Format
  10. 7 |
  11. Size Requirements
  12. 8 |
  13. Endianness
  14. 9 |
  15. MIME Type
  16. 10 |
  17. File Extension
  18. 11 |
  19. Requests for Enhancement (RFE)
  20. 12 |
13 |

Quick Start

14 | 15 |
16 | 17 | You know what JSON is and you understand data formats and just want the good bits? 18 | 30 |

License

31 | 32 |
33 | 34 | The Universal Binary JSON Specification is licensed under the Apache 2.0 License. 35 | 36 | Use of the spec, either as-defined or a customized extension of it, is intended to be commercial-friendly. 37 | 38 | The ultimate purpose of this specification is to provide a useful tool for software developers to leverage in any way they see fit. 39 |

Why

40 | 41 |
42 | 43 | JSON has become a ubiquitous text-based file format for data interchange. Its simplicity, ease of processing and (relatively) rich data typing made it a natural choice for many developers needing to store or shuffle data between systems quickly and easy. 44 | 45 | Unfortunately, marshalling native programming language constructs in and out of a text-based representations does have a measurable processing cost associated with it. 46 | 47 | In high-performance applications, avoiding the text-processing step of JSON can net big wins in both processing time and size reduction of stored information, which is where a binary JSON format becomes helpful. 48 | 49 | Attempts to make using JSON faster through binary specifications like BSON, BJSON or Smile exist, but have been rejected from mass-adoption for two reasons: 50 |
    51 |
  1. Custom (Binary-Only) Data Types: Inclusion of custom data types that have no ancillary in the original JSON spec, leaving room for incompatibilities to exist as different implementations of the spec handle the binary-only data types differently.
  2. 52 |
  3. Complexity: Some specifications provide higher performance or smaller representations at the cost of a much more complex specification, making implementations more difficult which can slow or block adoption. One of the key reasons JSON became as popular as it did was because of its ease of use.
  4. 53 |
54 | BSON, for example, defines types for binary data, regular expressions, JavaScript code blocks and other constructs that have no equivalent data type in JSON. BJSON defines a binary data type as well, again leaving the door wide open to interpretation that can potentially lead to incompatibilities between two implementations of the spec and Smile, while the closest, defines more complex data constructs and generation/parsing rules in the name of absolute space efficiency. These are not short-comings, just trade-offs the different specs made in order to service specific use-cases. 55 | 56 | The existing binary JSON specifications all define incompatibilities or complexities that undo the singular tenet that made JSON so successful: simplicity. 57 | 58 | JSON's simplicity made it accessible to anyone, made implementations in every language available and made explaining it to anyone consuming your data immediate. 59 | 60 | Any successful binary JSON specification must carry these properties forward for it to be genuinely helpful to the community at large. 61 | 62 | This specification is defined around a singular marker-based construct used to build up and represent JSON values and objects. Reading and writing the format is trivial, designed with the goal of being understood in under 10 minutes (likely less if you are very comfortable with JSON already). 63 | 64 | [box type="info"]TIP: UBJSON is built exclusively out of marker-characters like 'C' (for CHAR), 'S' (for STRING), etc. followed by either the payload itself, or a length and then the payload... that's it![/box] 65 | 66 | Fortunately, while the Universal Binary JSON specification carries these tenets of simplicity forward, it is also able to take advantage of optimized binary data structures that are (on average) 30% smaller than compacted JSON and specified for ultimate read performance; bringing simplicity, size and performance all together into a single specification that is 100% compatible with JSON. 
67 |

Why not JSON+gzip?

68 | On the surface simply gzipping your compacted JSON may seem like a valid (and smaller) alternative to using the Universal Binary JSON specification, but there are two significant costs associated with this approach that you should be aware of: 69 |
    70 |
  1. At least a 50% performance overhead for processing the data.
  2. 71 |
  3. Lack of data clarity and inability to inspect it directly.
  4. 72 |
73 | While gzipping your JSON will give you great compression, about 75% on average, the overhead required to read/write the data becomes significantly higher. 74 | 75 | Additionally, because the binary data is now in a compressed format you can no longer open it directly in an editor and scan the human-readable portions of it easily; which can be important during debugging, testing or data verification and recovery. 76 | 77 | Utilizing the Universal Binary JSON format will typically provide a 30% reduction in size and store your data in an optimized format offering you much higher performance while still allowing you to open the file directly and read through it. 78 | 79 | If you had a usage scenario where your data is put into long-term cold storage and pulled out in large chunks for processing, you might even consider gzipping your Universal Binary JSON files, storing those, and when they are pulled out and unzipped, you can then process them with all the speed advantages of UBJSON. 80 | 81 | As always, deciding which approach is right for your project depends heavily on what you need. 82 |
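The trade-off described above is easy to measure on your own data. The sketch below uses only the Python standard library to report the compacted JSON size next to its gzipped size; the helper names (`compact_json`, `json_sizes`) and the sample document are hypothetical, and the percentages quoted in this section are the spec's claims rather than something this snippet verifies.

```python
import gzip
import json


def compact_json(obj) -> bytes:
    # Compacted JSON: no whitespace between tokens, UTF-8 encoded.
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def json_sizes(obj) -> tuple:
    # Returns (compacted size, gzipped size) in bytes for one document.
    raw = compact_json(obj)
    return len(raw), len(gzip.compress(raw))


# Hypothetical sample; substitute a representative slice of your own data.
sample = {"login": "octocat", "id": 1, "hireable": False}
raw_size, gz_size = json_sizes(sample)
print(f"compact JSON: {raw_size} bytes, gzipped: {gz_size} bytes")
```

Very small documents can grow under gzip because of its fixed header, so measure with realistically sized payloads.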

Goals

83 | 84 |
85 | 86 | The Universal Binary JSON specification has 3 goals: 87 | 88 | 1. Universal Compatibility 89 |

Meaning absolute compatibility with the JSON spec itself as well as only utilizing data types that are natively supported in all popular programming languages.

90 |

This allows 1:1 transforms between standard JSON and Universal Binary JSON as well as efficient representation in all popular programming languages without requiring parser developers to account for strange data types that their language may not support.

91 | 2. Ease of Use 92 |

The Universal Binary JSON specification is intentionally defined using a single core data structure to build up the entire specification.

93 |

This accomplishes two things: it allows the spec to be understood quickly and allows developers to write trivially simple code to take advantage of it or interchange data with another system utilizing it.

94 | 3. Speed / Efficiency 95 |

Typically the motivation for using a binary specification over a text-based one is speed and/or efficiency, so strict attention was paid to selecting data constructs and representations that are (roughly) 30% smaller than their compacted JSON counterparts and optimized for fast parsing.

96 | 97 |

Data Format

98 | 99 |
100 | 101 | The Universal Binary JSON specification utilizes a single construct with two optional segments (length and data) for all types: 102 |
[box type="info" border="full" icon="none"][type, 1-byte char]([integer numeric length])([data])[/box]
103 | Each element in the tuple is defined as: 104 | 121 | Some values are simple enough that writing the 1-byte ASCII marker into the stream is enough to represent them (e.g. null). Others have a type specific enough that no length is needed because the length is implied by the type (e.g. int32). Others still require both a type and a length to communicate their value (e.g. string). 122 |
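The three shapes described above (marker only, marker plus data, marker plus length plus data) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the length encoder only chooses between int8 and int32 for brevity.

```python
import struct


def encode_null() -> bytes:
    # Marker only: no length and no data segment.
    return b"Z"


def encode_int32(value: int) -> bytes:
    # Marker + data: the 4-byte size is implied by the type marker 'l'.
    return b"l" + struct.pack(">i", value)


def encode_length(n: int) -> bytes:
    # A length is itself an integer value with its own marker.
    # Simplification for this sketch: int8 when it fits, otherwise int32.
    if n <= 127:
        return b"i" + struct.pack(">b", n)
    return b"l" + struct.pack(">i", n)


def encode_string(text: str) -> bytes:
    # Marker + length + data: the length counts UTF-8 bytes, not characters.
    payload = text.encode("utf-8")
    return b"S" + encode_length(len(payload)) + payload


print(encode_null(), encode_int32(1), encode_string("hello"))
```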

Types

123 | Universal Binary JSON defines a number of Value Types and Container Types that map directly to JSON's types. For the most part the correlation is 1:1 except in the case of numeric types where UBJSON defines many more specific types of number storage and representation than JSON's single number type. 124 | 132 |

Size Requirements

133 | 134 |
135 | 136 | The Universal Binary JSON specification tries to strike the perfect balance between space savings, simplicity and performance. 137 | 138 | As a rule of thumb, data stored in the Universal Binary JSON format is about 30% smaller than the equivalent compacted JSON. As you can see from some of the examples in this document though, it is not uncommon to see the binary representation of some data lead to a 50% or 60% size reduction without compression. 139 | 140 | The size reduction of your data depends heavily on the type of data you are storing. It is best to do your own benchmarking with a comprehensive sampling of your own data. 141 | 142 | [box type="note"]The Universal Binary JSON specification does not use compression algorithms to achieve smaller storage sizes. The size reduction is a side effect of the efficient binary storage format.[/box] 143 |

Size Reduction Tips

144 | The amount of storage size reduction you'll experience with the Universal Binary JSON format will depend heavily on the type of data you are encoding. 145 | 146 | Some data shrinks considerably, some mildly and some not at all, but in every case your data will be stored in a much more efficient format that is faster to read and write. 147 | 148 | Below are pointers to give you an idea of how certain data may shrink in this format: 149 | 156 | One of the great things about the Universal Binary JSON format is that even though not all of your data will be represented in a smaller footprint, you still get two big wins: 157 |
    158 |
  1. A smaller data format means faster writes and smaller reads. It also means less data to process when parsing.
  2. 159 |
  3. Binary format means no encoding/decoding primitive values to text and no parsing primitive values from text.
  4. 160 |
161 |

Endianness

162 | 163 |
164 | 165 | The Universal Binary JSON specification requires that all numeric values be written in Big-Endian order. 166 |
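To make the byte-order requirement concrete, the sketch below packs the same int16 value both ways with Python's `struct` module. Only the big-endian form is valid UBJSON; the little-endian form is shown for contrast. The helper names `encode_int16` and `decode_int16` are hypothetical.

```python
import struct

value = 400  # 0x0190, a value that needs int16

big_endian = struct.pack(">h", value)     # b"\x01\x90": high byte first, as UBJSON requires
little_endian = struct.pack("<h", value)  # b"\x90\x01": NOT valid in UBJSON


def encode_int16(v: int) -> bytes:
    # 'I' marker followed by a 2-byte big-endian signed integer.
    return b"I" + struct.pack(">h", v)


def decode_int16(data: bytes) -> int:
    # Reverses encode_int16; rejects any other marker.
    if data[0:1] != b"I":
        raise ValueError("not an int16 value")
    return struct.unpack(">h", data[1:3])[0]


print(big_endian.hex(), little_endian.hex(), decode_int16(encode_int16(value)))
```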

MIME Type

167 | 168 |
169 | 170 | The Universal Binary JSON specification is a binary format and recommends using the following mime type: 171 | [box type="info" border="full" icon="none"]application/ubjson[/box] 172 | 173 | This was added directly to the specification in hopes of avoiding similar confusion with JSON. 174 |

File Extension

175 | 176 |
177 | 178 | "ubj" is the recommended file extension when writing out files using the Universal Binary JSON format (e.g. "user.ubj"). 179 | 180 | The extension stands for "Universal Binary JSON" and has no known conflicting mappings to other file formats. 181 |

Requests for Enhancement (RFE)

182 | 183 |
184 | 185 | All (proposed) changes to the specification are being tracked in GitHub. -------------------------------------------------------------------------------- /spec12/libraries.html: -------------------------------------------------------------------------------- 1 | Below is a list of libraries, by language, that implement the Universal Binary JSON Specification. 2 | 3 |
4 | 5 |

ASM.JS

6 | 9 |

C

10 | 14 |

C++

15 | 19 |

D

20 | 23 |

Java

24 | 29 |

MATLAB

30 | 33 |

.NET

34 | 37 |

Node.js

38 | 41 |

PHP

42 | 45 |

Python

46 | 50 |

Qt

51 | 54 |

Swift

55 | 58 | -------------------------------------------------------------------------------- /spec12/thanks.html: -------------------------------------------------------------------------------- 1 | Universal Binary JSON was originally motivated by a desire to provide an on-disk & over-the-wire format that required no parsing or marshalling in CouchDB (inspiration). In its original draft form, UBJSON was much too simple of a spec with too many holes but over the next number of years and only with the help of the following people (among many others) did the spec grow up. 2 | 3 | I want to express my personal thanks to each one of you for all the help you lent at the different stages of UBJSON's development (and continue to provide in some cases). 4 | 5 | Sincerely, Riyad Kalla 6 | 7 |
8 | 9 | Adil Baig 10 |

Adil has been very involved in the in-depth and multi-year long discussions surrounding a more optimized container specification as well as binary data support. Adil also provided a very compelling, diff-typing proposal for an optimized container format that provided a lot of good guidance around elegant alternatives to consider.

11 | Alex Blewitt 12 |

Helped catch a number of specification errors around UTF-8 encoding in the original draft of the specification that would have been confusing/nasty to release. He also provided great feedback about the size and performance metrics for the specification.

13 | Alexander Shorin 14 |

Alex is both the author of the UBJSON Python library and a valued collaborator on the Universal Binary JSON spec as it matured. Alex provided instrumental insight into the modifications made between Draft 8 and Draft 9 of the spec to help simplify the spec by removing all the duplicate (compact) type representations, simplifying the length-arguments for STRING and HUGE as well as being the one to point out that the length arguments for the ARRAY and OBJECT container types are effectively useless once the streaming-format support was added (and do not make generator code or parsing code any easier or more performant).

15 | Bjørn Reese 16 |

Bjørn has been involved in almost all of the binary data support discussions that have taken place since 2012. His detail-oriented contributions helped move the discussion forward.

17 | John Cowan 18 |

John was the one that recommended using UTF-8 string-encoded values (or huge) for arbitrarily huge numbers after seeing my desire to avoid including any non-portable constructs into the binary format.

19 |

Given that the discussion on numeric formats had been a very active one with lots of feelings on all sides, it was a boon to have John step up with such a simple suggestion that allowed for maximum compatibility and portability. It was a win-win all the way around.

20 | Michael Makarenko (aka "M1xA") 21 |

Michael is the author behind the Ubjson.NET library and contributor of the int16 and float numeric types to the specification. For numeric-heavy (e.g. scientific) data, the inclusion of the int16 and float types can lead to significant space savings when writing out values in the Universal Binary JSON format.

22 |

Michael has also gone to great lengths to make the .NET implementation of UBJSON as tight and performant as possible; collaborating on benchmark design and testing data as well as compatibility testing between implementations to ensure a great Universal Binary JSON experience for .NET developers.

23 |

In addition to development, Michael has helped contribute to the growth of the Universal Binary JSON community with articles about the specification.

24 | Paul Davis 25 |

While approaching the CouchDB team for feedback on the Universal Binary JSON spec, I met Paul who was willing to spend a significant amount of time reviewing the specification and recommending suggestions, changes and improvements from everything the CouchDB team has learned by dealing closely with JSON for years.

26 |

Paul pointed out the shortcomings of prefixing the length to the two container types if the specification could ever be used easily with services or apps that streamed UBJ format for huge runs of data that the server couldn't load, buffer and count ahead of time before responding to the client. In order to more easily support streaming, unknown-length container types had to be added.

27 |

Paul also pointed out the importance of a NO_OP/SKIP/IGNORE type that can be useful during a long-lived streaming operation where the server may be waiting on something (like a DB) and you need to keep the connection alive between client/server and avoid the client timing out, but you need the client to know the data it is receiving is just meant as a "Hang on" message from the server and not actual data. This is where the NO_OP command comes in handy.

28 | Stephan Beal 29 |

Stephan helped quite a bit with understanding the implications of a >= 64-bit numeric format and the implications of portability across a number of popular platforms.

30 | 31 | 32 |
33 | 34 | JSON Specification Group 35 |

I would like to personally thank everyone in the JSON Specification Group. The amount of feedback and help with the specification has been wonderful, constructive and creative. It also led to one of the busiest conversations in the last year!

-------------------------------------------------------------------------------- /spec12/type-reference.html: -------------------------------------------------------------------------------- 1 | The table below is a quick-reference for folks working closely with the Universal Binary JSON format that want all the information at their finger tips: 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 |
TypeSizeMarkerLengthData Payload
[box type="download" icon="none"]Value Types[/box]
null 1-byte Z No No
no-op 1-byte N No No
true 1-byte T No No
false 1-byte F No No
int8 2-bytes i No Yes
uint8 2-bytes U No Yes
int16 3-bytes I No Yes
int32 5-bytes l No Yes
int64 9-bytes L No Yes
float32 5-bytes d No Yes
float64 9-bytes D No Yes
high-precision number 1-byte + int num val + string byte len H Yes Yes (if non-empty)
char 2-bytes C No Yes
string 1-byte + int num val + string byte len S Yes Yes (if non-empty)
[box type="download" icon="none"]Container Types[/box]
array** 2+ bytes [ and ] Optional Yes (if non-empty)
object** 2+ bytes { and } Optional Yes (if non-empty)
133 | ** See container optimized format for details. 134 |

Example

135 | Below is an example of what a common JSON response would look like in UBJSON. This particular example was taken from the GitHub developer docs. 136 | 137 | JSON Response 138 |
{
139 |   "login": "octocat",
140 |   "id": 1,
141 |   "avatar_url": "https://github.com/images/error/octocat_happy.gif",
142 |   "gravatar_id": "somehexcode",
143 |   "url": "https://api.github.com/users/octocat",
144 |   "name": "monalisa octocat",
145 |   "company": "GitHub",
146 |   "blog": "https://github.com/blog",
147 |   "location": "San Francisco",
148 |   "email": "octocat@github.com",
149 |   "hireable": false,
150 |   "bio": "There once was...",
151 |   "public_repos": 2,
152 |   "public_gists": 1,
153 |   "followers": 20,
154 |   "following": 0,
155 |   "html_url": "https://github.com/octocat",
156 |   "created_at": "2008-01-14T04:33:35Z",
157 |   "type": "User",
158 |   "total_private_repos": 100,
159 |   "owned_private_repos": 100,
160 |   "private_gists": 81,
161 |   "disk_usage": 10000,
162 |   "collaborators": 8,
163 |   "plan": {
164 |     "name": "Medium",
165 |     "space": 400,
166 |     "collaborators": 10,
167 |     "private_repos": 20
168 |   }
169 | }
170 | UBJSON Response (using block-notation) 171 |
[{]
172 |     [i][5][login][S][i][7][octocat]
173 |     [i][2][id][i][1]
174 |     [i][10][avatar_url][S][i][49][https://github.com/images/error/octocat_happy.gif]
175 |     [i][11][gravatar_id][S][i][11][somehexcode]
176 |     [i][3][url][S][i][36][https://api.github.com/users/octocat]
177 |     [i][4][name][S][i][16][monalisa octocat]
178 |     [i][7][company][S][i][6][GitHub]
179 |     [i][4][blog][S][i][23][https://github.com/blog]
180 |     [i][8][location][S][i][13][San Francisco]
181 |     [i][5][email][S][i][18][octocat@github.com]
182 |     [i][8][hireable][F]
183 |     [i][3][bio][S][i][17][There once was...]
184 |     [i][12][public_repos][i][2]
185 |     [i][12][public_gists][i][1]
186 |     [i][9][followers][i][20]
187 |     [i][9][following][i][0]
188 |     [i][8][html_url][S][i][26][https://github.com/octocat]
189 |     [i][10][created_at][S][i][20][2008-01-14T04:33:35Z]
190 |     [i][4][type][S][i][4][User]
191 |     [i][19][total_private_repos][i][100]
192 |     [i][19][owned_private_repos][i][100]
193 |     [i][13][private_gists][i][81]
194 |     [i][10][disk_usage][I][10000]
195 |     [i][13][collaborators][i][8]
196 |     [i][4][plan][{]
197 |         [i][4][name][S][i][6][Medium]
198 |         [i][5][space][I][400]
199 |         [i][13][collaborators][i][10]
200 |         [i][13][private_repos][i][20]
201 |     [}]
202 | [}]
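The block notation above maps directly onto bytes. As a rough sketch, the encoder below handles only the value kinds that appear in this example (null, booleans, integers, strings and nested objects) and writes object keys as a length plus UTF-8 bytes without the `S` marker, as the block notation shows. All function names are hypothetical, and it always picks the smallest integer type that fits, which matches the `[i]` and `[I]` choices in the example.

```python
import struct

# (marker, struct format, min, max) for each fixed-size integer type.
_INT_TYPES = (
    (b"i", ">b", -128, 127),
    (b"U", ">B", 0, 255),
    (b"I", ">h", -32768, 32767),
    (b"l", ">i", -2**31, 2**31 - 1),
    (b"L", ">q", -2**63, 2**63 - 1),
)


def encode_int(value: int) -> bytes:
    for marker, fmt, lo, hi in _INT_TYPES:
        if lo <= value <= hi:
            return marker + struct.pack(fmt, value)
    raise ValueError("value needs the high-precision type")


def encode_text(text: str) -> bytes:
    # Length (as an integer value) followed by the UTF-8 bytes.
    payload = text.encode("utf-8")
    return encode_int(len(payload)) + payload


def encode(value) -> bytes:
    if value is None:
        return b"Z"
    if value is True:
        return b"T"
    if value is False:
        return b"F"
    if isinstance(value, int):
        return encode_int(value)
    if isinstance(value, str):
        return b"S" + encode_text(value)
    if isinstance(value, dict):
        # Keys carry no 'S' marker; each key is followed by its value.
        body = b"".join(encode_text(k) + encode(v) for k, v in value.items())
        return b"{" + body + b"}"
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


print(encode({"id": 1, "plan": {"space": 400}}))
```

Booleans are tested before integers because Python's `bool` is a subclass of `int`.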
-------------------------------------------------------------------------------- /spec12/value-types.html: -------------------------------------------------------------------------------- 1 | The Universal Binary JSON Specification defines a total of 13 value types (to JSON's 5 value types). 2 | 3 | The reason for the increased number of value types is because UBJSON defines 8 numeric value types (to JSON's 1) allowing for highly optimized storage/retrieval of numeric values depending on the necessary precision; in addition to a number of other more optimized representations of JSON values. 4 | 5 | The specifications for each of the Universal Binary JSON Specification value types are below. 6 |
    7 |
  1. Null Value
  2. 8 |
  3. No-Op Value
  4. 9 |
  5. Boolean Types
  6. 10 |
  7. Numeric Types
  8. 11 |
  9. Char Type
  10. 12 |
  11. String Type
  12. 13 |
  13. Binary Data
  14. 14 |
15 |

Null Value

16 | 17 |
18 | 19 | 23 | The null value in Universal Binary JSON is defined as: 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 |
TypeSizeMarkerLengthData Payload
null 1-byte Z No No
44 |

Usage

45 | The null value in Universal Binary JSON is equivalent to the null value from the JSON specification. 46 |

Example

47 | JSON snippet: 48 |
{
 49 |     "passcode": null
 50 | }
51 | UBJSON snippet (using block-notation): 52 |
[{]
 53 |     [i][8][passcode][Z]
 54 | [}]
55 |

No-Op Value

56 | 57 |
58 | 59 | 63 | The no-op value in Universal Binary JSON is defined as: 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 |
TypeSizeMarkerLengthData Payload
noop 1-byte N No No
84 |

Usage

85 | The intended usage of the no-op value is as a valueless signal between a producer (most likely a server) and a consumer (most likely a client) to indicate activity; for example, as a keep-alive signal so a client knows a server is still working and hasn't hung or timed out. 86 | 87 | There is no equivalent to the no-op value in the original JSON specification. 88 | 89 | The no-op value is meant to be a valueless value, meaning it can be added to the elements of a container and, when parsed by the receiver, the no-op values are simply skipped and carry no meaningful value with them. 90 | 91 | For example, the two following arrays are considered equal (using JSON format for readability): 92 |
["foo", "bar", "baz"]
93 | and 94 |
["foo", no-op, "bar", no-op, no-op, no-op, "baz", no-op, no-op]
95 | There are a number of interesting advantages to having a valueless-value defined directly in the spec. 96 |

Example

97 | Consider a web service that performs an expensive operation that can take quite a while (let's say 5 minutes): 98 |
<start response>
 99 | [N]
100 | <10 second delay>
101 | [N]
102 | <10 second delay>
103 | [N]
104 | <10 second delay>
105 | <...receiving data...>
106 | <10 second delay>
107 | [N]
108 | <10 second delay>
109 | [N]
110 | <...receiving remainder of data...>
111 | <end response>
112 | Most clients by default will time out after 60 seconds and more aggressive clients will time out even faster. To let clients know that the server has not hung, is still alive and is still processing the request, the server can reply at some determined interval (e.g. every X seconds) with the no-op value, and the client can parse it, acknowledge it and reset its timeout-disconnect timer as a result. 113 | 114 | Another example of leveraging no-op in an interesting way is modeling an efficient delete operation for UBJSON on-disk when elements of a container are removed. Instead of reading the entire container, removing the elements and writing the whole thing out again, no-op bytes can simply be written over the records that were removed from the containers. When the record is parsed, it is semantically identical to a container without the values. 115 | 116 | These are just a few examples of how you can leverage the no-op value. 117 |
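The skip-on-parse behavior can be sketched with a deliberately tiny reader. The function below is a hypothetical helper that only understands an array of strings whose lengths are written as int8 (`i`); the point is the single `continue` that discards each `N` marker, which makes the padded and unpadded arrays decode to the same list.

```python
def read_string_array(data: bytes) -> list:
    # Sketch only: handles '[' ... ']' containing 'S' strings with int8 lengths.
    if data[0:1] != b"[":
        raise ValueError("expected array start marker")
    pos = 1
    items = []
    while data[pos:pos + 1] != b"]":
        marker = data[pos:pos + 1]
        pos += 1
        if marker == b"N":
            # No-op carries no value: skip it and keep reading.
            continue
        if marker != b"S" or data[pos:pos + 1] != b"i":
            raise ValueError("unsupported or truncated input for this sketch")
        length = data[pos + 1]
        pos += 2
        items.append(data[pos:pos + length].decode("utf-8"))
        pos += length
    return items


plain = b"[Si\x03fooSi\x03barSi\x03baz]"
padded = b"[Si\x03fooNSi\x03barNNNSi\x03bazNN]"
print(read_string_array(plain) == read_string_array(padded))
```

The same loop structure works for the on-disk delete case: overwritten records show up as runs of `N` bytes and are simply stepped over.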

Boolean Types

118 | 119 |
120 | 121 | 125 | The boolean types in Universal Binary JSON are defined as: 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 |
TypeSizeMarkerLengthData Payload
true 1-byte T No No
false 1-byte F No No
153 |

Usage

154 | A boolean type is represented in Universal Binary JSON similar to the JSON specification: using a T (true) and F (false) character marker. 155 |

Example

156 | JSON snippet: 157 |
{
158 |     "authorized": true,
159 |     "verified": false
160 | }
161 | UBJSON snippet (using block-notation): 162 |
[{]
163 |     [i][10][authorized][T]
164 |     [i][8][verified][F]
165 | [}]
166 |

Numeric Types

167 | 168 |
169 | 170 | 181 | There are 8 numeric types in Universal Binary JSON and are defined as: 182 | 183 | 184 | 185 | 186 | 187 | 188 | 189 | 190 | 191 | 192 | 193 | 194 | 195 | 196 | 197 | 198 | 199 | 200 | 201 | 202 | 203 | 204 | 205 | 206 | 207 | 208 | 209 | 210 | 211 | 212 | 213 | 214 | 215 | 216 | 217 | 218 | 219 | 220 | 221 | 222 | 223 | 224 | 225 | 226 | 227 | 228 | 229 | 230 | 231 | 232 | 233 | 234 | 235 | 236 | 237 | 238 | 239 | 240 | 241 | 242 | 243 | 244 | 245 | 246 | 247 | 248 | 249 | 250 |
TypeSizeMarkerLengthData Payload
int8 2-bytes i No Yes
uint8 2-bytes U No Yes
int16 3-bytes I No Yes
int32 5-bytes l No Yes
int64 9-bytes L No Yes
float32 5-bytes d No Yes
float64 9-bytes D No Yes
high-precision number 1-byte + int num val + string byte len H Yes Yes (if non-empty)
251 | In JavaScript (and JSON) the Number type can represent any numeric value, while in most other languages multiple (discrete) numeric types exist to describe different sizes and types of numeric values; this allows the runtime to handle numeric operations more efficiently. 252 | 253 | In order for the Universal Binary JSON specification to be a performant alternative to JSON, support for these most common numeric types had to be added to allow for more efficient reading and writing of numeric values. 254 | 255 | Trying to maintain a single numeric type in UBJSON would have led to parsing complexity, requiring each language to further inspect the numeric value and marshal it down to the most appropriate internal type. By pre-defining these different numeric types directly in UBJSON, it allows for either a direct conversion into a native language type (e.g. Java) or a straightforward marshaling into the nearest-supported language type (e.g. Erlang). 256 |

Usage

257 | The intended usage of the different numeric types is to efficiently store numbers in a space and encoding-optimized format. 258 | 259 | [box type="info"]It is always recommended to use the smallest numeric type that fits your needs. For data with a large amount of numeric data, this can cut down the size of the payloads significantly (on average a 50% reduction in size).[/box] 260 |

Example

261 | JSON Snippet: 262 |
{
263 |     "int8": 16,
264 |     "uint8": 255,
265 |     "int16": 32767,
266 |     "int32": 2147483647,
267 |     "int64": 9223372036854775807,
268 |     "float32": 3.14,
269 |     "float64": 113243.7863123,
270 |     "huge1": "3.14159265358979323846",
271 |     "huge2": "-1.93E+190",
272 |     "huge3": "719..."
273 | }
274 | UBJSON snippets (using block-notation): 275 |
[i][4][int8][i][16]
276 | [i][5][uint8][U][255]
277 | [i][5][int16][I][32767]
278 | [i][5][int32][l][2147483647]
279 | [i][5][int64][L][9223372036854775807]
280 | [i][7][float32][d][3.14]
281 | [i][7][float64][D][113243.7863123]
282 | [i][5][huge1][H][i][22][3.14159265358979323846]
283 | [i][5][huge2][H][i][10][-1.93E+190]
284 | [i][5][huge3][H][U][200][719...]
285 | 
286 |

Infinity

287 | Numeric values of infinity are encoded as a null value. (See ECMA and JSON) 288 |
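A writer can apply this rule at the point where floating-point values are encoded. The sketch below is one way to do it in Python; the function name `encode_number` is hypothetical, and only infinity is handled because that is the only special value this section names.

```python
import math
import struct


def encode_number(value: float) -> bytes:
    # Infinity (positive or negative) is written as the null marker.
    if math.isinf(value):
        return b"Z"
    # Everything else in this sketch is written as a float64.
    return b"D" + struct.pack(">d", value)


print(encode_number(float("inf")), encode_number(float("-inf")), encode_number(1.0))
```

Note that the sign of the infinity is lost in the process, so a reader cannot tell the two apart after decoding.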

Signage & Min/Max Values

289 | The min/max range of values (inclusive) for each numeric type are as follows: 290 | 291 | 292 | 293 | 294 | 295 | 296 | 297 | 298 | 299 | 300 | 301 | 302 | 303 | 304 | 305 | 306 | 307 | 308 | 309 | 310 | 311 | 312 | 313 | 314 | 315 | 316 | 317 | 318 | 319 | 320 | 321 | 322 | 323 | 324 | 325 | 326 | 327 | 328 | 329 | 330 | 331 | 332 | 333 | 334 | 335 | 336 | 337 | 338 | 339 | 340 | 341 | 342 | 343 | 344 | 345 | 346 | 347 | 348 | 349 |
TypeSignedMin ValueMax Value
int8 Yes -128 127
uint8 No 0 255
int16 Yes -32,768 32,767
int32 Yes -2,147,483,648 2,147,483,647
int64 Yes -9,223,372,036,854,775,808 9,223,372,036,854,775,807
float32 Yes See IEEE 754 Spec See IEEE 754 Spec
float64 Yes See IEEE 754 Spec See IEEE 754 Spec
high-precision number Yes Infinite Infinite
350 |
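The ranges in the table above are enough to pick the smallest type for a given integer. The following sketch walks the types from narrowest to widest and returns the first one whose range contains the value; `smallest_int_type` is a hypothetical helper name, and values beyond int64 fall through to the high-precision number type.

```python
# (type name, min value, max value), ordered from narrowest to widest.
INT_RANGES = (
    ("int8", -128, 127),
    ("uint8", 0, 255),
    ("int16", -32768, 32767),
    ("int32", -2147483648, 2147483647),
    ("int64", -9223372036854775808, 9223372036854775807),
)


def smallest_int_type(value: int) -> str:
    for name, lo, hi in INT_RANGES:
        if lo <= value <= hi:
            return name
    # Nothing fixed-size fits: fall back to the string-encoded type.
    return "high-precision number"


for sample in (16, 255, -129, 2147483648, 2**63):
    print(sample, "->", smallest_int_type(sample))
```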

64-bit Integers

351 | While almost all languages natively support 64-bit integers, not all do (e.g. C89 and JavaScript (yet)) and care must be taken when encoding 64-bit integer values into binary JSON then attempting to decode it on a platform that doesn’t support it. 352 | 353 | If you are fully aware of the platforms and runtime environments your binary JSON is being used on and know they all support 64-bit integers, then you are fine. 354 | 355 | If you are trying to deserialize 64-bit integers in a client’s browser in JavaScript or another environment that does not support 64-bit integers, then you will want to take care to skip them in the input or have the client producing them encode them as double or high-precision values if that is easier to handle. 356 | 357 | Alternatively you might consider encoding your 64-bit values as doubles if you know you are going from the server to a client JavaScript environment with the binary-encoded information. 358 |
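One way a producer might apply this advice is sketched below. It relies on the fact that an IEEE 754 double represents every integer up to 2^53 - 1 exactly, so values in that range are written as float64 and anything larger is written as a high-precision number. The function name and the cut-off policy are assumptions for illustration, not something the specification mandates.

```python
import struct

# Largest integer magnitude a double (and a JavaScript Number) holds exactly.
MAX_SAFE_INTEGER = 2**53 - 1


def encode_int_for_js(value: int) -> bytes:
    if abs(value) <= MAX_SAFE_INTEGER:
        # Exactly representable: write as float64.
        return b"D" + struct.pack(">d", float(value))
    # Too large for a double: write the decimal digits as a high-precision number.
    digits = str(value).encode("ascii")
    # A 64-bit integer has at most 20 characters, so an int8 length is enough.
    return b"H" + b"i" + struct.pack(">b", len(digits)) + digits


print(encode_int_for_js(42), encode_int_for_js(9223372036854775807))
```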

High-Precision Numbers (Larger than 64-bit)

359 | The high-precision number type is an ultra-portable mechanism by which arbitrarily large (or precise) numbers, greater than 64-bit in size, are encoded as a UTF-8 string and passed between systems that support them. This allows high-precision number values to degrade gracefully on systems that do not have a built-in type to support numeric values larger than 64-bit. Please refer to the Best Practices page for techniques on working around the lack of larger-than-64-bit numeric types on certain platforms if you need them. 360 | 361 | 362 | High-precision number values must be written out in accordance with the original JSON number type specification. 363 |
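Because the payload must follow the JSON number grammar, a writer can validate the text before emitting it. The sketch below uses a regular expression that mirrors that grammar (optional minus sign, integer part without leading zeros, optional fraction, optional exponent). The names `JSON_NUMBER` and `encode_high_precision` are hypothetical, and for brevity the length is always written as an int8.

```python
import re

# JSON number grammar: -? int frac? exp?
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def encode_high_precision(text: str) -> bytes:
    if JSON_NUMBER.fullmatch(text) is None:
        raise ValueError(f"not a valid JSON number: {text!r}")
    payload = text.encode("ascii")
    if len(payload) > 127:
        raise ValueError("this sketch only writes int8 lengths")
    # 'H' marker, int8 length, then the digits themselves.
    return b"H" + b"i" + bytes([len(payload)]) + payload


print(encode_high_precision("3.14159265358979323846"))
```

A form such as `-1.93E+190` passes this check, while text with the sign before the exponent letter or with trailing dots is rejected.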

Byte Order / Endianness

364 | All integer types (int8, uint8, int16, int32 and int64) are written in most-significant-byte order (high byte written first, aka "big endian"). 365 | 366 | float32 values are written in IEEE 754 single precision floating point format, which is the following structure: 367 | 372 | float64 values are written in IEEE 754 double precision floating point format, which is the following structure: 373 | 378 |
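Python's `struct` module writes both IEEE 754 formats directly, so the float types can be sketched as below. The helper names are hypothetical. The round trip at the end shows why float32 should only be chosen when single precision is acceptable: 3.14 does not survive unchanged.

```python
import struct


def encode_float32(value: float) -> bytes:
    # 'd' marker + 4-byte big-endian IEEE 754 single precision.
    return b"d" + struct.pack(">f", value)


def encode_float64(value: float) -> bytes:
    # 'D' marker + 8-byte big-endian IEEE 754 double precision.
    return b"D" + struct.pack(">d", value)


def decode_float(data: bytes) -> float:
    marker = data[0:1]
    if marker == b"d":
        return struct.unpack(">f", data[1:5])[0]
    if marker == b"D":
        return struct.unpack(">d", data[1:9])[0]
    raise ValueError("not a float32 or float64 value")


print(len(encode_float32(3.14)), len(encode_float64(3.14)))
print(decode_float(encode_float32(3.14)), decode_float(encode_float64(3.14)))
```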

Storage Size

379 | The size of the high-precision number type "on-disk" follows the same structure and sizing of the string type (see Storage Size section). 380 | 381 | All other numeric types storage size is reflected at the beginning of this section as well as in the Type Reference table. 382 |

Char Type

383 | 384 |
385 | 386 | 392 | The char type in Universal Binary JSON is defined as: 393 | 394 | 395 | 396 | 397 | 398 | 399 | 400 | 401 | 402 | 403 | 404 | 405 | 406 | 407 | 408 | 409 | 410 | 411 | 412 |
TypeSizeMarkerLengthData Payload
char 2-bytes C No Yes
413 |

Usage

414 | The char type in Universal Binary JSON is an unsigned byte meant to represent a single ASCII character (decimal values 0-127). Put another way, the char type represents a single-byte UTF-8 encoded character. 415 | 416 | [box type="note"]The char type is synonymous with a 1-byte, UTF-8 encoded value (decimal values 0-127). A char value must not have a decimal value larger than 127.[/box] 417 | 418 | The char type is functionally identical to the uint8 type, but semantically is meant to represent a character and not a numeric value. 419 |

Example

420 | JSON snippet: 421 |
{
422 |     "rolecode": "a",
423 |     "delim": ";"
424 | }
425 | UBJSON snippet (using block-notation): 426 |
[{]
427 |     [i][8][rolecode][C][a]
428 |     [i][5][delim][C][;]
429 | [}]
430 |
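The 0-127 restriction is easy to enforce when writing: a character qualifies exactly when its UTF-8 encoding is a single byte. The sketch below, with the hypothetical helper `encode_char`, rejects anything else so the caller can fall back to the string type.

```python
def encode_char(ch: str) -> bytes:
    payload = ch.encode("utf-8")
    # A single UTF-8 byte always has a value in 0-127.
    if len(payload) != 1:
        raise ValueError("char must be exactly one single-byte UTF-8 character")
    return b"C" + payload


print(encode_char("a"), encode_char(";"))
```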

String Type

431 | 432 |
433 | 434 | 440 | The string type in Universal Binary JSON is defined as: 441 | 442 | 443 | 444 | 445 | 446 | 447 | 448 | 449 | 450 | 451 | 452 | 453 | 454 | 455 | 456 | 457 | 458 | 459 | 460 |
TypeSizeMarkerLengthData Payload
string 1-byte + int num val + string byte len S Yes Yes (if non-empty)
461 |

Usage

462 | The string type in Universal Binary JSON is equivalent to the string type from the JSON specification. 463 |

Example

464 | JSON snippet: 465 |
{
466 |     "username": "rkalla",
467 |     "imagedata": "<huge string payload...>"
468 | }
469 | UBJSON snippet (using block-notation): 470 |
[{]
471 |     [i][8][username][S][i][6][rkalla]
472 |     [i][9][imagedata][S][l][2097152][...huge string payload...]
473 | [}]
474 |

Encoding (UTF-8)

475 | The JSON specification does not dictate a specific required encoding; it does, however, use UTF-8 as the default encoding. 476 | 477 | The Universal Binary JSON specification dictates UTF-8 as the required string encoding (this includes the high-precision number type as it is a string-encoded value). This will allow you to easily exchange binary JSON between open systems that all support and follow this encoding requirement as well as providing a number of advantages and optimizations. 478 |

Storage Size

479 | The size of the string type varies depending on two things: 480 |
    481 |
  1. The integral numeric type used to describe the length of the string (e.g. int8, int16, int32 or int64)
  2. 482 |
  3. The UTF-8 encoded size, in bytes, of the string.
  4. 483 |
484 | For example, English typically uses 1-byte per character, so the string “hello” has a length of 5. The same string in Russian is “привет” with a byte length of 12 and in Arabic the text becomes “مرحبا” with a byte length of 10. 485 | 486 | Here are some examples of what different string values look like to illustrate the point: 487 | 488 | 489 | 490 | 491 | 492 | 493 | 494 | 495 | 496 | 497 | 498 | 499 | 500 | 501 | 502 | 503 | 504 | 505 | 506 | 507 | 508 | 509 | 510 | 511 | 512 |
Binary RepresentationDescription
[S][i][5][hello] 8 bytes, string UTF-8 "hello" (English)
[S][i][12][привет] 15 bytes, string UTF-8 "hello" (Russian)
[S][i][10][مرحبا] 13 bytes, string UTF-8 "hello" (Arabic)
[S][I][1024][...1k long string...] 1 + 3 + 1024 bytes = 1028 bytes total
513 |
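The sizes in the table above can be reproduced with a short calculation: one byte for the `S` marker, the size of the length value (its marker plus its payload), and the UTF-8 byte count of the text. The helper name `string_storage_size` is hypothetical, and the sketch assumes the writer picks the smallest signed integer type for the length, as the table's examples do.

```python
def string_storage_size(text: str) -> int:
    byte_len = len(text.encode("utf-8"))
    # Size of the length value: 1 marker byte + the integer's payload bytes.
    if byte_len <= 127:
        length_size = 2       # int8
    elif byte_len <= 32767:
        length_size = 3       # int16
    elif byte_len <= 2147483647:
        length_size = 5       # int32
    else:
        length_size = 9       # int64
    return 1 + length_size + byte_len


for word in ("hello", "привет", "مرحبا"):
    print(word, string_storage_size(word))
```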

Binary Data

514 | 515 |
516 | 517 | Please see the Binary Data page... -------------------------------------------------------------------------------- /spec8/Makefile: -------------------------------------------------------------------------------- 1 | # Makefile for Sphinx documentation 2 | # 3 | 4 | # You can set these variables from the command line. 5 | SPHINXOPTS = 6 | SPHINXBUILD = sphinx-build 7 | PAPER = 8 | BUILDDIR = _build 9 | 10 | # Internal variables. 11 | PAPEROPT_a4 = -D latex_paper_size=a4 12 | PAPEROPT_letter = -D latex_paper_size=letter 13 | ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 14 | # the i18n builder cannot share the environment and doctrees with the others 15 | I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 16 | 17 | .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext 18 | 19 | help: 20 | @echo "Please use \`make ' where is one of" 21 | @echo " html to make standalone HTML files" 22 | @echo " dirhtml to make HTML files named index.html in directories" 23 | @echo " singlehtml to make a single large HTML file" 24 | @echo " pickle to make pickle files" 25 | @echo " json to make JSON files" 26 | @echo " htmlhelp to make HTML files and a HTML help project" 27 | @echo " qthelp to make HTML files and a qthelp project" 28 | @echo " devhelp to make HTML files and a Devhelp project" 29 | @echo " epub to make an epub" 30 | @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" 31 | @echo " latexpdf to make LaTeX files and run them through pdflatex" 32 | @echo " text to make text files" 33 | @echo " man to make manual pages" 34 | @echo " texinfo to make Texinfo files" 35 | @echo " info to make Texinfo files and run them through makeinfo" 36 | @echo " gettext to make PO message catalogs" 37 | @echo " changes to make an overview of all changed/added/deprecated items" 38 | @echo " linkcheck to check all external links for 
integrity" 39 | @echo " doctest to run all doctests embedded in the documentation (if enabled)" 40 | 41 | clean: 42 | -rm -rf $(BUILDDIR)/* 43 | 44 | html: 45 | $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html 46 | @echo 47 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." 48 | 49 | dirhtml: 50 | $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml 51 | @echo 52 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." 53 | 54 | singlehtml: 55 | $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml 56 | @echo 57 | @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." 58 | 59 | pickle: 60 | $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle 61 | @echo 62 | @echo "Build finished; now you can process the pickle files." 63 | 64 | json: 65 | $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json 66 | @echo 67 | @echo "Build finished; now you can process the JSON files." 68 | 69 | htmlhelp: 70 | $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp 71 | @echo 72 | @echo "Build finished; now you can run HTML Help Workshop with the" \ 73 | ".hhp project file in $(BUILDDIR)/htmlhelp." 74 | 75 | qthelp: 76 | $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp 77 | @echo 78 | @echo "Build finished; now you can run "qcollectiongenerator" with the" \ 79 | ".qhcp project file in $(BUILDDIR)/qthelp, like this:" 80 | @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/UniversalBinaryJSON.qhcp" 81 | @echo "To view the help file:" 82 | @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/UniversalBinaryJSON.qhc" 83 | 84 | devhelp: 85 | $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp 86 | @echo 87 | @echo "Build finished." 
88 | @echo "To view the help file:" 89 | @echo "# mkdir -p $$HOME/.local/share/devhelp/UniversalBinaryJSON" 90 | @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/UniversalBinaryJSON" 91 | @echo "# devhelp" 92 | 93 | epub: 94 | $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub 95 | @echo 96 | @echo "Build finished. The epub file is in $(BUILDDIR)/epub." 97 | 98 | latex: 99 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 100 | @echo 101 | @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." 102 | @echo "Run \`make' in that directory to run these through (pdf)latex" \ 103 | "(use \`make latexpdf' here to do that automatically)." 104 | 105 | latexpdf: 106 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 107 | @echo "Running LaTeX files through pdflatex..." 108 | $(MAKE) -C $(BUILDDIR)/latex all-pdf 109 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." 110 | 111 | text: 112 | $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text 113 | @echo 114 | @echo "Build finished. The text files are in $(BUILDDIR)/text." 115 | 116 | man: 117 | $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man 118 | @echo 119 | @echo "Build finished. The manual pages are in $(BUILDDIR)/man." 120 | 121 | texinfo: 122 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 123 | @echo 124 | @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." 125 | @echo "Run \`make' in that directory to run these through makeinfo" \ 126 | "(use \`make info' here to do that automatically)." 127 | 128 | info: 129 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 130 | @echo "Running Texinfo files through makeinfo..." 131 | make -C $(BUILDDIR)/texinfo info 132 | @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." 133 | 134 | gettext: 135 | $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale 136 | @echo 137 | @echo "Build finished. 
The message catalogs are in $(BUILDDIR)/locale." 138 | 139 | changes: 140 | $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes 141 | @echo 142 | @echo "The overview file is in $(BUILDDIR)/changes." 143 | 144 | linkcheck: 145 | $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck 146 | @echo 147 | @echo "Link check complete; look for any errors in the above output " \ 148 | "or in $(BUILDDIR)/linkcheck/output.txt." 149 | 150 | doctest: 151 | $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest 152 | @echo "Testing of doctests in the sources finished, look at the " \ 153 | "results in $(BUILDDIR)/doctest/output.txt." 154 | -------------------------------------------------------------------------------- /spec8/_static/.keep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ubjson/universal-binary-json/b3037c84600d6d34f505f6175716f10f5274538e/spec8/_static/.keep -------------------------------------------------------------------------------- /spec8/conf.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # Universal Binary JSON documentation build configuration file, created by 4 | # sphinx-quickstart on Sat Aug 4 16:26:00 2012. 5 | # 6 | # This file is execfile()d with the current directory set to its containing dir. 7 | # 8 | # Note that not all possible configuration values are present in this 9 | # autogenerated file. 10 | # 11 | # All configuration values have a default; values that are commented out 12 | # serve to show the default. 13 | 14 | import sys, os 15 | 16 | # If extensions (or modules to document with autodoc) are in another directory, 17 | # add these directories to sys.path here. If the directory is relative to the 18 | # documentation root, use os.path.abspath to make it absolute, like shown here. 
19 | #sys.path.insert(0, os.path.abspath('.')) 20 | 21 | # -- General configuration ----------------------------------------------------- 22 | 23 | # If your documentation needs a minimal Sphinx version, state it here. 24 | #needs_sphinx = '1.0' 25 | 26 | # Add any Sphinx extension module names here, as strings. They can be extensions 27 | # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. 28 | extensions = [] 29 | 30 | # Add any paths that contain templates here, relative to this directory. 31 | templates_path = ['_templates'] 32 | 33 | # The suffix of source filenames. 34 | source_suffix = '.rst' 35 | 36 | # The encoding of source files. 37 | #source_encoding = 'utf-8-sig' 38 | 39 | # The master toctree document. 40 | master_doc = 'index' 41 | 42 | # General information about the project. 43 | project = u'Universal Binary JSON' 44 | copyright = u'2012, UBJSON Community' 45 | 46 | # The version info for the project you're documenting, acts as replacement for 47 | # |version| and |release|, also used in various other places throughout the 48 | # built documents. 49 | # 50 | # The short X.Y version. 51 | version = '0.9' 52 | # The full version, including alpha/beta/rc tags. 53 | release = '0.9-dev' 54 | 55 | # The language for content autogenerated by Sphinx. Refer to documentation 56 | # for a list of supported languages. 57 | #language = None 58 | 59 | # There are two options for replacing |today|: either, you set today to some 60 | # non-false value, then it is used: 61 | #today = '' 62 | # Else, today_fmt is used as the format for a strftime call. 63 | #today_fmt = '%B %d, %Y' 64 | 65 | # List of patterns, relative to source directory, that match files and 66 | # directories to ignore when looking for source files. 67 | exclude_patterns = ['_build'] 68 | 69 | # The reST default role (used for this markup: `text`) to use for all documents. 70 | #default_role = None 71 | 72 | # If true, '()' will be appended to :func: etc. cross-reference text. 
73 | #add_function_parentheses = True 74 | 75 | # If true, the current module name will be prepended to all description 76 | # unit titles (such as .. function::). 77 | #add_module_names = True 78 | 79 | # If true, sectionauthor and moduleauthor directives will be shown in the 80 | # output. They are ignored by default. 81 | #show_authors = False 82 | 83 | # The name of the Pygments (syntax highlighting) style to use. 84 | pygments_style = 'sphinx' 85 | 86 | # A list of ignored prefixes for module index sorting. 87 | #modindex_common_prefix = [] 88 | 89 | 90 | # -- Options for HTML output --------------------------------------------------- 91 | 92 | # The theme to use for HTML and HTML Help pages. See the documentation for 93 | # a list of builtin themes. 94 | html_theme = 'haiku' 95 | 96 | # Theme options are theme-specific and customize the look and feel of a theme 97 | # further. For a list of options available for each theme, see the 98 | # documentation. 99 | #html_theme_options = {} 100 | 101 | # Add any paths that contain custom themes here, relative to this directory. 102 | #html_theme_path = [] 103 | 104 | # The name for this set of Sphinx documents. If None, it defaults to 105 | # " v documentation". 106 | #html_title = None 107 | 108 | # A shorter title for the navigation bar. Default is the same as html_title. 109 | #html_short_title = None 110 | 111 | # The name of an image file (relative to this directory) to place at the top 112 | # of the sidebar. 113 | #html_logo = None 114 | 115 | # The name of an image file (within the static path) to use as favicon of the 116 | # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 117 | # pixels large. 118 | #html_favicon = None 119 | 120 | # Add any paths that contain custom static files (such as style sheets) here, 121 | # relative to this directory. They are copied after the builtin static files, 122 | # so a file named "default.css" will overwrite the builtin "default.css". 
123 | html_static_path = ['_static'] 124 | 125 | # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, 126 | # using the given strftime format. 127 | #html_last_updated_fmt = '%b %d, %Y' 128 | 129 | # If true, SmartyPants will be used to convert quotes and dashes to 130 | # typographically correct entities. 131 | #html_use_smartypants = True 132 | 133 | # Custom sidebar templates, maps document names to template names. 134 | #html_sidebars = {} 135 | 136 | # Additional templates that should be rendered to pages, maps page names to 137 | # template names. 138 | #html_additional_pages = {} 139 | 140 | # If false, no module index is generated. 141 | #html_domain_indices = True 142 | 143 | # If false, no index is generated. 144 | #html_use_index = True 145 | 146 | # If true, the index is split into individual pages for each letter. 147 | #html_split_index = False 148 | 149 | # If true, links to the reST sources are added to the pages. 150 | #html_show_sourcelink = True 151 | 152 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. 153 | #html_show_sphinx = True 154 | 155 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. 156 | #html_show_copyright = True 157 | 158 | # If true, an OpenSearch description file will be output, and all pages will 159 | # contain a tag referring to it. The value of this option must be the 160 | # base URL from which the finished HTML is served. 161 | #html_use_opensearch = '' 162 | 163 | # This is the file name suffix for HTML files (e.g. ".xhtml"). 164 | #html_file_suffix = None 165 | 166 | # Output file base name for HTML help builder. 167 | htmlhelp_basename = 'UniversalBinaryJSONdoc' 168 | 169 | 170 | # -- Options for LaTeX output -------------------------------------------------- 171 | 172 | latex_elements = { 173 | # The paper size ('letterpaper' or 'a4paper'). 174 | #'papersize': 'letterpaper', 175 | 176 | # The font size ('10pt', '11pt' or '12pt'). 
177 | #'pointsize': '10pt', 178 | 179 | # Additional stuff for the LaTeX preamble. 180 | #'preamble': '', 181 | } 182 | 183 | # Grouping the document tree into LaTeX files. List of tuples 184 | # (source start file, target name, title, author, documentclass [howto/manual]). 185 | latex_documents = [ 186 | ('index', 'UniversalBinaryJSON.tex', u'Universal Binary JSON Documentation', 187 | u'UBJSON Community', 'manual'), 188 | ] 189 | 190 | # The name of an image file (relative to this directory) to place at the top of 191 | # the title page. 192 | #latex_logo = None 193 | 194 | # For "manual" documents, if this is true, then toplevel headings are parts, 195 | # not chapters. 196 | #latex_use_parts = False 197 | 198 | # If true, show page references after internal links. 199 | #latex_show_pagerefs = False 200 | 201 | # If true, show URL addresses after external links. 202 | #latex_show_urls = False 203 | 204 | # Documents to append as an appendix to all manuals. 205 | #latex_appendices = [] 206 | 207 | # If false, no module index is generated. 208 | #latex_domain_indices = True 209 | 210 | 211 | # -- Options for manual page output -------------------------------------------- 212 | 213 | # One entry per manual page. List of tuples 214 | # (source start file, name, description, authors, manual section). 215 | man_pages = [ 216 | ('index', 'universalbinaryjson', u'Universal Binary JSON Documentation', 217 | [u'UBJSON Community'], 1) 218 | ] 219 | 220 | # If true, show URL addresses after external links. 221 | #man_show_urls = False 222 | 223 | 224 | # -- Options for Texinfo output ------------------------------------------------ 225 | 226 | # Grouping the document tree into Texinfo files. 
List of tuples 227 | # (source start file, target name, title, author, 228 | # dir menu entry, description, category) 229 | texinfo_documents = [ 230 | ('index', 'UniversalBinaryJSON', u'Universal Binary JSON Documentation', 231 | u'UBJSON Community', 'UniversalBinaryJSON', 'One line description of project.', 232 | 'Miscellaneous'), 233 | ] 234 | 235 | # Documents to append as an appendix to all manuals. 236 | #texinfo_appendices = [] 237 | 238 | # If false, no module index is generated. 239 | #texinfo_domain_indices = True 240 | 241 | # How to display URL addresses: 'footnote', 'no', or 'inline'. 242 | #texinfo_show_urls = 'footnote' 243 | -------------------------------------------------------------------------------- /spec8/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 315 | 316 | 317 |
318 | 319 | 320 | 324 |
325 |

Universal Binary JSON

326 |

JSON has become a ubiquitous text-based file format for 327 | data interchange. Its simplicity, ease of processing and (relatively) rich data 328 | typing made it a natural choice for many developers needing to store or shuffle 329 | data between systems quickly and easily.

330 |

Unfortunately, marshaling native programming language constructs in and out of 331 | a text-based representation does have a measurable processing cost associated 332 | with it.

333 |

In high-performance applications, avoiding the text-processing step of JSON can 334 | net big wins in both processing time and size reduction of stored information, 335 | which is where a binary JSON format becomes helpful.

336 |
337 |

Why

338 |

Attempts to make using JSON faster through binary specifications like 339 | BSON, BJSON or Smile exist, but have been rejected 340 | from mass-adoption for two reasons:

341 |
    342 |
  • Custom (Binary-Only) Data Types: 343 | Inclusion of custom data types that have no counterpart in the original JSON 344 | spec, leaving room for incompatibilities to exist as different implementations 345 | of the spec handle the binary-only data types differently.
  • 346 |
  • Complexity: Some specifications provide higher performance or smaller 347 | representations at the cost of a much more complex specification, making 348 | implementations more difficult which can slow or block adoption. One of the key 349 | reasons JSON became as popular as it did was because of its ease of use.
  • 350 |
351 |
352 |
353 |

Goals

354 |

The Universal Binary JSON specification has 3 goals:

355 |
    356 |
  1. Universal Compatibility
  2. 357 |
358 |
359 |

Meaning absolute compatibility with the JSON spec itself as well as only 360 | utilizing data types that are natively supported in all popular programming 361 | languages.

362 |

This allows 1:1 transforms between standard JSON and Universal Binary JSON as 363 | well as efficient representation in all popular programming languages without 364 | requiring parser developers to account for strange data types that their 365 | language may not support.

366 |
367 |
    368 |
  1. Ease of Use
  2. 369 |
370 |
371 |

The Universal Binary JSON specification is intentionally defined using a 372 | single core data structure to build up the entire specification.

373 |

This accomplishes two things: it allows the spec to be understood quickly and 374 | allows developers to write trivially simple code to take advantage of it or 375 | interchange data with another system utilizing it.

376 |
377 |
    378 |
  1. Speed / Efficiency
  2. 379 |
380 |
381 | Typically the motivation for using a binary specification over a text-based 382 | one is speed and/or efficiency, so strict attention was paid to selecting data 383 | constructs and representations that are (roughly) 30% smaller than their 384 | compacted JSON counterparts and optimized for fast parsing.
385 |

Interested? Find out more at http://ubjson.org

386 |

Contents:

387 |
388 |

System Message: ERROR/3 (/home/kxepal/projects/universal-binary-json/spec/index.rst, line 78)

389 |

Unknown directive type "toctree".

390 |
391 | .. toctree::
392 |    :maxdepth: 2
393 | 
394 |   why.rst
395 | 
396 | 
397 | 
398 | 
399 |
400 |
401 |
402 |
403 |

Indices and tables

404 |
    405 |
  • :ref:`genindex`

    406 |
    407 |

    System Message: ERROR/3 (/home/kxepal/projects/universal-binary-json/spec/index.rst, line 88); backlink

    408 |

    Unknown interpreted text role "ref".

    409 |
    410 |
  • 411 |
  • :ref:`modindex`

    412 |
    413 |

    System Message: ERROR/3 (/home/kxepal/projects/universal-binary-json/spec/index.rst, line 89); backlink

    414 |

    Unknown interpreted text role "ref".

    415 |
    416 |
  • 417 |
  • :ref:`search`

    418 |
    419 |

    System Message: ERROR/3 (/home/kxepal/projects/universal-binary-json/spec/index.rst, line 90); backlink

    420 |

    Unknown interpreted text role "ref".

    421 |
    422 |
  • 423 |
424 |
425 |
426 | 427 | 428 | -------------------------------------------------------------------------------- /spec8/index.rst: -------------------------------------------------------------------------------- 1 | .. Universal Binary JSON documentation master file, created by 2 | sphinx-quickstart on Sat Aug 4 16:26:00 2012. 3 | You can adapt this file completely to your liking, but it should at least 4 | contain the root `toctree` directive. 5 | 6 | Universal Binary JSON 7 | ===================== 8 | 9 | `JSON`_ has become a ubiquitous text-based file format for data interchange. 10 | Its simplicity, ease of processing and (relatively) rich data typing made it a 11 | natural choice for many developers needing to store or shuffle data between 12 | systems quickly and easily. 13 | 14 | Unfortunately, marshaling native programming language constructs in and out of 15 | a text-based representation does have a measurable processing cost associated 16 | with it. 17 | 18 | In high-performance applications, avoiding the text-processing step of JSON can 19 | net big wins in both processing time and size reduction of stored information, 20 | which is where a binary JSON format becomes helpful. 21 | 22 | .. toctree:: 23 | :maxdepth: 3 24 | 25 | spec.rst 26 | type_reference.rst 27 | libraries.rst 28 | thanks.rst 29 | 30 | Why UBJSON? 31 | ----------- 32 | 33 | Attempts to make using JSON faster through binary specifications like 34 | `BSON`_, `BJSON`_ or `Smile`_ exist, but have been `rejected`_ 35 | from `mass-adoption`_ for two reasons: 36 | 37 | * Custom (Binary-Only) Data Types: 38 | Inclusion of custom data types that have no counterpart in the original JSON 39 | spec, leaving room for incompatibilities to exist as different implementations 40 | of the spec handle the binary-only data types differently. 
41 | 42 | * Complexity: Some specifications provide higher performance or smaller 43 | representations at the cost of a `much more complex specification`_, 44 | making implementations more difficult which can slow or block adoption. One of 45 | the key reasons JSON became as popular as it did was because of its ease of 46 | use. 47 | 48 | BSON, for example, defines types for binary data, regular expressions, 49 | JavaScript code blocks and other constructs that have no equivalent data type in 50 | JSON. BJSON defines a binary data type as well, again leaving the door wide open 51 | to interpretation that can potentially lead to incompatibilities between two 52 | implementations of the spec and Smile, while the closest, defines more complex 53 | data constructs and generation/parsing rules in the name of absolute space 54 | efficiency. 55 | 56 | The existing binary JSON specifications all define incompatibilities or 57 | complexities that undo the singular tenet that made JSON so successful: 58 | **simplicity**. 59 | 60 | JSON’s simplicity made it accessible to anyone, made implementations in every 61 | language available and made explaining it to anyone consuming your data 62 | immediate. 63 | 64 | Any successful binary JSON specification must carry these properties forward for 65 | it to be genuinely helpful to the community at large. 66 | 67 | This specification is defined around a singular construct used to build up and 68 | represent JSON values and objects. Reading and writing the format is trivial, 69 | designed with the goal of being understood in under 10 minutes (likely less if 70 | you are very comfortable with JSON already). 
71 | 72 | Fortunately, while the Universal Binary JSON specification carries these 73 | tenets of simplicity forward, it is also able to take advantage of optimized 74 | binary data structures that are (on average) 30% smaller than compacted JSON and 75 | specified for ultimate read performance; bringing **simplicity**, **size** and 76 | **performance** all together into a single specification that is 100% compatible 77 | with JSON. 78 | 79 | Why not JSON+gzip? 80 | ------------------ 81 | 82 | On the surface simply gzipping your compacted JSON may seem like a valid (and 83 | smaller) alternative to using the Universal Binary JSON specification, but there 84 | are two significant costs associated with this approach that you should be aware 85 | of: 86 | 87 | #. At least a `50% performance overhead`_ for processing the data. 88 | #. Lack of data clarity and inability to inspect it directly. 89 | 90 | While gzipping your JSON will give you great compression, about 75% on average, 91 | the overhead required to read/write the data becomes significantly higher. 92 | Additionally, because the binary data is now in a compressed format you can no 93 | longer open it directly in an editor and scan the human-readable portions of it 94 | easily; which can be important during debugging, testing or data verification 95 | and recovery. 96 | 97 | Utilizing the Universal Binary JSON format will typically provide a 98 | 30% reduction in size and store your data in a read-optimized format offering 99 | you much higher performance than even compacted JSON. If you had a usage 100 | scenario where your data is put into long-term cold storage and pulled out in 101 | large chunks for processing, you might even consider gzipping your 102 | Universal Binary JSON files, storing those, and when they are pulled out and 103 | unzipped, you can then process them with all the speed advantages of UBJ. 
104 | 105 | As always, deciding which approach is right for your project depends heavily on 106 | what you need. 107 | 108 | Goals 109 | ----- 110 | 111 | The `Universal Binary JSON`_ specification has 3 goals: 112 | 113 | #. **Universal Compatibility** 114 | 115 | Meaning absolute compatibility with the JSON spec itself as well as only 116 | utilizing data types that are natively supported in all popular programming 117 | languages. 118 | 119 | This allows 1:1 transforms between standard JSON and Universal Binary JSON as 120 | well as efficient representation in all popular programming languages without 121 | requiring parser developers to account for strange data types that their 122 | language may not support. 123 | 124 | #. **Ease of Use** 125 | 126 | The Universal Binary JSON specification is intentionally defined using a 127 | single core data structure to build up the entire specification. 128 | 129 | This accomplishes two things: it allows the spec to be understood quickly and 130 | allows developers to write trivially simple code to take advantage of it or 131 | interchange data with another system utilizing it. 132 | 133 | #. **Speed / Efficiency** 134 | 135 | Typically the motivation for using a binary specification over a text-based 136 | one is speed and/or efficiency, so strict attention was paid to selecting 137 | data constructs and representations that are (roughly) 30% smaller than their 138 | compacted JSON counterparts and optimized for fast parsing. 139 | 140 | Indices and tables 141 | ================== 142 | 143 | * :ref:`genindex` 144 | * :ref:`search` 145 | 146 | .. _JSON: http://json.org 147 | .. _UBJSON: http://ubjson.org 148 | .. _Universal Binary JSON: http://ubjson.org 149 | .. _BSON: http://bsonspec.org 150 | .. _BJSON: http://bjson.org 151 | .. _Smile: http://wiki.fasterxml.com/SmileFormat 152 | .. _rejected: https://issues.apache.org/jira/browse/COUCHDB-702 153 | .. _mass-adoption: http://bsonspec.org/#/implementation 154 | .. 
_much more complex specification: http://wiki.fasterxml.com/SmileFormatSpec 155 | .. _50% performance overhead: http://www.cowtowncoder.com/blog/archives/2009/05/entry_263.html 156 | -------------------------------------------------------------------------------- /spec8/libraries.rst: -------------------------------------------------------------------------------- 1 | 2 | Libraries 3 | ========= 4 | 5 | Below is a list of libraries, by language, that implement the Universal Binary 6 | JSON Specification. 7 | 8 | D 9 | ---- 10 | 11 | * `UBJSON for D `_ 12 | 13 | Java 14 | ---- 15 | 16 | * `Universal Binary JSON Java Library `_ 17 | 18 | .NET 19 | ---- 20 | 21 | * `Ubjson.NET `_ 22 | 23 | Node.js 24 | ------- 25 | 26 | * `node-ubjson `_ 27 | 28 | Python 29 | ------ 30 | 31 | * `simpleubjson `_ 32 | 33 | -------------------------------------------------------------------------------- /spec8/make.bat: -------------------------------------------------------------------------------- 1 | @ECHO OFF 2 | 3 | REM Command file for Sphinx documentation 4 | 5 | if "%SPHINXBUILD%" == "" ( 6 | set SPHINXBUILD=sphinx-build 7 | ) 8 | set BUILDDIR=_build 9 | set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% . 10 | set I18NSPHINXOPTS=%SPHINXOPTS% . 11 | if NOT "%PAPER%" == "" ( 12 | set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS% 13 | set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS% 14 | ) 15 | 16 | if "%1" == "" goto help 17 | 18 | if "%1" == "help" ( 19 | :help 20 | echo.Please use `make ^` where ^ is one of 21 | echo. html to make standalone HTML files 22 | echo. dirhtml to make HTML files named index.html in directories 23 | echo. singlehtml to make a single large HTML file 24 | echo. pickle to make pickle files 25 | echo. json to make JSON files 26 | echo. htmlhelp to make HTML files and a HTML help project 27 | echo. qthelp to make HTML files and a qthelp project 28 | echo. devhelp to make HTML files and a Devhelp project 29 | echo. 
epub to make an epub 30 | echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter 31 | echo. text to make text files 32 | echo. man to make manual pages 33 | echo. texinfo to make Texinfo files 34 | echo. gettext to make PO message catalogs 35 | echo. changes to make an overview over all changed/added/deprecated items 36 | echo. linkcheck to check all external links for integrity 37 | echo. doctest to run all doctests embedded in the documentation if enabled 38 | goto end 39 | ) 40 | 41 | if "%1" == "clean" ( 42 | for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i 43 | del /q /s %BUILDDIR%\* 44 | goto end 45 | ) 46 | 47 | if "%1" == "html" ( 48 | %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html 49 | if errorlevel 1 exit /b 1 50 | echo. 51 | echo.Build finished. The HTML pages are in %BUILDDIR%/html. 52 | goto end 53 | ) 54 | 55 | if "%1" == "dirhtml" ( 56 | %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml 57 | if errorlevel 1 exit /b 1 58 | echo. 59 | echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml. 60 | goto end 61 | ) 62 | 63 | if "%1" == "singlehtml" ( 64 | %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml 65 | if errorlevel 1 exit /b 1 66 | echo. 67 | echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml. 68 | goto end 69 | ) 70 | 71 | if "%1" == "pickle" ( 72 | %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle 73 | if errorlevel 1 exit /b 1 74 | echo. 75 | echo.Build finished; now you can process the pickle files. 76 | goto end 77 | ) 78 | 79 | if "%1" == "json" ( 80 | %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json 81 | if errorlevel 1 exit /b 1 82 | echo. 83 | echo.Build finished; now you can process the JSON files. 84 | goto end 85 | ) 86 | 87 | if "%1" == "htmlhelp" ( 88 | %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp 89 | if errorlevel 1 exit /b 1 90 | echo. 
91 | echo.Build finished; now you can run HTML Help Workshop with the ^ 92 | .hhp project file in %BUILDDIR%/htmlhelp. 93 | goto end 94 | ) 95 | 96 | if "%1" == "qthelp" ( 97 | %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp 98 | if errorlevel 1 exit /b 1 99 | echo. 100 | echo.Build finished; now you can run "qcollectiongenerator" with the ^ 101 | .qhcp project file in %BUILDDIR%/qthelp, like this: 102 | echo.^> qcollectiongenerator %BUILDDIR%\qthelp\UniversalBinaryJSON.qhcp 103 | echo.To view the help file: 104 | echo.^> assistant -collectionFile %BUILDDIR%\qthelp\UniversalBinaryJSON.ghc 105 | goto end 106 | ) 107 | 108 | if "%1" == "devhelp" ( 109 | %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp 110 | if errorlevel 1 exit /b 1 111 | echo. 112 | echo.Build finished. 113 | goto end 114 | ) 115 | 116 | if "%1" == "epub" ( 117 | %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub 118 | if errorlevel 1 exit /b 1 119 | echo. 120 | echo.Build finished. The epub file is in %BUILDDIR%/epub. 121 | goto end 122 | ) 123 | 124 | if "%1" == "latex" ( 125 | %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex 126 | if errorlevel 1 exit /b 1 127 | echo. 128 | echo.Build finished; the LaTeX files are in %BUILDDIR%/latex. 129 | goto end 130 | ) 131 | 132 | if "%1" == "text" ( 133 | %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text 134 | if errorlevel 1 exit /b 1 135 | echo. 136 | echo.Build finished. The text files are in %BUILDDIR%/text. 137 | goto end 138 | ) 139 | 140 | if "%1" == "man" ( 141 | %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man 142 | if errorlevel 1 exit /b 1 143 | echo. 144 | echo.Build finished. The manual pages are in %BUILDDIR%/man. 145 | goto end 146 | ) 147 | 148 | if "%1" == "texinfo" ( 149 | %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo 150 | if errorlevel 1 exit /b 1 151 | echo. 152 | echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo. 
153 | goto end 154 | ) 155 | 156 | if "%1" == "gettext" ( 157 | %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale 158 | if errorlevel 1 exit /b 1 159 | echo. 160 | echo.Build finished. The message catalogs are in %BUILDDIR%/locale. 161 | goto end 162 | ) 163 | 164 | if "%1" == "changes" ( 165 | %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes 166 | if errorlevel 1 exit /b 1 167 | echo. 168 | echo.The overview file is in %BUILDDIR%/changes. 169 | goto end 170 | ) 171 | 172 | if "%1" == "linkcheck" ( 173 | %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck 174 | if errorlevel 1 exit /b 1 175 | echo. 176 | echo.Link check complete; look for any errors in the above output ^ 177 | or in %BUILDDIR%/linkcheck/output.txt. 178 | goto end 179 | ) 180 | 181 | if "%1" == "doctest" ( 182 | %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest 183 | if errorlevel 1 exit /b 1 184 | echo. 185 | echo.Testing of doctests in the sources finished, look at the ^ 186 | results in %BUILDDIR%/doctest/output.txt. 187 | goto end 188 | ) 189 | 190 | :end 191 | -------------------------------------------------------------------------------- /spec8/spec.rst: -------------------------------------------------------------------------------- 1 | 2 | Specification 3 | +++++++++++++ 4 | 5 | Data Format 6 | =========== 7 | 8 | The Universal Binary JSON specification utilizes a single binary tuple to 9 | represent all JSON data types (both value and container types):: 10 | 11 | [][] 12 | 13 | Each element in the tuple is defined as: 14 | 15 | * **type** 16 | 17 | * A 1-byte ASCII char used to indicate the type of the data following it. 18 | * A single ASCII char was chosen to make manually walking and debugging 19 | data stored in the Universal Binary JSON format as easy as possible 20 | (e.g. making the data relatively readable in a hex editor). 21 | * **length** (OPTIONAL) 22 | 1-byte or 4-byte length value based on the type specified. 
This allows 23 | for more aggressive compression and space-optimization when dealing with 24 | a lot of small values. 25 | 26 | * 1-byte: An unsigned byte value (``0`` to ``254``) used to indicate the 27 | length of the data payload following it. Useful for small items. 28 | * 4-byte: An unsigned integer value (``0`` to ``2,147,483,647``) used to 29 | indicate the length of the data payload following it. Useful for larger 30 | items. 31 | * **data** (OPTIONAL) 32 | A run of bytes representing the actual binary data for this type of value. 33 | 34 | In the name of efficiency, the length and data fields are optional depending on 35 | the type of value being encoded. Some values are simple enough that just writing 36 | the 1-byte ASCII marker into the stream is enough to represent the value 37 | (e.g. `null`) while others have a type that is specific enough that no length is 38 | needed as the length is implied by the type (e.g. `int32`). 39 | 40 | The specifics of each data type are spelled out below for more clarity. 41 | 42 | The basic organization provided by this tuple (`type-length-data`) allows each 43 | JSON construct to be represented in a binary format that is simple to read and 44 | write without the need for complex/custom encodings, ``null``-terminating bytes 45 | that have to be scanned for anywhere in the stream, or references that have to be resolved. 46 | 47 | .. _value_types: 48 | 49 | Value Types 50 | =========== 51 | 52 | This section describes the mapping between the 5 discrete value types from the 53 | JSON specification into the Universal Binary JSON format. 54 | 55 | JSON 56 | ---- 57 | 58 | The JSON specification defines 7 value types: 59 | 60 | * string 61 | * number 62 | * object (container) 63 | * array (container) 64 | * true 65 | * false 66 | * null 67 | 68 | Of those 7 values, 2 of them are types describing containers that hold the 5 69 | basic values.
We have a separate section below for looking at the 2 container 70 | types specifically, so for the time being let’s only consider the following 5 71 | discrete value types: 72 | 73 | * string 74 | * number 75 | * true 76 | * false 77 | * null 78 | 79 | Most of these types have a ``1:1`` mapping to a primitive type in most popular 80 | programming languages (Java, C, Python, PHP, Erlang, etc.) except for `number`. 81 | This makes defining the other 4 types easy, but let’s take a closer look at 82 | how we might deconstruct `number` into its core representations. 83 | 84 | Number Type 85 | ^^^^^^^^^^^ 86 | 87 | In JavaScript, the `Number`_ type can represent any numeric value, whereas many 88 | other languages define numbers using 3-6 discrete numeric types depending on the 89 | type and length of the value being stored. This allows the runtime to handle 90 | numeric operations more efficiently. 91 | 92 | In order for the Universal Binary JSON specification to be a performant 93 | alternative to JSON, support for these most common numeric types had to be added 94 | to allow for more efficient reading and writing of numeric values. 95 | 96 | `number` is deconstructed in the Universal Binary JSON specification and defined 97 | by the following **signed** numeric types: 98 | 99 | * byte (8-bits, 1-byte) 100 | * int16 (16-bits, 2-bytes) 101 | * int32 (32-bits, 4-bytes) 102 | * int64 (64-bits, 8-bytes) 103 | * float (32-bits, 4-bytes) 104 | * double (64-bits, 8-bytes) 105 | * huge (arbitrarily long, UTF-8 string-encoded numeric value) 106 | 107 | Trying to maintain a single `number` type represented in binary form would have 108 | led to parsing complexity and slow-downs as the processing language would have 109 | to further inspect the value and map it to the optimal type.
110 | By pre-defining these different numeric types directly in binary, in most 111 | languages the number can stay in their optimal form on disk and be deserialized 112 | back into their native representation with very little overhead. 113 | 114 | When working on a platform like JavaScript that has a singular type for numbers, 115 | all of these data types (with the exception of `huge`) can simply be mapped back 116 | to the `number` type with ease and no loss of precision. 117 | 118 | When converting these formats back to JSON, all of the numeric types can simply 119 | be rendered as the singular number type defined by the JSON spec without issue; 120 | there is total compatibility! 121 | 122 | Value Type Summary 123 | ^^^^^^^^^^^^^^^^^^ 124 | 125 | Now that we have clearly defined all of our (signed) numeric types and mapped 126 | the 4 remaining simple types to Universal Binary JSON, we have our final list of 127 | discrete value types: 128 | 129 | * null 130 | * false 131 | * true 132 | * byte 133 | * int16 134 | * int32 135 | * int64 136 | * float 137 | * double 138 | * huge 139 | * string 140 | 141 | Now that we have defined all the types we need, let’s see how these are actually 142 | represented in binary in the next section. 143 | 144 | Universal Binary JSON 145 | --------------------- 146 | 147 | The Universal Binary JSON specification defines a total of 13 discrete value 148 | types (that we saw in the last section) all delimited in the binary file by a 149 | specific, 1-byte ASCII character (optionally) followed by a length and 150 | (optionally) a data payload containing the value data itself. 151 | 152 | Some of the values (`null`, `true`` and `false`) are specific enough that 153 | the single 1-byte ASCII character is enough to represent the value in the format 154 | and they will have no `length` or `data` section. 155 | 156 | All of the numeric values (except `huge`) automatically imply a length by 157 | virtue of the type of number they are. 
For example, a 4-byte `int32` always 158 | has a length of 4-bytes; an 8-byte `double` always requires 8 bytes of data. 159 | 160 | In these cases the ASCII marker for these types are immediately followed by the 161 | data representing the number with no `length` value in between. 162 | 163 | Because `string` and `huge` are potentially variable length, they contain all 3 164 | elements of the tuple: `type-length-data`. 165 | 166 | This table shows the official definitions of the discrete value types: 167 | 168 | +-----------------+--------------------------+--------+---------+--------------+ 169 | | Type | Size | Marker | Length? | Data? | 170 | +=================+==========================+========+=========+==============+ 171 | | null | 1-byte | Z | No | No | 172 | +-----------------+--------------------------+--------+---------+--------------+ 173 | | true | 1-byte | T | No | No | 174 | +-----------------+--------------------------+--------+---------+--------------+ 175 | | false | 1-byte | F | No | No | 176 | +-----------------+--------------------------+--------+---------+--------------+ 177 | | byte | 2-bytes | B | No | Yes | 178 | +-----------------+--------------------------+--------+---------+--------------+ 179 | | int16 | 3-bytes | i | No | Yes | 180 | +-----------------+--------------------------+--------+---------+--------------+ 181 | | int32 | 5-bytes | I | No | Yes | 182 | +-----------------+--------------------------+--------+---------+--------------+ 183 | | int64 | 9-bytes | L | No | Yes | 184 | +-----------------+--------------------------+--------+---------+--------------+ 185 | | float (32-bit) | 5-bytes | d | No | Yes | 186 | +-----------------+--------------------------+--------+---------+--------------+ 187 | | double (64-bit) | 9-bytes | D | No | Yes | 188 | +-----------------+--------------------------+--------+---------+--------------+ 189 | | huge (number) | 2-bytes | h | Yes | Yes | 190 | | | + byte length of string | | | if non-empty 
| 191 | +-----------------+--------------------------+--------+---------+--------------+ 192 | | huge (number) | 5-bytes | H | Yes | Yes, | 193 | | | + byte length of string | | | if non-empty | 194 | +-----------------+--------------------------+--------+---------+--------------+ 195 | | string | 2-bytes | s | Yes | Yes, | 196 | | | + byte length of string | | | if non-empty | 197 | +-----------------+--------------------------+--------+---------+--------------+ 198 | | string | 5-bytes | S | Yes | Yes, | 199 | | | + byte length of string | | | if non-empty | 200 | +-----------------+--------------------------+--------+---------+--------------+ 201 | 202 | .. note:: 203 | 204 | The duplicate (lowercased) ``h`` and ``s`` types are just versions of those 205 | types that allow for a 1-byte length (instead of 4-byte length) to be used for 206 | more compact storage when length is ``<= 254``. 207 | 208 | With each field of the table described as: 209 | 210 | * **Type** 211 | 212 | * The binary value data type defined by the spec. 213 | 214 | * **Size** 215 | 216 | * The byte-size of the construct, as stored in the binary format. This is not 217 | the value of the `length` field, just an indicator to you (the reader) of 218 | approximately how much space writing out a value of this type will take. 219 | 220 | * **Marker** 221 | 222 | * The single ASCII character marker used to delimit the different types of 223 | values in the binary format. When reading in bytes from a file stored in 224 | this format, you can simply check the decimal value of the byte 225 | (e.g. ``'A' = 65``) and switch on that value for processing. 226 | 227 | * **Length?** 228 | 229 | * Indicates if the data type provides a length value between the ASCII marker 230 | and the data payload. 231 | * Many of the data types represented in the binary format either don’t have a 232 | length (`null`, `true` or `false`) or their types (e.g. 
the numeric 233 | values) are specific enough that the length is implied. 234 | * When specifying the length for a string or huge value (UTF-8 encoded), the 235 | length **must represent the number of bytes** of the UTF-8 string and not 236 | the number of characters in the string. 237 | 238 | .. note:: 239 | 240 | For example, English typically uses 1-byte per character, so the string 241 | “hello” has a length of 5. The same string in Russian is “привет” with a 242 | byte length of 12 and in Arabic the text becomes “مرحبا” with a byte length 243 | of 10. 244 | 245 | * **Data?** 246 | 247 | * Indicates if the data type provides a data payload representing the value. 248 | * Most types except for `null`, `true` and `false` provide a data payload 249 | indicating their value. 250 | * Variable-length types like `string` and `huge` **do not** provide a data 251 | payload when they are empty (i.e. length of 0).More specifically, if you are 252 | writing a parser for the Universal Binary JSON format and you encounter a 253 | `string` of length 0, you know the very next byte is an ASCII marker for 254 | another value since the `string` has no data payload. 255 | 256 | .. note:: 257 | 258 | **Using Numeric Types** 259 | 260 | It is always recommended to use the smallest numeric type that fits your 261 | needs. For data with a large amount of numeric data, this can cut down the 262 | size of the payloads significantly (on average a **50% reduction in size**). 263 | 264 | All numeric types are **signed**. 265 | 266 | Numeric values of `infinity` are encoded as a `null` (``Z``) value. 267 | (See `ECMA`_, See `JSON presentation`_) 268 | 269 | **64-bit Integers** 270 | 271 | While almost all languages native support 64-bit integers, not all do 272 | (e.g. C89 and JavaScript (`yet`_)) and care must be taken when encoding 64-bit 273 | integer values into binary JSON then attempting to decode it on a platform 274 | that doesn't support it. 
275 | 276 | If you are fully aware of the platforms and runtime environments your binary 277 | JSON is being used on and know they all support 64-bit integers, then you are 278 | fine. 279 | 280 | If you are trying to deserialize 64-bit integers in a client’s browser in 281 | JavaScript or another environment that does not support 64-bit integers, then 282 | you will want to take care to skip them in the input or have the client 283 | producing them encode them as `double` or `huge` values if that is easier to 284 | handle. 285 | 286 | Alternatively you might consider encoding your 64-bit values as doubles if you 287 | know you are going from the server to a client JavaScript environment with the 288 | binary-encoded information. 289 | 290 | **32-bit Floats** 291 | 292 | All 32-bit float values are written into the binary format using the 293 | `IEEE 754 single precision floating point format`_, which is the following 294 | structure: 295 | 296 | * Bit 31 (1 bit) – sign 297 | * Bit 30-23 (8 bits) – exponent 298 | * Bit 22-0 (23 bits) – fraction (significand) 299 | 300 | **64-bit Doubles** 301 | 302 | All 64-bit double values are written into the binary format using the 303 | `IEEE 754 double precision floating point format`_, which is the following 304 | structure: 305 | 306 | * Bit 63 (1 bit) – sign 307 | * Bit 62-52 (11 bits) – exponent 308 | * Bit 51-0 (52 bits) – fraction (significand) 309 | 310 | **huge Numeric Type** 311 | 312 | The huge numeric type is a safe and portable way for representing 313 | **values > 64-bit** by way of an UTF-8 encoded string. The format of this 314 | string **must adhere** to the `JSON number specification`_. 315 | 316 | This allows `huge` numbers to be portable across all platforms and easily 317 | converted to/from JSON as well as more robust handling on platforms that may 318 | not support arbitrarily large numbers. 
319 | 320 | If you are working on a platform that has no support for huge numbers, please 321 | see our :ref:`Best Practices ` recommendation on how to handle 322 | them. 323 | 324 | It is considered a violation of this specification to store numeric 325 | **values <= 64-bit** as a `huge`. 326 | 327 | This decision was made in order to simplify the parsing logic required to 328 | process the Universal Binary JSON specification; there is no need to 329 | introspect `huge` values to see if they contain smaller numeric values when 330 | mapping UBJSON types to native types of the runtime environment. 331 | 332 | The `huge` type should only be used when you need to (safely and portably) 333 | represent **values > 64-bit**. 334 | 335 | **UTF-8 Encoding** 336 | 337 | The JSON specification does not dictate a specific required encoding, it does 338 | however use UTF-8 as the default encoding. 339 | 340 | The Universal Binary JSON specification dictates `UTF-8`_ as the 341 | **required string encoding**. This will allow you to easily exchange binary 342 | JSON between open systems that all follow this encoding requirement. 343 | 344 | Fortunately, this is ideal for `a multitude of reasons`_ like space efficiency 345 | and compatibility across systems and alternative formats. 346 | 347 | To further clarify the binary layout of these data types, below are some visual 348 | examples of what the bytes would look like inside of a binary JSON file. 349 | 350 | NOTE: ``[ ]``-block notation is used for readability, the ``[ ]`` characters 351 | **are not** actually written out in the binary format. 
352 | 353 | +----------------------------------+-------------------------------------------+ 354 | | Binary Representation | Description | 355 | +==================================+===========================================+ 356 | | ``[Z]`` | 1-byte, null value | 357 | +----------------------------------+-------------------------------------------+ 358 | | ``[T]`` | 1-byte, true value | 359 | +----------------------------------+-------------------------------------------+ 360 | | ``[F]`` | 1-byte, false value | 361 | +----------------------------------+-------------------------------------------+ 362 | | ``[B][127]`` | 2-bytes, 8-bit byte value of 127 | 363 | +----------------------------------+-------------------------------------------+ 364 | | ``[I][32427]`` | 5-bytes, 32-bit integer value of 32,427 | 365 | +----------------------------------+-------------------------------------------+ 366 | | ``[L][12147483647]`` | 9-bytes, 64-bit integer value of | 367 | | | 12,147,483,647 | 368 | +----------------------------------+-------------------------------------------+ 369 | | ``[d][3.14159]`` | 5-bytes, 32-bit float value of 3.14159 | 370 | +----------------------------------+-------------------------------------------+ 371 | | ``[D][72.38138221]`` | 9-bytes, 64-bit double value of | 372 | | | 72.38138221 | 373 | +----------------------------------+-------------------------------------------+ 374 | | ``[s][5][hello]`` | 7 bytes, string UTF-8 “hello” (English) | 375 | +----------------------------------+-------------------------------------------+ 376 | | ``[s][12][привет]`` | 14 bytes, string UTF-8 “hello” (Russian) | 377 | +----------------------------------+-------------------------------------------+ 378 | | ``[s][10][مرحبا]`` | 12 bytes, string UTF-8 “hello” (Arabic) | 379 | +----------------------------------+-------------------------------------------+ 380 | | ``[S][1024][...long string...]`` | 5 bytes + 1024 bytes for the long string | 381 | 
+----------------------------------+-------------------------------------------+ 382 | | ``[s][4][name][s][3][bob]`` | 6 + 5 bytes, equivalent of “name”: “bob” | 383 | +----------------------------------+-------------------------------------------+ 384 | 385 | Now that we have seen how the JSON data value types map to the binary format, 386 | in the next section we will see how we can combine these values together into 387 | the two container types (objects and arrays) to create complex object 388 | hierarchies using the Universal Binary JSON format. 389 | 390 | .. _container_types: 391 | 392 | Container Types 393 | =============== 394 | 395 | In this section we will look at the 2 remaining JSON value types that we are 396 | referring to as “container types”, namely object and array. 397 | 398 | JSON 399 | ---- 400 | 401 | The two JSON container types are described as follows: 402 | 403 | * **object** 404 | 405 | * A construct containing 0 or more name-value pairings, where the name is 406 | always a string and the value can be any valid value type including 407 | container types themselves. 408 | 409 | * **array** 410 | 411 | * A flat list of values only, where the values can be any valid value type 412 | including container types themselves. 413 | * The JSON specification does not make it a requirement that the values in an 414 | array are all of the same type and neither does the Universal Binary JSON 415 | specification. 416 | 417 | .. note:: 418 | **Advanced**: This can actually be to your benefit. Take for example an array 419 | of `int64` values: as you are writing them out to a file or a stream, you can 420 | check the actual value of each `int64` and, depending on the value, encode 421 | each one into the smallest possible numeric type (e.g. `byte`, `int32`, etc.). 422 | 423 | This can lead to a significant size reduction (say **50% smaller**) for 424 | data dominated by small numeric values!
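
The note above can be sketched in code. The following is a minimal Python sketch, not a reference implementation: the helper name ``encode_int`` and the table ``_INT_TYPES`` are hypothetical, the markers and sizes are taken from the value type table in this document, and the big-endian byte order is the one required by the Endianness section.

```python
import struct

# (marker, struct format, minimum, maximum) for the signed integer types.
# ">" means big-endian, as the Endianness section requires.
_INT_TYPES = [
    (b"B", ">b", -2**7, 2**7 - 1),    # byte
    (b"i", ">h", -2**15, 2**15 - 1),  # int16
    (b"I", ">i", -2**31, 2**31 - 1),  # int32
    (b"L", ">q", -2**63, 2**63 - 1),  # int64
]


def encode_int(value):
    """Encode an integer using the smallest numeric type that can hold it."""
    for marker, fmt, lowest, highest in _INT_TYPES:
        if lowest <= value <= highest:
            return marker + struct.pack(fmt, value)
    # Values beyond 64 bits fall back to `huge`: a UTF-8 string of digits
    # prefixed by its byte length (1-byte length when it is <= 254).
    text = str(value).encode("utf-8")
    if len(text) <= 254:
        return b"h" + struct.pack(">B", len(text)) + text
    return b"H" + struct.pack(">i", len(text)) + text


values = [7, 300, 1234567890, 4782345193]
encoded = b"".join(encode_int(v) for v in values)
print(len(encoded))  # 2 + 3 + 5 + 9 = 19 bytes, versus 36 if all were int64
```

Choosing the type per value keeps the decoder simple, because each marker still implies a fixed payload size.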
425 | 426 | Given these two constructs, let’s see how they are modeled in the Universal 427 | Binary JSON format. 428 | 429 | Universal Binary JSON 430 | --------------------- 431 | 432 | The two container types defined by JSON are modeled using the same tuple that 433 | defines all of our other data structures in this specification so far with a 434 | minor modification: the length value is considered a count of the child elements 435 | the container holds. It does not mean the byte length of the child elements. 436 | 437 | .. note:: 438 | Exactly what *child element* means depends on the container. In an `object`, a 439 | single child element is a name-value pair; in an `array`, a child element is a 440 | single value. 441 | 442 | More specifically, the tuple stays exactly the same, it is just the meaning of 443 | the center `length` element that changes:: 444 | 445 | [][] 446 | 447 | All the code used to process the constructs defined by this specification stays 448 | the same, but when an `object` or `array` construct are encountered, the code 449 | needs to be aware that the `length` value is the **child element count** so it 450 | can know when the scope of the container ends. 451 | 452 | For example, if you have an object that contains 4 arrays of `length` 50, the 453 | `length` argument for the object is 4 (because it contains the four arrays) 454 | while the `length` argument for each array is 50 (because they each hold 455 | 50 elements). 456 | 457 | .. note:: 458 | Unknown-length container types are also supported by the Universal Binary JSON 459 | specification and are covered in detail in the :ref:`Streaming ` 460 | section of this document. 461 | 462 | Additionally, the only optional field in the tuple for container types is 463 | `data`, if the container is empty and contains no elements 464 | (i.e. the `length` is 0) then there is no `data` segment. 
465 | 466 | All together, the definitions for the `object` and `array` container types looks 467 | like this: 468 | 469 | +-----------------+--------------------------+--------+---------+--------------+ 470 | | Type | Size | Marker | Length? | Data? | 471 | +=================+==========================+========+=========+==============+ 472 | | array | 2-bytes | a | Yes | Yes, | 473 | | | + byte length of string | | | if non-empty | 474 | +-----------------+--------------------------+--------+---------+--------------+ 475 | | array | 5-bytes | A | Yes | Yes, | 476 | | | + byte length of string | | | if non-empty | 477 | +-----------------+--------------------------+--------+---------+--------------+ 478 | | object | 2-bytes | o | Yes | Yes | 479 | | | + byte length of string | | | if non-empty | 480 | +-----------------+--------------------------+--------+---------+--------------+ 481 | | object | 5-bytes | O | Yes | Yes, | 482 | | | + byte length of string | | | if non-empty | 483 | +-----------------+--------------------------+--------+---------+--------------+ 484 | 485 | .. note:: 486 | `array` and `object` can also be specified in a more compact manner using 487 | 1-byte for the `length` when it is ``<= 254``. Specifying a `length` of 488 | ``255`` for the 1-byte variants has a special meaning of **length unknown** 489 | and is covered in more detail in the :ref:`Streaming ` section of 490 | the spec. 491 | 492 | The details for each field are the same as described for the non-container 493 | values in the previous section with the one caveat that `length` is a count of 494 | child elements and **not** the number of bytes representing the contents of the 495 | container. 496 | 497 | Let’s look at a quick example of encoding an object, again using the handy 498 | ``[ ]``-notation we used before simply for readability (the ``[ ]`` chars are 499 | not written out or part of the file format). 
500 | 501 | Consider the following JSON (30-bytes compacted):: 502 | 503 | { 504 | "id": 1234567890, 505 | "name": "bob" 506 | } 507 | 508 | Storing that object in the Universal Binary JSON format would look like this 509 | (whitespace added for readability):: 510 | 511 | [o][2] 2 bytes 512 | [s][2][id][I][1234567890] 4 + 5 = 9 bytes 513 | [s][4][name][s][3][bob] 6 + 5 = 11 bytes 514 | 515 | Our Universal Binary JSON format is 22 bytes, **27% smaller** than our compacted 516 | JSON! 517 | 518 | Walking through our example above, using a word-journey this is what a parser might see and do: 519 | 520 | #. I see an ``o``, so I know I am parsing an `object` and that the next byte is 521 | the `length` (or count) for this object. 522 | #. I see a ``2``, so I know the object contains 2 elements that I must account 523 | for to know when the `object` scope is closed (because we don’t use the 524 | ``{ }`` brackets like JSON). 525 | #. I see an ``s``, knowing how the name-value pairings inside of an object work, 526 | I know this is the `name` portion of some upcoming value. 527 | #. I see an ``I``, I know this is an `int32` value and that it belongs to the 528 | `name` I parsed in the previous step. 529 | #. I see another ``s``, I know this is a new name-value pair and this is the 530 | `name` portion. 531 | #. I see another ``s`` and know this is the value belonging to the `name` I just 532 | processed. 533 | #. I have just parsed 2 values, so now I know the `object` scope is closed. 534 | 535 | Encoding objects containing other `objects` would work identically except we would 536 | have encountered another ``o`` or ``O`` marker and descended a level further 537 | into a new object. 538 | 539 | Let’s look at another example, this time a simple JSON array construct 540 | (remember, they only contain values and not name-value pairs like `objects`). 
541 | 542 | This array is 48-bytes in compacted JSON:: 543 | 544 | [ 545 | null, 546 | true, 547 | false, 548 | 4782345193, 549 | 153.132417549, 550 | "ham" 551 | ] 552 | 553 | Storing the array in the Universal Binary JSON format would look like this 554 | (whitespace added for readability):: 555 | 556 | [a][6] - 2 bytes 557 | [Z] - 1 byte 558 | [T] - 1 byte 559 | [F] - 1 byte 560 | [L][4782345193] - 9 bytes 561 | [D][153.132417549] - 9 bytes 562 | [s][3][ham] - 5 bytes 563 | 564 | Our Universal Binary JSON format is 28 bytes, or about **42% smaller** than the 565 | compacted JSON! (The value 4782345193 does not fit in a signed `int32`, so it is stored as an `int64`.) 566 | 567 | Because the container types specify their total child element count, it is 568 | easier and faster for parsers to know when the scope of a container has closed 569 | or is still open waiting for more children (e.g. in the case of streaming over 570 | the network). This is not unlike the high-performance `Redis protocol`_. 571 | 572 | This also has the added benefit of not needing any terminating values in the 573 | binary that need to be scanned for to know when a container-scope is closed. 574 | This way data can be read in chunks and not read-and-scanned byte-by-byte. 575 | 576 | As was mentioned previously though, there are some cases where having an 577 | unbounded container is important (for example, streaming content from a server 578 | as it generates it on-the-fly). 579 | 580 | In the next section we will take a look at the Universal Binary JSON constructs 581 | that are optimized for streaming. Fortunately, there are only 3 and they are 582 | just as easy as the constructs we have covered so far! 583 | 584 | 585 | .. _streaming: 586 | 587 | Streaming Types 588 | =============== 589 | 590 | The Universal Binary JSON specification is optimized for fast read-speed by 591 | prefixing the length of every construct to the front of it; this allows 592 | parsers to digest entire chunks of the data stream at a time without scanning 593 | for terminating byte values.
594 | 595 | Unfortunately, this model of data becomes very expensive (and sometimes 596 | impossible) to adhere to in a streaming-friendly environment where a server may 597 | be generating `UBJ` formatted data on-the-fly and streaming it back in real time 598 | to the client. 599 | 600 | If the server had to adhere to the prefixed-length requirement of this 601 | specification up until now, it would have to generate, buffer and count all the 602 | elements in its reply before writing out the Universal Binary JSON so it could 603 | correctly prefix the lengths to all the containers. 604 | 605 | In this section of the specification we look at 1 additional type in the 606 | Universal Binary JSON specification that complements our streaming scenario and 607 | then two minor changes to the existing **container types** to enable easy and 608 | efficient streaming with unknown-length support for our `array` and `object` 609 | containers. 610 | 611 | .. _noop: 612 | 613 | No-Op Type 614 | ---------- 615 | 616 | The noop value stands for `No Op` or `No Operation`; it is a specific value 617 | (like ``Z`` for `null`, ``T`` for `true` and ``F`` for `false`) that is useful 618 | in streaming scenarios where a sign of life needs to be sent between two 619 | end points, but the confirmation being sent cannot change the meaning of the 620 | data it is sent within. 621 | 622 | The most common use for such a value type is as a `keep-alive` signal from a 623 | server to the client, letting the client know the server is possibly operating 624 | on a long-running job and is still alive, but just isn’t ready to send more data 625 | yet. 626 | 627 | The `noop` type is defined as follows: 628 | 629 | +-----------------+--------------------------+--------+---------+--------------+ 630 | | Type | Size | Marker | Length? | Data?
| 631 | +=================+==========================+========+=========+==============+ 632 | | noop | 1-byte | N | No | No | 633 | +-----------------+--------------------------+--------+---------+--------------+ 634 | 635 | Any parser code written to load the Universal Binary Spec needs to be aware that 636 | encountering the ``N`` marker in files of any kind is valid and is merely useful 637 | as a signal mechanism from producer to consumer to say “Hey, I am still alive.”, 638 | the marker is intended to be safely ignored if the server or client doesn’t need 639 | the acknowledgement. 640 | 641 | In order for this keep-alive-esque construct to work, the specification had to 642 | define a single byte value that had no meaning for the server and client to 643 | exchange if needed, but caused no modification to the meaning of the data that 644 | they are exchanging. 645 | 646 | In code that handles streaming from a server, your handler for the `noop` type 647 | might just reset a disconnect timer. In code that handles UBJ files, you would 648 | simply ignore the noop marker when you encountered it in the file because it 649 | would mean nothing. 650 | 651 | .. warning:: 652 | 653 | The `noop` type is only defined to be used inside of an 654 | :ref:`unknown-length container `. If you have a 655 | container that clearly defines a child element count (`length`) it should not 656 | contain a `noop` marker element. 657 | 658 | Also, the `noop` type **should never** be sent inside of a value (e.g. 659 | embedded inside of a `string` being streamed); it must only be written to the 660 | stream between declared values. 661 | 662 | If your interaction with the Universal Binary JSON format is primarily as a file 663 | format, it is likely that you may never need to use the `noop` type; its value 664 | becomes more apparent in long-lived, client-server, data-streaming scenarios. 665 | 666 | .. 
_unsized_container: 667 | 668 | Unknown-Length Containers 669 | ------------------------- 670 | 671 | The Universal Binary JSON specification supports containers (`array` and 672 | `object`) of unknown length to be specified when the producer of the binary data 673 | cannot (efficiently) know in advance how many elements it is going to write out. 674 | 675 | In these cases, the lowercased, 1-byte-length versions of array or object must 676 | be used (``a`` or ``o`` markers) with a length value of ``0xFF`` (255) as well 677 | as specifying an ``E`` terminator character after the last element in the 678 | container. 679 | 680 | The ``E`` type used to delimit the end of unknown-length containers is defined as 681 | follows: 682 | 683 | +-----------------+--------------------------+--------+---------+--------------+ 684 | | Type | Size | Marker | Length? | Data? | 685 | +=================+==========================+========+=========+==============+ 686 | | end | 1-byte | E | No | No | 687 | +-----------------+--------------------------+--------+---------+--------------+ 688 | 689 | .. warning:: 690 | 691 | Using a length of ``0xFF`` with the uppercase, 4-byte-length versions of array 692 | (``A``) and object (``O``) is not valid according to this specification. 693 | You must use the 1-byte-length variants of the container types when specifying 694 | an unknown `length`. 695 | 696 | An example would look like this:: 697 | 698 | [a][255] 699 | [S][3][bob] 700 | [I][1024] 701 | [T] 702 | [F] 703 | [S][4][ham!] 704 | [E] 705 | 706 | The three key elements being the lowercased ``a`` marker, the length of ``0xFF`` 707 | (255) and the ``E`` marker at the end of the container. 
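
To make the role of the ``E`` terminator concrete, here is a minimal Python sketch of a reader for the array example above. It is an illustration under stated assumptions, not a reference parser: the names ``END``, ``read_exact``, ``read_array`` and ``read_value`` are hypothetical, only a subset of the types is handled (``Z``, ``T``, ``F``, ``B``, ``I``, ``s``/``S``, ``a``/``A``), and it skips ``N`` markers wherever they appear, which is more lenient than the rule that `noop` is only defined inside unknown-length containers.

```python
import io
import struct

END = object()  # sentinel returned when the E marker is read


def read_exact(stream, n):
    """Read exactly n bytes or fail; payload lengths are always known."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended in the middle of a value")
    return data


def read_array(stream, marker):
    """Read the children of an array whose marker byte was just consumed."""
    if marker == b"a":
        count = read_exact(stream, 1)[0]
    else:
        count = struct.unpack(">i", read_exact(stream, 4))[0]
    items = []
    if marker == b"a" and count == 0xFF:
        # Unknown length: keep reading values until the E marker shows up.
        while True:
            item = read_value(stream)
            if item is END:
                return items
            items.append(item)
    for _ in range(count):
        item = read_value(stream)
        if item is END:
            raise ValueError("E marker inside a sized container")
        items.append(item)
    return items


def read_value(stream):
    """Read one value from the stream (subset of the value types)."""
    marker = read_exact(stream, 1)
    while marker == b"N":  # no-op carries no data, so skip it
        marker = read_exact(stream, 1)
    if marker == b"E":
        return END
    if marker == b"Z":
        return None
    if marker == b"T":
        return True
    if marker == b"F":
        return False
    if marker == b"B":
        return struct.unpack(">b", read_exact(stream, 1))[0]
    if marker == b"I":
        return struct.unpack(">i", read_exact(stream, 4))[0]
    if marker in (b"s", b"S"):
        size = 1 if marker == b"s" else 4
        length = int.from_bytes(read_exact(stream, size), "big")
        return read_exact(stream, length).decode("utf-8")
    if marker in (b"a", b"A"):
        return read_array(stream, marker)
    raise ValueError(f"unsupported marker {marker!r}")


# The unknown-length array from the example above, as raw bytes.
data = (
    b"a\xff"
    + b"S" + struct.pack(">i", 3) + b"bob"
    + b"I" + struct.pack(">i", 1024)
    + b"T" + b"F"
    + b"S" + struct.pack(">i", 4) + b"ham!"
    + b"E"
)
print(read_value(io.BytesIO(data)))  # ['bob', 1024, True, False, 'ham!']
```

Because every string payload is consumed by its declared length, an ``E`` byte inside a string is never mistaken for the terminator.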
708 | 709 | Another example might look like this:: 710 | 711 | [o][255] 712 | [B][4] 713 | [D][21.786] 714 | [N] 715 | [Z] 716 | [h][27][131.098412283059e2371293452] 717 | [E] 718 | 719 | You might notice in the example above we injected a `noop` instruction right in 720 | the middle, before the `null`. As mentioned in the :ref:`No-Op Type ` 721 | section, this is valid and can occur at any time while parsing the contents of 722 | an `unknown-length` container. 723 | 724 | If your parser has no need to recognize the `noop` marker (e.g. it is not 725 | listening for a keep-alive), the marker can be safely ignored and parsing continued 726 | (hence the name “no-op”). It is up to the implementation to decide what to do 727 | with the `noop` type. 728 | 729 | You might be wondering how a 1-byte ``E`` terminator for an unbounded container 730 | can work without being confused with, say, another ``E`` inside of a `string`. 731 | This works because none of the discrete value types (numeric, string, etc.) 732 | are of unknown `length`. 733 | 734 | The lengths of all the values contained inside of the container are known and 735 | must be read completely. Doing so guarantees that the ``E`` is only ever 736 | encountered by itself as an element marker, which tells the parsing code that 737 | the scope of the container has been closed. 738 | 739 | 740 | .. _size: 741 | 742 | Size Requirements 743 | ================= 744 | 745 | The Universal Binary JSON specification tries to strike the perfect balance 746 | between space savings, simplicity and performance. 747 | 748 | As a rule of thumb, data stored using the Universal Binary JSON format is on 749 | average **30% smaller** than the equivalent JSON. As you can see from some of the examples in 750 | this document though, it is not uncommon to see the binary representation of 751 | some data lead to a **50% or 60% reduction in size**.
752 | 753 | The size reduction of your data depends heavily on the type of data you are 754 | storing. It is best to do your own benchmarking with a comprehensive sampling 755 | of your own data. 756 | 757 | .. warning:: 758 | 759 | The Universal Binary JSON specification does not use compression algorithms to 760 | achieve smaller storage sizes. The size reduction is a side effect of the 761 | efficient binary storage format. 762 | 763 | Size Reduction Tips 764 | ------------------- 765 | 766 | The amount of storage size reduction you’ll experience with the Universal Binary 767 | JSON format will depend heavily on the type of data you are encoding. 768 | 769 | Some data shrinks considerably, some mildly and some not at all, but in every 770 | case your data will be stored in a much more efficient format that is faster to 771 | read and write. 772 | 773 | Below are pointers to give you an idea of how certain data may shrink in this 774 | format: 775 | 776 | * `null`, `true` and `false` values shrink by 75% 777 | (80% in the case of `false`). 778 | * large `numeric` values (> 5 digits and < 20 digits) shrink by 50% on average. 779 | * `string` values 780 | * of length <= 254 stay the same size. 781 | * of length > 254 are 3 bytes bigger per string. 782 | * `object` and `array` values shrink by 1 byte per element. 783 | 784 | One of the great things about the Universal Binary JSON format is that, with 785 | almost all of your data represented in a smaller footprint, you get two big 786 | wins: 787 | 788 | #. A smaller data format means faster writes and smaller reads. It also means 789 | less data to process when parsing. 790 | #. Binary format means no encoding/decoding primitive values to text and no 791 | parsing primitive values from text. 792 | 793 | Endianness 794 | ========== 795 | 796 | The Universal Binary JSON specification requires that all numeric values be 797 | written in `Big-Endian`_ order.
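As a small illustration of the byte order, the sketch below uses Python's standard ``struct`` module, where the ``>`` prefix selects big-endian packing. The variable names are hypothetical; the ``I`` (int32) and ``D`` (double) markers are taken from the type table.

```python
import struct

# ">" forces big-endian order regardless of the host CPU.
int32_payload = struct.pack(">i", 1024)      # 4 bytes: 00 00 04 00
double_payload = struct.pack(">d", 21.786)   # 8 bytes, IEEE 754 double

# A complete value is the 1-byte marker followed by the payload.
encoded_int32 = b"I" + int32_payload
encoded_double = b"D" + double_payload

print(encoded_int32.hex())  # 4900000400

# Reading the value back uses the same big-endian format string.
decoded = struct.unpack(">i", encoded_int32[1:])[0]
```

The most significant byte comes first, so the same bytes decode to the same number on any platform.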
798 | 799 | MIME Type 800 | ========= 801 | 802 | The Universal Binary JSON specification is a binary format and recommends using 803 | the following mime type:: 804 | 805 | application/ubjson 806 | 807 | This was added directly to the specification in hopes of avoiding 808 | `similar confusion with JSON`_. 809 | 810 | File Extension 811 | ============== 812 | 813 | ``ubj`` is the `recommended file extension`_ when writing out files using the 814 | Universal Binary JSON format (e.g. ``user.ubj``). 815 | 816 | The extension stands for `“Universal Binary JSON”` and has no known conflicting 817 | mappings to other file formats. 818 | 819 | 820 | .. _best_practices: 821 | 822 | Best Practices 823 | ============== 824 | 825 | Through work with the community, feedback from others and our own experience 826 | with the specification, below are some of the best-practices collected into one 827 | place making it easy for folks working with the format to find answers to the 828 | more flexible portions of the spec. 829 | 830 | Handling `huge` Numbers 831 | ----------------------- 832 | 833 | Not every language supports arbitrarily long numbers greater than 64-bits 834 | (represented by the `huge` data type), but many do. 835 | 836 | If you are writing a library to read/write Universal Binary JSON and the 837 | platform you are working with does not support them, we recommend throwing an 838 | exception or returning an error to the caller, letting them know unsupported 839 | data is contained in the file they are trying to parse. 840 | 841 | If the library you are writing is meant to be a general-purpose parser and needs 842 | to be more resilient than that, we recommend the following: 843 | 844 | #. Make the default behavior to throw an exception or return an error when the 845 | unsupported huge data type is encountered. 846 | #. 
Provide an optional behavior to the parser (that must be specifically enabled 847 | by the caller) that treats the huge value as a simple string and returns it 848 | to the caller to handle (e.g. insert in a database) if they need it. 849 | #. Provide an optional behavior to the parser (again, that must be specifically 850 | enabled by the caller) to simply skip unsupported values. 851 | 852 | This implementation should give the user the most functional experience working 853 | with your library and the Universal Binary JSON format while making it clear on 854 | their particular platform some data types could cause trouble; this is preferred 855 | to making the default operation to ignore the unsupported values. 856 | 857 | 858 | 859 | .. _Number: http://people.mozilla.org/~jorendorff/es5.html#sec-8.5 860 | .. _JSON presentation: http://json.org/json.ppt 861 | .. _ECMA: http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf 862 | .. _yet: http://wiki.ecmascript.org/doku.php?id=harmony:binary_data_discussion&s=int64 863 | .. _IEEE 754 single precision floating point format: http://en.wikipedia.org/wiki/IEEE_754-1985 864 | .. _IEEE 754 double precision floating point format: http://en.wikipedia.org/wiki/Double_precision_floating-point_format#Double_precision_binary_floating-point_format 865 | .. _JSON number specification: http://json.org 866 | .. _UTF-8: http://en.wikipedia.org/wiki/UTF-8 867 | .. _a multitude of reasons: http://en.wikipedia.org/wiki/UTF-8#Advantages 868 | .. _Redis protocol: http://redis.io/topics/protocol 869 | .. _Big-Endian: http://en.wikipedia.org/wiki/Endianness 870 | .. _similar confusion with JSON: http://stackoverflow.com/questions/477816/the-right-json-content-type 871 | .. 
_recommended file extension: http://www.fileinfo.com/extension/ubj 872 | -------------------------------------------------------------------------------- /spec8/tests/CouchDB4k.compact.json: -------------------------------------------------------------------------------- 1 | {"data3":"ColreUHAtuYoUOx1N4ZloouQt2o6ugnUT6eYtS10gu7niM8i0vEiNufpk1RlMQXaHXlIwQBDsMFDFUQcFeg2vW5eD259Xm","data4":"zCxriJhL726WNNTdJJzurgSA8vKT6rHA0cFCb9koZcLUMXg4rmoXVPqIHWYaCV0ovl2t6xm7I1Hm36jXpLlXEb8fRfbwBeTW2V0OAsVqYH8eAT","data0":"9EVqHm5ARqcEB5jq2D14U2bCJPyBY0JWDr1Tjh8gTB0sWUNjqYiWDxFzlx6S","data7":"Bi1ujcgEvfADfBeyZudE7nwxc3Ik8qpYjsJIfKmwOMEbV2L3Bi0x2tcRpGuf4fiyvIbypDvJN1PPdQtfQW1Gv6zccXHwwZwKzUq6","data5":{"integers":[756509,116117,776378,275045,703447,50156,685803,147958,941747,905651,57367,530248,312888,740951,988947,450154],"float1":76.572,"float2":83.5299,"nested1":{"integers":[756509,116117,776378,275045,703447,50156,685803,147958,941747,905651,57367,530248,312888,740951,988947,450154],"floats":[43121609.5543,99454976.3019,32945584.756,18122905.9212,45893183.44,63052200.6225,69032152.6897,3748217.6946,75449850.474,37111527.415,84852536.859,32906366.487,27027600.417,63874310.5614,39440408.51,97176857.1716,97438252.1171,49728043.5056,40818570.245,41415831.8949,24796297.4251,2819085.3449,84263963.4848,74503228.6878,67925677.403,4758851.9417,75227407.9214,76946667.8403,72518275.9469,94167085.9588,75883067.8321,27389831.6101,57987075.5053,1298995.2674,14590614.6939,45292214.2242,3332166.364,53784167.729,25193846.1867,81456965.477,68532032.39,73820009.7952,57736110.5717,37304166.7363,20054244.864,29746392.7397,86467624.6,45192685.8793,44008816.5186,1861872.8736,14595859.467,87795257.6703,57768720.8303,18290154.3126,45893183.44,63052200.6225,69032152.6897,3748217.6946,75449850.474,37111527.415,84852536.859,32906366.487,27027600.417,63874310.5614,39440408.51,97176857.1716,97438252.1171,49728043.5056,40818570.245,41415831.8949,24796297.4251,2819085.3449,84263963.4848,74503228.6878,67925677.4
03,4758851.9417,75227407.9214,76946667.8403,72518275.9469,94167085.9588,75883067.8321,27389831.6101,57987075.5053,1298995.2674,80858801.2712,98262252.4656,51612877.944,33397812.7835,36089655.3049,50164685.8153,16852105.5192,61171929.752,86376339.7175,73009014.5521,7397302.331,34345128.9589,98343269.4418,95039116.9058,44833102.5752,51052997.8873,22719195.6783,64883244.8699]},"nested2":{"integers":[756509,116117,776378,275045,703447,50156,685803,147958,941747,905651,57367,530248,312888,740951,988947,450154],"float1":76.572,"float2":83.5299}},"strings":["edx5XzRkPVeEW2MBQzQMcUSuMI4FntjhlJ9VGhQaBHKPEazAaT","2fQUbzRUax4A","jURcBZ0vrJcmf2roZUMzZJQoTsKZDIdj7KhO7itskKvM80jBU9","8jKLmo3N2zYdKyTyfTczfr2x6bPaarorlnTNJ7r8lIkiZyBvrP","jbUeAVOdBSPzYmYhH0sabUHUH39O5e","I8yAQKZsyZhMfpzWjArQU9pQ6PfU6b14q2eWvQjtCUdgAUxFjg","97N8ZmGcxRZO4ZabzRRcY4KVHqxJwQ8qY","0DtY1aWXmUfJENt9rYW9","DtpBUEppPwMnWexi8eIIxlXRO3GUpPgeNFG9ONpWJYvk8xBkVj","YsX8V2xOrTw6LhNIMMhO4F4VXFyXUXFr66L3sTkLWgFA9NZuBV","fKYYthv8iFvaYoFoYZyB","zGuLsPXoJqMbO4PcePteZfDMYFXdWtvNF8WvaplXypsd6"],"data1":"9EVqHm5ARqcEB5jq21v2g0jVcG9CXB0Abk7uAF4NHYyTzeF3TnHhpZBECD14U2bCJPyBY0JWDr1Tjh8gTB0sWUNjqYiWDxFzlx6S","integers":[756509,116117,776378,275045,703447,50156,685803,147958,941747,905651,57367,530248,312888,740951,988947,450154],"more_nested":{"integers":[756509,116117,776378,275045,703447,50156,685803,147958,941747,905651,57367,530248,312888,740951,988947,450154],"float1":76.572,"float2":83.5299,"nested1":{"integers":[756509,116117,776378,275045,703447,50156,685803,147958,941747,905651,57367,530248,312888,740951,988947,450154]},"nested2":{"strings":["2fQUbzRUax4A","jURcBZ0vrJcmf2roZUMzZJQoTsKZDIdj7KhO7itskKvM80jBU9","8jKLmo3N2zYdKyTyfTczfr2x6bPaarorlnTNJ7r8lIkiZyBvrP","jbUeAVOdBSPzYmYhH0sabUHUH39O5e","I8yAQKZsyZhMfpzWjArQU9pQ6PfU6b14q2eWvQjtCUdgAUxFjg","97N8ZmGcxRZO4ZabzRRcY4KVHqxJwQ8qY","0DtY1aWXmUfJENt9rYW9","DtpBUEppPwMnWexi8eIIxlXRO3GUpPgeNFG9ONpWJYvk8xBkVj","YsX8V2xOrTw6LhNIMMhO4F4VXFyXUXFr66L3sTkLWgFA9NZuBV","fKYYthv8iFv
aYoFoYZyB","zGuLsPXoJqMbO4PcePteZfDMYFXdWtvNF8WvaplXypsd6"],"integers":[756509,116117,776378,57367,530248,312888,740951,988947,450154]}}} -------------------------------------------------------------------------------- /spec8/tests/CouchDB4k.formatted.json: -------------------------------------------------------------------------------- 1 | { 2 | "data3":"ColreUHAtuYoUOx1N4ZloouQt2o6ugnUT6eYtS10gu7niM8i0vEiNufpk1RlMQXaHXlIwQBDsMFDFUQcFeg2vW5eD259Xm", 3 | "data4":"zCxriJhL726WNNTdJJzurgSA8vKT6rHA0cFCb9koZcLUMXg4rmoXVPqIHWYaCV0ovl2t6xm7I1Hm36jXpLlXEb8fRfbwBeTW2V0OAsVqYH8eAT", 4 | "data0":"9EVqHm5ARqcEB5jq2D14U2bCJPyBY0JWDr1Tjh8gTB0sWUNjqYiWDxFzlx6S", 5 | "data7":"Bi1ujcgEvfADfBeyZudE7nwxc3Ik8qpYjsJIfKmwOMEbV2L3Bi0x2tcRpGuf4fiyvIbypDvJN1PPdQtfQW1Gv6zccXHwwZwKzUq6", 6 | "data5":{ 7 | "integers":[ 8 | 756509, 9 | 116117, 10 | 776378, 11 | 275045, 12 | 703447, 13 | 50156, 14 | 685803, 15 | 147958, 16 | 941747, 17 | 905651, 18 | 57367, 19 | 530248, 20 | 312888, 21 | 740951, 22 | 988947, 23 | 450154 24 | ], 25 | "float1":76.572, 26 | "float2":83.5299, 27 | "nested1":{ 28 | "integers":[ 29 | 756509, 30 | 116117, 31 | 776378, 32 | 275045, 33 | 703447, 34 | 50156, 35 | 685803, 36 | 147958, 37 | 941747, 38 | 905651, 39 | 57367, 40 | 530248, 41 | 312888, 42 | 740951, 43 | 988947, 44 | 450154 45 | ], 46 | "floats":[ 47 | 43121609.5543, 48 | 99454976.3019, 49 | 32945584.756, 50 | 18122905.9212, 51 | 45893183.44, 52 | 63052200.6225, 53 | 69032152.6897, 54 | 3748217.6946, 55 | 75449850.474, 56 | 37111527.415, 57 | 84852536.859, 58 | 32906366.487, 59 | 27027600.417, 60 | 63874310.5614, 61 | 39440408.51, 62 | 97176857.1716, 63 | 97438252.1171, 64 | 49728043.5056, 65 | 40818570.245, 66 | 41415831.8949, 67 | 24796297.4251, 68 | 2819085.3449, 69 | 84263963.4848, 70 | 74503228.6878, 71 | 67925677.403, 72 | 4758851.9417, 73 | 75227407.9214, 74 | 76946667.8403, 75 | 72518275.9469, 76 | 94167085.9588, 77 | 75883067.8321, 78 | 27389831.6101, 79 | 57987075.5053, 80 | 1298995.2674, 81 | 
14590614.6939, 82 | 45292214.2242, 83 | 3332166.364, 84 | 53784167.729, 85 | 25193846.1867, 86 | 81456965.477, 87 | 68532032.39, 88 | 73820009.7952, 89 | 57736110.5717, 90 | 37304166.7363, 91 | 20054244.864, 92 | 29746392.7397, 93 | 86467624.6, 94 | 45192685.8793, 95 | 44008816.5186, 96 | 1861872.8736, 97 | 14595859.467, 98 | 87795257.6703, 99 | 57768720.8303, 100 | 18290154.3126, 101 | 45893183.44, 102 | 63052200.6225, 103 | 69032152.6897, 104 | 3748217.6946, 105 | 75449850.474, 106 | 37111527.415, 107 | 84852536.859, 108 | 32906366.487, 109 | 27027600.417, 110 | 63874310.5614, 111 | 39440408.51, 112 | 97176857.1716, 113 | 97438252.1171, 114 | 49728043.5056, 115 | 40818570.245, 116 | 41415831.8949, 117 | 24796297.4251, 118 | 2819085.3449, 119 | 84263963.4848, 120 | 74503228.6878, 121 | 67925677.403, 122 | 4758851.9417, 123 | 75227407.9214, 124 | 76946667.8403, 125 | 72518275.9469, 126 | 94167085.9588, 127 | 75883067.8321, 128 | 27389831.6101, 129 | 57987075.5053, 130 | 1298995.2674, 131 | 80858801.2712, 132 | 98262252.4656, 133 | 51612877.944, 134 | 33397812.7835, 135 | 36089655.3049, 136 | 50164685.8153, 137 | 16852105.5192, 138 | 61171929.752, 139 | 86376339.7175, 140 | 73009014.5521, 141 | 7397302.331, 142 | 34345128.9589, 143 | 98343269.4418, 144 | 95039116.9058, 145 | 44833102.5752, 146 | 51052997.8873, 147 | 22719195.6783, 148 | 64883244.8699 149 | ] 150 | }, 151 | "nested2":{ 152 | "integers":[ 153 | 756509, 154 | 116117, 155 | 776378, 156 | 275045, 157 | 703447, 158 | 50156, 159 | 685803, 160 | 147958, 161 | 941747, 162 | 905651, 163 | 57367, 164 | 530248, 165 | 312888, 166 | 740951, 167 | 988947, 168 | 450154 169 | ], 170 | "float1":76.572, 171 | "float2":83.5299 172 | } 173 | }, 174 | "strings":[ 175 | "edx5XzRkPVeEW2MBQzQMcUSuMI4FntjhlJ9VGhQaBHKPEazAaT", 176 | "2fQUbzRUax4A", 177 | "jURcBZ0vrJcmf2roZUMzZJQoTsKZDIdj7KhO7itskKvM80jBU9", 178 | "8jKLmo3N2zYdKyTyfTczfr2x6bPaarorlnTNJ7r8lIkiZyBvrP", 179 | "jbUeAVOdBSPzYmYhH0sabUHUH39O5e", 180 | 
"I8yAQKZsyZhMfpzWjArQU9pQ6PfU6b14q2eWvQjtCUdgAUxFjg", 181 | "97N8ZmGcxRZO4ZabzRRcY4KVHqxJwQ8qY", 182 | "0DtY1aWXmUfJENt9rYW9", 183 | "DtpBUEppPwMnWexi8eIIxlXRO3GUpPgeNFG9ONpWJYvk8xBkVj", 184 | "YsX8V2xOrTw6LhNIMMhO4F4VXFyXUXFr66L3sTkLWgFA9NZuBV", 185 | "fKYYthv8iFvaYoFoYZyB", 186 | "zGuLsPXoJqMbO4PcePteZfDMYFXdWtvNF8WvaplXypsd6" 187 | ], 188 | "data1":"9EVqHm5ARqcEB5jq21v2g0jVcG9CXB0Abk7uAF4NHYyTzeF3TnHhpZBECD14U2bCJPyBY0JWDr1Tjh8gTB0sWUNjqYiWDxFzlx6S", 189 | "integers":[ 190 | 756509, 191 | 116117, 192 | 776378, 193 | 275045, 194 | 703447, 195 | 50156, 196 | 685803, 197 | 147958, 198 | 941747, 199 | 905651, 200 | 57367, 201 | 530248, 202 | 312888, 203 | 740951, 204 | 988947, 205 | 450154 206 | ], 207 | "more_nested":{ 208 | "integers":[ 209 | 756509, 210 | 116117, 211 | 776378, 212 | 275045, 213 | 703447, 214 | 50156, 215 | 685803, 216 | 147958, 217 | 941747, 218 | 905651, 219 | 57367, 220 | 530248, 221 | 312888, 222 | 740951, 223 | 988947, 224 | 450154 225 | ], 226 | "float1":76.572, 227 | "float2":83.5299, 228 | "nested1":{ 229 | "integers":[ 230 | 756509, 231 | 116117, 232 | 776378, 233 | 275045, 234 | 703447, 235 | 50156, 236 | 685803, 237 | 147958, 238 | 941747, 239 | 905651, 240 | 57367, 241 | 530248, 242 | 312888, 243 | 740951, 244 | 988947, 245 | 450154 246 | ] 247 | }, 248 | "nested2":{ 249 | "strings":[ 250 | "2fQUbzRUax4A", 251 | "jURcBZ0vrJcmf2roZUMzZJQoTsKZDIdj7KhO7itskKvM80jBU9", 252 | "8jKLmo3N2zYdKyTyfTczfr2x6bPaarorlnTNJ7r8lIkiZyBvrP", 253 | "jbUeAVOdBSPzYmYhH0sabUHUH39O5e", 254 | "I8yAQKZsyZhMfpzWjArQU9pQ6PfU6b14q2eWvQjtCUdgAUxFjg", 255 | "97N8ZmGcxRZO4ZabzRRcY4KVHqxJwQ8qY", 256 | "0DtY1aWXmUfJENt9rYW9", 257 | "DtpBUEppPwMnWexi8eIIxlXRO3GUpPgeNFG9ONpWJYvk8xBkVj", 258 | "YsX8V2xOrTw6LhNIMMhO4F4VXFyXUXFr66L3sTkLWgFA9NZuBV", 259 | "fKYYthv8iFvaYoFoYZyB", 260 | "zGuLsPXoJqMbO4PcePteZfDMYFXdWtvNF8WvaplXypsd6" 261 | ], 262 | "integers":[ 263 | 756509, 264 | 116117, 265 | 776378, 266 | 57367, 267 | 530248, 268 | 312888, 269 | 740951, 270 | 988947, 271 | 
450154 272 | ] 273 | } 274 | } 275 | } -------------------------------------------------------------------------------- /spec8/tests/MediaContent.compact.json: -------------------------------------------------------------------------------- 1 | {"Media":{"uri":"http://javaone.com/keynote.mpg","title":"Javaone Keynote","width":640,"height":480,"format":"video/mpg4","duration":18000000,"size":58982400,"bitrate":262144,"persons":["Bill Gates","Steve Jobs"],"player":"JAVA","copyright":null},"Images":[{"uri":"http://javaone.com/keynote_large.jpg","title":"Javaone Keynote","width":1024,"height":768,"size":"LARGE"},{"uri":"http://javaone.com/keynote_small.jpg","title":"Javaone Keynote","width":320,"height":240,"size":"SMALL"}]} -------------------------------------------------------------------------------- /spec8/tests/MediaContent.formatted.json: -------------------------------------------------------------------------------- 1 | { 2 | "Media":{ 3 | "uri":"http://javaone.com/keynote.mpg", 4 | "title":"Javaone Keynote", 5 | "width":640, 6 | "height":480, 7 | "format":"video/mpg4", 8 | "duration":18000000, 9 | "size":58982400, 10 | "bitrate":262144, 11 | "persons":[ 12 | "Bill Gates", 13 | "Steve Jobs" 14 | ], 15 | "player":"JAVA", 16 | "copyright":null 17 | }, 18 | "Images":[ 19 | { 20 | "uri":"http://javaone.com/keynote_large.jpg", 21 | "title":"Javaone Keynote", 22 | "width":1024, 23 | "height":768, 24 | "size":"LARGE" 25 | }, 26 | { 27 | "uri":"http://javaone.com/keynote_small.jpg", 28 | "title":"Javaone Keynote", 29 | "width":320, 30 | "height":240, 31 | "size":"SMALL" 32 | } 33 | ] 34 | } -------------------------------------------------------------------------------- /spec8/tests/TwitterTimeline.compact.json: -------------------------------------------------------------------------------- 1 | 
{"id_str":"121769183821312000","retweet_count":0,"in_reply_to_screen_name":null,"in_reply_to_user_id":null,"truncated":false,"retweeted":false,"possibly_sensitive":false,"in_reply_to_status_id_str":null,"entities":{"urls":[{"url":"http:\/\/t.co\/wtioKkFS","display_url":"dlvr.it\/pWQy2","indices":[33,53],"expanded_url":"http:\/\/dlvr.it\/pWQy2"}],"hashtags":[],"user_mentions":[]},"geo":null,"place":null,"coordinates":null,"created_at":"Thu Oct 06 02:10:10 +0000 2011","in_reply_to_user_id_str":null,"user":{"id_str":"77029015","profile_link_color":"009999","protected":false,"url":"http:\/\/www.techday.co.nz\/","screen_name":"techdaynz","statuses_count":5144,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/1479058408\/techday_48_normal.jpg","name":"TechDay","default_profile_image":false,"default_profile":false,"profile_background_color":"131516","lang":"en","profile_background_tile":false,"utc_offset":43200,"description":"","is_translator":false,"show_all_inline_media":false,"contributors_enabled":false,"profile_background_image_url_https":"https:\/\/si0.twimg.com\/profile_background_images\/75893948\/Techday_Background.jpg","created_at":"Thu Sep 24 20:02:01 +0000 2009","profile_sidebar_fill_color":"efefef","follow_request_sent":false,"friends_count":3215,"followers_count":3149,"time_zone":"Auckland","favourites_count":0,"profile_sidebar_border_color":"eeeeee","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/1479058408\/techday_48_normal.jpg","following":false,"geo_enabled":false,"notifications":false,"profile_use_background_image":true,"listed_count":151,"verified":false,"profile_text_color":"333333","location":"Ponsonby, Auckland, NZ","id":77029015,"profile_background_image_url":"http:\/\/a0.twimg.com\/profile_background_images\/75893948\/Techday_Background.jpg"},"contributors":null,"source":"\u003Ca href=\"http:\/\/dlvr.it\" 
rel=\"nofollow\"\u003Edlvr.it\u003C\/a\u003E","in_reply_to_status_id":null,"favorited":false,"id":121769183821312000,"text":"Apple CEO's message to employees http:\/\/t.co\/wtioKkFS"} -------------------------------------------------------------------------------- /spec8/tests/TwitterTimeline.formatted.json: -------------------------------------------------------------------------------- 1 | { 2 | "id_str":"121769183821312000", 3 | "retweet_count":0, 4 | "in_reply_to_screen_name":null, 5 | "in_reply_to_user_id":null, 6 | "truncated":false, 7 | "retweeted":false, 8 | "possibly_sensitive":false, 9 | "in_reply_to_status_id_str":null, 10 | "entities":{ 11 | "urls":[ 12 | { 13 | "url":"http:\/\/t.co\/wtioKkFS", 14 | "display_url":"dlvr.it\/pWQy2", 15 | "indices":[ 16 | 33, 17 | 53 18 | ], 19 | "expanded_url":"http:\/\/dlvr.it\/pWQy2" 20 | } 21 | ], 22 | "hashtags":[ 23 | 24 | ], 25 | "user_mentions":[ 26 | 27 | ] 28 | }, 29 | "geo":null, 30 | "place":null, 31 | "coordinates":null, 32 | "created_at":"Thu Oct 06 02:10:10 +0000 2011", 33 | "in_reply_to_user_id_str":null, 34 | "user":{ 35 | "id_str":"77029015", 36 | "profile_link_color":"009999", 37 | "protected":false, 38 | "url":"http:\/\/www.techday.co.nz\/", 39 | "screen_name":"techdaynz", 40 | "statuses_count":5144, 41 | "profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/1479058408\/techday_48_normal.jpg", 42 | "name":"TechDay", 43 | "default_profile_image":false, 44 | "default_profile":false, 45 | "profile_background_color":"131516", 46 | "lang":"en", 47 | "profile_background_tile":false, 48 | "utc_offset":43200, 49 | "description":"", 50 | "is_translator":false, 51 | "show_all_inline_media":false, 52 | "contributors_enabled":false, 53 | "profile_background_image_url_https":"https:\/\/si0.twimg.com\/profile_background_images\/75893948\/Techday_Background.jpg", 54 | "created_at":"Thu Sep 24 20:02:01 +0000 2009", 55 | "profile_sidebar_fill_color":"efefef", 56 | "follow_request_sent":false, 57 | 
"friends_count":3215, 58 | "followers_count":3149, 59 | "time_zone":"Auckland", 60 | "favourites_count":0, 61 | "profile_sidebar_border_color":"eeeeee", 62 | "profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/1479058408\/techday_48_normal.jpg", 63 | "following":false, 64 | "geo_enabled":false, 65 | "notifications":false, 66 | "profile_use_background_image":true, 67 | "listed_count":151, 68 | "verified":false, 69 | "profile_text_color":"333333", 70 | "location":"Ponsonby, Auckland, NZ", 71 | "id":77029015, 72 | "profile_background_image_url":"http:\/\/a0.twimg.com\/profile_background_images\/75893948\/Techday_Background.jpg" 73 | }, 74 | "contributors":null, 75 | "source":"\u003Ca href=\"http:\/\/dlvr.it\" rel=\"nofollow\"\u003Edlvr.it\u003C\/a\u003E", 76 | "in_reply_to_status_id":null, 77 | "favorited":false, 78 | "id":121769183821312000, 79 | "text":"Apple CEO's message to employees http:\/\/t.co\/wtioKkFS" 80 | } -------------------------------------------------------------------------------- /spec8/thanks.rst: -------------------------------------------------------------------------------- 1 | 2 | Thanks 3 | ====== 4 | 5 | Below is a list of people that have submitted specific contributions, 6 | corrections and implementations to help make the Universal Binary JSON 7 | specification better. 8 | 9 | Thank you all! 10 | 11 | * `Alex Blewitt `_ 12 | 13 | Helped catch a number of specification errors around UTF-8 encoding in the 14 | original draft of the specification that would have been confusing/nasty to 15 | release. He also provided great feedback about the size and performance 16 | metrics for the specification. 17 | 18 | * `Alexander Shorin `_ 19 | 20 | Alex is both the author of the Python library and a valued collaborator on the 21 | Universal Binary JSON spec as it matured. 
Alex provided instrumental insight 22 | into the modifications made between Draft 8 and Draft 9 of the spec to help 23 | simplify the spec by removing all the duplicate (compact) type 24 | representations, simplifying the length-arguments for `STRING` and `HUGE` as 25 | well as being the one to point out that the length arguments for the `ARRAY` 26 | and `OBJECT` container types are effectively useless once the streaming-format 27 | support was added (and do not make generator code or parsing code any easier 28 | or more performant). 29 | 30 | * `John Cowan `_ 31 | 32 | John was the one that recommended using UTF-8 string-encoded values 33 | (or `huge`) for arbitrarily huge numbers after seeing my desire to avoid 34 | including any non-portable constructs into the binary format. 35 | 36 | Given that the discussion on numeric formats had been a very active one with 37 | lots of feelings on all sides, it was a boon to have John step up with such a 38 | simple suggestion that allowed for maximum compatibility and portability. 39 | It was a win-win all the way around. 40 | 41 | * `Michael Makarenko `_ (aka “M1xA”) 42 | 43 | Michael is the author behind the `Ubjson.NET `_ 44 | library and contributor of the `int16` and `float` numeric types to the 45 | specification. For numeric-heavy (e.g. scientific) data, the inclusions of the 46 | `int16` and `float` types can lead to significant space savings when writing 47 | out values in the Universal Binary JSON format. 48 | 49 | Michael has also gone to great lengths to make the .NET implementation of 50 | UBJSON as tight and performant as possible; collaborating on benchmark design 51 | and testing data as well as compatibility testing between implementations to 52 | ensure a great Universal Binary JSON experience for .NET developers. 53 | 54 | In addition to development, Michael has helped contribute to the growth of the 55 | Universal Binary JSON community with 56 | `articles about the specification `_. 
57 | 58 | * `Paul Davis `_ 59 | 60 | While approaching the CouchDB team for feedback on the Universal Binary JSON 61 | spec, I met Paul who was willing to spend a significant amount of time 62 | reviewing the specification and recommending suggestions, changes and 63 | improvements from everything the CouchDB team has learned by dealing closely 64 | with JSON for years. 65 | 66 | Paul was the brains behind the compacted type presentation 67 | (``s``, ``h``, ``a`` and ``o``) using a single byte instead of 3 bytes to 68 | represent the length of an entity which was something the spec had originally 69 | avoided due to complexity, but as Paul clarified at-scale (e.g. a huge CouchDB 70 | data store) those few bytes in some data sets that are working with very small 71 | values (like string keywords) can really add up. 72 | 73 | Paul also pointed out the shortcomings of prefixing the length to the two 74 | container types if the specification could ever be used easily with services 75 | or apps that streamed UBJ format for huge runs of data that the server 76 | couldn't load, buffer and count ahead of time before responding to the client. 77 | In order to more easily support streaming, unknown-length container types had 78 | to be added. 79 | 80 | Paul also pointed out the importance of a ``NO_OP``/``SKIP``/``IGNORE`` type 81 | that can be useful during a long-lived streaming operation where the server 82 | may be waiting on something (like a DB) and you need to keep the connection 83 | alive between client/server and avoid the client timing out, but you need the 84 | client to know the data it is receiving is just meant as a “Hang on” message 85 | from the server and not actual data. This is where the ``NO_OP`` command comes 86 | in handy. 87 | 88 | * `Stephan Beal `_ 89 | 90 | Stephan helped quite a bit with understanding the implications of a >= 64-bit 91 | numeric format and the implications of portability across a number of popular 92 | platforms. 
93 | 94 | * `JSON Specification Group `_ 95 | 96 | I would like to personally thank everyone in the JSON Specification Group. 97 | The amount of feedback and help with the specification has been wonderful, 98 | constructive and creative. It also led to one of the busiest conversations 99 | in the last year! 100 | -------------------------------------------------------------------------------- /spec8/type_reference.rst: -------------------------------------------------------------------------------- 1 | 2 | Type reference 3 | ++++++++++++++ 4 | 5 | The table below is a quick-reference for folks working closely with the 6 | Universal Binary JSON format who want all the information at their fingertips: 7 | 8 | +-----------------+--------------------------+--------+---------+--------------+ 9 | | Type | Size | Marker | Length? | Data? | 10 | +=================+==========================+========+=========+==============+ 11 | | null | 1-byte | Z | No | No | 12 | +-----------------+--------------------------+--------+---------+--------------+ 13 | | true | 1-byte | T | No | No | 14 | +-----------------+--------------------------+--------+---------+--------------+ 15 | | false | 1-byte | F | No | No | 16 | +-----------------+--------------------------+--------+---------+--------------+ 17 | | byte | 2-bytes | B | No | Yes | 18 | +-----------------+--------------------------+--------+---------+--------------+ 19 | | int16 | 3-bytes | i | No | Yes | 20 | +-----------------+--------------------------+--------+---------+--------------+ 21 | | int32 | 5-bytes | I | No | Yes | 22 | +-----------------+--------------------------+--------+---------+--------------+ 23 | | int64 | 9-bytes | L | No | Yes | 24 | +-----------------+--------------------------+--------+---------+--------------+ 25 | | float (32-bit) | 5-bytes | d | No | Yes | 26 | +-----------------+--------------------------+--------+---------+--------------+ 27 | | double (64-bit) | 9-bytes | D | No | Yes | 28 |
+-----------------+--------------------------+--------+---------+--------------+ 29 | | huge (number) | 2-bytes | h | Yes | Yes | 30 | | | + byte length of string | | | if non-empty | 31 | +-----------------+--------------------------+--------+---------+--------------+ 32 | | huge (number) | 5-bytes | H | Yes | Yes, | 33 | | | + byte length of string | | | if non-empty | 34 | +-----------------+--------------------------+--------+---------+--------------+ 35 | | string | 2-bytes | s | Yes | Yes, | 36 | | | + byte length of string | | | if non-empty | 37 | +-----------------+--------------------------+--------+---------+--------------+ 38 | | string | 5-bytes | S | Yes | Yes, | 39 | | | + byte length of string | | | if non-empty | 40 | +-----------------+--------------------------+--------+---------+--------------+ 41 | | array | 2-bytes | a | Yes | Yes, | 42 | | | + byte length of string | | | if non-empty | 43 | +-----------------+--------------------------+--------+---------+--------------+ 44 | | array | 5-bytes | A | Yes | Yes, | 45 | | | + byte length of string | | | if non-empty | 46 | +-----------------+--------------------------+--------+---------+--------------+ 47 | | object | 2-bytes | o | Yes | Yes | 48 | | | + byte length of string | | | if non-empty | 49 | +-----------------+--------------------------+--------+---------+--------------+ 50 | | object | 5-bytes | O | Yes | Yes, | 51 | | | + byte length of string | | | if non-empty | 52 | +-----------------+--------------------------+--------+---------+--------------+ 53 | | noop | 1-byte | N | No | No | 54 | +-----------------+--------------------------+--------+---------+--------------+ 55 | | end | 1-byte | E | No | No | 56 | +-----------------+--------------------------+--------+---------+--------------+ 57 | 58 | Numeric Types 59 | ============= 60 | 61 | All numeric types are signed. 
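To make the integer sizes in the table concrete, here is a hedged Python sketch that writes an integer using the smallest signed type that can hold it. Picking the smallest type is a choice made by this example rather than a stated requirement, and the names ``_INT_TYPES`` and ``encode_int`` are hypothetical. Values are packed big-endian, as the specification requires for all numeric values.

```python
import struct

# (marker, struct format, minimum, maximum) for each signed integer type.
_INT_TYPES = [
    (b"B", ">b", -2**7, 2**7 - 1),     # byte:  marker + 1 byte
    (b"i", ">h", -2**15, 2**15 - 1),   # int16: marker + 2 bytes
    (b"I", ">i", -2**31, 2**31 - 1),   # int32: marker + 4 bytes
    (b"L", ">q", -2**63, 2**63 - 1),   # int64: marker + 8 bytes
]


def encode_int(value):
    """Return marker + big-endian payload for the smallest fitting type."""
    for marker, fmt, lowest, highest in _INT_TYPES:
        if lowest <= value <= highest:
            return marker + struct.pack(fmt, value)
    # Anything wider than 64 bits belongs in the string-encoded huge type.
    raise OverflowError("value does not fit in 64 bits")


print(encode_int(4))     # b'B\x04'
print(encode_int(1024))  # b'i\x04\x00'
```

The total sizes match the table: 2 bytes for ``B``, 3 for ``i``, 5 for ``I`` and 9 for ``L``.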
62 | 63 | floats (32-bit) 64 | --------------- 65 | 66 | All 32-bit float values are written into the binary format using the 67 | `IEEE 754 single precision floating point format`_, which is the following 68 | structure: 69 | 70 | * Bit 31 (1 bit) – sign 71 | * Bit 30-23 (8 bits) – exponent 72 | * Bit 22-0 (23 bits) – fraction (significand) 73 | 74 | doubles (64-bit) 75 | ---------------- 76 | 77 | All 64-bit double values are written into the binary format using the 78 | `IEEE 754 double precision floating point format`_, which is the following 79 | structure: 80 | 81 | * Bit 63 (1 bit) – sign 82 | * Bit 62-52 (11 bits) – exponent 83 | * Bit 51-0 (52 bits) – fraction (significand) 84 | 85 | String Encoding 86 | =============== 87 | 88 | All `string` values (which includes `huge` values since they are string-encoded) 89 | must be `UTF-8`_ encoded. 90 | 91 | This provides a `number of advantages`_ and inter-compatibility across systems and 92 | alternative data formats. 93 | 94 | Arrays & Objects 95 | ================ 96 | 97 | The `length` argument specified is the `number of child elements` the parent 98 | container contains. A `child element` is defined as: 99 | 100 | * in an `object`, a single name-value pair. 101 | * in an `array`, a single value. 102 | 103 | For example: 104 | 105 | * if an array contains 4 integers, the `length` of that array is 4. 106 | * if an object contains 4 name-value pairs, the `length` of that object is 4. 107 | * if an array contains 13 `User objects`, the `length` of the array is 13. 108 | * if an object contains 7 arrays, the `length` of the object is 7. 109 | 110 | .. note:: 111 | 112 | Universal Binary JSON is a :ref:`streaming-friendly ` specification 113 | and supports the use of :ref:`unknown-length container ` 114 | types if you need them! 
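The child-element counting rule for arrays and objects described above can be sketched in a few lines. This is only an illustration: the function name ``child_count`` is hypothetical, and Python lists and dicts stand in for `array` and `object`.

```python
def child_count(container):
    """Return the `length` value for an array (list) or object (dict)."""
    if isinstance(container, (list, dict)):
        # len() of a list counts values; len() of a dict counts name-value
        # pairs. A nested container counts as one child, however many
        # elements it holds itself.
        return len(container)
    raise TypeError("only arrays and objects have a child-element length")
```

For instance, an object holding 7 arrays yields 7 regardless of how many values each inner array contains.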
115 | 116 | Support for ‘huge’ Numeric Type 117 | =============================== 118 | 119 | The huge data type is an ultra-portable mechanism by which arbitrarily long 120 | numbers ``> 64-bit`` in size (integer or decimal) can be passed between systems 121 | that support them and degraded gracefully in systems that do not support them. 122 | 123 | .. note:: 124 | 125 | `huge` values are **only** meant to store values ``> 64-bit`` in size. 126 | It is in violation of the Universal Binary JSON specification to store a value 127 | ``<= 64-bit`` as a huge. 128 | 129 | This design was chosen intentionally as it greatly simplifies (and optimizes) 130 | the generation and parsing code for the UBJ format as no introspection of the 131 | `huge` value is necessary for a platform to try and marshal it into a 132 | smaller format. 133 | 134 | This way the parsing code becomes simple: it either creates an arbitrarily large 135 | number out of the value (e.g. `BigDecimal`_ in Java), returns an error to the 136 | caller because of an unsupported type, or optionally skips the unsupported data 137 | and continues parsing. 138 | 139 | `huge` values must be written out in accordance with the original 140 | `JSON number specification`_. 141 | 142 | Many programming languages have native support for arbitrarily large numbers, 143 | but many do not. If you are working in an environment that does not support 144 | numbers ``> 64-bit`` in size, please see our recommendation on handling them in the 145 | :ref:`Best Practices ` section. 146 | 147 | Optimized Storage Size 148 | ====================== 149 | 150 | All variable-length value types (`string`, `huge`, `array`, `object`) have a 151 | more compact representation using 1-byte (instead of 4-bytes) for the `length` 152 | argument when the `length` value is <= 254. 153 | 154 | These more compact types always use the lowercased version of the `marker` 155 | ASCII char. For example, ``a`` for `array`, ``s`` for `string` and so on. 
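A minimal sketch of choosing between the compact and regular forms, using `string` as the example. The helper name ``encode_string`` is hypothetical, and writing the 4-byte `length` as a big-endian signed integer is an assumption made for this illustration.

```python
import struct


def encode_string(text):
    """Encode text with the compact ``s`` form when its length allows it."""
    data = text.encode("utf-8")  # string values must be UTF-8 encoded
    if len(data) <= 254:
        # Compact form: lowercase marker + 1-byte length + data.
        return b"s" + struct.pack(">B", len(data)) + data
    # Regular form: uppercase marker + 4-byte length + data.
    # Big-endian, signed length is an assumption of this sketch.
    return b"S" + struct.pack(">i", len(data)) + data
```

Note that the `length` is the number of encoded bytes, not the number of characters, so multi-byte UTF-8 text can cross the 254 threshold sooner than its character count suggests.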
156 | 157 | .. warning:: 158 | 159 | When using the compact representations of these different types, remember that 160 | the `length` must be ``<= 254`` because the `length` of 255 (``0xFF``) has a 161 | special meaning when it comes to `array` and `object` types. 162 | 163 | noop and Streaming Support 164 | ========================== 165 | 166 | The :ref:`noop type ` is a general purpose type that has no meaning, but 167 | is most commonly used in streaming scenarios where a server must send a client 168 | a `keep alive` message. 169 | 170 | To support this use-case, the specification needed to support a special type 171 | that meant nothing, so a server and client could make use of it without 172 | polluting the actual data that was being exchanged. 173 | 174 | .. warning:: 175 | 176 | The `noop` type can be used for other purposes or signals as well, but it is 177 | defined to have no value and no effect on the data it may be included in. 178 | 179 | The `noop` type is meant to be sent between discrete values in a streaming 180 | scenario and can never be sent inside of the byte-data that makes up a single 181 | value. 182 | 183 | For example, if a server is writing a string “Hello World” back to the client, 184 | the server must write the entire ``[s][11][Hello World]`` sequence back to the 185 | client unbroken; a `noop` marker cannot be sent inside of that value. 186 | 187 | `noop` markers must only be written between values being transmitted (e.g. 188 | between values in an `array` or between the name and value pair inside of an 189 | `object`). 190 | 191 | Examples 192 | ======== 193 | 194 | Please see the :ref:`value_types` and :ref:`container_types` sections of the 195 | specification for examples. 196 | 197 | .. _IEEE 754 single precision floating point format: http://en.wikipedia.org/wiki/IEEE_754-1985 198 | .. 
_IEEE 754 double precision floating point format: http://en.wikipedia.org/wiki/Double_precision_floating-point_format#Double_precision_binary_floating-point_format 199 | .. _UTF-8: http://en.wikipedia.org/wiki/UTF-8 200 | .. _number of advantages: http://en.wikipedia.org/wiki/UTF-8#Advantages 201 | .. _BigDecimal: http://download.oracle.com/javase/6/docs/api/java/math/BigDecimal.html 202 | .. _JSON number specification: http://json.org 203 | --------------------------------------------------------------------------------