└── README.md


/README.md:
--------------------------------------------------------------------------------
  1 | edn
  2 | ===
  3 | 
  4 | extensible data notation [eed-n]
  5 | 
  6 | # Rationale
  7 | 
  8 | **edn** is an extensible data notation. A superset of **edn** is used by Clojure to represent
  9 | programs, and it is used by Datomic and other applications as a data transfer format. This spec
 10 | describes **edn** in isolation from those and other specific use cases, to help facilitate
 11 | implementation of readers and writers in other languages, and for other uses.
 12 | 
 13 | **edn** supports a rich set of built-in elements, and the definition of extension elements in terms
 14 | of the others. Users of data formats without such facilities must rely on either convention or
 15 | context to convey elements not included in the base set. This greatly complicates application
 16 | logic, betraying the apparent simplicity of the format. **edn** is simple, yet powerful enough to
 17 | meet the demands of applications without convention or complex context-sensitive logic.
 18 | 
 19 | **edn** is a system for the conveyance of _values_. It is not a type system, and has no schemas.
 20 | Nor is it a system for representing objects - there are no reference types, nor should a consumer
 21 | have an expectation that two equivalent elements in some body of **edn** will yield distinct object
 22 | identities when read, unless a reader implementation goes out of its way to make such a promise.
 23 | Thus the resulting values should be considered immutable, and a reader implementation should yield
 24 | values that ensure this, to the extent possible.
 25 | 
 26 | **edn** is a set of definitions for acceptable _elements_. A use of **edn** might be a stream or
 27 | file containing elements, but it could be as small as the conveyance of a single element in e.g. an
 28 | HTTP query param.
 29 | 
 30 | There is no enclosing element at the top level. Thus **edn** is suitable for streaming and
 31 | interactive applications.
 32 | 
 33 | The base set of elements in **edn** is meant to cover the basic set of data structures common to
 34 | most programming languages. While **edn** specifies how those elements are formatted in text, it
 35 | does not dictate the representation that results on the consumer side. A well behaved reader
 36 | library should endeavor to map the elements to programming language types with similar semantics.
 37 | 
 38 | # Spec
 39 | 
 40 | Currently this specification is casual, as we gather feedback from implementors. A more rigorous
 41 | definition (e.g. a BNF grammar) will follow.
 42 | 
 43 | ## General considerations
 44 | 
 45 | **edn** elements, streams and files should be encoded using [UTF-8](http://en.wikipedia.org/wiki/UTF-8).
 46 | 
 47 | Elements are generally separated by whitespace. Whitespace, other than within strings, is not
 48 | otherwise significant, nor need redundant whitespace be preserved during transmissions. Commas `,`
 49 | are also considered whitespace, other than within strings.
 50 | 
 51 | The delimiters `{ } ( ) [ ]` need not be separated from adjacent elements by whitespace.
 52 | 
 53 | ### # dispatch character
 54 | 
 55 | Tokens beginning with `#` are reserved. The character following `#` determines the behavior. The
 56 | dispatches `#{` (sets), `#_` (discard), and `#` followed by an alphabetic character (tag) are defined below. `#` is not a
 57 | delimiter.
 58 | 
 59 | ## Built-in elements
 60 | 
 61 | ### nil
 62 | 
 63 | `nil` represents nil, null or nothing. It should be read as an object with similar meaning on the
 64 | target platform.
 65 | 
 66 | ### booleans
 67 | 
 68 | `true` and `false` should be mapped to booleans.
 69 | 
 70 | If a platform has canonic values for true and false, it is a further semantic of booleans that all
 71 | instances of `true` yield that (identical) value, and similarly for `false`.
 72 | 
 73 | ### strings
 74 | 
 75 | Strings are enclosed in `"double quotes"` and may span multiple lines. Standard C/Java escape
 76 | sequences `\t`, `\r`, `\n`, `\\` and `\"` are supported.
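
A minimal sketch of decoding those escapes in Python follows. The function name `unescape` is hypothetical, and rejecting any escape not listed above is an assumption, since the spec does not say how unlisted escapes should be handled.

```python
# Escape sequences listed in the spec, keyed by the character after the backslash.
_ESCAPES = {"t": "\t", "r": "\r", "n": "\n", "\\": "\\", '"': '"'}


def unescape(body: str) -> str:
    """Decode the text between the quotes of an edn string literal."""
    out = []
    chars = iter(body)
    for ch in chars:
        if ch != "\\":
            out.append(ch)  # raw newlines pass through: strings may span lines
            continue
        code = next(chars, None)
        if code not in _ESCAPES:
            # Assumption: anything outside the listed escapes is an error.
            raise ValueError(f"unsupported escape sequence: \\{code}")
        out.append(_ESCAPES[code])
    return "".join(out)
```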
 77 | 
 78 | ### characters
 79 | 
 80 | Characters are preceded by a backslash: `\c`, `\newline`, `\return`, `\space` and `\tab` yield the
 81 | corresponding characters. Unicode characters are represented with `\uNNNN` as in Java. Backslash cannot be
 82 | followed by whitespace.
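
As an illustration, a reader might map an already-tokenized character literal to a one-character string along these lines. The helper name `parse_character` is hypothetical, and the sketch assumes the tokenizer has isolated the token first.

```python
import string

# Named characters from the spec.
_NAMED = {"newline": "\n", "return": "\r", "space": " ", "tab": "\t"}


def parse_character(token: str) -> str:
    """Map an edn character token such as \\c or \\newline to a 1-char string."""
    if len(token) < 2 or token[0] != "\\":
        raise ValueError(f"not an edn character: {token!r}")
    name = token[1:]
    if name in _NAMED:
        return _NAMED[name]
    # \uNNNN: exactly four hexadecimal digits after the 'u'.
    if len(name) == 5 and name[0] == "u" and all(c in string.hexdigits for c in name[1:]):
        return chr(int(name[1:], 16))
    # A single character, which must not be whitespace.
    if len(name) == 1 and not name.isspace():
        return name
    raise ValueError(f"not an edn character: {token!r}")
```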
 83 | 
 84 | ### symbols
 85 | 
 86 | Symbols are used to represent identifiers, and should map to something other than strings, if
 87 | possible.
 88 | 
 89 | Symbols begin with a non-numeric character and can contain alphanumeric characters and `. * + ! - _ ?
 90 | $ % & = < >`. If `-`, `+` or `.` are the first character, the second character (if any) must be
 91 | non-numeric. Additionally, `: #` are allowed as constituent characters in symbols other than as the
 92 | first character.
 93 | 
 94 | `/` has special meaning in symbols. It can be used once only in the middle of a symbol to separate
 95 | the _prefix_ (often a namespace) from the _name_, e.g. `my-namespace/foo`. `/` by itself is a legal
 96 | symbol, but otherwise neither the _prefix_ nor the _name_ part can be empty when the symbol
 97 | contains `/`.
 98 | 
 99 | If a symbol has a _prefix_ and `/`, the following _name_ component should follow the
100 | first-character restrictions for symbols as a whole. This is to avoid ambiguity in reading contexts
101 | where prefixes might be presumed as implicitly included namespaces and elided thereafter.
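
The symbol rules above can be sketched as a small validator. This is an illustration, not a reference implementation: `is_symbol` is a hypothetical name, "alphanumeric" is assumed to mean ASCII letters and digits, and a real reader would check for `nil`, `true` and `false` before treating a token as a symbol.

```python
import re

# First character: a letter or one of . * + ! - _ ? $ % & = < >
_FIRST = r"[A-Za-z.*+!\-_?$%&=<>]"
# Later characters additionally allow digits, ':' and '#'.
_REST = r"[A-Za-z0-9.*+!\-_?$%&=<>:#]"
_PART = re.compile(f"{_FIRST}{_REST}*")


def _valid_part(part: str) -> bool:
    """Check one component (prefix or name) against the first-character rules."""
    if not _PART.fullmatch(part):
        return False
    # After a leading -, + or ., the second character must be non-numeric.
    if part[0] in "-+." and len(part) > 1 and part[1].isdigit():
        return False
    return True


def is_symbol(token: str) -> bool:
    if token == "/":
        return True  # '/' by itself is a legal symbol
    if token.count("/") > 1:
        return False  # '/' may be used once only
    # With one '/', both prefix and name must be non-empty and valid.
    return all(_valid_part(p) for p in token.split("/"))
```

For example, `is_symbol("my-namespace/foo")` and `is_symbol("/")` hold, while `is_symbol("-1a")`, `is_symbol("foo/")` and `is_symbol(":fred")` do not.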
102 | 
103 | ### keywords
104 | 
105 | Keywords are identifiers that typically designate themselves. They are semantically akin to
106 | enumeration values. Keywords follow the rules of symbols, except they can (and must) begin with `:`, e.g. `:fred` or `:my/fred`. If the target platform does not have a keyword type distinct
107 | from a symbol type, the same type can be used without conflict, since the mandatory leading `:` of
108 | keywords is disallowed for symbols. Per the symbol rules above, `:/` and `:/anything` are not legal keywords.
109 | A keyword cannot begin with `::`.
110 | 
111 | If the target platform supports some notion of interning, it is a further semantic of keywords that
112 | all instances of the same keyword yield the identical object.
113 | 
114 | ### integers
115 | 
116 | Integers consist of the digits `0` - `9`, optionally prefixed by `-` to indicate a negative number, or
117 | (redundantly) by `+`. No integer other than 0 may begin with 0. 64-bit (signed integer) precision is
118 | expected. An integer can have the suffix `N` to indicate that arbitrary precision is desired. -0 is a
119 | valid integer not distinct from 0.
120 | 
121 |     integer
122 |       int
123 |       int N
124 |     digit
125 |       0-9
126 |     int
127 |       digit
128 |       1-9 digits
129 |       + digit
130 |       + 1-9 digits
131 |       - digit
132 |       - 1-9 digits
133 | 
134 | ### floating point numbers
135 | 
136 | 64-bit (double) precision is expected.
137 | 
138 |     floating-point-number
139 |       int M
140 |       int frac
141 |       int exp
142 |       int frac exp
143 |     digit
144 |       0-9
145 |     int
146 |       digit
147 |       1-9 digits
148 |       + digit
149 |       + 1-9 digits
150 |       - digit
151 |       - 1-9 digits
152 |     frac
153 |       . digits
154 |     exp
155 |       ex digits
156 |     digits
157 |       digit
158 |       digit digits
159 |     ex
160 |       e
161 |       e+
162 |       e-
163 |       E
164 |       E+
165 |       E-
166 | 
167 | In addition, a floating-point number may have the suffix `M` to indicate that exact precision is
168 | desired.
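
The grammar above can be sketched the same way. `parse_float` is a hypothetical helper, and mapping the `M` suffix to Python's `decimal.Decimal` is an assumption about what "exact precision" should mean on this platform.

```python
import re
from decimal import Decimal

# int, optional frac, optional exp, optional M suffix.
# Groups: 1 = numeric text, 2 = frac, 3 = exp, 4 = M suffix.
_FLOAT = re.compile(r"([+-]?(?:0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?)(M?)")


def parse_float(token: str):
    m = _FLOAT.fullmatch(token)
    # A bare int with no frac, exp or M is an integer, not a float.
    if m is None or not (m.group(2) or m.group(3) or m.group(4)):
        raise ValueError(f"not an edn floating-point number: {token!r}")
    if m.group(4):
        return Decimal(m.group(1))  # assumption: M maps to Decimal
    return float(m.group(1))  # 64-bit double
```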
169 | 
170 | ### lists
171 | 
172 | A list is a sequence of values. Lists are represented by zero or more elements enclosed in
173 | parentheses `()`. Note that lists can be heterogeneous.
174 |  
175 |     (a b 42)
176 | 
177 | ### vectors
178 | 
179 | A vector is a sequence of values that supports random access. Vectors are represented by zero or
180 | more elements enclosed in square brackets `[]`. Note that vectors can be heterogeneous.
181 | 
182 |     [a b 42]
183 | 
184 | ### maps
185 | 
186 | A map is a collection of associations between keys and values. Maps are represented by zero or more
187 | key and value pairs enclosed in curly braces `{}`. Each key should appear at most once. No
188 | semantics should be associated with the order in which the pairs appear.
189 | 
190 |     {:a 1, "foo" :bar, [1 2 3] four}
191 | 
192 | Note that keys and values can be elements of any type. The use of commas above is optional, as they
193 | are parsed as whitespace.
194 | 
195 | ### sets
196 | 
197 | A set is a collection of unique values. Sets are represented by zero or more elements enclosed in
198 | curly braces preceded by `#`, i.e. `#{}`. No semantics should be associated with the order in which the
199 | elements appear. Note that sets can be heterogeneous.
200 | 
201 |     #{a b [1 2 3]}
202 | 
203 | ## tagged elements
204 | 
205 | **edn** supports extensibility through a simple mechanism. `#` followed immediately by a symbol
206 | starting with an alphabetic character indicates that _that symbol_ is a **_tag_**. A tag indicates
207 | the semantic interpretation of _the following element_. It is envisioned that a reader
208 | implementation will allow clients to register handlers for specific tags. Upon encountering a tag,
209 | the reader will first read the next element (which may itself be or comprise other tagged elements),
210 | then pass the result to the corresponding handler for further interpretation, and the result of the
211 | handler will be the data value yielded by the tag + tagged element, i.e. reading a tag and tagged
212 | element yields one value. This value is the value to be returned to the program and is not further
213 | interpreted as **edn** data by the reader.
214 | 
215 | This process will bottom out on elements either understood or built-in. 
216 | 
217 | Thus you can build new distinct readable elements out of (and only out of) other readable elements,
218 | keeping extenders and extension consumers out of the text business.
219 | 
220 | The semantics of a tag, and the type and interpretation of the tagged element are defined by the
221 | steward of the tag.
222 | 
223 |     #myapp/Person {:first "Fred" :last "Mertz"}
224 | 
225 | If a reader encounters a tag for which no handler is registered, the implementation can either
226 | report an error, call a designated 'unknown element' handler, or create a well-known generic
227 | representation that contains both the tag and the tagged element, as it sees fit. Note that the
228 | non-error strategies allow for readers which are capable of reading any and all **edn**, in spite
229 | of being unaware of the details of any extensions present.
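
A minimal sketch of handler dispatch follows, assuming a reader has already read the tag symbol and the element after it. All names here (`apply_tag`, `TaggedValue`, `on_unknown`) are hypothetical and not part of any particular edn library. The three strategies for unregistered tags correspond to `on_unknown=None` (error), a custom `on_unknown` callable, and the default generic `TaggedValue`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class TaggedValue:
    """Generic stand-in for a tagged element whose tag has no handler."""
    tag: str
    value: Any


def apply_tag(
    tag: str,
    value: Any,
    handlers: Dict[str, Callable[[Any], Any]],
    on_unknown: Optional[Callable[[str, Any], Any]] = TaggedValue,
) -> Any:
    """Return the single value yielded by a tag plus its already-read element."""
    handler = handlers.get(tag)
    if handler is not None:
        return handler(value)  # the result is not re-read as edn
    if on_unknown is None:
        raise LookupError(f"no handler registered for tag #{tag}")
    return on_unknown(tag, value)


# Usage: keywords are shown as plain strings here purely for illustration.
handlers = {"myapp/Person": lambda m: f'{m[":first"]} {m[":last"]}'}
person = apply_tag("myapp/Person", {":first": "Fred", ":last": "Mertz"}, handlers)
```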
230 | 
231 | ### rules for tags
232 | 
233 | Tag symbols without a prefix are reserved by **edn** for built-ins defined using the tag system. 
234 | 
235 | User tags _**must**_ contain a prefix component, which must be owned by the user (e.g. trademark or
236 | domain) or known unique in the communication context.
237 | 
238 | A tag _may_ specify more than one format for the tagged element, e.g. both a string and a vector
239 | representation.
240 | 
241 | Tags themselves are not elements. It is an error to have a tag without a corresponding tagged
242 | element.
243 | 
244 | ## built-in tagged elements
245 | 
246 | ### #inst "rfc-3339-format"
247 | 
248 | An instant in time. The tagged element is a string in
249 | [RFC-3339](http://www.ietf.org/rfc/rfc3339.txt) format.
250 | 
251 | `#inst "1985-04-12T23:20:50.52Z"`
252 | 
253 | ### #uuid "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"
254 | 
255 | A [UUID](http://en.wikipedia.org/wiki/Universally_unique_identifier). The tagged element is a
256 | canonical UUID string representation.
257 | 
258 | ## comments
259 | 
260 | If a `;` character is encountered outside of a string, that character and all subsequent characters
261 | to the next newline should be ignored.
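
As a rough illustration, comment removal can be sketched as a single pass that tracks whether the scanner is inside a string. `strip_comments` is a hypothetical helper. Keeping the terminating newline and skipping the character after a backslash (so that `\;` and `\"` character literals are left intact) are choices made for this sketch.

```python
def strip_comments(text: str) -> str:
    out = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                out.append(text[i + 1])  # escaped char cannot close the string
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "\\" and i + 1 < len(text):
            # Character literal such as \; or \": copy both characters as-is.
            out.append(ch)
            out.append(text[i + 1])
            i += 1
        elif ch == ";":
            # Skip to the next newline; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```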
262 | 
263 | ## discard
264 | 
265 | `#` followed immediately by `_` is the discard sequence, indicating that the next element (whether
266 | separated from `#_` by whitespace or not) should be read and discarded. Note that the next element
267 | must still be a readable element. A reader should not call user-supplied tag handlers during the
268 | processing of the element to be discarded.
269 | 
270 |     [a b #_foo 42] => [a b 42]
271 | 
272 | The discard sequence is not an element. It is an error to have a discard sequence without a
273 | following element.
274 | 
275 | ## equality
276 | 
277 | Sets and maps have requirements that their elements and keys respectively be unique, which requires
278 | a mechanism for determining when two values are not unique (i.e. are equal).
279 | 
280 | nil, booleans, strings, characters, and symbols are equal to values of the same type with the same
281 | **edn** representation.
282 | 
283 | integers and floating point numbers should be considered equal to values only of the same
284 | magnitude, _type, and precision_. Commingling numeric types and precision in map/set key/elements,
285 | or constituents therein, is not advised.
286 | 
287 | sequences (lists and vectors) are equal to other sequences whose count of elements is the same, and
288 | for which each corresponding pair of elements (by ordinal) is equal.
289 | 
290 | sets are equal if they have the same count of elements and, for every element in one set, an equal
291 | element is in the other.
292 | 
293 | maps are equal if they have the same number of entries, and for every key/value entry in one map an
294 | equal key is present and mapped to an equal value in the other.
295 | 
296 | tagged elements must define their own equality semantics. `#uuid` elements are equal if their canonic
297 | representations are equal. `#inst` elements are equal if their representation strings designate the
298 | same timestamp per [RFC-3339](http://www.ietf.org/rfc/rfc3339.txt).
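
The scalar and sequence rules above differ from the default equality of some host languages. In Python, for instance, `1 == 1.0` and `True == 1` both hold, so a reader cannot rely on `==` alone. The sketch below uses a hypothetical `edn_equal` and assumes lists are read as tuples and vectors as Python lists. It covers scalars and sequences only and does not recurse into sets or maps. Because both 64-bit and `N`-suffixed integers map to `int` here, it cannot tell those two precisions apart.

```python
def edn_equal(a, b) -> bool:
    seq = (list, tuple)
    if isinstance(a, seq) and isinstance(b, seq):
        # Lists and vectors compare equal to each other when their
        # elements are pairwise equal and their counts match.
        return len(a) == len(b) and all(edn_equal(x, y) for x, y in zip(a, b))
    # Everything else: same type and same value, so 1 != 1.0 and true != 1.
    return type(a) is type(b) and a == b
```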
299 | 
300 | 
301 | 


--------------------------------------------------------------------------------