├── .gitignore ├── README.md ├── epl-v10.html ├── project.clj ├── src └── clj_lazy_json │ ├── cdx.clj │ └── core.clj └── test └── clj_lazy_json └── test └── core.clj /.gitignore: -------------------------------------------------------------------------------- 1 | /pom.xml 2 | *jar 3 | /lib 4 | /classes 5 | /native 6 | /.lein-failures 7 | /checkouts 8 | /.lein-deps-sum 9 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # clj-lazy-json 2 | 3 | A [Jackson](http://jackson.codehaus.org/)-based lazy JSON parsing 4 | library for Clojure. 5 | 6 | Some code from the (EPL-licensed) `clojure.data.xml` library is being 7 | reused here; see below for details. 8 | 9 | Please note that at this early stage the API and even the scope of 10 | this library are subject to change without notice. 11 | 12 | ## Usage 13 | 14 | ### Overview 15 | 16 | `clj-lazy-json` provides a Clojure wrapper for Jackson's stream -- 17 | parse event-based -- API and a method of processing seqs of parse events. 18 | The latter is based on a query / path specification language for 19 | matching nodes in a JSON document (vaguely resembling -- simplified -- 20 | XQuery, CSS selectors and the like); a `define-json-processor` macro 21 | allows one to package a handful of paths together with appropriate 22 | callbacks in a regular Clojure function which can then be used to 23 | process JSON documents. 24 | 25 | JSON text can be parsed into a parse event seq by the `parse` 26 | function, which can be called on anything acceptable to 27 | `clojure.java.io/reader` (e.g. a `File`, `URI` or a ready-made 28 | `Reader`). `parse-string` is a convenience wrapper for dealing with 29 | JSON documents contained in strings. 
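
To make the shape of a parse event seq concrete, the sketch below models it in plain Clojure, without Jackson. `events-of` is a hypothetical helper and not part of the library: it emits `[type contents]` pairs using the event types that appear in `src/clj_lazy_json/core.clj` (`:start-object`, `:field-name`, `:atom` and so on), whereas the library itself yields `Event` records with `:type` and `:contents` fields.

```clojure
;; Hypothetical helper, for illustration only: flattens a Clojure value
;; into [type contents] pairs shaped like the events emitted by parse.
(defn events-of [x]
  (cond
    ;; objects: start marker, then a :field-name event before each value
    (map? x)        (concat [[:start-object nil]]
                            (mapcat (fn [[k v]]
                                      (cons [:field-name k] (events-of v)))
                                    x)
                            [[:end-object nil]])
    ;; arrays: start marker, the events of each element, end marker
    (sequential? x) (concat [[:start-array nil]]
                            (mapcat events-of x)
                            [[:end-array nil]])
    ;; strings, numbers, booleans and nil are all :atom events
    :else           [[:atom x]]))

(events-of {"foo" [1 nil]})
;; => ([:start-object nil] [:field-name "foo"] [:start-array nil]
;;     [:atom 1] [:atom nil] [:end-array nil] [:end-object nil])
```

Because the stream is flat, a consumer can stop reading at any point, which is what allows the real event seq to be lazy.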
30 | 31 | During development, rather than defining named JSON processing 32 | functions, it may be convenient to use the `process-json` 33 | function; for example 34 | 35 | (process-json (parse-string "{\"foo\": 1, \"bar\": 2}") 36 | {} 37 | [:$ "foo"] #(apply prn "Foo!" %&) 38 | [:$ "bar"] #(apply prn "Bar!" %&)) 39 | 40 | prints 41 | 42 | "Foo!" [:$ "foo"] 1 43 | "Bar!" [:$ "bar"] 2 44 | 45 | and returns `nil`. To achieve the same effect with a named processor, 46 | one would say 47 | 48 | (define-json-processor foo-bar-processor 49 | [:$ "foo"] #(apply prn "Foo!" %&) 50 | [:$ "bar"] #(apply prn "Bar!" %&)) 51 | 52 | (foo-bar-processor (parse-string "{\"foo\": 1, \"bar\": 2}")) 53 | 54 | Wildcards matching "any key/index" (`:*`) or "any subpath" (`:**`) are 55 | supported in paths. The docstring of the `define-json-processor` macro 56 | contains a description of the path language and the contract which 57 | must be met by the callback functions. 58 | 59 | Note that no JSON emitting functionality is currently supported; this 60 | is available in both `clojure.data.json` and `clj-json`. 61 | 62 | ### Example 63 | 64 | Let's have a look at an example. First, a simple JSON document: 65 | 66 | (def test-json 67 | "{\"foo\": [{\"bar\": 1}, {\"foo\": {\"quux\": {\"bar\": 2}}}], 68 | \"bar\": [3]}") 69 | 70 | Suppose we want to call some function with the values attached to bars 71 | below at least one foo. We'll use the following callback function: 72 | 73 | (defn print-value-callback [_ v] (prn v)) 74 | 75 | To demonstrate the use of a callback's first argument, we'll also call 76 | a function to print out its value at a different path. This function 77 | is defined inline in the parser specification below, just to show it's 78 | possible. 79 | 80 | (define-json-processor example-processor 81 | "Print out the values attached to bars below at least one foo." 
82 | [:** "foo" :** "bar"] print-value-callback 83 | [:$ "bar" :*] (fn print-path [path _] (prn path))) 84 | 85 | Here `example-processor` is a regular Clojure function. It takes one 86 | argument named `parsed-json` and has the specified docstring 87 | attached. To test it out on our example document, one would say 88 | 89 | (example-processor (parse-string test-json)) 90 | 1 91 | 2 92 | 2 93 | [:$ "bar" 0] 94 | ; nil 95 | 96 | The `2` is printed twice, because its position in the tree matches the 97 | first path in two ways: either the outer or the nested "foo" key can serve as the path's "foo" step. 98 | 99 | The DSL used to define `example-processor` breaks down as follows: 100 | 101 | [:** "foo" :** "bar"] print-value-callback 102 | ; <----- path -----> 103 | ; ^- :** -- skip any (possibly empty) subpath 104 | ; ^- "foo" -- expect to see an object; descend into the value 105 | ; attached to key "foo" 106 | ; ^- :** -- skip any subpath 107 | ; ^- "bar" -- descend at key "bar"; this is the end 108 | ; of the path spec, so call the attached 109 | ; callback -- print-value-callback -- with the 110 | ; current node 111 | 112 | [:$ "bar" :*] (fn print-path [path _] (prn path)) 113 | ; ^- match document root 114 | ; ^- expect previously matched element (= root) to be an object; 115 | ; follow key "bar" 116 | ; ^- expect to see an object or an array; call the callback 117 | ; for all children (:* matches any single step in the path) 118 | 119 | The callbacks receive two arguments: the exact path to the current 120 | node in the JSON document, which is a vector of `:$` possibly followed 121 | by strings (object keys) and numbers (array indices), and a standard 122 | "Clojurized" representation of the node's value (with objects 123 | converted to maps and arrays to vectors). 
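
The double match of `2` in the example above can be checked with a small standalone model of the path language. This is a sketch, not the library's automaton: `count-matches` is a hypothetical function that counts the ways a path spec lines up with one concrete path, and it assumes, following the breakdown above, that `:**` may skip an empty subpath and that `:*` matches any single step.

```clojure
;; Hypothetical model of the path language, for illustration only.
(defn count-matches
  "Counts the ways pattern can be aligned with the concrete path."
  [pattern path]
  (cond
    ;; pattern exhausted: a match only if the path is exhausted too
    (empty? pattern) (if (empty? path) 1 0)

    ;; :** may consume any number of leading steps (assumed: including none)
    (= :** (first pattern))
    (reduce + (map #(count-matches (rest pattern) (drop % path))
                   (range (inc (count path)))))

    ;; pattern still expects a step, but the path has ended
    (empty? path) 0

    ;; :* matches any single step; anything else must match literally
    (or (= :* (first pattern)) (= (first pattern) (first path)))
    (recur (rest pattern) (rest path))

    :else 0))

;; the value 1 sits at [:$ "foo" 0 "bar"]
(count-matches [:** "foo" :** "bar"] [:$ "foo" 0 "bar"])
;; => 1

;; the value 2 sits at [:$ "foo" 1 "foo" "quux" "bar"]
(count-matches [:** "foo" :** "bar"] [:$ "foo" 1 "foo" "quux" "bar"])
;; => 2
```

In the second call either "foo" in the concrete path can play the role of the "foo" step in the spec, which is why the callback fires twice for that node.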
124 | 125 | ## Use of `clojure.data.xml` code 126 | 127 | The lazy trees used here are constructed using two functions from 128 | `clojure.data.xml`, `fill-queue` and `seq-tree`, copied here because 129 | they are marked private in their original namespace of residence. 130 | `clojure.data.xml` code carries the following notice: 131 | 132 | ; Copyright (c) Rich Hickey. All rights reserved. 133 | ; The use and distribution terms for this software are covered by the 134 | ; Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php) 135 | ; which can be found in the file epl-v10.html at the root of this distribution. 136 | ; By using this software in any fashion, you are agreeing to be bound by 137 | ; the terms of this license. 138 | ; You must not remove this notice, or any other, from this software. 139 | 140 | This notice is also reproduced in the `src/clj_lazy_json/cdx.clj` file 141 | containing this code. See also the file `epl-v10.html` at the root of 142 | the present distribution. 143 | 144 | ## Fablo 145 | 146 | This work was sponsored by Fablo (http://fablo.eu/). Fablo provides a 147 | set of tools for building modern e-commerce storefronts. Tools include 148 | a search engine, product and search result navigation, accelerators, 149 | personalized recommendations, and real-time statistics and analytics. 150 | 151 | ## Licence 152 | 153 | Copyright (C) 2011 Michał Marczyk 154 | 155 | Distributed under the Eclipse Public License, the same as Clojure. 156 | -------------------------------------------------------------------------------- /epl-v10.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | Eclipse Public License - Version 1.0 8 | 25 | 26 | 27 | 28 | 29 | 30 |

Eclipse Public License - v 1.0

31 | 32 |

THE ACCOMPANYING PROGRAM IS PROVIDED UNDER THE TERMS OF THIS ECLIPSE 33 | PUBLIC LICENSE ("AGREEMENT"). ANY USE, REPRODUCTION OR 34 | DISTRIBUTION OF THE PROGRAM CONSTITUTES RECIPIENT'S ACCEPTANCE OF THIS 35 | AGREEMENT.

36 | 37 |

1. DEFINITIONS

38 | 39 |

"Contribution" means:

40 | 41 |

a) in the case of the initial Contributor, the initial 42 | code and documentation distributed under this Agreement, and

43 |

b) in the case of each subsequent Contributor:

44 |

i) changes to the Program, and

45 |

ii) additions to the Program;

46 |

where such changes and/or additions to the Program 47 | originate from and are distributed by that particular Contributor. A 48 | Contribution 'originates' from a Contributor if it was added to the 49 | Program by such Contributor itself or anyone acting on such 50 | Contributor's behalf. Contributions do not include additions to the 51 | Program which: (i) are separate modules of software distributed in 52 | conjunction with the Program under their own license agreement, and (ii) 53 | are not derivative works of the Program.

54 | 55 |

"Contributor" means any person or entity that distributes 56 | the Program.

57 | 58 |

"Licensed Patents" mean patent claims licensable by a 59 | Contributor which are necessarily infringed by the use or sale of its 60 | Contribution alone or when combined with the Program.

61 | 62 |

"Program" means the Contributions distributed in accordance 63 | with this Agreement.

64 | 65 |

"Recipient" means anyone who receives the Program under 66 | this Agreement, including all Contributors.

67 | 68 |

2. GRANT OF RIGHTS

69 | 70 |

a) Subject to the terms of this Agreement, each 71 | Contributor hereby grants Recipient a non-exclusive, worldwide, 72 | royalty-free copyright license to reproduce, prepare derivative works 73 | of, publicly display, publicly perform, distribute and sublicense the 74 | Contribution of such Contributor, if any, and such derivative works, in 75 | source code and object code form.

76 | 77 |

b) Subject to the terms of this Agreement, each 78 | Contributor hereby grants Recipient a non-exclusive, worldwide, 79 | royalty-free patent license under Licensed Patents to make, use, sell, 80 | offer to sell, import and otherwise transfer the Contribution of such 81 | Contributor, if any, in source code and object code form. This patent 82 | license shall apply to the combination of the Contribution and the 83 | Program if, at the time the Contribution is added by the Contributor, 84 | such addition of the Contribution causes such combination to be covered 85 | by the Licensed Patents. The patent license shall not apply to any other 86 | combinations which include the Contribution. No hardware per se is 87 | licensed hereunder.

88 | 89 |

c) Recipient understands that although each Contributor 90 | grants the licenses to its Contributions set forth herein, no assurances 91 | are provided by any Contributor that the Program does not infringe the 92 | patent or other intellectual property rights of any other entity. Each 93 | Contributor disclaims any liability to Recipient for claims brought by 94 | any other entity based on infringement of intellectual property rights 95 | or otherwise. As a condition to exercising the rights and licenses 96 | granted hereunder, each Recipient hereby assumes sole responsibility to 97 | secure any other intellectual property rights needed, if any. For 98 | example, if a third party patent license is required to allow Recipient 99 | to distribute the Program, it is Recipient's responsibility to acquire 100 | that license before distributing the Program.

101 | 102 |

d) Each Contributor represents that to its knowledge it 103 | has sufficient copyright rights in its Contribution, if any, to grant 104 | the copyright license set forth in this Agreement.

105 | 106 |

3. REQUIREMENTS

107 | 108 |

A Contributor may choose to distribute the Program in object code 109 | form under its own license agreement, provided that:

110 | 111 |

a) it complies with the terms and conditions of this 112 | Agreement; and

113 | 114 |

b) its license agreement:

115 | 116 |

i) effectively disclaims on behalf of all Contributors 117 | all warranties and conditions, express and implied, including warranties 118 | or conditions of title and non-infringement, and implied warranties or 119 | conditions of merchantability and fitness for a particular purpose;

120 | 121 |

ii) effectively excludes on behalf of all Contributors 122 | all liability for damages, including direct, indirect, special, 123 | incidental and consequential damages, such as lost profits;

124 | 125 |

iii) states that any provisions which differ from this 126 | Agreement are offered by that Contributor alone and not by any other 127 | party; and

128 | 129 |

iv) states that source code for the Program is available 130 | from such Contributor, and informs licensees how to obtain it in a 131 | reasonable manner on or through a medium customarily used for software 132 | exchange.

133 | 134 |

When the Program is made available in source code form:

135 | 136 |

a) it must be made available under this Agreement; and

137 | 138 |

b) a copy of this Agreement must be included with each 139 | copy of the Program.

140 | 141 |

Contributors may not remove or alter any copyright notices contained 142 | within the Program.

143 | 144 |

Each Contributor must identify itself as the originator of its 145 | Contribution, if any, in a manner that reasonably allows subsequent 146 | Recipients to identify the originator of the Contribution.

147 | 148 |

4. COMMERCIAL DISTRIBUTION

149 | 150 |

Commercial distributors of software may accept certain 151 | responsibilities with respect to end users, business partners and the 152 | like. While this license is intended to facilitate the commercial use of 153 | the Program, the Contributor who includes the Program in a commercial 154 | product offering should do so in a manner which does not create 155 | potential liability for other Contributors. Therefore, if a Contributor 156 | includes the Program in a commercial product offering, such Contributor 157 | ("Commercial Contributor") hereby agrees to defend and 158 | indemnify every other Contributor ("Indemnified Contributor") 159 | against any losses, damages and costs (collectively "Losses") 160 | arising from claims, lawsuits and other legal actions brought by a third 161 | party against the Indemnified Contributor to the extent caused by the 162 | acts or omissions of such Commercial Contributor in connection with its 163 | distribution of the Program in a commercial product offering. The 164 | obligations in this section do not apply to any claims or Losses 165 | relating to any actual or alleged intellectual property infringement. In 166 | order to qualify, an Indemnified Contributor must: a) promptly notify 167 | the Commercial Contributor in writing of such claim, and b) allow the 168 | Commercial Contributor to control, and cooperate with the Commercial 169 | Contributor in, the defense and any related settlement negotiations. The 170 | Indemnified Contributor may participate in any such claim at its own 171 | expense.

172 | 173 |

For example, a Contributor might include the Program in a commercial 174 | product offering, Product X. That Contributor is then a Commercial 175 | Contributor. If that Commercial Contributor then makes performance 176 | claims, or offers warranties related to Product X, those performance 177 | claims and warranties are such Commercial Contributor's responsibility 178 | alone. Under this section, the Commercial Contributor would have to 179 | defend claims against the other Contributors related to those 180 | performance claims and warranties, and if a court requires any other 181 | Contributor to pay any damages as a result, the Commercial Contributor 182 | must pay those damages.

183 | 184 |

5. NO WARRANTY

185 | 186 |

EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, THE PROGRAM IS 187 | PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS 188 | OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT LIMITATION, 189 | ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY 190 | OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is solely 191 | responsible for determining the appropriateness of using and 192 | distributing the Program and assumes all risks associated with its 193 | exercise of rights under this Agreement , including but not limited to 194 | the risks and costs of program errors, compliance with applicable laws, 195 | damage to or loss of data, programs or equipment, and unavailability or 196 | interruption of operations.

197 | 198 |

6. DISCLAIMER OF LIABILITY

199 | 200 |

EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, NEITHER RECIPIENT 201 | NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY DIRECT, INDIRECT, 202 | INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING 203 | WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND ON ANY THEORY OF 204 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 205 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OR 206 | DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED 207 | HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

208 | 209 |

7. GENERAL

210 | 211 |

If any provision of this Agreement is invalid or unenforceable under 212 | applicable law, it shall not affect the validity or enforceability of 213 | the remainder of the terms of this Agreement, and without further action 214 | by the parties hereto, such provision shall be reformed to the minimum 215 | extent necessary to make such provision valid and enforceable.

216 | 217 |

If Recipient institutes patent litigation against any entity 218 | (including a cross-claim or counterclaim in a lawsuit) alleging that the 219 | Program itself (excluding combinations of the Program with other 220 | software or hardware) infringes such Recipient's patent(s), then such 221 | Recipient's rights granted under Section 2(b) shall terminate as of the 222 | date such litigation is filed.

223 | 224 |

All Recipient's rights under this Agreement shall terminate if it 225 | fails to comply with any of the material terms or conditions of this 226 | Agreement and does not cure such failure in a reasonable period of time 227 | after becoming aware of such noncompliance. If all Recipient's rights 228 | under this Agreement terminate, Recipient agrees to cease use and 229 | distribution of the Program as soon as reasonably practicable. However, 230 | Recipient's obligations under this Agreement and any licenses granted by 231 | Recipient relating to the Program shall continue and survive.

232 | 233 |

Everyone is permitted to copy and distribute copies of this 234 | Agreement, but in order to avoid inconsistency the Agreement is 235 | copyrighted and may only be modified in the following manner. The 236 | Agreement Steward reserves the right to publish new versions (including 237 | revisions) of this Agreement from time to time. No one other than the 238 | Agreement Steward has the right to modify this Agreement. The Eclipse 239 | Foundation is the initial Agreement Steward. The Eclipse Foundation may 240 | assign the responsibility to serve as the Agreement Steward to a 241 | suitable separate entity. Each new version of the Agreement will be 242 | given a distinguishing version number. The Program (including 243 | Contributions) may always be distributed subject to the version of the 244 | Agreement under which it was received. In addition, after a new version 245 | of the Agreement is published, Contributor may elect to distribute the 246 | Program (including its Contributions) under the new version. Except as 247 | expressly stated in Sections 2(a) and 2(b) above, Recipient receives no 248 | rights or licenses to the intellectual property of any Contributor under 249 | this Agreement, whether expressly, by implication, estoppel or 250 | otherwise. All rights in the Program not expressly granted under this 251 | Agreement are reserved.

252 | 253 |

This Agreement is governed by the laws of the State of New York and 254 | the intellectual property laws of the United States of America. No party 255 | to this Agreement will bring a legal action under this Agreement more 256 | than one year after the cause of action arose. Each party waives its 257 | rights to a jury trial in any resulting litigation.

258 | 259 | 260 | 261 | 262 | -------------------------------------------------------------------------------- /project.clj: -------------------------------------------------------------------------------- 1 | (defproject clj-lazy-json "0.0.3" 2 | :description "Jackson-based lazy JSON parsing library for Clojure." 3 | :dependencies [[org.clojure/clojure "[1.2.0,1.3.0]"] 4 | [org.codehaus.jackson/jackson-core-asl "1.8.6"]] 5 | ; :jvm-opts ["-Xmx512m" "-XX:+UseConcMarkSweepGC"] 6 | ) 7 | -------------------------------------------------------------------------------- /src/clj_lazy_json/cdx.clj: -------------------------------------------------------------------------------- 1 | ;;; The following functions have been extracted from the 2 | ;;; clojure.data.xml library, see http://github.com/clojure/data.xml 3 | ;;; The only modification made to these definitions is the removal of 4 | ;;; the "private" marking (s/defn-/defn/g). 5 | 6 | ;;; clojure.data.xml code carries the following notice: 7 | ; Copyright (c) Rich Hickey. All rights reserved. 8 | ; The use and distribution terms for this software are covered by the 9 | ; Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php) 10 | ; which can be found in the file epl-v10.html at the root of this distribution. 11 | ; By using this software in any fashion, you are agreeing to be bound by 12 | ; the terms of this license. 13 | ; You must not remove this notice, or any other, from this software. 14 | 15 | (ns clj-lazy-json.cdx 16 | (:import (java.util.concurrent LinkedBlockingQueue TimeUnit) 17 | (java.lang.ref WeakReference))) 18 | 19 | (defn seq-tree 20 | "Takes a seq of events that logically represents 21 | a tree by each event being one of: enter-sub-tree event, 22 | exit-sub-tree event, or node event. 23 | 24 | Returns a lazy sequence whose first element is a sequence of 25 | sub-trees and whose remaining elements are events that are not 26 | siblings or descendants of the initial event. 
27 | 28 | The given exit? function must return true for any exit-sub-tree 29 | event. parent must be a function of two arguments: the first is an 30 | event, the second a sequence of nodes or subtrees that are children 31 | of the event. parent must return nil or false if the event is not 32 | an enter-sub-tree event. Any other return value will become 33 | a sub-tree of the output tree and should normally contain in some 34 | way the children passed as the second arg. The node function is 35 | called with a single event arg on every event that is neither parent 36 | nor exit, and its return value will become a node of the output tree. 37 | 38 | (seq-tree #(when (= %1 :<) (vector %2)) #{:>} str 39 | [1 2 :< 3 :< 4 :> :> 5 :> 6]) 40 | ;=> ((\"1\" \"2\" [(\"3\" [(\"4\")])] \"5\") 6)" 41 | [parent exit? node coll] 42 | (lazy-seq 43 | (when-let [[event] (seq coll)] 44 | (let [more (rest coll)] 45 | (if (exit? event) 46 | (cons nil more) 47 | (let [tree (seq-tree parent exit? node more)] 48 | (if-let [p (parent event (lazy-seq (first tree)))] 49 | (let [subtree (seq-tree parent exit? node (lazy-seq (rest tree)))] 50 | (cons (cons p (lazy-seq (first subtree))) 51 | (lazy-seq (rest subtree)))) 52 | (cons (cons (node event) (lazy-seq (first tree))) 53 | (lazy-seq (rest tree)))))))))) 54 | 55 | (defn fill-queue 56 | "filler-func will be called in another thread with a single arg 57 | 'fill'. filler-func may call fill repeatedly with one arg each 58 | time which will be pushed onto a queue, blocking if needed until 59 | this is possible. fill-queue will return a lazy seq of the values 60 | filler-func has pushed onto the queue, blocking if needed until each 61 | next element becomes available. filler-func's return value is ignored." 62 | ([filler-func & optseq] 63 | (let [opts (apply array-map optseq) 64 | apoll (:alive-poll opts 1) 65 | q (LinkedBlockingQueue. (:queue-size opts 1)) 66 | NIL (Object.) ;nil sentinel since LBQ doesn't support nils 67 | weak-target (Object.) 
68 | alive? (WeakReference. weak-target) 69 | fill (fn fill [x] 70 | (if (.get alive?) 71 | (if (.offer q (if (nil? x) NIL x) apoll TimeUnit/SECONDS) 72 | x 73 | (recur x)) 74 | (throw (Exception. "abandoned")))) 75 | f (future 76 | (try 77 | (filler-func fill) 78 | (finally 79 | (.put q q))) ;q itself is eos sentinel 80 | nil)] ; set future's value to nil 81 | ((fn drain [] 82 | weak-target ; force closing over this object 83 | (lazy-seq 84 | (let [x (.take q)] 85 | (if (identical? x q) 86 | @f ;will be nil, touch just to propagate errors 87 | (cons (if (identical? x NIL) nil x) 88 | (drain)))))))))) 89 | -------------------------------------------------------------------------------- /src/clj_lazy_json/core.clj: -------------------------------------------------------------------------------- 1 | (ns clj-lazy-json.core 2 | (:require (clojure.java [io :as io])) 3 | (:use clj-lazy-json.cdx) 4 | (:import (org.codehaus.jackson JsonFactory JsonParser JsonToken))) 5 | 6 | (def ^{:private true 7 | :tag JsonFactory} 8 | factory (JsonFactory.)) 9 | 10 | (defrecord Event [type contents]) 11 | 12 | (defn ^:private event [type & [contents]] 13 | (Event. type contents)) 14 | 15 | (defn ^:private fill-from-jackson 16 | "Filler function for use with fill-queue and Jackson." 17 | [^JsonParser parser fill] 18 | (letfn [(offer [type & [contents]] 19 | (fill (event type contents)))] 20 | (loop [token (.nextToken parser)] 21 | (when token 22 | (condp = token 23 | JsonToken/START_OBJECT (offer :start-object) 24 | JsonToken/END_OBJECT (offer :end-object) 25 | JsonToken/START_ARRAY (offer :start-array) 26 | JsonToken/END_ARRAY (offer :end-array) 27 | JsonToken/FIELD_NAME (offer :field-name (.getCurrentName parser)) 28 | JsonToken/NOT_AVAILABLE (offer :not-available) 29 | JsonToken/VALUE_EMBEDDED_OBJECT (offer :value-embedded-object) ; ? 
30 | JsonToken/VALUE_FALSE (offer :atom false (.getBooleanValue parser)) 31 | JsonToken/VALUE_TRUE (offer :atom true (.getBooleanValue parser)) 32 | JsonToken/VALUE_NULL (offer :atom nil) 33 | JsonToken/VALUE_NUMBER_FLOAT (offer :atom (.getNumberValue parser)) 34 | JsonToken/VALUE_NUMBER_INT (offer :atom (.getNumberValue parser)) 35 | JsonToken/VALUE_STRING (offer :atom (.getText parser)) 36 | (throw (RuntimeException. 37 | (str "Missed a token type in fill-from-jackson: " 38 | token)))) 39 | (recur (.nextToken parser)))))) 40 | 41 | (defn parse 42 | "Returns a seq of parse events for the given source." 43 | ([source] (parse source factory)) 44 | ([source factory] 45 | (let [parser (.createJsonParser ^JsonFactory factory (io/reader source)) 46 | 47 | token->event 48 | (fn token->event [token] 49 | (condp = token 50 | JsonToken/START_OBJECT (event :start-object) 51 | JsonToken/END_OBJECT (event :end-object) 52 | JsonToken/START_ARRAY (event :start-array) 53 | JsonToken/END_ARRAY (event :end-array) 54 | JsonToken/FIELD_NAME (event :field-name (.getCurrentName parser)) 55 | JsonToken/NOT_AVAILABLE (event :not-available) 56 | JsonToken/VALUE_EMBEDDED_OBJECT (event :value-embedded-object) ; ? 57 | JsonToken/VALUE_FALSE (event :atom false) 58 | JsonToken/VALUE_TRUE (event :atom true) 59 | JsonToken/VALUE_NULL (event :atom nil) 60 | JsonToken/VALUE_NUMBER_FLOAT (event :atom (.getNumberValue parser)) 61 | JsonToken/VALUE_NUMBER_INT (event :atom (.getNumberValue parser)) 62 | JsonToken/VALUE_STRING (event :atom (.getText parser)) 63 | (throw (RuntimeException. 64 | (str "Missed a token type in lazy-source-seq: " 65 | token))))) 66 | 67 | token-seq 68 | (fn token-seq [] 69 | (lazy-seq 70 | (when-let [token (.nextToken parser)] 71 | (cons (token->event token) 72 | (token-seq)))))] 73 | 74 | (token-seq)))) 75 | 76 | ;;; adapted from clojure.data.xml 77 | (defn ^:private event-tree 78 | "Returns a lazy tree of :object, :array and :atom nodes for the 79 | given seq of events." 
80 | [events] 81 | (ffirst 82 | (seq-tree 83 | (fn [^Event event contents] 84 | (condp = (:type event) 85 | :start-object {:type :object 86 | :entries (->> contents 87 | (partition 2) 88 | (map (fn [[k v]] 89 | (clojure.lang.MapEntry. k v))))} 90 | :start-array {:type :array :entries contents} 91 | nil)) 92 | (fn [^Event event] 93 | (or (= :end-object (:type event)) 94 | (= :end-array (:type event)))) 95 | (fn [^Event event] 96 | {:type (:type event) :contents (:contents event)}) 97 | events))) 98 | 99 | (defn ^:private skip-object [events lvl] 100 | (lazy-seq 101 | (if-not (zero? lvl) 102 | (when-first [e events] 103 | (case (:type e) 104 | :end-object (cons e (skip-object (next events) (dec lvl))) 105 | :start-object (cons e (skip-object (next events) (inc lvl))) 106 | (cons e (skip-object (next events) lvl))))))) 107 | 108 | (defn ^:private skip-array [events lvl] 109 | (lazy-seq 110 | (if-not (zero? lvl) 111 | (when-first [e events] 112 | (case (:type e) 113 | :end-array (cons e (skip-array (next events) (dec lvl))) 114 | :start-array (cons e (skip-array (next events) (inc lvl))) 115 | (cons e (skip-array (next events) lvl))))))) 116 | 117 | (defn ^:private to-tree [events] 118 | (when-first [e events] 119 | (case (:type e) 120 | :start-object (event-tree (cons e (skip-object (next events) 1))) 121 | :start-array (event-tree (cons e (skip-array (next events) 1))) 122 | e))) 123 | 124 | (defn parse-string 125 | "Parses the JSON document contained in the string s into a seq of parse events." 126 | [s] 127 | (-> s java.io.StringReader. parse)) 128 | 129 | (defn ^:private to-clj 130 | "Converts a lazy JSON tree to the natural Clojure representation." 
131 | [json] 132 | (case (:type json) 133 | (:atom :field-name) (:contents json) 134 | :array (vec (map to-clj (:entries json))) 135 | :object (into {} (map (fn [[k v]] 136 | [(to-clj k) (to-clj v)]) 137 | (:entries json))))) 138 | 139 | (defn build-automaton 140 | "Used internally by process-json and define-json-processor. 141 | See the docstring on the latter for a description of the supported 142 | options and the path language. See the docstring on consume-json for 143 | a description of the basic behaviour implemented." 144 | [opts paths-and-callbacks] 145 | (loop [a {} pcs paths-and-callbacks] 146 | (if-let [[path callback] (first pcs)] 147 | (recur (assoc-in a (conj path ::here) callback) 148 | (next pcs)) 149 | a))) 150 | 151 | (defn ^:private step-automaton [automaton path] 152 | (letfn [(merge-pcs [left right] 153 | (if (map? left) 154 | (merge-with merge-pcs left right) 155 | (fn [path json] (left path json) (right path json))))] 156 | (merge-with merge-pcs 157 | (get automaton (peek path)) 158 | (get automaton :*) 159 | (when-let [starstar (get automaton :**)] 160 | (merge-with merge-pcs 161 | starstar 162 | {:** starstar}))))) 163 | 164 | (defn ^:private call-callbacks [automaton path events] 165 | (when-let [callback (get automaton ::here)] 166 | (let [datum (to-clj (to-tree events))] 167 | (callback path datum)) 168 | true)) 169 | 170 | (defn consume-json 171 | "Used internally by process-json and define-json-processor." 172 | [automaton events path] 173 | (letfn [(go [as path events] 174 | (when-first [e events] 175 | (let [path (if (number? 
(peek path)) 176 | (conj (pop path) (inc (peek path))) 177 | path)] 178 | (case (:type e) 179 | (:end-array :end-object) 180 | (recur (pop as) (pop path) (next events)) 181 | 182 | :start-array 183 | (do (call-callbacks (peek as) path events) 184 | (let [new-path (conj path -1) 185 | new-a (step-automaton (peek as) new-path)] 186 | (recur (conj as new-a) new-path (next events)))) 187 | 188 | :start-object 189 | (do (call-callbacks (peek as) path events) 190 | (recur (conj as nil) (conj path nil) (next events))) 191 | 192 | :field-name 193 | (let [new-path (conj (pop path) (:contents e)) 194 | new-a (step-automaton (peek (pop as)) new-path)] 195 | (recur (conj (pop as) new-a) new-path (next events))) 196 | 197 | :atom 198 | (do (call-callbacks (peek as) path events) 199 | (recur as path (next events)))))))] 200 | (go [(step-automaton automaton path)] path events))) 201 | 202 | (defn process-json 203 | "Constructs a one-off JSON processor and uses it to process parsed-json. 204 | See docstring on define-json-processor for processor definition 205 | syntax and supported options." 206 | [parsed-json opts & paths-and-callbacks] 207 | (consume-json (build-automaton opts (map vec (partition 2 paths-and-callbacks))) 208 | parsed-json 209 | [:$])) 210 | 211 | (defmacro define-json-processor 212 | "Defines a function of the given name and, optionally, with the 213 | given docstring, which takes a single argument, a seq of parse 214 | events describing a JSON datum (as output by the parse and 215 | parse-string functions), and processes it lazily in accordance with 216 | the given specification. 217 | 218 | Options are currently ignored. 219 | 220 | Paths are specified using the following language: 221 | :$ matches the root datum only; 222 | :* matches any datum in the current position in the path; 223 | :** matches any subpath; 224 | a literal string matches an object entry at that key; 225 | a literal number matches an array entry at that index. 

  Callbacks receive two arguments: the complete path to the current
  node (starting at :$) and the clojurized representation of the
  node (as would be returned by clj-json or clojure.data.json).

  Example:

  (define-json-processor example-processor
    \"A simple JSON processor.\"
    [:$ \"foo\" 0] #(do (apply prn \"This is particularly interesting:\" %&))
    [:**] prn)

  (example-processor (-> \"{\\\"foo\\\": [1], \\\"bar\\\": [2]}\"
                         parse-string))
  ;; returns nil; printed output follows:
  [:$] {\"foo\" [1], \"bar\" [2]}
  [:$ \"foo\"] [1]
  \"This is particularly interesting:\" [:$ \"foo\" 0] 1
  [:$ \"foo\" 0] 1
  [:$ \"bar\"] [2]
  [:$ \"bar\" 0] 2"
  [name docstring? opts? & paths-and-callbacks]
  (let [docstring (if (string? docstring?) docstring?)
        opts (if docstring
               (if (map? opts?) opts?)
               (if (map? docstring?) docstring?))
        paths-and-callbacks (cond
                              (and docstring opts)
                              paths-and-callbacks
                              (or docstring opts)
                              (cons opts? paths-and-callbacks)
                              :else
                              (concat [docstring? opts?]
                                      paths-and-callbacks))
        paths-and-callbacks (vec (map vec (partition 2 paths-and-callbacks)))]
    `(let [automaton# (build-automaton ~opts ~paths-and-callbacks)]
       (defn ~name ~@(when docstring [docstring]) [~'parsed-json]
         (consume-json automaton# ~'parsed-json [:$])))))

--------------------------------------------------------------------------------
/test/clj_lazy_json/test/core.clj:
--------------------------------------------------------------------------------
(ns clj-lazy-json.test.core
  (:require [clj-lazy-json.core :as core])
  (:use [clojure.test]))

(deftest basic-parse-test
  (let [json-str "{\"foo\": [1, 2], \"bar\": [null, true, false]}"
        m {"foo" [1 2]
           "bar" [nil true false]}
        clj (atom nil)]
    (core/process-json (core/parse-string json-str)
                       {}
                       [:$] (fn [_ datum] (reset! clj datum)))
    (is (= m @clj))))

(defn counting-string-reader
  "Returns a StringReader on s augmented with a counter of characters
  read accessible via deref."
  [s]
  (let [counter (atom 0)]
    (proxy [java.io.StringReader clojure.lang.IDeref] [s]
      (close [] (proxy-super close))
      (mark [n] (proxy-super mark n))
      ;; proxy binds `this` implicitly, so a no-arg method takes an empty
      ;; parameter vector
      (markSupported [] (proxy-super markSupported))
      (read
        ([]
           (swap! counter inc)
           (proxy-super read))
        ([cbuf]
           (let [n (proxy-super read cbuf)]
             (when-not (== n -1)
               (swap! counter + n))
             n))
        ([cbuf off len]
           (let [n (proxy-super read cbuf off len)]
             (when-not (== n -1)
               (swap! counter + n))
             n)))
      (ready [] (proxy-super ready))
      (reset [] (proxy-super reset))
      (skip [n]
        (swap!
               counter + n)
        (proxy-super skip n))
      (deref [] @counter))))

(deftest test-laziness
  (let [test-json-string-1 (apply str "{" (concat (for [i (range 100000)]
                                                    (str "\"foo" i "\": " i ", "))
                                                  ["\"bar\": null}"]))
        test-json-string (apply str "{" (concat (for [i (range 100)]
                                                  (str "\"quux" i "\": "
                                                       test-json-string-1 ", "))
                                                ["\"baz\": null}"]))
        test-json-reader (counting-string-reader test-json-string)]
    (are [p] (thrown-with-msg? RuntimeException #"^stop here$"
               (core/process-json
                p
                {}
                [:$ "quux2" "foo3"]
                (fn [_ _] (throw (RuntimeException. "stop here")))))
         (core/parse test-json-reader))
    ;; sanity check / silly typo avoidance:
    (is (< (* 100 (count test-json-string-1)) (count test-json-string)))
    (are [r] (<= @r (* 4 (count test-json-string-1)))
         test-json-reader)))

(def a (atom 0))

(core/define-json-processor accumulating-processor
  [:$ :*] (fn [_ n] (swap! a + n)))

(deftest test-accumulating-processor
  (accumulating-processor (core/parse-string "{\"foo\": 1, \"bar\": 2}"))
  (is (== @a 3)))

(def b (atom 0))

(core/define-json-processor all-matching-accumulating-processor
  [:$ :*] (fn [_ n] (swap! b + n))
  [:$ "foo"] (fn [_ _] (swap! b inc)))

(deftest test-all-matching-accumulating-processor
  (all-matching-accumulating-processor
   (core/parse-string "{\"foo\": 1, \"bar\": 2}"))
  (is (== @b 4)))

--------------------------------------------------------------------------------