├── LICENSE ├── deps.edn ├── readme.org ├── shadow-cljs.edn ├── src └── ribelo │ ├── danzig.cljc │ └── danzig │ ├── aggregate.clj │ └── io.cljc └── test └── ribelo └── danzig_test.cljc /LICENSE: -------------------------------------------------------------------------------- 1 | THE ACCOMPANYING PROGRAM IS PROVIDED UNDER THE TERMS OF THIS ECLIPSE PUBLIC 2 | LICENSE ("AGREEMENT"). ANY USE, REPRODUCTION OR DISTRIBUTION OF THE PROGRAM 3 | CONSTITUTES RECIPIENT'S ACCEPTANCE OF THIS AGREEMENT. 4 | 5 | 1. DEFINITIONS 6 | 7 | "Contribution" means: 8 | 9 | a) in the case of the initial Contributor, the initial code and 10 | documentation distributed under this Agreement, and 11 | 12 | b) in the case of each subsequent Contributor: 13 | 14 | i) changes to the Program, and 15 | 16 | ii) additions to the Program; 17 | 18 | where such changes and/or additions to the Program originate from and are 19 | distributed by that particular Contributor. A Contribution 'originates' from 20 | a Contributor if it was added to the Program by such Contributor itself or 21 | anyone acting on such Contributor's behalf. Contributions do not include 22 | additions to the Program which: (i) are separate modules of software 23 | distributed in conjunction with the Program under their own license 24 | agreement, and (ii) are not derivative works of the Program. 25 | 26 | "Contributor" means any person or entity that distributes the Program. 27 | 28 | "Licensed Patents" mean patent claims licensable by a Contributor which are 29 | necessarily infringed by the use or sale of its Contribution alone or when 30 | combined with the Program. 31 | 32 | "Program" means the Contributions distributed in accordance with this 33 | Agreement. 34 | 35 | "Recipient" means anyone who receives the Program under this Agreement, 36 | including all Contributors. 37 | 38 | 2. 
GRANT OF RIGHTS 39 | 40 | a) Subject to the terms of this Agreement, each Contributor hereby grants 41 | Recipient a non-exclusive, worldwide, royalty-free copyright license to 42 | reproduce, prepare derivative works of, publicly display, publicly perform, 43 | distribute and sublicense the Contribution of such Contributor, if any, and 44 | such derivative works, in source code and object code form. 45 | 46 | b) Subject to the terms of this Agreement, each Contributor hereby grants 47 | Recipient a non-exclusive, worldwide, royalty-free patent license under 48 | Licensed Patents to make, use, sell, offer to sell, import and otherwise 49 | transfer the Contribution of such Contributor, if any, in source code and 50 | object code form. This patent license shall apply to the combination of the 51 | Contribution and the Program if, at the time the Contribution is added by the 52 | Contributor, such addition of the Contribution causes such combination to be 53 | covered by the Licensed Patents. The patent license shall not apply to any 54 | other combinations which include the Contribution. No hardware per se is 55 | licensed hereunder. 56 | 57 | c) Recipient understands that although each Contributor grants the licenses 58 | to its Contributions set forth herein, no assurances are provided by any 59 | Contributor that the Program does not infringe the patent or other 60 | intellectual property rights of any other entity. Each Contributor disclaims 61 | any liability to Recipient for claims brought by any other entity based on 62 | infringement of intellectual property rights or otherwise. As a condition to 63 | exercising the rights and licenses granted hereunder, each Recipient hereby 64 | assumes sole responsibility to secure any other intellectual property rights 65 | needed, if any. 
For example, if a third party patent license is required to 66 | allow Recipient to distribute the Program, it is Recipient's responsibility 67 | to acquire that license before distributing the Program. 68 | 69 | d) Each Contributor represents that to its knowledge it has sufficient 70 | copyright rights in its Contribution, if any, to grant the copyright license 71 | set forth in this Agreement. 72 | 73 | 3. REQUIREMENTS 74 | 75 | A Contributor may choose to distribute the Program in object code form under 76 | its own license agreement, provided that: 77 | 78 | a) it complies with the terms and conditions of this Agreement; and 79 | 80 | b) its license agreement: 81 | 82 | i) effectively disclaims on behalf of all Contributors all warranties and 83 | conditions, express and implied, including warranties or conditions of title 84 | and non-infringement, and implied warranties or conditions of merchantability 85 | and fitness for a particular purpose; 86 | 87 | ii) effectively excludes on behalf of all Contributors all liability for 88 | damages, including direct, indirect, special, incidental and consequential 89 | damages, such as lost profits; 90 | 91 | iii) states that any provisions which differ from this Agreement are offered 92 | by that Contributor alone and not by any other party; and 93 | 94 | iv) states that source code for the Program is available from such 95 | Contributor, and informs licensees how to obtain it in a reasonable manner on 96 | or through a medium customarily used for software exchange. 97 | 98 | When the Program is made available in source code form: 99 | 100 | a) it must be made available under this Agreement; and 101 | 102 | b) a copy of this Agreement must be included with each copy of the Program. 103 | 104 | Contributors may not remove or alter any copyright notices contained within 105 | the Program. 
106 | 107 | Each Contributor must identify itself as the originator of its Contribution, 108 | if any, in a manner that reasonably allows subsequent Recipients to identify 109 | the originator of the Contribution. 110 | 111 | 4. COMMERCIAL DISTRIBUTION 112 | 113 | Commercial distributors of software may accept certain responsibilities with 114 | respect to end users, business partners and the like. While this license is 115 | intended to facilitate the commercial use of the Program, the Contributor who 116 | includes the Program in a commercial product offering should do so in a 117 | manner which does not create potential liability for other Contributors. 118 | Therefore, if a Contributor includes the Program in a commercial product 119 | offering, such Contributor ("Commercial Contributor") hereby agrees to defend 120 | and indemnify every other Contributor ("Indemnified Contributor") against any 121 | losses, damages and costs (collectively "Losses") arising from claims, 122 | lawsuits and other legal actions brought by a third party against the 123 | Indemnified Contributor to the extent caused by the acts or omissions of such 124 | Commercial Contributor in connection with its distribution of the Program in 125 | a commercial product offering. The obligations in this section do not apply 126 | to any claims or Losses relating to any actual or alleged intellectual 127 | property infringement. In order to qualify, an Indemnified Contributor must: 128 | a) promptly notify the Commercial Contributor in writing of such claim, and 129 | b) allow the Commercial Contributor to control, and cooperate with the 130 | Commercial Contributor in, the defense and any related settlement 131 | negotiations. The Indemnified Contributor may participate in any such claim 132 | at its own expense. 133 | 134 | For example, a Contributor might include the Program in a commercial product 135 | offering, Product X. That Contributor is then a Commercial Contributor. 
If 136 | that Commercial Contributor then makes performance claims, or offers 137 | warranties related to Product X, those performance claims and warranties are 138 | such Commercial Contributor's responsibility alone. Under this section, the 139 | Commercial Contributor would have to defend claims against the other 140 | Contributors related to those performance claims and warranties, and if a 141 | court requires any other Contributor to pay any damages as a result, the 142 | Commercial Contributor must pay those damages. 143 | 144 | 5. NO WARRANTY 145 | 146 | EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, THE PROGRAM IS PROVIDED ON 147 | AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER 148 | EXPRESS OR IMPLIED INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OR 149 | CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A 150 | PARTICULAR PURPOSE. Each Recipient is solely responsible for determining the 151 | appropriateness of using and distributing the Program and assumes all risks 152 | associated with its exercise of rights under this Agreement , including but 153 | not limited to the risks and costs of program errors, compliance with 154 | applicable laws, damage to or loss of data, programs or equipment, and 155 | unavailability or interruption of operations. 156 | 157 | 6. DISCLAIMER OF LIABILITY 158 | 159 | EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, NEITHER RECIPIENT NOR ANY 160 | CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY DIRECT, INDIRECT, INCIDENTAL, 161 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING WITHOUT LIMITATION 162 | LOST PROFITS), HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 163 | CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 164 | ARISING IN ANY WAY OUT OF THE USE OR DISTRIBUTION OF THE PROGRAM OR THE 165 | EXERCISE OF ANY RIGHTS GRANTED HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY 166 | OF SUCH DAMAGES. 167 | 168 | 7. 
GENERAL 169 | 170 | If any provision of this Agreement is invalid or unenforceable under 171 | applicable law, it shall not affect the validity or enforceability of the 172 | remainder of the terms of this Agreement, and without further action by the 173 | parties hereto, such provision shall be reformed to the minimum extent 174 | necessary to make such provision valid and enforceable. 175 | 176 | If Recipient institutes patent litigation against any entity (including a 177 | cross-claim or counterclaim in a lawsuit) alleging that the Program itself 178 | (excluding combinations of the Program with other software or hardware) 179 | infringes such Recipient's patent(s), then such Recipient's rights granted 180 | under Section 2(b) shall terminate as of the date such litigation is filed. 181 | 182 | All Recipient's rights under this Agreement shall terminate if it fails to 183 | comply with any of the material terms or conditions of this Agreement and 184 | does not cure such failure in a reasonable period of time after becoming 185 | aware of such noncompliance. If all Recipient's rights under this Agreement 186 | terminate, Recipient agrees to cease use and distribution of the Program as 187 | soon as reasonably practicable. However, Recipient's obligations under this 188 | Agreement and any licenses granted by Recipient relating to the Program shall 189 | continue and survive. 190 | 191 | Everyone is permitted to copy and distribute copies of this Agreement, but in 192 | order to avoid inconsistency the Agreement is copyrighted and may only be 193 | modified in the following manner. The Agreement Steward reserves the right to 194 | publish new versions (including revisions) of this Agreement from time to 195 | time. No one other than the Agreement Steward has the right to modify this 196 | Agreement. The Eclipse Foundation is the initial Agreement Steward. 
The 197 | Eclipse Foundation may assign the responsibility to serve as the Agreement 198 | Steward to a suitable separate entity. Each new version of the Agreement will 199 | be given a distinguishing version number. The Program (including 200 | Contributions) may always be distributed subject to the version of the 201 | Agreement under which it was received. In addition, after a new version of 202 | the Agreement is published, Contributor may elect to distribute the Program 203 | (including its Contributions) under the new version. Except as expressly 204 | stated in Sections 2(a) and 2(b) above, Recipient receives no rights or 205 | licenses to the intellectual property of any Contributor under this 206 | Agreement, whether expressly, by implication, estoppel or otherwise. All 207 | rights in the Program not expressly granted under this Agreement are 208 | reserved. 209 | 210 | This Agreement is governed by the laws of the State of New York and the 211 | intellectual property laws of the United States of America. No party to this 212 | Agreement will bring a legal action under this Agreement more than one year 213 | after the cause of action arose. Each party waives its rights to a jury trial 214 | in any resulting litigation. 
215 | -------------------------------------------------------------------------------- /deps.edn: -------------------------------------------------------------------------------- 1 | {:paths ["src"] 2 | :deps {net.cgrand/xforms {:mvn/version "0.19.2"} 3 | clojure.java-time/clojure.java-time {:mvn/version "0.3.2"} 4 | meander/epsilon {:mvn/version "0.0.602"} 5 | ribelo/kemnath {:git/url "https://github.com/ribelo/kemnath" 6 | :sha "7fae2514a43c3de07a5a173319a9ee7065989dcf"} 7 | ribelo/stade {:git/url "https://github.com/ribelo/stade" 8 | :sha "5b01e968203d6d12e710ed5aef266f36e0588089"} 9 | ;; ribelo/kemnath {:local/root "../kemnath"} 10 | ;; ribelo/stade {:local/root "../stade"} 11 | }} 12 | -------------------------------------------------------------------------------- /readme.org: -------------------------------------------------------------------------------- 1 | #+TITLE: danzig 2 | 3 | an easy-to-use, transducer-based data analysis tool for the clojure programming 4 | language. 5 | 6 | * rationale 7 | 8 | any finitely complicated problem can be made infinitely complicated by a finite 9 | number of macros, so why not write macros that write (macro-based) meander code that generates transducer functions? 10 | 11 | ...wait, but why not just use meander? 12 | 13 | because =meander.epsilon/scan= is slow, and because transducers are super 14 | composable and can be chained into arbitrarily long pipelines 15 | 16 | * usage examples 17 | 18 | #+begin_src clojure :results silent :exports code 19 | (require '[taoensso.encore :as enc]) 20 | (require '[ribelo.danzig :as dz :refer [=>>]]) 21 | 22 | (def data (vec (repeatedly 1000000 (fn [] {:a (* (rand-int 100) (if (enc/chance 0.5) 1 -1)) 23 | :b (* (rand-int 100) (if (enc/chance 0.5) 1 -1)) 24 | :c (* (rand-int 100) (if (enc/chance 0.5) 1 -1))})))) 25 | #+end_src 26 | 27 | 28 | ** why should you care? 
29 | 30 | because it is super concise and pleasing to the eye 31 | 32 | #+begin_src clojure :results silent :exports code 33 | 34 | (defn q1 [] 35 | (into [] 36 | (comp 37 | (map (fn [{:keys [a b] :as m}] (assoc m :d (+ a b)))) 38 | (filter (fn [{:keys [c]}] (pos? c))) 39 | (map (fn [m] (update m :a inc))) 40 | (filter (fn [{:keys [a b c]}] (= a b c)))) 41 | data)) 42 | 43 | (defn q2 [] 44 | (=>> data 45 | (dz/with :d [+ :a :b]) 46 | (dz/where :c pos?) 47 | (dz/with :a [+ :a 1]) 48 | (dz/where [= :a :b :c]))) 49 | 50 | (= (q1) (q2)) 51 | ;; => true 52 | #+end_src 53 | 54 | because it can be faster than the equivalent handwritten transducer code (roughly 2x in the benchmark below) 55 | 56 | #+begin_src clojure :results silent :exports code 57 | (enc/qb 1 (q1) (q2)) 58 | ;; => [309.46 145.71] - in ms 59 | #+end_src 60 | 61 | 62 | ** fat arrow 63 | 64 | the most basic building block is the fat arrow macro, which replaces thread-last; forms that evaluate to nil are skipped 65 | #+begin_src clojure :results silent :exports code 66 | 67 | (=>> [1 2 3 4 5] 68 | (map inc) 69 | (filter even?)) 70 | ;; => [2 4 6] 71 | 72 | (macroexpand '(=>> [1 2 3 4 5] (map inc) (filter even?))) 73 | ;; => (clojure.core/into [] (ribelo.danzig/comp-some (map inc) (filter even?)) [1 2 3 4 5]) 74 | 75 | (=>> [1 2 3 4 5] 76 | (map inc) 77 | (when false 78 | (filter even?))) 79 | ;; => [2 3 4 5 6] 80 | 81 | #+end_src 82 | 83 | fat arrows can be mixed with other arrows 84 | #+begin_src clojure :results silent :exports code 85 | 86 | (=>> [1 2 3 4 5] 87 | (map inc) 88 | (->> (mapv inc))) 89 | ;; => [3 4 5 6 7] 90 | 91 | #+end_src 92 | 93 | you can also use first and last 94 | #+begin_src clojure :results silent :exports code 95 | 96 | (=>> [1 2 3 4 5] 97 | (map inc) 98 | first) 99 | ;; => 2 100 | 101 | (=>> [1 2 3 4 5] 102 | (map inc) 103 | (last)) 104 | ;; => 6 105 | #+end_src 106 | 107 | ** where 108 | 109 | where can take a function 110 | #+begin_src clojure :results silent :exports code 111 | 112 | (=>> data 113 | (dz/where (fn [{:keys [a]}] (= a 1))) 114 | (take 1)) 115 | ;; => [{:a 1, :b 87, :c 
-27}] 116 | 117 | #+end_src 118 | 119 | the assumption is that we have a collection of maps, so we can query a key's value directly 120 | #+begin_src clojure :results silent :exports code 121 | 122 | (=>> data (dz/where :a 1) (take 1)) 123 | ;; => [{:a 1, :b 87, :c -27}] 124 | 125 | #+end_src 126 | 127 | if the value we compare against is itself a keyword, we must quote it with ='=, otherwise it is treated as a key lookup 128 | #+begin_src clojure :results silent :exports code 129 | 130 | (=>> [{:a :some/key} {:a :other/key}] (dz/where [= :a ':other/key])) 131 | ;; => [{:a :other/key}] 132 | 133 | #+end_src 134 | 135 | or several keys at once 136 | #+begin_src clojure :results silent :exports code 137 | 138 | (=>> data (dz/where {:a 1 :b 1}) (take 1)) 139 | ;; => [{:a 1, :b 1, :c 74}] 140 | 141 | #+end_src 142 | 143 | or keys and predicate functions 144 | #+begin_src clojure :results silent :exports code 145 | 146 | (=>> data (dz/where {:a even? :b odd?}) (take 1)) 147 | ;; => [{:a 40, :b 39, :c -76}] 148 | 149 | #+end_src 150 | 151 | we can use a vector, where the first element is the function 152 | #+begin_src clojure :results silent :exports code 153 | 154 | (=>> data (dz/where [= :a :b :c]) (take 1)) 155 | ;; => [{:a 27, :b 27, :c 27}] 156 | (=>> data (dz/where [= :a 1]) (take 1)) 157 | ;; => [{:a 1, :b 87, :c -27}] 158 | 159 | #+end_src 160 | 161 | ask for the rows whose key meets a condition 162 | #+begin_src clojure :results silent :exports code 163 | 164 | (=>> data (dz/where even? :a) (take 1)) 165 | ;; => [{:a -96, :b -84, :c -76}] 166 | 167 | (=>> data (dz/where :a even?) 
(take 1)) 168 | ;; => [{:a -96, :b -84, :c -76}] 169 | 170 | #+end_src 171 | 172 | square-bracket clojure is still clojure, so vector expressions can be nested 173 | #+begin_src clojure :results silent :exports code 174 | 175 | (=>> data (dz/where [= [+ :a :b] :c]) (take 1)) 176 | ;; => [{:a 0, :b 2, :c 2}] 177 | (=>> data (dz/where [= [+ :a :b] [+ :c :a]]) (take 1)) 178 | ;; => [{:a 75, :b -43, :c -43}] 179 | 180 | #+end_src 181 | 182 | meander just works 183 | #+begin_src clojure :results silent :exports code :ns ribelo.danzig 184 | 185 | (=>> data (dz/where {:a ?x :b ?x :c ?x}) (take 1)) 186 | ;; => [{:a -32, :b -32, :c -32}] 187 | 188 | (require '[meander.epsilon :as m]) 189 | (=>> data (dz/where {:a (m/pred pos?)}) (take 1)) 190 | ;; => [{:a 92, :b -64, :c -96}] 191 | 192 | #+end_src 193 | 194 | it is about as fast as fine-tuned hand-written code 195 | #+begin_src clojure :results silent :exports code 196 | 197 | (enc/qb 1 198 | (=>> data (filter (fn [{:keys [a]}] (= a 1)))) 199 | (=>> data (filter (fn [m] (= 1 (:a m))))) 200 | (=>> data (dz/where :a 1)) 201 | (=>> data (dz/where {:a 1}))) 202 | ;; => [81.88 54.14 48.77 52.16] 203 | 204 | #+end_src 205 | 206 | ** with 207 | 208 | you can change an individual value in the element at index =i= 209 | #+begin_src clojure :results silent :exports code 210 | 211 | (=>> data (dz/with 0 :a 999) (take 1)) 212 | ;; => [{:a 999, :b 23, :c 32}] 213 | 214 | #+end_src 215 | 216 | a map can be used to set several keys 217 | #+begin_src clojure :results silent :exports code 218 | 219 | (=>> data (dz/with 0 {:a 999 :b -999}) (take 1)) 220 | ;; => [{:a 999, :b -999, :c 32}] 221 | 222 | #+end_src 223 | 224 | or a function of the row 225 | #+begin_src clojure :results silent :exports code 226 | 227 | (=>> data (dz/with :d (fn [{:keys [a b]}] (+ a b 10))) (take 1)) 228 | ;; => [{:a 24, :b 23, :c 32, :d 57}] 229 | 230 | #+end_src 231 | 232 | square-bracket clojure still behaves like clojure 233 | #+begin_src clojure :results silent :exports code 234 | 235 | (=>> data (dz/with :d [+ :a :b [- :c 10]]) (take 1)) 236 | ;; => [{:a 92, :b 
-64, :c -96, :d -78}] 237 | 238 | #+end_src 239 | 240 | a whole column can be added 241 | #+begin_src clojure :results silent :exports code 242 | 243 | (=>> data (dz/with :d 5) (take 3)) 244 | ;; => [{:a 24, :b 23, :c 32, :d 5} 245 | ;; {:a 53, :b 69, :c -99, :d 5} 246 | ;; {:a -4, :b 80, :c -16, :d 5}] 247 | 248 | #+end_src 249 | 250 | several columns in one go 251 | #+begin_src clojure :results silent :exports code 252 | 253 | (=>> data (dz/with {:a 5 :b 10}) (take 1)) 254 | ;; => [{:a 5, :b 10, :c -69}] 255 | 256 | (=>> data (dz/with {:a [+ :a 1000] :b [+ :b 1000]}) (take 1)) 257 | ;; => [{:a 927, :b 905, :c -69}] 258 | #+end_src 259 | 260 | conditional with 261 | #+begin_src clojure :results silent :exports code 262 | 263 | (=>> data 264 | (dz/with 0 :a -999) 265 | (dz/with :when [= :a -999] {:a 999 :b 999 :c 999}) 266 | (dz/where :a 999) 267 | (dz/row-count)) 268 | ;; => [1] 269 | 270 | #+end_src 271 | 272 | ** aggregate 273 | wip 274 | ** group-by 275 | wip 276 | ** io 277 | wip 278 | -------------------------------------------------------------------------------- /shadow-cljs.edn: -------------------------------------------------------------------------------- 1 | ;; shadow-cljs configuration 2 | {:deps {:aliases [:shadow-cljs]} 3 | :nrepl {:port 7002} 4 | :builds {:repl {:target :node-script 5 | :main hanse.danzig.repl/main 6 | :output-to "dist/danzig.js"}}} 7 | -------------------------------------------------------------------------------- /src/ribelo/danzig.cljc: -------------------------------------------------------------------------------- 1 | (ns ribelo.danzig 2 | (:refer-clojure :exclude [set replace sort-by drop fill group-by merge update]) 3 | #?(:cljs (:require-macros [ribelo.danzig :refer [=>> +>> and-macro or-macro where]])) 4 | (:require [clojure.set] ; rename-columns calls clojure.set/rename-keys 5 | [net.cgrand.xforms :as x] 6 | #?(:clj [java-time :as jt]) 7 | [meander.epsilon :as m] 8 | [ribelo.kemnath :as math] 9 | #?(:clj [ribelo.stade :as stats]) 10 | #?(:clj [ribelo.danzig.aggregate :as agg]))) 11 | 
12 | (comment 13 | (do 14 | (require '[taoensso.encore :as enc]) 15 | (def data (vec (repeatedly 1000000 (fn [] {:a (* (rand-int 100) (if (enc/chance 0.5) 1 -1)) 16 | :b (* (rand-int 100) (if (enc/chance 0.5) 1 -1)) 17 | :c (* (rand-int 100) (if (enc/chance 0.5) 1 -1))})))))) 18 | 19 | (defn comp-some [& fns] 20 | (apply comp (filter identity fns))) 21 | 22 | (defmacro =>> [coll & forms] 23 | (loop [[form & forms] forms xfs [] threads [] output []] 24 | (if form 25 | (cond 26 | (= '[] form) 27 | (recur forms xfs threads output) 28 | ;; 29 | (= '{} form) 30 | (recur forms xfs threads {}) 31 | ;; 32 | (= '#{} form) 33 | (recur forms xfs threads #{}) 34 | ;; 35 | (and (list? form) (= '-> (first form))) 36 | (recur forms xfs (conj threads form) output) 37 | ;; 38 | (and (list? form) (= '->> (first form))) 39 | (recur forms xfs (conj threads form) output) 40 | ;; 41 | (or (= 'first form) (and (list? form) (= 'first (first form)))) 42 | (recur forms (conj xfs `(take 1)) (conj threads (-> form)) output) 43 | ;; the list form must be compared against 'last, as in the 'first branch above 44 | (or (= 'last form) (and (list? form) (= 'last (first form)))) 45 | (recur forms (conj xfs `(x/take-last 1)) (conj threads (-> form)) output) 46 | ;; guard with seq? so a bare symbol form does not reach (first form) 47 | (and (seq? form) (= 'into (first form))) 48 | (recur forms xfs threads (second form)) 49 | ;; 50 | :else 51 | (recur forms (conj xfs form) threads output)) 52 | (let [xfs (->> xfs (remove nil?)) 53 | main `(into ~output ~@(when (not-empty xfs) `((comp-some ~@xfs))) ~coll)] 54 | (if (seq threads) 55 | (loop [[form & forms] threads r ['-> main]] 56 | (if form 57 | (recur forms (conj r form)) 58 | (reverse (into (list) r)))) 59 | main))))) 60 | 61 | (defmacro +>> [coll & body] `(=>> ~coll ~@body ~'(into {}))) 62 | 63 | #?(:clj 64 | (defn ^:private args->fn-body [args] 65 | (m/match args 66 | [?f (m/pred (some-fn keyword? string?) ?x) (m/pred (some-fn keyword? string?) ?y)] 67 | `(~?f (get ~'m ~?x) (get ~'m ~?y)) 68 | [?f (m/pred #(instance? java.util.regex.Pattern %) ?x) (m/pred (some-fn keyword? string?) 
?y)] 69 | `(~?f ~?x (get ~'m ~?y)) 70 | [?f ?x ?y] 71 | `(~?f (get ~'m ~?x) ~?y)))) 72 | 73 | #?(:clj 74 | (defmacro and-macro [coll] 75 | (loop [[args & coll] coll 76 | r '()] 77 | (if args 78 | (let [body (args->fn-body args)] 79 | (recur coll (conj r body))) 80 | `(clojure.core/and ~@r))))) 81 | 82 | (comment 83 | (and-macro [:and [:a 1] [:b 2]])) 84 | 85 | #?(:clj 86 | (defmacro or-macro [coll] 87 | (loop [[args & coll] coll 88 | r '()] 89 | (if args 90 | (let [body (args->fn-body args)] 91 | (recur coll (conj r body))) 92 | `(clojure.core/or ~@r))))) 93 | 94 | (defn vecs->maps [ks] 95 | (m/match ks 96 | (m/pred map?) 97 | (map #(persistent! (reduce-kv (fn [acc i k] (assoc! acc k (nth % i ""))) (transient {}) ks))) 98 | 99 | (m/pred vector? ?x) 100 | (map #(zipmap ?x %)))) 101 | 102 | (comment 103 | (=>> [[0 0] [1 2] [3 4]] (vecs->maps {0 :a 1 :b})) 104 | ;; => [{:a 0, :b 0} {:a 1, :b 1} {:a 2, :b 2}] 105 | (=>> [[0 0] [1 1] [2 2]] (vecs->maps [:a :b]))) 106 | 107 | (defn row [ks vs] 108 | (m/match [ks vs] 109 | [(m/pred coll?) (m/pred coll?)] 110 | (zipmap ks vs))) 111 | 112 | (comment 113 | (row [:a :b :c] [1 2 3]) 114 | ;; => {:a 1, :b 2, :c 3} 115 | ) 116 | 117 | (defn- k->fn [k] 118 | (m/match k 119 | :sum (fn [& args] (reduce + args)) 120 | #?@(:clj [:mean stats/mean 121 | :max stats/max 122 | :min stats/min]) 123 | :abs math/abs 124 | :sq math/sq 125 | :sqrt math/sqrt 126 | :pow math/pow 127 | :root math/root 128 | (m/pred fn? ?fn) ?fn)) 129 | 130 | (defmacro where* [& args] 131 | (m/rewrite args 132 | ;; f 133 | ((m/pred list? ?f)) 134 | ?f 135 | ;; k v 136 | ((m/pred keyword? ?k) (m/and (m/not (m/pred (some-fn symbol? list?) ?v)) (m/some ?v))) 137 | (fn [m] (= ?v (m ?k))) 138 | ;; {?k ?v} 139 | ({:as ?m}) 140 | ~`(fn [m#] (m/find m# ~?m m#)) 141 | ;; ?k ?f 142 | ((m/pred keyword? ?k) 143 | (m/pred symbol? ?f)) 144 | (fn [m] (?f (m ?k))) 145 | ;; [?f1 [[?f2 & ?xs] | ?k/?v] ...] 146 | ([(m/pred (some-fn symbol? list?) ?f) . 
147 | (m/and !args (m/or (m/pred vector?) (m/some))) ...]) 148 | ~(let [m (gensym 'map)] 149 | `(fn [~m] (~?f ~@(map (fn [x] 150 | (m/rewrite x 151 | [(m/pred (some-fn symbol? list?) ?f) . (m/cata !xs) ...] 152 | (?f & ~!xs) 153 | ;; 154 | (m/pred keyword? ?x) 155 | (~m ?x) 156 | ?x ?x)) !args)))) 157 | _ (throw (ex-info "non exhaustive pattern match" {})))) 158 | 159 | (defmacro where [& args] 160 | `(filter (where* ~@args))) 161 | 162 | (defmacro where-not [& args] 163 | `(remove (where* ~@args))) 164 | 165 | (defn row-count [] 166 | x/count) 167 | 168 | (defn column-count [] 169 | (comp (take 1) (mapcat keys) x/count)) 170 | 171 | (defn shape [] 172 | (comp (x/transjuxt [(row-count) (column-count)]) (mapcat identity))) 173 | 174 | (defn column-names [& args] 175 | (m/match args 176 | ;; all 177 | (m/or (::all) nil (m/pred empty? args)) 178 | (comp (take 1) (mapcat keys)) 179 | ;; regex 180 | ((m/pred #(instance? java.util.regex.Pattern %) ?x) & _) 181 | (comp (take 1) (mapcat keys) (filter #(re-find ?x (str %)))) 182 | ;; keys 183 | (!ks ...) 184 | (comp (take 1) (mapcat keys) (filter (into #{} !ks))))) 185 | 186 | (defn select-columns [& args] 187 | (m/match args 188 | ;; all 189 | (m/or (:all) nil (m/pred empty? args)) 190 | (map identity) 191 | ;; regex 192 | ((m/pred #(instance? java.util.regex.Pattern %) ?x) & _) 193 | (comp (x/transjuxt [(column-names ?x) (x/into [])]) 194 | (map (fn [[ks coll]] (=>> coll (select-columns ks))))) 195 | ;; ks ... 196 | ((m/pred (some-fn keyword? string?) !ks) ...) 197 | (map #(persistent! (reduce (fn [acc k] (assoc! acc k (get % k))) (transient {}) !ks))) 198 | ;; [ks] 199 | (?ks) 200 | (map #(persistent! (reduce (fn [acc k] (assoc! acc k (get % k))) (transient {}) ?ks))))) 201 | 202 | (defn rename-columns [m] 203 | (map #(clojure.set/rename-keys % m))) 204 | 205 | (defmacro with* [& args] 206 | (m/rewrite args 207 | ;; ?i ?k ?v 208 | ((m/pred integer? ?i) (m/pred keyword? 
?k) ?v) 209 | (fn [i m] (if (= i ?i) (assoc m ?k ?v) m)) 210 | ;; ?i ?m 211 | ((m/pred integer? ?i) (m/pred map? ?m)) 212 | (fn [i m] (if (= i ?i) (clojure.core/merge m ?m) m)) 213 | ;; ?k ?f 214 | ((m/pred keyword? ?k) (m/pred (some-fn list? symbol?) ?fn)) 215 | (fn [m] (assoc m ?k (?fn m))) 216 | ;; ?k ?v 217 | ((m/pred keyword? ?k) (m/and (m/not (m/pred vector? ?v)) (m/some ?v))) 218 | (fn [m] (assoc m ?k ?v)) 219 | ;; ?k [?f1 [[?f2 & ?xs] | ?k/?v] ...] 220 | ((m/pred keyword? ?k) 221 | [(m/pred (some-fn symbol? list?) ?f) . 222 | (m/and !args (m/or (m/pred vector?) (m/some))) ...]) 223 | ~(let [m (gensym 'map)] 224 | `(fn [~m] (assoc ~m ~?k 225 | (~?f ~@(map (fn [x] 226 | (m/rewrite x 227 | [(m/pred (some-fn symbol? list?) ?f) . (m/cata !xs) ...] 228 | (?f & ~!xs) 229 | ;; 230 | (m/pred keyword? ?x) 231 | (~m ?x) 232 | ?x ?x)) !args))))) 233 | ;; {?k [?f . !ks]} 234 | ((m/and ?m (m/map-of (m/pred keyword? !ks) (m/pred vector? !args)))) 235 | ~`(comp ~@(map (fn [[k v]] `(with* ~k ~v)) ?m)) 236 | ;; {?k ?v} 237 | ((m/and ?m (m/map-of (m/pred keyword? !ks) 238 | (m/not (m/pred (some-fn vector? list? symbol?) !vs))))) 239 | ~`(fn [m#] 240 | (-> m# ~@(map (fn [[k v]] `(assoc ~k ~v)) ?m))) 241 | ;; {k f} ... 242 | ((m/and ?m (m/map-of (m/pred keyword? !ks) 243 | (m/pred (some-fn list? symbol?) !fns)))) 244 | ~(let [m (gensym 'map)] 245 | `(fn [~m] 246 | (-> ~m ~@(map (fn [[k f]] `(assoc ~k (~f ~m))) ?m)))) 247 | ;; :when ?pred ?x 248 | (:when ?pred & ?rest) 249 | ~`(fn [m#] (if ((where* ~?pred) m#) ((with* ~@?rest) m#) m#)) 250 | ;; 251 | _ (throw (ex-info "non exhaustive pattern match" {})))) 252 | 253 | (defmacro with [& args] 254 | (m/rewrite args 255 | ((m/pred number?) & _) 256 | ~`(map-indexed (with* ~@args)) 257 | _ 258 | ~`(map (with* ~@args)))) 259 | 260 | (defmacro aggregate [& arg] 261 | (m/rewrite arg 262 | ;; ... 
263 | ;; {& [[!ks !fns] ...]} 264 | ({:as ?m}) 265 | ~(let [acc (gensym 'acc) 266 | k (gensym 'k) 267 | v (gensym 'v) 268 | f (gensym 'f)] 269 | `(x/transjuxt 270 | ~(reduce-kv 271 | (fn [acc k v] 272 | (println v) 273 | (let [f (m/match v 274 | (m/pred (some-fn list? symbol?) ?v) 275 | ?v 276 | ;; 277 | (m/pred keyword? ?k) 278 | `((agg/agg->fn ~v) ~k) 279 | ;; 280 | [(m/pred keyword? ?k) (m/pred keyword? ?f)] 281 | `((agg/agg->fn ~?f) ~?k) 282 | ;; emit code, like the branches above, rather than a function object 283 | [(m/pred keyword? ?k) (m/pred (some-fn list? symbol?) ?f)] 284 | `(comp (map ~?k) ~?f) 285 | )] 286 | (assoc acc k f))) 287 | {} 288 | ?m))) 289 | ;; (x/transjuxt 290 | ;; (persistent! 291 | ;; (reduce-kv 292 | ;; (fn [acc k f] 293 | ;; (let [f (m/match f 294 | ;; (m/pred fn? ?f) ?f 295 | ;; (m/pred keyword? ?k) ((agg->fn ?k) k) 296 | ;; [(m/pred keyword? ?k) (m/pred keyword? ?f)] ((agg->fn ?f) ?k) 297 | ;; [(m/pred keyword? ?k) (m/pred fn? ?f)] (comp (map ?k) ?f))] 298 | ;; (assoc! acc k f))) 299 | ;; (transient {}) 300 | ;; ?m))) 301 | ;; [!ks !fns ...]]} 302 | ;; [(m/pred (some-fn fn? keyword?) !ks) (m/pred (some-fn fn? keyword?) !fns) ...] 303 | ;; (x/transjuxt 304 | ;; (persistent! 305 | ;; (reduce 306 | ;; (fn [acc [k f]] 307 | ;; (conj! acc ((agg->fn f) k))) 308 | ;; (transient []) 309 | ;; (m/subst [[!ks !fns] ...])))) 310 | )) 311 | 312 | (defmacro group-by [& args] 313 | (m/rewrite args 314 | ;; ?k/?f 315 | ((m/pred (some-fn list? symbol? keyword?) ?f)) 316 | (x/by-key ?f (x/into [])) 317 | ;; ?k/?f ?xf 318 | ((m/pred (some-fn list? symbol? keyword?) ?f) 319 | (m/pred (some-fn list? symbol?) ?xf)) 320 | (x/by-key ?f (if (keyword? ?xf) ((agg/agg->fn ?xf) ?f) ?xf)) 321 | ;; ?k/?f ?agg 322 | ((m/pred (some-fn list? symbol? keyword?) ?f) 323 | (m/pred keyword? ?agg)) 324 | (x/by-key ?f ((agg/agg->fn ?agg) ?f)) 325 | ;; ?k/?f {& [[?k ?f] ...]} 326 | ((m/pred (some-fn list? symbol? keyword?) ?f) (m/pred map? ?m)) 327 | (x/by-key ?f (agg/aggregate ?m)) 328 | ;; [?k ?j] ?xf 329 | ([(m/pred (some-fn list? symbol? 
keyword?) !fs) ...] 330 | (m/pred (some-fn list? symbol?) ?xf)) 331 | (x/by-key ~`(juxt ~@!fs) ?xf) 332 | ;; [?k/?f ?j/?f] ?map 333 | ([(m/pred (some-fn list? symbol? keyword?) !fs) ...] 334 | (m/pred map? ?m)) 335 | (x/by-key ~`(juxt ~@!fs) (agg/aggregate ?m)) 336 | ;; ?f [[ks fns] ...] 337 | ((m/pred (some-fn list? symbol? keyword?)?f) 338 | (m/pred vector? ?coll)) 339 | (x/by-key ?f (agg/aggregate ?coll)) 340 | ;; ... 341 | (?k [!coll ...] '...) 342 | (comp (group-by ?k (into [?k :first] !coll)) (map second)) 343 | ;; 344 | (?k {:as ?m} '...) 345 | (comp (group-by ?k (assoc ?m ?k :first)) (map second)) 346 | )) 347 | 348 | (comment 349 | (=>> [{:a 1 :c 1} {:a 1 :c 2} {:a 2 :c 3} {:a 2 :c 4}] (group-by :a {:c :sum})) 350 | (=>> data (group-by :a {:c :sum 351 | :b :sum})) 352 | (=>> data (group-by :a {:c :sum 353 | :b :sum} '...)) 354 | (=>> data (group-by :a [:c :sum :b :sum])) 355 | (=>> data (group-by :a [:c :sum :b :sum] '...)) 356 | (=>> data (x/by-key :a [(comp (map :a) (x/into []))])) 357 | [(enc/qb 1 (=>> data (group-by :a {:b-mean [:b :mean] 358 | :b-sum [:b :sum] 359 | :c-mean [:c :mean] 360 | :c-sum [:c :sum]}))) 361 | (enc/qb 1 (->> data 362 | (clojure.core/group-by :a) 363 | (mapv (fn [[v coll]] 364 | (let [b-coll (->> coll (mapv :b)) 365 | c-coll (->> coll (mapv :c))] 366 | [v {:b-mean (hanse.rostock.stats/mean b-coll) 367 | :b-sum (->> b-coll (reduce +)) 368 | :c-mean (hanse.rostock.stats/mean c-coll) 369 | :c-sum (->> c-coll (reduce +))}])))))] 370 | ;; => [515.28 531.06]) 371 | ) 372 | 373 | (comment 374 | (=>> data (group-by :a (x/into [])))) 375 | 376 | (comment 377 | (into [] (group-by [:a :b] {:a :sum}) data)) 378 | 379 | (comment 380 | (=>> data (group-by [:a :b] (x/into [])))) 381 | 382 | (defn value-counts 383 | ([] 384 | (x/by-key identity x/count)) 385 | ([k] 386 | (comp (map k) (value-counts)))) 387 | 388 | (comment 389 | (into {} (value-counts) data) 390 | (into {} (value-counts :a) data)) 391 | 392 | #?(:clj 393 | (defn- keyword->freq 
[[n k]] 394 | (case k 395 | :ms (jt/millis n) 396 | :s (jt/seconds n) 397 | :min (jt/minutes n) 398 | :t (jt/minutes n) 399 | :h (jt/hours n) 400 | :d (jt/days n) 401 | :m (jt/months n) 402 | :y (jt/years n)))) 403 | 404 | #?(:clj 405 | (defn- as-day-freq [freq {:keys [key fill] :or {key :date fill []}}] 406 | (let [fill (conj fill key)] 407 | (comp 408 | (x/sort-by key) 409 | (fn [rf] 410 | (let [freq (keyword->freq freq) 411 | lst (volatile! ::none)] 412 | (fn 413 | ([] (rf)) 414 | ([acc] (rf acc)) 415 | ([acc x] 416 | (if (identical? @lst ::none) 417 | (do (vreset! lst (select-keys x fill)) 418 | (rf acc x)) 419 | (let [last-date (jt/plus (get @lst key) freq) 420 | date (get x key) 421 | dts (take-while #(jt/before? % date) 422 | (jt/iterate jt/plus last-date freq)) 423 | tmps (reduce (fn [acc d] (conj acc (assoc @lst key d))) 424 | [] 425 | dts)] 426 | (vreset! lst (reduce (fn [acc k] (assoc acc k (get x k))) 427 | {} 428 | fill)) 429 | (vreset! lst (select-keys x fill)) 430 | (reduce (fn [acc v] (rf acc v)) acc tmps) 431 | (rf acc x))))))))))) 432 | 433 | #?(:clj 434 | (defn- as-month-freq [freq {:keys [key fill] :or {key :date fill []}}] 435 | (let [fill (conj fill key)] 436 | (comp 437 | (x/sort-by key) 438 | (fn [rf] 439 | (let [lst (volatile! ::none)] 440 | (fn 441 | ([] (rf)) 442 | ([acc] (rf acc)) 443 | ([acc x] 444 | (if (identical? @lst ::none) 445 | (do (vreset! lst (select-keys x fill)) 446 | (rf acc (clojure.core/update x key #(jt/adjust % :last-day-of-month)))) 447 | (let [last-date (jt/adjust (jt/plus (get @lst key) (jt/months 1)) :last-day-of-month) 448 | date (jt/adjust (get x key) :last-day-of-month) 449 | dts (take-while #(jt/before? % date) 450 | (iterate #(-> % 451 | (jt/plus (jt/months 1)) 452 | (jt/adjust :last-day-of-month)) 453 | last-date)) 454 | tmps (reduce (fn [acc d] (conj acc (assoc @lst key d))) 455 | [] 456 | dts)] 457 | (vreset! 
lst (reduce (fn [acc k] (assoc acc k (get x k))) 458 | {} 459 | fill)) 460 | (reduce (fn [acc v] (rf acc v)) acc tmps) 461 | (rf acc (clojure.core/update x key #(jt/adjust % :last-day-of-month))))))))))))) 462 | 463 | #?(:clj 464 | (defn- as-year-freq [freq {:keys [key fill] :or {key :date fill []}}] 465 | (let [fill (conj fill key)] 466 | (comp 467 | (x/sort-by key) 468 | (fn [rf] 469 | (let [lst (volatile! ::none)] 470 | (fn 471 | ([] (rf)) 472 | ([acc] (rf acc)) 473 | ([acc x] 474 | (if (identical? @lst ::none) 475 | (do (vreset! lst (select-keys x fill)) 476 | (rf acc (clojure.core/update x key #(jt/adjust % :last-day-of-year)))) 477 | (let [last-date (jt/adjust (jt/plus (get @lst key) (jt/years 1)) :last-day-of-year) 478 | date (jt/adjust (get x key) :last-day-of-year) 479 | dts (take-while #(jt/before? % date) 480 | (iterate #(-> % 481 | (jt/plus (jt/years 1)) 482 | (jt/adjust :last-day-of-year)) 483 | last-date)) 484 | tmps (reduce (fn [acc d] (conj acc (assoc @lst key d))) 485 | [] 486 | dts)] 487 | (vreset! 
lst (reduce (fn [acc k] (assoc acc k (get x k))) 488 | {} 489 | fill)) 490 | (reduce (fn [acc v] (rf acc v)) acc tmps) 491 | (rf acc (clojure.core/update x key #(jt/adjust % :last-day-of-year))))))))))))) 492 | 493 | (defmulti asfreq (fn [[_ k] & _] k)) 494 | 495 | (defmethod asfreq :d 496 | ([freq opts] 497 | (as-day-freq freq opts)) 498 | ([freq] 499 | (as-day-freq freq {}))) 500 | 501 | (defmethod asfreq :m 502 | ([freq opts] 503 | (as-month-freq freq opts)) 504 | ([freq] 505 | (as-month-freq freq {}))) 506 | 507 | (defmethod asfreq :y 508 | ([freq opts] 509 | (as-year-freq freq opts)) 510 | ([freq] 511 | (as-year-freq freq {}))) 512 | 513 | (comment 514 | (into [] (comp (asfreq [1 :d] {:fill [:a]})) 515 | [{:date (jt/local-date "2019-01-01") :a 1} 516 | {:date (jt/local-date "2019-01-06") :a 2} 517 | {:date (jt/local-date "2019-01-03") :a 3}]) 518 | (into [] (comp (asfreq [1 :m] {:fill [:a]})) 519 | [{:date (jt/local-date "2019-01-01") :a 1} 520 | {:date (jt/local-date "2019-03-06") :a 2} 521 | {:date (jt/local-date "2019-06-03") :a 3}])) 522 | 523 | (defn ->month-series 524 | "Convert a series finer than monthly to a month series, keeping the last row of each month" 525 | ([k] 526 | (comp 527 | (x/by-key (fn [m] (jt/as (get m k) :year :month-of-year)) 528 | (x/take-last 1)) 529 | (map second) 530 | (x/sort-by k))) 531 | ([k data] 532 | (loop [[m & data] data 533 | p nil 534 | r (transient [])] 535 | (if m 536 | (if (and p (not= (.getMonthValue ^java.time.LocalDate (get p k)) 537 | (.getMonthValue ^java.time.LocalDate (get m k)))) 538 | (recur data m (conj! r p)) 539 | (recur data m r)) 540 | (persistent! (conj! 
r p)))))) 541 | 542 | (defn ->year-series 543 | "Convert a series finer than yearly to a year series, keeping the last row of each year" 544 | ([k] 545 | (comp 546 | (x/by-key (fn [m] (jt/as (get m k) :year)) 547 | (x/take-last 1)) 548 | (map second) 549 | (x/sort-by k))) 550 | ([k data] 551 | (loop [[m & data] data 552 | p nil 553 | r (transient [])] 554 | (if m 555 | (if (and p (not= (.getYear ^java.time.LocalDate (get p k)) 556 | (.getYear ^java.time.LocalDate (get m k)))) 557 | (recur data m (conj! r p)) 558 | (recur data m r)) 559 | (persistent! (conj! r p)))))) 560 | 561 | (defn window 562 | ([n] 563 | (x/partition n 1)) 564 | ([n step] 565 | (x/partition n step)) 566 | ([n step xform] 567 | (x/partition n step xform))) 568 | 569 | (comment 570 | (into [] (window 3) (range 10)) 571 | (into [] (window 3 10) (range 100)) 572 | (into [] (window 10 1 x/avg) (range 100))) 573 | 574 | (defn rolling [n xform] 575 | (comp 576 | (window n 1 xform) 577 | (x/into (vec (repeat (dec n) nil))) 578 | (mapcat identity))) 579 | 580 | (comment 581 | (into [] (rolling 10 x/avg) (range 100)) 582 | (into [] (rolling 10 10 (x/reduce +)) (range 100)) 583 | (into [] (window 10 1 x/avg) (range 10))) 584 | 585 | (defn head 586 | ([] (head 5)) 587 | ([n] (take n))) 588 | 589 | (comment 590 | (into [] (head) data)) 591 | 592 | (defn tail 593 | ([] (tail 5)) 594 | ([n] (x/take-last n))) 595 | 596 | (comment 597 | (into [] (tail) data)) 598 | -------------------------------------------------------------------------------- /src/ribelo/danzig/aggregate.clj: -------------------------------------------------------------------------------- 1 | (ns ribelo.danzig.aggregate 2 | (:refer-clojure :exclude [first last min max count flatten]) 3 | (:require 4 | [net.cgrand.xforms :as x] 5 | [meander.epsilon :as m] 6 | [ribelo.kemnath :as math] 7 | [ribelo.stade :as stats])) 8 | 9 | (defn map->rfs [k rf] 10 | (comp (map k) rf)) 11 | 12 | (defn first [k] 13 | (map->rfs k (take 1))) 14 | 15 | (defn last [k] 16 | (map->rfs k x/last)) 17 | 18 | (defn 
min [k] 19 | (map->rfs k (stats/min))) 20 | 21 | (defn max [k] 22 | (map->rfs k (stats/max))) 23 | 24 | (defn count [k] 25 | (map->rfs k x/count)) 26 | 27 | (defn sum [k] 28 | (map->rfs k (x/reduce +))) 29 | 30 | (defn mean [k] 31 | (map->rfs k (stats/mean))) 32 | 33 | (defn median [k] 34 | (map->rfs k (stats/median))) 35 | 36 | (defn std [k] 37 | (map->rfs k (stats/std))) 38 | 39 | (defn quantile [k p] 40 | (map->rfs k (stats/quantile p))) 41 | 42 | (defn percentile [k p] 43 | (map->rfs k (stats/percentile p))) 44 | 45 | (defn iqr [k] 46 | (map->rfs k (stats/iqr))) 47 | 48 | (defn variance [k] 49 | (map->rfs k (stats/variance))) 50 | 51 | (defn covariance [k] 52 | (map->rfs k (stats/covariance))) 53 | 54 | (defn flatten [k] 55 | (map->rfs k (mapcat identity))) 56 | 57 | (defn into-vec [k] 58 | (map->rfs k (x/into []))) 59 | 60 | (defn into-map [k] 61 | (map->rfs k (x/into {}))) 62 | 63 | (defn into-set [k] 64 | (map->rfs k (x/into #{}))) 65 | 66 | (defn round [k] 67 | (map->rfs k (map math/round))) 68 | 69 | (defn round2 [k] 70 | (map->rfs k (map math/round2))) 71 | 72 | (defn agg->fn [k] 73 | (m/match k 74 | :first first 75 | :last last 76 | :min min 77 | :max max 78 | :count count 79 | :sum sum 80 | :mean mean 81 | :median median 82 | :std std 83 | :quantile quantile 84 | :percentile percentile 85 | :iqr iqr 86 | :variance variance 87 | :covariance covariance 88 | :flatten flatten 89 | :into-vec into-vec 90 | :into-map into-map 91 | :into-set into-set 92 | 93 | (m/pred keyword? ?k) (map ?k) 94 | 95 | (m/pred (some-fn fn? list? symbol?) 
?fn) 96 | ?fn)) 97 | -------------------------------------------------------------------------------- /src/ribelo/danzig/io.cljc: -------------------------------------------------------------------------------- 1 | (ns ribelo.danzig.io 2 | (:require 3 | [net.cgrand.xforms :as x] 4 | #?(:clj [net.cgrand.xforms.io :as xio]) 5 | #?(:clj [java-time :as jt]) 6 | [ribelo.danzig :as dz :refer [=>> vecs->maps comp-some]] 7 | [meander.epsilon :as m] 8 | [clojure.string :as str])) 9 | 10 | (defn ^:private dtype->fn [x] 11 | (m/match x 12 | :string str 13 | :keyword keyword 14 | :long #(Long/parseLong ^String %) 15 | :double #(Double/parseDouble ^String %) 16 | :date #(jt/local-date ^String %) 17 | :datetime #(jt/local-date-time ^String %) 18 | [:date ?y] #(jt/local-date ^String ?y ^String %) 19 | [:datetime ?y] #(jt/local-date-time ^String ?y ^String %) 20 | (m/pred fn? ?x) ?x 21 | nil identity)) 22 | 23 | (defn ^:private add-header 24 | ([x] 25 | (add-header x {})) 26 | ([x opts] 27 | (m/match [x opts] 28 | [(m/pred int? ?i) 29 | {:keywordize-keys (m/pred (some-fn nil? boolean?) ?keywordize) 30 | :key-fn (m/pred (some-fn nil? fn?) ?key-fn)}] 31 | (comp 32 | (x/transjuxt {:xs (comp (drop (inc ?i)) (x/into [])) 33 | :headers (comp-some 34 | (when (>= ?i 1) (drop ?i)) 35 | (take 1) 36 | (when ?keywordize 37 | (map #(map keyword %))) 38 | (map #(map vector % (range))))}) 39 | (mapcat (fn [{:keys [xs headers]}] 40 | (into [] (vecs->maps (into {} headers)) xs)))) 41 | [(m/with [%p1 (m/or [!ks & (m/or (m/pred vector? !vs) (m/app vector !vs))] 42 | (m/and (m/pred (some-fn keyword? string?) !ks) (m/let [!vs []]))) 43 | %p2 (m/seqable [!xs %p1] ...)] 44 | %p2) _] 45 | (comp-some 46 | (vecs->maps (zipmap !xs !ks)) 47 | (when (some seq !vs) 48 | (dz/update 49 | (persistent! 50 | (reduce 51 | (fn [acc [k fns]] 52 | (if (seq fns) 53 | (assoc! 
acc k (apply comp (reverse (map dtype->fn fns)))) 54 | acc)) 55 | (transient {}) 56 | (map vector !ks !vs))))))))) 57 | 58 | (defn remove-quote [q] 59 | (fn [rf] 60 | (fn 61 | ([] (rf)) 62 | ([acc] (rf acc)) 63 | ([acc x] 64 | (let [x (reduce-kv 65 | (fn [acc k v] 66 | (assoc acc k (str/trim (str/replace v (re-pattern q) "")))) 67 | {} 68 | x)] 69 | (rf acc x)))))) 70 | 71 | (defn lines-in 72 | ([path] 73 | (lines-in path {})) 74 | ([path opts] 75 | (apply xio/lines-in path (flatten (vec opts))))) 76 | 77 | #?(:clj 78 | (defn read-csv 79 | ([path {:keys [sep quote header-row? header encoding parse] 80 | :or {sep "," 81 | header-row? false 82 | encoding "utf-8" 83 | keywordize-headers? false} 84 | :as opts}] 85 | (=>> (xio/lines-in path :encoding encoding) 86 | (when header-row? (drop 1)) 87 | (map #(clojure.string/split % (re-pattern sep))) 88 | (when header (add-header header opts)) 89 | (when quote (remove-quote quote)) 90 | (when parse 91 | (map (fn [m] 92 | (reduce-kv 93 | (fn [acc k v] 94 | (update acc k (dtype->fn v))) 95 | m parse)))))) 96 | ([path] 97 | (read-csv path {})))) 98 | 99 | #?(:clj 100 | (defn to-csv 101 | ([path data {:keys [sep add-headers? add-index? encoding format] 102 | :or {sep "," 103 | add-headers? true 104 | add-index? false 105 | encoding "utf8"}}] 106 | (xio/lines-out 107 | path 108 | (comp-some 109 | (map ;; stringify keys 110 | (fn [m] 111 | (persistent! 112 | (reduce-kv 113 | (fn [acc k v] 114 | (assoc! acc (str k) v)) 115 | (transient {}) 116 | m)))) 117 | (when add-index? 118 | (map-indexed #(merge {" idx" %1} %2))) 119 | (if add-headers? 
120 | (comp 121 | (x/transjuxt {:xs (comp (map #(into (sorted-map) %)) (map vals) (x/into [])) 122 | :headers (comp (take 1) (map #(into (sorted-map) %)) (map keys))}) 123 | (mapcat (fn [{:keys [xs headers]}] 124 | (into [headers] xs)))) 125 | (comp 126 | (map #(into (sorted-map) %)) 127 | (map vals))) 128 | (when format 129 | (map (fn [m] 130 | (reduce-kv 131 | (fn [acc k f] 132 | (update acc k f)) 133 | m format)))) 134 | (map #(clojure.string/join sep %))) 135 | data 136 | :encoding encoding)) 137 | ([path data] 138 | (to-csv path data {})))) 139 | -------------------------------------------------------------------------------- /test/ribelo/danzig_test.cljc: -------------------------------------------------------------------------------- 1 | (ns ribelo.danzig-test 2 | (:require [ribelo.danzig :as dz :refer [=>>]] [ribelo.danzig.aggregate :as agg] [net.cgrand.xforms :as x] 3 | #?(:clj [clojure.test :as t] 4 | :cljs [cljs.test :as t :include-macros true]))) 5 | 6 | (comment 7 | (def data [0 1 2])) 8 | 9 | (t/deftest comp-some 10 | (let [data [0 1 2]] 11 | (t/is 12 | (= [1 2 3] 13 | (into [] (dz/comp-some (map inc) (when false (map dec))) data))))) 14 | 15 | (t/deftest fat-thread-last 16 | (let [data [0 1 2]] 17 | (t/is 18 | (= [1 2 3] 19 | (=>> data (map inc)))) 20 | (t/is 21 | (= [1 2 3] 22 | (=>> data (map inc) (when false (map dec))))) 23 | (t/is 24 | (= 1 25 | (=>> data (map inc) first))) 26 | (t/is 27 | (= 3 28 | (=>> data (map inc) last))) 29 | (t/is 30 | (= #{1 2 3} 31 | (=>> data (map inc) (into #{})))) 32 | (t/is 33 | (= #{1 2 3} 34 | (=>> data (map inc) (into #{})))))) 35 | 36 | (t/deftest vecs->maps 37 | (t/is 38 | (= [{:a 0, :b 0} {:a 1, :b 2} {:a 3, :b 4}] 39 | (=>> [[0 0] [1 2] [3 4]] (dz/vecs->maps {0 :a 1 :b})) 40 | (=>> [[0 0] [1 2] [3 4]] (dz/vecs->maps [:a :b]))))) 41 | 42 | (t/deftest row 43 | (t/is 44 | (= {:a 0, :b 0} 45 | (dz/row [:a :b] [0 0])))) 46 | 47 | (comment 48 | (def data [{:a -1 :b 0 :c -1} {:a 0 :b 1 :c 1} {:a 1 :b 2 :c 3}])) 49 | 50 | (t/deftest where 51 | (let [data [{:a -1 :b 0 :c -1} 
{:a 0 :b 1 :c 1} {:a 1 :b 2 :c 3}]] 52 | (t/testing "?f" 53 | (t/is 54 | (= [{:a -1, :b 0 :c -1}] 55 | (=>> data (dz/where (fn [{:keys [a]}] (= a -1))))))) 56 | (t/testing "?k ?v" 57 | (t/is 58 | (= [{:a -1, :b 0 :c -1}] 59 | (=>> data (dz/where :a -1))))) 60 | (t/testing "{?k ?v}" 61 | (t/is 62 | (= [{:a -1, :b 0 :c -1}] 63 | (=>> data (dz/where {:a -1}))))) 64 | (t/testing "?k ?f" 65 | (t/is 66 | (= [{:a -1, :b 0 :c -1}] 67 | (=>> data (dz/where :a neg?))))) 68 | (t/testing "[?f1 [[?f2 & ?xs] | ?k/?v] ...]" 69 | (t/is 70 | (= [{:a -1, :b 0 :c -1}] 71 | (=>> data (dz/where [= :a :c])))) 72 | (t/is 73 | (= [{:a -1, :b 0 :c -1}] 74 | (=>> data (dz/where [= :a -1])))) 75 | (t/is 76 | (= [{:a :some/key}] 77 | (=>> [{:a :some/key} {:a :another/key}] (dz/where [= :a ':some/key])))) 78 | (t/is 79 | (= [{:a -1, :b 0, :c -1} {:a 0, :b 1, :c 1} {:a 1, :b 2, :c 3}] 80 | (=>> data (dz/where [= [+ :a :b] :c])))) 81 | (t/is 82 | (= [{:a 0, :b 1, :c 1}] 83 | (=>> data (dz/where [= [+ :a :b] [+ :a :c]])))) 84 | (t/is 85 | (= [{:a 0, :b 1, :c 1}] 86 | (=>> data (dz/where [= [+ :a :b] [+ [+ :a :c] [- :b 1]]]))))))) 87 | 88 | (t/deftest column-names 89 | (let [data [{:a -1 :b 0 :c -1} {:a 0 :b 1 :c 1} {:a 1 :b 2 :c 3}]] 90 | (t/is [:a :b :c] 91 | (=>> data (dz/column-names ::dz/all))) 92 | (t/is [:a :b :c] 93 | (=>> data (dz/column-names :a))))) 94 | 95 | (t/deftest select-column 96 | (let [data [{:a -1 :b 0 :c -1} {:a 0 :b 1 :c 1} {:a 1 :b 2 :c 3}]] 97 | (t/is [{:a -1 :b 0 :c -1} {:a 0 :b 1 :c 1} {:a 1 :b 2 :c 3}] 98 | (=>> data (dz/select-columns :all))) 99 | (t/is [{:a -1, :b 0} {:a 0, :b 1} {:a 1, :b 2}] 100 | (=>> data (dz/select-columns :a :b))) 101 | (t/is [{:a -1, :b 0} {:a 0, :b 1} {:a 1, :b 2}] 102 | (=>> data (dz/select-columns #"a|b"))))) 103 | 104 | (t/deftest with 105 | (let [data [{:a -1 :b 0 :c -1} {:a 0 :b 1 :c 1} {:a 1 :b 2 :c 3}]] 106 | (t/testing "?i ?k ?v" 107 | (t/is 108 | (= [{:a 999 :b 0 :c -1} {:a 0 :b 1 :c 1} {:a 1 :b 2 :c 3}] 109 | (=>> data (dz/with 0 :a 
999))))) 110 | (t/testing "?i ?m" 111 | (t/is 112 | (= [{:a 999 :b 0 :c -1} {:a 0 :b 1 :c 1} {:a 1 :b 2 :c 3}] 113 | (=>> data (dz/with 0 {:a 999}))))) 114 | (t/testing "?k ?f" 115 | (t/is 116 | (= [{:a -1, :b 0, :c -1, :d 9} {:a 0, :b 1, :c 1, :d 11} {:a 1, :b 2, :c 3, :d 13}] 117 | (=>> data (dz/with :d (fn [{:keys [a b]}] (+ a b 10))))))) 118 | (t/testing "?k ?v" 119 | (t/is 120 | (= [{:a -1, :b 0, :c -1, :d 5} {:a 0, :b 1, :c 1, :d 5} {:a 1, :b 2, :c 3, :d 5}] 121 | (=>> data (dz/with :d 5))))) 122 | (t/testing "?k [?f . !ks]" 123 | (t/is 124 | (= [{:a -1, :b 0, :c -1, :d 9} {:a 0, :b 1, :c 1, :d 11} {:a 1, :b 2, :c 3, :d 13}] 125 | (=>> data (dz/with :d [+ :a :b 10]))))) 126 | (t/testing "?k [?f1 [[?f2 & ?xs] | ?k/?v] ...]}" 127 | (t/is 128 | (= [{:a 0, :b 0, :c -1} {:a 1, :b 1, :c 1} {:a 2, :b 2, :c 3}] 129 | (=>> data (dz/with :a [+ :a 1])))) 130 | (t/is 131 | (= [{:a -1, :b 0, :c -1} {:a 1, :b 1, :c 1} {:a 3, :b 2, :c 3}] 132 | (=>> data (dz/with :a [+ :a :b]))))) 133 | (t/testing "{?k [?f1 [[?f2 & ?xs] | ?k/?v] ...]}" 134 | (t/is 135 | (= [{:a 0, :b 5, :c -1} {:a 1, :b 6, :c 1} {:a 2, :b 7, :c 3}] 136 | (=>> data (dz/with {:a [+ :a 1] :b [+ :b 5]})))) 137 | (t/is 138 | (= [{:a 0, :b 1, :c -1} {:a 1, :b 0, :c 1} {:a 2, :b -1, :c 3}] 139 | (=>> data (dz/with {:a [+ :a 1] :b [- :b :c]}))))) 140 | (t/testing "{?k ?v}" 141 | (t/is 142 | (= [{:a 5, :b 10, :c -1} {:a 5, :b 10, :c 1} {:a 5, :b 10, :c 3}] 143 | (=>> data (dz/with {:a 5 :b 10}))))) 144 | (t/testing "{?k ?f}" 145 | (t/is 146 | (= [{:a 0, :b 5, :c -1} {:a 1, :b 6, :c 1} {:a 2, :b 7, :c 3}] 147 | (=>> data (dz/with {:a (fn [{:keys [a]}] (+ a 1)) 148 | :b (fn [{:keys [b]}] (+ b 5))}))))) 149 | (t/testing ":when ?pred & ?rest" 150 | (t/is 151 | (= [{:a 0, :b 5, :c -1} {:a 1, :b 6, :c 1} {:a 2, :b 7, :c 3}] 152 | (=>> data (dz/with :when [= :a 0] :a 999))))))) 153 | 154 | (comment 155 | (require '[ribelo.danzig.aggregate :as agg]) 156 | (def data [{:a -1 :b [0 1 2] :c 0} {:a -1 :b [1 2 3] :c 0} {:a 1 :b 
[2 3 4] :c 1}])) 157 | 158 | (t/deftest aggregate 159 | (let [data [{:a -1 :b [0 1 2] :c 0} {:a -1 :b [1 2 3] :c 0} {:a 1 :b [2 3 4] :c 1}]] 160 | (t/testing "?k/?f" 161 | (t/is 162 | (= [{:a -1, :b 18, :c 1}] 163 | (=>> data (dz/aggregate {:a :sum 164 | :b (comp (mapcat :b) (x/reduce +)) 165 | :c :sum}))))) 166 | )) 167 | 168 | (t/deftest group-by 169 | (let [data [{:a -1 :b [0 1 2] :c 0} {:a -1 :b [1 2 3] :c 0} {:a 1 :b [2 3 4] :c 1}]] 170 | (t/testing "?k/?f" 171 | (t/is 172 | (= [[-1 [{:a -1, :b [0 1 2], :c 0} {:a -1, :b [1 2 3], :c 0}]] [1 [{:a 1, :b [2 3 4], :c 1}]]] 173 | (=>> data (dz/group-by :a))))) 174 | (t/testing "?k/?f ?xf" 175 | (t/is 176 | (= [[-1 [{:a -1, :b [0 1 2], :c 0} {:a -1, :b [1 2 3], :c 0}]] [1 [{:a 1, :b [2 3 4], :c 1}]]] 177 | (=>> data (dz/group-by :a (x/into []))))) 178 | (t/is 179 | (= [[-1 -2] [1 1]] 180 | (=>> data (dz/group-by :a (comp (map :a) (x/reduce +)))))) 181 | (t/is 182 | (= [[-1 -2] [1 1]] 183 | (=>> data (dz/group-by :a (agg/sum :a))))) 184 | (t/is 185 | (= [[-1 -2] [1 1]] 186 | (=>> data (dz/group-by :a :sum))))) 187 | (t/testing "[!ks/!fs] ?xf" 188 | (t/is 189 | (= [[[-1 0] [{:a -1, :b [0 1 2], :c 0} {:a -1, :b [1 2 3], :c 0}]] [[1 1] [{:a 1, :b [2 3 4], :c 1}]]] 190 | (=>> data (dz/group-by [:a :c] (x/into []))))) 191 | ))) 192 | --------------------------------------------------------------------------------