├── .gitignore
├── .travis.yml
├── LICENSE
├── README.md
├── doc
│   └── intro.md
├── java
│   └── dataframe
│       ├── Lists.java
│       └── TableBuilder.java
├── profiles.clj
├── project.clj
├── src
│   └── dataframe
│       ├── core.clj
│       ├── frame.clj
│       ├── series.clj
│       └── util.clj
└── test
    └── dataframe
        ├── core_test.clj
        ├── frame_test.clj
        ├── pipeline_test.clj
        ├── series_test.clj
        └── util_test.clj
/.gitignore:
--------------------------------------------------------------------------------
1 |
2 | /target
3 | /classes
4 | /checkouts
5 | pom.xml
6 | pom.xml.asc
7 | *.jar
8 | *.class
9 | /.lein-*
10 | /.nrepl-port
11 | .hgignore
12 | .hg/
13 | .idea
14 | *.iml
15 |
16 | pom.xml
17 | pom.xml.asc
18 | *jar
19 | /lib/
20 | /classes/
21 | /target/
22 | /checkouts/
23 | .lein-deps-sum
24 | .lein-repl-history
25 | .lein-plugins/
26 | .lein-failures
27 | .nrepl-port
28 |
29 |
--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
1 | language: clojure
2 |
3 | script: lein expectations
4 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2016 George Herbert Lewis
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # dataframe
2 |
3 | [Build Status](https://travis-ci.org/ghl3/dataframe)
4 |
5 | DataFrames for Clojure (inspired by Python's Pandas)
6 |
7 |
8 | The dataframe package contains two core data structures:
9 |
10 | - A Series is a map of index keys to values. It is ordered and supports O(1) lookup of values by index as well as O(1) lookup of values by positional offset (based on the order of the index).
11 | - A Frame is a map of column names to column values, which are represented as Series, each with an identical index. A Frame may also be thought of as a map of index keys to maps, where each map is a row of a Frame that maps column names to the value in that row.
12 |
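One way to picture how a Series can offer fast lookup both by index key and by position is to pair an ordered vector of keys with a map from key to value. The sketch below is only an illustration in plain Clojure: the names `make-series`, `by-key`, and `by-position` are hypothetical, and this layout is an assumption, not necessarily how `dataframe.series` stores its data.

```clojure
;; Illustration only: one possible layout, not the library's actual Series type.
;; All names below (make-series, by-key, by-position) are hypothetical.
(defn make-series
  [values index]
  {:index  (vec index)              ; keeps the order of the index, fast nth
   :lookup (zipmap index values)})  ; fast lookup by index key

(defn by-key
  "Value stored under index key k, or nil if k is absent."
  [s k]
  (get (:lookup s) k))

(defn by-position
  "The [index value] pair at positional offset n."
  [s n]
  (let [k (nth (:index s) n)]
    [k (get (:lookup s) k)]))

(def s (make-series [10 20 30 40] [:a :b :c :d]))

(by-key s :b)       ; => 20
(by-position s 2)   ; => [:c 30]
```

Both operations are near-constant-time here because Clojure vectors and hash maps support efficient `nth` and `get`.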
13 |
14 |
15 | Series
16 | ======
17 |
18 | A Series can be thought of as a 1-D vector of data with an index (a vector of keys, one for every value). The keys are typically either integers or Clojure keywords, but can be any value. Any of the values may be nil, but the non-nil values must all be of the same type.
19 |
20 | When iterated over, a Series is a collection of pairs of `[index value]`.
21 |
22 | | index | val |
23 | |-------|-----|
24 | | :a | 10 |
25 | | :b | 20 |
26 | | :c | 30 |
27 | | :d | 40 |
28 |
29 |
30 | To create a Series, pass a sequence of values and an index sequence to the constructor function:
31 |
32 | ```clojure
33 |
34 | (require '[dataframe.core :as df])
35 |
36 | (def srs (df/series [1 2 3] [:a :b :c]))
37 | srs
38 | ```
39 |
40 |
41 | => class dataframe.series.Series
42 | :a 1
43 | :b 2
44 | :c 3
45 |
46 |
47 | DataFrame core has a number of functions for operating on or manipulating Series objects.
48 |
49 | ```clojure
50 | (df/ix srs :b)
51 | ; 2
52 |
53 | (df/values srs)
54 | ; [1 2 3]
55 | ```
56 |
57 | Arithmetic and comparison operations on a Series return new Series objects. These operations obey broadcast rules: combining a scalar with a Series applies the operation to every element and returns a new Series with the same index as the input. Combining two Series applies the operation element by element, which requires their indices to align exactly:
58 |
59 | ```clojure
60 | (df/add 1 srs)
61 | ```
62 |
63 |
64 | => class dataframe.series.Series
65 | :a 2
66 | :b 3
67 | :c 4
68 |
69 |
70 | ```clojure
71 | (df/eq 2 srs)
72 | ```
73 |
74 |
75 | => class dataframe.series.Series
76 | :a false
77 | :b true
78 | :c false
79 |
80 |
81 | ```clojure
82 | (df/add (df/series [1 2 3]) (df/series [10 20 30]))
83 | ```
84 |
85 |
86 | => class dataframe.series.Series
87 | 0 11
88 | 1 22
89 | 2 33
90 |
91 |
92 |
93 | Frames
94 | ======
95 |
96 | Frames are aligned collections of column-names to Series.
97 |
98 | When iterated over, a Frame is a collection of pairs of indexes to maps of rows: `[index {col->val}]`.
99 |
100 |
101 | | columns: | :a | :b | :c |
102 | |----------|----|----|-----|
103 | | index | | | |
104 | | :x | 10 | 2 | 100 |
105 | | :y | 20 | 4 | 300 |
106 | | :z | 30 | 6 | 600 |
107 |
108 |
109 |
110 | There are a number of equivalent ways to create a DataFrame. These all use the `dataframe.core/frame` constructor function. These ways are:
111 |
112 | - Pass a map of column names to column values as well as an optional index (if no index is passed, then a standard index of integers starting at 0 will be used). The column values can either be sequences or they can be Series objects, but must all have the same length.
113 |
114 |
115 | ```clojure
116 |
117 | (require '[dataframe.core :as df])
118 |
119 | (def frame (df/frame {:a [1 2 3] :b [10 20 30]} [:x :y :z]))
120 | frame
121 | ```
122 |
123 | => class dataframe.frame.Frame
124 | :a :b
125 | :x 1 10
126 | :y 2 20
127 | :z 3 30
128 |
129 |
130 | Here, `:a` and `:b` are the names of the columns and the index over rows is `[:x :y :z]`.
131 |
132 | - Pass a list of pairs of index keys and rows-as-maps.
133 |
134 | ```clojure
135 | (def frame (df/frame [[:x {:a 1 :b 10}]
136 | [:y {:a 2 :b 20}]
137 | [:z {:a 3 :b 30}]]))
138 | frame
139 | ```
140 |
141 | => class dataframe.frame.Frame
142 | :a :b
143 | :x 1 10
144 | :y 2 20
145 | :z 3 30
146 |
147 |
148 | - Pass a list of maps and an optional index sequence:
149 |
150 | ```clojure
151 | (def frame (df/frame [{:a 1 :b 10}
152 | {:a 2 :b 20}
153 | {:a 3 :b 30}]
154 | [:x :y :z]))
155 | frame
156 | ```
157 |
158 | => class dataframe.frame.Frame
159 | :a :b
160 | :x 1 10
161 | :y 2 20
162 | :z 3 30
163 |
164 |
165 |
166 | Selecting
167 | =========
168 |
169 | DataFrame core contains a number of functions for selecting specific subsets and items from Series and Frames.
170 |
171 | We've already seen the `ix` function, which selects either a single value from a Series or a single row from a Frame (returned as a Series that maps column names to values).
172 |
173 | ```clojure
174 | (df/ix (df/series [1 2 3] [:x :y :z]) :x)
175 | ;1
176 | ```
177 |
178 | ```clojure
179 | (df/ix (df/frame [{:a 1 :b 10}
180 | {:a 2 :b 20}
181 | {:a 3 :b 30}]
182 | [:x :y :z])
183 | :x)
184 | ; a Series mapping column names to values: :a 1, :b 10
185 | ```
186 |
187 | The `loc` function allows one to select a subset of the input Series or Frame consisting of a list of index values.
188 |
189 |
190 | ```clojure
191 | (df/loc (df/series [1 2 3] [:x :y :z]) [:x :y])
192 | ```
193 |
194 | => class dataframe.series.Series
195 | :x 1
196 | :y 2
197 |
198 |
199 |
200 | ```clojure
201 | (df/loc (df/frame [{:a 1 :b 10}
202 | {:a 2 :b 20}
203 | {:a 3 :b 30}]
204 | [:x :y :z])
205 | [:x :y])
206 | ```
207 |
208 | => class dataframe.frame.Frame
209 | :a :b
210 | :x 1 10
211 | :y 2 20
212 |
213 |
214 |
215 | In addition to index-based location, one can select values or rows using a Series of boolean values (the index of this Series must align with the index of the Series or Frame):
216 |
217 |
218 |
219 | ```clojure
220 | (df/select (df/series [1 2 3] [:x :y :z])
221 | (df/series [true false true] [:x :y :z]))
222 | ```
223 |
224 | => class dataframe.series.Series
225 | :x 1
226 | :z 3
227 |
228 |
229 |
230 | ```clojure
231 | (df/select (df/frame [{:a 1 :b 10}
232 | {:a 2 :b 20}
233 | {:a 3 :b 30}]
234 | [:x :y :z])
235 | (df/series [true false true] [:x :y :z]))
236 | ```
237 |
238 | => class dataframe.frame.Frame
239 | :a :b
240 | :x 1 10
241 | :z 3 30
242 |
243 |
244 |
245 | Grouping
246 | ========
247 |
248 | The `group-by` function takes a Frame and a Series whose index
249 | is aligned with the Frame's index and returns a map of
250 | values to Frames. Each key is a distinct value of the input Series, and
251 | the corresponding Frame holds the rows that share that value.
252 |
253 | ```clojure
254 |
255 | (def data (df/frame [{:a 1 :b 10}
256 | {:a 2 :b 20}
257 | {:a 3 :b 30}]
258 | [:x :y :z]))
259 |
260 | (df/group-by data (df/series [:foo :foo :bar] [:x :y :z]))
261 | ```
262 |
263 | One can also group by a function of each row using the `group-by-fn` function. This function should take the row as a map of column names to values and return a single value that represents the group value for that row:
264 |
265 | ```clojure
266 |
267 | (def data (df/frame [{:a 1 :b 10}
268 | {:a 2 :b 20}
269 | {:a 3 :b 30}]
270 | [:x :y :z]))
271 |
272 | (df/group-by-fn data (fn [row] (+ (:a row) (:b row))))
273 | ```
274 |
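The idea behind grouping by a row function can be sketched with `clojure.core/group-by` over plain `[index row-map]` pairs. This is only an illustration: `group-rows-by-fn` is a hypothetical helper that returns vectors of rows, whereas the library's `group-by-fn` is described above as returning Frames.

```clojure
;; Illustration only: rows are [index row-map] pairs, as a Frame yields when iterated.
;; `group-rows-by-fn` is a hypothetical helper, not dataframe.core/group-by-fn.
(defn group-rows-by-fn
  [rows f]
  ;; clojure.core/group-by keeps the original order of rows within each group
  (group-by (fn [[_ row]] (f row)) rows))

(def rows [[:x {:a 1 :b 10}]
           [:y {:a 2 :b 20}]
           [:z {:a 3 :b 30}]])

(group-rows-by-fn rows (fn [row] (odd? (:a row))))
;; => {true  [[:x {:a 1 :b 10}] [:z {:a 3 :b 30}]]
;;     false [[:y {:a 2 :b 20}]]}
```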
275 | Joining
276 | =======
277 |
278 | Two DataFrames may be joined together. Dataframe supports inner, left, right, and outer joins, which are performed on the indices of the two DataFrames.
279 |
280 |
281 |
282 | ```clojure
283 |
284 | (def left (df/frame [{:a 1 :b 10}
285 | {:a 2 :b 20}
286 | {:a 3 :b 30}]
287 | [:x :y :z]))
288 |
289 | (def right (df/frame [{:c 100 :d "Foo"}
290 | {:c 200 :d "Bar"}
291 | {:c 300 :d "Baz"}]
292 | [:w :x :y]))
293 |
294 | (df/join left right :how :outer)
295 | ```
296 |
297 |
298 | => class dataframe.frame.Frame
299 | :b :a :c :d
300 | :x 10 1 200 Bar
301 | :y 20 2 300 Baz
302 | :z 30 3 nil nil
303 | :w nil nil 100 Foo
304 |
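The outer join above can be mimicked in plain Clojure to show where the `nil` values come from. The sketch below is an illustration under assumptions: each frame is modeled as a vector of `[index row-map]` pairs, and `outer-join` is a hypothetical helper rather than the library's `join`.

```clojure
;; Illustration only: each "frame" is a vector of [index row-map] pairs.
;; `outer-join` is a hypothetical helper, not dataframe.core/join.
(defn outer-join
  [left right]
  (let [cols  (fn [rows] (distinct (mapcat (comp keys second) rows)))
        ;; every column from either side, defaulting to nil
        blank (zipmap (concat (cols left) (cols right)) (repeat nil))
        l     (into {} left)
        r     (into {} right)
        ;; left index first, then keys that appear only on the right
        idx   (distinct (concat (map first left) (map first right)))]
    (mapv (fn [i] [i (merge blank (get l i) (get r i))]) idx)))

(outer-join [[:x {:a 1 :b 10}] [:y {:a 2 :b 20}] [:z {:a 3 :b 30}]]
            [[:w {:c 100 :d "Foo"}] [:x {:c 200 :d "Bar"}] [:y {:c 300 :d "Baz"}]])
;; => [[:x {:a 1 :b 10 :c 200 :d "Bar"}]
;;     [:y {:a 2 :b 20 :c 300 :d "Baz"}]
;;     [:z {:a 3 :b 30 :c nil :d nil}]
;;     [:w {:a nil :b nil :c 100 :d "Foo"}]]
```

In this model, inner, left, and right joins would differ only in which index keys are kept in `idx`.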
305 |
306 |
307 | Transforming
308 | ============
309 |
310 |
311 | DataFrame core has a number of functions for operating on or manipulating Frames.
312 |
313 | ```clojure
314 | (def frame (df/frame [[:x {:a 1 :b 10}]
315 | [:y {:a 2 :b 20}]
316 | [:z {:a 3 :b 30}]]))
317 |
318 | (df/ix frame :x)
319 | ;=> class dataframe.series.Series
320 | ;:b 10
321 | ;:a 1
322 |
323 | (df/col frame :a)
324 | ;=> class dataframe.series.Series
325 | ;:x 1
326 | ;:y 2
327 | ;:z 3
328 |
329 |
330 | (df/assoc-col frame :c (df/add (df/col frame :a) (df/col frame :b)))
331 | ;=> class dataframe.frame.Frame
332 | ; :b :a :c
333 | ;:x 10 1 11
334 | ;:y 20 2 22
335 | ;:z 30 3 33
336 |
337 | ```
338 |
339 | To make manipulating Frames easier, dataframe introduces the `with->` macro, which combines Clojure's threading macro with notation for easily accessing the column of a Frame. This macro takes a Frame and threads it through a series of operations. In doing so, when it encounters a symbol of the form `$col`, it knows to replace it with a reference to a column in the dataframe whose name is the keyword `:col` (for this reason, it is preferred to use keywords as column names).
340 |
341 |
342 | ```clojure
343 |
344 | (require '[dataframe.core :refer :all])
345 |
346 | (def my-df (frame {:a [1 2 3] :b [10 20 30]}))
347 |
348 | (with-> my-df
349 | (assoc-col :c (add $a 5))
350 | (assoc-col :d (add $b $c)))
351 | ```
352 |
353 | => class dataframe.frame.Frame
354 | :a :b :c :d
355 | 0 1 10 6 16
356 | 1 2 20 7 27
357 | 2 3 30 8 38
358 |
359 |
360 |
361 | Notice how the uses of `$a`, `$b`, and `$c` are replaced by the corresponding columns, as Series objects, in the dataframe pipeline above. This allows us to leverage functions that act on Series objects to transform these columns and to use them to update the Frame object.
362 |
363 | These pipelines can be arbitrarily complicated:
364 |
365 | ```clojure
366 |
367 | (def my-df (frame [[:w {:a 0 :b 8}]
368 | [:x {:a 1 :b 2}]
369 | [:y {:a 2 :b 4}]
370 | [:z {:a 3 :b 8}]]))
371 |
372 | (with-> my-df
373 | (select (and (lte $a 2) (gte $b 4)))
374 | (assoc-col :c (add $a $b))
375 | (maprows->df (fn [row] {:foo (+ (:a row) (:c row))
376 | :bar (- (:b row) (:c row))}))
377 | (sort-rows :foo :bar)
378 | head)
379 | ```
380 |
381 |
382 | => class dataframe.frame.Frame
383 | :bar :foo
384 | :y -2 8
385 | :w 0 8
386 | :z -3 14
387 |
388 |
389 |
390 |
391 | DataFrame is distributed under the MIT license.
392 |
393 | Copyright © 2016 George Herbert Lewis
394 |
395 |
--------------------------------------------------------------------------------
/doc/intro.md:
--------------------------------------------------------------------------------
1 | # Introduction to dataframe
2 |
3 | TODO: write [great documentation](http://jacobian.org/writing/what-to-write/)
4 |
--------------------------------------------------------------------------------
/java/dataframe/Lists.java:
--------------------------------------------------------------------------------
1 | package dataframe;
2 |
3 | import java.util.ArrayList;
4 | import java.util.Collection;
5 | import java.util.Collections;
6 | import java.util.Iterator;
7 |
8 | public class Lists {
9 |
10 | public static <E> ArrayList<E> newArrayList() {
11 | return new ArrayList<E>();
12 | }
13 |
14 | public static <E> ArrayList<E> newArrayList(E... elements) {
15 | int capacity = computeArrayListCapacity(elements.length);
16 | ArrayList<E> list = new ArrayList<E>(capacity);
17 | Collections.addAll(list, elements);
18 | return list;
19 | }
20 |
21 | public static <E> ArrayList<E> newArrayList(Iterable<? extends E> elements) {
22 | return (elements instanceof Collection)
23 | ? new ArrayList<E>(cast(elements))
24 | : newArrayList(elements.iterator());
25 | }
26 |
27 | public static <E> ArrayList<E> newArrayList(Iterator<? extends E> elements) {
28 | ArrayList<E> list = newArrayList();
29 | addAll(list, elements);
30 | return list;
31 | }
32 |
33 | public static <T> boolean addAll(Collection<T> addTo, Iterator<? extends T> iterator) {
34 | boolean wasModified = false;
35 | while (iterator.hasNext()) {
36 | wasModified |= addTo.add(iterator.next());
37 | }
38 | return wasModified;
39 | }
40 |
41 | static <T> Collection<T> cast(Iterable<T> iterable) {
42 | return (Collection<T>) iterable;
43 | }
44 |
45 | static int computeArrayListCapacity(int arraySize) {
46 | return saturatedCast(5L + arraySize + (arraySize / 10));
47 | }
48 |
49 | public static int saturatedCast(long value) {
50 | if (value > Integer.MAX_VALUE) {
51 | return Integer.MAX_VALUE;
52 | }
53 | if (value < Integer.MIN_VALUE) {
54 | return Integer.MIN_VALUE;
55 | }
56 | return (int) value;
57 | }
58 | }
59 |
--------------------------------------------------------------------------------
/java/dataframe/TableBuilder.java:
--------------------------------------------------------------------------------
1 | package dataframe;
2 |
3 | import org.apache.commons.lang3.StringUtils;
4 |
5 | import java.util.ArrayList;
6 | import java.util.LinkedList;
7 | import java.util.List;
8 |
9 | public class TableBuilder {
10 |
11 | private final String[] header;
12 |
13 | private List<String[]> rows;
14 |
15 | static final String COLUMN_SEPARATOR = " ";
16 |
17 |
18 | public TableBuilder(String indexName, Iterable columns) {
19 |
20 | List<String> names = Lists.newArrayList();
21 |
22 | names.add(formatObject(indexName));
23 |
24 | for (Object col : columns) {
25 | names.add(formatObject(col));
26 | }
27 |
28 | String[] header = new String[names.size()];
29 | header = names.toArray(header);
30 | this.header = header;
31 |
32 | rows = new LinkedList<String[]>();
33 | }
34 |
35 | public TableBuilder(Iterable columns) {
36 | this("idx", columns);
37 | }
38 |
39 | public TableBuilder addRow(Object idx, List row) {
40 |
41 | assert row.size() == this.header.length-1;
42 |
43 | List<String> formattedRow = new ArrayList<String>();
44 |
45 | formattedRow.add(formatObject(idx));
46 |
47 | for (Object item: row) {
48 | formattedRow.add(formatObject(item));
49 | }
50 |
51 | String[] cols = new String[row.size()];
52 | cols = formattedRow.toArray(cols);
53 |
54 | rows.add(cols);
55 |
56 | return this;
57 | }
58 |
59 | public static String formatObject(Object o) {
60 | if (o == null) {
61 | return "nil";
62 | } else {
63 | return o.toString();
64 | }
65 | }
66 |
67 | private int totalWidth() {
68 | int total = 0;
69 | for (int w: colWidths()) {
70 | total += w + COLUMN_SEPARATOR.length();
71 | }
72 | return total;
73 | }
74 |
75 | private int[] colWidths() {
76 |
77 | int numCols = header.length;
78 |
79 | int[] widths = new int[numCols];
80 |
81 | for(int colNum = 0; colNum < header.length; colNum++) {
82 | widths[colNum] = header[colNum].length();
83 | }
84 |
85 | for(String[] row : rows) {
86 | for(int colNum = 0; colNum < row.length; colNum++) {
87 | widths[colNum] = Math.max(widths[colNum], StringUtils.length(row[colNum]));
88 | }
89 | }
90 |
91 | return widths;
92 | }
93 |
94 | static void addLine(StringBuilder buf,
95 | String[] line,
96 | int[] colWidths) {
97 | for(int colNum = 0; colNum < line.length; colNum++) {
98 | buf.append(
99 | StringUtils.leftPad(
100 | StringUtils.defaultString(
101 | line[colNum]), colWidths[colNum]));
102 | buf.append(COLUMN_SEPARATOR);
103 | }
104 |
105 | buf.append('\n');
106 | }
107 |
108 | @Override
109 | public String toString() {
110 |
111 | StringBuilder buf = new StringBuilder();
112 |
113 | int[] colWidths = colWidths();
114 |
115 | addLine(buf, this.header, colWidths);
116 |
117 | for(String[] row: this.rows) {
118 | addLine(buf, row, colWidths);
119 | }
120 |
121 | return buf.toString();
122 | }
123 | }
124 |
125 |
--------------------------------------------------------------------------------
/profiles.clj:
--------------------------------------------------------------------------------
1 | {:dev {:dependencies [[expectations "2.1.9"]]
2 |
3 | :plugins [[jonase/eastwood "0.2.3"]
4 | [lein-cljfmt "0.5.6" :exclusions [org.clojure/clojure]]
5 | [lein-expectations "0.0.8" :exclusions [org.clojure/clojure]]]
6 |
7 | ; Generate docs
8 | :codox {:output-path "resources/codox"
9 | :metadata {:doc/format :markdown}
10 | :source-uri "http://github.com/lendup/citadel/blob/master/{filepath}#L{line}"}
11 |
12 | :eastwood {:exclude-namespaces [:test-paths]}
13 |
14 | ; Format code
15 | :cljfmt {:indents
16 | {require [[:block 0]]
17 | ns [[:block 0]]
18 | #"^(?!:require|:import).*" [[:inner 0]]}}}
19 | }
20 |
--------------------------------------------------------------------------------
/project.clj:
--------------------------------------------------------------------------------
1 | (defproject dataframe "0.1.0-SNAPSHOT"
2 |
3 | :description "DataFrames in clojure"
4 |
5 | :url "http://example.com/FIXME"
6 |
7 | :license {:name "MIT License"
8 | :url "http://www.opensource.org/licenses/mit-license.php"}
9 |
10 | :dependencies [
11 | [org.clojure/clojure "1.8.0"]
12 | [net.mikera/core.matrix "0.54.0"]
13 | [net.mikera/vectorz-clj "0.45.0"]
14 |
15 | [org.apache.commons/commons-lang3 "3.0"]
16 |
17 | [expectations "2.1.8"]
18 |
19 | ]
20 |
21 | :source-paths ["src"]
22 |
23 | :java-source-paths ["java"]
24 |
25 | :plugins [[lein-expectations "0.0.7"]]
26 |
27 | )
28 |
--------------------------------------------------------------------------------
/src/dataframe/core.clj:
--------------------------------------------------------------------------------
1 | (ns dataframe.core
2 | (:refer-clojure :exclude [group-by])
3 | (:require [dataframe.series]
4 | [dataframe.frame])
5 | (:import (dataframe.series Series)
6 | (dataframe.frame Frame)))
7 |
8 |
9 | ; Multi Methods
10 |
11 | (defn delegate
12 | "Delegate the implementation of a multimethod to an existing function"
13 | [multifn dispatch-val f]
14 | (.. multifn (addMethod dispatch-val f)))
15 |
16 | (defn first-type
17 | [& args]
18 | (type (first args)))
19 |
20 | (defmulti ix first-type)
21 | (delegate ix Series dataframe.series/ix)
22 | (delegate ix Frame dataframe.frame/ix)
23 |
24 | (defmulti index first-type)
25 | (delegate index Series dataframe.series/index)
26 | (delegate index Frame dataframe.frame/index)
27 |
28 | (defmulti set-index first-type)
29 | (delegate set-index Series dataframe.series/set-index)
30 | (delegate set-index Frame dataframe.frame/set-index)
31 |
32 | (defmulti select first-type)
33 | (delegate select Series dataframe.series/select)
34 | (delegate select Frame dataframe.frame/select)
35 |
36 | (defmulti loc first-type)
37 | (delegate loc Series dataframe.series/loc)
38 | (delegate loc Frame dataframe.frame/loc)
39 |
40 | (defmulti subset first-type)
41 | (delegate subset Series dataframe.series/subset)
42 | (delegate subset Frame dataframe.frame/subset)
43 |
44 | (defmulti head first-type)
45 | (delegate head Series dataframe.series/head)
46 | (delegate head Frame dataframe.frame/head)
47 |
48 | (defmulti tail first-type)
49 | (delegate tail Series dataframe.series/tail)
50 | (delegate tail Frame dataframe.frame/tail)
51 |
52 |
53 | ; Imported series methods
54 |
55 | (def series dataframe.series/series)
56 | (def series? dataframe.series/series?)
57 | (def values dataframe.series/values)
58 | (def update-key dataframe.series/update-key)
59 | (def mapvals dataframe.series/mapvals)
60 |
61 | (def lt dataframe.series/lt)
62 | (def lte dataframe.series/lte)
63 | (def gt dataframe.series/gt)
64 | (def gte dataframe.series/gte)
65 | (def add dataframe.series/add)
66 | (def sub dataframe.series/sub)
67 | (def mul dataframe.series/mul)
68 | (def div dataframe.series/div)
69 | (def eq dataframe.series/eq)
70 | (def neq dataframe.series/neq)
71 |
72 |
73 | ; Imported frame methods
74 |
75 | (def frame dataframe.frame/frame)
76 | (def col dataframe.frame/col)
77 | (def column-map dataframe.frame/column-map)
78 | (def columns dataframe.frame/columns)
79 | (def assoc-ix dataframe.frame/assoc-ix)
80 | (def assoc-col dataframe.frame/assoc-col)
81 | (def iterrows dataframe.frame/iterrows)
82 | (def maprows->srs dataframe.frame/maprows->srs)
83 | (def maprows->df dataframe.frame/maprows->df)
84 | (def sort-rows dataframe.frame/sort-rows)
85 | (def group-by dataframe.frame/group-by)
86 | (def group-by-fn dataframe.frame/group-by-fn)
87 | (def join dataframe.frame/join)
88 |
89 | (defmacro with-> [& args] `(dataframe.frame/with-> ~@args))
90 |
--------------------------------------------------------------------------------
/src/dataframe/frame.clj:
--------------------------------------------------------------------------------
1 | (ns dataframe.frame
2 | (:refer-clojure :exclude [group-by])
3 | (:require [dataframe.series :as series]
4 | [clojure.string :as str]
5 | [dataframe.series :as series]
6 | [dataframe.util :refer :all]
7 | [clojure.set :as set]
8 | [clojure.core :as core])
9 | (:import (java.util Map)
10 | (dataframe TableBuilder)))
11 |
12 | (declare frame
13 | assoc-ix
14 | assoc-col
15 | iterrows
16 | columns
17 | print-row
18 | rows->vectors
19 | set-index
20 | -seq->frame
21 | -list-of-row-maps->frame
22 | -list-of-index-row-pairs->frame
23 | -map->frame
24 | -map-of-series->frame
25 | -map-of-sequence->frame)
26 |
27 | ; A Frame can be interpreted as:
28 | ; - A Map of index keys to maps of values
29 | ; - A Map of column names to Series as columns
30 | ;
31 | ; A Frame supports
32 | ; - Order 1 lookup of row maps by index key
33 | ; - Order 1 lookup of [index row] pairs by position (nth)
34 | ; - Order 1 lookup of columns by name
35 | ;
36 | ; A Frame does not guarantee column order
37 | ;
38 | ; As viewed as a Clojure IPersistentCollection, it is a
39 | ; collection of [index row] pairs, where a row is a map
40 | ; of [column val] pairs (for the purpose of seq and cons).
41 | ; As viewed as an association, it is a map from index
42 | ; keys to row maps.
43 | (deftype Frame [index column-map]
44 |
45 | java.lang.Object
46 | (equals [this other]
47 | (cond (nil? other) false
48 | (not (= Frame (class other))) false
49 | :else (and
50 | (= (. this index) (. other index))
51 | (= (. this column-map) (. other column-map)))))
52 | (hashCode [this]
53 | (hash [(hash (. this index)) (hash (. this column-map))]))
54 |
55 | java.lang.Iterable
56 | (iterator [this]
57 | (.iterator (iterrows this)))
58 |
59 | clojure.lang.Counted
60 | (count [this] (count index))
61 |
62 | clojure.lang.IPersistentCollection
63 | (seq [this] (if (empty? index)
64 | nil
65 | (iterrows this)))
66 | ; Takes a vector pair of [idx row],
67 | ; where row is a map, and returns a
68 | ; Frame extended by one row.
69 | (cons [this other]
70 | (assert (vector? other))
71 | (assert (= 2 (count other)))
72 | (assert (map? (last other)))
73 | (let [[idx m] other]
74 | (assoc-ix this idx m)))
75 | (empty [this] (Frame. [] {}))
76 | (equiv [this other] (.. this (equals other))))
77 |
78 | ; It has an index for row-wise lookups
79 | (defn frame
80 | "Create a Frame from one of the following inputs:
81 |
82 | - A map of column keys to sequences representing column values
83 | - A map of column keys to Series representing column values
84 | - A sequence of row maps, or of [index-key row-map] pairs, representing rows
85 | "
86 | ([data index] (set-index (frame data) index))
87 | ([data]
88 | (cond
89 | (map? data) (-map->frame data)
90 | (seq? data) (-seq->frame data)
91 | (vector? data) (-seq->frame data)
92 | :else (throw (new Exception "Encountered unexpected type for frame constructor")))))
93 |
94 |
95 | (def empty-frame (Frame. [] {}))
96 |
97 | (defmethod print-method Frame
98 | [df writer]
99 | (.write writer (str (class df) "\n"))
100 | (if (empty? df)
101 | (.write writer "[]")
102 | (.write writer
103 | (let [table (new TableBuilder "" (columns df))]
104 | (doall (for [[idx row] (rows->vectors df)]
105 | (. table (addRow idx row))))
106 | (. table toString)))))
107 |
108 |
109 | (defn ^{:protected true} -map->frame
110 | [^Map data-map]
111 |
112 | ; Ensure all values have the same length
113 | (if (not (empty? data-map))
114 | (assert (apply = (map count (vals data-map)))))
115 |
116 | (let [k->srs (into {}
117 | (for [[k xs] data-map]
118 | (if (series/series? xs)
119 | [k xs]
120 | [k (series/series xs)])))]
121 |
122 | (-map-of-series->frame k->srs)))
123 |
124 | (defn ^{:protected true} -map-of-series->frame
125 | "Takes a map of column keys to Series objects
126 | representing column values.
127 | Return a Frame."
128 | [map-of-srs]
129 |
130 | ; Assert all the indices are aligned
131 | (if (not (empty? map-of-srs))
132 | (assert (apply = (map series/index (vals map-of-srs)))))
133 |
134 | (if (empty? map-of-srs)
135 | (Frame. [] {})
136 |
137 | (let [any-index (series/index (nth (vals map-of-srs) 0))]
138 | (Frame. any-index map-of-srs))))
139 |
140 | (defn ^{:protected true} -seq->frame
141 | "Take a list of either maps
142 | (each representing a row)
143 | or pairs of index->maps.
144 | Return a Frame."
145 | [s]
146 | (if (map? (first s))
147 | (-list-of-row-maps->frame s)
148 | (-list-of-index-row-pairs->frame s)))
149 |
150 | (defn ^{:protected true} -list-of-row-maps->frame
151 | "Take a list of maps (each representing a row
152 | with keys as columns and vals as row values)
153 | and return a Frame"
154 | [row-maps]
155 | (let [index (range (count row-maps))
156 | columns (into #{} (flatten (map keys row-maps)))
157 | col->vec (into {} (for [col columns]
158 | [col (vec (map #(get % col nil) row-maps))]))
159 | col->srs (into {} (for [[col vals] col->vec]
160 | [col (series/series vals index)]))]
161 |
162 | (-map-of-series->frame col->srs)))
163 |
164 | (defn ^{:protected true} -list-of-index-row-pairs->frame
165 | "Take a list of pairs
166 | of index values to row-maps
167 | and return a Frame."
168 | [seq-of-idx->maps]
169 |
170 | (let [index (into [] (map first seq-of-idx->maps))
171 | row-maps (map last seq-of-idx->maps)
172 | columns (into #{} (filter (comp not nil?) (flatten (map keys row-maps))))
173 | col->vec (into {} (for [col columns]
174 | [col (vec (map #(get % col nil) row-maps))]))
175 | col->srs (into {} (for [[col vals] col->vec]
176 | [col (series/series vals index)]))]
177 |
178 | (-map-of-series->frame col->srs)))
179 |
180 | (defn index
181 | [^Frame frame]
182 | (. frame index))
183 |
184 | (defn column-map
185 | [^Frame frame]
186 | (. frame column-map))
187 |
188 | (defn columns
189 | [^Frame frame]
190 | (keys (column-map frame)))
191 |
192 | (defn set-index
193 | [^Frame frame index]
194 | (Frame. index (into {} (for [[col srs] (column-map frame)]
195 | [col (series/set-index srs index)]))))
196 |
197 | (defn assoc-ix
198 | "Takes a key of the index type and map
199 | of column names to values and return a
200 | frame with a new row added corresponding
201 | to the input index and column map."
202 | [^Frame df i row-map]
203 |
204 | (assert (map? row-map))
205 |
206 | (let [new-columns (into {}
207 | (for [[k srs] (column-map df)]
208 | [k (conj srs [i (get row-map k nil)])]))
209 | new-index (conj (index df) i)]
210 | (frame new-columns new-index)))
211 |
212 | (defn assoc-col
213 | "Takes a column name and a column
214 | (a Series or a sequence of values)
215 | and returns a frame with that column
216 | added or replaced."
217 | [^Frame df col-name col]
218 |
219 | (let [col (if (series/series? col)
220 | col
221 | (series/series col (index df)))]
222 | (frame (assoc (column-map df) col-name col)
223 | (index df))))
224 |
225 |
226 | (defn ^{:protected true} print-row
227 | [row]
228 | (str/join
229 | \tab
230 | (into []
231 | (map #(if (nil? %) "nil" %) row))))
232 |
233 |
234 | (defn ix
235 | "Get the 'row' of the input dataframe
236 | corresponding to the input index.
237 |
238 | The 'row' is a Series corresponding to the
239 | input index applied to every column
240 | in this dataframe, where the index of
241 | the new series are the column names.
242 |
243 | If no row matching the index exists,
244 | return nil
245 | "
246 | [df i]
247 | (if (some #(= i %) (index df))
248 | (series/series (map #(series/ix % i) (-> df column-map vals)) (-> df column-map keys))
249 | nil))
250 |
251 | (defn col
252 | "Return the column from the dataframe
253 | by the given name as a Series"
254 | [df col-name]
255 | (get (column-map df) col-name))
256 |
257 | (defn rows->vectors
258 | "Return an iterator over key-val pairs
259 | of index values to row values (as a vector)"
260 | [df]
261 | (zip
262 | (index df)
263 | (apply zip (map series/values (vals (column-map df))))))
264 |
265 | (defn iterrows
266 | "Return an iterator over vectors
267 | of key-val pairs of the row's
268 | index value and the value of that
269 | row as a map"
270 | [df]
271 | (for [idx (index df)]
272 | [idx (into {} (for [[col srs] (column-map df)]
273 | [col (series/ix srs idx)]))]))
274 |
275 | (defn maprows->srs
276 | "Apply the function to each row in the DataFrame
277 | (where the representation of each row is a map of
278 | column names to values).
279 | Return a Series whose index is the index of the
280 | original DataFrame and whose value is the value
281 | of the applied function."
282 | [^Frame df f]
283 | (let [rows (for [[_ row] (iterrows df)]
284 | (f row))]
285 | (series/series rows (index df))))
286 |
287 | (defn maprows->df
288 | "Apply the function to each row in the DataFrame
289 | (where the representation of each row is a map of
290 | column names to values). The function should return
291 | a Map.
292 | Return a DataFrame whose index is the same as the
293 | original dataframe and whose columns are the keys
294 | of the maps returned by the function."
295 | [^Frame df f]
296 | (let [rows (for [[idx row] (iterrows df)]
297 | [idx (f row)])]
298 | (-list-of-index-row-pairs->frame rows)))
299 |
300 |
301 |
302 | (defn indices-alignable?
303 | [idx-left idx-right]
304 | (= (sort idx-left) (sort idx-right)))
305 |
306 | (defn loc
307 | "Take a Frame and a list of indices.
308 | Return a DataFrame consisting only of
309 | the input index rows (in the order of
310 | the given index).
311 | If an entry in indices is not in the
312 | input Frame, then each column will be nil."
313 | [^Frame df indices]
314 |
315 | (if (empty? indices)
316 | empty-frame
317 | (-list-of-index-row-pairs->frame
318 | (into [] (for [i indices]
319 | [i (if-let [row (ix df i)] (series/->map row) {})])))))
320 |
321 | (defn select
322 | [^Frame df sel]
323 |
324 | (let [sel (if (series/series? sel) sel (series/series sel))]
325 |
326 | (assert (indices-alignable? (index df) (series/index sel)))
327 |
328 | (let [to-keep (for [[[idx keep?] [idx row-map]] (zip sel (loc df (series/index sel)))
329 | :when keep?]
330 | [idx row-map])
331 | idx (map first to-keep)
332 | vals (map last to-keep)]
333 | (frame vals idx))))
334 |
335 |
336 | (defn subset
337 | "Return a subset of the input Frame defined by
338 | the start and end positions (which are
339 | integer like) using the index order.
340 |
341 | The subset is inclusive on the start
342 | but exclusive on the end, meaning that
343 | (subset df 0 (count df)) returns the
344 | same frame"
345 | [^Frame df start end]
346 |
347 | (assert (<= start end))
348 |
349 | (let [last (count df)
350 | srs-begin (min (max 0 start) last)
351 | srs-end (min (max 0 end) last)
352 | subset-index (subvec (index df) srs-begin srs-end)
353 | subset-columns (into {} (for [[name col] (column-map df)]
354 | [name (series/subset col start end)]))]
355 | (frame subset-columns subset-index)))
356 |
357 | (defn head
358 | "Return a subframe consisting of the
359 | first n elements of the input frame
360 | using the index order.
361 |
362 | If n > (count df), return the
363 | whole frame."
364 | ([^Frame df] (head df 5))
365 | ([^Frame df n] (subset df 0 n)))
366 |
367 | (defn tail
368 | "Return a subframe consisting of the
369 | last n elements of the input frame
370 | using the index order.
371 |
372 | If n > (count df), return the
373 | whole frame."
374 | ([^Frame df] (tail df 5))
375 | ([^Frame df n]
376 | (let [start (- (count df) n)
377 | end (count df)]
378 | (subset df start end))))
379 |
380 | (defn sort-rows
381 | "Sort DataFrame rows using the
382 | given column names in the order
383 | that they appear"
384 | [^Frame df & col-names]
385 | (let [get-sort-key (fn [[idx row-map]]
386 | (into [] (for [col col-names] (get row-map col))))
387 | sorted-idx-row-pairs (sort-by get-sort-key df)]
388 | (-list-of-index-row-pairs->frame sorted-idx-row-pairs)))
389 |
390 |
391 | (defn add-suffix
392 | "Add a suffix to a name.
393 | A name or suffix can be either
394 | a string or a keyword"
395 | [col-name suffix]
396 | (cond
397 | (string? col-name) (str col-name (name suffix))
398 | (keyword? col-name) (keyword (str (name col-name) (name suffix)))
399 | :else (throw (new Exception))))
400 |
401 |
402 | (defn assoc-common-column
403 | "Conditionally associate the input
404 | column and value pair into the input
405 | map. The name associated in, and whether
406 | the association happens at all, depends
407 | on the common-col-resolution map and on
408 | whether the column comes from the left frame (left?)."
409 | [into-map col-name val left? common-columns
410 | {:keys [suffixes prefer-column]
411 | :or {suffixes ["-x" "-y"]
412 | prefer-column nil}}]
413 |
414 | (assert (contains? #{:left :right nil} prefer-column))
415 |
416 | (if (not (contains? common-columns col-name))
417 | (assoc into-map col-name val)
418 |
419 | (if prefer-column
420 |
421 | (case [prefer-column left?]
422 | [:left true] (assoc into-map col-name val)
423 | [:left false] into-map
424 | [:right false] (assoc into-map col-name val)
425 | [:right true] into-map
426 | (throw (new Exception)))
427 |
428 | (let [[left-suffix right-suffix] suffixes
429 | suffix (if left? left-suffix right-suffix)]
430 |
431 | (assoc into-map (add-suffix col-name suffix) val)))))
432 |
433 |
434 | (defn join-index
435 | [left-index right-index join-type]
436 | (case join-type
437 | :left left-index
438 | :right right-index
439 | :outer (concat left-index (filter #(not (contains? (set left-index) %)) right-index))
440 | :inner (filter #(contains? (set right-index) %) left-index)
441 | (throw (new Exception))))
442 |
443 |
444 | (defn join
445 | [^Frame left ^Frame right
446 | & {:keys [how suffixes prefer-column]
447 | :or {how :inner
448 | suffixes ["-x" "-y"]
449 | prefer-column nil}
450 | :as kwargs}]
451 |
452 | (let [shared-cols (into #{} (set/intersection (set (columns left)) (set (columns right))))
453 | idx (join-index (index left) (index right) how)
454 | left-cols (for [[col srs] (column-map left)] [col (series/series (map #(series/ix srs %) idx) idx) true])
455 | right-cols (for [[col srs] (column-map right)] [col (series/series (map #(series/ix srs %) idx) idx) false])
456 | all-cols (concat left-cols right-cols)
457 | col-map (reduce (fn [coll [col-name val left?]]
458 | (assoc-common-column coll col-name val left? shared-cols kwargs))
459 | {}
460 | all-cols)]
461 | (frame col-map idx)))
462 |
463 |
464 | (defn group-by
465 | [^Frame df vals]
466 | (let [srs (if (series/series? vals) vals (series/series vals))
467 | grouped-vals (core/group-by (fn [[ix val]] val) srs)
468 | val-index-list (into {} (for [[val ix-val-list] grouped-vals]
469 | [val (into [] (map first ix-val-list))]))]
470 | (into {} (for [[val idx-list] val-index-list]
471 | [val (loc df idx-list)]))))
472 |
473 | (defn group-by-fn
474 | "Group a Frame by the given function,
475 | which must be a function of a row-map,
476 | and return a map of fn vals to Frames"
477 | [^Frame df f]
478 | (let [grouped-idx-rows (core/group-by (fn [[idx row]] (f row)) (iterrows df))]
479 | (into {} (for [[k idx-row-list] grouped-idx-rows]
480 | [k (frame idx-row-list)]))))
481 |
482 | (defn replace-$-with-keys
483 | "Takes a context (typically a map or a Frame),
484 | an expression (containing '$' values)
485 | and a getter function (typically core/get
486 | or frame/col).
487 | Return an expression where each instance of
488 | a $var is replaced with the getter-function
489 | getting :var from the input context.
490 |
491 | In other words, if the expression contains:
492 |
493 | $foo
494 |
495 | it is replaced by:
496 |
497 | (get ctx :foo)
498 | "
499 | [ctx expr get-fn]
500 | (clojure.walk/postwalk
501 | (fn [x]
502 | (if (and
503 | (symbol? x)
504 | (clojure.string/starts-with? (name x) "$"))
505 | `(~get-fn ~ctx ~(keyword (subs (name x) 1)))
506 | x))
507 | expr))
508 |
509 | (defmacro with->
510 | "A threading macro intended to thread
511 | expressions on data frames.
512 | Automatically replaces symbols starting
513 | with '$' with columns from the last
514 | DataFrame that was encountered
515 | in the threading."
516 | [df & exprs]
517 | (if (empty? exprs)
518 | df
519 | (let [sym (gensym)
520 | head (replace-$-with-keys df (first exprs) 'dataframe.frame/col)
521 | tail (rest exprs)]
522 | `(let [~sym (-> ~df ~head)] (with-> ~sym ~@tail)))))
523 |
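The `$`-substitution performed by `replace-$-with-keys` can be tried in isolation. The following is a minimal standalone sketch, not the library's own code: `replace-$` is a hypothetical name, it builds the replacement form with `list` instead of syntax-quote, and it assumes Clojure 1.8 or later for `clojure.string/starts-with?`.

```clojure
(require '[clojure.walk :as walk]
         '[clojure.string :as string])

;; Hypothetical standalone variant of replace-$-with-keys.
;; Every symbol whose name starts with "$" becomes the form
;; (get-fn ctx :name), where :name is the symbol name without the "$".
(defn replace-$
  [ctx expr get-fn]
  (walk/postwalk
    (fn [x]
      (if (and (symbol? x)
               (string/starts-with? (name x) "$"))
        (list get-fn ctx (keyword (subs (name x) 1)))
        x))
    expr))

;; The rewritten form is plain data until it is evaluated.
(def rewritten (replace-$ {:b 10} '(+ 5 $b) 'get))
;; rewritten is the list (+ 5 (get {:b 10} :b))

(def result (eval rewritten))
;; result is 15
```

Because the rewrite happens on unevaluated forms, `with->` can apply it at macro-expansion time to each threaded expression before the frame itself exists.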
--------------------------------------------------------------------------------
/src/dataframe/series.clj:
--------------------------------------------------------------------------------
1 | (ns dataframe.series
2 | (:refer-clojure)
3 | (:require [dataframe.util :refer :all]
4 | [clojure.string :as str])
5 | (:import (clojure.lang IPersistentVector IPersistentMap MapEntry)))
6 |
7 | (declare series
8 | update-key)
9 |
10 | ; A Series is a data structure that maps an index
11 | ; A Series is a data structure that maps an index
12 | ; to values. It supports:
13 | ; - Order 1 access to values by index
14 | ; - Order 1 access to [index value] pairs by position (nth)
15 | ; - Maintaining the order of [index value] pairs for iteration
16 | ;
17 | ; As viewed as a Clojure Persistent collection, it is a collection
18 | ; of [index value] pairs.
19 | ; It is also Associative between the index keys and its values
20 | (deftype Series [^IPersistentVector values
21 | ^IPersistentVector index
22 | ^IPersistentMap lookup]
23 |
24 | java.lang.Object
25 | (equals [this other]
26 | (cond (nil? other) false
27 | (not (= Series (class other))) false
28 | :else (every? true?
29 | [(= (. this values) (. other values))
30 | (= (. this index) (. other index))])))
31 | (hashCode [this]
32 | (hash [(hash (. this index)) (hash (. this values))]))
33 |
34 | java.lang.Iterable
35 | (iterator [this]
36 | (.iterator (zip index values)))
37 |
38 | clojure.lang.Counted
39 | (count [this] (count index))
40 |
41 | clojure.lang.IPersistentCollection
42 | (seq [this] (if (empty? index)
43 | nil
44 | (zip (. this index) (. this values))))
45 | ; Return a sequence of key-val pairs
46 | (cons [this other]
47 | (assert (vector? other))
48 | (assert (= 2 (count other)))
49 | (assoc this (first other) (last other)))
50 | ;(cons (.iterator this) other))
51 | (empty [this] (Series. [] [] {}))
52 | (equiv [this other] (.. this (equals other)))
53 |
54 | clojure.lang.ILookup
55 | (valAt [this i] (.. this (valAt i nil)))
56 | (valAt [this i or-else] (if-let [n (get (. this lookup) i)]
57 | (nth (. this values) n)
58 | or-else))
59 |
60 | clojure.lang.Associative
61 | (containsKey [this key]
62 | (contains? lookup key))
63 | (entryAt [this key]
64 | (MapEntry/create key (.. this (valAt key))))
65 | ; Takes a key of the index type and a value
66 | ; and returns a series in which that key maps
67 | ; to the value: replaced in place if the key
68 | ; already exists, appended otherwise.
69 | (assoc [this idx val]
70 | (if (contains? this idx)
71 | (series (assoc values (get lookup idx) val)
72 | index)
73 | (series (conj values val) (conj index idx)))))
74 |
75 | ; Constructor
76 | (defn series
77 |
78 | ([data] (series data (range (count data))))
79 |
80 | ([data index]
81 |
82 | (let [data (->vector data)
83 | index (->vector index)
84 | lookup (into {} (enumerate index false))]
85 |
86 | (assert (or (empty? index) (apply distinct? index)))
87 | (assert (= (count data) (count index)))
88 | (if (not (every? nil? data))
89 | (assert (apply = (map type (filter (comp not nil?) data)))))
90 |
91 | (Series. data index lookup))))
92 |
93 | (defmethod print-method Series [^Series srs writer]
94 | (.write writer (str (class srs)
95 | "\n"
96 | (str/join "\n"
97 | (map
98 | (fn [[i d]]
99 | (str i " " (if (nil? d) "nil" d)))
100 | (zip (. srs index) (. srs values)))))))
101 |
102 | (defn series?
103 | [x]
104 | (instance? Series x))
105 |
106 | (defn index
107 | [^Series srs]
108 | (. srs index))
109 |
110 | (defn values
111 | [^Series srs]
112 | (. srs values))
113 |
114 | (defn ix
115 | "Takes a series and an index and returns
116 | the item in the series corresponding
117 | to the input index"
118 | ([^Series srs i] (get srs i nil))
119 | ([^Series srs i or-else] (get srs i or-else)))
120 |
121 | (defn loc
122 | "Take a Series and a list of indices.
123 | Return a Series consisting only of
124 | the input index rows (in the order of
125 | the given index).
126 | If an entry in indices is not in the
127 | input Series, then its value will be nil"
128 | [^Series srs indices]
129 | (if (empty? indices)
130 | (series [])
131 | (series
132 | (for [i indices] (ix srs i))
133 | indices)))
134 |
135 |
136 | (defn set-index
137 | "Return a series with the same values
138 | but with the updated index."
139 | [^Series srs index]
140 | (series (values srs)
141 | (->vector index)))
142 |
143 | (defn mapvals
144 | "Apply the function to all vals in the Series,
145 | returning a new Series consisting of these
146 | transformed vals with their indices."
147 | [^Series srs f]
148 | (series (map f (values srs)) (index srs)))
149 |
150 | (defn select
151 | "Takes a series and a list of possibly-true values
152 | and return a series containing only vals that
153 | line up to truthy values"
154 | [^Series srs selection]
155 |
156 | (assert (= (count srs) (count selection)))
157 |
158 | (let [selection (if (series? selection) (values selection) selection)
159 | to-keep (for [[keep? [idx val]] (zip selection srs)
160 | :when keep?]
161 | [idx val])
162 | idx (map #(nth % 0) to-keep)
163 | vals (map #(nth % 1) to-keep)]
164 |
165 | (series vals idx)))
166 |
167 | (defn subset
168 | "Return a subseries defined by
169 | the start and end indices (which are
170 | integer like) using the index order.
171 |
172 | The subset is inclusive on the start
173 | but exclusive on the end, meaning that
174 | (subset srs 0 (count srs)) returns the
175 | same series"
176 | [^Series srs start end]
177 |
178 | (assert (<= start end))
179 |
180 | (let [last (count srs)
181 | srs-begin (min (max 0 start) last)
182 | srs-end (min (max 0 end) last)]
183 | (series
184 | (subvec (values srs) srs-begin srs-end)
185 | (subvec (index srs) srs-begin srs-end))))
186 |
187 | (defn head
188 | "Return a subseries consisting of the
189 | first n elements of the input series
190 | using the index order.
191 |
192 | If n > (count srs), return the
193 | whole series."
194 | ([^Series srs] (head srs 5))
195 | ([^Series srs n] (subset srs 0 n)))
196 |
197 | (defn tail
198 | "Return a subseries consisting of the
199 | last n elements of the input series
200 | using the index order.
201 |
202 | If n > (count srs), return the
203 | whole series."
204 | ([^Series srs] (tail srs 5))
205 | ([^Series srs n]
206 | (let [start (- (count srs) n)
207 | end (count srs)]
208 | (subset srs start end))))
209 |
210 |
211 | (defn ->map
212 | [^Series srs]
213 | (into {} srs))
214 |
215 | (defn index-aligned-pairs
216 | "Take two series and return
217 | a joined index and a sequence
218 | over pairs of the left and right
219 | series"
220 | [^Series left ^Series right]
221 |
222 | (if (= (index left) (index right))
223 |
224 | [(index left) (zip (values left) (values right))]
225 |
226 | (let [left-idx (index left)
227 | right-only-idx (->> right index (filter #(not (contains? left %))))
228 | idx (concat left-idx right-only-idx)
229 | vals (for [i idx] [(ix left i) (ix right i)])]
230 | [idx vals])))
231 |
232 | (defn join-map
233 | "Takes a function of two arguments and
234 | applies it to the pairs in the outer join of the
235 | two input series, returning a new Series."
236 | [f ^Series x ^Series y]
237 | (let [[idx pairs] (index-aligned-pairs x y)
238 | vals (for [[l r] pairs] (f l r))]
239 | (series vals idx)))
240 |
241 | (defn ^{:protected true} broadcast
242 | "Take a binary function and turn it into
243 | a broadcast function so that it can
244 | operate on Series in any of its arguments"
245 | [f]
246 | (fn [x y]
247 | (cond
248 | (and (instance? Series x) (instance? Series y)) (join-map f x y)
249 | (instance? Series x) (series (for [l (values x)]
250 | (f l y))
251 | (index x))
252 | (instance? Series y) (series (for [r (values y)]
253 | (f x r))
254 | (index y))
255 | :else (f x y))))
256 |
257 | (defn ^{:protected true} multi-broadcast
258 | "Take a function of any arity and turn it into
259 | a broadcast function so that it can
260 | operate on Series in any of its arguments"
261 | [f]
262 | (fn [x & args]
263 | (loop [x x
264 | args args]
265 | (if (empty? args)
266 | x
267 | (recur ((broadcast f) x (first args))
268 | (rest args))))))
269 |
270 | (def lt (broadcast (nillify <)))
271 | (def lte (broadcast (nillify <=)))
272 | (def gt (broadcast (nillify >)))
273 | (def gte (broadcast (nillify >=)))
274 |
275 | (def add (multi-broadcast (nillify +)))
276 | (def sub (multi-broadcast (nillify -)))
277 | (def mul (multi-broadcast (nillify *)))
278 | (def div (multi-broadcast (nillify /)))
279 |
280 | (def eq (multi-broadcast (nillify =)))
281 | (def neq (multi-broadcast (comp (nillify not) (nillify =))))
282 |
283 |
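The dispatch performed by `broadcast` can be illustrated without the `Series` type. This is a rough sketch under a loud assumption: plain Clojure maps stand in for Series, so index order is not preserved the way the real type preserves it. `broadcast-sketch` and `add2` are hypothetical names used only here.

```clojure
;; Same shape as util/nillify: return nil if any argument is nil.
(defn nillify
  [f]
  (fn [& args]
    (if (some nil? args) nil (apply f args))))

;; Hypothetical stand-in for series/broadcast, with maps playing
;; the role of Series (an assumption made for illustration only).
(defn broadcast-sketch
  [f]
  (fn [x y]
    (cond
      ;; Both sides keyed: align on the union of keys (outer join).
      (and (map? x) (map? y))
      (into {} (for [k (distinct (concat (keys x) (keys y)))]
                 [k (f (get x k) (get y k))]))
      ;; One side keyed: apply the scalar to every value.
      (map? x) (into {} (for [[k v] x] [k (f v y)]))
      (map? y) (into {} (for [[k v] y] [k (f x v)]))
      ;; Two scalars: call the function directly.
      :else (f x y))))

(def add2 (broadcast-sketch (nillify +)))

(def scalar-case (add2 {:a 1 :b 2} 10))
;; scalar-case is {:a 11, :b 12}

(def aligned-case (add2 {:a 1 :b 2} {:b 10 :c 20}))
;; aligned-case is {:a nil, :b 12, :c nil}
```

Keys present on only one side yield nil rather than an exception, which is the reason the arithmetic operators above are wrapped in `nillify` before being broadcast.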
--------------------------------------------------------------------------------
/src/dataframe/util.clj:
--------------------------------------------------------------------------------
1 | (ns dataframe.util)
2 |
3 | (defn zip
4 | "Take a number of iterables and
5 | return a single iterable over vectors,
6 | each containing the ith element of each
7 | input iterable (ordered by the order of
8 | the input iterables).
9 | If the input iterables are not of the same
10 | length, the returned iterable is as long as
11 | the shortest input iterable (data from
12 | longer input iterables will not be
13 | returned).
14 | "
15 | [& args]
16 | (apply map vector args))
17 |
18 | (defn enumerate
19 | ([xs] (enumerate xs true))
20 | ([xs index-first?]
21 | (if index-first?
22 | (zip (range) xs)
23 | (zip xs (range)))))
24 |
25 | (defn ->vector
26 | [x]
27 | (if (vector? x)
28 | x
29 | (vec x)))
30 |
31 | (defn nillify
32 | "Takes a function of any arity and returns
33 | a function that returns nil if any argument is nil."
34 | [f]
35 | (fn [& args]
36 | (if (some nil? args)
37 | nil
38 | (apply f args))))
39 |
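A short REPL-style sketch of how `zip` and `nillify` behave. The two definitions are copied here so the snippet runs without the library on the classpath. The names `zipped`, `sum-ok` and `sum-nil` are illustrative only.

```clojure
;; Standalone copies of the helpers defined in dataframe.util.
(defn zip
  [& args]
  (apply map vector args))

(defn nillify
  [f]
  (fn [& args]
    (if (some nil? args)
      nil
      (apply f args))))

;; zip stops at the shortest input, so :c has no partner and is dropped.
(def zipped (zip [:a :b :c] [1 2]))
;; zipped is ([:a 1] [:b 2])

;; nillify passes the call through when no argument is nil ...
(def sum-ok ((nillify +) 1 2))
;; sum-ok is 3

;; ... and returns nil instead of throwing when any argument is nil.
(def sum-nil ((nillify +) 1 nil))
;; sum-nil is nil
```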
--------------------------------------------------------------------------------
/test/dataframe/core_test.clj:
--------------------------------------------------------------------------------
1 | (ns dataframe.core-test
2 | (:refer-clojure :exclude [group-by])
3 | (:require [clojure.test :refer :all]
4 | [dataframe.core :refer :all]
5 | [dataframe.frame :as frame]))
6 |
7 | ;
8 | ;
9 | ;(expect (more-of df
10 | ;
11 | ; )
12 | ;
13 | ; (let [df (frame/frame {:a '(1 2 3) :b '(2 4 6)})
14 | ; a-min (with-df-> f (frame/filter :a (< 10)
--------------------------------------------------------------------------------
/test/dataframe/frame_test.clj:
--------------------------------------------------------------------------------
1 | (ns dataframe.frame-test
2 | (:require [dataframe.frame :as frame :refer [index]]
3 | [expectations :refer [expect expect-focused more-of]]
4 | [dataframe.series :as series]
5 | [clojure.core :as core]))
6 |
7 | ; Constructors
8 |
9 | (expect (frame/frame {:a '(1 2 3) :b '(2 4 6)} [:x :y :z])
10 | (frame/frame [[:x {:a 1 :b 2}]
11 | [:y {:a 2 :b 4}]
12 | [:z {:a 3 :b 6}]]))
13 |
14 | (expect '(0 1 2)
15 | (let [df (frame/frame {:a '(1 2 3) :b '(2 4 6)})]
16 | (index df)))
17 |
18 | (expect (series/series '(1 2 3) '(0 1 2))
19 | (-> (frame/frame {:a '(1 2 3) :b '(2 4 6)})
20 | (frame/col :a)))
21 |
22 | (expect nil
23 | (-> (frame/frame {:a '(1 2 3) :b '(2 4 6)})
24 | (frame/col :c)))
25 |
26 | (expect (series/series [1 2] [:a :b])
27 | (-> (frame/frame {:a '(1 2 3) :b '(2 4 6)} [:x :y :z])
28 | (frame/ix :x)))
29 |
30 | (expect '([:x {:a 1 :b 2}]
31 | [:y {:a 2 :b 4}]
32 | [:z {:a 3 :b 6}])
33 | (-> (frame/frame {:a '(1 2 3) :b '(2 4 6)} [:x :y :z])
34 | frame/iterrows))
35 |
36 | ; Assert the iterator over a dataframe
37 | ; iterates over [index row-map] pairs
38 | (expect '([:x {:a 1 :b 2}]
39 | [:y {:a 2 :b 4}]
40 | [:z {:a 3 :b 6}])
41 | (for [x (frame/frame {:a '(1 2 3) :b '(2 4 6)} [:x :y :z])]
42 | x))
43 |
44 | (expect (frame/frame [[:x {:a 1 :b 2}]
45 | [:y {:a 2 :b 4}]
46 | [:z {:a 3 :b 6}]])
47 | (conj
48 | (frame/frame {:a '(1 2) :b '(2 4)} [:x :y])
49 | [:z {:a 3 :b 6}]))
50 |
51 | (expect false
52 | (empty? (frame/frame {:a '(1 2) :b '(2 4)} [:x :y])))
53 |
54 | (expect true
55 | (empty? (frame/frame {} [])))
56 |
57 | (expect (series/series [3 6 9] [:x :y :z])
58 | (frame/maprows->srs
59 | (frame/frame {:a '(1 2 3) :b '(2 4 6)} [:x :y :z])
60 | (fn [row] (+ (:a row) (:b row)))))
61 |
62 | (expect (frame/frame {:bar [-1 -2 -3] :foo [3 6 9]} [:x :y :z])
63 | (frame/maprows->df
64 | (frame/frame {:a '(1 2 3) :b '(2 4 6)} [:x :y :z])
65 | (fn [row] {:foo (+ (:a row) (:b row)) :bar (- (:a row) (:b row))})))
66 |
67 | (expect (frame/frame {:a [1 2 3] :b [4 5 6]} [:x :y :z])
68 | (frame/frame
69 | [{:a 1 :b 4} {:a 2 :b 5} {:a 3 :b 6}]
70 | [:x :y :z]))
71 |
72 | (expect (frame/frame [{:a 2 :b 6} {:a 4 :b 8}] [:x :z])
73 | (frame/select
74 | (frame/frame
75 | {:a [1 2 3 4] :b [5 6 7 8]}
76 | [:w :x :y :z])
77 | (series/series [false true nil true] [:w :x :y :z])))
78 |
79 |
80 | (expect (frame/frame {:a [1 3] :b [20 30]} [1 2])
81 | (frame/loc
82 | (frame/frame {:a [1 1 3] :b [10 20 30]}) [1 2]))
83 |
84 | (expect (frame/frame {:a [1 3 nil nil] :b [20 30 nil nil]} [1 2 10 20])
85 | (frame/loc
86 | (frame/frame {:a [1 1 3] :b [10 20 30]}) [1 2 10 20]))
87 |
88 | ;(expect (series/series [15])
89 | ; (frame/with-context
90 | ; (frame/frame [{:b 10}])
91 | ; (series/add 5 $b)))
92 |
93 |
94 | (expect '(+ 5 (core/get {:b 10} :b))
95 | (frame/replace-$-with-keys {:b 10} '(+ 5 $b) 'core/get))
96 |
97 | (expect 15
98 | (eval (frame/replace-$-with-keys {:b 10} '(+ 5 $b) get)))
99 |
100 | (expect 15
101 | (frame/with-> 12 (+ 5) (- 2)))
102 |
103 | (expect (frame/frame {:a [1 2] :z [1 2]})
104 | (frame/with-> (frame/frame {:a [1 2]}) (frame/assoc-col :z $a)))
105 |
106 | (expect 20
107 | (frame/with-> {:x {:y 20}} :x :y))
108 |
109 | (expect (frame/frame [{:a 3 :b 300}] [2])
110 |
111 | (let [df (frame/frame {:a [1 2 3] :b [100 200 300]})]
112 | (frame/with-> df (frame/select (series/gt $a 2)))))
113 |
114 | (expect (frame/frame {:a [1 2 3] :b [100 200 300] :c [10 20 30]})
115 | (let [df (frame/frame {:a [1 2] :b [100 200] :c [10 20]})]
116 | (frame/assoc-ix df 2 {:a 3 :b 300 :c 30})))
117 |
118 | (expect (frame/frame {:a [1 2] :b [100 200] :c [10 20] :d [5 10]})
119 | (let [df (frame/frame {:a [1 2] :b [100 200] :c [10 20]})]
120 | (frame/assoc-col df :d [5 10])))
121 |
122 | (expect (frame/frame {:a [1] :b [2]} [:x])
123 | (frame/head
124 | (frame/frame [[:x {:a 1 :b 2}]
125 | [:y {:a 2 :b 4}]
126 | [:z {:a 3 :b 6}]])
127 | 1))
128 |
129 | (expect 3
130 | (count (frame/frame {:a '(1 2 3) :b '(2 4 6)})))
131 |
132 | (expect true
133 | (= (frame/frame {:a '(1 2 3) :b '(2 4 6)})
134 | (frame/frame {:a '(1 2 3) :b '(2 4 6)})))
135 |
136 | (expect true
137 | (= (frame/frame {:b '(2 4 6) :a '(1 2 3)})
138 | (frame/frame {:a '(1 2 3) :b '(2 4 6)})))
139 |
140 | (expect false
141 | (= (frame/frame {:a '(1 2 5) :b '(2 4 6)})
142 | (frame/frame {:a '(1 2 3) :b '(2 4 6)})))
143 |
144 | (expect (frame/frame {:a [2 4 7] :b [4 2 8]} [:y :x :z])
145 | (frame/sort-rows (frame/frame [[:x {:a 4, :b 2}] [:y {:a 2, :b 4}] [:z {:a 7, :b 8}]])
146 | :a))
147 |
148 | (expect (frame/frame {:a [1 2 3]
149 | :b [10 20 30]
150 | :c [1 2 3]
151 | :d [10 20 30]})
152 | (frame/join
153 | (frame/frame {:a [1 2 3] :b [10 20 30]})
154 | (frame/frame {:c [1 2 3] :d [10 20 30]})
155 | :how :outer))
156 |
157 | (expect (frame/frame {:a [1 2 3 nil nil]
158 | :b [10 20 30 nil nil]
159 | :c [nil nil 1 2 3]
160 | :d [nil nil 10 20 30]})
161 | (frame/join
162 | (frame/frame {:a [1 2 3] :b [10 20 30]})
163 | (frame/frame {:c [1 2 3] :d [10 20 30]} [2 3 4])
164 | :how :outer))
165 |
166 |
167 | (expect (frame/frame {:a-y [4 5 6]
168 | :a-x [1 2 3]
169 | :b [10 20 30]
170 | :c [100 200 300]})
171 | (frame/join
172 | (frame/frame {:a [1 2 3] :b [10 20 30]})
173 | (frame/frame {:a [4 5 6] :c [100 200 300]})
174 | :how :outer))
175 |
176 |
177 | (expect (frame/frame {:a-x [3]
178 | :b [30]
179 | :a-y [1]
180 | :d [10]}
181 | [2])
182 | (frame/join
183 | (frame/frame {:a [1 2 3] :b [10 20 30]})
184 | (frame/frame {:a [1 2 3] :d [10 20 30]} [2 3 4])
185 | :how :inner))
186 |
187 | (expect (frame/frame {:a-x [1 2 3]
188 | :b [10 20 30]
189 | :a-y [nil nil 1]
190 | :d [nil nil 10]}
191 | [0 1 2])
192 | (frame/join
193 | (frame/frame {:a [1 2 3] :b [10 20 30]})
194 | (frame/frame {:a [1 2 3] :d [10 20 30]} [2 3 4])
195 | :how :left))
196 |
197 |
198 | ; Test handling of common columns
199 |
200 | (expect "foobar" (frame/add-suffix "foo" "bar"))
201 |
202 | (expect "foobar" (frame/add-suffix "foo" "bar"))
203 |
204 | (expect "foobar" (frame/add-suffix "foo" :bar))
205 |
206 | (expect :foobar (frame/add-suffix :foo :bar))
207 |
208 | (expect :foobar (frame/add-suffix :foo "bar"))
209 |
210 | (expect {:foo 10} (frame/assoc-common-column {} :foo 10 true #{:foo :bar} {:prefer-column :left}))
211 |
212 | (expect {} (frame/assoc-common-column {} :foo 10 true #{:foo :bar} {:prefer-column :right}))
213 |
214 | (expect {:baz 10} (frame/assoc-common-column {} :baz 10 true #{:foo :bar} {:prefer-column :right}))
215 |
216 | (expect {:baz 10} (frame/assoc-common-column {} :baz 10 true #{:foo :bar} {:suffixes ["-left" "-right"]}))
217 |
218 | (expect {:foo-left 10} (frame/assoc-common-column {} :foo 10 true #{:foo :bar} {:suffixes ["-left" "-right"]}))
219 |
220 | (expect {:foo-right 10} (frame/assoc-common-column {} :foo 10 false #{:foo :bar} {:suffixes ["-left" "-right"]}))
221 |
222 |
223 | (expect (more-of grouped
224 | (frame/frame {:a [1 2] :b [10 20]} [0 1]) (:foo grouped)
225 | (frame/frame {:a [3] :b [30]} [2]) (:bar grouped))
226 | (frame/group-by
227 | (frame/frame {:a [1 2 3] :b [10 20 30]})
228 | [:foo :foo :bar]))
229 |
230 | (expect (more-of grouped
231 | (frame/frame {:a [3 2] :b [30 20]} [2 1]) (:foo grouped)
232 | (frame/frame {:a [1] :b [10]} [0]) (:bar grouped))
233 | (frame/group-by
234 | (frame/frame {:a [1 2 3] :b [10 20 30]})
235 | (series/series [:foo :foo :bar] [2 1 0])))
236 |
237 |
238 | (expect (more-of grouped
239 | (frame/frame {:a [1 1] :b [10 20]} [0 1]) (get grouped 1)
240 | (frame/frame {:a [3] :b [30]} [2]) (get grouped 3))
241 | (frame/group-by-fn
242 | (frame/frame {:a [1 1 3] :b [10 20 30]}) :a))
243 |
244 |
245 | (expect (more-of grouped
246 | (frame/frame {:a [1 10] :b [10 1]} [0 2]) (get grouped 11)
247 | (frame/frame {:a [1] :b [5]} [1]) (get grouped 6)
248 | (frame/frame {:a [10] :b [17]} [3]) (get grouped 27))
249 | (frame/with-> (frame/frame {:a [1 1 10 10] :b [10 5 1 17]})
250 | (frame/group-by (series/add $a $b))))
251 |
--------------------------------------------------------------------------------
/test/dataframe/pipeline_test.clj:
--------------------------------------------------------------------------------
1 | (ns dataframe.pipeline-test
2 | (:refer-clojure :exclude [group-by])
3 | (:require [dataframe.core :refer :all]
4 | [expectations :refer [expect]]))
5 |
6 | (expect (frame {:a [1 2 3]
7 | :b [10 20 30]
8 | :c [6 7 8]
9 | :d [16 27 38]})
10 | (with-> (frame {:a [1 2 3] :b [10 20 30]})
11 | (assoc-col :c (add $a 5))
12 | (assoc-col :d (add $b $c))))
13 |
14 | (expect (frame {:c [3 6] :b [2 4] :a [1 2]} [:x :y])
15 | (let [df (frame [[:x {:a 1 :b 2}]
16 | [:y {:a 2 :b 4}]
17 | [:z {:a 3 :b 8}]])]
18 | (with-> df
19 | (select (lte $a 2))
20 | (assoc-col :c (add $a $b))
21 | (sort-rows :c :b))))
22 |
23 | (expect (frame {:foo [8 8 14] :bar [0 -2 -3]} [:w :y :z])
24 | (let [df (frame [[:w {:a 0 :b 8}]
25 | [:x {:a 1 :b 2}]
26 | [:y {:a 2 :b 4}]
27 | [:z {:a 3 :b 8}]])]
28 | (with-> df
29 | (select (and (lte $a 2) (gte $b 4)))
30 | (assoc-col :c (add $a $b))
31 | (maprows->df (fn [row] {:foo (+ (:a row) (:c row))
32 | :bar (- (:b row) (:c row))}))
33 | head)))
--------------------------------------------------------------------------------
/test/dataframe/series_test.clj:
--------------------------------------------------------------------------------
1 | (ns dataframe.series-test
2 | (:require [dataframe.series :as srs :refer [series index]]
3 | [expectations :refer [expect more-of]]
4 | [dataframe.series :as series]
5 | [dataframe.util :as util]))
6 |
7 | (expect '(0 1 2)
8 | (let [my-srs (srs/series '(:x :y :z))]
9 | (index my-srs)))
10 |
11 | (expect 1
12 | (let [my-srs (srs/series '(1 2 3) '("A" "B" "C"))]
13 | (srs/ix my-srs "A")))
14 |
15 | (expect nil
16 | (let [my-srs (srs/series '(1 2 3) '("A" "B" "C"))]
17 | (srs/ix my-srs "D")))
18 |
19 | (expect "Bar"
20 | (let [my-srs (srs/series '(1 2 3) '("A" "B" "C"))]
21 | (srs/ix my-srs "D" "Bar")))
22 |
23 | (expect AssertionError
24 | (let [my-srs (srs/series '(1 2 3) '("A" "B" "A"))]
25 | (srs/ix my-srs "D")))
26 |
27 | ; Test iteration
28 | (expect '([:a 1] [:b 2] [:c 3])
29 | (map identity
30 | (srs/series [1 2 3] [:a :b :c])))
31 |
32 | (expect (srs/series [2 4 6] [:a :b :c])
33 | (srs/mapvals
34 | (srs/series [1 2 3] [:a :b :c])
35 | #(* 2 %)))
36 |
37 | (expect (srs/series [1 2 3] [:c :d :e])
38 | (srs/set-index
39 | (srs/series [1 2 3] [:a :b :c])
40 | [:c :d :e]))
41 |
42 | (expect (srs/series [2 4] [:b :d])
43 | (srs/select
44 | (srs/series [1 2 3 4] [:a :b :c :d])
45 | [false true nil "true"]))
46 |
47 | (expect (srs/series [false false true])
48 | (srs/gt
49 | (srs/series [1 5 10])
50 | 5))
51 |
52 | (expect (srs/series [116 120 125])
53 | (srs/add
54 | (srs/series [1 5 10])
55 | 5
56 | 10
57 | (srs/series [100 100 100])))
58 |
59 | (expect (srs/series [6 nil 15])
60 | (srs/add
61 | (srs/series [1 nil 10])
62 | 5))
63 |
64 | (expect (srs/series [false true false])
65 | (series/eq (series/series [1 5 10]) 5))
66 |
67 | (expect (srs/series [true false true])
68 | (series/neq (series/series [1 5 10]) 5))
69 |
70 | (expect (more-of srs
71 | (series/series [1] [0]) (series/subset srs 0 1)
72 | (series/series [3 4] [2 3]) (series/subset srs 2 4)
73 | (series/series [1 2] [0 1]) (series/head srs 2)
74 | (series/series [6 7] [5 6]) (series/tail srs 2))
75 | (series/series [1 2 3 4 5 6 7]))
76 |
77 | (expect 2
78 | (.valAt (series [1 2 3] [:a :b :c]) :b))
79 |
80 | (expect true
81 | (contains? (series [1 2 3] [:a :b :c]) :b))
82 |
83 | (expect false
84 | (contains? (series [1 2 3] [:a :b :c]) :d))
85 |
86 | (expect [:b 2]
87 | (.entryAt (series [1 2 3] [:a :b :c]) :b))
88 |
89 | (expect (series [1 2 3 4 5 6] [:a :b :c :d :e :f])
90 | (assoc
91 | (series [1 2 3 4 5] [:a :b :c :d :e])
92 | :f 6))
93 |
94 | (expect (series [1 10 3 4 5] [:a :b :c :d :e])
95 | (assoc
96 | (series [1 2 3 4 5] [:a :b :c :d :e])
97 | :b 10))
98 |
99 | (expect 2
100 | (get (series [1 2 3] [:a :b :c]) :b))
101 |
102 | (expect '([:a 1] [:b 2] [:c 3])
103 | (seq (series/series [1 2 3] [:a :b :c])))
104 |
105 | (expect '([:d 4] [:a 1] [:b 2] [:c 3])
106 | (cons [:d 4] (series/series [1 2 3] [:a :b :c])))
107 |
108 | (expect '([:d 4] [:a 1] [:b 2] [:c 3])
109 | (cons [:d 4] (series/series [1 2 3] [:a :b :c])))
110 |
111 | (expect true
112 | (= (series/series [1 2 3] [:a :b :c])
113 | (series/series [1 2 3] [:a :b :c])))
114 |
115 | (expect false
116 | (= (series/series [1 2 3] [:a :b :d])
117 | (series/series [1 2 3] [:a :b :c])))
118 |
119 | ; Equality checks order
120 | (expect false
121 | (= (series/series [3 2 1] [:c :b :a])
122 | (series/series [1 2 3] [:a :b :c])))
123 |
124 | ; Check that we iterate as pairs of index->val
125 | (expect '([:a 1] [:b 2] [:c 3])
126 | (for [x (series/series [1 2 3] [:a :b :c])]
127 | x))
128 |
129 | (expect ['(:a :b :c :d) '([1 10] [2 nil] [3 20] [nil 30])]
130 | (series/index-aligned-pairs
131 | (series/series [1 2 3] [:a :b :c])
132 | (series/series [10 20 30] [:a :c :d])))
133 |
134 | (expect (series/series [11 nil 23 nil] [:a :b :c :d])
135 | (series/join-map
136 | (util/nillify +)
137 | (series/series [1 2 3] [:a :b :c])
138 | (series/series [10 20 30] [:a :c :d])))
139 |
140 | (expect
141 | (series/series [1 2] [0 1])
142 | (series/loc (series/series [1 2 3]) [0 1]))
143 |
144 | (expect
145 | (series/series [1 2 nil nil] [0 1 1000 2000])
146 | (series/loc (series/series [1 2 3]) [0 1 1000 2000]))
--------------------------------------------------------------------------------
/test/dataframe/util_test.clj:
--------------------------------------------------------------------------------
1 | (ns dataframe.util-test
2 | (:require [expectations :refer [expect]]
3 | [dataframe.util :as util]))
4 |
--------------------------------------------------------------------------------