tech.v3.dataset.node
Functions and helpers that require the node runtime.
(transit-file->dataset fname)
Given a file of transit data, return a dataset.
The datatype library has some helpers that work with datasets and can make certain types of reductions much faster.
Filtering on one column and summing another is a very common operation, so let's take a closer look. The generic dataset pathway would be:
cljs.user> (require '[tech.v3.dataset :as ds])
nil
cljs.user> (def test-ds (ds/->dataset {:a (range 20000)
                                       :b (repeatedly 20000 rand)}))
#'cljs.user/test-ds
cljs.user> ;;filter on a, sum b.
cljs.user> (reduce + 0.0 (-> (ds/filter-column test-ds :a #(> % 10000))
                             (ds/column :b)))
5000.898384571656
cljs.user> (time (dotimes [idx 100] (reduce + 0.0 (-> (ds/filter-column test-ds :a #(> % 10000))
                                                      (ds/column :b)))))
"Elapsed time: 282.714231 msecs"

Think transducers are fast? What about a generic transducer pathway?
cljs.user> (let [a (test-ds :a)
                 b (test-ds :b)]
             (transduce (comp (filter #(> (nth a %) 10000))
                              (map #(nth b %)))
                        (completing +)
                        (range (ds/row-count test-ds))))
5000.898384571656
cljs.user> (time (dotimes [idx 100]
                   (let [a (test-ds :a)
                         b (test-ds :b)]
                     (transduce (comp (filter #(> (nth a %) 10000))
                                      (map #(nth b %)))
                                (completing +)
                                (range (ds/row-count test-ds))))))
"Elapsed time: 436.235972 msecs"
nil

Transducers are fast, yet this pathway is slower than the generic one. After looking at it more closely we found that the nth call is relatively expensive. The datatype library has a way to get the fastest nth-like access available for a given container. Columns overload this pathway such that if there are no missing values they use the fastest access for their buffer; otherwise they have to wrap a missing check. Regardless, this gets us a solid improvement (436 ms down to 78 ms for 100 iterations):
cljs.user> (require '[tech.v3.datatype :as dtype])
nil
cljs.user> (time (dotimes [idx 100]
                   (let [a (dtype/->fast-nth (test-ds :a))
                         b (dtype/->fast-nth (test-ds :b))]
                     (transduce (comp (filter #(> (a %) 10000))
                                      (map #(b %)))
                                (completing +)
                                (range (ds/row-count test-ds))))))
"Elapsed time: 77.823553 msecs"
nil
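The dispatch described above (direct access when a column has no missing values, a wrapped missing check otherwise) can be sketched in plain Clojure. This is not the dtype implementation: the column shape (a map with :data and :missing keys) and the name fast-nth-sketch are hypothetical, chosen only to show why the missing check adds a cost to every single access.

```clojure
;; Hypothetical column shape, for illustration only:
;; {:data <vector of values> :missing <set of missing row indexes>}
(defn fast-nth-sketch
  "Return a one-argument accessor fn for a hypothetical column map."
  [{:keys [data missing]}]
  (if (empty? missing)
    ;; No missing values: every access goes straight to the backing data.
    (fn [idx] (nth data idx))
    ;; Missing values present: every access first pays for a set lookup.
    (fn [idx] (when-not (contains? missing idx)
                (nth data idx)))))

;; The accessor is called like a function, just as `a` and `b` are above.
(def dense  (fast-nth-sketch {:data [10 20 30] :missing #{}}))
(def sparse (fast-nth-sketch {:data [10 20 30] :missing #{1}}))
```

Here (dense 1) returns 20 while (sparse 1) returns nil, because index 1 is marked missing.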

There is another, more dangerous approach. dtype has another query, as-agetable, that returns either something for which aget works or nil. If you know your dataset's columns have no missing data and their backing store data itself is agetable, then you can get an agetable. This pathway has no fallback, so you risk null pointer issues, but it is the fastest possible pathway.
cljs.user> (time (dotimes [idx 100]
                   (let [a (dtype/as-agetable (test-ds :a))
                         b (dtype/as-agetable (test-ds :b))]
                     (transduce (comp (filter #(> (aget a %) 10000))
                                      (map #(aget b %)))
                                (completing +)
                                (range (ds/row-count test-ds))))))
"Elapsed time: 57.404783 msecs"
nil

In this simple example we find that a transducing pathway is indeed quite a bit faster, but only when it is coupled with an efficient per-element access pattern.
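The same filter-then-sum pattern can be tried without the dataset library. In this sketch, plain Clojure vectors stand in for columns a and b (an assumption: real columns have their own access costs), and the function names are hypothetical. Vectors are callable as functions of their index, which plays the role of the fast accessor above; both pathways compute the same sum.

```clojure
;; Plain vectors standing in for columns :a and :b (illustrative data only).
(def a (vec (range 20)))
(def b (vec (map #(* 0.5 %) (range 20))))

;; Generic pathway: build a lazy seq of matching row indexes, map, then sum.
(defn generic-sum [threshold]
  (reduce + 0.0 (map b (filter #(> (a %) threshold) (range (count a))))))

;; Transducer pathway: a single pass over the row indexes,
;; no intermediate sequences.
(defn xf-sum [threshold]
  (transduce (comp (filter #(> (a %) threshold))
                   (map b))
             (completing +)
             0.0
             (range (count a))))
```

With a threshold of 10, rows 11 through 19 pass and both functions return 67.5.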
Bindings to use the dataset handlers in cljs GET/POST calls.
(add-java-time-handlers!)
Add handlers for java.time.LocalDate and java.time.Instant.

(opt-map)
Options map that must be included in the cljs-ajax request in order to activate dataset->transit pathways.

(response-format & [content-type])
cljs-ajax interceptor that hardwires content-type to application/transit+json and uses ds/transit-read-handler-map.

(transit-request method url options)
Perform a cljs-ajax request using the selected cljs-ajax.core method, merging opt-map into the options first.

(writer)
Transit writer used for writing transit-json datasets; uses ds/transit-write-handler-map.
Index-space algorithms. Implements a subset of the JVM version.
(argfilter pred options data) (argfilter pred data) (argfilter data)
Return an array of indexes that pass the filter.
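As a rough illustration of what argfilter computes, here is a plain-Clojure sketch of the two-argument form. It is hypothetical: argfilter-sketch returns a persistent vector rather than the typed index buffer the library returns, and it ignores the options map.

```clojure
(defn argfilter-sketch
  "Hypothetical stand-in for argfilter: the indexes of data whose
  value passes pred, as a persistent vector."
  [pred data]
  ;; keep-indexed drops the nil results, so only passing indexes remain.
  (into [] (keep-indexed (fn [idx v] (when (pred v) idx))) data))
```

For example, (argfilter-sketch even? [1 2 3 4 6]) returns [1 3 4].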

(arglast-every rdr pred)
Return the last index where (pred (rdr idx) (rdr (dec idx))) was true by comparing every value and keeping track of the last index where pred was true.

(argsort compare-fn options data) (argsort compare-fn data) (argsort data)
Return an array of indexes that order the provided data by compare-fn. compare-fn must be a boolean function such as < or >. You can use a full custom comparator returning -1, 0 or 1 by using the :comparator option.
compare-fn - Boolean binary predicate such as < or >.
Options:
:nan-strategy - defaults to :last. If the data has a numeric elemwise-datatype, a NaN-aware comparison will be used that places NaN data first, last, or throws an exception, as specified by the three possible options: :first, :last, and :exception.
:comparator - comparator to use. This overrides compare-fn; it takes two arguments but returns a number.
Examples:
cljs.user> ;;Persistent vectors do not indicate datatype so nan-aware comparison is disabled.
cljs.user> (argops/argsort [##NaN 1 2 3 ##NaN])
#typed-buffer[[:int32 5][0 1 2 3 4]
cljs.user> ;;But with a container that indicates datatype nan will be handled
cljs.user> (argops/argsort (dtype/make-container :float32 [##NaN 1 2 3 ##NaN]))
#typed-buffer[[:int32 5][1 2 3 4 0]
cljs.user> ;;example setting nan strategy and using custom comparator.
cljs.user> (argops/argsort nil ;;no compare fn
                           {:nan-strategy :first
                            :comparator #(compare %2 %1)}
                           (dtype/make-container :float32 [##NaN 1 2 3 ##NaN]))
#typed-buffer[[:int32 5][0 4 3 2 1]
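The NaN strategies can be sketched in plain Clojure by splitting the indexes into NaN and non-NaN groups, sorting the non-NaN group, and placing the NaN group according to the strategy. This is a hypothetical illustration, not the library's algorithm; in particular, it keeps NaN indexes in their original order, so with :last it yields [1 2 3 0 4] where the library example above prints [1 2 3 4 0].

```clojure
(defn- nan?
  [v]
  ;; NaN is the only number that is not == to itself.
  (and (number? v) (not (== v v))))

(defn argsort-sketch
  "Hypothetical NaN-aware argsort. compare-fn may be a boolean predicate
  such as < or >, or a comparator returning a number. nan-strategy is one
  of :first, :last or :exception."
  [compare-fn nan-strategy data]
  (let [data     (vec data)
        idxs     (range (count data))
        nan-idxs (filterv #(nan? (data %)) idxs)
        ;; sort-by is stable, so ties keep their original relative order.
        sorted   (vec (sort-by data compare-fn (remove #(nan? (data %)) idxs)))]
    (case nan-strategy
      :first     (into nan-idxs sorted)
      :last      (into sorted nan-idxs)
      :exception (if (seq nan-idxs)
                   (throw (ex-info "NaN in data" {:indexes nan-idxs}))
                   sorted))))
```

With the reversed comparator and :first, (argsort-sketch #(compare %2 %1) :first [##NaN 1 2 3 ##NaN]) returns [0 4 3 2 1], matching the last example above.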

(binary-search data target options) (binary-search data target)
Returns a long result that either points to just before the value or points exactly to the value. If the target is after the last value, returns elem-count. If the value is present multiple times, the index points to its first occurrence.
Options:
:comparator - a specific comparator to use; defaults to comparator.

(index-reducer-rf) (index-reducer-rf acc v) (index-reducer-rf acc)
Return a transduce-compatible index scanning rf.
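The binary-search description reads like a lower-bound search, and a plain-Clojure sketch under that assumption follows. The name binary-search-sketch is hypothetical, and we have not confirmed that the library returns exactly these indexes for targets that are absent from the data.

```clojure
(defn binary-search-sketch
  "Hypothetical lower-bound search over sorted data: the index of the first
  element that is not less than target, or (count data) if there is none."
  ([data target] (binary-search-sketch data target compare))
  ([data target comparator]
   (let [data (vec data)]
     (loop [lo 0
            hi (count data)]
       (if (< lo hi)
         (let [mid (quot (+ lo hi) 2)]
           (if (neg? (comparator (nth data mid) target))
             (recur (inc mid) hi)   ; data[mid] < target: answer is right of mid
             (recur lo mid)))       ; data[mid] >= target: mid may be the answer
         lo)))))
```

On [1 2 2 3], searching for 2 returns 1 (the first occurrence) and searching for 4 returns 4, the element count.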
Simple math primitives.
(descriptive-statistics stats v)
Given a sequence of desired stats, return a map of statname->value.
Example:
cljs.user> (dfn/descriptive-statistics [:min :max :mean :n-values] (range 10))
{:min 0, :max 9, :mean 4.5, :n-values 10}
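To make the shape of the result concrete, here is a plain-Clojure sketch supporting only the four stats used in the example. It is hypothetical (the library supports more stats and computes them differently); this version computes all four eagerly and then selects the requested keys.

```clojure
(defn descriptive-statistics-sketch
  "Hypothetical subset of descriptive-statistics supporting only
  :min, :max, :mean and :n-values."
  [stats v]
  (let [v   (vec v)
        n   (count v)
        ;; Every supported stat is computed, then filtered by select-keys.
        all {:min      (apply min v)
             :max      (apply max v)
             :mean     (/ (reduce + 0.0 v) n)
             :n-values n}]
    (select-keys all stats)))
```

Called with [:min :max :mean :n-values] and (range 10), it returns the same map as the example above.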

(equals lhs rhs & [error-bar])
Numeric equals: the distance between lhs and rhs must be less than error-bar, which defaults to 0.001.
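A scalar version of this rule is short enough to sketch directly. The name equals-sketch is hypothetical, and the sketch handles only single numbers; we do not claim it mirrors how the library treats readers or other containers.

```clojure
(defn equals-sketch
  "Hypothetical scalar numeric equals: true when |lhs - rhs| is strictly
  less than error-bar, which defaults to 0.001."
  [lhs rhs & [error-bar]]
  (let [diff (- lhs rhs)
        dist (if (neg? diff) (- diff) diff)]   ; absolute distance
    (< dist (or error-bar 0.001))))
```

For example, 1.0 and 1.0005 compare equal under the default error-bar, while 1.0 and 1.01 only compare equal with a looser error-bar such as 0.1.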

(percentiles percentages options v) (percentiles percentages v)
Percentiles are given in whole numbers:
tech.v3.datatype.functional> (percentiles [0 25 50 75 100] (range 10))
[0.0 1.75 4.5 7.25 9.0]
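The example output is consistent with placing percentile p at the 1-based position p(n + 1)/100 in the sorted data and interpolating linearly between neighbors, clamping to the first and last elements. The plain-Clojure sketch below implements that rule; the name percentiles-sketch is hypothetical, and we have not confirmed that this is exactly the library's estimator or how its options change it.

```clojure
(defn percentiles-sketch
  "Hypothetical percentile estimator using the 1-based position
  p * (n + 1) / 100 with linear interpolation. Assumes non-empty v."
  [percentages v]
  (let [sorted (vec (sort v))
        n      (count sorted)]
    (mapv (fn [p]
            (let [pos (/ (* p (inc n)) 100.0)]
              (cond
                ;; Positions before the first element clamp to the minimum.
                (< pos 1.0) (double (first sorted))
                ;; Positions at or past the last element clamp to the maximum.
                (>= pos n)  (double (peek sorted))
                :else
                (let [lo-idx (dec (long (Math/floor pos)))  ; 1-based -> 0-based
                      frac   (- pos (Math/floor pos))
                      lo     (nth sorted lo-idx)
                      hi     (nth sorted (inc lo-idx))]
                  (+ lo (* frac (- hi lo)))))))
          percentages)))
```

With [0 25 50 75 100] and (range 10), this reproduces [0.0 1.75 4.5 7.25 9.0].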

(reduce-max v)
NaN-unaware max. tech.v3.datatype.statistics/max is NaN-aware.

(reduce-min v)
NaN-unaware min. tech.v3.datatype.statistics/min is NaN-aware.

(shift rdr n)
Shift by n and fill in with the first element for n>0 or the last element for n<0.
Examples:
user> (dfn/shift (range 10) 2)
[0 0 0 1 2 3 4 5 6 7]
user> (dfn/shift (range 10) -2)
[2 3 4 5 6 7 8 9 9 9]
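One way to read shift is as index clamping: element i of the result is element (i - n) of the input, with out-of-range indexes clamped to the first or last element. The eager plain-Clojure sketch below (shift-sketch is a hypothetical name; the library may well return a lazy reader) reproduces both examples above.

```clojure
(defn shift-sketch
  "Hypothetical eager shift: element i of the result is element (i - n)
  of rdr, with the source index clamped to the valid range."
  [rdr n]
  (let [v        (vec rdr)
        last-idx (dec (count v))]
    (mapv (fn [idx]
            ;; Clamping at 0 repeats the first element for n > 0;
            ;; clamping at last-idx repeats the last element for n < 0.
            (v (-> (- idx n) (max 0) (min last-idx))))
          (range (count v)))))
```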

(standard-deviation v)
NaN-aware standard deviation. NaNs are skipped.
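Skipping NaNs amounts to dropping them before computing the statistic. The plain-Clojure sketch below assumes the sample form (dividing by n - 1); the description does not say which form the library uses, so treat that choice, and the name standard-deviation-sketch, as assumptions.

```clojure
(defn standard-deviation-sketch
  "Hypothetical NaN-skipping standard deviation, assuming the sample
  (n - 1) form."
  [v]
  (let [xs     (vec (remove #(not (== % %)) v))   ; drop NaNs: NaN is not == to itself
        n      (count xs)
        mean   (/ (reduce + 0.0 xs) n)
        sq-dev (reduce + 0.0 (map #(let [d (- % mean)] (* d d)) xs))]
    (Math/sqrt (/ sq-dev (dec n)))))
```

For [1 2 3 4 ##NaN] the NaN is dropped and the result is the square root of 5/3, about 1.29099.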