├── .gitignore
├── LICENSE
├── README.md
├── deps.edn
├── pom.xml
├── release.edn
├── src
│   └── appliedsciencestudio
│       ├── experiment.clj
│       └── rdata.clj
└── test
    ├── appliedsciencestudio
    │   └── rdata_test.clj
    └── data
        ├── mers_korea_2015.RData
        ├── palettes_d.rda
        ├── sars_canada_2003.RData
        ├── totest.rda
        └── zika_girardot_2015.RData
/.gitignore:
--------------------------------------------------------------------------------
1 | /target
2 | /classes
3 | /checkouts
4 | *.jar
5 | *.class
6 | /.cpcache
7 | /.lein-*
8 | /.nrepl-history
9 | /.nrepl-port
10 | .hgignore
11 | .hg/
12 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Permission is hereby granted, free of charge, to any person obtaining
2 | a copy of this software and associated documentation files (the
3 | "Software"), to deal in the Software without restriction, including
4 | without limitation the rights to use, copy, modify, merge, publish,
5 | distribute, sublicense, and/or sell copies of the Software, and to
6 | permit persons to whom the Software is furnished to do so, subject to
7 | the following conditions:
8 |
9 | The above copyright notice and this permission notice shall be
10 | included in all copies or substantial portions of the Software.
11 |
12 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
13 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
14 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
15 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
16 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
17 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
18 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # rdata
2 |
3 | A wrapper around [Renjin](https://www.renjin.org) to provide a
4 | convenient way to load the contents of a file saved in
5 | [R](https://www.r-project.org/foundation/)'s
6 | [RData](https://www.loc.gov/preservation/digital/formats/fdd/fdd000470.shtml)
7 | format in Clojure.
8 |
9 | One might want to do this because they have found an interesting
10 | dataset that has been published in this format.
11 |
12 |
13 | ## Installation
14 |
15 | We have not yet released to Clojars, so the recommended installation is as a git dependency.
16 |
17 | For `deps.edn` users:
18 |
19 | ``` clojure
20 | appliedsciencestudio/rdata {:git/url "https://github.com/appliedsciencestudio/rdata/"
21 | :sha "151e6dead06b38995f1f30b09d954a060f7a2a9c"}
22 | ```
23 |
24 | Because of a `deps.edn` issue with transitive dependencies that
25 | depend on non-standard repositories, you must have the key/value pair
26 | `"bedatadriven" {:url
27 | "https://nexus.bedatadriven.com/content/groups/public/"}` in the
28 | `:mvn/repos` of your `deps.edn` file. For instance:
29 |
30 | ``` clojure
31 | :mvn/repos {"central" {:url "https://repo1.maven.org/maven2/"}
32 | "clojars" {:url "https://clojars.org/repo"}
33 | "bedatadriven" {:url "https://nexus.bedatadriven.com/content/groups/public/"}}
34 | ```
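
Putting the two fragments above together, a complete minimal `deps.edn` for a project that consumes this library might look like the sketch below. It only combines the coordinates and repositories already shown; the enclosing top-level map is the usual `deps.edn` shape, and your own file will likely contain other entries as well.

```clojure
;; Sketch of a consumer's deps.edn: the git dependency plus the extra
;; Maven repository that the Renjin artifacts are resolved from.
{:deps {appliedsciencestudio/rdata
        {:git/url "https://github.com/appliedsciencestudio/rdata/"
         :sha "151e6dead06b38995f1f30b09d954a060f7a2a9c"}}
 :mvn/repos {"central" {:url "https://repo1.maven.org/maven2/"}
             "clojars" {:url "https://clojars.org/repo"}
             "bedatadriven" {:url "https://nexus.bedatadriven.com/content/groups/public/"}}}
```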
35 |
36 | Run the tests:
37 |
38 | clj -A:test:runner
39 |
40 | You can also build a deployable jar of this library:
41 |
42 | $ clojure -A:jar
43 |
44 | then install it locally:
45 |
46 | $ clojure -A:install
47 |
48 |
49 | ## Usage
50 |
51 | This library exports a single useful function, `read-rdata`, which --
52 | somewhat predictably -- reads a file saved in the RData format used by
53 | R.
54 |
55 | The file contents are returned as nested maps (RData files can contain
56 | arbitrarily nested data). The top-most level of the returned structure
57 | is a key/value mapping from name to dataset, while the leaf nodes will
58 | always be `vector`s of some primitive type (`int`, `double`, `inst`,
59 | and so on).
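
As an illustration of how an `inst` leaf comes about: R stores a `Date` as a number of days since 1970-01-01, and this library turns such values into `java.util.Date` instances. The sketch below mirrors that conversion in a standalone form; `days->date` is a hypothetical name used only for this example, not part of the library's API.

```clojure
;; Hypothetical helper for illustration: convert an R-style day count
;; (days since 1970-01-01, stored as a double) into a java.util.Date.
(defn days->date
  [days]
  ;; 86400000 is the number of milliseconds in one day
  (java.util.Date. (long (* 86400000 days))))

(days->date 1.0)
;;=> #inst "1970-01-02T00:00:00.000-00:00"
```

Fractional day counts are truncated to whole milliseconds by `long`.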
60 |
61 | The R attributes stored with each value are attached to the Clojure
62 | translation of that value as Clojure `metadata`.
63 |
64 | ```clojure
65 | (def mers
66 | (read-rdata "test/data/mers_korea_2015.RData" {:key-fn keyword}))
67 |
68 | (keys mers)
69 | ;;=> (:mers_korea_2015)
70 |
71 | (-> mers :mers_korea_2015 keys)
72 | ;;=> (:linelist :contacts)
73 |
74 | (-> mers :mers_korea_2015 :linelist keys)
75 | ;;=> (:id :age :age_class :sex :place_infect :reporting_ctry :loc_hosp :dt_onset :dt_report :week_report :dt_start_exp :dt_end_exp :dt_diag :outcome :dt_death)
76 |
77 | (-> mers :mers_korea_2015 :linelist :place_infect)
78 | [1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
79 |
80 | ;; hm, place 1 or place 2? Maybe the metadata can tell us what this means...
81 | (-> mers :mers_korea_2015 :linelist :place_infect meta)
82 | ;;=> {:class ["factor"], :levels ["Middle East" "Outside Middle East"]}
83 |
84 | ;; Ah, it's a two-value factor. Note that R indices start from 1, so one
85 | ;; must decrement the factor's index to look it up in the vector held in
86 | ;; the meta.
87 | ```
88 |
89 |
90 | ## License
91 |
92 | Copyright © 2020 Applied Science
93 |
94 | Distributed under the MIT License.
95 |
--------------------------------------------------------------------------------
/deps.edn:
--------------------------------------------------------------------------------
1 | {:paths ["src"]
2 | :mvn/repos {"central" {:url "https://repo1.maven.org/maven2/"}
3 | "clojars" {:url "https://clojars.org/repo"}
4 | "bedatadriven" {:url "https://nexus.bedatadriven.com/content/groups/public/"}}
5 | :deps {org.clojure/clojure {:mvn/version "1.10.1"}
6 | org.apache.commons/commons-compress {:mvn/version "1.20"}
7 | org.renjin/renjin-script-engine {:mvn/version "3.5-beta43"}}
8 | :aliases
9 | {:test {:extra-paths ["test"]
10 | :extra-deps {org.clojure/test.check {:mvn/version "0.10.0"}}}
11 | :runner {:extra-deps {com.cognitect/test-runner
12 | {:git/url "https://github.com/cognitect-labs/test-runner"
13 | :sha "f7ef16dc3b8332b0d77bc0274578ad5270fbfedd"}}
14 | :main-opts ["-m" "cognitect.test-runner"
15 | "-d" "test"]}
16 | :jar {:extra-deps {seancorfield/depstar {:mvn/version "0.5.2"}}
17 | :main-opts ["-m" "hf.depstar.jar" "rdata.jar"]}
18 | :install {:extra-deps {deps-deploy {:mvn/version "0.0.9"}}
19 | :main-opts ["-m" "deps-deploy.deps-deploy" "install" "rdata.jar"]}
20 | :deploy {:extra-deps {deps-deploy {:mvn/version "0.0.9"}}
21 | :main-opts ["-m" "deps-deploy.deps-deploy" "deploy" "rdata.jar"]}
22 | :release {:extra-deps {appliedscience/deps-library {:mvn/version "0.3.4"}}
23 | :main-opts ["-m" "deps-library.release"]}}}
24 |
--------------------------------------------------------------------------------
/pom.xml:
--------------------------------------------------------------------------------
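1 | <?xml version="1.0" encoding="UTF-8"?>
2 | <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
3 |   <modelVersion>4.0.0</modelVersion>
4 |   <groupId>appliedscience</groupId>
5 |   <artifactId>rdata</artifactId>
6 |   <version>0.0.1-alpha</version>
7 |   <name>rdata</name>
8 |   <dependencies>
9 |     <dependency>
10 |       <groupId>org.clojure</groupId>
11 |       <artifactId>clojure</artifactId>
12 |       <version>1.10.1</version>
13 |     </dependency>
14 |     <dependency>
15 |       <groupId>org.renjin</groupId>
16 |       <artifactId>renjin-script-engine</artifactId>
17 |       <version>3.5-beta43</version>
18 |     </dependency>
19 |   </dependencies>
20 |   <build>
21 |     <sourceDirectory>src</sourceDirectory>
22 |   </build>
23 |   <repositories>
24 |     <repository>
25 |       <id>clojars</id>
26 |       <url>https://clojars.org/repo</url>
27 |     </repository>
28 |     <repository>
29 |       <id>bedatadriven</id>
30 |       <url>https://nexus.bedatadriven.com/content/groups/public/</url>
31 |     </repository>
32 |   </repositories>
33 |   <scm>
34 |     <url>https://github.com/appliedsciencestudio/rdata</url>
35 |     <tag>11ddbf42e90e391fc46d1031e9560d6bf84189b4</tag>
36 |     <connection>scm:git:git@github.com:appliedsciencestudio/rdata.git</connection>
37 |     <developerConnection>scm:git:git@github.com:appliedsciencestudio/rdata.git</developerConnection>
38 |   </scm>
39 | </project>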
40 |
--------------------------------------------------------------------------------
/release.edn:
--------------------------------------------------------------------------------
1 | {:group-id "appliedscience"
2 | :artifact-id "rdata"
3 | :scm-url "https://github.com/appliedsciencestudio/rdata"}
4 |
--------------------------------------------------------------------------------
/src/appliedsciencestudio/experiment.clj:
--------------------------------------------------------------------------------
1 | (ns appliedsciencestudio.experiment
2 | (:import (org.renjin.sexp SEXP Vector ListVector IntVector Logical Symbol Null StringArrayVector PairList)))
3 |
4 | ;; Protocol-based implementation taken from clojisr, currently used
5 | ;; experimentally to compare with our version
6 |
7 | (defprotocol Clojable
8 | (-java->clj [this]))
9 |
10 | (defn java->clj
11 | [java-obj]
12 | (some-> java-obj
13 | -java->clj))
14 |
15 | (extend-type Object
16 | Clojable
17 | (-java->clj [this] this))
18 |
19 | ;; Renjin represents a dataframe as a ListVector.
20 | ;; Its elements are the columns,
21 | ;; and the "names" attribute holds the column names.
22 | (defn df->maps
23 | [^ListVector df]
24 | (let [column-names (map keyword (.getAttribute df (Symbol/get "names")))] ; was lang/->attr, but no `lang` alias exists in this ns
25 | (->> df
26 | (map java->clj)
27 | (apply map (fn [& row-elements]
28 | (zipmap column-names row-elements))))))
29 |
30 | (defn NULL->nil
31 | [obj]
32 | (if (= Null/INSTANCE obj)
33 | nil
34 | obj))
35 |
36 | (defn ->attr
37 | [^SEXP sexp attr-name]
38 | (-> sexp
39 | (.getAttribute (Symbol/get (name attr-name)))
40 | NULL->nil
41 | (->> (mapv #(if (string? %)
42 | (keyword %)
43 | %)))))
44 |
45 | (defn ->names
46 | [^SEXP sexp]
47 | (some->> (->attr sexp "names")
48 | (mapv keyword)))
49 |
50 | (defn ->class
51 | [^SEXP sexp]
52 | (some->> (->attr sexp "class")
53 | (mapv keyword)))
54 |
55 | (defn renjin-vector->clj
56 | [transf v]
57 | (if (some #(= % :data.frame) (->class v))
58 | (df->maps v)
59 | (let [names (->names v)
60 | dim (->attr v :dim)]
61 | (->> v
62 | (map-indexed (fn [i x]
63 | (when (not (.isElementNA ^Vector v ^int i))
64 | (transf x))))
65 | ((if (seq names)
66 | ;; A named list or vector will be translated to a map.
67 | (partial zipmap names)
68 | (if (seq dim)
69 | ;; A matrix will be translated to a vector of vectors
70 | (fn [values]
71 | (->> values
72 | (partition (second dim))
73 | (#(do (println %) %))
74 | (mapv vec)))
75 | ;; A regular list or vector will be translated to a vector.
76 | vec)))))))
77 |
78 | (extend-type Vector
79 | Clojable
80 | (-java->clj [this]
81 | (renjin-vector->clj java->clj
82 | this)))
83 |
84 | (extend-type IntVector
85 | Clojable
86 | (-java->clj [this]
87 | (if (.isNumeric this)
88 | (renjin-vector->clj java->clj
89 | this)
90 | ;; else - a factor
91 | (renjin-vector->clj (comp java->clj
92 | (->attr this :levels)
93 | dec)
94 | this))))
95 |
96 | (extend-type PairList
97 | Clojable
98 | (-java->clj [this]
99 | (renjin-vector->clj java->clj
100 | (.toVector this))))
101 |
102 | (extend-type Logical
103 | Clojable
104 | (-java->clj [this]
105 | ({Logical/TRUE true
106 | Logical/FALSE false}
107 | this)))
108 |
109 | (extend-type Symbol
110 | Clojable
111 | (-java->clj [this]
112 | (symbol (.toString this))))
113 |
114 | (extend-type Null
115 | Clojable
116 | (-java->clj [this]
117 | nil))
118 |
--------------------------------------------------------------------------------
/src/appliedsciencestudio/rdata.clj:
--------------------------------------------------------------------------------
1 | (ns appliedsciencestudio.rdata
2 | (:import (org.apache.commons.compress.compressors.bzip2 BZip2CompressorInputStream)
3 | (org.renjin.primitives.io.serialization RDataReader)))
4 |
5 | (defn r-date-to-java-date
6 | "The RData format returns dates as Doubles (!) counting days since
7 | 1970-01-01. This function massages them into java.util.Date instances."
8 | [the-double]
9 | (java.util.Date. (.longValue (* 86400000 the-double))))
10 |
11 | (declare clojurize-sexp)
12 |
13 | (defn attributes->metadata
14 | "Retrieve the attributes from an R object and return them as a Clojure map."
15 | [key-fn serializer sexp]
16 | (let [pair-list (.asPairList (.getAttributes sexp))]
17 | (if (= (class pair-list) org.renjin.sexp.Null)
18 | {}
19 | (into {} (map #(vector (key-fn %1) (clojurize-sexp key-fn serializer %2))
20 | (.getNames pair-list)
21 | (.values pair-list))))))
22 |
23 | (defn clojurize-vector
24 | "Convert an R vector into a clojure vector, preserving the attributes
25 | as clojure metadata on the vector."
26 | [key-fn serializer sexp]
27 | (let [the-meta (attributes->metadata key-fn serializer sexp)]
28 | (with-meta
29 | (mapv (if (= ["Date"] (get the-meta (key-fn "class")))
30 | r-date-to-java-date
31 | (partial clojurize-sexp key-fn serializer))
32 | sexp)
33 | the-meta)))
34 |
35 | (defn clojurize-sexp
36 | "Recursively unpack a nested set of R sexps into a clojure
37 | representation."
38 | [key-fn serializer sexp]
39 | (condp get (class sexp)
40 | #{org.renjin.sexp.PairList$Node} (apply array-map
41 | (mapcat #(vector (key-fn (if (= "" %1) (str "appliedsciencestudio.rdata/unnamed-"(serializer)) %1))
42 | (clojurize-sexp key-fn serializer %2))
43 | (.getNames sexp)
44 | (.values sexp)))
45 | #{org.renjin.sexp.ListVector} (with-meta
46 | (if (= (class (.getNames sexp)) org.renjin.sexp.Null)
47 | {}
48 | (apply array-map
49 | (mapcat #(vector (key-fn (if (= "" %) (str "appliedsciencestudio.rdata/unnamed-"(serializer)) %))
50 | (clojurize-sexp key-fn serializer (.get sexp (str %))))
51 | (.getNames sexp))))
52 | (attributes->metadata key-fn serializer sexp))
53 | #{org.renjin.sexp.IntArrayVector
54 | org.renjin.sexp.IntBufferVector
55 | org.renjin.sexp.DoubleArrayVector
56 | org.renjin.sexp.StringArrayVector
57 | org.renjin.sexp.LogicalArrayVector} (clojurize-vector key-fn serializer sexp) ;; XXX
58 | #{org.renjin.sexp.Logical} ({org.renjin.sexp.Logical/TRUE true
59 | org.renjin.sexp.Logical/FALSE false
60 | org.renjin.sexp.Logical/NA nil} sexp)
61 | #{org.renjin.primitives.io.serialization.StringByteArrayVector
62 | org.renjin.primitives.sequence.IntSequence} (mapv identity sexp) ; XXX
63 | #{java.lang.Double ; primitive type leaf nodes
64 | java.lang.String
65 | java.lang.Integer} sexp
66 | (class sexp))) ; emit classname if an unmapped class shows up
67 | ;; TODO org.renjin.primitives.vector.RowNamesVector
68 |
69 | (defn open-with-wrapper
70 | "RData files can be compressed with GZip or bz. This function takes
71 | `filename` and returns an `InputStream` wrapped with the appropriate
72 | stream decompressor (which might be none at all)."
73 | [filename]
74 | (let [istream (doto (clojure.java.io/input-stream filename)
75 | (.mark 4)) ; mark so we can reset the stream after reading the header
76 | bzh-header (mapv int [\B \Z \h])
77 | gzip-header [31 139]
78 | header (into [] (repeatedly 3 #(.read istream)))]
79 | (.reset istream) ; "unread" the three byte header
80 | (cond (= bzh-header header) (BZip2CompressorInputStream. istream)
81 | (and (= (header 0) (gzip-header 0))
82 | (= (header 1) (gzip-header 1))) (java.util.zip.GZIPInputStream. istream)
83 | :else istream)))
84 |
85 | (defn read-rdata-raw
86 | "Read `filename` into Renjin's internal representation. Mostly useful for debugging."
87 | [filename]
88 | (with-open [is (open-with-wrapper filename)]
89 | (.readFile (org.renjin.primitives.io.serialization.RDataReader. is))))
90 |
91 | (defn make-serializer
92 | "Produces a counter, private to the returned closure, that increments
93 | every time it is called. This is used in this code to generate serial names."
94 | []
95 | (let [a (atom 0)]
96 | (fn [] (swap! a inc))))
97 |
98 | (defn read-rdata
99 | "Read an RData formatted file into nested clojure data structures. NB
100 | I've used Clojure's metadata feature to store the attributes from
101 | the original file. There is an optional second argument, which is a
102 | map of options. The only option supported at the moment is
103 | `key-fn`, which allows one to pass a function to be applied to all
104 | strings being treated as keys during conversion."
105 | ([filename] (read-rdata filename {}))
106 | ([filename {:keys [key-fn]
107 | :or {key-fn identity}}]
108 | (->> (read-rdata-raw filename)
109 | (clojurize-sexp key-fn (make-serializer)))))
110 |
111 | ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
112 | ;; experimental CSV support -- here there be dragons!
113 |
114 | (def read-csv-fn
115 | (org.renjin.sexp.FunctionCall/newCall
116 | (org.renjin.sexp.Symbol/get "::")
117 | (into-array org.renjin.sexp.SEXP [(org.renjin.sexp.Symbol/get "utils") (org.renjin.sexp.Symbol/get "read.csv")])))
118 |
119 | ;; TODO add these parameters?
120 | (comment "
121 | numerals=c(allow.loss, warn.loss, no.loss),
122 | as.is=!(stringsAsFactors),
123 | colClasses=NA,
124 | nrows=-(1.0),
125 | check.names=TRUE,
126 | fill=!(blank.lines.skip),
127 | flush=FALSE,
128 | stringsAsFactors=default.stringsAsFactors(),
129 | fileEncoding=,
130 | encoding=unknown,
131 | text=")
132 |
133 | ;; XXX it has a hard time with thousands separators, like "1,000", but
134 | ;; works well with an alternate decimal specifier.
135 | (defn read-csv
136 | "This is a wrapper around R's CSV reader as an experiment. Do not use it."
137 | ([filename] (read-csv filename {}))
138 | ([filename {:keys [header? sep quote dec
139 | strip-white? skip-blank-lines?
140 | skip-nil? allow-escapes?
141 | nil-string
142 | ;; comment-char (defaults to #)
143 | ;; col-names row-names
144 | ;; col-names-fn
145 | ;; file-encoding
146 | ;; skip (default 0.0)
147 | ]}]
148 | (let [args (org.renjin.sexp.PairList$Builder.)]
149 | (.add args (org.renjin.sexp.StringVector/valueOf (.getAbsolutePath (java.io.File. filename))))
150 | ;; factor conversion might not make sense?
151 | (.add args "stringsAsFactors", org.renjin.sexp.LogicalVector/FALSE)
152 | (when header? (.add args "header", org.renjin.sexp.LogicalVector/TRUE))
153 | (when sep (.add args "sep", (org.renjin.sexp.StringVector/valueOf sep)))
154 | (when dec (.add args "dec", (org.renjin.sexp.StringVector/valueOf dec)))
155 | (when skip-nil? (.add args "skipNul", org.renjin.sexp.LogicalVector/TRUE))
156 | (when skip-blank-lines? (.add args "blank.lines.skip", org.renjin.sexp.LogicalVector/TRUE))
157 | (when strip-white? (.add args "strip.white", org.renjin.sexp.LogicalVector/TRUE))
158 | (when allow-escapes? (.add args "allowEscapes", org.renjin.sexp.LogicalVector/TRUE))
159 | (when nil-string
160 | (.add args "na.strings" (org.renjin.sexp.StringVector/valueOf nil-string)))
161 | (clojurize-sexp identity (make-serializer) ; clojurize-sexp takes key-fn, serializer, and sexp
162 | (.evaluate (org.renjin.eval.Context/newTopLevelContext)
163 | (org.renjin.sexp.FunctionCall. read-csv-fn (.build args)))))))
164 |
165 | ;;(read-csv "resources/COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")
166 | ;;(read-csv "resources/deutschland.covid19cases.tsv" {:sep "\t" :dec ","})
167 | ;;(->> vals (map meta))
168 |
169 | ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
170 | ;; input stream wrapper to gracefully handle GZIP and BZ files
171 |
172 | ;; TODO add support for this compression method?
173 | ;; } else if(b1 == 0xFD && b2 == '7') {
174 | ;; // See http://tukaani.org/xz/xz-javadoc/org/tukaani/xz/XZInputStream.html
175 | ;; // Set a memory limit of 64mb, if this is not sufficient, it will throw
176 | ;; // an exception rather than an OutOfMemoryError, which will terminate the JVM
177 | ;; return new XZInputStream(new FileInputStream(file), 64 * 1024 * 1024);
178 |
--------------------------------------------------------------------------------
/test/appliedsciencestudio/rdata_test.clj:
--------------------------------------------------------------------------------
1 | (ns appliedsciencestudio.rdata-test
2 | (:require [clojure.test :refer :all]
3 | [appliedsciencestudio.rdata :refer [read-rdata make-serializer clojurize-sexp]]))
4 |
5 | (def eval-r
6 | "An instance of the Renjin script engine, which we will use to generate test data."
7 | (let [engine (.getScriptEngine (org.renjin.script.RenjinScriptEngineFactory.))]
8 | (fn [script]
9 | (.eval engine script))))
10 |
11 | (defn r->clj
12 | "A helper function to convert R data to clj w/ keyword keys."
13 | [key-fn sexp]
14 | (clojurize-sexp key-fn (make-serializer) sexp))
15 |
16 | (deftest simple-tests
17 | ;; originally taking ideas from https://scicloj.github.io/clojisr/resources/public/clojisr/v1/tutorial-test/index.html#more-data-conversion-examples
18 | (testing "Generate some data using R, then convert it to clojure structures."
19 | (testing "named list"
20 | (is (= (r->clj identity (eval-r "list(a=1,b=c(10,20),c='hi!')"))
21 | {"a" [1.0],
22 | "b" [10.0 20.0],
23 | "c" ["hi!"]})))
24 | (testing "booleans"
25 | (is (= (r->clj identity (eval-r "TRUE"))
26 | [true]))
27 | (is (= (r->clj identity (eval-r "FALSE"))
28 | [false]))
29 | (is (= (r->clj identity (eval-r "NA"))
30 | ;; XXX
31 | [nil])))
32 | #_ (testing "null/nil"
33 | (is (= (r->clj keyword (eval-r "NULL"))
34 | nil)))
35 | (is (= (r->clj identity (eval-r "c(10,20,30)"))
36 | [10.0 20.0 30.0]))
37 | (is (= (r->clj identity (eval-r "list(A=1,B=2,'#123strange ()'=3)"))
38 | {"A" [1.0], "B" [2.0], "#123strange ()" [3.0]}))
39 | (is (= (r->clj keyword (eval-r "list(a=1:10,b='hi!')"))
40 | {:a [1 2 3 4 5 6 7 8 9 10], :b ["hi!"]}))
41 | (is (= (r->clj keyword (eval-r "list(a=1,b=c(10,20),c='hi!')"))
42 | {:a [1.0], :b [10.0 20.0], :c ["hi!"]}))))
43 |
44 | ;; java->clj might be trying too hard to keywordize?
45 | ;; (appliedsciencestudio.experiment/java->clj
46 | ;; (eval-r "list(A=1,B=2,'#123strange ()'=3)"))
47 | ;;=> {:A [1.0], :B [2.0], :#123strange () [3.0]}
48 |
49 | ;;(r->clj identity (eval-r "table(c('a','b','a','b','a','b','a','b'), c(1,1,2,2,3,3,1,1))"))
50 | ;;
51 | ;; In R this is:
52 | ;; 1 2 3
53 | ;; a 2 1 1
54 | ;; b 2 1 1
55 | ;; ... but rdata currently returns:
56 | ;; => [2 2 1 1 1 1]
57 | ;;... with this meta:
58 | ;; {:class ["table"], :dim [2 3], :dimnames #:appliedsciencestudio.rdata{:unnamed-1 ["a" "b"], :unnamed-2 ["a" "b"]}}
59 |
60 | ;; clojisr gives this, which I'm not sure is what I'd want:
61 | ;; {["1" "a"] 2,
62 | ;; ["1" "b"] 2,
63 | ;; ["2" "a"] 1,
64 | ;; ["2" "b"] 1,
65 | ;; ["3" "a"] 1,
66 | ;; ["3" "b"] 1}
67 |
68 | ;; bringing the clojisr code over, we get this:
69 | ;; (appliedsciencestudio.experiment/java->clj
70 | ;; (eval-r "table(c('a','b','a','b','a','b','a','b'), c(1,1,2,2,3,3,1,1))"))
71 | ;;=>[[2 2 1] [1 1 1]]
72 |
73 | ;; these first datasets were taken from https://github.com/reconhub/outbreaks
74 | (deftest sars-test
75 | (testing "Load some demo data from the SARS 2003 dataset, access it using the string keys provided by R."
76 | (let [data (read-rdata "test/data/sars_canada_2003.RData" )
77 | sars (get data "sars_canada_2003")
78 | dates (get sars "date")
79 | cases (get sars "cases_travel")]
80 | (is (not (nil? data)))
81 | (is (= '("date" "cases_travel" "cases_household" "cases_healthcare" "cases_other")
82 | (keys sars)))
83 | (is (= 110 (count dates)))
84 | (is (= (first cases) 1))
85 | (is (= (last cases) 0)))))
86 |
87 | (deftest zika-test
88 | (testing "Load some data from the Zika 2015 dataset, converting keys to keywords"
89 | (let [data (read-rdata "test/data/zika_girardot_2015.RData" {:key-fn keyword})
90 | zika (-> data :zika_girardot_2015)
91 | dates (-> zika :date)
92 | cases (-> zika :cases)]
93 | (is (not (nil? data)))
94 | (is (= '(:date :cases) (keys zika)))
95 | (is (= 93 (count dates)))
96 | (is (= (first dates) #inst "2015-10-19T00:00:00.000-00:00"))
97 | (is (= (last dates) #inst "2016-01-22T00:00:00.000-00:00"))
98 | (is (= (first cases) 1))
99 | (is (= (last cases) 1)))))
100 |
101 | (deftest mers-test
102 | (testing "Load some data from the multilayered MERS Korea 2015 dataset, converting keys to keywords"
103 | (let [data (read-rdata "test/data/mers_korea_2015.RData" {:key-fn keyword})
104 | linelist (-> data :mers_korea_2015 :linelist)]
105 | (is (= '(:from :to :exposure :diff_dt_onset)
106 | (keys (-> data :mers_korea_2015 :contacts))))
107 | (is (= '(:id :age :age_class :sex :place_infect
108 | :reporting_ctry :loc_hosp :dt_onset :dt_report
109 | :week_report :dt_start_exp :dt_end_exp :dt_diag
110 | :outcome :dt_death)
111 | (keys linelist)))
112 | (is (= (first (:outcome linelist)) 1))
113 | (is (= (last (:outcome linelist)) 1))
114 | (is (= (first (:id linelist)) "SK_1"))
115 | (is (= (last (:id linelist)) "SK_162"))
116 | (is (= (first (:age linelist)) 68))
117 | (is (= (last (:age linelist)) 33)))))
118 |
119 | ;; https://github.com/EmilHvitfeldt/paletteer/blob/master/data/palettes_d.rda
120 | (deftest palettes-test
121 | (testing "Load some colour palettes from an uncompressed RData file, converting keys to keywords"
122 | (let [palettes (-> (read-rdata "test/data/palettes_d.rda" {:key-fn keyword}) :palettes_d)]
123 | (is (= (keys palettes)
124 | '(:awtools :basetheme :calecopal :colorblindr :colRoz :dichromat :dutchmasters :DresdenColor
125 | :fishualize :futurevisions :ggsci :ggpomological :ggthemes :ggthemr :ghibli :grDevices
126 | :IslamicArt :jcolors :LaCroixColoR :lisa :nationalparkcolors :NineteenEightyR :nord :ochRe
127 | :palettetown :pals :Polychrome :MapPalettes :miscpalettes :palettesForR :PNWColors
128 | :rcartocolor :RColorBrewer :Redmonder :RSkittleBrewer :tidyquant :trekcolors :tvthemes
129 | :unikn :vapeplot :vapoRwave :werpals :wesanderson :yarrr)))
130 | (is (= (-> palettes :wesanderson)
131 | {:BottleRocket1 ["#A42820" "#5F5647" "#9B110E" "#3F5151" "#4E2A1E" "#550307" "#0C1707"],
132 | :BottleRocket2 ["#FAD510" "#CB2314" "#273046" "#354823" "#1E1E1E"],
133 | :Rushmore1 ["#E1BD6D" "#EABE94" "#0B775E" "#35274A" "#F2300F"],
134 | :Rushmore ["#E1BD6D" "#EABE94" "#0B775E" "#35274A" "#F2300F"],
135 | :Royal1 ["#899DA4" "#C93312" "#FAEFD1" "#DC863B"],
136 | :Royal2 ["#9A8822" "#F5CDB4" "#F8AFA8" "#FDDDA0" "#74A089"],
137 | :Zissou1 ["#3B9AB2" "#78B7C5" "#EBCC2A" "#E1AF00" "#F21A00"],
138 | :Darjeeling1 ["#FF0000" "#00A08A" "#F2AD00" "#F98400" "#5BBCD6"],
139 | :Darjeeling2 ["#ECCBAE" "#046C9A" "#D69C4E" "#ABDDDE" "#000000"],
140 | :Chevalier1 ["#446455" "#FDD262" "#D3DDDC" "#C7B19C"],
141 | :FantasticFox1 ["#DD8D29" "#E2D200" "#46ACC8" "#E58601" "#B40F20"],
142 | :Moonrise1 ["#F3DF6C" "#CEAB07" "#D5D5D3" "#24281A"],
143 | :Moonrise2 ["#798E87" "#C27D38" "#CCC591" "#29211F"],
144 | :Moonrise3 ["#85D4E3" "#F4B5BD" "#9C964A" "#CDC08C" "#FAD77B"],
145 | :Cavalcanti1 ["#D8B70A" "#02401B" "#A2A475" "#81A88D" "#972D15"],
146 | :GrandBudapest1 ["#F1BB7B" "#FD6467" "#5B1A18" "#D67236"],
147 | :GrandBudapest2 ["#E6A0C4" "#C6CDF7" "#D8A499" "#7294D4"],
148 | :IsleofDogs1 ["#9986A5" "#79402E" "#CCBA72" "#0F0D0E" "#D9D0D3" "#8D8680"],
149 | :IsleofDogs2 ["#EAD3BF" "#AA9486" "#B6854D" "#39312F" "#1C1718"]})))))
150 |
151 | ;; courtesy of @generateme :)
152 | (deftest unnamed-test
153 | (testing "Load some data containing unnamed pairs, converting keys to keywords"
154 | (is (= (read-rdata "test/data/totest.rda" {:key-fn keyword})
155 | {:partiallyNamedList
156 | {:appliedsciencestudio.rdata/unnamed-1 ["noname"],
157 | :n ["withname"],
158 | :appliedsciencestudio.rdata/unnamed-2 ["noname"],
159 | :appliedsciencestudio.rdata/unnamed-3 ["noname"],
160 | :a [1.0],
161 | :b [2.0],
162 | :c [3.0]},
163 | :matrixRowAndColumnNames [1 2 3 4 5 6]}))))
164 |
--------------------------------------------------------------------------------
/test/data/mers_korea_2015.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/applied-science/rdata/888d388954366b93a3a53d773a0b05249bdecb58/test/data/mers_korea_2015.RData
--------------------------------------------------------------------------------
/test/data/palettes_d.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/applied-science/rdata/888d388954366b93a3a53d773a0b05249bdecb58/test/data/palettes_d.rda
--------------------------------------------------------------------------------
/test/data/sars_canada_2003.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/applied-science/rdata/888d388954366b93a3a53d773a0b05249bdecb58/test/data/sars_canada_2003.RData
--------------------------------------------------------------------------------
/test/data/totest.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/applied-science/rdata/888d388954366b93a3a53d773a0b05249bdecb58/test/data/totest.rda
--------------------------------------------------------------------------------
/test/data/zika_girardot_2015.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/applied-science/rdata/888d388954366b93a3a53d773a0b05249bdecb58/test/data/zika_girardot_2015.RData
--------------------------------------------------------------------------------