├── .gitignore ├── LICENSE ├── README.md ├── deps.edn ├── pom.xml ├── release.edn ├── src └── appliedsciencestudio │ ├── experiment.clj │ └── rdata.clj └── test ├── appliedsciencestudio └── rdata_test.clj └── data ├── mers_korea_2015.RData ├── palettes_d.rda ├── sars_canada_2003.RData ├── totest.rda └── zika_girardot_2015.RData /.gitignore: -------------------------------------------------------------------------------- 1 | /target 2 | /classes 3 | /checkouts 4 | *.jar 5 | *.class 6 | /.cpcache 7 | /.lein-* 8 | /.nrepl-history 9 | /.nrepl-port 10 | .hgignore 11 | .hg/ 12 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Permission is hereby granted, free of charge, to any person obtaining 2 | a copy of this software and associated documentation files (the 3 | "Software"), to deal in the Software without restriction, including 4 | without limitation the rights to use, copy, modify, merge, publish, 5 | distribute, sublicense, and/or sell copies of the Software, and to 6 | permit persons to whom the Software is furnished to do so, subject to 7 | the following conditions: 8 | 9 | The above copyright notice and this permission notice shall be 10 | included in all copies or substantial portions of the Software. 11 | 12 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 13 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 14 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 15 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE 16 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 17 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION 18 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
-------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # rdata 2 | 3 | A wrapper around [Renjin](https://www.renjin.org) to provide a 4 | convenient way to load the contents of a file saved in 5 | [R](https://www.r-project.org/foundation/)'s 6 | [RData](https://www.loc.gov/preservation/digital/formats/fdd/fdd000470.shtml) 7 | format in Clojure. 8 | 9 | One might want to do this because they have found an interesting 10 | dataset that has been published in this format. 11 | 12 | 13 | ## Installation 14 | 15 | We have not yet released to Clojars, so for now the library must be installed as a git dependency. 16 | 17 | For deps.edn users: 18 | 19 | ``` clojure 20 | appliedsciencestudio/rdata {:git/url "https://github.com/appliedsciencestudio/rdata/" 21 | :sha "151e6dead06b38995f1f30b09d954a060f7a2a9c"} 22 | ``` 23 | 24 | Because of a `deps.edn` issue with transitive dependencies that 25 | depend on non-standard repositories, you must have the key/value pair 26 | `"bedatadriven" {:url 27 | "https://nexus.bedatadriven.com/content/groups/public/"}` in the 28 | `:mvn/repos` of your `deps.edn` file. For instance: 29 | 30 | ``` clojure 31 | :mvn/repos {"central" {:url "https://repo1.maven.org/maven2/"} 32 | "clojars" {:url "https://clojars.org/repo"} 33 | "bedatadriven" {:url "https://nexus.bedatadriven.com/content/groups/public/"}} 34 | ``` 35 | 36 | Run the tests: 37 | 38 | clj -A:test:runner 39 | 40 | You can also build a deployable jar of this library: 41 | 42 | $ clojure -A:jar 43 | 44 | then install it locally: 45 | 46 | $ clojure -A:install 47 | 48 | 49 | ## Usage 50 | 51 | This library exports a single useful function, `read-rdata`, which -- 52 | somewhat predictably -- reads a file saved in the RData format used by 53 | R. 54 | 55 | The file contents are returned as nested maps (RData files can contain 56 | arbitrarily nested data). 
The top-most level of the returned structure 57 | is a key/value mapping from name to dataset, while the leaf nodes will 58 | always be `vector`s of some primitive type (`int`, `double`, `inst`, 59 | and so on). 60 | 61 | The R attributes stored with each value are attached to the Clojure 62 | translation of that value as Clojure `metadata`. 63 | 64 | ```clojure 65 | (def mers 66 | (read-rdata "test/data/mers_korea_2015.RData" {:key-fn keyword})) 67 | 68 | (keys mers) 69 | ;;=> (:mers_korea_2015) 70 | 71 | (-> mers :mers_korea_2015 keys) 72 | ;;=> (:linelist :contacts) 73 | 74 | (-> mers :mers_korea_2015 :linelist keys) 75 | ;;=> (:id :age :age_class :sex :place_infect :reporting_ctry :loc_hosp :dt_onset :dt_report :week_report :dt_start_exp :dt_end_exp :dt_diag :outcome :dt_death) 76 | 77 | (-> mers :mers_korea_2015 :linelist :place_infect) 78 | [1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2] 79 | 80 | ;; hm, place 1 or place 2? Maybe the metadata can tell us what this means... 81 | (-> mers :mers_korea_2015 :linelist :place_infect meta) 82 | ;;=> {:class ["factor"], :levels ["Middle East" "Outside Middle East"]} 83 | 84 | ;; Ah, it's a two value factor (note that R values start from 1, so one 85 | ;; must decrement the factor's index to look it up in the vector held in 86 | ;; the meta. 87 | ``` 88 | 89 | 90 | ## License 91 | 92 | Copyright © 2020 Applied Science 93 | 94 | Distributed under the MIT License. 
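
## Example: decoding a factor

The Usage section notes that a factor comes back as a vector of 1-based integer codes, with the labels stored under `:levels` in the vector's metadata. The lookup it describes can be sketched as below. This is a minimal illustration, not part of the library: `decode-factor` is a hypothetical helper, the sample vector is built by hand to mimic the README output, and the `:levels` key assumes the data was read with `{:key-fn keyword}`. R `NA` codes are not handled.

```clojure
(defn decode-factor
  "Hypothetical helper (not part of rdata): replace each 1-based factor
  code in `coded` with its label from the :levels metadata."
  [coded]
  (let [levels (:levels (meta coded))]
    ;; R indices start at 1, Clojure vectors at 0, hence the `dec`.
    (mapv #(nth levels (dec %)) coded)))

;; Hand-built stand-in for (-> mers :mers_korea_2015 :linelist :place_infect)
(def place-infect
  (with-meta [1 2 2]
    {:class ["factor"]
     :levels ["Middle East" "Outside Middle East"]}))

(decode-factor place-infect)
;;=> ["Middle East" "Outside Middle East" "Outside Middle East"]
```

Keeping the codes and the labels separate, as `read-rdata` does, preserves exactly what the file contains; decoding is then an explicit step the caller can choose to apply.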
95 | -------------------------------------------------------------------------------- /deps.edn: -------------------------------------------------------------------------------- 1 | {:paths ["src"] 2 | :mvn/repos {"central" {:url "https://repo1.maven.org/maven2/"} 3 | "clojars" {:url "https://clojars.org/repo"} 4 | "bedatadriven" {:url "https://nexus.bedatadriven.com/content/groups/public/"}} 5 | :deps {org.clojure/clojure {:mvn/version "1.10.1"} 6 | org.apache.commons/commons-compress {:mvn/version "1.20"} 7 | org.renjin/renjin-script-engine {:mvn/version "3.5-beta43"}} 8 | :aliases 9 | {:test {:extra-paths ["test"] 10 | :extra-deps {org.clojure/test.check {:mvn/version "0.10.0"}}} 11 | :runner {:extra-deps {com.cognitect/test-runner 12 | {:git/url "https://github.com/cognitect-labs/test-runner" 13 | :sha "f7ef16dc3b8332b0d77bc0274578ad5270fbfedd"}} 14 | :main-opts ["-m" "cognitect.test-runner" 15 | "-d" "test"]} 16 | :jar {:extra-deps {seancorfield/depstar {:mvn/version "0.5.2"}} 17 | :main-opts ["-m" "hf.depstar.jar" "rdata.jar"]} 18 | :install {:extra-deps {deps-deploy {:mvn/version "0.0.9"}} 19 | :main-opts ["-m" "deps-deploy.deps-deploy" "install" "rdata.jar"]} 20 | :deploy {:extra-deps {deps-deploy {:mvn/version "0.0.9"}} 21 | :main-opts ["-m" "deps-deploy.deps-deploy" "deploy" "rdata.jar"]} 22 | :release {:extra-deps {appliedscience/deps-library {:mvn/version "0.3.4"}} 23 | :main-opts ["-m" "deps-library.release"]}}} 24 | -------------------------------------------------------------------------------- /pom.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4.0.0 4 | appliedscience 5 | rdata 6 | 0.0.1-alpha 7 | rdata 8 | 9 | 10 | org.clojure 11 | clojure 12 | 1.10.1 13 | 14 | 15 | org.renjin 16 | renjin-script-engine 17 | 3.5-beta43 18 | 19 | 20 | 21 | src 22 | 23 | 24 | 25 | clojars 26 | https://clojars.org/repo 27 | 28 | 29 | bedatadriven 30 | https://nexus.bedatadriven.com/content/groups/public/ 31 | 32 | 33 
| 34 | https://github.com/appliedsciencestudio/rdata 35 | 11ddbf42e90e391fc46d1031e9560d6bf84189b4 36 | scm:git:git@github.com:appliedsciencestudio/rdata.git 37 | scm:git:git@github.com:appliedsciencestudio/rdata.git 38 | 39 | 40 | -------------------------------------------------------------------------------- /release.edn: -------------------------------------------------------------------------------- 1 | {:group-id "appliedscience" 2 | :artifact-id "rdata" 3 | :scm-url "https://github.com/appliedsciencestudio/rdata"} 4 | -------------------------------------------------------------------------------- /src/appliedsciencestudio/experiment.clj: -------------------------------------------------------------------------------- 1 | (ns appliedsciencestudio.experiment 2 | (:import (org.renjin.sexp SEXP Vector ListVector IntVector Logical Symbol Null StringArrayVector PairList))) 3 | 4 | ;; Protocol-based implementation taken from clojisr, currently used 5 | ;; experimentally to compare with our version 6 | 7 | (defprotocol Clojable 8 | (-java->clj [this])) 9 | 10 | (defn java->clj 11 | [java-obj] 12 | (some-> java-obj 13 | -java->clj)) 14 | 15 | (extend-type Object 16 | Clojable 17 | (-java->clj [this] this)) 18 | (declare ->attr) ; defined further down, needed by df->maps 19 | ;; Renjin represents a dataframe as a ListVector. 20 | ;; Its elements are the columns, 21 | ;; and the "names" attribute holds the column names. 22 | (defn df->maps 23 | [^ListVector df] 24 | (let [column-names (map keyword (->attr df :names))] 25 | (->> df 26 | (map java->clj) 27 | (apply map (fn [& row-elements] 28 | (zipmap column-names row-elements)))))) 29 | 30 | (defn NULL->nil 31 | [obj] 32 | (if (= Null/INSTANCE obj) 33 | nil 34 | obj)) 35 | 36 | (defn ->attr 37 | [^SEXP sexp attr-name] 38 | (-> sexp 39 | (.getAttribute (Symbol/get (name attr-name))) 40 | NULL->nil 41 | (->> (mapv #(if (string? 
%) 42 | (keyword %) 43 | %))))) 44 | 45 | (defn ->names 46 | [^SEXP sexp] 47 | (some->> (->attr sexp "names") 48 | (mapv keyword))) 49 | 50 | (defn ->class 51 | [^SEXP sexp] 52 | (some->> (->attr sexp "class") 53 | (mapv keyword))) 54 | 55 | (defn renjin-vector->clj 56 | [transf v] 57 | (if (some #(= % :data.frame) (->class v)) 58 | (df->maps v) 59 | (let [names (->names v) 60 | dim (->attr v :dim)] 61 | (->> v 62 | (map-indexed (fn [i x] 63 | (when (not (.isElementNA ^Vector v ^int i)) 64 | (transf x)))) 65 | ((if (seq names) 66 | ;; A named list or vector will be translated to a map. 67 | (partial zipmap names) 68 | (if (seq dim) 69 | ;; A matrix will be translated to a vector of vectors 70 | (fn [values] 71 | (->> values 72 | (partition (second dim)) 73 | (#(do (println %) %)) 74 | (mapv vec))) 75 | ;; A regular list or vector will be translated to a vector. 76 | vec))))))) 77 | 78 | (extend-type Vector 79 | Clojable 80 | (-java->clj [this] 81 | (renjin-vector->clj java->clj 82 | this))) 83 | 84 | (extend-type IntVector 85 | Clojable 86 | (-java->clj [this] 87 | (if (.isNumeric this) 88 | (renjin-vector->clj java->clj 89 | this) 90 | ;; else - a factor 91 | (renjin-vector->clj (comp java->clj 92 | (->attr this :levels) 93 | dec) 94 | this)))) 95 | 96 | (extend-type PairList 97 | Clojable 98 | (-java->clj [this] 99 | (renjin-vector->clj java->clj 100 | (.toVector this)))) 101 | 102 | (extend-type Logical 103 | Clojable 104 | (-java->clj [this] 105 | ({Logical/TRUE true 106 | Logical/FALSE false} 107 | this))) 108 | 109 | (extend-type Symbol 110 | Clojable 111 | (-java->clj [this] 112 | (symbol (.toString this)))) 113 | 114 | (extend-type Null 115 | Clojable 116 | (-java->clj [this] 117 | nil)) 118 | -------------------------------------------------------------------------------- /src/appliedsciencestudio/rdata.clj: -------------------------------------------------------------------------------- 1 | (ns appliedsciencestudio.rdata 2 | (:import 
(org.apache.commons.compress.compressors.bzip2 BZip2CompressorInputStream) 3 | (org.renjin.primitives.io.serialization RDataReader))) 4 | 5 | (defn r-date-to-java-date 6 | "The RData format returns dates as Doubles (!). This function massages 7 | them into java.util.Date instances." 8 | [the-double] 9 | (java.util.Date. (.longValue (* 86400000 the-double)))) 10 | 11 | (declare clojurize-sexp) 12 | 13 | (defn attributes->metadata 14 | "Retrieve the attributes from an R object and return them as a Clojure map." 15 | [key-fn serializer sexp] 16 | (let [pair-list (.asPairList (.getAttributes sexp))] 17 | (if (= (class pair-list) org.renjin.sexp.Null) 18 | {} 19 | (into {} (map #(vector (key-fn %1) (clojurize-sexp key-fn serializer %2)) 20 | (.getNames pair-list) 21 | (.values pair-list)))))) 22 | 23 | (defn clojurize-vector 24 | "Convert an R vector into a clojure vector, preserving the attributes 25 | as clojure metadata on the vector." 26 | [key-fn serializer sexp] 27 | (let [the-meta (attributes->metadata key-fn serializer sexp)] 28 | (with-meta 29 | (mapv (if (= ["Date"] (get the-meta (key-fn "class"))) 30 | r-date-to-java-date 31 | (partial clojurize-sexp key-fn serializer)) 32 | sexp) 33 | the-meta))) 34 | 35 | (defn clojurize-sexp 36 | "Recursively unpack a nested set of R sexps into a clojure 37 | representation." 
38 | [key-fn serializer sexp] 39 | (condp get (class sexp) 40 | #{org.renjin.sexp.PairList$Node} (apply array-map 41 | (mapcat #(vector (key-fn (if (= "" %1) (str "appliedsciencestudio.rdata/unnamed-"(serializer)) %1)) 42 | (clojurize-sexp key-fn serializer %2)) 43 | (.getNames sexp) 44 | (.values sexp))) 45 | #{org.renjin.sexp.ListVector} (with-meta 46 | (if (= (class (.getNames sexp)) org.renjin.sexp.Null) 47 | {} 48 | (apply array-map 49 | (mapcat #(vector (key-fn (if (= "" %) (str "appliedsciencestudio.rdata/unnamed-"(serializer)) %)) 50 | (clojurize-sexp key-fn serializer (.get sexp (str %)))) 51 | (.getNames sexp)))) 52 | (attributes->metadata key-fn serializer sexp)) 53 | #{org.renjin.sexp.IntArrayVector 54 | org.renjin.sexp.IntBufferVector 55 | org.renjin.sexp.DoubleArrayVector 56 | org.renjin.sexp.StringArrayVector 57 | org.renjin.sexp.LogicalArrayVector} (clojurize-vector key-fn serializer sexp) ;; XXX 58 | #{org.renjin.sexp.Logical} ({org.renjin.sexp.Logical/TRUE true 59 | org.renjin.sexp.Logical/FALSE false 60 | org.renjin.sexp.Logical/NA nil} sexp) 61 | #{org.renjin.primitives.io.serialization.StringByteArrayVector 62 | org.renjin.primitives.sequence.IntSequence} (mapv identity sexp) ; XXX 63 | #{java.lang.Double ; primitive type leaf nodes 64 | java.lang.String 65 | java.lang.Integer} sexp 66 | (class sexp))) ; emit classname if an unmapped class shows up 67 | ;; TODO org.renjin.primitives.vector.RowNamesVector 68 | 69 | (defn open-with-wrapper 70 | "RData files can be compressed with GZip or bz. This function takes 71 | `filename` and returns an `InputStream` wrapped with the appropriate 72 | stream decompressor (which might be none at all)." 
73 | [filename] 74 | (let [istream (doto (clojure.java.io/input-stream filename) 75 | (.mark 4)) ; mark so we can reset the stream after reading the header 76 | bzh-header (mapv int [\B \Z \h]) 77 | gzip-header [31 139] 78 | header (into [] (repeatedly 3 #(.read istream)))] 79 | (.reset istream) ; "unread" the three byte header 80 | (cond (= bzh-header header) (BZip2CompressorInputStream. istream) 81 | (and (= (header 0) (gzip-header 0)) 82 | (= (header 1) (gzip-header 1))) (java.util.zip.GZIPInputStream. istream) 83 | :else istream))) 84 | 85 | (defn read-rdata-raw 86 | "Read `filename` into Renjin's internal representation. Mostly useful for debugging." 87 | [filename] 88 | (with-open [is (open-with-wrapper filename)] 89 | (.readFile (org.renjin.primitives.io.serialization.RDataReader. is)))) 90 | 91 | (defn make-serializer 92 | "Produces a counter, backed by its own atom, that increments every time it is 93 | called. This is used in this code to generate serial names." 94 | [] 95 | (let [a (atom 0)] 96 | (fn [] (swap! a inc)))) 97 | 98 | (defn read-rdata 99 | "Read an RData formatted file into nested clojure data structures. NB 100 | I've used Clojure's metadata feature to store the attributes from 101 | the original file. There is an optional second argument, which is a 102 | map of options. The only option supported at the moment is 103 | `key-fn`, which allows one to pass a function to be applied to all 104 | strings being treated as keys during conversion." 105 | ([filename] (read-rdata filename {})) 106 | ([filename {:keys [key-fn] 107 | :or {key-fn identity}}] 108 | (->> (read-rdata-raw filename) 109 | (clojurize-sexp key-fn (make-serializer))))) 110 | 111 | ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 112 | ;; experimental CSV support -- here there be dragons! 
113 | 114 | (def read-csv-fn 115 | (org.renjin.sexp.FunctionCall/newCall 116 | (org.renjin.sexp.Symbol/get "::") 117 | (into-array org.renjin.sexp.SEXP [(org.renjin.sexp.Symbol/get "utils") (org.renjin.sexp.Symbol/get "read.csv")]))) 118 | 119 | ;; TODO add these parameters? 120 | (comment " 121 | numerals=c(allow.loss, warn.loss, no.loss), 122 | as.is=!(stringsAsFactors), 123 | colClasses=NA, 124 | nrows=-(1.0), 125 | check.names=TRUE, 126 | fill=!(blank.lines.skip), 127 | flush=FALSE, 128 | stringsAsFactors=default.stringsAsFactors(), 129 | fileEncoding=, 130 | encoding=unknown, 131 | text=") 132 | 133 | ;; XXX it has a hard time with thousands separators, like "1,000", but 134 | ;; works well with an alternate decimal specifier. 135 | (defn read-csv 136 | "This is a wrapper around R's CSV reader as an experiment. Do not use it." 137 | ([filename] (read-csv filename {})) 138 | ([filename {:keys [header? sep quote dec 139 | strip-white? skip-blank-lines? 140 | skip-nil? allow-escapes? 141 | nil-string 142 | ;; comment-char (defaults to #) 143 | ;; col-names row-names 144 | ;; col-names-fn 145 | ;; file-encoding 146 | ;; skip (default 0.0) 147 | ]}] 148 | (let [args (org.renjin.sexp.PairList$Builder.)] 149 | (.add args (org.renjin.sexp.StringVector/valueOf (.getAbsolutePath (java.io.File. filename)))) 150 | ;; factor conversion might not make sense? 151 | (.add args "stringsAsFactors", org.renjin.sexp.LogicalVector/FALSE) 152 | (when header? (.add args "header", org.renjin.sexp.LogicalVector/TRUE)) 153 | (when sep (.add args "sep", (org.renjin.sexp.StringVector/valueOf sep))) 154 | (when dec (.add args "dec", (org.renjin.sexp.StringVector/valueOf dec))) 155 | (when skip-nil? (.add args "skipNul", org.renjin.sexp.LogicalVector/TRUE)) 156 | (when skip-blank-lines? (.add args "blank.lines.skip", org.renjin.sexp.LogicalVector/TRUE)) 157 | (when strip-white? (.add args "strip.white", org.renjin.sexp.LogicalVector/TRUE)) 158 | (when allow-escapes? 
(.add args "allowEscapes", org.renjin.sexp.LogicalVector/TRUE)) 159 | (when nil-string 160 | (.add args "na.strings" (org.renjin.sexp.StringVector/valueOf nil-string))) 161 | (clojurize-sexp identity (make-serializer) ; clojurize-sexp takes key-fn, serializer and sexp 162 | (.evaluate (org.renjin.eval.Context/newTopLevelContext) 163 | (org.renjin.sexp.FunctionCall. read-csv-fn (.build args))))))) 164 | 165 | ;;(read-csv "resources/COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv") 166 | ;;(read-csv "resources/deutschland.covid19cases.tsv" {:sep "\t" :dec ","}) 167 | ;;(->> vals (map meta)) 168 | 169 | ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 170 | ;; input stream wrapper to gracefully handle gzip and bzip2 files 171 | 172 | ;; TODO add support for this compression method? 173 | ;; } else if(b1 == 0xFD && b2 == '7') { 174 | ;; // See http://tukaani.org/xz/xz-javadoc/org/tukaani/xz/XZInputStream.html 175 | ;; // Set a memory limit of 64mb, if this is not sufficient, it will throw 176 | ;; // an exception rather than an OutOfMemoryError, which will terminate the JVM 177 | ;; return new XZInputStream(new FileInputStream(file), 64 * 1024 * 1024); 178 | -------------------------------------------------------------------------------- /test/appliedsciencestudio/rdata_test.clj: -------------------------------------------------------------------------------- 1 | (ns appliedsciencestudio.rdata-test 2 | (:require [clojure.test :refer :all] 3 | [appliedsciencestudio.rdata :refer [read-rdata make-serializer clojurize-sexp]])) 4 | 5 | (def eval-r 6 | "An instance of the Renjin script engine, which we will use to generate test data." 7 | (let [engine (.getScriptEngine (org.renjin.script.RenjinScriptEngineFactory.))] 8 | (fn [script] 9 | (.eval engine script)))) 10 | 11 | (defn r->clj 12 | "A helper function to convert R data to clj, applying `key-fn` to every key." 
13 | [key-fn sexp] 14 | (clojurize-sexp key-fn (make-serializer) sexp)) 15 | 16 | (deftest simple-tests 17 | ;; originally taking ideas from https://scicloj.github.io/clojisr/resources/public/clojisr/v1/tutorial-test/index.html#more-data-conversion-examples 18 | (testing "Generate some data using R, then convert it to clojure structures." 19 | (testing "named list" 20 | (is (= (r->clj identity (eval-r "list(a=1,b=c(10,20),c='hi!')")) 21 | {"a" [1.0], 22 | "b" [10.0 20.0], 23 | "c" ["hi!"]}))) 24 | (testing "booleans" 25 | (is (= (r->clj identity (eval-r "TRUE")) 26 | [true])) 27 | (is (= (r->clj identity (eval-r "FALSE")) 28 | [false])) 29 | (is (= (r->clj identity (eval-r "NA")) 30 | ;; XXX 31 | [nil]))) 32 | #_ (testing "null/nil" 33 | (is (= (r->clj keyword (eval-r "NULL")) 34 | nil))) 35 | (is (= (r->clj identity (eval-r "c(10,20,30)")) 36 | [10.0 20.0 30.0])) 37 | (is (= (r->clj identity (eval-r "list(A=1,B=2,'#123strange ()'=3)")) 38 | {"A" [1.0], "B" [2.0], "#123strange ()" [3.0]})) 39 | (is (= (r->clj keyword (eval-r "list(a=1:10,b='hi!')")) 40 | {:a [1 2 3 4 5 6 7 8 9 10], :b ["hi!"]})) 41 | (is (= (r->clj keyword (eval-r "list(a=1,b=c(10,20),c='hi!')")) 42 | {:a [1.0], :b [10.0 20.0], :c ["hi!"]})))) 43 | 44 | ;; java->clj might be trying too hard to keywordize? 45 | ;; (appliedsciencestudio.experiment/java->clj 46 | ;; (eval-r "list(A=1,B=2,'#123strange ()'=3)")) 47 | ;;=> {:A [1.0], :B [2.0], :#123strange () [3.0]} 48 | 49 | ;;(r->clj identity (eval-r "table(c('a','b','a','b','a','b','a','b'), c(1,1,2,2,3,3,1,1))")) 50 | ;; 51 | ;; In R this is: 52 | ;; 1 2 3 53 | ;; a 2 1 1 54 | ;; b 2 1 1 55 | ;; ... but rdata currently returns: 56 | ;; => [2 2 1 1 1 1] 57 | ;;... 
with this meta: 58 | ;; {:class ["table"], :dim [2 3], :dimnames #:appliedsciencestudio.rdata{:unnamed-1 ["a" "b"], :unnamed-2 ["a" "b"]}} 59 | 60 | ;; clojisr gives this, which I'm not sure is what I'd want: 61 | ;; {["1" "a"] 2, 62 | ;; ["1" "b"] 2, 63 | ;; ["2" "a"] 1, 64 | ;; ["2" "b"] 1, 65 | ;; ["3" "a"] 1, 66 | ;; ["3" "b"] 1} 67 | 68 | ;; bringing the clojisr code over, we get this: 69 | ;; (appliedsciencestudio.experiment/java->clj 70 | ;; (eval-r "table(c('a','b','a','b','a','b','a','b'), c(1,1,2,2,3,3,1,1))")) 71 | ;;=>[[2 2 1] [1 1 1]] 72 | 73 | ;; these first datasets were taken from https://github.com/reconhub/outbreaks 74 | (deftest sars-test 75 | (testing "Load some demo data from the SARS 2003 dataset, access it using the string keys provided by R." 76 | (let [data (read-rdata "test/data/sars_canada_2003.RData" ) 77 | sars (get data "sars_canada_2003") 78 | dates (get sars "date") 79 | cases (get sars "cases_travel")] 80 | (is (not (nil? data))) 81 | (is (= '("date" "cases_travel" "cases_household" "cases_healthcare" "cases_other") 82 | (keys sars))) 83 | (is (= 110 (count dates))) 84 | (is (= (first cases) 1)) 85 | (is (= (last cases) 0))))) 86 | 87 | (deftest zika-test 88 | (testing "Load some data from the Zika 2015 dataset, converting keys to keywords" 89 | (let [data (read-rdata "test/data/zika_girardot_2015.RData" {:key-fn keyword}) 90 | zika (-> data :zika_girardot_2015) 91 | dates (-> zika :date) 92 | cases (-> zika :cases)] 93 | (is (not (nil? 
data))) 94 | (is (= '(:date :cases) (keys zika))) 95 | (is (= 93 (count dates))) 96 | (is (= (first dates) #inst "2015-10-19T00:00:00.000-00:00")) 97 | (is (= (last dates) #inst "2016-01-22T00:00:00.000-00:00")) 98 | (is (= (first cases) 1)) 99 | (is (= (last cases) 1))))) 100 | 101 | (deftest mers-test 102 | (testing "Load some data from the multilayered MERS Korea 2015 dataset, converting keys to keywords" 103 | (let [data (read-rdata "test/data/mers_korea_2015.RData" {:key-fn keyword}) 104 | linelist (-> data :mers_korea_2015 :linelist)] 105 | (is (= '(:from :to :exposure :diff_dt_onset) 106 | (keys (-> data :mers_korea_2015 :contacts)))) 107 | (is (= '(:id :age :age_class :sex :place_infect 108 | :reporting_ctry :loc_hosp :dt_onset :dt_report 109 | :week_report :dt_start_exp :dt_end_exp :dt_diag 110 | :outcome :dt_death) 111 | (keys linelist))) 112 | (is (= (first (:outcome linelist)) 1)) 113 | (is (= (last (:outcome linelist)) 1)) 114 | (is (= (first (:id linelist)) "SK_1")) 115 | (is (= (last (:id linelist)) "SK_162")) 116 | (is (= (first (:age linelist)) 68)) 117 | (is (= (last (:age linelist)) 33))))) 118 | 119 | ;; https://github.com/EmilHvitfeldt/paletteer/blob/master/data/palettes_d.rda 120 | (deftest palettes-test 121 | (testing "Load some colour palettes from an uncompressed RData file, converting keys to keywords" 122 | (let [palettes (-> (read-rdata "test/data/palettes_d.rda" {:key-fn keyword}) :palettes_d)] 123 | (is (= (keys palettes) 124 | '(:awtools :basetheme :calecopal :colorblindr :colRoz :dichromat :dutchmasters :DresdenColor 125 | :fishualize :futurevisions :ggsci :ggpomological :ggthemes :ggthemr :ghibli :grDevices 126 | :IslamicArt :jcolors :LaCroixColoR :lisa :nationalparkcolors :NineteenEightyR :nord :ochRe 127 | :palettetown :pals :Polychrome :MapPalettes :miscpalettes :palettesForR :PNWColors 128 | :rcartocolor :RColorBrewer :Redmonder :RSkittleBrewer :tidyquant :trekcolors :tvthemes 129 | :unikn :vapeplot :vapoRwave :werpals 
:wesanderson :yarrr))) 130 | (is (= (-> palettes :wesanderson) 131 | {:BottleRocket1 ["#A42820" "#5F5647" "#9B110E" "#3F5151" "#4E2A1E" "#550307" "#0C1707"], 132 | :BottleRocket2 ["#FAD510" "#CB2314" "#273046" "#354823" "#1E1E1E"], 133 | :Rushmore1 ["#E1BD6D" "#EABE94" "#0B775E" "#35274A" "#F2300F"], 134 | :Rushmore ["#E1BD6D" "#EABE94" "#0B775E" "#35274A" "#F2300F"], 135 | :Royal1 ["#899DA4" "#C93312" "#FAEFD1" "#DC863B"], 136 | :Royal2 ["#9A8822" "#F5CDB4" "#F8AFA8" "#FDDDA0" "#74A089"], 137 | :Zissou1 ["#3B9AB2" "#78B7C5" "#EBCC2A" "#E1AF00" "#F21A00"], 138 | :Darjeeling1 ["#FF0000" "#00A08A" "#F2AD00" "#F98400" "#5BBCD6"], 139 | :Darjeeling2 ["#ECCBAE" "#046C9A" "#D69C4E" "#ABDDDE" "#000000"], 140 | :Chevalier1 ["#446455" "#FDD262" "#D3DDDC" "#C7B19C"], 141 | :FantasticFox1 ["#DD8D29" "#E2D200" "#46ACC8" "#E58601" "#B40F20"], 142 | :Moonrise1 ["#F3DF6C" "#CEAB07" "#D5D5D3" "#24281A"], 143 | :Moonrise2 ["#798E87" "#C27D38" "#CCC591" "#29211F"], 144 | :Moonrise3 ["#85D4E3" "#F4B5BD" "#9C964A" "#CDC08C" "#FAD77B"], 145 | :Cavalcanti1 ["#D8B70A" "#02401B" "#A2A475" "#81A88D" "#972D15"], 146 | :GrandBudapest1 ["#F1BB7B" "#FD6467" "#5B1A18" "#D67236"], 147 | :GrandBudapest2 ["#E6A0C4" "#C6CDF7" "#D8A499" "#7294D4"], 148 | :IsleofDogs1 ["#9986A5" "#79402E" "#CCBA72" "#0F0D0E" "#D9D0D3" "#8D8680"], 149 | :IsleofDogs2 ["#EAD3BF" "#AA9486" "#B6854D" "#39312F" "#1C1718"]}))))) 150 | 151 | ;; courtesy of @generateme :) 152 | (deftest unnamed-test 153 | (testing "Load some data containing unnamed pairs, converting keys to keywords" 154 | (is (= (read-rdata "test/data/totest.rda" {:key-fn keyword}) 155 | {:partiallyNamedList 156 | {:appliedsciencestudio.rdata/unnamed-1 ["noname"], 157 | :n ["withname"], 158 | :appliedsciencestudio.rdata/unnamed-2 ["noname"], 159 | :appliedsciencestudio.rdata/unnamed-3 ["noname"], 160 | :a [1.0], 161 | :b [2.0], 162 | :c [3.0]}, 163 | :matrixRowAndColumnNames [1 2 3 4 5 6]})))) 164 | 
-------------------------------------------------------------------------------- /test/data/mers_korea_2015.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/applied-science/rdata/888d388954366b93a3a53d773a0b05249bdecb58/test/data/mers_korea_2015.RData -------------------------------------------------------------------------------- /test/data/palettes_d.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/applied-science/rdata/888d388954366b93a3a53d773a0b05249bdecb58/test/data/palettes_d.rda -------------------------------------------------------------------------------- /test/data/sars_canada_2003.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/applied-science/rdata/888d388954366b93a3a53d773a0b05249bdecb58/test/data/sars_canada_2003.RData -------------------------------------------------------------------------------- /test/data/totest.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/applied-science/rdata/888d388954366b93a3a53d773a0b05249bdecb58/test/data/totest.rda -------------------------------------------------------------------------------- /test/data/zika_girardot_2015.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/applied-science/rdata/888d388954366b93a3a53d773a0b05249bdecb58/test/data/zika_girardot_2015.RData --------------------------------------------------------------------------------