├── .gitignore ├── README.md ├── benchmarks └── csv │ └── benchmarks │ └── core.clj ├── project.clj ├── src └── clojure_csv │ └── core.clj └── test └── clojure_csv └── test ├── core.clj └── utils.clj /.gitignore: -------------------------------------------------------------------------------- 1 | pom.xml 2 | *.jar 3 | lib 4 | classes 5 | target 6 | .cake 7 | .lein-failures 8 | .lein-deps-sum 9 | .lein-repl-history 10 | benchmarks/data -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Clojure-CSV 2 | =========== 3 | Clojure-CSV is a small library for reading and writing CSV files. The main 4 | features: 5 | 6 | * Both common line terminators are accepted. 7 | * Quoting and escaping inside CSV fields are handled correctly (specifically 8 | commas and double-quote characters). 9 | * Unescaped newlines embedded in CSV fields are supported when 10 | parsing. 11 | * Reading is lazy. 12 | * More permissive than RFC 4180, although there are some optional strictness 13 | checks. (Send me any bugs you find, or any correctness checks you think 14 | should be performed.) 15 | 16 | This library aims to be as permissive as possible with respect to deviation 17 | from the standard, as long as the intention is clear. The only correctness 18 | checks made are those on the actual (minimal) CSV structure. For example, 19 | some people think it should be an error when lines in the CSV have a 20 | different number of fields -- you should check this yourself. However, it is 21 | not possible, after parsing, to tell if the input ended before the closing 22 | quote of a field; if you care, it can be signaled to you. 23 | 24 | The API has changed in the 2.0 series; see below for details. 25 | 26 | Recent Updates 27 | -------------- 28 | 29 | * Updated library to 2.0.2, with a bug fix for malformed input by 30 | [attil-io](https://github.com/attil-io). 
31 | * Updated library to 2.0.1, which adds the :force-quote option to write-csv. 32 | Big thanks to [Barrie McGuire](https://github.com/pleasle) for the contribution. 33 | * Updated library to 2.0.0; essentially identical to 2.0.0-alpha2. 34 | 35 | * Updated library to 2.0.0-alpha2. 36 | * Rewritten parser for additional speed increases. 37 | * Benchmarks to help monitor and improve performance. 38 | 39 | * Updated the library to 2.0.0-alpha1. 40 | * Major update: Massive speed improvements, end-of-line string is 41 | configurable for parsing, improved handling of empty files, input to 42 | parse-csv is now a string or Reader, and a new API based on keyword 43 | args instead of rebinding vars. 44 | 45 | ### Previously... 46 | * Updated library to 1.3.2. 47 | * Added support for changing the character used to start and end quoted fields in 48 | reading and writing. 49 | * Updated library to 1.3.1. 50 | * Fixed the quoting behavior on write, to properly quote any field with a CR. Thanks to Matt Lehman for this fix. 51 | * Updated library to 1.3.0. 52 | * Now has support for Clojure 1.3. 53 | * Some speed improvements to take advantage of Clojure 1.3. Nearly twice as fast 54 | in my tests. 55 | * Updated library to 1.2.4. 56 | * Added the char-seq multimethod, which provides a variety of implementations 57 | for easily creating the char seqs that parse-csv uses on input from various 58 | similar objects. Big thanks to [Slawek Gwizdowski](https://github.com/i0cus) 59 | for this contribution. 60 | * Includes a bug fix for a problem where a non-comma delimiter was causing 61 | incorrect quoting on write. 62 | * Included a bug fix to make the presence of a double-quote in an unquoted field 63 | parse better in non-strict mode. Specifically, if a CSV field is not quoted 64 | but has \" characters, they are read as \" with no further processing. This does 65 | not start quoting. 66 | * Reorganized namespaces to fit better with my perception of Clojure standards.
67 | Specifically, the main namespace is now clojure-csv.core. 68 | * Significantly faster on parsing. There should be additional speed 69 | improvements possible when Clojure 1.2 is released. 70 | * Support for more error checking with \*strict\* var. 71 | * Numerous bug fixes. 72 | 73 | Obtaining 74 | --------- 75 | If you are using Leiningen, you can simply add 76 | 77 | [clojure-csv/clojure-csv "2.0.2"] 78 | 79 | to your project.clj and download it from Clojars with 80 | 81 | lein deps 82 | 83 | Use 84 | --- 85 | The `clojure-csv.core` namespace exposes two functions to the user: 86 | 87 | ### parse-csv 88 | Takes a CSV as a string or Reader, and returns a lazy sequence of 89 | vectors of strings; each vector corresponds to a row, and each string is 90 | one field from that row. If you read lazily from a file or some other 91 | resource, make sure it remains open until the sequence has been fully 92 | consumed. 93 | 94 | Takes the following keyword arguments to change parsing behavior: 95 | #### :delimiter 96 | A character that contains the cell separator for each column in a row. 97 | ##### Default value: \\, 98 | #### :end-of-line 99 | A string containing the end-of-line sequence for 100 | reading CSV files. If this setting is nil then \\n and \\r\\n are both 101 | accepted. 102 | ##### Default value: nil 103 | #### :quote-char 104 | A character that is used to begin and end a quoted cell. 105 | ##### Default value: \" 106 | #### :strict 107 | If this option is true, the parser will throw an exception 108 | on parse errors that are recoverable but not to spec or otherwise 109 | nonsensical. 110 | ##### Default value: false 111 | 112 | ### write-csv 113 | Takes a sequence of sequences of strings, basically a table of strings, 114 | and renders that table into a string in CSV format. You can easily 115 | call this function repeatedly row-by-row and concatenate the results yourself.
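
The row-by-row approach mentioned above can be sketched as follows. This is a minimal illustration that assumes clojure-csv is on the classpath; the `table` data and the var names are made up for the example and are not part of the library.

```clojure
(require '[clojure-csv.core :refer [write-csv]])

;; Hypothetical sample data: any sequence of sequences of strings works.
(def table [["name" "note"]
            ["a" "plain"]
            ["b" "needs,quote"]])

;; Render the whole table in a single call.
(def all-at-once (write-csv table))

;; Render one row at a time and concatenate the pieces ourselves.
;; Each call receives a one-row table, so each piece ends with the
;; end-of-line string and the pieces can simply be joined.
(def row-by-row (apply str (map #(write-csv [%]) table)))

(print all-at-once)
```

Because write-csv appends the end-of-line string after every row, both approaches should yield the same string; the row-by-row form is useful when the rows are produced incrementally.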
116 | 117 | write-csv takes the following keyword arguments to change the written output: 118 | #### :delimiter 119 | A character that contains the cell separator for each column in a row. 120 | ##### Default value: \\, 121 | #### :end-of-line 122 | A string containing the end-of-line sequence for writing CSV files. 123 | ##### Default value: \\n 124 | #### :quote-char 125 | A character that is used to begin and end a quoted cell. 126 | ##### Default value: \" 127 | #### :force-quote 128 | If this option is true, the output will have every field quoted, whether 129 | this is needed or not. This can apparently be helpful for interoperating 130 | with Excel. 131 | ##### Default value: false 132 | 133 | Changes from API 1.0 134 | -------------------- 135 | 136 | Clojure-CSV was originally written for Clojure 1.0, before many of the 137 | modern features we now enjoy in Clojure, like keyword args, an IO 138 | library and fast primitive math. The 2.0 series freshens up the API to 139 | match more modern Clojure API style, language capabilities, and coding 140 | conventions. The JARs for the 1.0 series will remain available 141 | indefinitely (probably a long, long time), so if you can't handle an 142 | API change, you can continue to use it as you always have. 143 | 144 | Here's a summary of the changes: 145 | 146 | * Options are now set through keyword args to parse-csv and write-csv. The 147 | dynamic vars are removed. 148 | - Rationale: Dynamic vars are a little annoying to rebind. This can 149 | tempt you to imprudently set them for too wide a swath of 150 | code. Reusing the same vars for both reading and writing meant that 151 | the vars had to have the same meaning in each context, or else two 152 | vars had to be introduced to accommodate the differences. Keyword args are 153 | clear, fast, explicit, and local. 154 | * Parsing logic is now based on Java readers instead of Clojure char seqs. 155 | - Rationale: Largely performance.
Clojure's char seqs are not 156 | particularly fast and throw off a lot of garbage. It's not clear 157 | that working entirely with pure Clojure data structures was 158 | providing much value to anyone. When you're doing IO, Readers 159 | are close at hand in Java, and now the basis for Clojure's IO libs. 160 | * An empty file now parses as a file with no rows. 161 | - Rationale: The CSV standard actually doesn't say anything about an 162 | input that is an empty file. Clojure-CSV 1.0 would return a single 163 | row with an empty string in it. The logic was that a CSV file row is 164 | everything between the start of a line and the end of the line, 165 | where an EOF is a line terminator. This would mean an empty file is 166 | a single row that has an empty field. An alternative, and equally 167 | valid view is that if a file has nothing in it, there is no row to 168 | be had. A file that is a single row with an empty field can still be 169 | expressed in this viewpoint as a file that contains only a line 170 | terminator. The same cannot be said of the 1.0 view of things: there 171 | was no way to represent a file with no rows. In any case, I went and 172 | looked at many other CSV parsing libraries for other languages, 173 | and they universally took the view that an empty CSV file has no 174 | rows, so now Clojure-CSV does as well. 175 | * The end-of-line option can now be set during parsing. If end-of-line is 176 | set to something other than nil, parse-csv will treat \n and \r\n as 177 | any other character and only use the string given in end-of-line as the 178 | newline. 179 | 180 | Bugs 181 | ---- 182 | Please let me know of any problems you are having. 
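
For reference, the 2.0 behaviors listed under "Changes from API 1.0" can be sketched as below. This is a minimal illustration that assumes clojure-csv is on the classpath; the inputs are small made-up strings, and the var names are only for the example.

```clojure
(require '[clojure-csv.core :refer [parse-csv]])

;; Options are passed as keyword args on each call; there are no
;; dynamic vars to rebind.
(def tab-separated (parse-csv "First\tSecond" :delimiter \tab))
;; => (["First" "Second"])

;; Empty input has no rows, while a lone line terminator is one row
;; holding a single empty field.
(def no-rows (parse-csv ""))        ;; => ()
(def one-empty-row (parse-csv "\n")) ;; => ([""])

;; With a custom :end-of-line, \n and \r\n are treated as ordinary
;; characters, so the \n below ends up inside a field.
(def custom-eol (parse-csv "a,b\r\nc,d" :end-of-line "\r"))
;; => (["a" "b"] ["\nc" "d"])
```

Note that the results are lazy sequences, so a Reader passed to parse-csv has to stay open until the rows have been realized.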
183 | 184 | Contributors 185 | ------------ 186 | - [Slawek Gwizdowski](https://github.com/i0cus) 187 | - [Matt Lehman](http://github.com/mlehman) 188 | - [Barrie McGuire](http://github.com/pleasle) 189 | 190 | License 191 | -------- 192 | Eclipse Public License 193 | -------------------------------------------------------------------------------- /benchmarks/csv/benchmarks/core.clj: -------------------------------------------------------------------------------- 1 | (ns csv.benchmarks.core 2 | (:use perforate.core 3 | clojure-csv.core) 4 | (:require [clojure.java.shell :as sh] 5 | [clojure.java.io :as io] 6 | [clojure.string :as string])) 7 | 8 | (def data-url "http://www2.census.gov/econ2002/CBP_CSV/cbp02st.zip") 9 | (def data-dir "benchmarks/data") 10 | (def data-file (str data-dir "/cbp02st.txt")) 11 | 12 | (defn data-present? 13 | "Check if the benchmark test data is available in the benchmarks/data dir. 14 | Simply checks for the presence of the directory." 15 | [] 16 | (.exists (io/file data-dir))) 17 | 18 | (defn get-cbp-data 19 | [] 20 | (let [filename (last (string/split data-url #"/"))] 21 | (try (sh/sh "wget" data-url) 22 | (sh/sh "unzip" filename "-d" data-dir) 23 | (sh/sh "rm" filename) 24 | (catch java.io.IOException e)))) 25 | 26 | (defn get-cbp-data-if-missing 27 | "Get the data if it isn't already there."
28 | [] 29 | (when (not (data-present?)) 30 | (get-cbp-data))) 31 | 32 | 33 | (defgoal read-test "CSV Read Speed" 34 | :setup get-cbp-data-if-missing) 35 | 36 | (defcase* read-test :clojure-csv 37 | (fn [] 38 | (let [csvfile (slurp data-file)] 39 | [(fn [] (dorun 50000 (parse-csv csvfile)))]))) 40 | 41 | (defgoal write-test "CSV Write Speed" 42 | :setup get-cbp-data-if-missing) 43 | 44 | (defcase* write-test :clojure-csv 45 | (fn [] 46 | (let [csvfile (slurp data-file) 47 | cbp02 (doall (take 50000 (parse-csv csvfile)))] 48 | [(fn [] (doall (write-csv cbp02)))]))) 49 | -------------------------------------------------------------------------------- /project.clj: -------------------------------------------------------------------------------- 1 | (defproject clojure-csv "2.0.2" 2 | :description "A simple library to read and write CSV files." 3 | :dependencies [[org.clojure/clojure "1.3.0"]] 4 | :plugins [[perforate "0.3.2"]] 5 | :jvm-opts ["-Xmx1g"] 6 | :profiles {:current {:source-paths ["src/"]} 7 | :clj1.4 {:dependencies [[org.clojure/clojure "1.4.0-beta5"]]} 8 | :clj1.3 {:dependencies [[org.clojure/clojure "1.3.0"]]} 9 | :csv1.3 {:dependencies [[clojure-csv "1.3.0"]]} 10 | :csv2.0 {:dependencies [[clojure-csv "2.0.0-alpha1"]]}} 11 | :perforate {:environments [{:name :clojure-csv2 12 | :profiles [:clj1.3 :csv2.0] 13 | :namespaces [csv.benchmarks.core]} 14 | {:name :clojure-csv1 15 | :profiles [:clj1.3 :csv1.3] 16 | :namespaces [csv.benchmarks.core]} 17 | {:name :current 18 | :profiles [:clj1.4 :current] 19 | :namespaces [csv.benchmarks.core]}]}) 20 | -------------------------------------------------------------------------------- /src/clojure_csv/core.clj: -------------------------------------------------------------------------------- 1 | (ns 2 | ^{:author "David Santiago", 3 | :doc "Clojure-CSV is a small library for reading and writing CSV files. 4 | It correctly handles common CSV edge-cases, such as embedded newlines, commas, 5 | and quotes. 
The main functions are parse-csv and write-csv."} 6 | clojure-csv.core 7 | (:require [clojure.string :as string]) 8 | (:import [java.io Reader StringReader])) 9 | 10 | 11 | ;; 12 | ;; Utilities 13 | ;; 14 | 15 | (defn- reader-peek 16 | ^long [^Reader reader] 17 | (.mark reader 1) 18 | (let [c (.read reader)] 19 | (.reset reader) 20 | c)) 21 | 22 | ;; 23 | ;; CSV Input 24 | ;; 25 | 26 | (defn- lf-at-reader-pos? 27 | "Given a reader, returns true if the reader is currently pointing at an \n 28 | character. Reader will not be changed when the function returns." 29 | [^Reader reader] 30 | (let [next-char (reader-peek reader)] 31 | (== next-char (int \newline)))) 32 | 33 | (defn- crlf-at-reader-pos? 34 | "Given a reader, returns true if the reader is currently pointing at an \r\n 35 | character sequence. Reader will not be changed when the function returns." 36 | [^Reader reader] 37 | (.mark reader 2) 38 | (let [result (and (== (int \return) (.read reader)) 39 | (== (int \newline) (.read reader)))] 40 | (.reset reader) 41 | result)) 42 | 43 | (defn- custom-eol-at-reader-pos? 44 | "Given a reader and an end-of-line string, returns true if the reader is 45 | currently pointing at an instance of the end-of-line string. Reader will not 46 | be changed when the function returns." 47 | [^Reader reader ^String end-of-line] 48 | (.mark reader 16) 49 | (let [result (loop [curr-rdr-char (int (.read reader)) 50 | eol-pos (int 0)] 51 | (if (>= eol-pos (int (count end-of-line))) 52 | ;; Reached the end of the EOL to check for, so return success 53 | true 54 | ;; Didn't reach the end of the EOL string, so recur if the 55 | ;; next char matches the next EOL char. Otherwise, fail. 56 | (if (== curr-rdr-char (.codePointAt end-of-line eol-pos)) 57 | (recur (.read reader) (inc eol-pos)) 58 | false)))] 59 | (.reset reader) 60 | result)) 61 | 62 | (defn- eol-at-reader-pos? 
63 | "Given a reader and optionally an end-of-line string, returns true if the 64 | reader is currently pointing at an end-of-line (LF/CRLF/the end-of-line arg). 65 | Reader will not be changed when the function returns. Note that if the 66 | EOL is specified, it will not check for LF/CRLF." 67 | ([^Reader reader] 68 | (or (lf-at-reader-pos? reader) 69 | (crlf-at-reader-pos? reader))) 70 | ([^Reader reader end-of-line] 71 | (if end-of-line 72 | (custom-eol-at-reader-pos? reader end-of-line) 73 | (eol-at-reader-pos? reader)))) 74 | 75 | (defn- skip-past-eol 76 | "Given a reader that is pointing at an end-of-line 77 | (LF/CRLF/the end-of-line arg), moves the reader forward to the 78 | first character after the end-of-line sequence. Note that if the EOL is 79 | specified, it will not check for LF/CRLF." 80 | ([^Reader reader] 81 | ;; If we peek and see a newline (LF), then the EOL is just an LF, skip 1. 82 | ;; Otherwise, the EOL is a CRLF, so skip 2. 83 | (if (== (int \newline) (reader-peek reader)) 84 | (.skip reader 1) 85 | (.skip reader 2))) 86 | ([^Reader reader end-of-line] 87 | (if end-of-line 88 | ;; end-of-line is specified, and we can assume we are positioned at 89 | ;; an eol. 90 | (.skip reader (count end-of-line)) 91 | (skip-past-eol reader)))) 92 | 93 | (defn- read-unquoted-field 94 | "Given a reader that is queued up to the beginning of an unquoted field, 95 | reads the field and returns it as a string. The reader will be left at the 96 | first character past the end of the field." 97 | [^Reader reader delimiter quote-char strict end-of-line] 98 | (let [delimiter (int delimiter) 99 | quote-char (int quote-char) 100 | field-str (StringBuilder.)] 101 | (loop [c (reader-peek reader)] 102 | (cond (or (== c -1) 103 | (== c delimiter)) 104 | (.toString field-str) 105 | (eol-at-reader-pos? reader end-of-line) 106 | (.toString field-str) 107 | (and strict (== c quote-char)) 108 | (throw (Exception. 
"Double quote present in unquoted field.")) 109 | :else ;; Saw a regular character that is part of the field. 110 | (do (.appendCodePoint field-str (.read reader)) 111 | (recur (reader-peek reader))))))) 112 | 113 | (defn- escaped-quote-at-reader-pos? 114 | "Given a reader, returns true if it is currently pointing at a character that 115 | is the same as quote-char. The reader position will not be changed when the 116 | function returns." 117 | [^Reader reader ^long quote-char] 118 | (.mark reader 2) 119 | (let [result (and (== quote-char (.read reader)) 120 | (== quote-char (.read reader)))] 121 | (.reset reader) 122 | result)) 123 | 124 | (defn- read-quoted-field 125 | "Given a reader that is queued up to the beginning of a quoted field, 126 | reads the field and returns it as a string. The reader will be left at the 127 | first character past the end of the field." 128 | [^Reader reader ^long delimiter ^long quote-char strict] 129 | (let [field-str (StringBuilder.)] 130 | (.skip reader 1) ;; Discard the quote that starts the field. 131 | (loop [c (reader-peek reader)] 132 | (cond (== c -1) 133 | (if strict 134 | (throw (Exception. 135 | "Reached end of input before end of quoted field.")) 136 | ;; Otherwise, return what we've got so far. 137 | (.toString field-str)) 138 | ;; If we see two quote chars in a row, only add one of them to the 139 | ;; output, skip both of the characters, and continue. 140 | (escaped-quote-at-reader-pos? reader quote-char) 141 | (do (.appendCodePoint field-str quote-char) 142 | (.skip reader 2) 143 | (recur (reader-peek reader))) 144 | ;; Otherwise, if we see a single quote char, this field has ended. 145 | ;; Skip past the ending quote and return the field. 146 | (== c quote-char) 147 | (do (.skip reader 1) ;; Skip past that quote character. 
148 | (.toString field-str)) 149 | :else 150 | (do (.appendCodePoint field-str (.read reader)) 151 | (recur (reader-peek reader))))))) 152 | 153 | (defn- parse-csv-line 154 | "Takes a Reader as input and returns the first row of the CSV file, 155 | parsed into cells (an array of strings). The reader passed in will be 156 | positioned for the start of the next line." 157 | [^Reader csv-reader delimiter quote-char strict end-of-line] 158 | ;; We build the last-field variable, and then add it to fields when we 159 | ;; encounter some event (delimiter/eol/eof) that signals the end of 160 | ;; the field. This lets us correctly handle input with empty fields, like 161 | ;; ",,,". 162 | (let [delimiter (int delimiter) 163 | quote-char (int quote-char)] 164 | (loop [fields (transient []) ;; Will return this as the vector of fields. 165 | last-field "" 166 | look-ahead (reader-peek csv-reader)] 167 | (cond (== -1 look-ahead) 168 | (persistent! (conj! fields last-field)) 169 | (== look-ahead (int delimiter)) 170 | (do (.skip csv-reader 1) 171 | (recur (conj! fields last-field) "" (reader-peek csv-reader))) 172 | (eol-at-reader-pos? csv-reader end-of-line) 173 | (do (skip-past-eol csv-reader end-of-line) 174 | (persistent! (conj! fields last-field))) 175 | (== look-ahead (int quote-char)) 176 | (recur fields 177 | (read-quoted-field csv-reader delimiter quote-char strict) 178 | (reader-peek csv-reader)) 179 | (= "" last-field) ;; Must be at beginning or just after comma. 180 | (recur fields 181 | (read-unquoted-field csv-reader delimiter quote-char 182 | strict end-of-line) 183 | (reader-peek csv-reader)) 184 | :else 185 | (throw (Exception. 
(str "Unexpected character found: " look-ahead))))))) 186 | 187 | (defn- parse-csv-with-options 188 | ([csv-reader {:keys [delimiter quote-char strict end-of-line]}] 189 | (parse-csv-with-options csv-reader delimiter quote-char 190 | strict end-of-line)) 191 | ([csv-reader delimiter quote-char strict end-of-line] 192 | (lazy-seq 193 | (when (not (== -1 (reader-peek csv-reader))) 194 | (let [row (parse-csv-line csv-reader delimiter quote-char 195 | strict end-of-line)] 196 | (cons row (parse-csv-with-options csv-reader delimiter quote-char 197 | strict end-of-line))))))) 198 | 199 | (defn parse-csv 200 | "Takes a CSV as a string or Reader and returns a seq of the parsed CSV rows, 201 | in the form of a lazy sequence of vectors: a vector per row, a string for 202 | each cell. 203 | 204 | Accepts a number of keyword arguments to change the parsing behavior: 205 | :delimiter - A character that contains the cell separator for 206 | each column in a row. Default value: \\, 207 | :end-of-line - A string containing the end-of-line character 208 | for reading CSV files. If this setting is nil then 209 | \\n and \\r\\n are both accepted. Default value: nil 210 | :quote-char - A character that is used to begin and end a quoted cell. 211 | Default value: \\\" 212 | :strict - If this variable is true, the parser will throw an 213 | exception on parse errors that are recoverable but 214 | not to spec or otherwise nonsensical. Default value: false" 215 | ([csv & {:as opts}] 216 | (let [csv-reader (if (string? csv) (StringReader. csv) csv)] 217 | (parse-csv-with-options csv-reader (merge {:strict false 218 | :delimiter \, 219 | :quote-char \"} 220 | opts))))) 221 | 222 | ;; 223 | ;; CSV Output 224 | ;; 225 | 226 | (defn- needs-quote? 227 | "Given a string (cell), determine whether it contains a character that 228 | requires this cell to be quoted." 
229 | [^String cell delimiter quote-char] 230 | (or (.contains cell delimiter) 231 | (.contains cell (str quote-char)) 232 | (.contains cell "\n") 233 | (.contains cell "\r"))) 234 | 235 | (defn- escape 236 | "Given a character, returns the escaped version, whether that is the same 237 | as the original character or a replacement. The return is a string or a 238 | character, but it all gets passed into str anyways." 239 | [chr delimiter quote-char] 240 | (if (= quote-char chr) (str quote-char quote-char) chr)) 241 | 242 | (defn- quote-and-escape 243 | "Given a string (cell), returns a new string that has any necessary quoting 244 | and escaping." 245 | [cell delimiter quote-char force-quote] 246 | (if (or force-quote (needs-quote? cell delimiter quote-char)) 247 | (str quote-char 248 | (apply str (map #(escape % delimiter quote-char) 249 | cell)) 250 | quote-char) 251 | cell)) 252 | 253 | (defn- quote-and-escape-row 254 | "Given a row (vector of strings), quotes and escapes any cells where that 255 | is necessary and then joins all the text into a string for that entire row." 256 | [row delimiter quote-char force-quote] 257 | (string/join delimiter (map #(quote-and-escape % 258 | delimiter 259 | quote-char 260 | force-quote) 261 | row))) 262 | 263 | (defn write-csv 264 | "Given a sequence of sequences of strings, returns a string of that table 265 | in CSV format, with all appropriate quoting and escaping. 266 | 267 | Accepts a number of keyword arguments to change the output: 268 | :delimiter - A character that contains the cell separator for 269 | each column in a row. Default value: \\, 270 | :end-of-line - A string containing the end-of-line character 271 | for writing CSV files. Default value: \\n 272 | :quote-char - A character that is used to begin and end a quoted cell. 
273 | Default value: \\\" 274 | :force-quote - Forces every cell to be quoted (useful for Excel interop) 275 | Default value: false" 276 | [table & {:keys [delimiter quote-char end-of-line force-quote] 277 | :or {delimiter \, quote-char \" end-of-line "\n" 278 | force-quote false}}] 279 | (loop [csv-string (StringBuilder.) 280 | quoted-table (map #(quote-and-escape-row % 281 | (str delimiter) 282 | quote-char 283 | force-quote) 284 | table)] 285 | (if (empty? quoted-table) 286 | (.toString csv-string) 287 | (recur (.append csv-string (str (first quoted-table) end-of-line)) 288 | (rest quoted-table))))) 289 | -------------------------------------------------------------------------------- /test/clojure_csv/test/core.clj: -------------------------------------------------------------------------------- 1 | (ns clojure-csv.test.core 2 | (:import [java.io StringReader]) 3 | (:use clojure.test 4 | clojure.java.io 5 | clojure-csv.core)) 6 | 7 | (deftest basic-functionality 8 | (is (= [["a" "b" "c"]] (parse-csv "a,b,c"))) 9 | (is (= [["" ""]] (parse-csv ","))) 10 | (is (= [["a" "b"]] (parse-csv "a,b\r\n"))) ;; Linebreak on eof won't add line. 11 | (is (= [] (parse-csv "")))) 12 | 13 | (deftest alternate-sources 14 | (is (= [["a" "b" "c"]] (parse-csv (StringReader. "a,b,c")))) 15 | (is (= [["" ""]] (parse-csv (StringReader. ",")))) 16 | (is (= [] (parse-csv (StringReader. 
"")))) 17 | (is (= [["First", "Second"]] (parse-csv 18 | (reader (.toCharArray "First,Second")))))) 19 | 20 | (deftest quoting 21 | (is (= [[""]] (parse-csv "\""))) 22 | (is (= [["\""]] (parse-csv "\"\"\""))) 23 | (is (= [["Before", "\"","After"]] (parse-csv "Before,\"\"\"\",After"))) 24 | (is (= [["Before", "", "After"]] (parse-csv "Before,\"\",After"))) 25 | (is (= [["", "start&end", ""]] (parse-csv "\"\",\"start&end\",\"\""))) 26 | (is (= [[",", "\"", ",,", ",,,"]] 27 | (parse-csv "\",\",\"\"\"\",\",,\",\",,,\""))) 28 | (is (= [["quoted", "\",\"", "comma"]] 29 | (parse-csv "quoted,\"\"\",\"\"\",comma"))) 30 | (is (= [["Hello"]] (parse-csv "\"Hello\""))) 31 | (is (thrown? Exception (dorun (parse-csv "\"Hello\" \"Hello2\"")))) 32 | (is (thrown? Exception (dorun (parse-csv "\"Hello\" \"Hello2\" \"Hello3\"")))) 33 | (is (thrown? Exception (dorun (parse-csv "\"Hello\",\"Hello2\" \"Hello3\"")))) 34 | (is (= [["Hello\"Hello2"]] (parse-csv "\"Hello\"\"Hello2\""))) 35 | (is (thrown? Exception (dorun (parse-csv "\"Hello\"Hello2")))) 36 | (is (= [["Hello"]] (parse-csv "\"Hello")))) 37 | 38 | (deftest newlines 39 | (is (= [["test1","test2"] ["test3","test4"]] 40 | (parse-csv "test1,test2\ntest3,test4"))) 41 | (is (= [["test1","test2"] ["test3","test4"]] 42 | (parse-csv "test1,test2\r\ntest3,test4"))) 43 | (is (= [["embedded","line\nbreak"]] (parse-csv "embedded,\"line\nbreak\""))) 44 | (is (= [["embedded", "line\r\nbreak"]] 45 | (parse-csv "embedded,\"line\r\nbreak\"")))) 46 | 47 | (deftest writing 48 | (is (= "test1,test2\n" (write-csv [["test1" "test2"]]))) 49 | (is (= "test1,test2\ntest3,test4\n" 50 | (write-csv [["test1" "test2"] ["test3" "test4"]]))) 51 | (is (= "quoted:,\"line\nfeed\"\n" 52 | (write-csv [["quoted:" "line\nfeed"]]))) 53 | (is (= "quoted:,\"carriage\rreturn\"\n" 54 | (write-csv [["quoted:" "carriage\rreturn"]]))) 55 | (is (= "quoted:,\"embedded,comma\"\n" 56 | (write-csv [["quoted:" "embedded,comma"]]))) 57 | (is (= "quoted:,\"escaped\"\"quotes\"\"\"\n" 
58 | (write-csv [["quoted:" "escaped\"quotes\""]])))) 59 | 60 | (deftest force-quote-on-output 61 | (is (= "test1,test2\n" (write-csv [["test1" "test2"]]))) 62 | (is (= "test1,test2\n" (write-csv [["test1" "test2"]] :force-quote false))) 63 | (is (= "\"test1\",\"test2\"\n" (write-csv [["test1" "test2"]] 64 | :force-quote true))) 65 | (is (= "stillquoted:,\"needs,quote\"\n" 66 | (write-csv [["stillquoted:" "needs,quote"]] 67 | :force-quote false))) 68 | (is (= "\"allquoted:\",\"needs,quote\"\n" 69 | (write-csv [["allquoted:" "needs,quote"]] 70 | :force-quote true)))) 71 | 72 | (deftest alternate-delimiters 73 | (is (= [["First", "Second"]] 74 | (parse-csv "First\tSecond" :delimiter \tab))) 75 | (is (= "First\tSecond\n" 76 | (write-csv [["First", "Second"]] :delimiter \tab))) 77 | (is (= "First\tSecond,Third\n" 78 | (write-csv [["First", "Second,Third"]] :delimiter \tab))) 79 | (is (= "First\t\"Second\tThird\"\n" 80 | (write-csv [["First", "Second\tThird"]] :delimiter \tab)))) 81 | 82 | (deftest alternate-quote-char 83 | (is (= [["a", "b", "c"]] 84 | (parse-csv "a,|b|,c" :quote-char \|))) 85 | (is (= [["a", "b|c", "d"]] 86 | (parse-csv "a,|b||c|,d" :quote-char \|))) 87 | (is (= [["a", "b\"\nc", "d"]] 88 | (parse-csv "a,|b\"\nc|,d" :quote-char \|))) 89 | (is (= "a,|b||c|,d\n" 90 | (write-csv [["a", "b|c", "d"]] :quote-char \|))) 91 | (is (= "a,|b\nc|,d\n" 92 | (write-csv [["a", "b\nc", "d"]] :quote-char \|))) 93 | (is (= "a,b\"c,d\n" 94 | (write-csv [["a", "b\"c", "d"]] :quote-char \|)))) 95 | 96 | (deftest strictness 97 | (is (thrown? Exception (dorun (parse-csv "a,b,c,\"d" :strict true)))) 98 | (is (thrown? 
Exception (dorun (parse-csv "a,b,c,d\"e" :strict true)))) 99 | (is (= [["a","b","c","d"]] 100 | (parse-csv "a,b,c,\"d" :strict false))) 101 | (is (= [["a","b","c","d"]] 102 | (parse-csv "a,b,c,\"d\"" :strict true))) 103 | (is (= [["a","b","c","d\""]] 104 | (parse-csv "a,b,c,d\"" :strict false))) 105 | (is (= [["120030" "BLACK COD FILET MET VEL \"MSC\"" "KG" "0" "1"]] 106 | (parse-csv "120030;BLACK COD FILET MET VEL \"MSC\";KG;0;1" 107 | :strict false :delimiter \;)))) 108 | 109 | (deftest reader-cases 110 | ;; reader will be created and closed in with-open, but used outside. 111 | ;; this is actually a java.io.IOException, but thrown at runtime so... 112 | (is (thrown? java.lang.RuntimeException 113 | (dorun (with-open [sr (StringReader. "a,b,c")] 114 | (parse-csv sr)))))) 115 | 116 | (deftest custom-eol 117 | ;; Test the use of this option. 118 | (is (= [["a" "b"] ["c" "d"]] (parse-csv "a,b\rc,d" :end-of-line "\r"))) 119 | (is (= [["a" "b"] ["c" "d"]] (parse-csv "a,babcc,d" :end-of-line "abc"))) 120 | ;; The presence of an end-of-line option turns off the parsing of \n and \r\n 121 | ;; as EOLs, so they can appear unquoted in fields when they do not interfere 122 | ;; with the EOL. 123 | (is (= [["a" "b\n"] ["c" "d"]] (parse-csv "a,b\n\rc,d" :end-of-line "\r"))) 124 | (is (= [["a" "b"] ["\nc" "d"]] (parse-csv "a,b\r\nc,d" :end-of-line "\r"))) 125 | ;; Custom EOL can still be quoted into a field. 
126 | (is (= [["a" "b\r"] ["c" "d"]] (parse-csv "a,\"b\r\"\rc,d" 127 | :end-of-line "\r"))) 128 | (is (= [["a" "bHELLO"] ["c" "d"]] (parse-csv "a,\"bHELLO\"HELLOc,d" 129 | :end-of-line "HELLO"))) 130 | (is (= [["a" "b\r"] ["c" "d"]] (parse-csv "a,|b\r|\rc,d" 131 | :end-of-line "\r" :quote-char \|)))) 132 | -------------------------------------------------------------------------------- /test/clojure_csv/test/utils.clj: -------------------------------------------------------------------------------- 1 | (ns clojure-csv.test.utils 2 | "Some whitebox testing of the private utility functions used in core." 3 | (:import [java.io StringReader]) 4 | (:use clojure.test 5 | clojure.java.io 6 | clojure-csv.core)) 7 | 8 | (def default-options {:delimiter \, :quote-char \" 9 | :strict false :end-of-line nil}) 10 | 11 | (deftest eol-at-reader-pos? 12 | ;; Testing the private function to check for EOLs 13 | (is (= true (#'clojure-csv.core/eol-at-reader-pos? (StringReader. "\n") nil))) 14 | (is (= true (#'clojure-csv.core/eol-at-reader-pos? (StringReader. "\r\n") 15 | nil))) 16 | (is (= true (#'clojure-csv.core/eol-at-reader-pos? (StringReader. "\nabc") 17 | nil))) 18 | (is (= true (#'clojure-csv.core/eol-at-reader-pos? (StringReader. "\r\nabc") 19 | nil))) 20 | (is (= false (#'clojure-csv.core/eol-at-reader-pos? (StringReader. "\r\tabc") 21 | nil))) 22 | ;; Testing for user-specified EOLs 23 | (is (= true (#'clojure-csv.core/eol-at-reader-pos? (StringReader. "abc") 24 | "abc"))) 25 | (is (= true (#'clojure-csv.core/eol-at-reader-pos? (StringReader. "abcdef") 26 | "abc"))) 27 | (is (= false (#'clojure-csv.core/eol-at-reader-pos? (StringReader. "ab") 28 | "abc")))) 29 | 30 | (deftest skip-past-eol 31 | (is (= (int \c) 32 | (let [rdr (StringReader. "\nc")] 33 | (#'clojure-csv.core/skip-past-eol rdr) 34 | (.read rdr)))) 35 | (is (= (int \c) 36 | (let [rdr (StringReader. 
"\r\nc")] 37 | (#'clojure-csv.core/skip-past-eol rdr) 38 | (.read rdr)))) 39 | (is (= (int \c) 40 | (let [rdr (StringReader. "QQQc")] 41 | (#'clojure-csv.core/skip-past-eol rdr "QQQ") 42 | (.read rdr))))) 43 | 44 | (deftest read-unquoted-field 45 | (let [{:keys [delimiter quote-char strict end-of-line]} default-options] 46 | (is (= "abc" (#'clojure-csv.core/read-unquoted-field 47 | (StringReader. "abc,def") 48 | delimiter quote-char strict end-of-line))) 49 | (is (= "abc" (#'clojure-csv.core/read-unquoted-field 50 | (StringReader. "abc") 51 | delimiter quote-char strict end-of-line))) 52 | (is (= "abc" (#'clojure-csv.core/read-unquoted-field 53 | (StringReader. "abc\n") 54 | delimiter quote-char strict end-of-line))) 55 | (is (= "abc" (#'clojure-csv.core/read-unquoted-field 56 | (StringReader. "abc\r\n") 57 | delimiter quote-char strict end-of-line))) 58 | (is (= "abc" (#'clojure-csv.core/read-unquoted-field 59 | (StringReader. "abcQQQ") 60 | delimiter quote-char strict "QQQ"))) 61 | (is (= "abc\n" (#'clojure-csv.core/read-unquoted-field 62 | (StringReader. "abc\nQQQ") 63 | delimiter quote-char strict "QQQ"))) 64 | (is (= "abc\"" (#'clojure-csv.core/read-unquoted-field 65 | (StringReader. "abc\",") 66 | delimiter quote-char strict end-of-line))) 67 | (is (= "" (#'clojure-csv.core/read-unquoted-field 68 | (StringReader. ",,,") 69 | delimiter quote-char strict end-of-line))) 70 | (is (thrown? java.lang.Exception 71 | (#'clojure-csv.core/read-unquoted-field 72 | (StringReader. "abc\",") 73 | delimiter quote-char true end-of-line))))) 74 | 75 | (deftest escaped-quote-at-reader-pos? 76 | (is (= true (#'clojure-csv.core/escaped-quote-at-reader-pos? 77 | (StringReader. "\"\"") 78 | (int \")))) 79 | (is (= true (#'clojure-csv.core/escaped-quote-at-reader-pos? 80 | (StringReader. "\"\"abc") 81 | (int \")))) 82 | (is (= false (#'clojure-csv.core/escaped-quote-at-reader-pos? 83 | (StringReader. 
"\"abc") 84 | (int \")))) 85 | (is (= false (#'clojure-csv.core/escaped-quote-at-reader-pos? 86 | (StringReader. "abc") 87 | (int \"))))) 88 | 89 | (deftest read-quoted-field 90 | (let [{:keys [delimiter quote-char strict]} default-options 91 | delimiter (int delimiter) 92 | quote-char (int quote-char)] 93 | (is (= "abc" (#'clojure-csv.core/read-quoted-field 94 | (StringReader. "\"abc\"") 95 | delimiter quote-char strict))) 96 | (is (= "abc" (#'clojure-csv.core/read-quoted-field 97 | (StringReader. "\"abc\",def") 98 | delimiter quote-char strict))) 99 | (is (= "ab\"c" (#'clojure-csv.core/read-quoted-field 100 | (StringReader. "\"ab\"\"c\"") 101 | delimiter quote-char strict))) 102 | (is (= "ab\nc" (#'clojure-csv.core/read-quoted-field 103 | (StringReader. "\"ab\nc\"") 104 | delimiter quote-char strict))) 105 | (is (= "ab,c" (#'clojure-csv.core/read-quoted-field 106 | (StringReader. "\"ab,c\"") 107 | delimiter quote-char strict))) 108 | (is (thrown? java.lang.Exception 109 | (#'clojure-csv.core/read-quoted-field 110 | (StringReader. "\"abc") 111 | delimiter quote-char true))))) 112 | --------------------------------------------------------------------------------