A Clojure library for working with Delimiter-Separated Value data. This includes a customizable defensive parser and a simple writer.

You might be interested in using this instead of the common data.csv library or a more mainstream codec like Jackson because CSV is a terrible format and you'll often need to deal with messy, malformed, and downright bizarre data files.


The library has a single main namespace entrypoint, which contains both the reading and writing interfaces. The examples below alias it as separator.

=> (require '[ :as separator])


A significant feature of this library is its set of safety valves for parsing bad input data. The parser does its best to recover from errors and present meaningful data about each problem to the consumer. The safety valves include limits on the maximum cell size and the maximum row width.

To parse data into a sequence of rows, use the read-rows function. This accepts many kinds of inputs, including directly reading string data:

=> (vec (separator/read-rows "A,B,C\nD,E,F\nG,H,I\n"))
[["A" "B" "C"] ["D" "E" "F"] ["G" "H" "I"]]

;; quoted cells can embed newlines
=> (vec (separator/read-rows "A,B,C\nD,E,\"F\nG\",H,I\n"))
[["A" "B" "C"] ["D" "E" "F\nG" "H" "I"]]

;; parse errors are included in the sequence by default
=> (vec (separator/read-rows "A,B,C\nD,\"\"E,F\nG,H,I\n"))
[["A" "B" "C"] #< :malformed-quote 2:4> ["G" "H" "I"]]

;; the error mode can also omit them
=> (vec (separator/read-rows "A,B,C\nD,\"\"E,F\nG,H,I\n" :error-mode :ignore))
[["A" "B" "C"] ["G" "H" "I"]]

;; ...or throw them
=> (vec (separator/read-rows "A,B,C\nD,\"\"E,F\nG,H,I\n" :error-mode :throw))
;; Execution error (ParseException) at (
;; Unexpected character following quote: E

;; the errors carry data:
=> (ex-data *e)
{:column 4,
 :line 2,
 :message "Unexpected character following quote: E",
 :partial-cell "",
 :partial-row ["D"],
 :skipped-text "E...F",
 :type :malformed-quote}
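
In the default error mode, rows and errors arrive interleaved in one sequence, so a consumer usually needs to separate them. The following is only a sketch of one way to do that. It assumes that rows are vectors (as the output above suggests) and that error values are not; the ex-info value is a hypothetical stand-in for the library's own error type, which is not reproduced here.

```clojure
;; Stand-in for parser output: two good rows and one error value.
;; ASSUMPTION: the ex-info below only imitates a parse error; the real
;; error type belongs to the library and may differ.
(def parsed
  [["A" "B" "C"]
   (ex-info "Unexpected character following quote: E"
            {:type :malformed-quote, :line 2, :column 4})
   ["G" "H" "I"]])

;; Split the sequence on whether each element is a vector.
(def split (group-by vector? parsed))

;; Rows that parsed cleanly.
(def good-rows (get split true))

;; Data maps describing each problem, for logging or reporting.
(def problems (mapv ex-data (get split false)))
```

Keeping the errors in the sequence lets a caller decide after the fact how many bad rows are tolerable, rather than losing them silently or aborting on the first one.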

The parser also supports customizable quote, separator, and escape characters. Escapes are not part of the CSV standard but show up often in practice, so we need to deal with them.

=> (vec (separator/read-rows "A|B|C\nD|E|^F\nG^|H|I\n" :separator \| :quote \^))
[["A" "B" "C"] ["D" "E" "F\nG" "H" "I"]]

=> (vec (separator/read-rows "A,B,C\\\nD,E,F\nG,H,I\n" :escape \\))
[["A" "B" "C\\nD" "E" "F"] ["G" "H" "I"]]

Additionally, the read-records convenience wrapper uses the zip-headers transducer to read a sequence of map records instead, taking the keys from a row of headers:

=> (vec (separator/read-records "name,age,role\nPhillip Fry,26,Delivery Boy\nTuranga Leela,28,Ship Pilot\nHubert Farnsworth,160,Professor\n"))
[{"age" "26", "name" "Phillip Fry", "role" "Delivery Boy"}
 {"age" "28", "name" "Turanga Leela", "role" "Ship Pilot"}
 {"age" "160", "name" "Hubert Farnsworth", "role" "Professor"}]
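
To illustrate the idea behind a header-zipping transducer, here is a minimal sketch in plain Clojure. The name zip-headers-sketch is hypothetical and this is not the library's implementation; it simply treats the first row it sees as the header row and turns every later row into a map.

```clojure
(defn zip-headers-sketch
  "Hypothetical illustration: returns a stateful transducer that consumes
  the first row as headers and emits a map for each following row."
  []
  (fn [rf]
    (let [headers (volatile! nil)]
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result row]
         (if-let [hs @headers]
           ;; Headers already captured: pair them with this row's cells.
           (rf result (zipmap hs row))
           ;; First row: remember it as the headers and emit nothing.
           (do (vreset! headers row)
               result)))))))

(def rows
  [["name" "age"]
   ["Phillip Fry" "26"]
   ["Turanga Leela" "28"]])

(def records (into [] (zip-headers-sketch) rows))
;; records is a vector of two maps keyed by "name" and "age"
```

This sketch ignores cases a real implementation has to consider, such as rows that are shorter or longer than the header row and error values mixed into the input.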


The library also provides tools for writing delimiter-separated data from a sequence of rows using the write-rows function. This takes a Writer to print the data to and a similar set of options to control the output format:

=> (separator/write-rows *out* [["A" "B" "C"] ["D" "E" "F"] ["G" "H" "I"]])
;; A,B,C
;; D,E,F
;; G,H,I

;; cells containing the quote or separator character are automatically quoted
=> (separator/write-rows *out* [["A" "B,B" "C"] ["D" "E" "F\"F"]])
;; A,"B,B",C
;; D,E,"F""F"

;; you can also force quoting for all cells
=> (separator/write-rows *out* [["A" "B" "C"] ["D" "E" "F"] ["G" "H" "I"]] :quote? true)
;; "A","B","C"
;; "D","E","F"
;; "G","H","I"

;; or provide a predicate to control quoting
=> (separator/write-rows *out* [["A" "B" "C"] ["D" "E" "F"] ["G" "H" "I"]] :quote? #{"E"})
;; A,B,C
;; D,"E",F
;; G,H,I
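
The automatic quoting rule shown above can be sketched in a few lines of plain Clojure. This is an illustration under stated assumptions, not the library's code: the function names are hypothetical, the separator and quote characters are fixed to a comma and a double quote, and quoting cells that contain a newline is an added assumption that the text above does not state.

```clojure
(require '[clojure.string :as str])

(defn quote-cell-sketch
  "Hypothetical helper: wraps a cell in quotes when it contains a quote,
  a comma, or (by assumption) a newline, doubling any embedded quotes."
  [cell]
  (if (re-find #"[\",\n]" cell)
    (str "\"" (str/replace cell "\"" "\"\"") "\"")
    cell))

(defn format-row-sketch
  "Hypothetical helper: joins the quoted cells of one row with commas."
  [row]
  (str/join "," (map quote-cell-sketch row)))

(def line-1 (format-row-sketch ["A" "B,B" "C"]))
(def line-2 (format-row-sketch ["D" "E" "F\"F"]))
```

Here line-1 and line-2 reproduce the two output lines from the automatic quoting example above.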


Separator prioritizes defensiveness over speed, but aims to be as performant as possible within those constraints. For comparison, it's faster than data.csv but significantly slower than Jackson:

=> (crit/quick-bench (consume! (separator/read-rows test-file)))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 5.544234 sec
    Execution time std-deviation : 78.630488 ms
   Execution time lower quantile : 5.481820 sec ( 2.5%)
   Execution time upper quantile : 5.667485 sec (97.5%)
                   Overhead used : 6.824396 ns

=> (crit/quick-bench (consume! (data-csv-read test-file)))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 10.253641 sec
    Execution time std-deviation : 121.221011 ms
   Execution time lower quantile : 10.146078 sec ( 2.5%)
   Execution time upper quantile : 10.436205 sec (97.5%)
                   Overhead used : 6.943926 ns

=> (crit/quick-bench (consume! (jackson-read test-file)))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 2.325301 sec
    Execution time std-deviation : 40.611328 ms
   Execution time lower quantile : 2.296693 sec ( 2.5%)
   Execution time upper quantile : 2.390772 sec (97.5%)
                   Overhead used : 6.824396 ns

The test above was performed on a 2021 MacBook Pro with data.csv version 1.0.1 and jackson-dataformat-csv version 2.13.0 on a 330 MB CSV file with 12.4 million rows.
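
The consume! helper used in these benchmarks is not defined in this document. A plausible sketch, under the assumption that it only needs to force a lazy sequence to be fully realized without holding onto it, looks like the following; the name consume-sketch! is hypothetical.

```clojure
(defn consume-sketch!
  "Hypothetical helper: walks the whole sequence for its side effects,
  discarding every element, and returns nil."
  [rows]
  (run! identity rows))

;; Usage: count how many elements of a lazy sequence get realized.
(def seen (atom 0))

(def result
  (consume-sketch! (map (fn [x] (swap! seen inc) x) (range 5))))
```

Discarding each element as it is read keeps memory use flat, so a benchmark built this way measures parsing work rather than the cost of retaining millions of rows.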

Of course, all the speed in the world won't save you from a misplaced quote:

=> (spit "simple-err.csv" "A,B,C\nD,\"\"E,F\nG,H,I\n")

=> (consume! (separator/read-rows (io/file "simple-err.csv")))

=> (consume! (data-csv-read (io/file "simple-err.csv")))
Execution error at (csv.clj:37).
CSV error (unexpected character: E)

=> (consume! (jackson-read (io/file "simple-err.csv")))
Execution error (JsonParseException) at com.fasterxml.jackson.core.JsonParser/_constructError (
Unexpected character ('E' (code 69)): Expected column separator character (',' (code 44)) or end-of-line
 at [Source: (com.fasterxml.jackson.dataformat.csv.impl.UTF8Reader); line: 2, column: 6]


Copyright © 2022 Amperity, Inc.

Distributed under the MIT License.

