Efficient pathways to read/write csv-based formats and json. Many of these functions have fast pathways for constructing the parser or writer, which helps when you want to rapidly encode/decode a stream of small objects. For general use, the simply named read-XXX and write-XXX functions are designed to be drop-in but far more efficient replacements for their `clojure.data.csv` and `clojure.data.json` equivalents.

This is based on an underlying char[]-based parsing system that makes it easy to build new parsers. It allows tight loops to iterate through loaded character arrays, and such loops are easily optimized by HotSpot.

* [CharBuffer.java](https://github.com/cnuernber/charred/blob/master/java/chardata/CharBuffer.java) - More efficient, simpler and more general than StringBuilder.
* [CharReader.java](https://github.com/cnuernber/charred/blob/master/java/chardata/CharReader.java) - PushbackReader-like abstraction only capable of pushing back 1 character. Allows access to the underlying buffer and relative offset.

On top of these abstractions you have reader/writer abstractions for json and csv.

Many of these abstractions return a [CloseableSupplier](https://github.com/cnuernber/charred/blob/master/java/charred/CloseableSupplier.java) so you can simply use them with `with-open` and the underlying stream/reader will be closed when control leaves the block. If you read all the data out of the supplier then the supplier itself will close the input when finished.
(json-reader-fn options)
Given options, return a function that when called constructs a json reader from exactly those options. This avoids the work of unpacking/analyzing the options when constructing many json readers for a sequence of small inputs.
(json-writer-fn options)
Return a function that when called efficiently constructs a JSONWriter from the given options. Same arguments as `write-json`.
(parse-json-fn & [options])
Return a function from input->json. The options are parsed once, so when parsing many small JSON inputs where you intend to get one and only one JSON object from each, this pathway is a bit more efficient than `read-json`. Same options as `read-json-supplier`.
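As a rough sketch of the intended usage, assuming the library is required under the alias `charred` (the alias, the var names and the sample inputs are illustrative and not part of the docstring), the returned function can be built once and reused across many small inputs:

```clojure
(require '[charred.api :as charred])

;; Build the parse function once; the options map is analyzed a single time.
(def parse-json (charred/parse-json-fn {:key-fn keyword}))

;; Reuse it for each small input. Every call parses exactly one JSON value.
(def parsed (mapv parse-json ["{\"id\": 1}" "{\"id\": 2}"]))
```

Here `parsed` ends up as a vector of two maps with keyword keys.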
Protocol to extend support for converting items to a json-supported datastructure. These can be a number, a string, an implementation of java.util.List or an implementation of java.util.Map.
(->json-data item)
Automatic conversion of some subset of types to something acceptable to json. Defaults to toString for types that aren't representable in json.
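A minimal sketch of extending the protocol, assuming it is exposed as `PToJSON` in `charred.api` as the `write-json` docstring suggests. The `Celsius` type and the `round-tripped` var are hypothetical and exist only for this illustration:

```clojure
(require '[charred.api :as charred])

;; Hypothetical type used only for this illustration.
(deftype Celsius [degrees])

;; Teach the writer to represent a Celsius as a json-compatible map.
(extend-protocol charred/PToJSON
  Celsius
  (->json-data [item] {"celsius" (.-degrees item)}))

;; Round-trip through a string to inspect the converted form.
(def round-tripped
  (charred/read-json (charred/write-json-str (Celsius. 21.5))))
```

Without the extension the default conversion would fall back to `toString`, so the value would be written as a json string rather than an object.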
(read-csv input & {:as args})
Read a csv returning a clojure.data.csv-compatible sequence. For options see `read-csv-supplier`.

An important note is that `:comment-char` is disabled by default in `read-csv` for backward compatibility, while it is enabled by default in `read-csv-supplier`. Also, `:close-reader?` defaults to false to match the behavior of data.csv.
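For illustration, a small sketch of calling `read-csv` on an in-memory reader. The alias `charred`, the var names and the sample data are assumptions made for this example:

```clojure
(require '[charred.api :as charred])

;; A StringReader stands in for a file reader here.
(def rows
  (charred/read-csv (java.io.StringReader. "name,qty\napple,3\npear,5")))

;; Options are passed as keyword arguments, for example a different separator.
(def semi-rows
  (charred/read-csv (java.io.StringReader. "name;qty\napple;3")
                    :separator \;))
```

As with clojure.data.csv, every field comes back as a string and the header row is simply the first row of the sequence.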
(read-csv-supplier input & [options])
Read a csv into a row supplier. The parse algorithm is the same as clojure.data.csv, although this returns a java.util.function.Supplier which also implements AutoCloseable as well as `clojure.lang.Seqable` and `clojure.lang.IReduce`.

The supplier returned derives from AutoCloseable and it will terminate the reading and close the underlying read mechanism (and join the async thread) if `(.close supp)` is called.

For a drop-in but much faster replacement for clojure.data.csv use `read-csv`.

Options:

In addition to these options, see options for `reader->char-buf-supplier`.

* `:async?` - Defaults to true - read the file into buffers in an offline thread. This speeds up reading larger files (1MB+) by about 30%.
* `:separator` - Field separator - defaults to `\,`.
* `:quote` - Quote specifier - defaults to `\"`.
* `:escape` - Escape character - defaults to disabled.
* `:close-reader?` - Close the reader when iteration is finished - defaults to true.
* `:column-allowlist` - Sequence of allowed column names or indexes. `:column-whitelist` still works but isn't preferred.
* `:column-blocklist` - Sequence of disallowed column names or indexes. When this conflicts with `:column-allowlist` then `:column-allowlist` wins. `:column-blacklist` still works but isn't preferred.
* `:comment-char` - Defaults to #. Rows beginning with this character are discarded with no further processing. Setting the comment-char to nil or `(char 0)` disables comment lines.
* `:trim-leading-whitespace?` - When true, leading spaces and tabs are ignored. Defaults to true.
* `:trim-trailing-whitespace?` - When true, trailing spaces and tabs are ignored. Defaults to true.
* `:nil-empty-values?` - When true, empty strings are elided entirely and returned as nil values. Defaults to false.
* `:profile` - Either `:immutable` or `:mutable`. `:immutable` returns persistent vectors while `:mutable` returns arraylists.
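The following sketch shows one way to combine the supplier with `with-open`, relying on the documented `IReduce` support to drain it before the block exits. The alias `charred`, the var name and the sample data are illustrative assumptions; the leading `#` line is there to show the default `:comment-char` behaviour described above:

```clojure
(require '[charred.api :as charred])

(def rows
  ;; with-open closes the supplier, and with it the reader, on exit.
  (with-open [supplier (charred/read-csv-supplier
                        (java.io.StringReader.
                         "# exported data\nname,qty\napple,3")
                        {:async? false})]
    ;; The supplier is reducible, so `into` drains it eagerly
    ;; while the underlying reader is still open.
    (into [] supplier)))
```

Draining inside the `with-open` block matters: a lazily consumed supplier would be closed before its rows were read.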
(read-json input & {:as args})
Drop-in replacement for clojure.data.json/read and clojure.data.json/read-str. For options see `read-json-supplier`.
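A short usage sketch, assuming the namespace is required under the alias `charred` (the var names are illustrative):

```clojure
(require '[charred.api :as charred])

;; By default map keys stay strings.
(def plain (charred/read-json "{\"a\": 1, \"b\": 2}"))

;; Options are keyword arguments; :key-fn is applied to each map key.
(def keyed (charred/read-json "{\"a\": 1, \"b\": 2}" :key-fn keyword))
```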
(read-json-supplier input & [options])
Read one or more JSON objects.

Returns an auto-closeable supplier that, when called, by default throws an exception if the read pathway is finished. Input may be a character array or string (most efficient) or something convertible to a reader. Options for conversion to a reader are described in `reader->char-reader`. Unlike csv, for json `:async?` defaults to `false`, as most JSON files are relatively small, in the 10-100K range where async loading doesn't make much of a difference. On a larger file, however, setting `:async?` to true can make a large difference.

Map keys are canonicalized using an instance of charred.StringCanonicalizer. This results in less memory usage and faster performance as java strings cache their hash codes. You can supply the string canonicalizer, potentially pre-initialized, with the `parser-fn` option.

For an example of using the `parser-fn` option see [fjson.clj](https://github.com/cnuernber/fast-json/blob/master/src/fjson.clj#L100).

Options:

In addition to the options below, see options for `reader->char-reader`.

* `:bigdec` - When true use bigdecimals for floating point numbers. Defaults to false.
* `:double-fn` - If `:bigdec` isn't provided, use this function to parse double values.
* `:profile` - Which performance profile to use. This simply provides defaults for `:array-iface` and `:obj-iface`. The default `:immutable` value produces persistent datastructures and supports value-fn and key-fn. `:mutable` produces object arrays and java.util.HashMaps - this is about 30% faster. `:raw` produces ArrayLists for arrays and a JSONReader$JSONObj type with a public data member that is an ArrayList for objects.
* `:key-fn` - Function called on each string map key.
* `:value-fn` - Function called on each map value. The function is passed the key and val so it takes 2 arguments. If this function returns `:charred.api/elided` then the key-val pair will be elided from the result.
* `:array-iface` - Implementation of JSONReader$ArrayReader called on the object array of values for a javascript array.
* `:obj-iface` - Implementation of JSONReader$ObjReader called for each javascript object. Note that providing this overrides key-fn and value-fn.
* `:eof-error?` - Defaults to true - when eof is encountered while attempting to read an object, throw an EOF error. Otherwise return a special EOF value, controlled by the `:eof-value` option.
* `:eof-value` - EOF value. Defaults to the keyword `:eof`.
* `:eof-fn` - Function called if readObject is going to return EOF. Defaults to throwing an EOFException.
* `:parser-fn` - Function that overrides the array-iface and obj-iface parameters - this is called each time the parser is created and must return a map with at least array-iface, obj-iface and finalize-fn keys. It may also optionally have a `:string-canonicalizer` key which, if present, must be an instance of charred.StringCanonicalizer. Thus you can share string tables between parser invocations or create a context-dependent set of array and object interface specifications.
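To illustrate reading several values from one input, here is a sketch that pulls values with the `Supplier` `get` method and turns off the EOF exception. The alias `charred`, the var names and the sample input are assumptions for this example:

```clojure
(require '[charred.api :as charred])

(def values
  (with-open [supplier (charred/read-json-supplier
                        "{\"a\": 1} {\"a\": 2}"
                        {:eof-error? false})]
    ;; Call .get once per expected value. With :eof-error? false the
    ;; supplier yields the :eof-value (by default :eof) once input runs out.
    (let [next-value #(.get ^java.util.function.Supplier supplier)]
      [(next-value) (next-value) (next-value)])))
```

With the default `:eof-error? true`, the third call would throw instead of returning the EOF value.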
(reader->char-buf-supplier rdr & [options])
Given a reader, return a supplier that when called reads the next buffer of the reader. When n-buffers is > 0, this function iterates through a fixed number of buffers under the covers, so you need to be cognizant of the number of actual buffers that you want to have present in memory. The returned supplier also implements `AutoCloseable` and closing it will close the underlying reader.

Options:

* `:n-buffers` - Number of buffers to use. Defaults to 6 as the queue size defaults to 4 - if this number is positive but too small then buffers in flight will get overwritten. If n-buffers is <= 0 then buffers are allocated as needed and not reused - this is the safest option but can also make async loading much slower than it would be otherwise. This must be at least 2 larger than queue-depth.
* `:queue-depth` - Defaults to 4. See comments on `:n-buffers`.
* `:bufsize` - Size of each buffer - defaults to `(* 64 1024)`. Small improvements are sometimes seen with larger or smaller buffers.
* `:async?` - Defaults to true if the number of processors is more than one. When true, data is read in an async thread.
* `:close-reader?` - When true, close the input reader when finished. Defaults to true.
(reader->char-reader rdr)
(reader->char-reader rdr options)
Given a reader, return a CharReader which presents some of the same interface as a PushbackReader but is only capable of pushing back 1 character. It is extremely quick to instantiate this object from a string or character array.

Options:

See options for `reader->char-buf-supplier`.
(write-csv w data & {:as options})
Writes data to writer in CSV format. See also `write-csv-rf`.

Options:

* `:separator` - Default `\,`.
* `:quote` - Default `\"`.
* `:quote?` - A predicate function which determines if a string should be quoted. Defaults to quoting only when necessary. May also be the value `true`, in which case every field is quoted.
* `:newline` - `:lf` (default) or `:cr+lf`.
* `:close-writer?` - Defaults to false unless `w` is a string. When true, close the writer when finished.
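A small sketch of writing to an in-memory writer, assuming the alias `charred`; the var name and the sample rows are illustrative:

```clojure
(require '[charred.api :as charred])

(def csv-text
  (let [w (java.io.StringWriter.)]
    ;; Rows are sequences of values; options are keyword arguments.
    (charred/write-csv w [["name" "qty"] ["apple" 3]] :newline :cr+lf)
    (str w)))
```

Because `w` is a writer rather than a string, `:close-writer?` stays at its default of false and the writer remains usable by the caller.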
(write-csv-rf w)
(write-csv-rf w options)
Returns a transduce-compatible rf that will write a csv. See options for `write-csv`.

This rf must be finalized with `(rf last-reduced-value)`, in which case it returns the number of rows written.

Example:

```clojure
user> (transduce (map identity) (charred/write-csv-rf "test.csv") [[:a :b :c][1 2 3]])
2
user> (slurp "test.csv")
":a,:b,:c
1,2,3
"
```
(write-json output data & {:as argmap})
Write json to output. You can extend the writer to new datatypes by implementing the `->json-data` function of the protocol `PToJSON`. This function need only return json-acceptable datastructures, which are numbers, booleans, nil, lists, arrays, and maps. The default type coercion will in general simply call .toString on the object.

Options:

* `:escape-unicode` - If true (default) non-ASCII characters are escaped as \uXXXX.
* `:escape-js-separators` - If true (default) the Unicode characters U+2028 and U+2029 will be escaped as \u2028 and \u2029 even if `:escape-unicode` is false. (These two characters are valid in pure JSON but are not valid in JavaScript strings.)
* `:escape-slash` - If true (default) the slash / is escaped as \/.
* `:indent-str` - When nil (default) json is printed raw with no indent or whitespace. For two spaces of indent per level of nesting, choose `"  "`.
* `:obj-fn` - Function called on each non-primitive object - it is passed the JSONWriter and the object. The default iterates maps, lists, and arrays, converting anything that is not a json primitive or a map, list or array to a json primitive via str. java.sql.Date classes get special treatment and are converted to instants, which are then converted to json primitive objects via the `PToJSON` protocol fn `->json-data`, which defaults to `toString`. This is the most general override mechanism, where you will need to manually call the JSONWriter's methods. The simpler but slightly less general pathway is to override the protocol method `->json-data`.
(write-json-fn argmap)
Return a function of two arguments, (output,data), that efficiently constructs a json writer and writes the data. This is the most efficient pathway when writing a bunch of small json objects as it avoids the cost associated with unpacking the argument map. Same arguments as `write-json`.
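As a sketch of the reuse pattern described here, assuming a `java.io.StringWriter` is an acceptable `output` and that the namespace is aliased as `charred`. The names `write-compact` and `->json-str` are hypothetical helpers for this illustration only:

```clojure
(require '[charred.api :as charred])

;; Unpack the options once and reuse the resulting function.
(def write-compact (charred/write-json-fn {:escape-slash false}))

(defn ->json-str
  "Hypothetical helper: writes `data` to a fresh StringWriter."
  [data]
  (let [w (java.io.StringWriter.)]
    (write-compact w data)
    (str w)))

;; Encode many small objects with the same pre-built writer function.
(def encoded (mapv ->json-str [{"id" 1} {"id" 2}]))
```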
(write-json-str data & {:as args})
Write json to a string. See options for `write-json`.
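A brief sketch contrasting the default compact output with indented output, assuming the alias `charred`; the var names and sample data are illustrative:

```clojure
(require '[charred.api :as charred])

;; Compact output is the default (:indent-str is nil).
(def compact (charred/write-json-str {"a" 1, "b" [1 2 3]}))

;; Pass :indent-str for human-readable output, two spaces per level here.
(def pretty (charred/write-json-str {"a" 1, "b" [1 2 3]} :indent-str "  "))
```

Both strings parse back to the same data; only the whitespace differs.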