Efficient pathways to read/write CSV-based formats and JSON. Many of these functions have fast pathways for constructing the parser/writer in order to help with the case where you want to rapidly encode/decode a stream of small objects. For general use, the simply named read-XXX and write-XXX functions are designed to be drop-in but far more efficient replacements of their `clojure.data.csv` and `clojure.data.json` equivalents.

This is based on an underlying char[]-based parsing system that makes it easy to build new parsers and allows tight loops to iterate through loaded character arrays; such loops are easily optimized by HotSpot.

* [CharBuffer.java](https://github.com/cnuernber/charred/blob/master/java/chardata/CharBuffer.java) - More efficient, simpler and more general than StringBuilder.
* [CharReader.java](https://github.com/cnuernber/charred/blob/master/java/chardata/CharReader.java) - PushbackReader-like abstraction only capable of pushing back 1 character. Allows access to the underlying buffer and relative offset.

On top of these abstractions you have reader/writer abstractions for JSON and CSV.
(json-reader-fn options)
Given options, return a function that when called constructs a JSON reader from exactly those options. This avoids the work of unpacking/analyzing the options when constructing many JSON readers for a sequence of small inputs.
(json-writer-fn options)
Return a function that when called efficiently constructs a JSONWriter from the given options. Same arguments as [[write-json]].
(parse-json-fn & [options])
Return a function from input->json. Reuses the parse context, so when parsing many small JSON inputs from which you intend to get one and only one JSON object, this pathway is a bit more efficient than read-json. Same options as [[read-json-supplier]].
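For example, when decoding many small JSON payloads, a reusable parse function can be built once and applied per payload. A minimal sketch; the option map and inputs are illustrative:

```clojure
(require '[charred.api :as charred])

;; Build the parse function once; option analysis happens here, not per call.
(def parse-payload (charred/parse-json-fn {:key-fn keyword}))

;; Apply it to many small inputs.
(mapv parse-payload ["{\"id\": 1}" "{\"id\": 2}"])
;; => [{:id 1} {:id 2}]
```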
Protocol to extend support for converting items to a JSON-supported datastructure. These can be a number, a string, an implementation of java.util.List, or an implementation of java.util.Map.
(->json-data item)
Automatic conversion of some subset of types to something acceptable to JSON. Defaults to toString for types that aren't representable in JSON.
(read-csv input & options)
Read a CSV returning a clojure.data.csv-compatible sequence. For options see [[read-csv-supplier]].
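A quick usage sketch parsing an in-memory string (the CSV content is illustrative):

```clojure
(require '[charred.api :as charred])

;; read-csv accepts anything convertible to a reader, including a string.
;; As with clojure.data.csv, the header row is just the first row.
(charred/read-csv "name,age\nalice,30\nbob,25")
;; => (["name" "age"] ["alice" "30"] ["bob" "25"])
```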
(read-csv-supplier input & [options])
Read a CSV into a row supplier. The parse algorithm is the same as clojure.data.csv although this returns an iterator and each row is an ArrayList as opposed to a persistent vector. To convert a java.util.List into something with the same equals and hash semantics as a persistent vector use either `tech.v3.datatype.ListPersistentVector` or `vec`. To convert an iterator to a sequence use `iterator-seq`.

The iterator returned derives from AutoCloseable and it will terminate the iteration and close the underlying iterator (and join the async thread) if `(.close iter)` is called.

For a drop-in but much faster replacement to clojure.data.csv use [[read-csv-compat]].

Options:

* `:async?` - Defaults to true - read the file into buffers in an offline thread. This speeds up reading larger files (1MB+) by about 30%.
* `:separator` - Field separator - defaults to `\,`.
* `:quote` - Quote specifier - defaults to `\"`.
* `:close-reader?` - Close the reader when iteration is finished - defaults to true.
* `:column-whitelist` - Sequence of allowed column names.
* `:column-blacklist` - Sequence of disallowed column names. When it conflicts with `:column-whitelist`, `:column-whitelist` wins.
* `:trim-leading-whitespace?` - When true, leading spaces and tabs are ignored. Defaults to true.
* `:trim-trailing-whitespace?` - When true, trailing spaces and tabs are ignored. Defaults to true.
* `:nil-empty-values?` - When true, empty strings are elided entirely and returned as nil values. Defaults to true.
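Putting the pieces together - `with-open`, `iterator-seq`, and a column whitelist. A sketch; the CSV content is illustrative:

```clojure
(require '[charred.api :as charred])

;; The returned iterator is AutoCloseable, so with-open cleans up the
;; underlying reader and any async read thread.
(with-open [rows (charred/read-csv-supplier
                  "id,name,notes\n1,alice,x\n2,bob,y"
                  {:column-whitelist ["id" "name"]})]
  ;; Each row is a java.util.ArrayList; vec restores persistent-vector
  ;; equality/hash semantics. Force the rows before the iterator closes.
  (mapv vec (iterator-seq rows)))
```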
(read-json input & args)
Drop-in replacement for clojure.data.json/read and clojure.data.json/read-str. For options see [[read-json-supplier]].
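As a drop-in replacement, usage mirrors clojure.data.json/read-str. A small sketch:

```clojure
(require '[charred.api :as charred])

;; args are key-value option pairs, as with clojure.data.json/read-str.
(charred/read-json "{\"name\": \"alice\", \"scores\": [1, 2.5]}" :key-fn keyword)
;; => {:name "alice", :scores [1 2.5]}
```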
(read-json-supplier input & [options])
Read one or more JSON objects. Returns an auto-closeable supplier that, when called after the read pathway is finished, by default throws an exception. Input may be a character array or string (most efficient) or something convertible to a reader. Options for conversion to a reader are described in [[reader->char-reader]] although for the JSON case we default `:async?` to false as most JSON is just too small to benefit from async reading of the input.

Options:

* `:bigdec` - When true use bigdecimals for floating point numbers. Defaults to false.
* `:double-fn` - If `:bigdec` isn't provided, use this function to parse double values.
* `:profile` - Which performance profile to use. This simply provides defaults for `:array-iface` and `:obj-iface`. The default `:immutable` value produces persistent datastructures and supports value-fn and key-fn. `:mutable` produces object arrays and java.util.HashMaps - this is about 30% faster. `:raw` produces ArrayLists for arrays and a JSONReader$JSONObj type with a public data member that is an ArrayList for objects.
* `:key-fn` - Function called on each string map key.
* `:value-fn` - Function called on each map value. The function is passed the key and val so it takes 2 arguments. If this function returns `:tech.v3.datatype.char-input/elided` then the key-val pair will be elided from the result.
* `:array-iface` - Implementation of JSONReader$ArrayReader called on the object array of values for a javascript array.
* `:obj-iface` - Implementation of JSONReader$ObjReader called for each javascript object. Note that providing this overrides key-fn and value-fn.
* `:eof-error?` - Defaults to true - when eof is encountered when attempting to read an object, throw an EOF error. Else returns a special EOF value.
* `:eof-value` - EOF value. Defaults to
* `:eof-fn` - Function called if readObject is going to return EOF. Defaults to throwing an EOFException.
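For concatenated JSON values, the supplier can be drained with an EOF sentinel. A sketch under the assumption that the returned supplier is a java.util.function.Supplier (called via `.get`); `::eof` is an arbitrary sentinel chosen here:

```clojure
(require '[charred.api :as charred])

;; Read several concatenated JSON values from one input.
(with-open [supplier (charred/read-json-supplier
                      "{\"a\": 1} {\"b\": 2}"
                      {:eof-error? false :eof-value ::eof})]
  (into [] (take-while #(not= ::eof %)) (repeatedly #(.get supplier))))
```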
(reader->char-buf-supplier rdr & [options])
Given a reader, return a supplier that when called reads the next buffer of the reader. When n-buffers is >= 0, this function iterates through a fixed number of buffers under the covers, so you need to be cognizant of the number of actual buffers that you want to have present in memory. This fn also implements `AutoCloseable` and closing it will close the underlying reader.

Options:

* `:n-buffers` - Number of buffers to use. Defaults to -1 - if this number is positive but too small then buffers in flight will get overwritten. If n-buffers is <= 0 then buffers are allocated as needed and not reused - this is the safest option.
* `:bufsize` - Size of each buffer - defaults to (* 64 1024). Small improvements are sometimes seen with larger or smaller buffers.
* `:async?` - Defaults to true if the number of processors is more than one. When true, data is read in an async thread.
* `:close-reader?` - When true, close input reader when finished. Defaults to true.
(reader->char-reader rdr)
(reader->char-reader rdr options)
Given a reader, return a CharReader which presents some of the same interface as a PushbackReader but is only capable of pushing back 1 character. It is extremely quick to instantiate this object from a string or character array.

Options are passed through mainly unchanged to queue-iter and to [[reader->char-buf-supplier]].

* `:async?` - Defaults to true - reads the reader in an offline thread into character buffers.
(write-csv w data & options)
Writes data to writer in CSV format.

Options:

* `:separator` - Defaults to `\,`.
* `:quote` - Defaults to `\"`.
* `:quote?` - A predicate function which determines if a string should be quoted. Defaults to quoting only when necessary. May also be the value `true`, in which case every field is quoted.
* `:newline` - `:lf` (default) or `:cr+lf`.
* `:close-writer?` - Defaults to true. When true, close writer when finished.
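A sketch writing to an in-memory writer; `:close-writer? false` keeps the StringWriter usable afterwards, and the option style (key-value pairs, matching clojure.data.csv) is assumed from the `& options` signature:

```clojure
(require '[charred.api :as charred])

(let [sw (java.io.StringWriter.)]
  ;; The field containing a comma should be quoted automatically;
  ;; everything else is written bare.
  (charred/write-csv sw [["name" "count"] ["alice, a" 1] ["bob" 2]]
                     :close-writer? false)
  (str sw))
```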
(write-json output data & args)
Write JSON to output. You can extend the writer to new datatypes by implementing the [[->json-data]] function of the protocol `PToJSON`. This function need only return JSON-acceptable datastructures, which are numbers, booleans, nil, lists, arrays, and maps. The default type coercion will in general simply call .toString on the object.

Options:

* `:escape-unicode` - If true (default) non-ASCII characters are escaped as \uXXXX.
* `:escape-js-separators` - If true (default) the Unicode characters U+2028 and U+2029 will be escaped as \u2028 and \u2029 even if :escape-unicode is false. (These two characters are valid in pure JSON but are not valid in JavaScript strings.)
* `:escape-slash` - If true (default) the slash / is escaped as \/.
* `:indent-str` - Defaults to " ". When nil, JSON is printed raw with no indentation or whitespace.
* `:obj-fn` - Function called on each non-primitive object - it is passed the JSONWriter and the object. The default iterates maps, lists, and arrays, converting anything that is not a JSON primitive or a map, list or array to a JSON primitive via str. java.sql.Date classes get special treatment and are converted to instants which are then converted to JSON primitive objects via the PToJSON protocol fn [[->json-data]], which defaults to `toString`. This is the most general override mechanism where you will need to manually call the JSONWriter's methods. The simpler but slightly less general pathway is to override the protocol method [[->json-data]].
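To extend writing to a new type, implementing [[->json-data]] is usually enough. A sketch; java.time.LocalDate is chosen here purely as an illustrative type:

```clojure
(require '[charred.api :as charred])

;; Represent LocalDate values as ISO-8601 strings in the output.
(extend-protocol charred/PToJSON
  java.time.LocalDate
  (->json-data [d] (str d)))

(let [sw (java.io.StringWriter.)]
  (charred/write-json sw {"date" (java.time.LocalDate/of 2024 1 15)})
  (str sw))
```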
(write-json-fn argmap)
Return a function of two arguments, (output, data), that efficiently constructs a JSON writer and writes the data. This is the most efficient pathway when writing a bunch of small JSON objects as it avoids the cost associated with unpacking the argument map. Same arguments as [[write-json]].
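The amortization pattern looks like this (a sketch; the data is illustrative):

```clojure
(require '[charred.api :as charred])

;; Unpack the option map once, then reuse the writer fn per object.
(def write-compact! (charred/write-json-fn {:indent-str nil}))

(let [sw (java.io.StringWriter.)]
  ;; Newline-delimited JSON: one compact object per line.
  (doseq [obj [{"a" 1} {"b" 2}]]
    (write-compact! sw obj)
    (.write sw "\n"))
  (str sw))
```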
(write-json-str data & args)
Write JSON to a string. See options for [[write-json]].
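A quick round-trip sketch, assuming the same key-value option style as [[write-json]]:

```clojure
(require '[charred.api :as charred])

;; Compact encoding, then parse it back; keys come back as strings.
(-> {"a" [1 2 3]}
    (charred/write-json-str :indent-str nil)
    (charred/read-json))
;; => {"a" [1 2 3]}
```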