
charred.api


Efficient pathways to read/write CSV-based formats and JSON.  Many of these functions
have fast pathways for constructing the parser/writer in order to help with the case where
you want to rapidly encode/decode a stream of small objects.  For general use, the simply
named read-XXX and write-XXX functions are designed to be drop-in but far more efficient
replacements of their `clojure.data.csv` and `clojure.data.json` equivalents.


This is based on an underlying char[]-based parsing system that makes it easy to build
new parsers and allows tight loops to iterate through loaded character arrays, which are
thus easily optimized by HotSpot.

* [CharBuffer.java](https://github.com/cnuernber/charred/blob/master/java/chardata/CharBuffer.java) - More efficient, simpler, and more general than StringBuilder.
* [CharReader.java](https://github.com/cnuernber/charred/blob/master/java/chardata/CharReader.java) - PushbackReader-like abstraction only capable of pushing back
  1 character.  Allows access to the underlying buffer and relative offset.

On top of these abstractions you have reader/writer abstractions for json and csv.

Many of these abstractions return a [CloseableSupplier](https://github.com/cnuernber/charred/blob/master/java/charred/CloseableSupplier.java) so you
can simply use them with `with-open`, and the underlying stream/reader will be closed when
control leaves the block.  If you read all the data out of the supplier then the supplier
itself will close the input when finished.
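As a minimal sketch of the supplier pattern (the `data.csv` path here is hypothetical):

```clojure
(require '[charred.api :as charred])

;; data.csv is a hypothetical input file.  with-open guarantees the
;; underlying reader is closed even if we stop reading early; reading
;; the supplier to completion would also close it automatically.
(with-open [rows (charred/read-csv-supplier (java.io.File. "data.csv"))]
  (doseq [row rows]
    (println row)))
```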

json-reader-fn

(json-reader-fn options)

Given options, return a function that when called constructs a json reader from
exactly those options.  This avoids the work of unpacking/analyzing the options
when constructing many json readers for a sequence of small inputs.

json-writer-fn

(json-writer-fn options)

Return a function that when called efficiently constructs a JSONWriter from the given
options.  Same arguments as [[write-json]].

parse-json-fn

(parse-json-fn & [options])

Return a function from input->json.  Parses the options once; when parsing many small
JSON inputs from which you intend to get one and only one JSON object each, this pathway
is a bit more efficient than read-json.

Same options as [[read-json-supplier]].
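A sketch of the intended usage (option names are those documented for [[read-json-supplier]]):

```clojure
(require '[charred.api :as charred])

;; Options are analyzed once; the returned function is then cheap to
;; call across many small JSON strings.
(def parse-fn (charred/parse-json-fn {:key-fn keyword}))

(parse-fn "{\"a\": 1}")  ;; => {:a 1}
```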

PToJSON (protocol)

Protocol to extend support for converting items to a json-supported datastructure.
These can be a number, a string, an implementation of java.util.List, or an implementation
of java.util.Map.

->json-data

(->json-data item)

Automatic conversion of some subset of types to something acceptable to json.
Defaults to toString for types that aren't representable in json.
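As a sketch, a custom type can be made JSON-representable by extending the protocol; the `Point` deftype below is hypothetical:

```clojure
(require '[charred.api :as charred])

;; Hypothetical type that charred cannot serialize on its own
;; (a defrecord would already be handled as a java.util.Map).
(deftype Point [x y])

;; ->json-data need only return a json-supported datastructure.
(extend-protocol charred.api/PToJSON
  Point
  (->json-data [p] {"x" (.x p) "y" (.y p)}))

(charred/write-json-str (->Point 1 2))
```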

read-csv

(read-csv input & options)

Read a csv, returning a clojure.data.csv-compatible sequence.  For options
see [[read-csv-supplier]].

An important note: `:comment-char` is disabled by default during read-csv
for backward compatibility, while it is not disabled by default during
read-csv-supplier.
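A minimal drop-in sketch:

```clojure
(require '[charred.api :as charred])

;; Same shape of result as clojure.data.csv/read-csv.
(charred/read-csv "a,b\n1,2\n3,4")
;; => (["a" "b"] ["1" "2"] ["3" "4"])
```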

read-csv-supplier

(read-csv-supplier input & [options])

Read a csv into a row supplier.  Parse algorithm the same as clojure.data.csv although
this returns a java.util.function.Supplier which also implements AutoCloseable as well as
`clojure.lang.Seqable` and `clojure.lang.IReduce`.

The supplier returned derives from AutoCloseable and it will terminate the reading and
close the underlying read mechanism (and join the async thread) if (.close supp) is called.

For a drop-in but much faster replacement to clojure.data.csv use [[read-csv]].

Options:

In addition to these options, see options for [[reader->char-buf-supplier]].

* `:async?` - Defaults to true - read the file into buffers in an offline thread.  This
   speeds up reading larger files (1MB+) by about 30%.
* `:separator` - Field separator - defaults to `\,`.
* `:quote` - Quote specifier - defaults to `\"`.
* `:close-reader?` - Close the reader when iteration is finished - defaults to true.
* `:column-whitelist` - Sequence of allowed column names or indexes.
* `:column-blacklist` - Sequence of disallowed column names or indexes.  When this conflicts
   with `:column-whitelist`, `:column-whitelist` wins.
* `:comment-char` - Defaults to `\#`.  Rows beginning with this character are discarded with
  no further processing.  Setting the comment-char to nil or `(char 0)` disables comment lines.
* `:trim-leading-whitespace?` - When true, leading spaces and tabs are ignored.  Defaults
   to true.
* `:trim-trailing-whitespace?` - When true, trailing spaces and tabs are ignored.  Defaults
   to true.
* `:nil-empty-values?` - When true, empty strings are elided entirely and returned as nil
   values.  Defaults to false.
* `:profile` - Either `:immutable` or `:mutable`.  `:immutable` returns persistent vectors
  while `:mutable` returns ArrayLists.
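A sketch exercising the supplier directly; because it implements `IReduce`, it can be reduced without building an intermediate lazy seq:

```clojure
(require '[charred.api :as charred])

;; Comment lines (# ...) are skipped by default in read-csv-supplier.
(with-open [supp (charred/read-csv-supplier
                  (java.io.StringReader. "a,b\n#comment\n1,2"))]
  (reduce conj [] supp))
;; => [["a" "b"] ["1" "2"]]
```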

read-json

(read-json input & args)

Drop-in replacement for clojure.data.json/read and clojure.data.json/read-str.  For options
see [[read-json-supplier]].
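A minimal sketch:

```clojure
(require '[charred.api :as charred])

(charred/read-json "{\"a\": [1, 2.5, \"three\"]}")
;; => {"a" [1 2.5 "three"]}

;; Keywordize keys, as with clojure.data.json:
(charred/read-json "{\"a\": 1}" :key-fn keyword)
;; => {:a 1}
```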

read-json-supplier

(read-json-supplier input & [options])

Read one or more JSON objects.
Returns an auto-closeable supplier that when called by default throws an exception
if the read pathway is finished.  Input may be a character array or string (most efficient)
or something convertible to a reader.  Options for conversion to reader are described in
[[reader->char-reader]], although for the json case - unlike csv - `:async?` defaults to
`false`, as most JSON files are relatively small - in the 10-100K range, where async loading
doesn't make much of a difference.  On a larger file, however, setting `:async?` to true
definitely can make a large difference.

Options:

In addition to the options below, see options for [[reader->char-reader]].

* `:bigdec` - When true use BigDecimals for floating point numbers.  Defaults to false.
* `:double-fn` - If `:bigdec` isn't provided, use this function to parse double values.
* `:profile` - Which performance profile to use.  This simply provides defaults for
   `:array-iface` and `:obj-iface`.  The default `:immutable` value produces persistent
   datastructures and supports value-fn and key-fn.  `:mutable` produces object arrays
   and java.util.HashMaps - this is about 30% faster.  `:raw` produces ArrayLists for
   arrays and a JSONReader$JSONObj type with a public data member that is an ArrayList
   for objects.
* `:key-fn` - Function called on each string map key.
* `:value-fn` - Function called on each map value.  The function is passed the key and val,
   so it takes 2 arguments.  If this function returns `:tech.v3.datatype.char-input/elided`
   then the key-val pair will be elided from the result.
* `:array-iface` - Implementation of JSONReader$ArrayReader called on the object array of
   values for a javascript array.
* `:obj-iface` - Implementation of JSONReader$ObjReader called for each javascript
  object.  Note that providing this overrides key-fn and value-fn.
* `:eof-error?` - Defaults to true - when eof is encountered when attempting to read an
   object, throw an EOF error.  Else returns a special EOF value.
* `:eof-value` - EOF value.  Defaults to
* `:eof-fn` - Function called if readObject is going to return EOF.  Defaults to throwing
   an EOFException.
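A sketch reading a stream of concatenated JSON objects; `::eof` is an arbitrary sentinel chosen here:

```clojure
(require '[charred.api :as charred])

;; With :eof-error? false the supplier yields the :eof-value sentinel
;; instead of throwing once the input is exhausted.
(with-open [supp (charred/read-json-supplier
                  "{\"a\": 1} {\"a\": 2}"
                  {:key-fn keyword :eof-error? false :eof-value ::eof})]
  (into [] (take-while #(not= ::eof %)) (repeatedly #(.get supp))))
;; => [{:a 1} {:a 2}]
```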

reader->char-buf-supplier

(reader->char-buf-supplier rdr & [options])

Given a reader, return a supplier that when called reads the next buffer of the reader.
When n-buffers is >= 0, this function iterates through a fixed number of buffers under
the covers, so you need to be cognizant of the number of actual buffers that you want to
have present in memory.  This fn also implements `AutoCloseable`, and closing it will close
the underlying reader.

Options:

* `:n-buffers` - Number of buffers to use.  Defaults to 6 as the queue size defaults to 4 -
if this number is positive but too small then buffers in flight will get overwritten.  If
n-buffers is <= 0 then buffers are allocated as needed and not reused - this is the safest
option but also can make async loading much slower than it would be otherwise.  This must
be at least 2 larger than queue-depth.
* `:queue-depth` - Defaults to 4.  See comments on `:n-buffers`.
* `:bufsize` - Size of each buffer - defaults to (* 64 1024).  Small improvements are
sometimes seen with larger or smaller buffers.
* `:async?` - Defaults to true if the number of processors is more than one.  When true,
   data is read in an async thread.
* `:close-reader?` - When true, close input reader when finished.  Defaults to true.

reader->char-reader

(reader->char-reader rdr)
(reader->char-reader rdr options)

Given a reader, return a CharReader which presents some of the same interface
as a PushbackReader but is only capable of pushing back 1 character.  It is extremely
quick to instantiate this object from a string or character array.

Options:

See options for [[reader->char-buf-supplier]].

write-csv

(write-csv w data & options)

Writes data to writer in CSV format.
Options:

  * `:separator` - Default `\,`.
  * `:quote` - Default `\"`.
  * `:quote?` - A predicate function which determines if a string should be quoted.
     Defaults to quoting only when necessary.  May also be the value `true`, in which
     case every field is quoted.
  * `:newline` - `:lf` (default) or `:cr+lf`.
  * `:close-writer?` - Defaults to true.  When true, close writer when finished.
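A minimal sketch:

```clojure
(require '[charred.api :as charred])

;; Non-string values are written via str; fields are quoted only
;; when necessary (here, the field containing a comma).
(let [w (java.io.StringWriter.)]
  (charred/write-csv w [["a" "b,c"] [1 2]])
  (str w))
```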

write-json

(write-json output data & args)

Write json to output.  You can extend the writer to new datatypes by implementing
the [[->json-data]] function of the protocol `PToJSON`.  This function need only return
json-acceptable datastructures, which are numbers, booleans, nil, lists, arrays, and
maps.  The default type coercion will in general simply call .toString on the object.

Options:

* `:escape-unicode` - If true (default), non-ASCII characters are escaped as \uXXXX.
* `:escape-js-separators` - If true (default), the Unicode characters U+2028 and U+2029 will
     be escaped as \u2028 and \u2029 even if :escape-unicode is
     false.  (These two characters are valid in pure JSON but are not
     valid in JavaScript strings.)
* `:escape-slash` - If true (default), the slash / is escaped as \/.
* `:indent-str` - Defaults to "  ".  When nil, json is printed raw with no indent or
   whitespace.
* `:obj-fn` - Function called on each non-primitive object - it is passed the JSONWriter and
   the object.  The default iterates maps, lists, and arrays, converting anything that is
   not a json primitive or a map, list, or array to a json primitive via str.  java.sql.Date
   classes get special treatment and are converted to instants which are then converted to
   json primitive objects via the PToJSON protocol fn [[->json-data]], which defaults to
   `toString`.  This is the most general override mechanism, where you will need to manually
   call the JSONWriter's methods.  The simpler but slightly less general pathway is to
   override the protocol method [[->json-data]].
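A minimal sketch:

```clojure
(require '[charred.api :as charred])

;; :indent-str nil produces compact output with no whitespace.
(let [w (java.io.StringWriter.)]
  (charred/write-json w {"a" 1 "b" [true nil "s"]} :indent-str nil)
  (str w))
```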

write-json-fn

(write-json-fn argmap)

Return a function of two arguments, (output, data), that efficiently constructs
a json writer and writes the data.  This is the most efficient pathway when writing
many small json objects, as it avoids the cost associated with unpacking the
argument map.  Same arguments as [[write-json]].
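A sketch of the amortized-setup pattern:

```clojure
(require '[charred.api :as charred])

;; The argument map is unpacked once; reuse write-fn for many objects.
(def write-fn (charred/write-json-fn {:indent-str nil}))

(let [w (java.io.StringWriter.)]
  (write-fn w {"a" 1})
  (str w))
```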

write-json-str

(write-json-str data & args)

Write json to a string.  See options for [[write-json]].
