This file really should be named univocity.clj. But it is for parsing and writing csv and tsv data.
(create-csv-parser
{:keys [header-row? num-rows column-whitelist column-blacklist separator
n-initial-skip-rows max-chars-per-column max-num-columns]
:or {header-row? true max-chars-per-column (* 64 1024) max-num-columns 8192}
:as options})
Create an implementation of the univocity csv parser.
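For example, a parser for a headerless tab-separated file might be created like this (a minimal sketch; the tech.ml.dataset.parse require and alias are assumptions about how this namespace is loaded):

(require '[tech.ml.dataset.parse :as parse])

;; Headerless, tab-separated input, skipping two leading comment lines.
(def tsv-parser
  (parse/create-csv-parser {:header-row? false
                            :separator \tab
                            :n-initial-skip-rows 2}))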
(csv->dataset input)
(csv->dataset input options)
Non-lazily and serially parse the columns. Returns a vector of maps of

{:name - column name
 :missing - long-reader of in-order missing indexes
 :data - typed reader/writer of data
 :metadata - optional map with unparsed-indexes and unparsed-values}

Supports a subset of tech.ml.dataset/->dataset options:
:column-whitelist :column-blacklist :n-initial-skip-rows :num-rows
:header-row? :separator :parser-fn :parser-scan-len
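A hedged usage sketch (stocks.csv and its column names are hypothetical):

;; Parse two columns of the first 1000 rows into column maps.
(def columns
  (parse/csv->dataset "stocks.csv"
                      {:column-whitelist ["date" "price"]
                       :num-rows 1000
                       :parser-fn {"date" [:packed-local-date "yyyy-MM-dd"]}}))

;; Each entry is a map with :name, :missing, :data and optional :metadata.
(mapv :name columns) ;; ["date" "price"]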
(csv->rows input)
(csv->rows input options)
Given a csv, produces a sequence of rows. The csv options from ->dataset apply here.
options:

:column-whitelist - either a sequence of string column names or a sequence of column indices of columns to whitelist.
:column-blacklist - either a sequence of string column names or a sequence of column indices of columns to blacklist.
:num-rows - number of rows to read.
:separator - add a character separator to the list of separators to auto-detect.
:max-chars-per-column - defaults to (* 64 1024), matching create-csv-parser above. Columns with more characters than this will result in an exception.
:max-num-columns - defaults to 8192. CSV/TSV files with more columns than this will fail to parse. For more information on this option, see https://github.com/uniVocity/univocity-parsers/issues/301
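For example (stocks.csv is hypothetical; each row is a string array):

;; Walk the first five rows lazily.
(doseq [row (take 5 (parse/csv->rows "stocks.csv" {:num-rows 100}))]
  (println (vec row)))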
(column-data parser)
Return a map containing
{:data - convertible-to-reader column data
 :missing - convertible-to-reader array of missing values}
(missing! parser)
Mark a value as missing.
(parse! parser str-val)
Parse the value and store it (side-effecting). Exceptions escaping from here will stop the parsing system. (A combined sketch of this parser contract follows simple-parse! below.)
(can-parse? parser str-val)
(make-parser-container parser)
(simple-missing! parser container)
(simple-parse! parser container str-val)
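Taken together, parse!, missing! and column-data describe the column-parser contract: the first two accumulate values while the third returns the result. A minimal sketch follows; the protocol name PColumnParser is an assumption (this page does not show it), and the method shapes mirror the signatures above:

;; Accumulate parsed longs, recording missing indexes and storing a
;; placeholder so :data stays aligned with the row index.
(def long-parser
  (let [data (java.util.ArrayList.)
        missing (java.util.ArrayList.)]
    (reify parse/PColumnParser ;; assumed protocol name
      (parse! [_ str-val] (.add data (Long/parseLong str-val)))
      (missing! [_] (.add missing (.size data)) (.add data 0))
      (column-data [_] {:data data :missing missing}))))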
(raw-row-iterable input)
(raw-row-iterable input parser)
Returns an iterable that produces a map of
{:header-row - string[]
 :rows - iterable producing string[] rows}
(rows->dataset {:keys [header-row? parser-fn parser-scan-len bad-row-policy
skip-bad-rows?]
:or {header-row? true parser-scan-len 100}
:as options}
row-seq)
Given a sequence of string[] rows, parse into columnar data. See csv->dataset. This method is useful if you have another way of generating sequences of string[] row data.
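For example, rows generated by hand rather than by the csv parser (a minimal sketch):

(def demo-rows
  [(into-array String ["name" "age"])
   (into-array String ["alice" "30"])
   (into-array String ["bob" "31"])])

;; The first row is consumed as the header since :header-row? is true.
(def demo-columns
  (parse/rows->dataset {:header-row? true} demo-rows))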
(rows->n-row-sequences row-seq)
(rows->n-row-sequences options row-seq)
(rows->n-row-sequences {:keys [header-row?] :or {header-row? true}} n row-seq)
Used for parallelizing loading of a csv. Returns N sequences that are fed from a single sequence of rows. Experimental - not the most effective way of speeding up loading.
Type-hinting your columns and providing specific parsers for datetime types, e.g. (ds/->dataset input {:parser-fn {"date" [:packed-local-date "yyyy-MM-dd"]}}), may have a larger effect than parallelization in most cases.
Loading multiple files in parallel will also have a larger effect than single-file parallelization in most cases.
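A hedged sketch of fanning rows out (how the header interacts with each subsequence is not specified here, so this only shows the call shape):

;; Split one row sequence into 3 sequences for downstream parallel work.
(def row-seqs
  (parse/rows->n-row-sequences {:header-row? true} 3
                               (parse/csv->rows "stocks.csv")))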
(write! output header-string-array row-string-array-seq)
(write! output
header-string-array
row-string-array-seq
{:keys [separator] :or {separator \tab} :as options})
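For example, writing a small tsv (out.tsv is hypothetical; \tab is the default separator per the signature, so pass {:separator \,} for csv output):

(parse/write! "out.tsv"
              (into-array String ["name" "age"])
              [(into-array String ["alice" "30"])
               (into-array String ["bob" "31"])])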