This namespace contains functions which operate on a dataset and mostly return a dataset.
The namespaces scicloj.ml.metamorph and scicloj.ml.dataset contain functions with the same name, but they operate on either a context map (ns metamorph) or on a dataset (ns dataset).
The functions in this namespace are re-exported from:

* tablecloth.api - docs at https://scicloj.github.io/tablecloth/
* tech.v3.dataset.modelling
* tech.v3.dataset.column-filters
(->array ds colname)
(->array ds colname datatype)
Convert numerical column(s) to java array
(add-column ds column-name column)
(add-column ds column-name column size-strategy)
Add or update (modify) column under `column-name`.

`column` can be a sequence of values or a generator function (which gets `ds` as input).

* `ds` - a dataset
* `column-name` - if it's an existing column name, the column will be replaced
* `column` - can be a column (from another dataset), sequence, single value or function. Too-big columns are always trimmed. Too-small ones are cycled or extended with missing values (according to the `size-strategy` argument)
* `size-strategy` (optional) - when the new column is shorter than the dataset row count, the following strategies are applied:
    - `:cycle` - repeat data
    - `:na` - append missing values
    - `:strict` - (default) throws an exception when sizes mismatch
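A minimal sketch of the three ways to supply `column`, assuming the namespace is required as `ds` and using invented example data:

```clojure
(require '[scicloj.ml.dataset :as ds])

(def D (ds/dataset {:a [1 2 3]}))

;; column from a sequence of values (same length as the dataset)
(ds/add-column D :b [10 20 30])

;; column computed from the dataset itself (generator function gets the dataset as input)
(ds/add-column D :a2 (fn [d] (map #(* % %) (:a d))))

;; too-short column with an explicit size-strategy
(ds/add-column D :c [:x] :cycle)   ;; :x is repeated to fill all rows
```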
(add-columns ds columns-map)
(add-columns ds columns-map size-strategy)
Add or update (modify) columns defined in `columns-map` (mapping: name -> column)
(add-or-replace-column ds column-name column)
(add-or-replace-column ds column-name column size-strategy)
(add-or-replace-columns ds columns-map)
(add-or-replace-columns ds columns-map size-strategy)
(aggregate ds aggregator)
(aggregate ds aggregator options)
Aggregate dataset by providing:

- aggregation function
- map with column names and functions
- sequence of aggregation functions

Aggregation functions can return:

- single value
- seq of values
- map of values with column names
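A rough sketch, assuming the `ds` alias and invented data; the aggregation functions receive the whole dataset:

```clojure
(require '[scicloj.ml.dataset :as ds])

(def D (ds/dataset {:a [1 2 3 4] :b [10 20 30 40]}))

;; single aggregation function returning a map of values
(ds/aggregate D (fn [d] {:sum-a (reduce + (:a d))
                         :n     (count (:a d))}))

;; map of result column names -> aggregation functions
(ds/aggregate D {:sum-a #(reduce + (:a %))
                 :max-b #(apply max (:b %))})
```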
(aggregate-columns ds columns-selector column-aggregators)
(aggregate-columns ds columns-selector column-aggregators options)
Aggregates each column separately
(anti-join ds-left ds-right columns-selector)
(anti-join ds-left ds-right columns-selector options)
(as-regular-dataset ds)
Remove grouping tag
(boolean dataset)
Return a dataset containing only the boolean columns.
(by-rank ds columns-selector rank-predicate)
(by-rank ds columns-selector rank-predicate options)
Select rows using `rank` on a column, ties are resolved using the `:dense` method.

See [R docs](https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/rank). Rank uses 0 based indexing.

Possible `:ties` strategies: `:average`, `:first`, `:last`, `:random`, `:min`, `:max`, `:dense`. `:dense` is the same as in `data.table::frank` from R.

`:desc?` set to true (default) orders descending before calculating rank.
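A hedged sketch with invented data; under the default descending order, rank `0` should correspond to the largest value:

```clojure
(require '[scicloj.ml.dataset :as ds])

(def D (ds/dataset {:a [5 1 9 7]}))

;; keep the row(s) whose rank on :a is 0, i.e. the maximum under the default order
(ds/by-rank D :a zero?)

;; rank ascending instead and keep the two smallest values
(ds/by-rank D :a #(< % 2) {:desc? false})
```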
(categorical dataset)
Return a dataset containing only the categorical columns.
(categorical->number dataset filter-fn-or-ds)
(categorical->number dataset filter-fn-or-ds table-args)
(categorical->number dataset filter-fn-or-ds table-args result-datatype)
Convert columns into a discrete, numeric representation. See tech.v3.dataset.categorical/fit-categorical-map.
(categorical->one-hot dataset filter-fn-or-ds)
(categorical->one-hot dataset filter-fn-or-ds table-args)
(categorical->one-hot dataset filter-fn-or-ds table-args result-datatype)
Convert string columns to numeric columns. See tech.v3.dataset.categorical/fit-one-hot
(clone item)
Clone an object. Can clone anything convertible to a reader.
(column-filter dataset filter-fn)
Return a dataset with only the columns for which the filter function returns a truthy value.
(column-names ds)
(column-names ds columns-selector)
(column-names ds columns-selector meta-field)
(column-values->categorical dataset src-column)
Given a column encoded via either string->number or one-hot, reverse map to a sequence of the original string column values. In the case of one-hot mappings, src-column must be the original column name before the one-hot map.
(columns ds)
(columns ds result-type)
Returns columns of dataset. Result type can be any of:

* `:as-map`
* `:as-double-arrays`
* `:as-seqs`
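A small sketch of the result types, with invented data:

```clojure
(require '[scicloj.ml.dataset :as ds])

(def D (ds/dataset {:a [1 2] :b [3.0 4.0]}))

(ds/columns D)                     ;; default: sequence of columns
(ds/columns D :as-map)             ;; column-name -> column
(ds/columns D :as-double-arrays)   ;; one double array per column
```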
(convert-types ds coltype-map-or-columns-selector)
(convert-types ds columns-selector new-types)
Convert the type of the column to another type.
(dataset)
(dataset data)
(dataset data options)
Create `dataset`.

Dataset can be created from:

* single value
* map of values and/or sequences
* sequence of maps
* sequence of columns
* file or url
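A few illustrative constructions (the file path is hypothetical):

```clojure
(require '[scicloj.ml.dataset :as ds])

;; from a map of sequences
(ds/dataset {:x [1 2 3] :y ["a" "b" "c"]})

;; from a sequence of maps
(ds/dataset [{:x 1 :y "a"} {:x 2 :y "b"}])

;; from a file or URL (hypothetical path), with options passed through,
;; e.g. turning string column names into keywords
(ds/dataset "data/iris.csv" {:key-fn keyword})
```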
(dataset->categorical-maps dataset)
Given a dataset, return a map of column names to categorical label maps. This aids in inverting all of the label maps in a dataset. The source column name is src-column.
(dataset->categorical-xforms ds)
Given a dataset, return a map of column-name->xform information.
(dataset->one-hot-maps dataset)
Given a dataset, return a sequence of applied one-hot transformations.
(dataset->str ds)
(dataset->str ds options)
Convert a dataset to a string. Prints a single line header and then calls dataset-data->str. For options documentation see dataset-data->str.
(datetime dataset)
Return a dataset containing only the datetime columns.
(drop ds columns-selector rows-selector)
Drop columns and rows.
(drop-columns ds)
(drop-columns ds columns-selector)
(drop-columns ds columns-selector meta-field)
Drop columns by (returns dataset):

- name
- sequence of names
- map of names with new names (rename)
- function which filters names (via column metadata)
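A sketch of the selector variants, with invented columns; the metadata-based form assumes the predicate receives the value of the given metadata field:

```clojure
(require '[scicloj.ml.dataset :as ds])

(def D (ds/dataset {:a [1] :b [2] :note ["x"]}))

(ds/drop-columns D :note)                    ;; by name
(ds/drop-columns D [:a :b])                  ;; by sequence of names
(ds/drop-columns D #(= :string %) :datatype) ;; by predicate over column metadata
```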
(drop-missing ds)
(drop-missing ds columns-selector)
Drop rows with missing values. `columns-selector` selects the columns to check for missing values.
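For example, with invented data (`nil` values become missing values):

```clojure
(require '[scicloj.ml.dataset :as ds])

(def D (ds/dataset {:a [1 nil 3] :b [10 20 nil]}))

(ds/drop-missing D)      ;; drop rows with a missing value in any column
(ds/drop-missing D :a)   ;; only consider :a when looking for missing values
```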
(drop-rows ds)
(drop-rows ds rows-selector)
(drop-rows ds rows-selector options)
Drop rows using:

- row id
- seq of row ids
- seq of true/false
- fn with predicate
(feature dataset)
Return a dataset containing only the columns which have not been marked as inference columns.
(feature-ecount dataset)
Number of feature columns. Feature columns are columns that are not inference targets.
(fill-range-replace ds colname max-span)
(fill-range-replace ds colname max-span missing-strategy)
(fill-range-replace ds colname max-span missing-strategy missing-value)
(fit-categorical-map dataset colname & [table-args res-dtype])
Given a column, map it into a numeric space via a discrete map of values to integers. This fits the categorical transformation onto the column and returns the transformation.

If `table-args` is not given, the distinct column values will be mapped into 0..x without any specific order. `table-args` allows specifying the precise mapping as a sequence of pairs of [val idx] or as a sorted seq of values.
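A hedged sketch of fitting a mapping and applying it later with `transform-categorical-map`, using invented data:

```clojure
(require '[scicloj.ml.dataset :as ds])

(def D (ds/dataset {:color ["red" "blue" "red" "green"]}))

;; fit the mapping on the column (values mapped to 0..x in no specific order)
(def color-fit (ds/fit-categorical-map D :color))

;; or with an explicit table of [val idx] pairs
;; (ds/fit-categorical-map D :color [["red" 0] ["green" 1] ["blue" 2]])

;; apply the fitted transformation
(ds/transform-categorical-map D color-fit)
```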
(fit-one-hot dataset colname & [table-args res-dtype])
Fit a one hot transformation to a column. Returns a reusable transformation. Maps each unique value to a column with 1 every time the value appears in the original column and 0 otherwise.
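Similarly, a sketch of a one-hot fit applied with `transform-one-hot`, using invented data:

```clojure
(require '[scicloj.ml.dataset :as ds])

(def D (ds/dataset {:color ["red" "blue" "red"]}))

(def one-hot-fit (ds/fit-one-hot D :color))

;; replaces :color with one 0/1 column per distinct value
(ds/transform-one-hot D one-hot-fit)
```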
(fold-by ds columns-selector)
(fold-by ds columns-selector folding-function)
Group-by and pack columns into vectors - the output dataset has a row for each unique combination of the provided columns, while each remaining column has its value(s) collected into a vector, similar to how clojure.core/group-by works. See https://scicloj.github.io/tablecloth/index.html#Fold-by
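A quick sketch with invented data; the custom folding function is assumed to receive the collected values of each remaining column:

```clojure
(require '[scicloj.ml.dataset :as ds])

(def D (ds/dataset {:g [:a :a :b] :v [1 2 3]}))

;; one row per distinct :g, with :v packed into a vector
(ds/fold-by D :g)

;; fold with a custom function, e.g. summing instead of collecting
(ds/fold-by D :g #(reduce + %))
```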
(full-join ds-left ds-right columns-selector)
(full-join ds-left ds-right columns-selector options)
(group-by ds grouping-selector)
(group-by ds grouping-selector options)
Group dataset by:

- column name
- list of columns
- map of keys and row indexes
- function getting map of values

Options are:

- select-keys - when grouping is done by function, you can limit fields to a `select-keys` seq.
- result-type - return results as dataset (`:as-dataset`, default) or as map of datasets (`:as-map`) or as map of row indexes (`:as-indexes`) or as sequence of (sub)datasets
- other parameters which are passed to the `dataset` fn

When a dataset is returned, meta contains `:grouped?` set to true. Columns in the dataset:

- name - group name
- group-id - id of the group (int)
- data - group as dataset
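A minimal sketch of grouping and collapsing back with `ungroup` (documented further below), using invented data:

```clojure
(require '[scicloj.ml.dataset :as ds])

(def D (ds/dataset {:g [:a :a :b] :v [1 2 3]}))

;; grouped dataset: one row per group with :name, :group-id and :data columns
(def grouped (ds/group-by D :g))

(ds/grouped? grouped)   ;; => true

;; back to a flat dataset
(ds/ungroup grouped)
```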
(grouped? ds)
Does `ds` represent a grouped dataset (result of `group-by`)?
(groups->map ds)
Convert grouped dataset to the map of groups
(inference-target-column-names ds)
Return the names of the columns that are inference targets.
(inference-target-ds dataset)
Given a dataset return reverse-mapped inference target columns or nil in the case where there are no inference targets.
(inference-target-label-inverse-map dataset & [label-columns])
Given options generated during ETL operations and annotated with a :label-columns sequence containing 1 label column, generate a reverse map that maps from a dataset value back to the label that generated that value.
(inner-join ds-left ds-right columns-selector)
(inner-join ds-left ds-right columns-selector options)
(intersection lhs-ds rhs-ds)
Return only columns for rhs for which an equivalently named column exists in lhs.
(invert-categorical-map dataset {:keys [src-column lookup-table]})
Invert a categorical map returning the column to the original set of values.
(invert-one-hot-map dataset {:keys [one-hot-table src-column]})
Invert a one-hot transformation removing the one-hot columns and adding back the original column.
(join-columns ds target-column columns-selector)
(join-columns ds target-column columns-selector conf)
(k-fold-datasets dataset k)
(k-fold-datasets dataset k options)
Given 1 dataset, prepare K datasets using the k-fold algorithm. Randomize dataset defaults to true which will realize the entire dataset, so use with care if you have large datasets.

Returns a sequence of {:test-ds :train-ds}

Options:

* `:randomize-dataset?` - When true, shuffle the dataset. In that case 'seed' may be provided. Defaults to true.
* `:seed` - when `:randomize-dataset?` is true then this can either be an implementation of java.util.Random or an integer seed which will be used to construct java.util.Random.
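For example, a sketch with invented data:

```clojure
(require '[scicloj.ml.dataset :as ds])

(def D (ds/dataset {:x (range 10) :y (repeat 10 0)}))

;; 5 train/test pairs, reproducible via an integer seed
(def folds (ds/k-fold-datasets D 5 {:seed 42}))

;; shapes of the train parts of each fold
(map (comp ds/shape :train-ds) folds)
```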
(labels dataset)
Return the labels. The labels sequence is the reverse mapped inference column. This returns a single column of data or errors out.
(left-join ds-left ds-right columns-selector)
(left-join ds-left ds-right columns-selector options)
(map-columns ds column-name map-fn)
(map-columns ds column-name columns-selector map-fn)
(map-columns ds column-name new-type columns-selector map-fn)
(metadata-filter dataset filter-fn)
Return a dataset with only the columns for which, given the column metadata, the filter function returns a truthy value.
(missing dataset)
Return a dataset with only the columns that have missing values.
(model-type dataset & [column-name-seq])
Check the label column after dataset processing. Return either :regression or :classification.
(no-missing dataset)
Return a dataset with only columns that have no missing values.
(num-inference-classes dataset)
Given a dataset and correctly built options from pipeline operations, return the number of classes used for the label. Error if not classification dataset.
(numeric dataset)
Return a dataset containing only the numeric columns.
(of-datatype dataset datatype)
Return a dataset containing only the columns of a specific datatype.
(order-by ds columns-or-fn)
(order-by ds columns-or-fn comparators)
(order-by ds columns-or-fn comparators options)
Order dataset by:

- column name
- columns (as sequence of names)
- key-fn
- sequence of columns / key-fn

Additionally you can ask the order by:

- :asc
- :desc
- custom comparator function
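A sketch of common call shapes, with invented data:

```clojure
(require '[scicloj.ml.dataset :as ds])

(def D (ds/dataset {:a [3 1 2] :b ["x" "z" "y"]}))

(ds/order-by D :a)                    ;; ascending by :a
(ds/order-by D :a :desc)              ;; descending by :a
(ds/order-by D [:a :b] [:asc :desc])  ;; per-column directions
(ds/order-by D (fn [row] (:a row)))   ;; key-fn over the row map
```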
(pivot->longer ds)
(pivot->longer ds columns-selector)
(pivot->longer ds columns-selector options)
`tidyr` pivot_longer api
(pivot->wider ds columns-selector value-columns)
(pivot->wider ds columns-selector value-columns options)
(prediction dataset)
Return the columns of the dataset marked as predictions.
(probability-distribution dataset)
Return the columns of the dataset that comprise the probability distribution after classification.
(probability-distributions->label-column prob-ds dst-colname)
Given a dataset that has columns in which the column names describe labels and the rows describe a probability distribution, create a label column by taking the max value in each row and assigning that column's name as the row's value.
(rename-columns ds columns-mapping)
(rename-columns ds columns-selector columns-map-fn)
Rename columns with provided old -> new name map
(reorder-columns ds columns-selector & args)
Reorder columns using column selector(s). When column names are incomplete, the missing will be attached at the end.
(replace-missing ds)
(replace-missing ds strategy)
(replace-missing ds columns-selector strategy)
(replace-missing ds columns-selector strategy value)
(reverse-map-categorical-xforms dataset)
Given a dataset where we have converted columns from a categorical representation to either a numeric representation or a one-hot representation, reverse map back to the original dataset given the reverse mapping of label->number in the column's metadata.
(right-join ds-left ds-right columns-selector)
(right-join ds-left ds-right columns-selector options)
(rows ds)
(rows ds result-type)
Returns rows of dataset. Result type can be any of:

* `:as-maps`
* `:as-double-arrays`
* `:as-seqs`
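Analogous to `columns` above, a small sketch with invented data:

```clojure
(require '[scicloj.ml.dataset :as ds])

(def D (ds/dataset {:a [1 2] :b [3.0 4.0]}))

(ds/rows D :as-maps)            ;; ({:a 1, :b 3.0} {:a 2, :b 4.0})
(ds/rows D :as-double-arrays)   ;; one double array per row
```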
(select ds columns-selector rows-selector)
Select columns and rows.
(select-columns ds)
(select-columns ds columns-selector)
(select-columns ds columns-selector meta-field)
Select columns by (returns dataset):

- name
- sequence of names
- map of names with new names (rename)
- function which filters names (via column metadata)
(select-missing ds)
(select-missing ds columns-selector)
Select rows with missing values. `columns-selector` selects the columns to check for missing values.
(select-rows ds)
(select-rows ds rows-selector)
(select-rows ds rows-selector options)
Select rows using:

- row id
- seq of row ids
- seq of true/false
- fn with predicate
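A sketch of the selector variants, with invented data:

```clojure
(require '[scicloj.ml.dataset :as ds])

(def D (ds/dataset {:a [10 20 30]}))

(ds/select-rows D 0)                           ;; single row id
(ds/select-rows D [0 2])                       ;; seq of row ids
(ds/select-rows D [true false true])           ;; seq of true/false
(ds/select-rows D (fn [row] (> (:a row) 15)))  ;; predicate over the row map
```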
(semi-join ds-left ds-right columns-selector)
(semi-join ds-left ds-right columns-selector options)
(separate-column ds column separator)
(separate-column ds column target-columns separator)
(separate-column ds column target-columns separator conf)
(set-inference-target dataset target-name-or-target-name-seq)
Set the inference target on the column. This sets the :column-type member of the column metadata to :inference-target?.
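For example, a sketch with invented column names:

```clojure
(require '[scicloj.ml.dataset :as ds])

(def D (-> (ds/dataset {:x [1 2 3] :y [:a :b :a]})
           (ds/set-inference-target :y)))

(ds/inference-target-column-names D)  ;; the columns marked as targets, here :y
(ds/feature-ecount D)                 ;; => 1, only :x remains a feature
```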
(shape ds)
Returns shape of the dataset [rows, cols]
(split ds)
(split ds split-type)
(split ds split-type options)
Split given dataset into 2 or more (holdout) splits

As the result two new columns are added:

* `:$split-name` - with subgroup name
* `:$split-id` - fold id/repetition id

`split-type` can be one of the following:

* `:kfold` - k-fold strategy, `:k` defines number of folds (defaults to `5`), produces `k` splits
* `:bootstrap` - `:ratio` defines ratio of observations put into result (defaults to `1.0`), produces `1` split
* `:holdout` - split into two parts with given ratio (defaults to `2/3`), produces `1` split
* `:loo` - leave one out, produces the same number of splits as number of observations

`:holdout` can also accept probabilities or ratios and can split into more than 2 subdatasets

Additionally you can provide:

* `:seed` - for random number generator
* `:repeats` - repeat procedure `:repeats` times
* `:partition-selector` - same as in `group-by` for stratified splitting to reflect dataset structure in splits.
* `:split-names` - names of subdatasets different than default, ie. `[:train :test :split-2 ...]`
* `:split-col-name` - a column where name of split is stored, either `:train` or `:test` values (default: `:$split-name`)
* `:split-id-col-name` - a column where id of the train/test pair is stored (default: `:$split-id`)

Rows are shuffled before splitting. In case of grouped dataset each group is processed separately.

See [more](https://www.mitpressjournals.org/doi/pdf/10.1162/EVCO_a_00069)
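A sketch of the two common call shapes, with invented data; `split` marks the rows with the split columns, while `split->seq` (next entry) returns the train/test datasets directly:

```clojure
(require '[scicloj.ml.dataset :as ds])

(def D (ds/dataset {:x (range 10)}))

;; holdout split, reproducible via a seed; adds :$split-name / :$split-id columns
(ds/split D :holdout {:seed 42})

;; k-fold split as a sequence of train/test dataset maps
(ds/split->seq D :kfold {:k 3 :seed 42})
```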
(split->seq ds)
(split->seq ds split-type)
(split->seq ds split-type options)
Returns split as a sequence of train/test datasets or map of sequences (grouped dataset)
(string dataset)
Return a dataset containing only the string columns.
(target dataset)
Return a dataset containing only the columns that have been marked as inference targets.
(train-test-split dataset)
(train-test-split dataset
{:keys [train-fraction] :or {train-fraction 0.7} :as options})
Probabilistically split the dataset returning a map of `{:train-ds :test-ds}`.

Options:

* `:randomize-dataset?` - When true, shuffle the dataset. In that case 'seed' may be provided. Defaults to true.
* `:seed` - when `:randomize-dataset?` is true then this can either be an implementation of java.util.Random or an integer seed which will be used to construct java.util.Random.
* `:train-fraction` - Fraction of the dataset to use as training set. Defaults to 0.7.
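For example, a sketch with invented data:

```clojure
(require '[scicloj.ml.dataset :as ds])

(def D (ds/dataset {:x (range 100) :y (repeatedly 100 rand)}))

(let [{:keys [train-ds test-ds]} (ds/train-test-split D {:train-fraction 0.8 :seed 42})]
  [(ds/shape train-ds) (ds/shape test-ds)])
```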
(transform-categorical-map dataset fit-data)
Apply a categorical mapping transformation fit with fit-categorical-map.
(transform-one-hot dataset one-hot-fit-data)
Apply a one-hot transformation to a dataset
(ungroup ds)
(ungroup ds options)
Concat groups into a dataset. When `add-group-as-column` or `add-group-id-as-column` is set to `true` or to name(s), columns with the group name(s) or group id are added to the result. Before joining, the groups can be sorted by group name.
(unique-by ds)
(unique-by ds columns-selector)
(unique-by ds columns-selector options)
(unroll ds columns-selector)
(unroll ds columns-selector options)
Unfolds sequences stored inside a column(s), turning them into multiple rows. Opposite of [[fold-by]].

Adds each of the provided columns to the set that defines the "unique key" of each row. Thus there will be a new row for each value inside the target column(s)' value sequence.

If you want instead to split the content of the columns into a set of new _columns_, look at [[separate-column]].

See https://scicloj.github.io/tablecloth/index.html#Unroll
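A round-trip sketch with `fold-by`, using invented data:

```clojure
(require '[scicloj.ml.dataset :as ds])

;; fold :v into vectors per group, then unroll back to one row per value
(def folded (ds/fold-by (ds/dataset {:g [:a :a :b] :v [1 2 3]}) :g))

(ds/unroll folded :v)
```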
(update-columns ds columns-map)
(update-columns ds columns-selector update-functions)
(write! dataset output-path)
(write! dataset output-path options)
Write a dataset out to a file. Supported forms are:

```clojure
(ds/write! test-ds "test.csv")
(ds/write! test-ds "test.tsv")
(ds/write! test-ds "test.tsv.gz")
(ds/write! test-ds "test.nippy")
(ds/write! test-ds out-stream)
```

Options:

* `:max-chars-per-column` - csv, tsv specific, defaults to 65536 - values longer than this will cause an exception during serialization.
* `:max-num-columns` - csv, tsv specific, defaults to 8192 - If the dataset has more than this number of columns an exception will be thrown during serialization.
* `:quoted-columns` - csv specific - sequence of column names that you would like to always have quoted.
* `:file-type` - Manually specify the file type. This is usually inferred from the filename but if you pass in an output stream then you will need to specify the file type.
* `:headers?` - if csv headers are written, defaults to true.