tablecloth.api

Tablecloth API

->array

(->array ds colname)
(->array ds colname datatype)

Convert numerical column(s) to java array

add-column

(add-column ds column-name column)
(add-column ds column-name column size-strategy)

Add or update (modify) column under `column-name`.

`column` can be a sequence of values or a generator function (which gets `ds` as input).

* `ds` - a dataset
* `column-name` - if it's an existing column name, the column will be replaced
* `column` - can be a column (from another dataset), a sequence, a single value or a function (taking a dataset). Too-long columns are always trimmed; too-short ones are cycled or extended with missing values (according to the `size-strategy` argument)
* `size-strategy` (optional) - when the new column is shorter than the dataset row count, one of the following strategies is applied:
  - `:cycle` - repeat data
  - `:na` - append missing values
  - `:strict` - (default) throws an exception when sizes mismatch
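
A minimal usage sketch (assuming `tablecloth.api` is aliased as `tc`; the dataset `DS` and column names are illustrative and are reused by the sketches below):

```clojure
(require '[tablecloth.api :as tc])

(def DS (tc/dataset {:a [1 2 3] :b ["x" "y" "z"]}))

;; a single value is broadcast to every row
(tc/add-column DS :c 0)

;; a generator function receives the whole dataset
(tc/add-column DS :a-inc (fn [ds] (map inc (ds :a))))

;; a too-short column is cycled to the row count
(tc/add-column DS :d [1 2] :cycle)
```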

add-columns

(add-columns ds columns-map)
(add-columns ds columns-map size-strategy)

Add or update (modify) columns defined in `columns-map` (mapping: name -> column)

add-or-replace-column

(add-or-replace-column ds column-name column)
(add-or-replace-column ds column-name column size-strategy)

add-or-replace-columns

(add-or-replace-columns ds columns-map)
(add-or-replace-columns ds columns-map size-strategy)

aggregate

(aggregate ds aggregator)
(aggregate ds aggregator options)

Aggregate dataset by providing:

- aggregation function
- map with column names and functions
- sequence of aggregation functions

Aggregation functions can return:

- single value
- seq of values
- map of values with column names
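
A hedged sketch (reusing `tc` and `DS` from the add-column example; each aggregator receives the dataset):

```clojure
(tc/aggregate DS {:sum-a #(reduce + (% :a))
                  :rows  tc/row-count})
;; => a one-row dataset with columns :sum-a and :rows
```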

aggregate-columns

(aggregate-columns ds columns-aggregators)
(aggregate-columns ds columns-selector column-aggregators)
(aggregate-columns ds columns-selector column-aggregators options)

Aggregates each column separately
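
For comparison, a per-column sketch (same assumptions; here the aggregator receives each selected column, not the dataset):

```clojure
(tc/aggregate-columns DS [:a] #(reduce + %))
```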

anti-join

(anti-join ds-left ds-right columns-selector)
(anti-join ds-left ds-right columns-selector options)

append

(append ds & args)

Concats columns of several datasets

array-column->columns

(array-column->columns ds src-column)
(array-column->columns ds src-column opts)

Converts a column of type java array into several columns,
one for each element of the arrays of all rows. The source column is dropped afterwards.
The function assumes that the arrays in all rows have the same type and length and are numeric.

`ds` - dataset to operate on
`src-column` - the (array) column to convert
`opts` can contain:
  `prefix` - newly created columns will get this prefix before the column number

as-regular-dataset

(as-regular-dataset ds)

Remove grouping tag

asof-join

(asof-join ds-left ds-right columns-selector)
(asof-join ds-left ds-right columns-selector options)

bind

(bind ds & args)

by-rank

(by-rank ds columns-selector rank-predicate)
(by-rank ds columns-selector rank-predicate options)

Select rows using `rank` on a column; ties are resolved using the `:dense` method.

See [R docs](https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/rank).
Rank uses 0-based indexing.

Possible `:ties` strategies: `:average`, `:first`, `:last`, `:random`, `:min`, `:max`, `:dense`.
`:dense` is the same as in `data.table::frank` from R.

`:desc?` - when set to `true` (default), orders descending before calculating the rank
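
An illustrative sketch (reusing `DS`; ranks are 0-based and descending by default):

```clojure
;; keep the rows whose :a values carry the two highest ranks
(tc/by-rank DS :a #(< % 2))
```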

clone

(clone item)

Clone an object. Can clone anything convertible to a reader.

column

(column dataset colname)

column-count

(column-count dataset)

column-names

(column-names ds)
(column-names ds columns-selector)
(column-names ds columns-selector meta-field)

Returns column names, given a selector. `columns-selector` can be one of the following:

* `:all` keyword - selects all columns
* column name - for a single column
* sequence of column names - for a collection of columns
* regex - to apply a pattern on column names or datatype
* filter predicate - to filter column names or datatype
* type namespaced keyword for a specific datatype or group of datatypes

Column name can be anything.

`column-names` returns names according to `columns-selector` and an optional `meta-field`. `meta-field` is one of the following:

* `:name` (default) - to operate on column names
* `:datatype` - to operate on column types
* `:all` - if you want to process all metadata

Datatype groups are:

* `:type/numerical` - any numerical type
* `:type/float` - floating point number (`:float32` and `:float64`)
* `:type/integer` - any integer
* `:type/datetime` - any datetime type

If a qualified keyword starts with `:!type`, the complement set is used.
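
A few selector forms (illustrative, reusing `DS`):

```clojure
(tc/column-names DS :all)             ;; all names
(tc/column-names DS :type/numerical)  ;; by datatype group => (:a)
(tc/column-names DS #"a")             ;; regex matched against the name
```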

columns

(columns ds)
(columns ds result-type)

Returns columns of dataset. Result type can be any of:

* `:as-map`
* `:as-double-arrays`
* `:as-seqs`

columns->array-column

(columns->array-column ds column-selector new-column)

Converts several columns to a single column of type array.
The source columns are dropped afterwards.

`ds` - dataset to operate on
`column-selector` - anything supported by [[select-columns]]
`new-column` - new column to create

complete

(complete ds columns-selector & args)

TidyR complete.

Fills a dataset with all possible combinations of selected columns. When a given combination doesn't exist, missing values are created.

concat

(concat dataset & args)

Joins rows from other datasets

concat-copying

(concat-copying dataset & args)

Joins rows from other datasets via a copy of data

convert-types

(convert-types ds coltype-map-or-columns-selector)
(convert-types ds columns-selector new-types)

Convert the type of a column to another type.
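
A sketch of both arities (reusing `DS`):

```clojure
(tc/convert-types DS :a :float64)    ;; selector + new type
(tc/convert-types DS {:a :float64})  ;; coltype map
```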

cross-join

(cross-join ds-left ds-right)
(cross-join ds-left ds-right columns-selector)
(cross-join ds-left ds-right columns-selector options)

Cross product from selected columns

crosstab

(crosstab ds row-selector col-selector)
(crosstab ds row-selector col-selector options)

Cross tabulation of two sets of columns.

Creates a grouped dataset from [row-selector, col-selector] pairs and calls an aggregation on each group.

Options:

* pivot? - create a pivot table or just a flat structure (default: true)
* replace-missing? - replace missing values? (default: true)
* missing-value - a missing-value replacement (default: 0)
* aggregator - aggregating function (default: row-count)
* marginal-rows, marginal-cols - adds marginal rows and/or cols; it's a sum if true. Can be a custom fn.

dataset

(dataset)
(dataset data)
(dataset data options)

Create a `dataset`.

Dataset can be created from:

* map of values and/or sequences
* sequence of maps
* sequence of columns
* file or url
* array of arrays
* single value

Single value is set only when it's not possible to find a path for the given data. If tech.ml.dataset throws an exception, it won't be printed. To print a stack trace, set the `stack-trace?` option to `true`.

ds/->dataset documentation:

Create a dataset from either csv/tsv or a sequence of maps.

* A `String` will be interpreted as a file (or gzipped file if it
  ends with .gz) of tsv or csv data. The system will attempt to autodetect whether this
  is csv or tsv, and to detect datatypes, all of which can be overridden.
* InputStreams have no file type and thus a `file-type` must be provided in the
  options.
* A sequence of maps may be passed in, in which case the first N maps are scanned in
  order to derive the column datatypes before the actual columns are created.

Parquet, xlsx, and xls formats require that you require the appropriate libraries,
which are `tech.v3.libs.parquet` for parquet, `tech.v3.libs.fastexcel` for xlsx,
and `tech.v3.libs.poi` for xls.

Arrow support is provided via the `tech.v3.libs.arrow` namespace, not via a file-type
overload, as the Arrow project currently has 3 different file types and it is not clear
what their final suffix will be or which of the three file types it will indicate.
Please see documentation in the `tech.v3.libs.arrow` namespace for further information
on Arrow file types.

Options:

- `:dataset-name` - set the name of the dataset.
- `:file-type` - Override the filetype discovery mechanism for strings or force a particular
    parser for an input stream. Note that parquet must have paths on disk
    and cannot currently load from an input stream. Acceptable file types are:
    #{:csv :tsv :xlsx :xls :parquet}.
- `:gzipped?` - for file formats that support it, override autodetection and force
   creation of a gzipped input stream as opposed to a normal input stream.
- `:column-allowlist` - either a sequence of string column names or a sequence of column
   indices of columns to allowlist. This is preferred to `:column-whitelist`.
- `:column-blocklist` - either a sequence of string column names or a sequence of column
   indices of columns to blocklist. This is preferred to `:column-blacklist`.
- `:num-rows` - Number of rows to read.
- `:header-row?` - Defaults to true, indicates the first row is a header.
- `:key-fn` - function to be applied to column names. Typical use is:
   `:key-fn keyword`.
- `:separator` - Add a character separator to the list of separators to auto-detect.
- `:csv-parser` - Implementation of univocity's AbstractParser to use. If not
   provided, a default permissive parser is used. This way you can parse anything that
   univocity supports (so flat files and such).
- `:bad-row-policy` - One of three options: :skip, :error, :carry-on. Defaults to
   :carry-on. Some csv data has ragged rows and in this case we have several
   options. If the option is :carry-on then we either create a new column or add
   missing values for columns that had no data for that row.
- `:skip-bad-rows?` - Legacy option. Use :bad-row-policy.
- `:disable-comment-skipping?` - By default, the `#` character is recognised as a
   line comment when found at the beginning of a line of text in a CSV file,
   and the row will be ignored. Set to `true` to disable this behavior.
- `:max-chars-per-column` - Defaults to 4096. Columns with more characters than this
   will result in an exception.
- `:max-num-columns` - Defaults to 8192. CSV/TSV files with more columns than this
   will fail to parse. For more information on this option, please visit:
   https://github.com/uniVocity/univocity-parsers/issues/301
- `:text-temp-dir` - The temporary directory to use for file-backed text. Setting
  this value to boolean 'false' turns off file-backed text, which is the default. If a
  tech.v3.resource stack context is opened, the file will be deleted when the context
  closes; else it will be deleted when the gc cleans up the dataset. A shutdown hook is
  added as a last resort to ensure the file is cleaned up.
- `:n-initial-skip-rows` - Skip N rows initially. This currently may include the
   header row. Works across both csv and spreadsheet datasets.
- `:parser-type` - Default parser to use if no parser-fn is specified for that column.
   For csv files, the default parser type is `:string`, which indicates a promotional
   string parser. For sequences of maps, the default parser type is :object. It can
   be useful in some contexts to use the `:string` parser with sequences of maps or
   maps of columns.
- `:parser-fn` -
    - `keyword?` - all columns parsed to this datatype. For example:
      `{:parser-fn :string}`
    - `map?` - `{column-name parse-method}` parse each column with the specified
      `parse-method`.
      The `parse-method` can be:
        - `keyword?` - parse the specified column to this datatype. For example:
          `{:parser-fn {:answer :boolean :id :int32}}`
        - tuple - pair of `[datatype parse-data]`, in which case a container of type
          `[datatype]` will be created. `parse-data` can be one of:
            - `:relaxed?` - data will be parsed such that parse failures of the standard
               parse functions do not stop the parsing process. :unparsed-values and
               :unparsed-indexes are available in the metadata of the column and tell
               you the values that failed to parse and their respective indexes.
            - `fn?` - function from str -> one of `:tech.v3.dataset/missing`,
               `:tech.v3.dataset/parse-failure`, or the parsed value.
               Exceptions here always kill the parse process. :missing will get marked
               in the missing indexes, :parse-failure will result in the index being
               added to missing, and the column's :unparsed-values and
               :unparsed-indexes will be updated.
            - `string?` - for datetime types, this will be turned into a DateTimeFormatter via
               DateTimeFormatter/ofPattern. For `:text` you can specify the backing file
               to use.
            - `DateTimeFormatter` - use with the appropriate temporal parse static function
               to parse the value.
    - `map?` - the header-name-or-idx is used to look up the value. If not nil, then
      the value can be any of the above options. Else the default column parser
      is used.

Returns a new dataset
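
A few common construction paths (the csv path is hypothetical):

```clojure
(tc/dataset {:a [1 2 3] :b ["x" "y" "z"]})  ;; map of sequences
(tc/dataset [{:a 1 :b "x"} {:a 2 :b "y"}])  ;; sequence of maps
(tc/dataset "data.csv" {:key-fn keyword})   ;; file; options are passed through
```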

dataset->str

(dataset->str ds)
(dataset->str ds options)

Convert a dataset to a string. Prints a single line header and then calls dataset-data->str.

For options documentation see dataset-data->str.

dataset-name

(dataset-name dataset)

dataset?

(dataset? ds)

Is `ds` a `dataset` type?

difference

(difference ds-left ds-right)
(difference ds-left ds-right options)

drop

(drop ds columns-selector rows-selector)

Drop columns and rows.

drop-columns

(drop-columns ds)
(drop-columns ds columns-selector)
(drop-columns ds columns-selector meta-field)

Drop columns by (returns dataset):

- name
- sequence of names
- map of names with new names (rename)
- function which filters names (via column metadata)

drop-missing

(drop-missing ds)
(drop-missing ds columns-selector)

Drop rows with missing values

`columns-selector` selects the columns checked for missing values

drop-rows

(drop-rows ds)
(drop-rows ds rows-selector)
(drop-rows ds rows-selector options)

Drop rows using:

- row id
- seq of row ids
- seq of true/false
- fn with predicate

empty-ds?

(empty-ds? ds)

expand

(expand ds columns-selector & args)

TidyR expand.

Creates all possible combinations of selected columns.

fill-range-replace

(fill-range-replace ds colname max-span)
(fill-range-replace ds colname max-span missing-strategy)
(fill-range-replace ds colname max-span missing-strategy missing-value)

Fills up a range with lacking values. Accepts:

* dataset
* column name
* expected step (`max-span`, milliseconds in case of a datetime column)
* (optional) `missing-strategy` - how to replace missing values, default `:down` (set to `nil` if none)
* (optional) `missing-value` - optional value for replacing missing values

first

(first ds)

First row

fold-by

(fold-by ds columns-selector)
(fold-by ds columns-selector folding-function)

Group-by and pack columns into a vector - the output dataset has a row for each unique combination
of the provided columns, while each remaining column has its value(s) collected into a vector, similar
to how clojure.core/group-by works.
See https://scicloj.github.io/tablecloth/index.html#Fold-by
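
A tiny sketch with illustrative data:

```clojure
(tc/fold-by (tc/dataset {:a [1 2 3 4] :g [:x :x :y :y]}) :g)
;; => one row per :g value; :a folded into vectors [1 2] and [3 4]
```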

full-join

(full-join ds-left ds-right columns-selector)
(full-join ds-left ds-right columns-selector options)

Join keeping all rows

get-entry

(get-entry ds column row)

Returns a single value from given column and row

group-by

(group-by ds grouping-selector)
(group-by ds grouping-selector options)

Group dataset by:

- column name
- list of columns
- map of keys and row indexes
- function getting a map of values

Options are:

- select-keys - when grouping is done by function, you can limit fields to a `select-keys` seq.
- result-type - return results as dataset (`:as-dataset`, default), as a map of datasets (`:as-map`), as a map of row indexes (`:as-indexes`) or as a sequence of (sub)datasets
- other parameters which are passed to the `dataset` fn

When a dataset is returned, its meta contains `:grouped?` set to true. Columns in the dataset:

- name - group name
- group-id - id of the group (int)
- data - group as dataset
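
A sketch of the common group-then-aggregate flow (illustrative data):

```clojure
(-> (tc/dataset {:a [1 2 3 4] :g [:x :x :y :y]})
    (tc/group-by :g)
    (tc/aggregate {:sum-a #(reduce + (% :a))}))
;; the aggregation runs per group and the result is ungrouped back into one dataset
```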

grouped?

(grouped? ds)

Does `ds` represent a grouped dataset (the result of `group-by`)?

groups->map

(groups->map ds)

Convert grouped dataset to the map of groups

groups->seq

(groups->seq ds)

Convert grouped dataset to seq of the groups

has-column?

(has-column? dataset column-name)

head

(head ds)
(head ds n)

First n rows (default 5)

info

(info ds)
(info ds result-type)

Returns statistical information about the columns of a dataset.
`result-type` can be :descriptive or :columns

inner-join

(inner-join ds-left ds-right columns-selector)
(inner-join ds-left ds-right columns-selector options)

intersect

(intersect ds-left ds-right)
(intersect ds-left ds-right options)

join-columns

(join-columns ds target-column columns-selector)
(join-columns ds target-column columns-selector conf)

Join columns of a dataset. Accepts:
dataset
column selector (as in select-columns)
options:
`:separator` (default "-")
`:drop-columns?` - whether to drop source columns or not (default true)
`:result-type`
   `:map` - packs data into a map
   `:seq` - packs data into a sequence
   `:string` - joins strings with the separator (default)
   or a custom function which gets the row as a vector
`:missing-subst` - substitution for missing values
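
A sketch (reusing `DS`; the default `:result-type` joins values as strings):

```clojure
(tc/join-columns DS :ab [:a :b])
;; => :ab holds "1-x", "2-y", "3-z"; :a and :b are dropped by default
```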

last

(last ds)

Last row

left-join

(left-join ds-left ds-right columns-selector)
(left-join ds-left ds-right columns-selector options)

let-dataset (macro)

(let-dataset bindings)
(let-dataset bindings options)

map-columns

(map-columns ds column-name map-fn)
(map-columns ds column-name columns-selector map-fn)
(map-columns ds column-name new-type columns-selector map-fn)

Map over rows using a map function. The arity should match the columns selected.
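
A sketch (illustrative; the map-fn arity matches the selected columns):

```clojure
(tc/map-columns (tc/dataset {:x [1 2] :y [10 20]})
                :sum [:x :y] (fn [x y] (+ x y)))
```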

map-rows

(map-rows ds map-fn)
(map-rows ds map-fn options)

Map a function across the rows of the dataset, producing a new dataset that is merged back into the original, potentially replacing existing columns.

mark-as-group

(mark-as-group ds)

Add grouping tag

order-by

(order-by ds columns-or-fn)
(order-by ds columns-or-fn comparators)
(order-by ds columns-or-fn comparators options)

Order dataset by:

- column name
- columns (as sequence of names)
- key-fn
- sequence of columns / key-fns

Additionally you can specify the ordering with:

- :asc
- :desc
- custom comparator function
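
A sketch of both forms (reusing `DS`):

```clojure
(tc/order-by DS :a :desc)
(tc/order-by DS [:b :a] [:asc :desc])
```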

pivot->longer

(pivot->longer ds)
(pivot->longer ds columns-selector)
(pivot->longer ds columns-selector options)

`tidyr` pivot_longer api
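
A sketch, assuming wide data with one column per year (names illustrative):

```clojure
(-> (tc/dataset {:id [1 2] :y2019 [10 20] :y2020 [30 40]})
    (tc/pivot->longer [:y2019 :y2020]
                      {:target-columns :year :value-column-name :value}))
```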

pivot->wider

(pivot->wider ds columns-selector value-columns)
(pivot->wider ds columns-selector value-columns options)

Converts columns to rows. Arguments:

* dataset
* columns selector
* options:
  `:target-columns` - names of the columns created, or a columns pattern (see below) (default: :$column)
  `:value-column-name` - name of the column for values (default: :$value)
  `:splitter` - string, regular expression or function which splits source column names into data
  `:drop-missing?` - remove rows with missing? (default: true)
  `:datatypes` - map of target columns' data types
  `:coerce-to-number` - try to convert extracted values to numbers if possible (default: true)

* `target-columns` - can be:

  * column name - source column names are put there as data
  * column names as a sequence - source column names after splitting are put separately into the `:target-columns` as data
  * pattern - a sequence of names, where some of the names are nil; nil is replaced by a name taken from the splitter and such a column is used for values.
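
And the reverse direction, a sketch turning illustrative long data back into wide form:

```clojure
(-> (tc/dataset {:id [1 1 2 2]
                 :year [:y2019 :y2020 :y2019 :y2020]
                 :value [10 30 20 40]})
    (tc/pivot->wider :year :value))
```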

print-dataset

(print-dataset ds)
(print-dataset ds options)

Prints dataset into console. For options see tech.v3.dataset.print/dataset-data->str

process-group-data

(process-group-data ds f)
(process-group-data ds f parallel?)

Internal: The passed-in function is applied on all groups

rand-nth

(rand-nth ds)
(rand-nth ds options)

Returns single random row

random

(random ds)
(random ds n)
(random ds n options)

Returns (n) random rows with repetition

read-nippy

(read-nippy filename)

rename-columns

(rename-columns ds columns-mapping)
(rename-columns ds columns-selector columns-map-fn)

Rename columns with provided old -> new name map
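
A sketch of both arities (reusing `DS`; the new names are illustrative):

```clojure
(tc/rename-columns DS {:a :alpha})                                  ;; old -> new map
(tc/rename-columns DS [:b] (fn [n] (keyword (str (name n) "-2"))))  ;; selector + fn
```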

reorder-columns

(reorder-columns ds columns-selector & args)

Reorder columns using column selector(s). When the column names are incomplete, the missing ones are appended at the end.

replace-missing

(replace-missing ds)
(replace-missing ds strategy)
(replace-missing ds columns-selector strategy)
(replace-missing ds columns-selector strategy value)

Replaces missing values. Accepts:

* dataset
* column selector, default: :all
* strategy, default: :nearest
* value (optional):
  * single value
  * sequence of values (cycled)
  * function, applied on column(s) with missing values stripped

Strategies are:

`:value` - replace with the given value
`:up` - copy values up
`:down` - copy values down
`:updown` - copy values up and then down for missing values at the end
`:downup` - copy values down and then up for missing values at the beginning
`:mid` or `:nearest` - copy values around known values
`:midpoint` - use the average of the previous and next non-missing values
`:lerp` - try to linearly interpolate values; works for numbers and datetime, otherwise applies :nearest. For numbers always results in a float datatype.
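
A sketch of two strategies (illustrative data):

```clojure
(def DSM (tc/dataset {:a [nil 2 nil 4]}))

(tc/replace-missing DSM :a :value 0)  ;; => 0 2 0 4
(tc/replace-missing DSM :a :downup)   ;; => 2 2 2 4 (down, then up for the leading nil)
```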

right-join

(right-join ds-left ds-right columns-selector)
(right-join ds-left ds-right columns-selector options)

row-count

(row-count dataset-or-col)

rows

(rows ds)
(rows ds result-type)
(rows ds result-type options)

Returns rows of dataset. Result type can be any of:

* `:as-maps` - maps
* `:as-double-arrays` - double arrays
* `:as-seqs` - reader (sequence, default)
* `:as-vecs` - vectors

If you want to elide nils in maps, set the `:nil-missing?` option to false (default: `true`).
Another option - `:copying?` - when true, row values are copied on read (default: `false`).
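
A sketch (reusing `DS`):

```clojure
(tc/rows DS :as-maps)
;; => ({:a 1 :b "x"} {:a 2 :b "y"} {:a 3 :b "z"})
```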

select

(select ds columns-selector rows-selector)

Select columns and rows.

select-columns

(select-columns ds)
(select-columns ds columns-selector)
(select-columns ds columns-selector meta-field)

Select columns by (returns dataset):

- name
- sequence of names
- map of names with new names (rename)
- function which filters names (via column metadata)
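
A sketch (reusing `DS`):

```clojure
(tc/select-columns DS [:a])
(tc/select-columns DS {:a :renamed-a})  ;; map selector selects and renames
```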

select-missing

(select-missing ds)
(select-missing ds columns-selector)

Select rows with missing values

`columns-selector` selects the columns checked for missing values

select-rows

(select-rows ds)
(select-rows ds rows-selector)
(select-rows ds rows-selector options)

Select rows using:

- row id
- seq of row ids
- seq of true/false
- fn with predicate
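
A sketch of the selector forms (reusing `DS`):

```clojure
(tc/select-rows DS 0)                          ;; row id
(tc/select-rows DS [0 2])                      ;; seq of row ids
(tc/select-rows DS [true false true])          ;; true/false mask
(tc/select-rows DS (fn [row] (> (:a row) 1)))  ;; predicate over the row map
```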

semi-join

(semi-join ds-left ds-right columns-selector)
(semi-join ds-left ds-right columns-selector options)

separate-column

(separate-column ds column)
(separate-column ds column separator)
(separate-column ds column target-columns separator)
(separate-column ds column target-columns separator conf)

set-dataset-name

(set-dataset-name dataset ds-name)

shape

(shape ds)

Returns shape of the dataset [rows, cols]

shuffle

(shuffle ds)
(shuffle ds options)

Shuffle dataset (with seed)

split

(split ds)
(split ds split-type)
(split ds split-type options)

Split given dataset into 2 or more (holdout) splits

As the result two new columns are added:

* `:$split-name` - with subgroup name
* `:$split-id` - fold id/repetition id

`split-type` can be one of the following:

* `:kfold` - k-fold strategy, `:k` defines number of folds (defaults to `5`), produces `k` splits
* `:bootstrap` - `:ratio` defines ratio of observations put into result (defaults to `1.0`), produces `1` split
* `:holdout` - split into two parts with given ratio (defaults to `2/3`), produces `1` split
* `:loo` - leave one out, produces the same number of splits as the number of observations

`:holdout` can also accept probabilities or ratios and can split into more than 2 subdatasets

Additionally you can provide:

* `:seed` - for the random number generator
* `:repeats` - repeat the procedure `:repeats` times
* `:partition-selector` - same as in `group-by`, for stratified splitting to reflect dataset structure in splits.
* `:split-names` - names of subdatasets different than the default, i.e. `[:train :test :split-2 ...]`
* `:split-col-name` - a column where the name of the split is stored, either `:train` or `:test` values (default: `:$split-name`)
* `:split-id-col-name` - a column where the id of the train/test pair is stored (default: `:$split-id`)
* `:ratio` - specify a list of split ratios for `:holdout`. Needs to have the same size as `:split-names` (example: [0.2 0.2 0.6])

Rows are shuffled before splitting.

In case of a grouped dataset each group is processed separately.

See [more](https://www.mitpressjournals.org/doi/pdf/10.1162/EVCO_a_00069)
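
A sketch (reusing `DS`; `:seed` makes the shuffle reproducible):

```clojure
(tc/split DS :kfold {:k 3 :seed 42})
;; fold membership lands in the :$split-name / :$split-id columns
```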

split->seq

(split->seq ds)
(split->seq ds split-type)
(split->seq ds split-type options)

Returns split as a sequence of train/test datasets or map of sequences (grouped dataset)

tail

(tail ds)
(tail ds n)

Last n rows (default 5)

ungroup

(ungroup ds)
(ungroup ds options)

Concat groups into a dataset.

When `add-group-as-column` or `add-group-id-as-column` is set to `true` or to name(s), columns with the group name(s) or group id are added to the result.

Before joining, the groups can be sorted by group name.

union

(union ds & args)

unique-by

(unique-by ds)
(unique-by ds columns-selector)
(unique-by ds columns-selector options)

Remove rows which contain the same data.
`columns-selector` - selects the columns checked for uniqueness
`strategy` - there are 4 strategies defined to handle duplicates:

  `:first` - select first row (default)
  `:last` - select last row
  `:random` - select random row
  any function - apply function to the columns which are subject to uniqueness
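
A sketch with illustrative data:

```clojure
(def DUP (tc/dataset {:a [1 2 3] :b [:x :x :y]}))

(tc/unique-by DUP)                       ;; whole-row uniqueness
(tc/unique-by DUP :b {:strategy :last})  ;; keep the last row per duplicate :b
```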

unmark-group

(unmark-group ds)

Remove grouping tag

unroll

(unroll ds columns-selector)
(unroll ds columns-selector options)

Unfolds sequences stored inside a column(s), turning them into multiple rows. Opposite of [[fold-by]].
Adds each of the provided columns to the set that defines the "unique key" of each row.
Thus there will be a new row for each value inside the target column(s)' value sequence.
If you want instead to split the content of the columns into a set of new _columns_, look at [[separate-column]].
See https://scicloj.github.io/tablecloth/index.html#Unroll
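
A sketch with illustrative data:

```clojure
(tc/unroll (tc/dataset {:id [1 2] :vs [[1 2] [3 4]]}) :vs)
;; => 4 rows: ids repeated, one :vs element per row
```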

update-columns

(update-columns ds columns-map)
(update-columns ds columns-selector update-functions)

without-grouping-> (macro)

(without-grouping-> ds & args)

write!

(write! dataset output-path)
(write! dataset output-path options)

Write a dataset out to a file. Supported forms are:

```clojure
(ds/write! test-ds "test.csv")
(ds/write! test-ds "test.tsv")
(ds/write! test-ds "test.tsv.gz")
(ds/write! test-ds "test.nippy")
(ds/write! test-ds out-stream)
```

Options:

* `:max-chars-per-column` - csv/tsv specific, defaults to 65536 - values longer than this will
   cause an exception during serialization.
* `:max-num-columns` - csv/tsv specific, defaults to 8192 - If the dataset has more than this number of
   columns, an exception will be thrown during serialization.
* `:quoted-columns` - csv specific - sequence of column names that you would like to always have quoted.
* `:file-type` - Manually specify the file type. This is usually inferred from the filename, but if you
   pass in an output stream then you will need to specify the file type.
* `:headers?` - if csv headers are written, defaults to true.

write-csv!

write-nippy!

(write-nippy! ds filename)
