Liking cljdoc? Tell your friends :D

tablecloth.api


->arrayclj

(->array ds colname)
(->array ds colname datatype)

Convert numerical column(s) to java array

Convert numerical column(s) to java array
sourceraw docstring

add-columnclj

(add-column ds column-name column)
(add-column ds column-name column size-strategy)

Add or update (modify) column under column-name.

column can be sequence of values or generator function (which gets ds as input).

  • ds - a dataset
  • column-name - if it's existing column name, column will be replaced
  • column - can be column (from other dataset), sequence, single value or function. Too big columns are always trimmed. Too small are cycled or extended with missing values (according to size-strategy argument)
  • size-strategy (optional) - when new column is shorter than dataset row count, following strategies are applied:
    • :cycle - repeat data
    • :na - append missing values
    • :strict - (default) throws an exception when sizes mismatch
Add or update (modify) column under `column-name`.

`column` can be sequence of values or generator function (which gets `ds` as input).

* `ds` - a dataset
* `column-name` - if it's existing column name, column will be replaced
* `column` - can be column (from other dataset), sequence, single value or function. Too big columns are always trimmed. Too small are cycled or extended with missing values (according to `size-strategy` argument)
* `size-strategy` (optional) - when new column is shorter than dataset row count, following strategies are applied:
  - `:cycle` - repeat data
  - `:na` - append missing values
  - `:strict` - (default) throws an exception when sizes mismatch
sourceraw docstring

add-columnsclj

(add-columns ds columns-map)
(add-columns ds columns-map size-strategy)

Add or updade (modify) columns defined in columns-map (mapping: name -> column)

Add or updade (modify) columns defined in `columns-map` (mapping: name -> column) 
sourceraw docstring

add-or-replace-columnclj

(add-or-replace-column ds column-name column)
(add-or-replace-column ds column-name column size-strategy)
source

add-or-replace-columnsclj

(add-or-replace-columns ds columns-map)
(add-or-replace-columns ds columns-map size-strategy)
source

aggregateclj

(aggregate ds aggregator)
(aggregate ds aggregator options)

Aggregate dataset by providing:

  • aggregation function
  • map with column names and functions
  • sequence of aggregation functions

Aggregation functions can return:

  • single value
  • seq of values
  • map of values with column names
Aggregate dataset by providing:

- aggregation function
- map with column names and functions
- sequence of aggregation functions

Aggregation functions can return:
- single value
- seq of values
- map of values with column names
sourceraw docstring

aggregate-columnsclj

(aggregate-columns ds columns-selector column-aggregators)
(aggregate-columns ds columns-selector column-aggregators options)

Aggregates each column separately

Aggregates each column separately
sourceraw docstring

anti-joinclj

(anti-join ds-left ds-right columns-selector)
(anti-join ds-left ds-right columns-selector options)
source

appendclj

(append ds & args)
source

as-regular-datasetclj

(as-regular-dataset ds)

Remove grouping tag

Remove grouping tag
sourceraw docstring

asof-joinclj

(asof-join ds-left ds-right columns-selector)
(asof-join ds-left ds-right columns-selector options)
source

bindclj

(bind ds & args)
source

by-rankclj

(by-rank ds columns-selector rank-predicate)
(by-rank ds columns-selector rank-predicate options)

Select rows using rank on a column, ties are resolved using :dense method.

See R docs. Rank uses 0 based indexing.

Possible :ties strategies: :average, :first, :last, :random, :min, :max, :dense. :dense is the same as in data.table::frank from R

:desc? set to true (default) order descending before calculating rank

Select rows using `rank` on a column, ties are resolved using `:dense` method.

See [R docs](https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/rank).
Rank uses 0 based indexing.

Possible `:ties` strategies: `:average`, `:first`, `:last`, `:random`, `:min`, `:max`, `:dense`.
`:dense` is the same as in `data.table::frank` from R

`:desc?` set to true (default) order descending before calculating rank
sourceraw docstring

cloneclj

(clone item)

Clone an object. Can clone anything convertible to a reader.

Clone an object.  Can clone anything convertible to a reader.
sourceraw docstring

columnclj

(column dataset colname)
source

column-countclj

(column-count dataset)
source

column-namesclj

(column-names ds)
(column-names ds columns-selector)
(column-names ds columns-selector meta-field)
source

columnsclj

(columns ds)
(columns ds result-type)

Returns columns of dataset. Result type can be any of:

  • :as-map
  • :as-double-arrays
  • :as-seqs
Returns columns of dataset. Result type can be any of:
* `:as-map`
* `:as-double-arrays`
* `:as-seqs`
sourceraw docstring

completeclj

(complete ds columns-selector & args)

TidyR complete.

Fills a dataset with all possible combinations of selected columns. When given combination wasn't existed, missing values are created.

TidyR complete.

Fills a dataset with all possible combinations of selected columns. When given combination wasn't existed, missing values are created.
sourceraw docstring

concatclj

(concat dataset & args)
source

concat-copyingclj

(concat-copying dataset & args)
source

convert-typesclj

(convert-types ds coltype-map-or-columns-selector)
(convert-types ds columns-selector new-types)

Convert type of the column to the other type.

Convert type of the column to the other type.
sourceraw docstring

cross-joinclj

(cross-join ds-left ds-right)
(cross-join ds-left ds-right columns-selector)
(cross-join ds-left ds-right columns-selector options)
source

datasetclj

(dataset)
(dataset data)
(dataset data options)

Create dataset.

Dataset can be created from:

  • map of values and/or sequences
  • sequence of maps
  • sequence of columns
  • file or url
  • array of arrays
  • single value

Single value is set only when it's not possible to find a path for given data. If tech.ml.dataset throws an exception, it's won;t be printed. To print a stack trace, set stack-trace? option to true.

Create `dataset`.

Dataset can be created from:

* map of values and/or sequences
* sequence of maps
* sequence of columns
* file or url
* array of arrays
* single value

Single value is set only when it's not possible to find a path for given data. If tech.ml.dataset throws an exception, it's won;t be printed. To print a stack trace, set `stack-trace?` option to `true`.
sourceraw docstring

dataset->strclj

(dataset->str ds)
(dataset->str ds options)

Convert a dataset to a string. Prints a single line header and then calls dataset-data->str.

For options documentation see dataset-data->str.

Convert a dataset to a string.  Prints a single line header and then calls
dataset-data->str.

For options documentation see dataset-data->str.
sourceraw docstring

dataset-nameclj

(dataset-name dataset)
source

dataset?clj

(dataset? ds)

Is ds a dataset type?

Is `ds` a `dataset` type?
sourceraw docstring

differenceclj

(difference ds-left ds-right)
(difference ds-left ds-right options)
source

dropclj

(drop ds columns-selector rows-selector)

Drop columns and rows.

Drop columns and rows.
sourceraw docstring

drop-columnsclj

(drop-columns ds)
(drop-columns ds columns-selector)
(drop-columns ds columns-selector meta-field)

Drop columns by (returns dataset):

  • name
  • sequence of names
  • map of names with new names (rename)
  • function which filter names (via column metadata)
Drop columns by (returns dataset):

- name
- sequence of names
- map of names with new names (rename)
- function which filter names (via column metadata)
sourceraw docstring

drop-missingclj

(drop-missing ds)
(drop-missing ds columns-selector)

Drop rows with missing values

columns-selector selects columns to look at missing values

Drop rows with missing values

`columns-selector` selects columns to look at missing values
sourceraw docstring

drop-rowsclj

(drop-rows ds)
(drop-rows ds rows-selector)
(drop-rows ds rows-selector options)

Drop rows using:

  • row id
  • seq of row ids
  • seq of true/false
  • fn with predicate
Drop rows using:

- row id
- seq of row ids
- seq of true/false
- fn with predicate
sourceraw docstring

empty-ds?clj

(empty-ds? ds)
source

expandclj

(expand ds columns-selector & args)

TidyR expand.

Creates all possible combinations of selected columns.

TidyR expand.

Creates all possible combinations of selected columns.
sourceraw docstring

fill-range-replaceclj

(fill-range-replace ds colname max-span)
(fill-range-replace ds colname max-span missing-strategy)
(fill-range-replace ds colname max-span missing-strategy missing-value)
source

firstclj

(first ds)
source

fold-byclj

(fold-by ds columns-selector)
(fold-by ds columns-selector folding-function)

Group-by and pack columns into vector - the output data set has a row for each unique combination of the provided columns while each remaining column has its valu(es) collected into a vector, similar to how clojure.core/group-by works. See https://scicloj.github.io/tablecloth/index.html#Fold-by

Group-by and pack columns into vector - the output data set has a row for each unique combination
of the provided columns while each remaining column has its valu(es) collected into a vector, similar
to how clojure.core/group-by works.
See https://scicloj.github.io/tablecloth/index.html#Fold-by
sourceraw docstring

full-joinclj

(full-join ds-left ds-right columns-selector)
(full-join ds-left ds-right columns-selector options)
source

group-byclj

(group-by ds grouping-selector)
(group-by ds grouping-selector options)

Group dataset by:

  • column name
  • list of columns
  • map of keys and row indexes
  • function getting map of values

Options are:

  • select-keys - when grouping is done by function, you can limit fields to a select-keys seq.
  • result-type - return results as dataset (:as-dataset, default) or as map of datasets (:as-map) or as map of row indexes (:as-indexes) or as sequence of (sub)datasets
  • other parameters which are passed to dataset fn

When dataset is returned, meta contains :grouped? set to true. Columns in dataset:

  • name - group name
  • group-id - id of the group (int)
  • data - group as dataset
Group dataset by:

- column name
- list of columns
- map of keys and row indexes
- function getting map of values

Options are:

- select-keys - when grouping is done by function, you can limit fields to a `select-keys` seq.
- result-type - return results as dataset (`:as-dataset`, default) or as map of datasets (`:as-map`) or as map of row indexes (`:as-indexes`) or as sequence of (sub)datasets
- other parameters which are passed to `dataset` fn

When dataset is returned, meta contains `:grouped?` set to true. Columns in dataset:

- name - group name
- group-id - id of the group (int)
- data - group as dataset
sourceraw docstring

grouped?clj

(grouped? ds)

Is dataset represents grouped dataset (result of group-by)?

Is `dataset` represents grouped dataset (result of `group-by`)?
sourceraw docstring

groups->mapclj

(groups->map ds)

Convert grouped dataset to the map of groups

Convert grouped dataset to the map of groups
sourceraw docstring

groups->seqclj

(groups->seq ds)
source

has-column?clj

(has-column? dataset column-name)
source

(head ds)
(head ds n)
source

infoclj

(info ds)
(info ds result-type)
source

inner-joinclj

(inner-join ds-left ds-right columns-selector)
(inner-join ds-left ds-right columns-selector options)
source

intersectclj

(intersect ds-left ds-right)
(intersect ds-left ds-right options)
source

join-columnsclj

(join-columns ds target-column columns-selector)
(join-columns ds target-column columns-selector conf)
source

lastclj

(last ds)
source

left-joinclj

(left-join ds-left ds-right columns-selector)
(left-join ds-left ds-right columns-selector options)
source

let-datasetcljmacro

(let-dataset bindings)
(let-dataset bindings options)
source

map-columnsclj

(map-columns ds column-name map-fn)
(map-columns ds column-name columns-selector map-fn)
(map-columns ds column-name new-type columns-selector map-fn)
source

mark-as-groupclj

(mark-as-group ds)

Add grouping tag

Add grouping tag
sourceraw docstring

order-byclj

(order-by ds columns-or-fn)
(order-by ds columns-or-fn comparators)
(order-by ds columns-or-fn comparators options)

Order dataset by:

  • column name
  • columns (as sequence of names)
  • key-fn
  • sequence of columns / key-fn Additionally you can ask the order by:
  • :asc
  • :desc
  • custom comparator function
Order dataset by:
- column name
- columns (as sequence of names)
- key-fn
- sequence of columns / key-fn
Additionally you can ask the order by:
- :asc
- :desc
- custom comparator function
sourceraw docstring

pivot->longerclj

(pivot->longer ds)
(pivot->longer ds columns-selector)
(pivot->longer ds columns-selector options)

tidyr pivot_longer api

`tidyr` pivot_longer api
sourceraw docstring

pivot->widerclj

(pivot->wider ds columns-selector value-columns)
(pivot->wider ds columns-selector value-columns options)
source

(print-dataset ds)
(print-dataset ds options)
source

process-group-dataclj

(process-group-data ds f)
(process-group-data ds f parallel?)
source

rand-nthclj

(rand-nth ds)
(rand-nth ds options)
source

randomclj

(random ds)
(random ds n)
(random ds n options)
source

read-nippyclj

(read-nippy filename)
source

rename-columnsclj

(rename-columns ds columns-mapping)
(rename-columns ds columns-selector columns-map-fn)

Rename columns with provided old -> new name map

Rename columns with provided old -> new name map
sourceraw docstring

reorder-columnsclj

(reorder-columns ds columns-selector & args)

Reorder columns using column selector(s). When column names are incomplete, the missing will be attached at the end.

Reorder columns using column selector(s). When column names are incomplete, the missing will be attached at the end.
sourceraw docstring

replace-missingclj

(replace-missing ds)
(replace-missing ds strategy)
(replace-missing ds columns-selector strategy)
(replace-missing ds columns-selector strategy value)
source

right-joinclj

(right-join ds-left ds-right columns-selector)
(right-join ds-left ds-right columns-selector options)
source

row-countclj

(row-count dataset-or-col)
source

rowsclj

(rows ds)
(rows ds result-type)

Returns rows of dataset. Result type can be any of:

  • :as-maps
  • :as-double-arrays
  • :as-seqs
Returns rows of dataset. Result type can be any of:
* `:as-maps`
* `:as-double-arrays`
* `:as-seqs`
sourceraw docstring

selectclj

(select ds columns-selector rows-selector)

Select columns and rows.

Select columns and rows.
sourceraw docstring

select-columnsclj

(select-columns ds)
(select-columns ds columns-selector)
(select-columns ds columns-selector meta-field)

Select columns by (returns dataset):

  • name
  • sequence of names
  • map of names with new names (rename)
  • function which filter names (via column metadata)
Select columns by (returns dataset):

- name
- sequence of names
- map of names with new names (rename)
- function which filter names (via column metadata)
sourceraw docstring

select-missingclj

(select-missing ds)
(select-missing ds columns-selector)

Select rows with missing values

columns-selector selects columns to look at missing values

Select rows with missing values

`columns-selector` selects columns to look at missing values
sourceraw docstring

select-rowsclj

(select-rows ds)
(select-rows ds rows-selector)
(select-rows ds rows-selector options)

Select rows using:

  • row id
  • seq of row ids
  • seq of true/false
  • fn with predicate
Select rows using:

- row id
- seq of row ids
- seq of true/false
- fn with predicate
sourceraw docstring

semi-joinclj

(semi-join ds-left ds-right columns-selector)
(semi-join ds-left ds-right columns-selector options)
source

separate-columnclj

(separate-column ds column separator)
(separate-column ds column target-columns separator)
(separate-column ds column target-columns separator conf)
source

set-dataset-nameclj

(set-dataset-name dataset ds-name)
source

shapeclj

(shape ds)

Returns shape of the dataset [rows, cols]

Returns shape of the dataset [rows, cols]
sourceraw docstring

shuffleclj

(shuffle ds)
(shuffle ds options)
source

splitclj

(split ds)
(split ds split-type)
(split ds split-type options)

Split given dataset into 2 or more (holdout) splits

As the result two new columns are added:

  • :$split-name - with subgroup name
  • :$split-id - fold id/repetition id

split-type can be one of the following:

  • :kfold - k-fold strategy, :k defines number of folds (defaults to 5), produces k splits
  • :bootstrap - :ratio defines ratio of observations put into result (defaults to 1.0), produces 1 split
  • :holdout - split into two parts with given ratio (defaults to 2/3), produces 1 split
  • :loo - leave one out, produces the same number of splits as number of observations

:holdout can accept also probabilites or ratios and can split to more than 2 subdatasets

Additionally you can provide:

  • :seed - for random number generator
  • :repeats - repeat procedure :repeats times
  • :partition-selector - same as in group-by for stratified splitting to reflect dataset structure in splits.
  • :split-names names of subdatasets different than default, ie. [:train :test :split-2 ...]
  • :split-col-name - a column where name of split is stored, either :train or :test values (default: :$split-name)
  • :split-id-col-name - a column where id of the train/test pair is stored (default: :$split-id)

Rows are shuffled before splitting.

In case of grouped dataset each group is processed separately.

See more

Split given dataset into 2 or more (holdout) splits

As the result two new columns are added:

* `:$split-name` - with subgroup name
* `:$split-id` - fold id/repetition id

`split-type` can be one of the following:

* `:kfold` - k-fold strategy, `:k` defines number of folds (defaults to `5`), produces `k` splits
* `:bootstrap` - `:ratio` defines ratio of observations put into result (defaults to `1.0`), produces `1` split
* `:holdout` - split into two parts with given ratio (defaults to `2/3`), produces `1` split
* `:loo` - leave one out, produces the same number of splits as number of observations

`:holdout` can accept also probabilites or ratios and can split to more than 2 subdatasets

Additionally you can provide:

* `:seed` - for random number generator
* `:repeats` - repeat procedure `:repeats` times
* `:partition-selector` - same as in `group-by` for stratified splitting to reflect dataset structure in splits.
* `:split-names` names of subdatasets different than default, ie. `[:train :test :split-2 ...]`
* `:split-col-name` - a column where name of split is stored, either `:train` or `:test` values (default: `:$split-name`)
* `:split-id-col-name` - a column where id of the train/test pair is stored (default: `:$split-id`)

Rows are shuffled before splitting.

In case of grouped dataset each group is processed separately.

See [more](https://www.mitpressjournals.org/doi/pdf/10.1162/EVCO_a_00069)
sourceraw docstring

split->seqclj

(split->seq ds)
(split->seq ds split-type)
(split->seq ds split-type options)

Returns split as a sequence of train/test datasets or map of sequences (grouped dataset)

Returns split as a sequence of train/test datasets or map of sequences (grouped dataset)
sourceraw docstring

tailclj

(tail ds)
(tail ds n)
source

ungroupclj

(ungroup ds)
(ungroup ds options)

Concat groups into dataset.

When add-group-as-column or add-group-id-as-column is set to true or name(s), columns with group name(s) or group id is added to the result.

Before joining the groups groups can be sorted by group name.

Concat groups into dataset.

When `add-group-as-column` or `add-group-id-as-column` is set to `true` or name(s), columns with group name(s) or group id is added to the result.

Before joining the groups groups can be sorted by group name.
sourceraw docstring

unionclj

(union ds & args)
source

unique-byclj

(unique-by ds)
(unique-by ds columns-selector)
(unique-by ds columns-selector options)
source

unmark-groupclj

(unmark-group ds)

Remove grouping tag

Remove grouping tag
sourceraw docstring

unrollclj

(unroll ds columns-selector)
(unroll ds columns-selector options)

Unfolds sequences stored inside a column(s), turning it into multiple columns. Opposite of fold-by. Add each of the provided columns to the set that defines the "uniqe key" of each row. Thus there will be a new row for each value inside the target column(s)' value sequence. If you want instead to split the content of the columns into a set of new columns, look at separate-column. See https://scicloj.github.io/tablecloth/index.html#Unroll

Unfolds sequences stored inside a column(s), turning it into multiple columns. Opposite of [[fold-by]].
Add each of the provided columns to the set that defines the "uniqe key" of each row.
Thus there will be a new row for each value inside the target column(s)' value sequence.
If you want instead to split the content of the columns into a set of new _columns_, look at [[separate-column]].
See https://scicloj.github.io/tablecloth/index.html#Unroll
sourceraw docstring

update-columnsclj

(update-columns ds columns-map)
(update-columns ds columns-selector update-functions)
source

without-grouping->cljmacro

(without-grouping-> ds & args)
source

write!clj

(write! dataset output-path)
(write! dataset output-path options)

Write a dataset out to a file. Supported forms are:

(ds/write! test-ds "test.csv")
(ds/write! test-ds "test.tsv")
(ds/write! test-ds "test.tsv.gz")
(ds/write! test-ds "test.nippy")
(ds/write! test-ds out-stream)

Options:

  • :max-chars-per-column - csv,tsv specific, defaults to 65536 - values longer than this will cause an exception during serialization.
  • :max-num-columns - csv,tsv specific, defaults to 8192 - If the dataset has more than this number of columns an exception will be thrown during serialization.
  • :quoted-columns - csv specific - sequence of columns names that you would like to always have quoted.
  • :file-type - Manually specify the file type. This is usually inferred from the filename but if you pass in an output stream then you will need to specify the file type.
  • :headers? - if csv headers are written, defaults to true.
Write a dataset out to a file.  Supported forms are:

```clojure
(ds/write! test-ds "test.csv")
(ds/write! test-ds "test.tsv")
(ds/write! test-ds "test.tsv.gz")
(ds/write! test-ds "test.nippy")
(ds/write! test-ds out-stream)
```

Options:

  * `:max-chars-per-column` - csv,tsv specific, defaults to 65536 - values longer than this will
     cause an exception during serialization.
  * `:max-num-columns` - csv,tsv specific, defaults to 8192 - If the dataset has more than this number of
     columns an exception will be thrown during serialization.
  * `:quoted-columns` - csv specific - sequence of columns names that you would like to always have quoted.
  * `:file-type` - Manually specify the file type.  This is usually inferred from the filename but if you
     pass in an output stream then you will need to specify the file type.
  * `:headers?` - if csv headers are written, defaults to true.
sourceraw docstring

write-csv!clj

source

write-nippy!clj

(write-nippy! ds filename)
source

cljdoc is a website building & hosting documentation for Clojure/Script libraries

× close