tablecloth.api

Clojure only.

->array
add-column
add-columns
add-or-replace-column
add-or-replace-columns
aggregate
aggregate-columns
anti-join
append
as-regular-dataset
asof-join
bind
by-rank
clone
column
column-count
column-names
columns
concat
concat-copying
convert-types
dataset
dataset->str
dataset-name
dataset?
difference
drop
drop-columns
drop-missing
drop-rows
empty-ds?
fill-range-replace
first
fold-by
full-join
group-by
grouped?
groups->map
groups->seq
has-column?
head
info
inner-join
intersect
join-columns
last
left-join
let-dataset
map-columns
mark-as-group
order-by
pivot->longer
pivot->wider
print-dataset
process-group-data
rand-nth
random
read-nippy
rename-columns
reorder-columns
replace-missing
right-join
row-count
rows
select
select-columns
select-missing
select-rows
semi-join
separate-column
set-dataset-name
shape
shuffle
split
split->seq
tail
ungroup
union
unique-by
unmark-group
unroll
update-columns
without-grouping->
write!
write-csv!
write-nippy!

->array^clj

(->array ds colname)

(->array ds colname datatype)

Convert numerical column(s) to a Java array.

add-column^clj

(add-column ds column-name column)

(add-column ds column-name column size-strategy)

Add or update (modify) a column under `column-name`.

`column` can be a sequence of values or a generator function (which gets `ds` as input).

* `ds` - a dataset
* `column-name` - if it is an existing column name, that column is replaced
* `column` - can be a column (from another dataset), a sequence, a single value, or a function. Columns that are too long are always trimmed. Columns that are too short are cycled or extended with missing values (according to the `size-strategy` argument)
* `size-strategy` (optional) - when the new column is shorter than the dataset row count, one of the following strategies is applied:
  - `:cycle` - repeat data
  - `:na` - append missing values
  - `:strict` - (default) throws an exception when sizes mismatch
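
The size strategies described above can be sketched as follows. This is a minimal illustration, assuming the namespace is aliased as `tc`; the dataset and the column names `:a` to `:d` are made up for the example.

```clojure
(require '[tablecloth.api :as tc])

;; Illustrative dataset with four rows.
(def ds (tc/dataset {:a [1 2 3 4]}))

;; A short column is repeated to fill all rows.
(def cycled (tc/add-column ds :b [10 20] :cycle))

;; A short column is padded with missing values.
(def padded (tc/add-column ds :c [10 20] :na))

;; A generator function receives the dataset and returns the new values.
(def generated
  (tc/add-column ds :d (fn [d] (map inc (tc/column d :a)))))
```

With the default `:strict` strategy, the two short-column calls would be expected to throw instead.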

add-columns^clj

(add-columns ds columns-map)

(add-columns ds columns-map size-strategy)

Add or update (modify) columns defined in `columns-map` (mapping: name -> column).

add-or-replace-column^clj

(add-or-replace-column ds column-name column)

(add-or-replace-column ds column-name column size-strategy)

add-or-replace-columns^clj

(add-or-replace-columns ds columns-map)

(add-or-replace-columns ds columns-map size-strategy)

aggregate^clj

(aggregate ds aggregator)

(aggregate ds
           aggregator
           {:keys [default-column-name-prefix ungroup? parallel?]
            :or {default-column-name-prefix "summary" ungroup? true}
            :as options})

Aggregate dataset by providing:

- aggregation function
- map with column names and functions
- sequence of aggregation functions

Aggregation functions can return:

- single value
- seq of values
- map of values with column names
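
A short sketch of two of the forms listed above, assuming the `tc` alias; the data and the `:total` column name are illustrative only. Grouping by a sequence of column names (`[:g]`) is used here so that the group key comes back as the `:g` column.

```clojure
(require '[tablecloth.api :as tc])

(def ds (tc/dataset {:g [:a :a :b] :v [1 2 10]}))

;; Single aggregation function on an ungrouped dataset: one result row.
(def total (tc/aggregate ds #(reduce + (tc/column % :v))))

;; Map of column name -> function, applied to each group.
(def sums
  (-> ds
      (tc/group-by [:g])
      (tc/aggregate {:total #(reduce + (tc/column % :v))})))
```

For the first form the result column name is derived from `default-column-name-prefix`.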

aggregate-columns^clj

(aggregate-columns ds columns-selector column-aggregators)

(aggregate-columns ds columns-selector column-aggregators options)

Aggregates each column separately

anti-join^clj

(anti-join ds-left ds-right columns-selector)

(anti-join ds-left ds-right columns-selector options)

append^clj

(append ds & datasets)

as-regular-dataset^clj

(as-regular-dataset ds)

Remove grouping tag

asof-join^clj

(asof-join ds-left ds-right colname)

(asof-join ds-left ds-right colname options)

bind^clj

(bind ds & datasets)

by-rank^clj

(by-rank ds columns-selector rank-predicate)

(by-rank ds
         columns-selector
         rank-predicate
         {:keys [desc? ties] :or {desc? true ties :dense}})

Select rows using `rank` on a column. By default, ties are resolved using the `:dense` method.

See [R docs](https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/rank).
Rank uses 0-based indexing.

Possible `:ties` strategies: `:average`, `:first`, `:last`, `:random`, `:min`, `:max`, `:dense`.
`:dense` is the same as in `data.table::frank` from R.

When `:desc?` is true (default), values are ordered descending before the rank is calculated.
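
As a hedged illustration of the 0-based rank and the `:desc?` option, assuming the `tc` alias and made-up data:

```clojure
(require '[tablecloth.api :as tc])

(def ds (tc/dataset {:a [1 5 3 5] :b ["w" "x" "y" "z"]}))

;; Rank is 0-based and computed on descending order by default,
;; so `zero?` keeps the rows holding the largest :a (ties included).
(def top (tc/by-rank ds :a zero?))

;; With :desc? false the rank follows ascending order,
;; so `zero?` keeps the rows holding the smallest :a.
(def bottom (tc/by-rank ds :a zero? {:desc? false}))
```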

clone^clj

(clone item)

Clone an object. Can clone anything convertible to a reader.

column^clj

(column dataset colname)

column-count^clj

(column-count dataset)

column-names^clj

(column-names ds)

(column-names ds columns-selector)

(column-names ds columns-selector meta-field)

columns^clj

(columns ds)

(columns ds result-type)

Returns columns of dataset. Result type can be any of:

* `:as-map`
* `:as-double-arrays`
* `:as-seqs`

concat^clj

(concat dataset & datasets)

concat-copying^clj

(concat-copying dataset & datasets)

convert-types^clj

(convert-types ds coltype-map-or-columns-selector)

(convert-types ds columns-selector new-types)

Convert the type of a column to another type.

dataset^clj

(dataset)

(dataset data)

(dataset data
         {:keys [single-value-column-name column-names layout dataset-name]
          :or {single-value-column-name :$value layout :as-rows}
          :as options})

Create `dataset`.

Dataset can be created from:

* single value
* map of values and/or sequences
* sequence of maps
* sequence of columns
* file or url
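
A minimal sketch of three of the input forms listed above, assuming the `tc` alias; all data is illustrative.

```clojure
(require '[tablecloth.api :as tc])

;; From a map of column name -> sequence of values.
(def from-map (tc/dataset {:a [1 2 3] :b ["x" "y" "z"]}))

;; From a sequence of maps (one map per row).
(def from-rows (tc/dataset [{:a 1 :b "x"} {:a 2 :b "y"}]))

;; From a single value; the column name comes from
;; `single-value-column-name`, which defaults to :$value.
(def from-value (tc/dataset 999))

(tc/shape from-map) ; [rows, cols]
```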

dataset->str^clj

(dataset->str ds)

(dataset->str ds options)

Convert a dataset to a string. Prints a single line header and then calls dataset-data->str.

For options documentation see dataset-data->str.

dataset-name^clj

(dataset-name dataset)

dataset?^clj

(dataset? ds)

Is `ds` a `dataset` type?

difference^clj

(difference ds-left ds-right)

(difference ds-left ds-right options)

drop^clj

(drop ds columns-selector rows-selector)

Drop columns and rows.

drop-columns^clj

(drop-columns ds)

(drop-columns ds columns-selector)

(drop-columns ds columns-selector meta-field)

Drop columns by (returns dataset):

- name
- sequence of names
- map of names with new names (rename)
- function which filters names (via column metadata)

drop-missing^clj

(drop-missing ds)

(drop-missing ds columns-selector)

Drop rows with missing values.

`columns-selector` selects the columns to check for missing values.
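
A small sketch of how the selector narrows the check, assuming the `tc` alias and an illustrative dataset where `nil` marks a missing value:

```clojure
(require '[tablecloth.api :as tc])

(def ds (tc/dataset {:a [1 nil 3] :b ["x" "y" nil]}))

;; Drop rows that have a missing value in any column.
(def complete (tc/drop-missing ds))

;; Only look at column :a when deciding which rows to drop.
(def a-present (tc/drop-missing ds :a))

;; The complementary operation keeps the rows with missing values.
(def incomplete (tc/select-missing ds))
```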

drop-rows^clj

(drop-rows ds)

(drop-rows ds rows-selector)

(drop-rows ds rows-selector {:keys [select-keys pre result-type parallel?]})

Drop rows using:

- row id
- seq of row ids
- seq of true/false
- fn with predicate

empty-ds?^clj

(empty-ds? ds)

fill-range-replace^clj

(fill-range-replace ds colname max-span)

(fill-range-replace ds colname max-span missing-strategy)

(fill-range-replace ds colname max-span missing-strategy missing-value)

first^clj

(first ds)

fold-by^clj

(fold-by ds columns-selector)

(fold-by ds columns-selector folding-function)

full-join^clj

(full-join ds-left ds-right columns-selector)

(full-join ds-left ds-right columns-selector options)

group-by^clj

(group-by ds grouping-selector)

(group-by ds
          grouping-selector
          {:keys [select-keys result-type]
           :or {result-type :as-dataset select-keys :all}
           :as options})

Group dataset by:

- column name
- list of columns
- map of keys and row indexes
- function getting map of values

Options are:

- `select-keys` - when grouping is done by a function, you can limit the fields passed to it to a `select-keys` seq.
- `result-type` - return results as a dataset (`:as-dataset`, default), as a map of datasets (`:as-map`), as a map of row indexes (`:as-indexes`), or as a sequence of (sub)datasets
- other parameters, which are passed to the `dataset` fn

When a dataset is returned, its meta contains `:grouped?` set to true. Columns in the dataset:

- name - group name
- group-id - id of the group (int)
- data - group as dataset
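
The following sketch shows grouping by a column name and two of the result types, assuming the `tc` alias; the data is illustrative.

```clojure
(require '[tablecloth.api :as tc])

(def ds (tc/dataset {:g [:a :a :b] :v [1 2 3]}))

;; Default result: a grouped dataset (one row per group).
(def grouped (tc/group-by ds :g))

(tc/grouped? grouped) ; truthy for the result of group-by

;; Ask for a plain map of group name -> sub-dataset instead.
(def as-map (tc/group-by ds :g {:result-type :as-map}))

;; Concatenate the groups back into a regular dataset.
(def restored (tc/ungroup grouped))
```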

grouped?^clj

(grouped? ds)

Does `dataset` represent a grouped dataset (the result of `group-by`)?

groups->map^clj

(groups->map ds)

Convert grouped dataset to the map of groups

groups->seq^clj

(groups->seq ds)

has-column?^clj

(has-column? dataset column-name)

head^clj

(head ds)

(head ds n)

info^clj

(info ds)

(info ds result-type)

inner-join^clj

(inner-join ds-left ds-right columns-selector)

(inner-join ds-left ds-right columns-selector options)

intersect^clj

(intersect ds-left ds-right)

(intersect ds-left ds-right options)

join-columns^clj

(join-columns ds target-column columns-selector)

(join-columns ds
              target-column
              columns-selector
              {:keys [separator missing-subst drop-columns? result-type
                      parallel?]
               :or {separator "-" drop-columns? true result-type :string}})

last^clj

(last ds)

left-join^clj

(left-join ds-left ds-right columns-selector)

(left-join ds-left ds-right columns-selector options)

let-dataset^cljmacro

(let-dataset bindings)

(let-dataset bindings options)

map-columns^clj

(map-columns ds column-name map-fn)

(map-columns ds column-name columns-selector map-fn)

(map-columns ds column-name new-type columns-selector map-fn)

mark-as-group^clj

(mark-as-group ds)

Add grouping tag

order-by^clj

(order-by ds columns-or-fn)

(order-by ds columns-or-fn comparators)

(order-by ds columns-or-fn comparators {:keys [parallel?]})

Order dataset by:

- column name
- columns (as sequence of names)
- key-fn
- sequence of columns / key-fn

Additionally, you can specify the order as:

- `:asc`
- `:desc`
- custom comparator function
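
A brief sketch of ordering by a column name, with an explicit direction, and by a key function, assuming the `tc` alias and illustrative data. The key function is assumed to receive each row as a map.

```clojure
(require '[tablecloth.api :as tc])

(def ds (tc/dataset {:a [2 1 3] :b ["x" "y" "z"]}))

;; Ascending order by column name.
(def asc (tc/order-by ds :a))

;; Descending order by column name.
(def desc (tc/order-by ds :a :desc))

;; Order by a key function computed from each row.
(def by-fn (tc/order-by ds (fn [row] (- (:a row)))))
```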

pivot->longer^clj

(pivot->longer ds)

(pivot->longer ds columns-selector)

(pivot->longer
  ds
  columns-selector
  {:keys [target-columns value-column-name splitter drop-missing? datatypes]
   :or {target-columns :$column value-column-name :$value drop-missing? true}})

`tidyr` `pivot_longer` API

pivot->wider^clj

(pivot->wider ds columns-selector value-columns)

(pivot->wider
  ds
  columns-selector
  value-columns
  {:keys [fold-fn concat-columns-with concat-value-with drop-missing?]
   :or {concat-columns-with "_" concat-value-with "-" drop-missing? true}})

print-dataset^clj

(print-dataset ds)

(print-dataset ds options)

process-group-data^clj

(process-group-data ds f)

(process-group-data ds f parallel?)

rand-nth^clj

(rand-nth ds)

(rand-nth ds {:keys [seed]})

random^clj

(random ds)

(random ds n)

(random ds n {:keys [repeat? seed] :or {repeat? true}})

read-nippy^clj

(read-nippy filename)

rename-columns^clj

(rename-columns ds columns-mapping)

(rename-columns ds columns-selector columns-map-fn)

Rename columns with provided old -> new name map
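
For instance, with the `tc` alias and an illustrative two-column dataset:

```clojure
(require '[tablecloth.api :as tc])

(def ds (tc/dataset {:a [1 2] :b [3 4]}))

;; Rename :a to :x; columns not mentioned in the map keep their names.
(def renamed (tc/rename-columns ds {:a :x}))
```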

reorder-columns^clj

(reorder-columns ds columns-selector & columns-selectors)

Reorder columns using column selector(s). When the selected column names are incomplete, the missing columns are appended at the end.

replace-missing^clj

(replace-missing ds)

(replace-missing ds strategy)

(replace-missing ds columns-selector strategy)

(replace-missing ds columns-selector strategy value)

right-join^clj

(right-join ds-left ds-right columns-selector)

(right-join ds-left ds-right columns-selector options)

row-count^clj

(row-count dataset-or-col)

rows^clj

(rows ds)

(rows ds result-type)

Returns rows of dataset. Result type can be any of:

* `:as-maps`
* `:as-double-arrays`
* `:as-seqs`
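
A small sketch of two of the result types, assuming the `tc` alias and illustrative data:

```clojure
(require '[tablecloth.api :as tc])

(def ds (tc/dataset {:a [1 2] :b [3 4]}))

;; Each row as a sequence of values, in column order.
(def row-seqs (tc/rows ds :as-seqs))

;; Each row as a map of column name -> value.
(def row-maps (tc/rows ds :as-maps))
```

The `:as-double-arrays` variant is only meaningful when every column is numeric.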

select^clj

(select ds columns-selector rows-selector)

Select columns and rows.

select-columns^clj

(select-columns ds)

(select-columns ds columns-selector)

(select-columns ds columns-selector meta-field)

Select columns by (returns dataset):

- name
- sequence of names
- map of names with new names (rename)
- function which filters names (via column metadata)
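
A minimal sketch of the name-based selectors, assuming the `tc` alias and illustrative data; `drop-columns` is shown as the complement.

```clojure
(require '[tablecloth.api :as tc])

(def ds (tc/dataset {:a [1 2] :b [3 4] :c ["x" "y"]}))

;; Select a single column by name; the result is still a dataset.
(def only-a (tc/select-columns ds :a))

;; Select several columns by a sequence of names.
(def a-and-c (tc/select-columns ds [:a :c]))

;; Drop the same columns instead of keeping them.
(def only-b (tc/drop-columns ds [:a :c]))
```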

select-missing^clj

(select-missing ds)

(select-missing ds columns-selector)

Select rows with missing values.

`columns-selector` selects the columns to check for missing values.

select-rows^clj

(select-rows ds)

(select-rows ds rows-selector)

(select-rows ds rows-selector {:keys [select-keys pre result-type parallel?]})

Select rows using:

- row id
- seq of row ids
- seq of true/false
- fn with predicate
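
Each selector form listed above, sketched with the `tc` alias and illustrative data. The predicate is assumed to receive each row as a map.

```clojure
(require '[tablecloth.api :as tc])

(def ds (tc/dataset {:a [1 2 3 4]}))

;; Single row id.
(def one (tc/select-rows ds 0))

;; Sequence of row ids.
(def some-ids (tc/select-rows ds [0 2]))

;; Sequence of true/false flags, one per row.
(def flagged (tc/select-rows ds [true false true false]))

;; Predicate function applied to each row.
(def evens (tc/select-rows ds (fn [row] (even? (:a row)))))

;; drop-rows accepts the same selectors and removes the matching rows.
(def dropped (tc/drop-rows ds [0 2]))
```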

semi-join^clj

(semi-join ds-left ds-right columns-selector)

(semi-join ds-left ds-right columns-selector options)

separate-column^clj

(separate-column ds column separator)

(separate-column ds column target-columns separator)

(separate-column ds
                 column
                 target-columns
                 separator
                 {:keys [missing-subst drop-column? parallel?]
                  :or {missing-subst ""}})

set-dataset-name^clj

(set-dataset-name dataset ds-name)

shape^clj

(shape ds)

Returns shape of the dataset [rows, cols]

shuffle^clj

(shuffle ds)

(shuffle ds {:keys [seed]})

split^clj

(split ds)

(split ds split-type)

(split ds
       split-type
       {:keys [seed parallel? shuffle?] :or {shuffle? true} :as opts})

Split given dataset into 2 or more (holdout) splits.

As a result, two new columns are added:

* `:$split-name` - with subgroup name
* `:$split-id` - fold id/repetition id

`split-type` can be one of the following:

* `:kfold` - k-fold strategy, `:k` defines number of folds (defaults to `5`), produces `k` splits
* `:bootstrap` - `:ratio` defines ratio of observations put into result (defaults to `1.0`), produces `1` split
* `:holdout` - split into two parts with given ratio (defaults to `2/3`), produces `1` split
* `:loo` - leave one out, produces the same number of splits as number of observations

`:holdout` can also accept probabilities or ratios and can split into more than 2 subdatasets.

Additionally you can provide:

* `:seed` - for random number generator
* `:repeats` - repeat procedure `:repeats` times
* `:partition-selector` - same as in `group-by` for stratified splitting to reflect dataset structure in splits.
* `:split-names` - names of subdatasets different than default, i.e. `[:train :test :split-2 ...]`
* `:split-col-name` - a column where name of split is stored, either `:train` or `:test` values (default: `:$split-name`)
* `:split-id-col-name` - a column where id of the train/test pair is stored (default: `:$split-id`)

Rows are shuffled before splitting.

In case of a grouped dataset, each group is processed separately.

See [more](https://www.mitpressjournals.org/doi/pdf/10.1162/EVCO_a_00069)
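
A hedged sketch of a k-fold split and a holdout split, assuming the `tc` alias; the dataset and the seed value are illustrative. In the k-fold result every fold is assumed to carry all rows, labelled as train or test.

```clojure
(require '[tablecloth.api :as tc])

(def ds (tc/dataset {:id (range 10)}))

;; 5-fold split: the result is one dataset with two extra columns,
;; :$split-name and :$split-id.
(def folds (tc/split ds :kfold {:k 5 :seed 42}))

;; Holdout split returned as a sequence of train/test pairs.
(def pairs (tc/split->seq ds :holdout {:seed 42}))

(def first-pair (first pairs))
```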

split->seq^clj

(split->seq ds)

(split->seq ds split-type)

(split->seq ds
            split-type
            {:keys [split-col-name split-id-col-name]
             :or {split-col-name :$split-name split-id-col-name :$split-id}
             :as opts})

Returns split as a sequence of train/test datasets or map of sequences (grouped dataset)

tail^clj

(tail ds)

(tail ds n)

ungroup^clj

(ungroup ds)

(ungroup ds
         {:keys [order? add-group-as-column add-group-id-as-column separate?
                 dataset-name parallel?]
          :or {separate? true}})

Concat groups into a dataset.

When `add-group-as-column` or `add-group-id-as-column` is set to `true` or to name(s), columns with the group name(s) or group id are added to the result.

Before joining, the groups can be sorted by group name.
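
A sketch of the options described above, assuming the `tc` alias. It also assumes that `:order?` is the option that triggers sorting by group name, and `:grp` is a hypothetical column name chosen for the example.

```clojure
(require '[tablecloth.api :as tc])

(def ds (tc/dataset {:g [:b :a :b] :v [1 2 3]}))

;; Sort groups by name before concatenating and store
;; each row's group name in a new :grp column.
(def out
  (-> ds
      (tc/group-by :g)
      (tc/ungroup {:order? true :add-group-as-column :grp})))
```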

union^clj

(union ds & datasets)

unique-by^clj

(unique-by ds)

(unique-by ds columns-selector)

(unique-by
  ds
  columns-selector
  {:keys [strategy select-keys parallel?] :or {strategy :first} :as options})

unmark-group^clj

(unmark-group ds)

Remove grouping tag

unroll^clj

(unroll ds columns-selector)

(unroll ds columns-selector options)

update-columns^clj

(update-columns ds columns-map)

(update-columns ds columns-selector update-functions)

without-grouping->^cljmacro

(without-grouping-> ds & r)

write!^clj

(write! dataset output-path)

(write! dataset output-path options)

Write a dataset out to a file. Supported forms are:

```clojure
(ds/write! test-ds "test.csv")
(ds/write! test-ds "test.tsv")
(ds/write! test-ds "test.tsv.gz")
(ds/write! test-ds "test.nippy")
(ds/write! test-ds out-stream)
```

Options:

  * `:max-chars-per-column` - csv, tsv specific, defaults to 65536 - values longer than this will cause an exception during serialization.
  * `:max-num-columns` - csv, tsv specific, defaults to 8192 - if the dataset has more than this number of columns, an exception will be thrown during serialization.
  * `:quoted-columns` - csv specific - sequence of column names that you would like to always have quoted.
  * `:file-type` - manually specify the file type. This is usually inferred from the filename, but if you pass in an output stream then you will need to specify the file type.
  * `:headers?` - whether csv headers are written, defaults to true.

write-csv!^cljdeprecated

write-nippy!^clj

(write-nippy! ds filename)
