clj-djl.dataframe

Clojure only.

->ndarray^clj

(->ndarray ndm dataframe & [cols])

Convert dataframe to NDArray

add-column^clj

(add-column dataset column)

Add a new column. Errors if a column with the same name already exists.

add-or-update-column^clj

(add-or-update-column dataset column)

(add-or-update-column dataset colname column)

If column exists, replace. Else append new column.

assoc-ds^clj

(assoc-ds dataset cname cdata & args)

If dataset is not nil, calls clojure.core/assoc. Else creates a new empty dataset and then calls clojure.core/assoc. Guaranteed to return a dataset (unlike assoc).

brief^clj

(brief ds)

(brief ds options)

Get a brief description of a dataset in mapseq form. A brief description is the mapseq form of descriptive stats.

categorical->one-hot^clj

(categorical->one-hot dataset filter-fn-or-ds)

(categorical->one-hot dataset filter-fn-or-ds table-args)

(categorical->one-hot dataset filter-fn-or-ds table-args result-datatype)

Convert string columns to numeric columns. See tech.v3.dataset.categorical/fit-one-hot

column^clj

(column dataset colname)

column-count^clj

(column-count dataset)

column-names^clj

(column-names dataset)

In-order sequence of column names

columns^clj

(columns dataset)

Return sequence of all columns in dataset.

columns-with-missing-seq^clj

(columns-with-missing-seq dataset)

Return a sequence of:

  {:column-name column-name
   :missing-count missing-count
  }

or nil if no columns are missing data.

concat^clj

(concat)

(concat dataset & args)

Concatenate datasets in place using a copying-concatenation. See also concat-inplace, which may be more efficient when you have only a small number of datasets (fewer than about 3).

concat-copying^clj

(concat-copying)

(concat-copying dataset & args)

Concatenate datasets into a new dataset copying data. Respects missing values. Datasets must all have the same columns. Result column datatypes will be a widening cast of the datatypes.

concat-inplace^clj

(concat-inplace)

(concat-inplace dataset & args)

Concatenate datasets in place. Respects missing values. Datasets must all have the same columns. Result column datatypes will be a widening cast of the datatypes.

data->dataframe^clj

dataframe^clj

(dataframe data)

(dataframe data {:keys [dataframe-name] :as options})

dataframe->data^clj

dataframe-name^clj

descriptive-stats^clj

(descriptive-stats dataset)

(descriptive-stats dataset options)

Get descriptive statistics across the columns of the dataset. In addition to the standard stats.

Options:

:stat-names - defaults to (remove #{:values :num-distinct-values} (all-descriptive-stats-names))
:n-categorical-values - Number of categorical values to report in the 'values' field. Defaults to 21.

drop-columns^clj

(drop-columns dataset colname-seq-or-fn)

Same as remove-columns. Remove columns indexed by column name seq or column filter function. For example:

(drop-columns DS [:A :B])
(drop-columns DS cf/categorical)

drop-missing^clj

(drop-missing dataset-or-col)

(drop-missing ds colname)

Remove missing entries by simply selecting out the missing indexes.
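
"Selecting out the missing indexes" can be pictured with a minimal plain-Clojure sketch. It does not call this library: `drop-missing-values`, the `values` vector and the `missing` index set are illustrative stand-ins for a column and its missing set.

```clojure
;; Hypothetical helper, not part of clj-djl.dataframe.
;; values  - a vector standing in for column data
;; missing - a set of row indexes standing in for the missing set
(defn drop-missing-values
  [values missing]
  (let [kept-indexes (remove missing (range (count values)))]
    ;; A vector is a function of its indexes, so this selects the kept rows.
    (mapv values kept-indexes)))

(def cleaned (drop-missing-values [10 nil 30 nil] #{1 3}))
```

Here `cleaned` holds only the values at indexes 0 and 2.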

drop-rows^clj

(drop-rows dataset-or-col row-indexes)

Drop rows from dataset or column

ensure-array-backed^clj

(ensure-array-backed ds)

(ensure-array-backed ds options)

Ensure the column data in the dataset is stored in pure java arrays. This is sometimes necessary for interop with other libraries and this operation will force any lazy computations to complete. This also clears the missing set for each column and writes the missing values to the new arrays.

Columns that are already array backed and have no missing values are returned unchanged.

The postcondition is that dtype/->array will return a java array in the appropriate datatype for each column.

Options:

:unpack? - unpack packed datetime types. Defaults to true

filter^clj

(filter dataset predicate)

dataset->dataset transformation. Predicate is passed a map of colname->column-value.

filter-column^clj

(filter-column dataset colname)

(filter-column dataset colname predicate)

Filter a given column by a predicate. Predicate is passed column values. If predicate is not an instance of IFn, it is treated as a value and used as if the predicate were #(= value %).

The 2-arity form of this function reads the column as a boolean reader, so, for instance, numeric 0 values are false, as are Double/NaN and Float/NaN. Objects are false only if nil?.

Returns a dataset.
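
The predicate-or-value rule above can be sketched in plain Clojure. This is an illustration only, not the library's implementation: `as-predicate` is a hypothetical helper, and a plain vector stands in for a column.

```clojure
;; Hypothetical helper, not part of clj-djl.dataframe.
;; Anything that is not an IFn is turned into an equality test.
(defn as-predicate
  [pred-or-value]
  (if (ifn? pred-or-value)
    pred-or-value
    #(= pred-or-value %)))

(def column-values [1 2 3 2])

(def equal-to-two (filterv (as-predicate 2) column-values))
(def odd-values   (filterv (as-predicate odd?) column-values))
```

Keywords, maps, sets and vectors also implement IFn in Clojure, so under this rule they would be called as functions rather than compared as values.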

group-by^clj

(group-by dataset key-fn)

Produce a map of key-fn-value->dataset. key-fn is a function taking a map of colname->column-value.

group-by->indexes^clj

(group-by->indexes dataset key-fn)

(Non-lazy) - Group a dataset and return a map of key-fn-value->indexes where indexes is an in-order contiguous group of indexes.
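
As a rough picture of the returned shape, here is a plain-Clojure sketch over a vector of row maps. It does not use this library; `group-rows->indexes` and `rows` are hypothetical names chosen for illustration.

```clojure
;; Hypothetical helper, not part of clj-djl.dataframe.
;; Builds a map of key-fn value -> vector of row indexes, in row order.
(defn group-rows->indexes
  [rows key-fn]
  (reduce (fn [acc [idx row]]
            (update acc (key-fn row) (fnil conj []) idx))
          {}
          (map-indexed vector rows)))

(def rows [{:a 1 :b "x"}
           {:a 2 :b "y"}
           {:a 1 :b "z"}])

(def indexes-by-a (group-rows->indexes rows :a))
```

With these rows, key 1 maps to indexes 0 and 2, and key 2 maps to index 1.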

group-by-column^clj

(group-by-column dataset colname)

Return a map of column-value->dataset.

group-by-column->indexes^clj

(group-by-column->indexes dataset colname)

(Non-lazy) - Group a dataset by a column and return a map of column-val->indexes where indexes is an in-order contiguous group of indexes.

has-column?^clj

(has-column? dataset column-name)

head^clj

(head dataset)

(head dataset n)

Get the first n rows of a dataset. Equivalent to (select-rows ds (range n)). Arguments are reversed, however, so this can be used in ->> operators.

mapseq-reader^clj

(mapseq-reader dataset)

(mapseq-reader dataset options)

Return a reader that produces a map of column-name->column-value upon read.

missing^clj

(missing dataset-or-col)

Given a dataset or a column, return the missing set as a roaring bitmap

new-column^clj

(new-column data-or-data-map)

(new-column name data)

(new-column name data metadata)

(new-column name data metadata missing)

Create a new column. Data will be scanned for missing values unless the full 4-argument pathway is used.

order-column-names^clj

(order-column-names dataset colname-seq)

Order a sequence of column names so they match the order in the original dataset. Missing columns are placed last.
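
A minimal plain-Clojure sketch of this ordering rule follows. It is an illustration, not the library's code: `order-names` is a hypothetical helper, and it assumes "missing" means a requested name that is absent from the dataset.

```clojure
;; Hypothetical helper, not part of clj-djl.dataframe.
;; dataset-names - column names in dataset order
;; requested     - the names to put into that order
(defn order-names
  [dataset-names requested]
  (let [position (zipmap dataset-names (range))]
    ;; Names absent from the dataset sort after every known name.
    (sort-by #(get position % Long/MAX_VALUE) requested)))

(def ordered (order-names [:a :b :c] [:c :z :a]))
```

Here `ordered` puts `:a` before `:c`, with the unknown `:z` at the end.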

remove-column^clj

(remove-column dataset col-name)

Same as:

(dissoc dataset col-name)

remove-columns^clj

(remove-columns dataset colname-seq-or-fn)

Remove columns indexed by column name seq or column filter function. For example:

  (remove-columns DS [:A :B])
  (remove-columns DS cf/categorical)

remove-rows^clj

(remove-rows dataset-or-col row-indexes)

Same as drop-rows.

rename-columns^clj

(rename-columns dataset colnames)

Rename columns using a map or vector of column names.

Does not reorder columns; rename is in-place for maps and positional for vectors.
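
The difference between map and vector renaming can be sketched on a bare vector of column names. This is plain Clojure, not a call into this library; `rename-names` is a hypothetical helper that models only the naming rule.

```clojure
;; Hypothetical helper, not part of clj-djl.dataframe.
(defn rename-names
  [current-names colnames]
  (if (map? colnames)
    ;; Map: rename matching names where they stand, leave the rest alone.
    (mapv #(get colnames % %) current-names)
    ;; Vector: the i-th new name replaces the i-th existing name.
    (mapv (fn [_old-name new-name] new-name) current-names colnames)))

(def renamed-by-map    (rename-names [:a :b :c] {:b :beta}))
(def renamed-by-vector (rename-names [:a :b :c] [:x :y :z]))
```

In both cases the column positions stay the same; only the names change.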

replace-missing^clj

(replace-missing df)

(replace-missing df strategy)

(replace-missing df col-sel strategy)

Replace missing with:

built-in strategies: :mid, :up, :down and :lerp
a value
or a column function, with the missing slots dropped

row-count^clj

(row-count dataset-or-col)

select^clj

(select dataset colname-seq selection)

Reorder/trim dataset according to this sequence of indexes. Returns a new dataset.

colname-seq - one of:

:all - all the columns
sequence of column names - those columns in that order.
implementation of java.util.Map - column order is dictated by map iteration order; selected columns are subsequently named after the corresponding value in the map. Similar to rename-columns except this trims the result to be only the columns in the map.

selection - either keyword :all, a list of indexes to select, or a list of booleans where the index position of each true value indicates an index to select. When providing indices, duplicates will select the specified index position more than once.
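
The three forms of `selection` can be pictured as all reducing to a vector of row indexes. The sketch below is plain Clojure and only an assumption about the behavior described above; `selection->indexes` is a hypothetical helper, not a library function.

```clojure
;; Hypothetical helper, not part of clj-djl.dataframe.
(defn selection->indexes
  [selection n-rows]
  (cond
    ;; :all selects every row.
    (= :all selection)
    (vec (range n-rows))

    ;; Booleans: the position of each true value is a selected index.
    (every? boolean? selection)
    (vec (keep-indexed (fn [idx keep?] (when keep? idx)) selection))

    ;; Indexes: used as given, so duplicates select a row more than once.
    :else
    (vec selection)))

(def all-rows     (selection->indexes :all 3))
(def by-booleans  (selection->indexes [true false true] 3))
(def with-repeats (selection->indexes [2 2 0] 3))
```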

select-by-index^clj

(select-by-index dataframe row-index col-index)

Select a sub-dataframe by seq of row index and column index

select-columns^clj

(select-columns dataset colname-seq-or-fn)

Select columns from the dataset by:

seq of column names
column selector function
:all keyword

For example:

(select-columns DS [:A :B])
(select-columns DS cf/numeric)
(select-columns DS :all)

select-columns-by-index^clj

(select-columns-by-index dataset col-index)

Select columns from the dataset by a seq of indexes (negative indexes included) or :all.

See documentation for select-by-index.

select-rows^clj

(select-rows dataset-or-col row-indexes)

Select rows from the dataset or column.

select-rows-by-index^clj

(select-rows-by-index dataset-or-col row-index)

Select rows from the dataset or column by a seq of indexes (negative indexes included) or :all.

See documentation for select-by-index.

set-dataframe-name^clj

shape^clj

(shape dataframe)

Get the shape of the dataframe, in row-major order.

sort-by^clj

(sort-by dataset key-fn)

(sort-by dataset key-fn compare-fn & args)

Sort a dataset by a key-fn and compare-fn.

key-fn - function from map to sort value.
compare-fn may be one of:
- a clojure operator like clojure.core/<
- :tech.numerics/<, :tech.numerics/> for unboxing comparisons of primitive values.
- clojure.core/compare
- A custom java.util.Comparator instantiation.

Options:

:nan-strategy - General missing strategy. Options are :first, :last, and :exception.
:parallel? - Uses parallel quicksort when true and regular quicksort when false.
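
The kinds of compare-fn listed above (apart from the :tech.numerics keywords) are also accepted by clojure.core/sort-by, which makes them easy to try on a plain vector of row maps. The sketch below uses only clojure.core, with the key function first as clojure.core expects; it does not call this namespace's sort-by, and `rows` and `descending` are illustrative names.

```clojure
(def rows [{:name "b" :n 2}
           {:name "a" :n 3}
           {:name "c" :n 1}])

;; A custom java.util.Comparator that reverses the natural order.
(def descending
  (reify java.util.Comparator
    (compare [_ a b] (compare b a))))

;; A Clojure operator, clojure.core/compare, and a Comparator in turn.
(def by-n-asc  (mapv :n    (sort-by :n < rows)))
(def by-name   (mapv :name (sort-by :name compare rows)))
(def by-n-desc (mapv :n    (sort-by :n descending rows)))
```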

sort-by-column^clj

(sort-by-column dataset colname)

(sort-by-column dataset colname compare-fn & args)

Sort a dataset by a given column using the given compare fn.

compare-fn may be one of:
- a clojure operator like clojure.core/<
- :tech.numerics/<, :tech.numerics/> for unboxing comparisons of primitive values.
- clojure.core/compare
- A custom java.util.Comparator instantiation.

Options:

:nan-strategy - General missing strategy. Options are :first, :last, and :exception.
:parallel? - Uses parallel quicksort when true and regular quicksort when false.

tail^clj

(tail dataset)

(tail dataset n)

Get the last n rows of a dataset. Equivalent to (select-rows ds (range ...)). Argument order is dataset-last, however, so this can be used in ->> operators.

take-nth^clj

(take-nth dataset n-val)

unique-by^clj

(unique-by dataset map-fn)

(unique-by dataset options map-fn)

Map-fn function gets passed map for each row, rows are grouped by the return value. Keep-fn is used to decide the index to keep.

:keep-fn - Function from key,idx-seq->idx. Defaults to #(first %2).
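
To make the roles of map-fn and keep-fn concrete, here is a plain-Clojure sketch over a vector of row maps. It is not the library's implementation; `unique-indexes` is a hypothetical helper that returns the row indexes that would be kept.

```clojure
;; Hypothetical helper, not part of clj-djl.dataframe.
;; Groups row indexes by (map-fn row), then lets keep-fn pick one index
;; per group from (key, idx-seq).
(defn unique-indexes
  [rows map-fn keep-fn]
  (->> (map-indexed vector rows)
       (reduce (fn [acc [idx row]]
                 (update acc (map-fn row) (fnil conj []) idx))
               {})
       (map (fn [[k idxs]] (keep-fn k idxs)))
       sort
       vec))

(def rows [{:a 1} {:a 2} {:a 1} {:a 2}])

;; The documented default keeps the first index of each group.
(def keep-first (unique-indexes rows :a #(first %2)))
;; A different keep-fn keeps the last index instead.
(def keep-last  (unique-indexes rows :a #(last %2)))
```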

unique-by-column^clj

(unique-by-column dataset colname)

(unique-by-column dataset options colname)

Map-fn function gets passed map for each row, rows are grouped by the return value. Keep-fn is used to decide the index to keep.

:keep-fn - Function from key, idx-seq->idx. Defaults to #(first %2).

unordered-select^clj

(unordered-select dataset colname-seq index-seq)

Perform a selection but use the order of the columns in the existing table; do not reorder the columns based on colname-seq. Useful when doing selection based on sets or persistent hash maps.

update-column^clj

update-columns^clj

(update-columns dataframe col-name-seq-or-fn update-fn)

Update a sequence of columns selected by:

column name seq: (update-columns DF [:A :B] #(dfn// % (dfn/mean %)))
column selector function.

(require '[clj-djl.dataframe :as df]
         '[clj-djl.dataframe.functional :as dfn]
         '[clj-djl.dataframe.column-filters :as cf])

(def DF (df/->dataframe {:A [1 2 3]
                         :B [4 5 6]
                         :C ["A" "B" "C"]}))

(df/update-columns DF [:A :B] #(dfn// % (dfn/mean %)))
;; => _unnamed [3 3]:

|  :A |  :B | :C |
|-----|-----|----|
| 0.5 | 0.8 |  A |
| 1.0 | 1.0 |  B |
| 1.5 | 1.2 |  C |

(df/update-columns DF cf/numeric #(dfn// % (dfn/mean %)))
;; => _unnamed [3 3]:

|  :A |  :B | :C |
|-----|-----|----|
| 0.5 | 0.8 |  A |
| 1.0 | 1.0 |  B |
| 1.5 | 1.2 |  C |

value-reader^clj

(value-reader dataset)

(value-reader dataset options)

Return a reader that produces a reader of column values per index.

Options:

:copying? - Defaults to false. When true, row values are copied on read.
