(->ndarray ndm dataframe & [cols])
Convert dataframe to NDArray
(add-column dataset column)
Add a new column. Error if name collision
(add-or-update-column dataset column)
(add-or-update-column dataset colname column)
If column exists, replace. Else append new column.
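The replace-or-append rule can be modelled in plain Clojure. This is only a sketch of the documented behaviour, not the library's implementation: the dataset is represented as an ordered vector of `[column-name data]` pairs, `add-or-update-column-sketch` is a hypothetical name, and keeping a replaced column at its original position is an assumption.

```clojure
;; Hypothetical model: a "dataset" is an ordered vector of [column-name data] pairs.
(defn add-or-update-column-sketch
  [dataset colname column]
  (if (some #(= colname (first %)) dataset)
    ;; Name already present: swap in the new data.
    ;; Assumption: the column keeps its original position.
    (mapv (fn [[n data]] (if (= n colname) [n column] [n data])) dataset)
    ;; Name absent: append as the last column.
    (conj dataset [colname column])))

(def ds [[:a [1 2]] [:b [3 4]]])

(add-or-update-column-sketch ds :a [9 9])
;; => [[:a [9 9]] [:b [3 4]]]
(add-or-update-column-sketch ds :c [5 6])
;; => [[:a [1 2]] [:b [3 4]] [:c [5 6]]]
```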
(assoc-ds dataset cname cdata & args)
If dataset is not nil, calls `clojure.core/assoc`. Otherwise creates a new empty dataset and then calls `clojure.core/assoc`. Guaranteed to return a dataset (unlike `assoc`).
(brief ds)
(brief ds options)
Get a brief description of a dataset in mapseq form. A brief description is the mapseq form of the descriptive stats.
(categorical->one-hot dataset filter-fn-or-ds)
(categorical->one-hot dataset filter-fn-or-ds table-args)
(categorical->one-hot dataset filter-fn-or-ds table-args result-datatype)
Convert string columns to numeric columns. See tech.v3.dataset.categorical/fit-one-hot
(column dataset colname)
(column-count dataset)
(column-names dataset)
In-order sequence of column names
(columns dataset)
Return sequence of all columns in dataset.
(columns-with-missing-seq dataset)
Return a sequence of:
```clojure
{:column-name column-name
 :missing-count missing-count}
```
or nil if no columns are missing data.
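As a rough illustration of the return shape, the sketch below models the dataset as an ordered vector of `[column-name values]` pairs and treats `nil` as a missing value. Both choices are assumptions made for the example, and `columns-with-missing-sketch` is a hypothetical name rather than the library function.

```clojure
;; Hypothetical model: nil stands in for a missing value.
(defn columns-with-missing-sketch
  [dataset]
  ;; seq turns an empty result into nil, matching the documented return value.
  (seq (for [[colname values] dataset
             :let [n (count (filter nil? values))]
             :when (pos? n)]
         {:column-name colname :missing-count n})))

(columns-with-missing-sketch [[:a [1 nil 3]] [:b [4 5 6]] [:c [nil nil 7]]])
;; => ({:column-name :a, :missing-count 1} {:column-name :c, :missing-count 2})

(columns-with-missing-sketch [[:a [1 2]]])
;; => nil
```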
(concat)
(concat dataset & args)
Concatenate datasets in place using a copying-concatenation. See also `concat-inplace`, which may be more efficient if you have a small number of datasets (fewer than 3, say).
(concat-copying)
(concat-copying dataset & args)
Concatenate datasets into a new dataset copying data. Respects missing values. Datasets must all have the same columns. Result column datatypes will be a widening cast of the datatypes.
(concat-inplace)
(concat-inplace dataset & args)
Concatenate datasets in place. Respects missing values. Datasets must all have the same columns. Result column datatypes will be a widening cast of the datatypes.
(dataframe data)
(dataframe data {:keys [dataframe-name] :as options})
(descriptive-stats dataset)
(descriptive-stats dataset options)
Get descriptive statistics across the columns of the dataset. In addition to the standard stats.
Options:
* `:stat-names` - defaults to `(remove #{:values :num-distinct-values} (all-descriptive-stats-names))`
* `:n-categorical-values` - Number of categorical values to report in the 'values' field. Defaults to 21.
(drop-columns dataset colname-seq-or-fn)
Same as `remove-columns`. Remove columns indexed by column name seq or column filter function. For example:
```clojure
(drop-columns DS [:A :B])
(drop-columns DS cf/categorical)
```
(drop-missing dataset-or-col)
(drop-missing ds colname)
Remove missing entries by simply selecting out the missing indexes.
(drop-rows dataset-or-col row-indexes)
Drop rows from dataset or column
(ensure-array-backed ds)
(ensure-array-backed ds options)
Ensure the column data in the dataset is stored in pure java arrays. This is sometimes necessary for interop with other libraries, and this operation will force any lazy computations to complete. This also clears the missing set for each column and writes the missing values to the new arrays.
Columns that are already array backed and that have no missing values are returned unchanged.
The postcondition is that `dtype/->array` will return a java array in the appropriate datatype for each column.
Options:
* `:unpack?` - unpack packed datetime types. Defaults to true.
(filter dataset predicate)
dataset->dataset transformation. Predicate is passed a map of colname->column-value.
(filter-column dataset colname)
(filter-column dataset colname predicate)
Filter a given column by a predicate. Predicate is passed column values. If predicate is *not* an instance of IFn it is treated as a value and will be used as if the predicate is `#(= value %)`.
The 2-arity form of this function reads the column as a boolean reader, so for instance numeric 0 values are false in that case, as are Double/NaN and Float/NaN. Objects are only false if `nil?`.
Returns a dataset.
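The predicate-or-value rule can be illustrated with plain Clojure. This is a sketch of the documented behaviour, not the library code: rows are modelled as a vector of maps, and `->predicate` and `filter-column-sketch` are hypothetical names. The 2-arity boolean-reader form is not modelled.

```clojure
;; Anything that is not an IFn is wrapped in an equality check.
;; Note that keywords, sets and maps are IFn in Clojure, so they would be
;; called as functions rather than compared as values.
(defn ->predicate
  [p]
  (if (ifn? p) p #(= p %)))

;; Hypothetical model: a dataset is a vector of row maps.
(defn filter-column-sketch
  [rows colname p]
  (let [pred (->predicate p)]
    (filterv #(pred (get % colname)) rows)))

(def rows [{:a 1 :b "x"} {:a 2 :b "y"} {:a 3 :b "x"}])

(filter-column-sketch rows :a odd?)
;; => [{:a 1, :b "x"} {:a 3, :b "x"}]
(filter-column-sketch rows :b "x")
;; => [{:a 1, :b "x"} {:a 3, :b "x"}]
```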
(group-by dataset key-fn)
Produce a map of key-fn-value->dataset. key-fn is a function taking a map of colname->column-value.
(group-by->indexes dataset key-fn)
(Non-lazy) - Group a dataset and return a map of key-fn-value->indexes where indexes is an in-order contiguous group of indexes.
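The relationship between `group-by` and `group-by->indexes` can be sketched in plain Clojure, assuming rows are a vector of maps. The names `group-by->indexes-sketch` and `group-by-sketch` are hypothetical, and this models only the shape of the results described above.

```clojure
;; Map each key-fn value to the in-order vector of row indexes that produced it.
(defn group-by->indexes-sketch
  [rows key-fn]
  (reduce (fn [acc [idx row]]
            (update acc (key-fn row) (fnil conj []) idx))
          {}
          (map-indexed vector rows)))

;; Build the per-key "datasets" by selecting the grouped indexes.
;; rows must be a vector so that it can be called with an index.
(defn group-by-sketch
  [rows key-fn]
  (into {}
        (map (fn [[k idxs]] [k (mapv rows idxs)]))
        (group-by->indexes-sketch rows key-fn)))

(def rows [{:a 1 :b "x"} {:a 2 :b "y"} {:a 3 :b "x"}])

(group-by->indexes-sketch rows :b)
;; => {"x" [0 2], "y" [1]}
(group-by-sketch rows :b)
;; => {"x" [{:a 1, :b "x"} {:a 3, :b "x"}], "y" [{:a 2, :b "y"}]}
```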
(group-by-column dataset colname)
Return a map of column-value->dataset.
(group-by-column->indexes dataset colname)
(Non-lazy) - Group a dataset by a column and return a map of column-val->indexes where indexes is an in-order contiguous group of indexes.
(has-column? dataset column-name)
(head dataset)
(head dataset n)
Get the first n rows of a dataset. Equivalent to `(select-rows ds (range n))`. Arguments are reversed, however, so this can be used in ->> operators.
(mapseq-reader dataset)
(mapseq-reader dataset options)
Return a reader that produces a map of column-name->column-value upon read.
(missing dataset-or-col)
Given a dataset or a column, return the missing set as a roaring bitmap
(new-column data-or-data-map)
(new-column name data)
(new-column name data metadata)
(new-column name data metadata missing)
Create a new column. Data will be scanned for missing values unless the full 4-argument pathway is used.
(order-column-names dataset colname-seq)
Order a sequence of column names so they match the order in the original dataset. Missing columns are placed last.
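A minimal plain-Clojure sketch of this ordering rule follows. It works on a sequence of the dataset's column names rather than a real dataset, `order-column-names-sketch` is a hypothetical name, and keeping unknown names in their input order is an assumption.

```clojure
(defn order-column-names-sketch
  [dataset-colnames colname-seq]
  (let [position (zipmap dataset-colnames (range))
        ;; Names not in the dataset sort after every known column.
        last-pos (count dataset-colnames)]
    ;; clojure.core/sort-by is stable, so names that are not in the dataset
    ;; keep their relative input order (an assumption about the library).
    (sort-by #(get position % last-pos) colname-seq)))

(order-column-names-sketch [:a :b :c] [:c :z :a])
;; => (:a :c :z)
```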
(remove-column dataset col-name)
Same as:
```clojure
(dissoc dataset col-name)
```
(remove-columns dataset colname-seq-or-fn)
Remove columns indexed by column name seq or column filter function. For example:
```clojure
(remove-columns DS [:A :B])
(remove-columns DS cf/categorical)
```
(remove-rows dataset-or-col row-indexes)
Same as drop-rows.
(rename-columns dataset colnames)
Rename columns using a map or vector of column names.
Does not reorder columns; rename is in-place for maps and positional for vectors.
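The difference between the map form and the vector form can be sketched on a plain vector of column names. `rename-columns-sketch` is a hypothetical name, and the vector branch assumes one new name per existing column.

```clojure
(defn rename-columns-sketch
  [colnames renames]
  (if (map? renames)
    ;; Map form: only the mentioned names change; positions stay the same.
    (mapv #(get renames % %) colnames)
    ;; Vector form: names are assigned by position.
    ;; Assumption: the vector supplies exactly one name per column.
    (vec renames)))

(rename-columns-sketch [:a :b :c] {:b :beta})
;; => [:a :beta :c]
(rename-columns-sketch [:a :b :c] [:x :y :z])
;; => [:x :y :z]
```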
(replace-missing df)
(replace-missing df strategy)
(replace-missing df col-sel strategy)
Replace missing with:
- builtin strategies: `:mid`, `:up`, `:down` and `:lerp`
- value
- or column function with missing slot dropped
(row-count dataset-or-col)
(select dataset colname-seq selection)
Reorder/trim dataset according to this sequence of indexes. Returns a new dataset.
colname-seq - one of:
- :all - all the columns
- sequence of column names - those columns in that order.
- implementation of java.util.Map - column order is dictated by map iteration order; selected columns are subsequently named after the corresponding value in the map. Similar to `rename-columns` except this trims the result to be only the columns in the map.
selection - either keyword :all, a list of indexes to select, or a list of booleans where the index position of each true value indicates an index to select. When providing indices, duplicates will select the specified index position more than once.
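The three accepted forms of `selection` can be illustrated with a plain-Clojure sketch. Rows are modelled as a vector of maps, so column ordering and the java.util.Map renaming form are not covered here. `resolve-selection` and `select-sketch` are hypothetical names.

```clojure
;; Turn a selection into a concrete vector of row indexes.
(defn resolve-selection
  [n-rows selection]
  (cond
    (= :all selection) (vec (range n-rows))
    ;; Booleans: keep the position of every true value.
    (every? boolean? selection) (vec (keep-indexed (fn [i keep?] (when keep? i))
                                                   selection))
    ;; Indexes: used as given, so duplicates repeat a row.
    :else (vec selection)))

(defn select-sketch
  [rows colname-seq selection]
  (mapv (fn [i]
          (let [row (nth rows i)]
            (if (= :all colname-seq)
              row
              (select-keys row colname-seq))))
        (resolve-selection (count rows) selection)))

(def rows [{:a 1 :b 10} {:a 2 :b 20} {:a 3 :b 30}])

(select-sketch rows [:a] [0 0 2])
;; => [{:a 1} {:a 1} {:a 3}]
(select-sketch rows :all [true false true])
;; => [{:a 1, :b 10} {:a 3, :b 30}]
```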
(select-by-index dataframe row-index col-index)
Select a sub-dataframe by seq of row index and column index
(select-columns dataset colname-seq-or-fn)
Select columns from the dataset by:
- seq of column names
- column selector function
- `:all` keyword
For example:
```clojure
(select-columns DS [:A :B])
(select-columns DS cf/numeric)
(select-columns DS :all)
```
(select-columns-by-index dataset col-index)
Select columns from the dataset by seq of index (includes negative) or :all. See documentation for `select-by-index`.
(select-rows dataset-or-col row-indexes)
Select rows from the dataset or column.
(select-rows-by-index dataset-or-col row-index)
Select rows from the dataset or column by seq of index (includes negative) or :all. See documentation for `select-by-index`.
(shape dataframe)
Get the shape of the dataframe, in row-major order.
(sort-by dataset key-fn)
(sort-by dataset key-fn compare-fn & args)
Sort a dataset by a key-fn and compare-fn.
* `key-fn` - function from map to sort value.
* `compare-fn` may be one of:
  - a clojure operator like clojure.core/<
  - `:tech.numerics/<`, `:tech.numerics/>` for unboxing comparisons of primitive values.
  - clojure.core/compare
  - A custom java.util.Comparator instantiation.
Options:
* `:nan-strategy` - General missing strategy. Options are `:first`, `:last`, and `:exception`.
* `:parallel?` - Uses parallel quicksort when true and regular quicksort when false.
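Three of the `compare-fn` forms listed above (a Clojure operator, `clojure.core/compare`, and a `java.util.Comparator`) can be tried out on plain row maps with `clojure.core/sort-by`. This is only an analogy under the assumption that rows are maps. Note that `clojure.core/sort-by` takes the collection last, and the `:tech.numerics/<` keywords and the `:nan-strategy` and `:parallel?` options are not modelled.

```clojure
(def rows [{:name "c" :score 2.0}
           {:name "a" :score 3.5}
           {:name "b" :score 1.0}])

;; A Clojure operator used as the comparator.
(mapv :name (sort-by :score < rows))
;; => ["b" "c" "a"]
(mapv :name (sort-by :score > rows))
;; => ["a" "c" "b"]

;; clojure.core/compare works for any Comparable key, such as strings.
(mapv :name (sort-by :name compare rows))
;; => ["a" "b" "c"]

;; A java.util.Comparator instance, here the JDK's reverse natural ordering.
(mapv :name (sort-by :name (java.util.Collections/reverseOrder) rows))
;; => ["c" "b" "a"]
```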
(sort-by-column dataset colname)
(sort-by-column dataset colname compare-fn & args)
Sort a dataset by a given column using the given compare fn.
* `compare-fn` may be one of:
  - a clojure operator like clojure.core/<
  - `:tech.numerics/<`, `:tech.numerics/>` for unboxing comparisons of primitive values.
  - clojure.core/compare
  - A custom java.util.Comparator instantiation.
Options:
* `:nan-strategy` - General missing strategy. Options are `:first`, `:last`, and `:exception`.
* `:parallel?` - Uses parallel quicksort when true and regular quicksort when false.
(tail dataset)
(tail dataset n)
Get the last n rows of a dataset. Equivalent to `(select-rows ds (range ...))`. Argument order is dataset-last, however, so this can be used in ->> operators.
(take-nth dataset n-val)
(unique-by dataset map-fn)
(unique-by dataset options map-fn)
Map-fn is passed a map for each row; rows are grouped by its return value. Keep-fn is used to decide the index to keep.
Options:
* `:keep-fn` - Function from key,idx-seq->idx. Defaults to `#(first %2)`.
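How `map-fn` and `:keep-fn` interact can be sketched in plain Clojure. Rows are modelled as a vector of maps, `unique-by-sketch` is a hypothetical name, and returning the kept rows in their original order is an assumption.

```clojure
(defn unique-by-sketch
  ([rows map-fn]
   (unique-by-sketch rows {} map-fn))
  ([rows {:keys [keep-fn]
          :or {keep-fn (fn [_key idx-seq] (first idx-seq))}} map-fn]
   (let [;; Group row indexes by the value map-fn returns for each row.
         groups (reduce (fn [acc [idx row]]
                          (update acc (map-fn row) (fnil conj []) idx))
                        {}
                        (map-indexed vector rows))
         ;; keep-fn picks one index per group.
         ;; Assumption: the kept rows come back in original row order.
         kept (sort (map (fn [[k idxs]] (keep-fn k idxs)) groups))]
     ;; rows must be a vector so that it can be called with an index.
     (mapv rows kept))))

(def rows [{:a 1 :b "x"} {:a 2 :b "y"} {:a 3 :b "x"}])

(unique-by-sketch rows :b)
;; => [{:a 1, :b "x"} {:a 2, :b "y"}]
(unique-by-sketch rows {:keep-fn #(last %2)} :b)
;; => [{:a 2, :b "y"} {:a 3, :b "x"}]
```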
(unique-by-column dataset colname)
(unique-by-column dataset options colname)
Map-fn function gets passed map for each row, rows are grouped by the return value. Keep-fn is used to decide the index to keep.
:keep-fn - Function from key, idx-seq->idx. Defaults to #(first %2).
(unordered-select dataset colname-seq index-seq)
Perform a selection but use the order of the columns in the existing table; do not reorder the columns based on colname-seq. Useful when doing selection based on sets or persistent hash maps.
(update-columns dataframe col-name-seq-or-fn update-fn)
Update a sequence of columns selected by:
- column name seq: `(update-columns DF [:A :B] #(dfn// % (dfn/mean %)))`
- column selector function.
```clojure
(require '[clj-djl.dataframe :as df]
         '[clj-djl.dataframe.functional :as dfn]
         '[clj-djl.dataframe.column-filters :as cf])

(def DF (df/->dataframe {:A [1 2 3]
                         :B [4 5 6]
                         :C ["A" "B" "C"]}))

;; Divide each selected column by its mean, selecting by column name.
(df/update-columns DF [:A :B] #(dfn// % (dfn/mean %)))
;; => _unnamed [3 3]:
;; | :A  | :B  | :C |
;; |-----|-----|----|
;; | 0.5 | 0.8 | A  |
;; | 1.0 | 1.0 | B  |
;; | 1.5 | 1.2 | C  |

;; Same update, selecting the numeric columns with a column filter.
(df/update-columns DF cf/numeric #(dfn// % (dfn/mean %)))
;; => _unnamed [3 3]:
;; | :A  | :B  | :C |
;; |-----|-----|----|
;; | 0.5 | 0.8 | A  |
;; | 1.0 | 1.0 | B  |
;; | 1.5 | 1.2 | C  |
```
(value-reader dataset)
(value-reader dataset options)
Return a reader that produces a reader of column values per index.
Options:
* `:copying?` - Defaults to false. When true, row values are copied on read.
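The difference between the rows produced by `value-reader` (one sequence of column values per index) and `mapseq-reader` (one column-name->column-value map per index) can be sketched in plain Clojure. The columnar data is modelled as a non-empty ordered vector of `[column-name values]` pairs, both function names are hypothetical, and the results are eager vectors rather than readers.

```clojure
(def columns [[:a [1 2 3]]
              [:b ["x" "y" "z"]]])

;; One vector of column values per row index.
(defn value-rows-sketch
  [columns]
  (apply mapv vector (map second columns)))

;; One column-name->column-value map per row index.
(defn mapseq-rows-sketch
  [columns]
  (let [names (map first columns)]
    (mapv #(zipmap names %) (value-rows-sketch columns))))

(value-rows-sketch columns)
;; => [[1 "x"] [2 "y"] [3 "z"]]
(mapseq-rows-sketch columns)
;; => [{:a 1, :b "x"} {:a 2, :b "y"} {:a 3, :b "z"}]
```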