(->k-fold-datasets
dataset
k
{:keys [randomize-dataset?] :or {randomize-dataset? true} :as options})
Given 1 dataset, prepare K datasets using the k-fold algorithm. randomize-dataset? defaults to true, which realizes the entire dataset, so use with care on large datasets.
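The k-fold idea can be sketched in plain Clojure over a sequence of rows. This is a minimal illustration, not the library's implementation: `k-fold-splits` is a hypothetical helper, and the `:train-ds` / `:test-ds` result keys are assumptions made for the example.

```clojure
(defn k-fold-splits
  "Hypothetical helper: splits `rows` into `k` folds and returns one map per
  fold, with that fold as :test-ds and all remaining rows as :train-ds."
  [rows k {:keys [randomize-dataset?] :or {randomize-dataset? true}}]
  (let [;; shuffling forces every row into memory, hence the warning above
        rows  (vec (if randomize-dataset? (shuffle rows) rows))
        n     (count rows)
        ;; fold i covers row indexes [(start i), (start (inc i)))
        start (fn [i] (quot (* i n) k))]
    (for [i (range k)]
      (let [lo (start i)
            hi (start (inc i))]
        {:test-ds  (subvec rows lo hi)
         :train-ds (vec (concat (subvec rows 0 lo) (subvec rows hi)))}))))

;; Usage: 10 rows, 5 folds, no shuffling so the folds are deterministic.
(def folds (k-fold-splits (range 10) 5 {:randomize-dataset? false}))
```

Each row lands in exactly one test fold, so every fold's train and test parts together always cover the full input.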
(->row-major dataset options)
(->row-major dataset
key-colname-seq-map
{:keys [datatype] :or {datatype :float64}})
Given a dataset and a map of desired key names to sequences of columns, produce a sequence of maps where each key name points to a contiguous vector composed of the column values concatenated. If key-colname-seq-map is not provided then each row defaults to {:features [feature-columns] :label [label-columns]}
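The row-major layout can be illustrated without the library by treating a dataset as a plain map of column name to column values. `rows->row-major` and the sample column names are hypothetical; the cast to double mirrors the `:float64` default but is otherwise an assumption.

```clojure
(defn rows->row-major
  "Hypothetical helper: `column-map` maps column name -> vector of values;
  `key-colname-seq-map` maps output key -> sequence of column names.
  Returns one map per row, each key holding the concatenated column values."
  [column-map key-colname-seq-map]
  (let [n-rows (count (val (first column-map)))]
    (for [row-idx (range n-rows)]
      (into {}
            (for [[k colnames] key-colname-seq-map]
              ;; gather this row's value from each named column, as doubles
              [k (mapv #(double (nth (get column-map %) row-idx)) colnames)])))))

(def row-major-example
  (rows->row-major {:a [1 2] :b [3 4] :y [0 1]}
                   {:features [:a :b] :label [:y]}))
;; => ({:features [1.0 3.0], :label [0.0]}
;;     {:features [2.0 4.0], :label [1.0]})
```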
(->train-test-split dataset
{:keys [randomize-dataset? train-fraction]
:or {randomize-dataset? true train-fraction 0.7}
:as options})
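No docstring is given for this function. Going only by the signature, a train/test split with these defaults might look like the following sketch over a plain sequence of rows. `train-test-split`, the rounding rule, and the `:train-ds` / `:test-ds` keys are assumptions, not the library's documented behavior.

```clojure
(defn train-test-split
  "Hypothetical helper: optionally shuffles `rows`, then puts the first
  `train-fraction` of them in :train-ds and the rest in :test-ds."
  [rows {:keys [randomize-dataset? train-fraction]
         :or   {randomize-dataset? true train-fraction 0.7}}]
  (let [rows    (vec (if randomize-dataset? (shuffle rows) rows))
        ;; number of training rows, rounded to the nearest integer
        n-train (Math/round (double (* train-fraction (count rows))))]
    {:train-ds (subvec rows 0 n-train)
     :test-ds  (subvec rows n-train)}))

;; Usage: disable shuffling to get a deterministic 70/30 split.
(def split (train-test-split (range 10) {:randomize-dataset? false}))
;; => {:train-ds [0 1 2 3 4 5 6], :test-ds [7 8 9]}
```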
(column-values->categorical dataset src-column)
Given a column encoded via either string->number or one-hot, reverse map to a sequence of the original string column values.
(feature-ecount dataset)
When columns aren't scalars then this will change. For now, just the number of feature columns.
(inference-target-label-inverse-map dataset & [label-columns])
Given options generated during ETL operations and annotated with a :label-columns sequence containing 1 label column, generate a reverse map that maps from a dataset value back to the label that generated that value.
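As an illustration of what such a reverse map does, the sketch below inverts a label-to-number mapping with `clojure.set/map-invert` and uses it to decode values. The `label-map` contents are made up for the example; how the library stores its own mapping is not shown here.

```clojure
(require '[clojure.set :as set])

;; Assumed forward mapping, as a string->number encoding might produce.
(def label-map {"cat" 0 "dog" 1 "bird" 2})

;; Reverse map: dataset value -> original label.
(def inverse-label-map (set/map-invert label-map))

;; Decode a sequence of predicted or stored dataset values.
(def decoded (mapv inverse-label-map [1 0 2]))
;; => ["dog" "cat" "bird"]
```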
(model-type dataset & [column-name-seq])
Check the label column after dataset processing. Returns either :regression or :classification.
(num-inference-classes dataset)
Given a dataset and correctly built options from pipeline operations, return the number of classes used for the label. Throws an error if the dataset is not a classification dataset.
(reduce-column-names dataset colname-seq)
Reverse map from the one-hot encoded columns to the original source column.