tech.ml.dataset.modelling


Clojure only.

->k-fold-datasets
->row-major
->train-test-split
column-label-map
column-values->categorical
dataset-label-map
feature-ecount
has-column-label-map?
inference-target-column-names
inference-target-label-inverse-map
inference-target-label-map
model-type
num-inference-classes
reduce-column-names
set-inference-target

->k-fold-datasets^clj

(->k-fold-datasets dataset k)

(->k-fold-datasets
  dataset
  k
  {:keys [randomize-dataset?] :or {randomize-dataset? true} :as options})

Given 1 dataset, prepare K datasets using the k-fold algorithm. The randomize-dataset? option defaults to true, which realizes the entire dataset, so use it with care on large datasets.
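The k-fold idea can be sketched in plain Clojure over a sequence of rows. This is an illustration only, not the library's implementation: `k-fold-splits`, the `:randomize?` option, and the `:train`/`:test` keys are hypothetical names chosen for the example.

```clojure
(defn k-fold-splits
  "Hypothetical helper: split `rows` into k train/test pairs.
  Each row lands in exactly one test fold."
  [rows k {:keys [randomize?] :or {randomize? true}}]
  (let [;; Shuffling (and `vec`) realizes every row in memory.
        rows  (vec (if randomize? (shuffle rows) rows))
        n     (count rows)
        ;; Fold i covers the index range [start(i), start(i+1)).
        start (fn [i] (quot (* i n) k))]
    (for [i (range k)]
      (let [lo (start i)
            hi (start (inc i))]
        {:test  (subvec rows lo hi)
         :train (vec (concat (subvec rows 0 lo) (subvec rows hi)))}))))

;; Without shuffling, the first of 5 folds over 10 rows holds out rows 0 and 1.
(first (k-fold-splits (range 10) 5 {:randomize? false}))
;; => {:test [0 1], :train [2 3 4 5 6 7 8 9]}
```

The shuffle step is what forces the whole input to be realized, which is why randomization deserves care on large inputs.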


->row-major^clj

(->row-major dataset)

(->row-major dataset options)

(->row-major dataset
             key-colname-seq-map
             {:keys [datatype] :or {datatype :float64}})

Given a dataset and a map of desired key names to sequences of columns, produce a sequence of maps where each key name points to a contiguous vector composed of the concatenated column values. If key-colname-seq-map is not provided, each row defaults to

{:features [feature-columns]
 :label [label-columns]}
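The row-major layout described above can be sketched in plain Clojure, treating a dataset as a map of column name to vector of values. The helper `columns->row-major` and the sample column names are hypothetical; values are coerced with `double` to mirror the :float64 default.

```clojure
(defn columns->row-major
  "Hypothetical helper: `columns` maps column name -> vector of values.
  Returns one map per row, each key holding the concatenated column values."
  [columns key-colname-seq-map]
  (let [n-rows (count (val (first columns)))]
    (for [row-idx (range n-rows)]
      (into {}
            (for [[k colnames] key-colname-seq-map]
              ;; Gather this row's value from each listed column, in order.
              [k (mapv #(double (nth (get columns %) row-idx)) colnames)])))))

(def columns
  {:sepal-length [5.1 4.9]
   :sepal-width  [3.5 3.0]
   :species      [0 1]})

(columns->row-major columns
                    {:features [:sepal-length :sepal-width]
                     :label    [:species]})
;; => ({:features [5.1 3.5], :label [0.0]} {:features [4.9 3.0], :label [1.0]})
```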


->train-test-split^clj

(->train-test-split dataset)

(->train-test-split dataset
                    {:keys [randomize-dataset? train-fraction]
                     :or {randomize-dataset? true train-fraction 0.7}
                     :as options})
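The signature carries no docstring, so the following is only a plain-Clojure sketch of what a split with these defaults usually means: optionally shuffle, then put the first 70% of rows in the training set. `split-rows`, `:randomize?`, and the `:train`/`:test` keys are hypothetical and are not taken from the library.

```clojure
(defn split-rows
  "Hypothetical helper: split `rows` into a training and a test portion."
  [rows {:keys [randomize? train-fraction]
         :or   {randomize? true train-fraction 0.7}}]
  (let [rows    (vec (if randomize? (shuffle rows) rows))
        ;; Number of rows that go to the training portion.
        n-train (Math/round (double (* train-fraction (count rows))))]
    {:train (subvec rows 0 n-train)
     :test  (subvec rows n-train)}))

(split-rows (range 10) {:randomize? false})
;; => {:train [0 1 2 3 4 5 6], :test [7 8 9]}
```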


column-label-map^clj

(column-label-map dataset column-name)


column-values->categorical^clj

(column-values->categorical dataset src-column)

Given a column encoded via either string->number or one-hot, reverse map to a sequence of the original string column values.
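For the one-hot case, the reverse mapping can be sketched in plain Clojure as picking, for each row, the value whose indicator column is non-zero. The helper `one-hot->categorical` and its input shape (original value mapped to its 0/1 indicator column) are assumptions made for illustration, not the library's internal representation.

```clojure
(defn one-hot->categorical
  "Hypothetical helper: `one-hot-columns` maps each original value to its
  0/1 indicator column. Returns the original value for every row."
  [one-hot-columns]
  (let [labels (keys one-hot-columns)
        n-rows (count (val (first one-hot-columns)))]
    (for [row-idx (range n-rows)]
      ;; The original value is the one whose indicator is set in this row.
      (first (filter #(not (zero? (nth (get one-hot-columns %) row-idx)))
                     labels)))))

(one-hot->categorical {"a" [1 0 0 1]
                       "b" [0 1 0 0]
                       "c" [0 0 1 0]})
;; => ("a" "b" "c" "a")
```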


dataset-label-map^clj

(dataset-label-map dataset)


feature-ecount^clj

(feature-ecount dataset)

Returns the number of feature columns. This will change once columns can hold non-scalar values.


has-column-label-map?^clj

(has-column-label-map? dataset column-name)


inference-target-column-names^clj

(inference-target-column-names ds)


inference-target-label-inverse-map^clj

(inference-target-label-inverse-map dataset & [label-columns])

Given options generated during ETL operations and annotated with a :label-columns sequence containing 1 label column, generate a reverse map that maps from a dataset value back to the label that generated that value.
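A reverse map of this kind can be sketched with `clojure.set/map-invert`. The `label-map` below is a made-up string->number encoding, and `decode-labels` is a hypothetical helper; neither is taken from the library.

```clojure
(require '[clojure.set :as set])

;; Hypothetical encoding produced when a string label column is mapped to numbers.
(def label-map {"setosa" 0, "versicolor" 1, "virginica" 2})

;; Invert it: dataset value -> original label.
(def inverse-label-map (set/map-invert label-map))

(defn decode-labels
  "Hypothetical helper: map encoded values back to their labels.
  Values are coerced with `long` because processed columns often hold doubles."
  [values]
  (mapv (comp inverse-label-map long) values))

(decode-labels [2.0 0.0 1.0])
;; => ["virginica" "setosa" "versicolor"]
```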


inference-target-label-map^clj

(inference-target-label-map dataset & [label-columns])


model-type^clj

(model-type dataset & [column-name-seq])

Check the label column after dataset processing. Returns either :regression or :classification.


num-inference-classes^clj

(num-inference-classes dataset)

Given a dataset and correctly built options from pipeline operations, return the number of classes used for the label. Throws an error if the dataset is not a classification dataset.


reduce-column-names^clj

(reduce-column-names dataset colname-seq)

Reverse map from the one-hot encoded columns to the original source column.



set-inference-target^clj

(set-inference-target dataset target-name-or-target-name-seq)

