tech.ml.dataset.pipeline

A set of common 'pipeline' operations you probably will want to run on a dataset.

->datatype

(->datatype dataset)
(->datatype dataset column-filter)
(->datatype dataset column-filter datatype)

Marshal columns to the ETL datatype. This converts numeric columns to a single, unified backing-store datatype.

assoc-metadata

(assoc-metadata dataset column-filter att-name att-value)

Assoc att-value under att-name into the metadata of the columns selected by column-filter.

col

(col & [column-name])

Return a column. Only works inside 'm='; when no column-name is given, it returns the column currently being operated on.

correlation-table

(correlation-table dataset & {:keys [colname-seq correlation-type]})

See dataset/correlation-table. This version removes missing values and converts all columns to numeric first, so it does not fail on datasets with missing or non-numeric columns.

filter

(filter dataset column-filter filter-fn)

Filter out indexes for which filter-fn produces a 0 or false value.
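
A minimal plain-Clojure sketch of the filtering rule, assuming a dataset modeled as a map of column name to vector (this is not the library's own representation or implementation). The helper names keep-result?, filter-indexes, and select-rows are hypothetical; nil is treated like false because Clojure treats it as falsey.

```clojure
;; Hypothetical sketch: a dataset is a map of column name -> vector.
;; Not the library's implementation.

(defn keep-result?
  "True unless the predicate result is false, nil, or numeric zero."
  [v]
  (not (or (nil? v) (false? v) (and (number? v) (zero? v)))))

(defn filter-indexes
  "Return the indexes of `col` whose filter-fn result passes keep-result?."
  [col filter-fn]
  (into [] (keep-indexed (fn [i v] (when (keep-result? (filter-fn v)) i))) col))

(defn select-rows
  "Keep only the rows at `idxs` in every column of the dataset."
  [dataset idxs]
  (into {} (map (fn [[k col]] [k (mapv col idxs)])) dataset))

(def ds {:a [1 2 3 4] :b [10 20 30 40]})

;; Keep rows where column :a is even (the predicate returns 1 or 0).
(select-rows ds (filter-indexes (:a ds) #(if (even? %) 1 0)))
;; => {:a [2 4], :b [20 40]}
```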

impute-missing

(impute-missing dataset)
(impute-missing dataset column-filter)
(impute-missing dataset column-filter k)

Group columns into K groups and impute missing values from the means calculated from those groups.

int-map

(int-map table col-data & {:keys [not-strict?]})

Perform an integer->integer conversion of a column using a static map. The map must be complete; missing entries are errors.
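
A minimal sketch of a strict integer-to-integer remap over a plain vector, with a hypothetical int-map-column helper standing in for the library function. It throws on the first value that has no entry in the table; the not-strict? option is left out because its behavior is not described here.

```clojure
;; Hypothetical sketch of a strict integer->integer remap; not the library code.
(defn int-map-column
  "Map each value of `col` through `table`; throw if any value has no entry."
  [table col]
  (mapv (fn [v]
          (if (contains? table v)
            (get table v)
            (throw (ex-info "Value missing from int-map table" {:value v}))))
        col))

(int-map-column {1 10, 2 20} [1 2 1])
;; => [10 20 10]
```

Calling it with a value such as 3 that has no entry throws an ExceptionInfo, which mirrors "missing entries are errors".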

m=

(m= dataset column-filter operation)

Perform some math on the columns selected by column-filter. Sets up the variables that the 'col' operator relies on, so 'col' can be used inside operation.
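
One plausible way to get this behavior in Clojure is a pair of dynamic vars rebound with binding while the operation runs. The sketch below assumes a map-of-vectors dataset and models operation as a zero-argument fn; *dataset*, *column-name*, col*, and m=* are hypothetical names, and the library's actual mechanism and operation format may differ.

```clojure
;; Hypothetical sketch of the binding idea behind m= and col.
;; The names *dataset*, *column-name*, col* and m=* are illustrative only.
(def ^:dynamic *dataset* nil)
(def ^:dynamic *column-name* nil)

(defn col*
  "Look up a column in the bound dataset; defaults to the current column."
  ([] (col* *column-name*))
  ([column-name] (get *dataset* column-name)))

(defn m=*
  "For each selected column, bind the dataset and column name, then store the
  result of calling the zero-argument fn `operation` back into that column."
  [dataset column-names operation]
  (reduce (fn [ds cname]
            (binding [*dataset* ds
                      *column-name* cname]
              (assoc ds cname (operation))))
          dataset
          column-names))

;; Add column :y to column :x, element by element.
(m=* {:x [1 2 3] :y [10 20 30]} [:x] #(mapv + (col*) (col* :y)))
;; => {:x [11 22 33], :y [10 20 30]}
```

The operation must be evaluated eagerly (mapv here) while the bindings are in effect; a lazy sequence realized later would see the unbound vars.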

new-column

(new-column dataset result-colname dataset-column-fn)

Create a new column named result-colname. dataset-column-fn takes the dataset and returns a reader, an iterable, or a new column.

one-hot

(one-hot dataset)
(one-hot dataset column-filter)
(one-hot dataset column-filter table-value-list & {:as op-args})

Replace string columns with one-hot encoded columns. The table-value-list argument can be omitted, or it can be a map whose keys are the new derived column names and whose values list the original values to encode into that column. The special keyword :rest stands for any remaining values not encoded by another entry. Example argument: {:main ["apple" "mandarin"] :other :rest}
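
A sketch of how a table-value-list such as the example above could be applied to a single column of strings, using a hypothetical one-hot-columns helper. It only covers the map form of the argument; the default behavior when no table is given is not sketched.

```clojure
;; Hypothetical sketch; a column is a plain vector of strings.
(defn one-hot-columns
  "Return a map of derived column name -> vector of 0/1 flags. `table` maps
  each new column name to a vector of original values, or to :rest, which
  matches every value not claimed by another entry."
  [col table]
  (let [claimed (set (mapcat val (remove #(= :rest (val %)) table)))]
    (into {}
          (map (fn [[new-name values]]
                 (let [hit? (if (= :rest values)
                              (complement claimed)
                              (set values))]
                   [new-name (mapv #(if (hit? %) 1 0) col)])))
          table)))

(one-hot-columns ["apple" "pear" "mandarin" "kiwi"]
                 {:main ["apple" "mandarin"] :other :rest})
;; => {:main [1 0 1 0], :other [0 1 0 1]}
```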

pca

(pca dataset)
(pca dataset column-filter & {:as op-args})

pif

(pif dataset bool-expr pipe-when-true pipe-when-false)

pwhen

(pwhen dataset bool-expr pipe-when-true)

range-scale

(range-scale dataset)
(range-scale dataset column-filter)
(range-scale dataset column-filter value-range & {:as op-args})

Linearly rescale the selected columns into [-1 1], or into value-range when it is provided. Fails if the columns contain missing values.
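
The linear map behind range scaling sends each value x to lo + (x - min) * (hi - lo) / (max - min), where min and max are the column's extremes and [lo hi] is the target range. A sketch over a plain vector, with a hypothetical range-scale-column helper:

```clojure
;; Hypothetical sketch; not the library implementation.
(defn range-scale-column
  "Linearly map `col` so its minimum lands on lo and its maximum on hi.
  Defaults to [-1 1]. Like range-scale, it fails on missing (nil) values;
  it also assumes the column is not constant (max > min)."
  ([col] (range-scale-column col [-1 1]))
  ([col [lo hi]]
   (let [cmin  (apply min col)
         cmax  (apply max col)
         scale (/ (- hi lo) (- cmax cmin))]
     (mapv #(double (+ lo (* (- % cmin) scale))) col))))

(range-scale-column [0 5 10])     ;; => [-1.0 0.0 1.0]
(range-scale-column [2 4] [0 1])  ;; => [0.0 1.0]
```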

read-var

(read-var varname)

remove-columns

(remove-columns dataset column-filter)

Remove columns selected by column-filter

remove-missing

(remove-missing dataset)

Remove any missing values from the dataset
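
A sketch under two assumptions: the dataset is a map of column name to equal-length vectors with nil marking a missing value, and removing missing values means dropping every row that has a missing value in any column. remove-missing-rows is a hypothetical name.

```clojure
;; Hypothetical sketch: dataset = map of column name -> equal-length vectors,
;; nil = missing. Drops every row with a missing value in any column.
(defn remove-missing-rows
  [dataset]
  (let [n    (if (empty? dataset) 0 (count (val (first dataset))))
        idxs (filterv (fn [i] (every? #(some? (nth % i)) (vals dataset)))
                      (range n))]
    (into {} (map (fn [[k col]] [k (mapv col idxs)])) dataset)))

(remove-missing-rows {:a [1 nil 3] :b [4 5 nil]})
;; => {:a [1], :b [4]}
```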

replace

(replace dataset column-filter replace-value-or-fn & {:keys [result-datatype]})

Map a function across a column or set of columns. Map-fn may also be a map (anything satisfying map?). Result column names are identical to the source column names, but metadata such as a label map is removed. If map-setup-fn is provided, map-fn must be nil, and map-setup-fn is called with the dataset and column name to produce map-fn.
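
Because Clojure maps are themselves functions of their keys, the same mapv call accepts either a fn or a map. A minimal sketch with a hypothetical replace-column helper over a plain vector:

```clojure
;; Hypothetical sketch: maps are callable in Clojure, so one mapv call
;; handles both a replacement fn and a replacement map.
(defn replace-column
  "Apply `replace-value-or-fn` (a fn or a map) to every value of `col`."
  [col replace-value-or-fn]
  (mapv replace-value-or-fn col))

(replace-column [1 2 3] inc)             ;; => [2 3 4]
(replace-column [1 2 3] {1 "a", 2 "b"})  ;; => ["a" "b" nil]
```

In this sketch a value with no key in the map becomes nil; the docstring does not say how the library handles unmatched values.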

replace-missing

(replace-missing dataset column-filter missing-value)

Replace all the missing values in the dataset. Can take a scalar missing value or a callable fn; if a fn is given, it is passed the dataset and the column name.
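
A sketch of the scalar-or-fn dispatch, assuming a map-of-vectors dataset with nil as the missing marker and assuming the fn returns a single fill value. replace-missing-column and column-mean are hypothetical helpers.

```clojure
;; Hypothetical sketch; a dataset is a map of column name -> vector, with nil
;; marking a missing value.
(defn replace-missing-column
  "Fill nil entries of column `cname`. `missing-value` is either a scalar or
  a fn called with (dataset, column-name) that returns the fill value."
  [dataset cname missing-value]
  (let [fill (if (fn? missing-value)
               (missing-value dataset cname)
               missing-value)]
    (update dataset cname (fn [col] (mapv #(if (nil? %) fill %) col)))))

(defn column-mean
  "Mean of the non-missing values of a column."
  [dataset cname]
  (let [xs (remove nil? (get dataset cname))]
    (/ (reduce + xs) (count xs))))

(replace-missing-column {:a [1 nil 3]} :a 0)            ;; => {:a [1 0 3]}
(replace-missing-column {:a [1 nil 3]} :a column-mean)  ;; => {:a [1 2 3]}
```

Checking with fn? rather than ifn? keeps keywords and maps usable as scalar fill values, since both of those are also callable in Clojure.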

std-scale

(std-scale dataset)
(std-scale
  dataset
  column-filter
  &
  {:keys [use-mean? use-std?] :or {use-mean? true use-std? true} :as op-args})

Scale columns to have 0 mean and 1 std deviation. Will fail if columns contain missing values.
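
A sketch of standard scaling over a plain vector with a hypothetical std-scale-column helper, including the use-mean? and use-std? switches from the arglist. It uses the sample standard deviation (dividing by n - 1); which estimator the library uses is an assumption not settled by its docstring.

```clojure
;; Hypothetical sketch; not the library implementation.
;; Uses the sample standard deviation (n - 1), which is an assumption.
(defn std-scale-column
  "Subtract the mean (when use-mean?) and divide by the standard deviation
  (when use-std?). Fails on missing (nil) values."
  [col & {:keys [use-mean? use-std?] :or {use-mean? true use-std? true}}]
  (let [n    (count col)
        mean (/ (reduce + col) (double n))
        sd   (Math/sqrt (/ (reduce + (map #(let [d (- % mean)] (* d d)) col))
                           (dec n)))]
    (mapv (fn [x]
            (cond-> (double x)
              use-mean? (- mean)
              use-std?  (/ sd)))
          col)))

(std-scale-column [2 4 6])                  ;; => [-1.0 0.0 1.0]
(std-scale-column [2 4 6] :use-std? false)  ;; => [-2.0 0.0 2.0]
```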

store-variables

(store-variables dataset varmap-fn)

string->number

(string->number dataset)
(string->number dataset column-filter)
(string->number dataset column-filter table-value-list & {:as op-args})

Convert all string columns to numeric, recording the lookup table in the column metadata.

Replace any string values with numeric values. Updates the label map of the options. The argument may be omitted, or it may be a vector whose entries are either expected strings or tuples of an expected string and its hardcoded value.
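
A sketch of building the lookup table for one string column, with hypothetical helpers next-id and string-column->numbers. It returns the table next to the encoded column instead of storing it in column metadata, and it assumes expected strings receive ids in order while tuples keep their hardcoded numbers.

```clojure
;; Hypothetical sketch; not the library implementation.
(defn- next-id
  "One more than the largest number already in the table (0 if empty)."
  [table]
  (if (empty? table) 0 (inc (apply max (vals table)))))

(defn string-column->numbers
  "Encode each distinct string in `col` as an integer. `expected` is an
  optional vector whose entries are either strings (assigned ids first, in
  order) or [string number] tuples with a hardcoded number. Returns the
  numeric column and the lookup table."
  ([col] (string-column->numbers col []))
  ([col expected]
   (let [seed  (reduce (fn [table entry]
                         (if (vector? entry)
                           (assoc table (first entry) (second entry))
                           (assoc table entry (next-id table))))
                       {} expected)
         table (reduce (fn [table s]
                         (if (contains? table s)
                           table
                           (assoc table s (next-id table))))
                       seed col)]
     {:column (mapv table col) :label-map table})))

(string-column->numbers ["b" "a" "b"])
;; => {:column [0 1 0], :label-map {"b" 0, "a" 1}}
```

Taking each new id as one more than the largest number already in the table keeps ids chosen by the sketch from colliding with numbers already assigned.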

training?

(training?)

update-column

(update-column dataset column-filter column-fn)

Update a column via a function. The function takes a column and returns either a column, an iterable, or a reader.

update-dataset-column

(update-dataset-column dataset column-filter dataset-column-fn)

Update a column via a function. Function takes a dataset and a column and returns either a column, an iterable, or a reader.
