A set of common 'pipeline' operations you probably will want to run on a dataset.
(->datatype dataset)
(->datatype dataset column-filter)
(->datatype dataset column-filter datatype)
Marshal columns to the ETL datatype. This converts numeric columns so that they share a single backing-store datatype.
(assoc-metadata dataset column-filter att-name att-value)
Assoc a new value into the metadata.
(col & [column-name])
Return a column. Only works inside an 'm=' operation; the default column is the column currently being operated on.
(filter dataset column-filter filter-fn)
Filter out indexes for which filter-fn produces a 0 or false value.
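The "0 or false" rule above can be illustrated with a small plain-Clojure model. This is a sketch only, not the library's implementation: the dataset is modelled as a vector of row maps, and `keep-result?` and `filter-rows` are hypothetical names introduced for illustration.

```clojure
;; Illustrative model only; the real filter works on dataset columns.
(defn keep-result?
  "True unless the filter result is false or a numeric zero."
  [result]
  (not (or (false? result)
           (and (number? result) (zero? result)))))

(defn filter-rows
  "Keep the rows for which filter-fn yields something other than 0 or false.
  rows is a vector of maps (an assumed, simplified dataset shape)."
  [rows filter-fn]
  (filterv (comp keep-result? filter-fn) rows))

;; The row with :a 0 and the row with :a false are dropped.
(filter-rows [{:a 1} {:a 0} {:a false} {:a 3}] :a)
```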
(impute-missing dataset)
(impute-missing dataset column-filter)
(impute-missing dataset column-filter k)
Group columns into K groups and impute missing values from the means calculated from those groups.
(int-map table col-data & {:keys [not-strict?]})
Perform an integer->integer conversion of a column using a static map. The map must be complete; missing entries are errors.
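A minimal sketch of the strict behaviour described above, written in plain Clojure rather than against the library: every value must have an entry in the map, and a missing entry raises an error. `int-map-sketch` is a hypothetical name, and the `not-strict?` option is deliberately not modelled because its exact behaviour is not stated here.

```clojure
;; Illustrative model only; col-data is assumed to be a plain vector of integers.
(defn int-map-sketch
  "Map each integer in col-data through table, failing on unmapped values."
  [table col-data]
  (mapv (fn [v]
          (if (contains? table v)
            (get table v)
            (throw (ex-info "No mapping for value" {:value v}))))
        col-data))

(int-map-sketch {0 10, 1 20} [0 1 1])
```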
(m= dataset column-filter operation)
Perform some math. Sets up variables such that the 'col' operator works.
(new-column dataset result-colname dataset-column-fn)
Create a new column. dataset-column-fn takes the dataset and returns a reader, an iterable, or a new column.
(one-hot dataset)
(one-hot dataset column-filter)
(one-hot dataset column-filter table-value-list & {:keys [datatype] :as op-args})
Replace string columns with one-hot encoded columns. The table-value-list argument may be omitted, or it may be a map whose keys are the new derived column names and whose values list the original values to encode into that column. The special keyword :rest stands for any remaining values not otherwise encoded. Example argument: {:main ["apple" "mandarin"] :other :rest}
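To make the table-value-list map concrete, here is a plain-Clojure sketch of how a single string value could be encoded against it. It is not the library's code: `one-hot-value` is a hypothetical helper, and the handling of a value that matches nothing when no :rest entry exists (all zeros) is an assumption.

```clojure
;; Illustrative model only; encodes one value at a time.
(defn one-hot-value
  "Encode value against a table such as
  {:main [\"apple\" \"mandarin\"] :other :rest}.
  Returns a map of derived column name -> 0 or 1."
  [table value]
  (let [;; original value -> derived column, for explicitly listed values
        explicit (into {} (for [[col originals] table
                                :when (not= originals :rest)
                                v originals]
                            [v col]))
        ;; the derived column that collects everything else, if any
        rest-col (some (fn [[col originals]]
                         (when (= originals :rest) col))
                       table)
        hit (get explicit value rest-col)]
    (into {} (for [col (keys table)]
               [col (if (= col hit) 1 0)]))))

(one-hot-value {:main ["apple" "mandarin"] :other :rest} "kiwi")
```

With the example map, "apple" and "mandarin" set :main to 1, and any other string sets :other to 1.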
(pca dataset)
(pca dataset column-filter & {:keys [method variance n-components] :or {method :svd variance 0.95} :as op-args})
(range-scale dataset)
(range-scale dataset column-filter)
(range-scale dataset column-filter value-range & {:keys [datatype] :as op-args})
Range-scale a set of columns to be within either [-1 1] or the range given by value-range. Will fail if columns have missing values.
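The arithmetic behind range scaling can be sketched in plain Clojure as a linear map from the column's [min max] onto the target range. This is an assumed formulation rather than the library's code; `range-scale-sketch` is a hypothetical name and the column is modelled as a vector of numbers with no missing values.

```clojure
;; Illustrative model only: x -> lo + (hi - lo) * (x - min) / (max - min)
(defn range-scale-sketch
  "Linearly rescale xs into [lo hi]; defaults to [-1 1]."
  ([xs] (range-scale-sketch xs [-1 1]))
  ([xs [lo hi]]
   (let [mn   (double (apply min xs))
         mx   (double (apply max xs))
         span (- mx mn)]
     (mapv (fn [x] (+ lo (* (- hi lo) (/ (- x mn) span)))) xs))))

(range-scale-sketch [0 5 10])         ; default target range [-1 1]
(range-scale-sketch [0 5 10] [0 1])   ; explicit target range
```

A constant column has a span of zero, so this sketch does not produce meaningful values for it.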
(remove-columns dataset column-filter)
Remove columns selected by column-filter
(remove-missing dataset)
Remove any missing values from the dataset
(replace dataset column-filter replace-value-or-fn & {:keys [result-datatype]})
Map a function across a column or set of columns. map-fn may also be a map (that is, it may satisfy map?). Result column names are identical to the source column names, but metadata such as a label map is removed. If map-setup-fn is provided, map-fn must be nil, and map-setup-fn will be called with the dataset and column name to produce map-fn.
(replace-missing dataset column-filter missing-value)
Replace all the missing values in the dataset. Can take a scalar missing value or a callable fn. If given a callable fn, the fn is passed the dataset and the column name.
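The scalar-or-function choice can be sketched in plain Clojure. This is an illustrative model, not the library's implementation: the dataset is assumed to be a map of column name to vector, missing values are assumed to be nil, and `replace-missing-sketch` and `column-mean` are hypothetical names.

```clojure
;; Illustrative model only; nil stands in for a missing value.
(defn replace-missing-sketch
  "Fill nils in one column with a scalar, or with the result of calling
  missing-value on the dataset and column name."
  [dataset column-name missing-value]
  (let [fill (if (fn? missing-value)
               (missing-value dataset column-name)
               missing-value)]
    (update dataset column-name
            (fn [col] (mapv #(if (nil? %) fill %) col)))))

(defn column-mean
  "Mean of the non-missing values of a column."
  [dataset column-name]
  (let [present (remove nil? (get dataset column-name))]
    (/ (reduce + present) (double (count present)))))

(replace-missing-sketch {:a [1 nil 3]} :a 0)            ; scalar fill
(replace-missing-sketch {:a [1 nil 3]} :a column-mean)  ; computed fill
```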
(std-scale dataset)
(std-scale dataset column-filter & {:keys [use-mean? use-std? datatype] :or {use-mean? true use-std? true} :as op-args})
Scale columns to have 0 mean and 1 std deviation. Will fail if columns contain missing values.
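As a rough illustration of standard scaling, the following plain-Clojure sketch subtracts the mean and divides by the standard deviation. It is not the library's code: `std-scale-sketch` is a hypothetical name, the population standard deviation is an assumption (the library may use the sample form), and the use-mean? and use-std? options are not modelled.

```clojure
;; Illustrative model only; xs is a vector of numbers with no missing values.
(defn std-scale-sketch
  "Return xs shifted to mean 0 and scaled to (population) std deviation 1."
  [xs]
  (let [n        (count xs)
        mean     (/ (reduce + xs) (double n))
        variance (/ (reduce + (map #(let [d (- % mean)] (* d d)) xs)) n)
        std      (Math/sqrt variance)]
    (mapv #(/ (- % mean) std) xs)))

;; mean is 5.0 and the population std deviation is 2.0 for this input
(std-scale-sketch [2 4 4 4 5 5 7 9])
```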
(string->number dataset)
(string->number dataset column-filter)
(string->number dataset column-filter table-value-list & {:keys [datatype] :as op-args})
Convert all string columns to numeric, recording the lookup table in the column metadata.
Replace any string values with numeric values. Updates the label map of the options. The table-value-list argument may be nothing or a vector of either expected strings or tuples of expected strings and their hardcoded values.
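The idea of converting strings to numbers while recording a lookup table can be sketched in plain Clojure. This is a simplified model, not the library's implementation: `build-label-map` and `string->number-sketch` are hypothetical names, and numbering values in order of first appearance is an assumption.

```clojure
;; Illustrative model only; the column is a plain vector of strings.
(defn build-label-map
  "Assign consecutive integers to distinct strings in order of first appearance."
  [strings]
  (zipmap (distinct strings) (range)))

(defn string->number-sketch
  "Return the encoded values together with the lookup table used."
  [strings]
  (let [label-map (build-label-map strings)]
    {:values    (mapv label-map strings)
     :label-map label-map}))

(string->number-sketch ["apple" "pear" "apple"])
```

Keeping the label map next to the encoded values is what allows the numbers to be mapped back to the original strings later.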
(update-column dataset column-filter column-fn)
Update a column via a function. The function takes a column and returns either a column, an iterable, or a reader.
(update-dataset-column dataset column-filter dataset-column-fn)
Update a column via a function. Function takes a dataset and a column and returns either a column, an iterable, or a reader.