A set of common 'pipeline' operations you probably will want to run on a dataset.
(->datatype dataset)
(->datatype dataset column-filter)
(->datatype dataset column-filter datatype)
Marshal columns to the etl datatype. This converts numeric columns to a unified backing store datatype.
(assoc-metadata dataset column-filter att-name att-value)
Assoc a new value into the metadata.
(col & [column-name])
Return a column. Only works inside 'm='. When column-name is omitted, the default is the column currently being operated on.
(correlation-table dataset & {:keys [colname-seq correlation-type]})
See dataset/correlation-table. This version removes missing values and converts all columns to numeric, so it will always work.
(filter dataset column-filter filter-fn)
Filter out indexes for which filter-fn produces a 0 or false value.
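As a rough illustration of the rule above (a result of 0 or false drops the index), here is a minimal sketch in plain Clojure. It does not call the library: `keep-row?` and `filter-rows` are hypothetical helpers, and rows are modelled as plain maps, which is an assumption made only for this example.

```clojure
;; Hypothetical helpers, not part of the library.
;; A row is kept unless the predicate result is false, nil, or the number 0.
(defn keep-row?
  [result]
  (if (number? result)
    (not (zero? result))
    (boolean result)))

(defn filter-rows
  "Keep the rows for which filter-fn yields a non-zero, non-false value."
  [rows filter-fn]
  (filterv (comp keep-row? filter-fn) rows))

(def rows [{:age 12} {:age 40} {:age 65}])

;; Boolean-valued filter-fn
(filter-rows rows #(> (:age %) 18))
;; => [{:age 40} {:age 65}]

;; Numeric filter-fn: 0 means "drop", anything else means "keep"
(filter-rows rows #(if (> (:age %) 50) 1 0))
;; => [{:age 65}]
```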
(impute-missing dataset)
(impute-missing dataset column-filter)
(impute-missing dataset column-filter k)
Group columns into K groups and impute missing values from the means calculated from those groups.
(int-map table col-data & {:keys [not-strict?]})
Perform an integer->integer conversion of a column using a static map. The map must be complete; missing entries are errors.
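The strict behaviour described above (every value must have an entry in the map) can be sketched in plain Clojure as follows. `int-map-strict` is a hypothetical helper written for this example, and the column is modelled as a plain vector; the `not-strict?` option is not covered.

```clojure
;; Hypothetical helper, not part of the library.
(defn int-map-strict
  "Map every integer in xs through mapping; a value with no entry is an error."
  [mapping xs]
  (mapv (fn [x]
          (if (contains? mapping x)
            (get mapping x)
            (throw (ex-info "No mapping for value" {:value x}))))
        xs))

(int-map-strict {0 10, 1 20} [0 1 1 0])
;; => [10 20 20 10]

;; (int-map-strict {0 10} [0 5]) throws, because 5 has no entry in the map.
```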
(m= dataset column-filter operation)
Perform some math. Sets up variables such that the 'col' operator works.
(new-column dataset result-colname dataset-column-fn)
Create a new column. dataset-column-fn takes the dataset and returns a reader, an iterable, or a new column.
(one-hot dataset)
(one-hot dataset column-filter)
(one-hot dataset column-filter table-value-list & {:as op-args})
Replace string columns with one-hot encoded columns. The table-value-list argument can be nothing or a map whose keys are the new derived column names and whose values are the original values to encode into that particular column. The special keyword :rest indicates any remaining unencoded values. Example argument: {:main ["apple" "mandarin"] :other :rest}
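To make the map form of the argument concrete, here is a minimal sketch of one-hot encoding in plain Clojure. `one-hot-column` is a hypothetical helper, the column is a plain vector of strings, and the 0/1 output representation is an assumption; the library's actual output columns may be named and typed differently.

```clojure
;; Hypothetical helper, not part of the library.
(defn one-hot-column
  "Given a seq of string values and a spec map such as
  {:main [\"apple\" \"mandarin\"] :other :rest}, return a map of
  new column name -> vector of 0/1 flags."
  [values spec]
  (let [;; every value explicitly listed under some key of the spec
        listed (set (mapcat #(when (sequential? %) %) (vals spec)))]
    (into {}
          (map (fn [[new-col targets]]
                 (let [match? (if (= :rest targets)
                                (complement listed) ; anything not listed elsewhere
                                (set targets))]
                   [new-col (mapv #(if (match? %) 1 0) values)])))
          spec)))

(one-hot-column ["apple" "pear" "mandarin"]
                {:main ["apple" "mandarin"] :other :rest})
;; => {:main [1 0 1], :other [0 1 0]}
```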
(range-scale dataset)
(range-scale dataset column-filter)
(range-scale dataset column-filter value-range & {:as op-args})
Range-scale a set of columns to be within either [-1 1] or the range provided by the value-range argument. Will fail if columns have missing values.
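The arithmetic behind range scaling is ordinary min-max scaling. The sketch below shows it in plain Clojure on a single vector; `range-scale-values` is a hypothetical helper, and throwing on a constant column is a choice made for this example rather than documented library behaviour.

```clojure
;; Hypothetical helper, not part of the library.
(defn range-scale-values
  "Linearly map xs so that its minimum lands on lo and its maximum on hi.
  The target range defaults to [-1 1]."
  ([xs] (range-scale-values xs [-1 1]))
  ([xs [lo hi]]
   (let [xmin (apply min xs)
         xmax (apply max xs)
         span (- xmax xmin)]
     ;; Assumption of this sketch: a constant column is treated as an error.
     (when (zero? span)
       (throw (ex-info "Constant column cannot be range-scaled" {:value xmin})))
     (mapv #(+ lo (* (- hi lo) (/ (- % xmin) span))) xs))))

(range-scale-values [0.0 5.0 10.0])
;; => [-1.0 0.0 1.0]

(range-scale-values [0.0 5.0 10.0] [0 1])
;; => [0.0 0.5 1.0]
```

A nil in the input makes `min` throw, which mirrors the note that scaling fails when values are missing.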
(remove-columns dataset column-filter)
Remove the columns selected by column-filter.
(remove-missing dataset)
Remove any missing values from the dataset
(replace dataset column-filter replace-value-or-fn & {:keys [result-datatype]})
Map a function across a column or set of columns. map-fn may be a map (anything satisfying map?). Result column names are identical to the source column names, but metadata such as a label map is removed. If map-setup-fn is provided, map-fn must be nil, and map-setup-fn is called with the dataset and column name to produce map-fn.
(replace-missing dataset column-filter missing-value)
Replace all the missing values in the dataset. Can take a scalar missing value or a callable fn. If a callable fn is given, it is passed the dataset and column-name.
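The scalar-or-function choice can be sketched in plain Clojure as below. All names here are hypothetical: the dataset is modelled as a map of column name to vector, nil stands in for a missing value, and the callable is assumed to return a single fill value for the column.

```clojure
;; Hypothetical helpers, not part of the library.
(defn replace-missing-values
  "Fill nils in one column. missing-value is either a scalar or a fn of
  [dataset column-name] that returns the scalar to use."
  [dataset column-name missing-value]
  (let [fill (if (fn? missing-value)
               (missing-value dataset column-name)
               missing-value)]
    (update dataset column-name
            (fn [column] (mapv #(if (nil? %) fill %) column)))))

(defn column-mean
  "Mean of the non-missing values of a column."
  [dataset column-name]
  (let [present (remove nil? (get dataset column-name))]
    (/ (reduce + present) (double (count present)))))

(def ds {:age [10 nil 30]})

(replace-missing-values ds :age 0)
;; => {:age [10 0 30]}

(replace-missing-values ds :age column-mean)
;; => {:age [10 20.0 30]}
```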
(std-scale dataset)
(std-scale dataset column-filter & {:keys [use-mean? use-std?] :or {use-mean? true use-std? true} :as op-args})
Scale columns to have 0 mean and 1 std deviation. Will fail if columns contain missing values.
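The scaling itself is the usual standardization: subtract the mean, then divide by the standard deviation. A minimal sketch in plain Clojure follows. `std-scale-values` is a hypothetical helper that borrows the `use-mean?` and `use-std?` option names from the signature above; it uses the population standard deviation, which is an assumption, since the docstring does not say which variant the library computes.

```clojure
;; Hypothetical helper, not part of the library.
(defn std-scale-values
  "Center xs on its mean and divide by its standard deviation.
  Either step can be switched off with :use-mean? false or :use-std? false."
  [xs & {:keys [use-mean? use-std?] :or {use-mean? true use-std? true}}]
  (let [n        (count xs)
        mean     (/ (reduce + xs) (double n))
        ;; Population variance (divides by n): an assumption of this sketch.
        variance (/ (reduce + (map #(let [d (- % mean)] (* d d)) xs)) n)
        std      (Math/sqrt variance)]
    (mapv #(cond-> (double %)
             use-mean? (- mean)
             use-std?  (/ std))
          xs)))

(def xs [2.0 4.0 4.0 4.0 5.0 5.0 7.0 9.0]) ; mean 5.0, population std 2.0

(std-scale-values xs)
;; => [-1.5 -0.5 -0.5 -0.5 0.0 0.0 1.0 2.0]

(std-scale-values xs :use-std? false)
;; => [-3.0 -1.0 -1.0 -1.0 0.0 0.0 2.0 4.0]
```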
(string->number dataset)
(string->number dataset column-filter)
(string->number dataset column-filter table-value-list & {:as op-args})
Convert all string columns to numeric, recording the lookup table in the column metadata. Replaces any string values with numeric values and updates the label map of the options. The table-value-list argument may be nothing or a vector of either expected strings or tuples of expected strings to their hardcoded values.
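The idea of converting strings to numbers while keeping a lookup table can be sketched in plain Clojure as below. `string->number-column` is a hypothetical helper; assigning integers in order of first appearance is an assumption of this example, and the library may number the values differently.

```clojure
;; Hypothetical helper, not part of the library.
(defn string->number-column
  "Replace each distinct string with an integer and return both the encoded
  values and the label map that was used."
  [values]
  (let [label-map (zipmap (distinct values) (range))]
    {:values    (mapv label-map values)
     :label-map label-map}))

(string->number-column ["red" "green" "red" "blue"])
;; => {:values [0 1 0 2], :label-map {"red" 0, "green" 1, "blue" 2}}
```

Keeping the label map alongside the encoded values is what makes it possible to translate the numbers back into the original strings later.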
(update-column dataset column-filter column-fn)
Update a column via a function. The function takes a column and returns either a column, an iterable, or a reader.
(update-dataset-column dataset column-filter dataset-column-fn)
Update a column via a function. Function takes a dataset and a column and returns either a column, an iterable, or a reader.