This namespace contains functions that operate on a metamorph context. They all return the context as well, so all functions in this namespace are metamorph compliant and can be placed in a metamorph pipeline.
Most functions here only manipulate the dataset, which sits in the ctx map under the key :metamorph/data, and they behave the same in pipeline modes :fit and :transform.
A few functions manipulate other keys inside the ctx map and/or behave differently in :fit and :transform.
This is documented per function in this form:
metamorph | . |
---|---|
Behaviour in mode :fit | . |
Behaviour in mode :transform | . |
Reads keys from ctx | . |
Writes keys to ctx | . |
The namespaces scicloj.ml.metamorph and scicloj.ml.dataset contain functions with the same names, but they operate on either a context map (ns metamorph) or on a dataset (ns dataset).
The functions in this namespace are re-exported from:

* tablecloth.pipeline
* tech.v3.libs.smile.metamorph
* scicloj.metamorph.ml
* tech.v3.dataset.metamorph
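A minimal sketch of this convention, assuming `ds/dataset` and `ml/pipeline` are available via the sibling `scicloj.ml.dataset` and `scicloj.ml.core` namespaces:

```clojure
(require '[scicloj.ml.core :as ml]
         '[scicloj.ml.dataset :as ds]
         '[scicloj.ml.metamorph :as mm])

;; Every function in this namespace returns a step fn of ctx -> ctx.
(def step (mm/add-column :b [10 20 30]))

;; A metamorph context is a plain map; the dataset sits under :metamorph/data.
(def ctx {:metamorph/data (ds/dataset {:a [1 2 3]})
          :metamorph/mode :fit})

;; Applying the step returns the context with an updated dataset.
(:metamorph/data (step ctx))

;; Steps compose into a pipeline, which is again a ctx -> ctx fn.
(def pipe
  (ml/pipeline
   (mm/add-column :b [10 20 30])
   (mm/select-columns [:a :b])))

(:metamorph/data (pipe ctx))
```

The later sketches on this page reuse the `ml`, `ds` and `mm` aliases from this example.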
(->array colname)
(->array colname datatype)
Convert numerical column(s) to java array
(add-column column-name column)
(add-column column-name column size-strategy)
Add or update (modify) column under `column-name`. `column` can be a sequence of values or a generator function (which gets `ds` as input).
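A small sketch of both `column` forms, reusing the `ml`/`ds`/`mm` aliases from the namespace-level sketch above (column names are hypothetical):

```clojure
(def add-pipe
  (ml/pipeline
   ;; a plain sequence of values
   (mm/add-column :id [1 2 3])
   ;; a generator function, called with the current dataset
   (mm/add-column :a*2 (fn [d] (map #(* 2 %) (d :a))))))

(:metamorph/data
 (add-pipe {:metamorph/data (ds/dataset {:a [1 2 3]})
            :metamorph/mode :fit}))
```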
(add-columns columns-map)
(add-columns columns-map size-strategy)
Add or update (modify) columns defined in `columns-map` (mapping: name -> column).
(add-or-replace-column column-name column)
(add-or-replace-column column-name column size-strategy)
(add-or-replace-columns columns-map)
(add-or-replace-columns columns-map size-strategy)
(add-or-update-column column)
(add-or-update-column colname column)
(aggregate aggregator)
(aggregate aggregator options)
Aggregate dataset by providing:

- aggregation function
- map with column names and functions
- sequence of aggregation functions

Aggregation functions can return:

- single value
- seq of values
- map of values with column names
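A sketch of two aggregator forms (single function vs. map of column name -> function), reusing the aliases from the namespace-level sketch:

```clojure
;; single aggregation function returning one value
(def totals
  (ml/pipeline
   (mm/aggregate (fn [d] (reduce + (d :a))))))

;; map of result column names to aggregation functions
(def summary
  (ml/pipeline
   (mm/aggregate {:a-sum (fn [d] (reduce + (d :a)))
                  :b-max (fn [d] (apply max (d :b)))})))

(:metamorph/data
 (summary {:metamorph/data (ds/dataset {:a [1 2 3] :b [10 30 20]})
           :metamorph/mode :fit}))
```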
(aggregate-columns columns-selector column-aggregators)
(aggregate-columns columns-selector column-aggregators options)
Aggregates each column separately
(anti-join ds-right columns-selector)
(anti-join ds-right columns-selector options)
(append & datasets)
(append-columns column-seq)
(asof-join ds-right colname)
(asof-join ds-right colname options)
(assoc-ds cname cdata & args)
(assoc-metadata filter-fn-or-ds k v & args)
(bind & datasets)
(bow->something-sparse bow-col indices-col bow->sparse-fn options)
Converts a bag-of-word column `bow-col` to a sparse data column `indices-col`. The exact transformation to the sparse representation is given by `bow->sparse-fn`.
metamorph | . |
---|---|
Behaviour in mode :fit | normal |
Behaviour in mode :transform | normal |
Reads keys from ctx | none |
Writes keys to ctx | :scicloj.ml.smile.metamorph/bow->sparse-vocabulary |
(bow->sparse-array bow-col indices-col)
(bow->sparse-array bow-col indices-col options)
Converts a bag-of-word column `bow-col` to a sparse indices column `indices-col`, as needed by the Maxent model. `vocab size` is the size of the vocabulary used, sorted by token frequency.
metamorph | . |
---|---|
Behaviour in mode :fit | normal |
Behaviour in mode :transform | normal |
Reads keys from ctx | none |
Writes keys to ctx | :scicloj.ml.smile.metamorph/count-vectorize-vocabulary |
(bow->SparseArray bow-col indices-col)
(bow->SparseArray bow-col indices-col options)
Converts a bag-of-word column `bow-col` to a sparse indices column `indices-col`, as needed by the discrete naive bayes model. `vocab size` is the size of the vocabulary used, sorted by token frequency.
metamorph | . |
---|---|
Behaviour in mode :fit | normal |
Behaviour in mode :transform | normal |
Reads keys from ctx | none |
Writes keys to ctx | :scicloj.ml.smile.metamorph/count-vectorize-vocabulary |
(bow->tfidf bow-column tfidf-column)
Calculates the tfidf score from bag-of-words (as token frequency maps) in column `bow-column` and stores them in a new column `tfidf-column` as maps of token->tfidf-score.
metamorph | . |
---|---|
Behaviour in mode :fit | normal |
Behaviour in mode :transform | normal |
Reads keys from ctx | none |
Writes keys to ctx | none |
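A sketch applying the step directly to a context whose dataset already holds token-frequency maps (built inline here; in practice they typically come from `count-vectorize`), reusing the aliases from the namespace-level sketch:

```clojure
(def bow-ctx
  {:metamorph/data (ds/dataset {:bow [{"fox" 2 "dog" 1}
                                      {"dog" 3 "cat" 1}]})
   :metamorph/mode :fit})

;; adds a :tfidf column of token->tfidf-score maps
(:metamorph/data ((mm/bow->tfidf :bow :tfidf) bow-ctx))
```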
(brief)
(brief options)
(by-rank columns-selector rank-predicate)
(by-rank columns-selector rank-predicate options)
Select rows using `rank` on a column; ties are resolved using the `:dense` method. See [R docs](https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/rank). Rank uses 0-based indexing.

Possible `:ties` strategies: `:average`, `:first`, `:last`, `:random`, `:min`, `:max`, `:dense`. `:dense` is the same as in `data.table::frank` from R.

`:desc?` set to true (default) orders descending before calculating rank.
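A sketch keeping the rows with the two largest values of a hypothetical `:score` column (ranks are 0-based and descending by default), reusing the aliases from the namespace-level sketch:

```clojure
(def top-2
  (ml/pipeline
   ;; keep rows whose (0-based, descending) rank on :score is 0 or 1
   (mm/by-rank :score #(< % 2))))

(:metamorph/data
 (top-2 {:metamorph/data (ds/dataset {:score [10 40 20 30]})
         :metamorph/mode :fit}))
;; => the rows with :score 40 and 30
```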
(categorical->number filter-fn-or-ds)
(categorical->number filter-fn-or-ds table-args)
(categorical->number filter-fn-or-ds table-args result-datatype)
(categorical->one-hot filter-fn-or-ds)
(categorical->one-hot filter-fn-or-ds table-args)
(categorical->one-hot filter-fn-or-ds table-args result-datatype)
(clone)
Clone an object. Can clone anything convertible to a reader.
(column colname)
(column->dataset colname transform-fn)
(column->dataset colname transform-fn options)
(column-cast colname datatype)
(column-count)
(column-labeled-mapseq value-colname-seq)
(column-map result-colname map-fn)
(column-map result-colname map-fn filter-fn-or-ds)
(column-map result-colname map-fn res-dtype filter-fn-or-ds)
(column-names)
(column-names columns-selector)
(column-names columns-selector meta-field)
(column-values->categorical src-column)
(columns)
(columns result-type)
(columns-with-missing-seq)
(columnwise-concat colnames)
(columnwise-concat colnames options)
(concat & datasets)
(concat-copying & datasets)
(concat-inplace & datasets)
(convert-types coltype-map-or-columns-selector)
(convert-types columns-selector new-types)
Convert the type of a column to another type.
(count-vectorize text-col bow-col)
(count-vectorize text-col bow-col options)
Transforms the text column `text-col` into a map of token frequencies in column `bow-col`.
metamorph | . |
---|---|
Behaviour in mode :fit | normal |
Behaviour in mode :transform | normal |
Reads keys from ctx | none |
Writes keys to ctx | none |
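A sketch chaining `count-vectorize` with `bow->tfidf` (column names are hypothetical), reusing the aliases from the namespace-level sketch:

```clojure
(def text-pipe
  (ml/pipeline
   (mm/count-vectorize :text :bow)   ; :text -> token-frequency maps in :bow
   (mm/bow->tfidf :bow :tfidf)))     ; :bow -> token->tfidf-score maps in :tfidf

(:metamorph/data
 (text-pipe {:metamorph/data (ds/dataset {:text ["the quick brown fox"
                                                 "the lazy dog"]})
             :metamorph/mode :fit}))
```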
(data->dataset)
(dataset->categorical-xforms)
(dataset->data)
(dataset->str)
(dataset->str options)
Convert a dataset to a string. Prints a single line header and then calls dataset-data->str.
For options documentation see dataset-data->str.
(dataset-name)
(descriptive-stats)
(descriptive-stats options)
(difference ds-right)
(difference ds-right options)
(drop columns-selector rows-selector)
Drop columns and rows.
(drop-columns)
(drop-columns columns-selector)
(drop-columns columns-selector meta-field)
Drop columns by (returns dataset):

- name
- sequence of names
- map of names with new names (rename)
- function which filters names (via column metadata)
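A sketch of the selector forms, reusing the aliases from the namespace-level sketch (the predicate and meta-field variants assume tablecloth's column-selector conventions):

```clojure
(ml/pipeline (mm/drop-columns :a))                        ; single name
(ml/pipeline (mm/drop-columns [:a :b]))                   ; sequence of names
(ml/pipeline (mm/drop-columns #{:a :b}))                  ; predicate over column names
(ml/pipeline (mm/drop-columns #(= :string %) :datatype))  ; predicate over a metadata field
```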
(drop-missing)
(drop-missing columns-selector)
Drop rows with missing values. `columns-selector` selects the columns to check for missing values.
(drop-rows)
(drop-rows rows-selector)
(drop-rows rows-selector options)
Drop rows using:

- row id
- seq of row ids
- seq of true/false
- fn with predicate
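A sketch of the row-selector forms, reusing the aliases from the namespace-level sketch (assuming tablecloth's convention that the predicate receives each row as a map):

```clojure
(ml/pipeline (mm/drop-rows 0))                            ; single row id
(ml/pipeline (mm/drop-rows [0 2 4]))                      ; seq of row ids
(ml/pipeline (mm/drop-rows (fn [row] (nil? (:a row)))))   ; predicate over row maps
```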
(empty-ds?)
(ensure-array-backed)
(ensure-array-backed options)
(feature-ecount)
(fill-range-replace colname max-span)
(fill-range-replace colname max-span missing-strategy)
(fill-range-replace colname max-span missing-strategy missing-value)
(filter predicate)
(filter-column colname predicate)
(filter-dataset filter-fn-or-ds)
(first)
(fold-by columns-selector)
(fold-by columns-selector folding-function)
(full-join ds-right columns-selector)
(full-join ds-right columns-selector options)
(group-by grouping-selector)
(group-by grouping-selector options)
Group dataset by:

- column name
- list of columns
- map of keys and row indexes
- function getting map of values

Options are:

- select-keys - when grouping is done by function, you can limit fields to a `select-keys` seq.
- result-type - return results as dataset (`:as-dataset`, default) or as map of datasets (`:as-map`) or as map of row indexes (`:as-indexes`) or as sequence of (sub)datasets
- other parameters which are passed to the `dataset` fn

When a dataset is returned, its meta contains `:grouped?` set to true. Columns in the dataset:

- name - group name
- group-id - id of the group (int)
- data - group as dataset
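A sketch grouping by a column and concatenating the groups back with `ungroup` (documented further below), reusing the aliases from the namespace-level sketch; column names are hypothetical:

```clojure
(def grouped-pipe
  (ml/pipeline
   (mm/group-by :species)
   ;; ...per-group steps could go here...
   (mm/ungroup {:add-group-as-column true})))

(:metamorph/data
 (grouped-pipe {:metamorph/data (ds/dataset {:species ["a" "a" "b"]
                                             :x [1 2 3]})
                :metamorph/mode :fit}))
```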
(group-by->indexes key-fn)
(group-by-column colname)
(group-by-column->indexes colname)
(grouped?)
Does `dataset` represent a grouped dataset (the result of `group-by`)?
(groups->map)
Convert grouped dataset to the map of groups
(groups->seq)
(has-column? column-name)
(head)
(head n)
(inference-column?)
(inference-target-column-names)
(inference-target-ds)
(inference-target-label-inverse-map & [label-columns])
(inference-target-label-map & [label-columns])
(info)
(info result-type)
(inner-join ds-right columns-selector)
(inner-join ds-right columns-selector options)
(intersect ds-right)
(intersect ds-right options)
(join-columns target-column columns-selector)
(join-columns target-column columns-selector options)
(labels)
(last)
(left-join ds-right columns-selector)
(left-join ds-right columns-selector options)
(map-columns column-name map-fn)
(map-columns column-name columns-selector map-fn)
(map-columns column-name new-type columns-selector map-fn)
(mapseq-reader)
(missing)
(model options)
Executes a machine learning model in train/predict (depending on :mode) from the `metamorph.ml` model registry.
The model is passed between both invocations via the shared context ctx in a key (a step identifier) which is passed in key `:metamorph/id` and guaranteed to be unique for each pipeline step. The function writes and reads into this common context key.
Options:

- `:model-type` - Keyword for the model to use

Further options get passed to the `train` functions and are model specific. See here for an overview of the models built into scicloj.ml:
https://scicloj.github.io/scicloj.ml/userguide-models.html
Other libraries might contribute other models, which are documented as part of those libraries.
metamorph | . |
---|---|
Behaviour in mode :fit | Calls scicloj.metamorph.ml/train using data in :metamorph/data and options and stores trained model in ctx under key in :metamorph/id |
Behaviour in mode :transform | Reads trained model from ctx and calls scicloj.metamorph.ml/predict with the model in $id and data in :metamorph/data |
Reads keys from ctx | In mode :transform : Reads trained model to use for prediction from key in :metamorph/id . |
Writes keys to ctx | In mode :fit : Stores trained model in key $id and writes feature-ds and target-ds before prediction into ctx at :scicloj.metamorph.ml/feature-ds /:scicloj.metamorph.ml/target-ds |
See as well:

* `scicloj.metamorph.ml/train`
* `scicloj.metamorph.ml/predict`
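A sketch of a train/predict round trip, reusing the aliases from the namespace-level sketch. `ml/fit` and `ml/transform-pipe` are assumed entry points (re-exported from `scicloj.metamorph.ml`), and the model key assumes a Smile classifier is registered on the classpath (e.g. via the scicloj.ml Smile integration); data and column names are hypothetical:

```clojure
;; hypothetical tiny train/test datasets
(def train-ds (ds/dataset {:x [1 2 3 4] :species ["a" "a" "b" "b"]}))
(def test-ds  (ds/dataset {:x [1.5 3.5] :species ["a" "b"]}))

(def train-pipe
  (ml/pipeline
   (mm/categorical->number [:species])
   (mm/set-inference-target :species)
   ;; assumes this model type is registered in the runtime
   (mm/model {:model-type :smile.classification/random-forest})))

;; :fit trains the model and stores it in the ctx under this step's :metamorph/id ...
(def fitted-ctx (ml/fit train-ds train-pipe))

;; ... :transform reads it back and predicts on the new data.
(:metamorph/data (ml/transform-pipe test-ds train-pipe fitted-ctx))
```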
(model-type & [column-name-seq])
(new-column data)
(new-column data metadata)
(new-column data metadata missing)
(new-dataset)
(new-dataset column-seq)
(new-dataset ds-metadata column-seq)
(num-inference-classes)
(order-by columns-or-fn)
(order-by columns-or-fn comparators)
(order-by columns-or-fn comparators options)
Order dataset by:

- column name
- columns (as sequence of names)
- key-fn
- sequence of columns / key-fn

Additionally you can ask the order by:

- :asc
- :desc
- custom comparator function
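A sketch of two ordering forms, reusing the aliases from the namespace-level sketch:

```clojure
;; single column, descending
(ml/pipeline (mm/order-by :a :desc))
;; several columns with per-column directions
(ml/pipeline (mm/order-by [:a :b] [:asc :desc]))
```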
(order-column-names colname-seq)
(pivot->longer)
(pivot->longer columns-selector)
(pivot->longer columns-selector options)
`tidyr` pivot_longer API
(pivot->wider columns-selector value-columns)
(pivot->wider columns-selector value-columns options)
(print-dataset)
(print-dataset options)
(probability-distributions->label-column dst-colname)
(process-group-data f)
(process-group-data f parallel?)
(rand-nth)
(rand-nth options)
(random)
(random n)
(random n options)
(read-nippy)
(remove-column col-name)
(remove-columns colname-seq-or-fn)
(remove-rows row-indexes)
(rename-columns columns-mapping)
(rename-columns columns-selector columns-map-fn)
Rename columns with provided old -> new name map
(reorder-columns columns-selector & columns-selectors)
Reorder columns using column selector(s). When column names are incomplete, the missing will be attached at the end.
(replace-missing)
(replace-missing strategy)
(replace-missing columns-selector strategy)
(replace-missing columns-selector strategy value)
(replace-missing-value scalar-value)
(replace-missing-value filter-fn-or-ds scalar-value)
(right-join ds-right columns-selector)
(right-join ds-right columns-selector options)
(row-count)
(rows)
(rows result-type)
(sample)
(sample n)
(sample n options)
(select columns-selector rows-selector)
Select columns and rows.
(select-by-index col-index row-index)
(select-columns)
(select-columns columns-selector)
(select-columns columns-selector meta-field)
Select columns by (returns dataset):

- name
- sequence of names
- map of names with new names (rename)
- function which filters names (via column metadata)
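A sketch of the selector forms, reusing the aliases from the namespace-level sketch (the rename-map and meta-field variants assume tablecloth's column-selector conventions):

```clojure
(ml/pipeline (mm/select-columns [:a :b]))                    ; by names
(ml/pipeline (mm/select-columns {:a :a-renamed}))            ; select + rename via a map
(ml/pipeline (mm/select-columns #(= :float64 %) :datatype))  ; predicate over a metadata field
```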
(select-columns-by-index col-index)
(select-missing)
(select-missing columns-selector)
Select rows with missing values. `columns-selector` selects the columns to check for missing values.
(select-rows)
(select-rows rows-selector)
(select-rows rows-selector options)
Select rows using:

- row id
- seq of row ids
- seq of true/false
- fn with predicate
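A sketch of the row-selector forms, reusing the aliases from the namespace-level sketch (assuming tablecloth's convention that the predicate receives each row as a map):

```clojure
(ml/pipeline (mm/select-rows [0 1]))                       ; by row ids
(ml/pipeline (mm/select-rows (fn [row] (> (:a row) 10))))  ; predicate over row maps
```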
(select-rows-by-index row-index)
(semi-join ds-right columns-selector)
(semi-join ds-right columns-selector options)
(separate-column column separator)
(separate-column column target-columns separator)
(separate-column column target-columns separator options)
(set-dataset-name ds-name)
(set-inference-target target-name-or-target-name-seq)
(shape)
Returns shape of the dataset [rows, cols]
(shuffle)
(shuffle options)
(sort-by key-fn)
(sort-by key-fn compare-fn)
(sort-by-column colname)
(sort-by-column colname compare-fn)
(tail)
(tail n)
(take-nth n-val)
(ungroup)
(ungroup options)
Concat groups into a dataset. When `add-group-as-column` or `add-group-id-as-column` is set to `true` or to name(s), columns with the group name(s) or group id are added to the result. Before joining, the groups can be sorted by group name.
(union & datasets)
(unique-by)
(unique-by columns-selector)
(unique-by columns-selector options)
(unique-by-column colname)
(unique-by-column options colname)
(unordered-select colname-seq index-seq)
(unroll columns-selector)
(unroll columns-selector options)
(unroll-column column-name)
(unroll-column column-name options)
(update filter-fn-or-ds update-fn & args)
(update-column col-name update-fn)
(update-columns columns-map)
(update-columns columns-selector update-functions)
(update-columnwise filter-fn-or-ds cwise-update-fn & args)
(update-elemwise map-fn)
(update-elemwise filter-fn-or-ds map-fn)
(value-reader)
(write! output-path)
(write! output-path options)
Write a dataset out to a file. Supported forms are:

```clojure
(ds/write! test-ds "test.csv")
(ds/write! test-ds "test.tsv")
(ds/write! test-ds "test.tsv.gz")
(ds/write! test-ds "test.nippy")
(ds/write! test-ds out-stream)
```

Options:

* `:max-chars-per-column` - csv, tsv specific, defaults to 65536 - values longer than this will cause an exception during serialization.
* `:max-num-columns` - csv, tsv specific, defaults to 8192 - if the dataset has more than this number of columns an exception will be thrown during serialization.
* `:quoted-columns` - csv specific - sequence of column names that you would like to always have quoted.
* `:file-type` - manually specify the file type. This is usually inferred from the filename, but if you pass in an output stream then you will need to specify the file type.
* `:headers?` - if csv headers are written, defaults to true.
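A sketch using `write!` as a pipeline step, reusing the aliases from the namespace-level sketch: in this namespace it takes only the path and options and writes the dataset currently under `:metamorph/data` as a side effect (file name is hypothetical):

```clojure
(def save-pipe
  (ml/pipeline
   (mm/select-columns [:a])
   ;; writes the current :metamorph/data dataset to disk, then returns the ctx
   (mm/write! "out.csv" {:headers? true})))
```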
(write-nippy! filename)