Categorical feature encoding for machine learning pipelines.

This namespace provides metamorph transformers for handling categorical variables commonly used in supervised learning. Currently focuses on one-hot encoding, which converts categorical values into binary indicator columns.

One-hot encoding is essential for:
- Preparing categorical features for algorithms that expect numeric inputs
- Preventing ordinal assumptions on nominal categories
- Creating interpretable model features

Main API:
- `transform-one-hot`: The primary metamorph transformer for one-hot encoding

Encoding strategies:
- `:full` Uses a predefined level set from full dataset context
- `:fit` Levels discovered during `:fit` used in `:transform`
- `:independent` Each mode independently determines and encodes levels
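
To make "binary indicator columns" concrete, here is a minimal plain-Clojure sketch of the idea. It is not this namespace's implementation: rows are ordinary maps rather than a dataset, and the helper `one-hot-row` and the `:color-blue` style column names are hypothetical, chosen only for illustration.

```clojure
(defn one-hot-row
  "Replaces key `col` in `row` with one 0/1 entry per level in `levels`.
  Hypothetical helper, not part of the library."
  [levels col row]
  (reduce (fn [acc level]
            (assoc acc
                   (keyword (str (name col) "-" level))
                   (if (= level (get row col)) 1 0)))
          (dissoc row col)
          levels))

;; Toy data: one categorical column (:color) and one numeric column (:size).
(def rows [{:color "red" :size 3}
           {:color "blue" :size 1}])

;; The level set is the sorted distinct values of the column: ("blue" "red").
(def levels (sort (distinct (map :color rows))))

(def encoded (mapv #(one-hot-row levels :color %) rows))
;; => [{:size 3, :color-blue 0, :color-red 1}
;;     {:size 1, :color-blue 1, :color-red 0}]
```

Each row ends up with exactly one indicator set to 1, so no ordering among the colours is implied.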
`(transform-one-hot column-selector strategy)`
`(transform-one-hot column-selector strategy options)`
Metamorph transformer that maps categorical variables to one-hot encoded columns.
Each unique value of the categorical column becomes its own binary column in the one-hot encoding.
`column-selector` - Tablecloth column selector (keyword, fn, or selector spec)
`strategy` - Strategy for handling train/test level differences:
* `:full` - Levels are retrieved from the dataset at `:metamorph.ml/full-ds` in the context
* `:independent` - One-hot columns are fitted and transformed independently in each mode
* `:fit` - The mapping learned in `:fit` mode is reused in `:transform` mode (assumes all levels are present in the fit data)
`options` - Optional map with:
* `:table-args` - Precise mapping as a sequence of [val idx] pairs or sorted values
* `:result-datatype` - Datatype of the one-hot-mapping columns
Returns a metamorph step function that transforms the data in both :fit and :transform modes.
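
A hedged usage sketch of the two-argument arity with the `:fit` strategy. The docstring does not name the namespaces involved, so `scicloj.metamorph.ml.categorical`, `scicloj.metamorph.core` (with `pipeline`, `fit-pipe`, `transform-pipe`) and `tablecloth.api` are assumptions, as are the toy datasets. The shape and names of the resulting one-hot columns are deliberately not shown.

```clojure
;; Assumed namespaces; adjust to whatever is on your classpath.
(require '[scicloj.metamorph.core :as mm]
         '[scicloj.metamorph.ml.categorical :as mm-cat]
         '[tablecloth.api :as tc])

;; Hypothetical toy data: the test set lacks the "green" level.
(def train-ds (tc/dataset {:color ["red" "green" "blue"] :y [1 0 1]}))
(def test-ds  (tc/dataset {:color ["blue" "red"] :y [0 1]}))

;; A pipeline with a single step: one-hot encode :color with strategy :fit.
(def pipe (mm/pipeline (mm-cat/transform-one-hot :color :fit)))

;; Mode :fit learns the levels from train-ds and encodes it.
(def fitted-ctx (mm/fit-pipe train-ds pipe))

;; Mode :transform reuses the fitted encoding on test-ds.
(def transformed-ctx (mm/transform-pipe test-ds pipe fitted-ctx))

;; The encoded test dataset lives under :metamorph/data.
(:metamorph/data transformed-ctx)
```

With `:fit`, every level in the test data must already have been seen in the training data; `:full` or `:independent` are the documented alternatives when that cannot be guaranteed.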
| metamorph | . |
|---|---|
| Behaviour in mode :fit | Fits one-hot encoding and applies it to :metamorph/data |
| Behaviour in mode :transform | Applies fitted encoding to :metamorph/data |
| Reads keys from ctx | In :transform: reads fitted encoding from :metamorph/id |
| Writes keys to ctx | In :fit: stores fitted encoding in :metamorph/id |
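
The fit/transform contract in the table can be sketched in plain Clojure. This is an illustration of the pattern only, not the library's code: `one-hot-step` is a hypothetical function, the data are vectors of maps rather than datasets, and the step id `:step-1` is set by hand here, whereas a real pipeline supplies `:metamorph/id` itself.

```clojure
(defn one-hot-step
  "Returns a step fn over a metamorph-style context map.
  :fit learns the levels of `col` and stores them under the step id;
  :transform reads the stored levels back. Hypothetical sketch."
  [col]
  (fn [{:metamorph/keys [id mode data] :as ctx}]
    (let [levels (case mode
                   :fit       (vec (sort (distinct (map #(get % col) data))))
                   :transform (get ctx id))
          encode (fn [row]
                   (reduce (fn [acc level]
                             (assoc acc
                                    (keyword (str (name col) "-" level))
                                    (if (= level (get row col)) 1 0)))
                           (dissoc row col)
                           levels))]
      (cond-> (assoc ctx :metamorph/data (mapv encode data))
        ;; Only :fit writes the fitted encoding into the context.
        (= mode :fit) (assoc id levels)))))

(def step (one-hot-step :color))

(def fit-ctx
  (step {:metamorph/id   :step-1
         :metamorph/mode :fit
         :metamorph/data [{:color "red"} {:color "blue"}]}))

(def transform-ctx
  (step (assoc fit-ctx
               :metamorph/mode :transform
               :metamorph/data [{:color "blue"}])))

(:metamorph/data transform-ctx)
;; => [{:color-blue 1, :color-red 0}]
```

Because the levels come from the fitted context, the transformed row still gets a `:color-red` column even though "red" never appears in the transform data, which keeps train and test column layouts aligned.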
See also: tech.v3.dataset.categorical/fit-one-hot, tech.v3.dataset/categorical->one-hot