scicloj.metamorph.ml.categorical

Categorical feature encoding for machine learning pipelines.

This namespace provides metamorph transformers for handling categorical variables commonly used in supervised learning. It currently focuses on one-hot encoding, which converts each categorical value into a set of binary indicator columns.

One-hot encoding is essential for:

  • Preparing categorical features for algorithms that expect numeric inputs
  • Preventing ordinal assumptions on nominal categories
  • Creating interpretable model features

Main API:

  • transform-one-hot: The primary metamorph transformer for one-hot encoding

Encoding strategies:

  • :full - Uses a predefined level set taken from the full dataset in the context
  • :fit - Levels discovered in :fit mode are reused in :transform mode
  • :independent - Each mode determines and encodes its own levels independently
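
The practical difference between the strategies is where the level set comes from. The following plain-Clojure sketch illustrates that idea only; `discover-levels` and `encode` are hypothetical helpers written for this example, not functions of this namespace, and the library's actual column layout may differ.

```clojure
;; Illustrative sketch only; these helpers are hypothetical and are not
;; part of scicloj.metamorph.ml.categorical.

(defn discover-levels
  "Returns the distinct values of `values` in sorted order."
  [values]
  (vec (sort (distinct values))))

(defn encode
  "Encodes each value as a vector of 0/1 indicators, one per level."
  [levels values]
  (mapv (fn [v] (mapv (fn [level] (if (= level v) 1 0)) levels))
        values))

(def train-values [:a :b :c])
(def test-values  [:a :c])

;; :fit-like behaviour: levels learned from the training values are reused.
(def fit-encoded
  (encode (discover-levels train-values) test-values))
;; => [[1 0 0] [0 0 1]]

;; :independent-like behaviour: levels are rediscovered from the test values,
;; so the number and meaning of the indicator columns can differ from training.
(def independent-encoded
  (encode (discover-levels test-values) test-values))
;; => [[1 0] [0 1]]

;; :full-like behaviour: levels come from a level set covering all data.
(def full-encoded
  (encode (discover-levels (concat train-values test-values)) test-values))
;; => [[1 0 0] [0 0 1]]
```

In this toy case the :independent-like result has only two indicator columns, which shows why a model trained on three columns could not consume it directly.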

transform-one-hot

(transform-one-hot column-selector strategy)
(transform-one-hot column-selector strategy options)

Metamorph transformer that maps categorical variables to one-hot encoded columns.

Each unique value of the categorical column becomes its own binary column in the one-hot encoding.
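
As a rough illustration of that mapping, the sketch below one-hot encodes a small vector of values in plain Clojure. It is not the implementation behind `transform-one-hot`; the helper `one-hot-rows` and the sample data are assumptions made for this example.

```clojure
;; Illustrative only: a plain-Clojure sketch of one-hot encoding,
;; not the implementation used by transform-one-hot.

(defn one-hot-rows
  "Encodes each value in `values` as a map from level to 0 or 1.
  `levels` is the ordered collection of known categories."
  [levels values]
  (mapv (fn [v]
          (into {} (map (fn [level] [level (if (= level v) 1 0)])) levels))
        values))

;; Hypothetical sample column.
(def colours [:red :green :red :blue])

;; Discover the levels from the data itself, in a stable (sorted) order.
(def levels (vec (sort (distinct colours))))
;; => [:blue :green :red]

(def encoded (one-hot-rows levels colours))
;; => [{:blue 0, :green 0, :red 1}
;;     {:blue 0, :green 1, :red 0}
;;     {:blue 0, :green 0, :red 1}
;;     {:blue 1, :green 0, :red 0}]
```

Each row contains exactly one 1, in the column of the level that the original value belonged to.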

column-selector - Tablecloth column selector (keyword, fn, or selector spec)

strategy - Strategy for handling train/test level differences:

  • :full - Levels retrieved from the dataset at :metamorph.ml/full-ds in the context
  • :independent - One-hot columns fitted and transformed independently
  • :fit - Mapping from :fit mode used in :transform (assumes all levels are present in fit)

options - Optional map with:

  • :table-args - Precise mapping as a sequence of [val idx] pairs or sorted values
  • :result-datatype - Datatype of the one-hot-mapping columns

Returns a metamorph step function that transforms the data in both :fit and :transform modes.
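
A minimal usage sketch follows, assuming `scicloj.metamorph.core` and `tablecloth.api` are on the classpath alongside this library. The datasets and the `:colour` column are made up for illustration, and the names of the generated one-hot columns are not shown because they depend on the library.

```clojure
(ns example.one-hot
  (:require [scicloj.metamorph.core :as mm]
            [scicloj.metamorph.ml.categorical :as categorical]
            [tablecloth.api :as tc]))

;; Hypothetical toy data; column names and values are assumptions.
(def train-ds (tc/dataset {:colour [:red :green :blue]
                           :size   [1 2 3]}))
(def test-ds  (tc/dataset {:colour [:green :red]
                           :size   [4 5]}))

;; A pipeline with a single step: one-hot encode :colour using the :fit
;; strategy, so levels found in :fit mode are reused in :transform mode.
(def pipe-fn
  (mm/pipeline
   (categorical/transform-one-hot :colour :fit)))

;; Run the pipeline in :fit mode on the training data.
(def fitted-ctx (mm/fit-pipe train-ds pipe-fn))

;; Run it in :transform mode on new data, passing the fitted context along.
(def transformed-ctx (mm/transform-pipe test-ds pipe-fn fitted-ctx))

;; The encoded dataset is found under :metamorph/data.
(:metamorph/data transformed-ctx)
```

The :fit strategy is used here because every level in the test data also appears in the training data, which is the assumption that strategy relies on.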

metamorph                    | .
-----------------------------|--------------------------------------------------------------
Behaviour in mode :fit       | Fits one-hot encoding and applies it to :metamorph/data
Behaviour in mode :transform | Applies fitted encoding to :metamorph/data
Reads keys from ctx          | In :transform: reads fitted encoding from :metamorph/id
Writes keys to ctx           | In :fit: stores fitted encoding in :metamorph/id
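
The fit/transform contract above can be sketched with a hand-written step over plain maps. This is a hypothetical illustration of the pattern, not the library's code: `one-hot-step` and `encode-rows` are invented names, rows are plain Clojure maps rather than a dataset, and the sketch assumes that "stored in :metamorph/id" means the fitted state is kept in the context under the key given by the value of `:metamorph/id`.

```clojure
;; Hypothetical step function illustrating the ctx contract; not the
;; implementation of transform-one-hot.

(defn encode-rows
  "Replaces `column` in every row with one 0/1 entry per level."
  [levels column rows]
  (mapv (fn [row]
          (reduce (fn [acc level]
                    (assoc acc level (if (= level (get row column)) 1 0)))
                  (dissoc row column)
                  levels))
        rows))

(defn one-hot-step
  "Returns a metamorph-style step: a function from ctx to ctx."
  [column]
  (fn [{:metamorph/keys [id data mode] :as ctx}]
    (case mode
      ;; :fit learns the levels, stores them under the step id,
      ;; and applies the encoding to the data.
      :fit
      (let [levels (vec (sort (distinct (map column data))))]
        (-> ctx
            (assoc id levels)
            (assoc :metamorph/data (encode-rows levels column data))))

      ;; :transform reads the stored levels back and applies them.
      :transform
      (let [levels (get ctx id)]
        (assoc ctx :metamorph/data (encode-rows levels column data))))))

(def step (one-hot-step :colour))

(def fit-ctx
  (step {:metamorph/id   :step-1
         :metamorph/mode :fit
         :metamorph/data [{:colour :red} {:colour :blue}]}))

(def transform-ctx
  (step (assoc fit-ctx
               :metamorph/mode :transform
               :metamorph/data [{:colour :blue}])))

(:metamorph/data transform-ctx)
;; => [{:blue 1, :red 0}]
```

Because the levels travel inside the context, the same step function can be reused for both modes without holding any state of its own.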

See also: tech.v3.dataset.categorical/fit-one-hot, tech.v3.dataset/categorical->one-hot