Core machine learning framework integrating metamorph pipelines with standardized model APIs.
This is the central namespace of metamorph.ml, providing infrastructure for:
- Registering and using machine learning models
- Training models and making predictions
- Evaluating pipelines via cross-validation
- Standardized model diagnostics (glance, tidy, augment)
- Optional caching of computationally expensive operations
Key Concepts:
**Model Registration**: Models are registered using `define-model!` and can be
referenced by keyword (e.g., `:fastmath/ols`, `:metamorph.ml/dummy-classifier`).
Models define a train-fn, predict-fn, and optional diagnostic functions.
**Training and Prediction**:
- `train`: Train a model on a dataset given options including :model-type
- `predict`: Make predictions using a trained model
- `train-predict-cache`: Optional cache to avoid redundant computations
**Pipeline Evaluation**:
- `evaluate-pipelines`: Evaluate multiple pipelines across train/test splits
- `evaluate-one-pipeline`: Evaluate a single pipeline with cross-validation
- Returns results sorted by metric performance with optional filtering
- Supports parallel evaluation (:map/:pmap/:ppmap)
**Model Diagnostics** (following tidymodels conventions):
- `glance`: One-row model summary (goodness-of-fit)
- `tidy`: One-row-per-component output (coefficients with statistics)
- `augment`: One-row-per-observation output (predictions, residuals)
Main API Functions:
- `define-model!`: Register a new model type with train/predict/diagnostic functions
- `train`: Train a model with a specified model-type
- `predict`: Generate predictions from a trained model
- `evaluate-pipelines`: Evaluate pipelines with cross-validation
- `glance`: Get model summary statistics
- `tidy`: Extract coefficient-level results
- `augment`: Add predictions and residuals to data
Pipeline Integration:
Models integrate with metamorph pipelines via the `model` step, which:
- Trains in :fit mode using training data
- Predicts in :transform mode on new data
- Stores model output column metadata for later evaluation
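The fit/transform behaviour listed above can be sketched in plain Clojure. This is only an illustration, assuming a pipeline step is a function from a context map to a context map with the data under `:metamorph/data`, the mode under `:metamorph/mode`, and the step id under `:metamorph/id`. The name `mean-model-step` is hypothetical and is not the library's `model` function; the data is a plain vector of maps rather than a dataset.

```clojure
;; Hypothetical stand-in for a model step: learns the mean of :y in :fit mode
;; and writes it as :prediction for every row in :transform mode.
(defn mean-model-step
  [{:metamorph/keys [data mode id] :as ctx}]
  (case mode
    :fit
    (let [ys   (map :y data)
          mean (/ (reduce + ys) (double (count ys)))]
      ;; keep the "trained model" in the context under this step's id
      (assoc ctx id {:mean mean}))
    :transform
    (let [{:keys [mean]} (get ctx id)]
      (assoc ctx :metamorph/data
             (mapv #(assoc % :prediction mean) data)))))

;; Fit on training rows.
(def fitted-ctx
  (mean-model-step {:metamorph/id   :model
                    :metamorph/mode :fit
                    :metamorph/data [{:y 1.0} {:y 2.0} {:y 6.0}]}))

;; Reuse the fitted context to predict on new rows.
(def predicted-ctx
  (mean-model-step (assoc fitted-ctx
                          :metamorph/mode :transform
                          :metamorph/data [{:x 10} {:x 20}])))
```

The point of the sketch is that the state learned in `:fit` travels inside the context, so the same step function can be applied to new data in `:transform` without retraining.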
Built-in Models:
**Regression**:
- `:metamorph.ml/ols`: Apache Commons Math OLS
- `:fastmath/ols`: FastMath OLS
- `:fastmath/glm`: FastMath GLM
- `:metamorph.ml/dummy-regressor`: Mean baseline
**Classification**:
- `:metamorph.ml/dummy-classifier`: Majority class or random baseline
- `:metamorph.ml/random-forest`: Random forest classifier
**Preprocessing**:
See specific namespaces for transformers:
- `scicloj.metamorph.ml.preprocessing`: Scaling and normalization
- `scicloj.metamorph.ml.categorical`: One-hot encoding
- `scicloj.metamorph.ml.r-model-matrix`: R formula features
See also: `scicloj.metamorph.core` for metamorph pipeline mechanics,
`scicloj.metamorph.ml.tidy-models` for diagnostic validation

Caching infrastructure for metamorph.ml train/predict operations.
This namespace provides flexible caching backends to store and retrieve results
of machine learning training and prediction operations. This is useful for
avoiding redundant computations when working with the same models and data.
Supported cache backends:
- **Atom cache**: In-memory caching using a Clojure atom (fast, ephemeral)
- **Disk cache**: File-based caching using Nippy serialization (persistent)
- **Redis cache**: Distributed caching via Redis (requires carmine library)
Usage:
```
(enable-atom-cache! (atom {})) ; Enable in-memory caching
;; or
(enable-disk-cache! "/tmp/ml-cache") ; Enable disk-based caching
;; or
(enable-redis-cache! {...}) ; Enable Redis caching
```
To disable caching:
```
(disable-cache!)
```
See individual function docs for more details on each backend.

Categorical feature encoding for machine learning pipelines.
This namespace provides metamorph transformers for handling categorical variables
commonly used in supervised learning. Currently focuses on one-hot encoding,
which converts categorical values into binary indicator columns.
One-hot encoding is essential for:
- Preparing categorical features for algorithms that expect numeric inputs
- Preventing ordinal assumptions on nominal categories
- Creating interpretable model features
Main API:
- `transform-one-hot`: The primary metamorph transformer for one-hot encoding
Encoding strategies:
- `:full`: Uses a predefined level set from full dataset context
- `:fit`: Levels discovered during `:fit` used in `:transform`
- `:independent`: Each mode independently determines and encodes levels
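As a rough illustration of the `:fit` strategy, the following plain-Clojure sketch learns the levels from training values and reuses them when encoding new values. The helper names `fit-levels` and `one-hot` are hypothetical and this is not how `transform-one-hot` is implemented; in particular, encoding an unseen level as all zeros is an assumption of the sketch, not documented library behaviour.

```clojure
;; Learn the sorted, distinct levels seen in the training values.
(defn fit-levels [values]
  (vec (sort (distinct values))))

;; Encode one value as a map of indicator columns, one per known level.
(defn one-hot [levels value]
  (into {} (map (fn [level] [level (if (= level value) 1 0)])) levels))

(def levels (fit-levels [:red :green :red :blue]))

;; :purple was never seen during fitting, so all of its indicators are 0 here.
(def encoded (mapv #(one-hot levels %) [:green :blue :purple]))
```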

Classification models and evaluation metrics for metamorph.ml.
This namespace provides tools for classification tasks including:
- Confusion matrix generation and analysis
- Baseline classifier implementations
- Classification evaluation utilities
Key features:
- `confusion-map`: Creates confusion matrices from predictions and true labels
- `confusion-map->ds`: Converts confusion matrices to tabular dataset format
- `:metamorph.ml/dummy-classifier`: A baseline classifier for sanity checks
Dummy Classifier Strategies:
- `:majority-class` (default): Always predicts the most frequent class
- `:fixed-class`: Predicts a specified class
- `:random-class`: Predicts randomly from the observed classes
Confusion Matrix Normalization:
- `:all` (default): Row-wise normalization (recall perspective)
- `:none`: Raw counts
See also: [[scicloj.metamorph.ml.viz/confusion-matrix]]
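To make the difference between raw counts and row-wise normalization concrete, here is a small plain-Clojure sketch. The functions `confusion-counts` and `normalize-rows` are hypothetical helpers written for this illustration; they are not the library's `confusion-map`, whose exact arguments and return shape are not shown here.

```clojure
;; Count (true-label, predicted-label) pairs into a nested map:
;; {true-label {predicted-label count}}.
(defn confusion-counts [truth predicted]
  (reduce (fn [acc [t p]] (update-in acc [t p] (fnil inc 0)))
          {}
          (map vector truth predicted)))

;; Divide each row by its total so every true-label row sums to 1.
(defn normalize-rows [counts]
  (into {}
        (map (fn [[t row]]
               (let [total (reduce + (vals row))]
                 [t (into {} (map (fn [[p n]] [p (/ n (double total))])) row)])))
        counts))

(def counts
  (confusion-counts [:cat :cat :cat :dog :dog]
                    [:cat :cat :dog :dog :cat]))

(def rates (normalize-rows counts))
```

With row-wise normalization, the diagonal entry of each row is that class's recall, which is why the option is described as the recall perspective.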

Model evaluation metrics for classification and regression tasks.
This namespace provides functions to compute standard machine learning metrics
on model predictions vs. ground truth labels, with support for both binary and
multiclass classification as well as regression tasks.
Key Functions:
- `classification-metric`: Evaluate classification model predictions
- `regression-metric`: Evaluate regression model predictions
Classification Metrics (from fastmath.stats):
Supports binary and multiclass metrics including accuracy, precision, recall,
F1-score, and more. Multiclass metrics can be averaged using:
- `:macro` - Unweighted mean of per-class metrics
- `:micro` - Aggregated true/false positives globally
Also supports `:roc-auc` for multiclass AUC scoring.
Regression Metrics (from fastmath.stats):
Distance and similarity metrics such as MAE, MSE, RMSE, R², etc.
Data Format:
- Input datasets must be tech.ml.dataset (TMD) format
- Must have appropriate column metadata (:prediction, :target, etc.)
- Support categorical mappings via :categorical-map metadata
- Missing values and NaNs are detected and rejected appropriately
Validation:
The functions perform extensive validation including:
- Column metadata correctness
- Missing values and NaN detection
- Type and datatype uniformity
- Row count alignment between datasets
- Single-label assumption (multi-label not yet supported)
See also: `fastmath.stats` documentation for available metric names
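The difference between `:macro` and `:micro` averaging can be shown with precision on a small imbalanced example. The sketch below is plain Clojure with hypothetical helper names; it does not call `classification-metric` or fastmath, and it only illustrates the two averaging ideas described above.

```clojure
;; True-positive and false-positive counts for one label.
(defn tp-fp [truth predicted label]
  (let [pairs (map vector truth predicted)]
    {:tp (count (filter (fn [[t p]] (and (= p label) (= t label))) pairs))
     :fp (count (filter (fn [[t p]] (and (= p label) (not= t label))) pairs))}))

(defn precision [{:keys [tp fp]}]
  (if (zero? (+ tp fp)) 0.0 (/ tp (double (+ tp fp)))))

;; Macro: average the per-class precisions, each class weighted equally.
(defn macro-precision [truth predicted labels]
  (/ (reduce + (map #(precision (tp-fp truth predicted %)) labels))
     (count labels)))

;; Micro: sum the counts over all classes first, then compute one precision.
(defn micro-precision [truth predicted labels]
  (precision (apply merge-with + (map #(tp-fp truth predicted %) labels))))

(def truth     [:a :a :a :a :a :b])
(def predicted [:a :a :a :a :b :b])

(def macro (macro-precision truth predicted [:a :b]))
(def micro (micro-precision truth predicted [:a :b]))
```

Here the rare class `:b` has a precision of 0.5, which pulls the macro average down more than the micro average, since micro averaging is dominated by the frequent class.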

Design matrix construction for machine learning pipelines.
This namespace provides utilities to transform datasets into numeric design
matrices suitable for machine learning models. It supports deriving new features,
transforming existing columns, managing target variables, and expanding complex
column types (arrays, maps).
Main Entry Point:
- `create-design-matrix`: Transform a dataset into a design matrix with custom specs
Design Matrix Specification Syntax:
Column specifications use [column-name transformation] pairs where:
- Transformations are Clojure expressions (quoted with ')
- Expressions can reference column names directly as symbols
- Expressions are evaluated in order and can chain
- Non-listed columns are removed from the output
Shorthand Syntax:
- `:column-name`: Keeps column unchanged (identity function)
- `[nil '(+ a b)]`: Auto-generates column name for derived column
- `['(+ a b)]`: Same as above
Available Aliases (no qualification needed):
- `ds` - tech.v3.dataset
- `tc` - tablecloth.api
- `tcc` - tablecloth.column.api
- All of clojure.core
Features:
- Derives new columns from existing data
- Expands array and map columns into separate columns
- Automatically converts categorical columns to numbers
- Sets inference target(s) for supervised learning
- Chains transformations in dependency order
Limitations:
- Does not automatically expand categorical variables (specify manually)
- Design matrix approach is more flexible but less compact than R formula syntax
See also: `fastmath.ml/lm` for linear regression with formula-based transformations

Grid search works in two steps: create a map whose values are gridsearch definitions, then run the grid search over it, which produces a sequence of fully defined maps.
The initial default implementation uses the Sobol sequence.
DEPRECATED: Simple loss functions.
DEPRECATED: Excellent metrics tools from the cortex project.

Feature scaling and normalization transformers for metamorph pipelines.
This namespace provides metamorph-compatible transformers for standardizing and
normalizing numeric features. These preprocessing steps are essential for many
machine learning algorithms to perform well.
Available Transformers:
- `std-scale`: Standardization (z-score normalization)
- `min-max-scale`: Min-max scaling to a specified range
Standard Scaling (`std-scale`):
Centers each numeric column (subtract mean) and/or scales by standard deviation,
producing zero-mean unit-variance data. Useful for:
- Algorithms sensitive to feature magnitude (SVMs, neural networks, KNN)
- Distance-based models
Options:
- `:mean?` (default true): Center by subtracting column mean
- `:stddev?` (default true): Scale by standard deviation
Min-Max Scaling (`min-max-scale`):
Rescales each numeric column to a specified range (default [-0.5, 0.5]).
Options:
- `:min` (default -0.5): Target minimum value
- `:max` (default 0.5): Target maximum value
Metamorph Integration:
Both transformers follow the metamorph pipeline pattern:
- `:fit` mode: Learn scaling parameters from training data
- `:transform` mode: Apply learned parameters to new data
- Stores transformation parameters in context under their assigned `:metamorph/id`
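The arithmetic behind both scalers, and the split between learning parameters and applying them, can be sketched in plain Clojure. All function names here are hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption of this sketch; the library's `std-scale` and `min-max-scale` may differ in such details.

```clojure
;; "Fit": learn mean and sample standard deviation from training values.
(defn fit-std [xs]
  (let [n        (count xs)
        mean     (/ (reduce + xs) (double n))
        variance (/ (reduce + (map #(let [d (- % mean)] (* d d)) xs))
                    (dec n))]
    {:mean mean :stddev (Math/sqrt variance)}))

;; "Transform": apply previously learned parameters to any values.
(defn apply-std [{:keys [mean stddev]} xs]
  (mapv #(/ (- % mean) stddev) xs))

;; Min-max: learn the training range [lo, hi] ...
(defn fit-min-max [xs]
  {:lo (apply min xs) :hi (apply max xs)})

;; ... and map it linearly onto [new-min, new-max].
(defn apply-min-max [{:keys [lo hi]} new-min new-max xs]
  (mapv #(+ new-min (* (- new-max new-min)
                       (/ (- % lo) (double (- hi lo)))))
        xs))

(def train [2.0 4.0 6.0])
(def std-params (fit-std train))
(def mm-params  (fit-min-max train))

(def train-z      (apply-std std-params train))
(def train-scaled (apply-min-max mm-params -0.5 0.5 train))
;; New data is scaled with the training parameters and may leave the range.
(def new-scaled   (apply-min-max mm-params -0.5 0.5 [8.0]))
```

Because the parameters come only from the training data, values outside the training range map outside the target range in `:transform`, as `new-scaled` shows.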

R-style formula-based feature engineering and linear regression.
This namespace provides tools to leverage R's powerful formula syntax for
feature engineering and linear modeling within Clojure. R formulas enable
expressive specification of interactions, transformations, and categorical
expansions without manual column manipulation.
Key Functions:
- `r-model-matrix`: Convert dataset + R formula to design matrix
- `lm`: Simplified linear regression using R formulas
Implementation Backends:
The namespace supports multiple R execution backends:
- `:ocpu` Remote R via OpenCPU (cloud.opencpu.org) - no local R needed
- `:renjin` Java-based R implementation (https://renjin.org/)
- `:clojisr` Local R via clojisr (requires R installation)
Model Matrix Capabilities:
R formulas handle:
- Basic features: `y ~ x1 + x2`
- Interactions: `y ~ x1 * x2` (expands to x1 + x2 + x1:x2)
- Polynomial terms: `y ~ x + I(x^2)`
- Categorical encoding: Automatic dummy variable creation
- Intercept control: `y ~ x - 1` (remove intercept)
- Exclusions: `y ~ . - x3` (all columns except x3)
Linear Regression (`lm`):
Combines formula-based feature engineering with OLS regression training.
Returns a ready-to-use trained model for predictions.
Notes:
- OpenCPU backend is convenient but requires internet connectivity
- Renjin is standalone but may have some R incompatibilities
- clojisr requires a local R installation but offers full R compatibility
- Returned model matrices exclude row names and intercept columns by default
See also: [[scicloj.metamorph.ml.design-matrix]] for Clojure-native feature engineering

Optimized pure-Clojure random forest implementation for classification and regression.
Use it by specifying `:model-type :metamorph.ml/random-forest`.
Regression models for continuous target prediction.
This namespace provides implementations of various regression algorithms with
a consistent metamorph.ml training and prediction interface. Models support
statistical output formats (tidy, glance, augment) for analysis and diagnostics.
Available Models:
**OLS (Ordinary Least Squares)**
- `:metamorph.ml/ols`: Apache Commons Math implementation (Java-based)
- `:fastmath/ols`: FastMath implementation (pure Clojure)
Solves for regression coefficients β in: y = Xβ + ε
Assumes linear relationships and homoscedastic errors.
**GLM (Generalized Linear Model)**
- `:fastmath/glm`: FastMath GLM implementation
Extends linear regression to non-normal distributions and non-linear relationships
via link functions and variance models.
**Baseline Model**
- `:metamorph.ml/dummy-regressor`: Predicts mean of training target
Useful sanity check - models should outperform this baseline.
Model Output Functions:
- **:tidy-fn**: Extracts model coefficients with statistics.
  Returns dataset with :term, :estimate, :std.error, :statistic, :p.value
- **:glance-fn**: Extracts model-level diagnostics.
  Returns dataset with :r.squared, :adj.r.squared, :rss, :aic, :bic, etc.
- **:augment-fn**: Adds model predictions and residuals to data.
  Returns augmented dataset with :.fitted and :.resid columns
Example Usage (in metamorph pipeline):
```
(ml/train
 data
 {:model-type :fastmath/ols})
```
Model Diagnostics:
```
(ml/glance model) ; Overall model metrics
(ml/tidy model) ; Coefficient table
(ml/augment model data) ; Predicted values and residuals
```
See also: [[scicloj.metamorph.ml.r-model-matrix]] for R-formula-based feature engineering

Large-scale text processing and TF-IDF feature engineering for NLP pipelines.
This namespace provides efficient tools for converting raw text documents into
machine learning-ready features using TF-IDF (Term Frequency-Inverse Document
Frequency) scoring. Designed to handle large text corpora with flexible memory
management strategies.
Core Functions:
**->tidy-text**
Parses text files or datasets into tidy-text format (one token per row).
Line-by-line processing enables handling of files larger than available RAM.
Supports custom tokenization and metadata extraction.
Output format: tech.v3.dataset with columns:
- `:document` (int): Document/line identifier
- `:token-idx` (int): Token as indexed integer (maps to lookup table)
- `:token-pos` (int): Position of token within document
- `:meta` (optional): Arbitrary metadata from line-split-fn
**->tfidf**
Transforms tidy-text into TF-IDF vector representation for bag-of-words models.
Calculates term frequency (TF) and inverse document frequency (IDF) for each token.
Output columns:
- `:document`
- `:token-idx`
- `:token-count`
- `:tf`
- `:tfid`
Memory Optimization:
The namespace provides flexible memory control for large texts via options:
Container Types:
- `:jvm-heap` (default): Java heap storage (fast, limited by heap)
- `:native-heap`: Off-heap native memory via tech.v3
- `:mmap`: Memory-mapped files (disk-backed, bypasses heap limits)
Processing Options:
- `container-type`: Storage for intermediate results during processing
- `column-container-type`: Storage for final output dataset
- `combine-method`: `:coalesce-blocks!` or `:concat-buffers` (tradeoffs)
- `compacting-document-interval`: Batch size for consolidating data
- `datatype-document/token-pos/idx`: Memory datatype selection (:int16 vs :int32)
Token Management:
- `token->index-map`: Custom token lookup table (can reuse across runs)
- `new-token-behaviour`: `:store` (default), `:fail`, or `:as-unknown`
Performance Characteristics:
- Typical text requires ~1.5x the original file size in RAM
- An 8GB text file typically needs ≥12GB total memory
- Scaling strategy: Use off-heap or mmap for large corpora
Typical Workflow:
1. Use `->tidy-text` to create tidy text format from raw documents
2. Use `->tfidf` to create TF-IDF feature vectors
3. Pass vectors to classification/regression models
See also: [[scicloj.metamorph.ml.column-metric]] for evaluation,
[[scicloj.metamorph.ml/train]] for model training
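The two-step workflow (tidy text first, then TF-IDF) can be illustrated on a tiny in-memory corpus with plain Clojure. This sketch does not call `->tidy-text` or `->tfidf`; it assumes the common definitions tf = token count / tokens in document and idf = ln(N / document frequency), which may differ from the library's exact formulas, and the map keys (including `:tfidf`) are chosen for illustration only.

```clojure
(require '[clojure.string :as str])

(def docs ["the cat sat" "the dog sat" "the cat ran fast"])

;; Step 1: one token per row, mirroring the tidy-text idea.
(def tidy
  (for [[doc-id text] (map-indexed vector docs)
        [pos token]   (map-indexed vector (str/split text #"\s+"))]
    {:document doc-id :token token :token-pos pos}))

;; Number of documents each token occurs in.
(def doc-freq
  (->> tidy
       (map (juxt :token :document))
       distinct
       (map first)
       frequencies))

;; Step 2: tf = count / document length; idf = ln(N / document frequency).
(def tfidf
  (let [n-docs  (count docs)
        doc-len (frequencies (map :document tidy))]
    (for [[[doc token] cnt] (frequencies (map (juxt :document :token) tidy))
          :let [tf  (/ cnt (double (doc-len doc)))
                idf (Math/log (/ n-docs (double (doc-freq token))))]]
      {:document doc :token token :token-count cnt :tf tf :tfidf (* tf idf)})))
```

A token such as "the" that appears in every document gets an idf of 0 under this definition, so it contributes nothing to the feature vector, while a token unique to one document gets the largest weight.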

Model output standardization and validation following tidymodels conventions.
This namespace implements the tidymodels philosophy (inspired by R's
tidymodels/broom packages) for standardized, machine-readable model outputs.
All model outputs conform to consistent schemas defined in canonical column
specification files.
Three Core Output Functions:
**glance**: One-row model summary
- High-level goodness-of-fit statistics
- Examples: R², AIC, BIC, log-likelihood, F-statistic, p-value
- Use case: Quick model performance overview
**tidy**: One-row-per-component output
- Component-level details (e.g., one row per coefficient)
- Examples: term, estimate, std.error, statistic, p.value
- Use case: Detailed model inspection and reporting
**augment**: One-row-per-observation output
- Adds model predictions/residuals to original data
- Original columns plus: .fitted, .resid, .hat, .sigma, .cooksd
- Use case: Diagnostics and visualization of predictions
Validation and Schema Management:
- `allowed-tidy-columns`: Canonical list of valid tidy column names
- `allowed-glance-columns`: Canonical list of valid glance column names
- `allowed-augment-columns`: Canonical list of valid augment column names
- `validate-tidy-ds`: Validates dataset conforms to tidy standard
- `validate-glance-ds`: Validates dataset conforms to glance standard
- `validate-augment-ds`: Validates dataset conforms to augment standard
Schemas are maintained in GitHub repository (resources/*.edn):
- columms-tidy.edn
- columms-glance.edn
- columms-augment.edn
Control Validation:
The `*validate-tidy-fns*` dynamic variable controls strict validation:
- `true` (default): Raises exception on invalid columns
- `false`: Silently allows any columns
Integration:
Model implementations use these validators in their tidy-fn/glance-fn/augment-fn
to ensure outputs conform to standardized schemas for consistency across models.
See also: `scicloj.metamorph.ml` for training and prediction,
`scicloj.metamorph.ml.regression` and `scicloj.metamorph.ml.classification`
for specific model implementations
Deprecated ns. Use scicloj.metamorph.ml.rdatasets instead