R-style formula-based feature engineering and linear regression.
This namespace provides tools to leverage R's powerful formula syntax for
feature engineering and linear modeling within Clojure. R formulas enable
expressive specification of interactions, transformations, and categorical
expansions without manual column manipulation.
Key Functions:
- `r-model-matrix`: Convert dataset + R formula to design matrix
- `lm`: Simplified linear regression using R formulas
Implementation Backends:
The namespace supports multiple R execution backends:
- `:ocpu` Remote R via OpenCPU (cloud.opencpu.org) - no local R needed
- `:renjine` Java-based R implementation (https://renjin.org/)
- `:clojisr` Local R via clojisr (requires R installation)
Model Matrix Capabilities:
R formulas handle:
- Basic features: `y ~ x1 + x2`
- Interactions: `y ~ x1 * x2` (expands to x1 + x2 + x1:x2)
- Polynomial terms: `y ~ x + I(x^2)`
- Categorical encoding: Automatic dummy variable creation
- Intercept control: `y ~ x - 1` (remove intercept)
- Exclusions: `y ~ . - x3` (all columns except x3)
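To make the expansions listed above concrete, here is a minimal sketch in plain Clojure of the columns R would derive for an interaction, a polynomial term, and a categorical variable. It does not call this namespace or any R backend; the helper names `expand-numeric` and `treatment-dummies` are hypothetical, and the dummy-variable naming assumes R's default treatment coding with an intercept present.

```clojure
;; Illustrative sketch only: plain-Clojure stand-ins for columns that R's
;; model.matrix derives from a formula. Not this namespace's implementation.

(defn expand-numeric
  "Mimics `~ x1 * x2 + I(x1^2)` for one row: keeps x1 and x2, adds the
  interaction x1:x2 and the squared term I(x1^2)."
  [row]
  (let [x1 (get row "x1")
        x2 (get row "x2")]
    (assoc row
           "x1:x2"   (* x1 x2)
           "I(x1^2)" (* x1 x1))))

(defn treatment-dummies
  "Mimics R's default treatment coding for a categorical column when an
  intercept is present: the first (sorted) level is the reference and gets
  no column; every other level gets a 0/1 indicator named <column><level>."
  [col-name values]
  (let [levels (rest (sort (distinct values)))]
    (mapv (fn [v]
            (into {}
                  (map (fn [level]
                         [(str col-name level) (if (= v level) 1.0 0.0)]))
                  levels))
          values)))

(def numeric-rows
  (mapv expand-numeric [{"x1" 1.0 "x2" 2.0}
                        {"x1" 3.0 "x2" 4.0}]))

(def species-rows
  (treatment-dummies "Species" ["setosa" "versicolor" "virginica"]))
```

With a real backend, R performs these expansions itself from the formula string, which is why no manual column manipulation is needed.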
Linear Regression (lm):
Combines formula-based feature engineering with OLS regression training.
Returns a ready-to-use trained model for predictions.
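As a rough illustration of what OLS training computes, the following sketch fits a line to a single predictor using the closed-form least-squares solution. It is an assumption-laden stand-in, not the fastmath model that `lm` returns; `mean` and `simple-ols` are hypothetical helpers written for this example.

```clojure
;; Illustrative sketch only: closed-form OLS for one predictor.
;; `lm` itself delegates to an OLS implementation; this is not that code.

(defn mean
  "Arithmetic mean of a non-empty sequence of numbers."
  [xs]
  (/ (reduce + xs) (double (count xs))))

(defn simple-ols
  "Returns {:intercept b0 :slope b1} minimizing the sum of squared
  residuals of y = b0 + b1 * x."
  [xs ys]
  (let [mx    (mean xs)
        my    (mean ys)
        sxy   (reduce + (map (fn [x y] (* (- x mx) (- y my))) xs ys))
        sxx   (reduce + (map (fn [x] (let [d (- x mx)] (* d d))) xs))
        slope (/ sxy sxx)]
    {:intercept (- my (* slope mx))
     :slope     slope}))

;; Data generated from y = 1 + 2x, so the fit should recover those values.
(def fit (simple-ols [1.0 2.0 3.0 4.0] [3.0 5.0 7.0 9.0]))
```

With several formula-derived columns the same idea generalizes to solving the least-squares problem over the whole design matrix.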
Example Usage:
(r-model-matrix iris-data "~ Sepal.Length + Sepal.Width" :renjine)
(lm iris-data "Sepal.Width ~ Sepal.Length * Petal.Length"
    :Sepal.Width :ocpu)
Notes:
- OpenCPU backend is convenient but requires internet connectivity
- Renjin is standalone but may have some R incompatibilities
- clojisr requires a local R installation but offers full R compatibility
- Returned model matrices exclude row names and intercept columns by default
See also: `scicloj.metamorph.ml.design-matrix` for Clojure-native feature engineering

(lm ds formula target-var formula-impl)

Train a linear model using an R-style formula.
This function combines R formula-based feature engineering with ordinary least squares (OLS) regression. It creates a design matrix from the input dataset using the specified R formula, then trains a linear model on the resulting features.
Parameters:
- `ds` A tech.ml.dataset dataset containing the input data with all
  variables referenced in the formula and target variable.
- `formula` A string containing the R formula (e.g., "y ~ x1 + x2 * x3").
  The formula is interpreted by the R backend.
- `target-var` A keyword or string naming the target variable for regression.
  This variable must be present in the input dataset.
- `formula-impl` An implementation keyword for formula evaluation:
  - `:ocpu` Uses OpenCPU (cloud.opencpu.org), no local R needed
  - `:renjine` Uses Renjin, a Java implementation of R
  - `:clojisr` Uses clojisr with local R installation
Returns:
A trained linear model (OLS from fastmath) ready for predictions. The model
excludes the intercept column and row names from the design matrix by default.
Example:
(lm iris-data "Sepal.Width ~ Sepal.Length + Petal.Length" :Sepal.Width :renjine)

(r-model-matrix ds r-formula impl)

Compute a model matrix from a dataset and an R-style formula.
Parameters:
- `ds` A tech.ml.dataset dataset representing the input data.
- `r-formula` A string containing the R formula to use for model matrix
  construction. The formula is interpreted by R itself, so it should be
  fully compatible with R's formula syntax.
- `impl` An implementation keyword, one of:
  - `:ocpu` Uses an online service https://www.opencpu.org/api.html (server: cloud.opencpu.org)
  - `:renjine` Uses https://renjin.org/
  - `:clojisr` Uses https://github.com/scicloj/clojisr, which requires a local R installation
Dispatches to the appropriate backend implementation and returns the
constructed design matrix wrapped in a map with:
- `:model-matrix-dataset` the TMD containing the design matrix specified by `r-formula`
- `:attributes` the (R) attributes of the model.matrix object
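Dispatching on the implementation keyword could be organized along the lines of the sketch below, using a Clojure multimethod. This is only a guess at one reasonable structure: `model-matrix-via` and the maps it returns are hypothetical, invented for illustration, and are not part of the documented API or its actual internals.

```clojure
;; Hypothetical sketch: `model-matrix-via` and its return values are invented
;; for illustration and do not reflect the namespace's real implementation.

(defmulti model-matrix-via
  "Selects a backend based on the implementation keyword."
  (fn [_ds _r-formula impl] impl))

(defmethod model-matrix-via :ocpu [_ds r-formula _impl]
  ;; A real backend would send the data and formula to an OpenCPU server.
  {:backend :ocpu :formula r-formula})

(defmethod model-matrix-via :renjine [_ds r-formula _impl]
  ;; A real backend would evaluate the formula with Renjin on the JVM.
  {:backend :renjine :formula r-formula})

(defmethod model-matrix-via :clojisr [_ds r-formula _impl]
  ;; A real backend would call a local R installation through clojisr.
  {:backend :clojisr :formula r-formula})

(defmethod model-matrix-via :default [_ds _r-formula impl]
  ;; Fail loudly on unsupported keywords instead of returning nil.
  (throw (ex-info "Unknown formula implementation" {:impl impl})))
```

Keeping each backend behind its own method means an unsupported keyword fails with a clear error rather than silently producing no matrix.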