R-style formula-based feature engineering and linear regression.

This namespace provides tools to leverage R's formula syntax for feature engineering and linear modeling within Clojure. R formulas enable expressive specification of interactions, transformations, and categorical expansions without manual column manipulation.

Key Functions:
- `r-model-matrix`: Convert dataset + R formula to design matrix
- `lm`: Simplified linear regression using R formulas

Implementation Backends:
The namespace supports multiple R execution backends:
- `:ocpu` Remote R via OpenCPU (cloud.opencpu.org) - no local R needed
- `:renjin` Java-based R implementation (https://renjin.org/)
- `:clojisr` Local R via clojisr (requires R installation)

Model Matrix Capabilities:
R formulas handle:
- Basic features: `y ~ x1 + x2`
- Interactions: `y ~ x1 * x2` (expands to x1 + x2 + x1:x2)
- Polynomial terms: `y ~ x + I(x^2)`
- Categorical encoding: Automatic dummy variable creation
- Intercept control: `y ~ x - 1` (remove intercept)
- Exclusions: `y ~ . - x3` (all columns except x3)

Linear Regression (lm):
Combines formula-based feature engineering with OLS regression training. Returns a ready-to-use trained model for predictions.

Notes:
- OpenCPU backend is convenient but requires internet connectivity
- Renjin is standalone but may have some R incompatibilities
- clojisr requires a local R installation but offers full R compatibility
- Returned model matrices exclude row names and intercept columns by default

See also: [[scicloj.metamorph.ml.design-matrix]] for Clojure-native feature engineering
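To make the expansion rules concrete, the sketch below builds by hand the feature columns that a formula such as `y ~ x1 * x2` stands for when both predictors are numeric: the two main effects plus their product as the interaction term `x1:x2`. It uses plain Clojure maps and does not call this namespace. The helper name `expand-interaction` and the sample rows are hypothetical, chosen only for illustration.

```clojure
;; Hand-rolled illustration of the columns implied by `y ~ x1 * x2`
;; for numeric predictors: x1, x2 and the interaction x1:x2 (their product).
;; `expand-interaction` is a hypothetical helper, not part of this namespace.
(defn expand-interaction
  [{:keys [x1 x2]}]
  {"x1"    x1
   "x2"    x2
   "x1:x2" (* x1 x2)})

;; Made-up sample rows; :y is the target and is not part of the features.
(def rows
  [{:x1 1 :x2 4 :y 10}
   {:x1 2 :x2 5 :y 20}
   {:x1 3 :x2 6 :y 30}])

;; One feature map per input row, with no intercept column.
(def design-rows (mapv expand-interaction rows))
```

The example leaves out an intercept column, in line with the note above that returned model matrices exclude it by default.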
(lm ds formula target-var formula-impl)

Train a linear model using an R-style formula.

This function combines R formula-based feature engineering with ordinary least squares (OLS) regression. It creates a design matrix from the input dataset using the specified R formula, then trains a linear model on the resulting features.

Parameters:
- `ds` A tech.ml.dataset dataset containing the input data, including all variables referenced in the formula and the target variable.
- `formula` A string containing the R formula (e.g., "y ~ x1 + x2 * x3"). The formula is interpreted by the R backend.
- `target-var` A keyword or string naming the target variable for regression. This variable must be present in the input dataset.
- `formula-impl` An implementation keyword for formula evaluation:
  - `:ocpu` Uses OpenCPU (cloud.opencpu.org), no local R needed
  - `:renjin` Uses Renjin, a Java implementation of R
  - `:clojisr` Uses clojisr with local R installation

Each engine requires its dependencies to be set up; see `r-model-matrix`.

Returns:
A trained linear model (OLS from fastmath) ready for predictions. The model excludes the intercept column and row names from the design matrix by default.
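As a rough picture of what OLS training does once the design matrix exists, the sketch below fits a single-predictor model with the closed-form least-squares solution in plain Clojure. It is not how fastmath implements its model and it does not call this namespace. The helpers `mean` and `fit-simple-ols` are hypothetical names introduced for illustration.

```clojure
;; Closed-form OLS for one predictor:
;;   slope     = sum((x - mean-x) * (y - mean-y)) / sum((x - mean-x)^2)
;;   intercept = mean-y - slope * mean-x
;; Both helpers are hypothetical and exist only for this illustration.
(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn fit-simple-ols
  [xs ys]
  (let [mx    (mean xs)
        my    (mean ys)
        sxy   (reduce + (map (fn [x y] (* (- x mx) (- y my))) xs ys))
        sxx   (reduce + (map (fn [x] (let [d (- x mx)] (* d d))) xs))
        slope (/ sxy sxx)]
    {:intercept (- my (* slope mx))
     :slope     slope}))

;; Data generated from y = 1 + 2x, so the fit recovers those coefficients.
(def fit (fit-simple-ols [1.0 2.0 3.0 4.0] [3.0 5.0 7.0 9.0]))

;; With this namespace, the analogous call would follow the documented
;; signature, e.g. (lm ds "y ~ x" :y :clojisr), assuming a dataset `ds`
;; with columns :x and :y and the clojisr dependency set up.
```

For formulas with several terms, the same least-squares idea applies to every column of the design matrix at once, which is the part `lm` delegates to the trained OLS model.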
(r-model-matrix dataset r-formula impl)

Compute a model matrix from a dataset and an R-style formula.

Parameters:
- `dataset` A tech.ml.dataset dataset representing the input data.
- `r-formula` A string containing the R formula to use for model matrix construction. The formula is interpreted by R itself, so it should be fully compatible with R's formula syntax.
- `impl` An implementation keyword, one of:
  - `:ocpu` Uses an online service https://www.opencpu.org/api.html (server: cloud.opencpu.org)
  - `:renjin` Uses https://renjin.org/
  - `:clojisr` Uses https://github.com/scicloj/clojisr, which requires a local R installation

Each implementation requires dependencies to be added:
- `:ocpu` : [opencpu-clj/opencpu-clj "0.3.1"]
- `:renjin` : [org.renjin/renjin-script-engine "3.5-beta76"]
- `:clojisr` : [scicloj/clojisr "1.1.0"]

Dispatches to the appropriate backend implementation.

Returns a map with
- `:model-matrix-dataset` the TMD containing the design matrix specified by `r-formula`. If `dataset` contains `target` columns, they are added to this dataset.
- `:attributes` the (R) attributes of the model.matrix object
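The dependency coordinates above are written in Leiningen vector style. For projects that use the Clojure CLI, the same coordinates could be expressed in `deps.edn` roughly as in the fragment below. This is a sketch under the assumption that the artifacts resolve from your configured Maven repositories; add only the entry for the backend you intend to use.

```clojure
;; deps.edn fragment (sketch): the coordinates listed above, translated to
;; tools.deps syntax. Keep only the backend you actually need.
{:deps
 {;; :ocpu backend
  opencpu-clj/opencpu-clj          {:mvn/version "0.3.1"}
  ;; :renjin backend (assumption: an extra Maven repository may be
  ;; required; check the Renjin documentation)
  org.renjin/renjin-script-engine  {:mvn/version "3.5-beta76"}
  ;; :clojisr backend (also needs a local R installation)
  scicloj/clojisr                  {:mvn/version "1.1.0"}}}
```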