R-style formula-based feature engineering and linear regression.
This namespace provides tools to leverage R's powerful formula syntax for
feature engineering and linear modeling within Clojure. R formulas enable
expressive specification of interactions, transformations, and categorical
expansions without manual column manipulation.
Key Functions:
- `r-model-matrix`: Convert dataset + R formula to design matrix
- `lm`: Simplified linear regression using R formulas
Implementation Backends:
The namespace supports multiple R execution backends:
- `:ocpu` Remote R via OpenCPU (cloud.opencpu.org) - no local R needed
- `:renjin` Java-based R implementation (https://renjin.org/)
- `:clojisr` Local R via clojisr (requires R installation)
Model Matrix Capabilities:
R formulas handle:
- Basic features: `y ~ x1 + x2`
- Interactions: `y ~ x1 * x2` (expands to x1 + x2 + x1:x2)
- Polynomial terms: `y ~ x + I(x^2)`
- Categorical encoding: Automatic dummy variable creation
- Intercept control: `y ~ x - 1` (remove intercept)
- Exclusions: `y ~ . - x3` (all columns except x3)
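To make the expansion rules above concrete, here is a minimal pure-Clojure sketch of the columns that a formula such as `y ~ x1 * x2` and a categorical predictor would produce. The helpers `expand-row` and `dummy-columns` are hypothetical and are not part of this namespace; R does the real expansion. The sketch assumes R's default treatment contrasts, where levels are sorted alphabetically and the first level is the baseline.

```clojure
;; Hypothetical helper: hand-expands "y ~ x1 * x2" for a single row.
;; `x1 * x2` means the main effects x1 and x2 plus their interaction x1:x2.
(defn expand-row
  "Returns the design-matrix columns for one row (a map with :x1 and :x2)."
  [{:keys [x1 x2]}]
  {"(Intercept)" 1.0
   "x1"          x1
   "x2"          x2
   "x1:x2"       (* x1 x2)})

;; Hypothetical helper: treatment-style dummy coding for one categorical value.
;; Assumption: levels are sorted alphabetically and the first level is the
;; baseline, so it gets no column of its own.
(defn dummy-columns
  "Returns one 0/1 column per non-baseline level of a categorical variable."
  [col-name value levels]
  (into {}
        (for [level (rest (sort levels))]
          [(str col-name level) (if (= value level) 1.0 0.0)])))

(def rows [{:x1 1.0 :x2 2.0}
           {:x1 3.0 :x2 4.0}])

;; One map of column name -> value per input row.
(def design-matrix (mapv expand-row rows))
```

Calling `(dummy-columns "species" "virginica" ["setosa" "versicolor" "virginica"])` yields a `1.0` in the `"speciesvirginica"` column and `0.0` in `"speciesversicolor"`, with `setosa` absorbed into the intercept.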
Linear Regression (lm):
Combines formula-based feature engineering with OLS regression training.
Returns a ready-to-use trained model for predictions.
Example Usage:
(r-model-matrix iris-data "~ Sepal.Length + Sepal.Width" :renjin)
(lm iris-data "Sepal.Width ~ Sepal.Length * Petal.Length"
:Sepal.Width :ocpu)
Notes:
- OpenCPU backend is convenient but requires internet connectivity
- Renjin is standalone but may have some R incompatibilities
- clojisr requires a local R installation but offers full R compatibility
- Returned model matrices exclude row names and intercept columns by default
See also: `scicloj.metamorph.ml.design-matrix` for Clojure-native feature engineering
(add-clojisr-dependency classloader-or-nil)
Dynamically adds `clojisr` to the classpath using pomegranate. This might not work in all situations.
(add-opencpu-dependency classloader-or-nil)
Dynamically adds `opencpu-clj` to the classpath using pomegranate. This might not work in all situations.
(add-renjin-deps classloader-or-nil)
Dynamically adds `renjin` to the classpath using pomegranate. This might not work in all situations.
(lm ds formula target-var formula-impl)
Train a linear model using an R-style formula.
This function combines R formula-based feature engineering with ordinary least
squares (OLS) regression. It creates a design matrix from the input dataset using
the specified R formula, then trains a linear model on the resulting features.
Parameters:
- `ds` A tech.ml.dataset dataset containing the input data with all
variables referenced in the formula and target variable.
- `formula` A string containing the R formula (e.g., "y ~ x1 + x2 * x3").
The formula is interpreted by the R backend.
- `target-var` A keyword or string naming the target variable for regression.
This variable must be present in the input dataset.
- `formula-impl` An implementation keyword for formula evaluation:
- `:ocpu` Uses OpenCPU (cloud.opencpu.org), no local R needed
- `:renjin` Uses Renjin, a Java implementation of R
- `:clojisr` Uses clojisr with local R installation
Requires the chosen engine's dependencies to be set up; see `r-model-matrix`.
Returns:
A trained linear model (OLS from fastmath) ready for predictions. The model
excludes the intercept column and row names from the design matrix by default.
Example:
```
(lm iris-data "Sepal.Width ~ Sepal.Length + Petal.Length" :Sepal.Width :renjin)
```
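As a rough illustration of the OLS step that follows the design-matrix construction, the sketch below fits a single-predictor model in plain Clojure. It is not how `lm` is implemented (the docstring says the model comes from fastmath); `simple-ols` and `predict` are hypothetical names, and the sketch fits the intercept explicitly using the closed-form solution.

```clojure
(defn mean
  "Arithmetic mean of a non-empty collection of numbers."
  [xs]
  (/ (reduce + xs) (double (count xs))))

;; Hypothetical helper: closed-form OLS for y = intercept + slope * x.
;; slope = sum((x - mean-x)(y - mean-y)) / sum((x - mean-x)^2)
(defn simple-ols
  "Fits a one-predictor linear model. Returns {:intercept .. :slope ..}."
  [xs ys]
  (let [mx    (mean xs)
        my    (mean ys)
        sxy   (reduce + (map (fn [x y] (* (- x mx) (- y my))) xs ys))
        sxx   (reduce + (map (fn [x] (let [d (- x mx)] (* d d))) xs))
        slope (/ sxy sxx)]
    {:intercept (- my (* slope mx))
     :slope     slope}))

(defn predict
  "Applies a fitted model from `simple-ols` to a new x value."
  [{:keys [intercept slope]} x]
  (+ intercept (* slope x)))

;; Data generated from y = 1 + 2x, so the fit should recover those values.
(def model (simple-ols [1 2 3 4] [3 5 7 9]))
```

With several predictors the same idea generalises to solving the least-squares problem over all design-matrix columns, which is the part delegated to the regression library.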
(r-model-matrix ds r-formula impl)
Compute a model matrix from a dataset and an R-style formula.
Parameters:
- `ds` A tech.ml.dataset dataset representing the input data.
- `r-formula` A string containing the R formula to use for model matrix construction.
The formula is interpreted by R itself, so it should be fully compatible with R's formula syntax.
- `impl` An implementation keyword, one of:
- `:ocpu` Uses an online service https://www.opencpu.org/api.html (server: cloud.opencpu.org)
- `:renjin` Uses https://renjin.org/
- `:clojisr` Uses https://github.com/scicloj/clojisr, which requires a local R installation
Each implementation requires dependencies to be added:
- `:ocpu` : [opencpu-clj/opencpu-clj "0.3.1"]
- `:renjin` : [org.renjin/renjin-script-engine "3.5-beta76"]
- `:clojisr` : [scicloj/clojisr "1.1.0"]
Returns a dataset containing the constructed design matrix.
If `ds` contains `target` columns, they are added to the returned dataset.
Dispatches to the appropriate backend implementation.
Returns a map with
- `:model-matrix-dataset` the TMD containing the design matrix specified by `r-formula`
- `:attributes` the (R) attributes of the model.matrix object
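The dependency coordinates above are written in Leiningen vector form. For projects using the Clojure CLI, the same coordinates could be expressed in `deps.edn` roughly as follows. This is a sketch of a configuration fragment rather than a tested setup: only the backend you actually use needs to be listed, and whether an extra Maven repository is required for a given backend (Renjin in particular) is something to confirm in that project's own documentation.

```clojure
;; deps.edn fragment (assumed layout; add only the backend you need)
{:deps {;; :ocpu backend
        opencpu-clj/opencpu-clj         {:mvn/version "0.3.1"}
        ;; :renjin backend
        org.renjin/renjin-script-engine {:mvn/version "3.5-beta76"}
        ;; :clojisr backend (also needs a local R installation)
        scicloj/clojisr                 {:mvn/version "1.1.0"}}}
```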