
zero-one.geni.ml.regression


aft-survival-regression (clj)

(aft-survival-regression params)

Fit a parametric survival regression model named accelerated failure time (AFT) model
(see Accelerated failure time model (Wikipedia))
based on the Weibull distribution of the survival time.

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.html
Timestamp: 2020-10-02T14:21:20.345Z
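
A minimal usage sketch, assuming geni's usual params-map convention where kebab-case keys map onto the Spark params (here censorCol, quantileProbabilities and quantilesCol) and that the constructor is reachable via zero-one.geni.ml, as in geni's README; the training dataset is left hypothetical:

  (require '[zero-one.geni.ml :as ml])

  ;; Construct the estimator; kebab-case keys are assumed to map onto
  ;; Spark's AFTSurvivalRegression params.
  (def aft
    (ml/aft-survival-regression
      {:censor-col             "censor"    ; 1.0 = event observed, 0.0 = censored
       :quantile-probabilities [0.3 0.6]
       :quantiles-col          "quantiles"}))

  ;; Fitting expects a dataset with features, label and censor columns:
  ;; (def model (ml/fit training aft))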

decision-tree-regressor (clj)

(decision-tree-regressor params)

Decision tree learning algorithm for regression. It supports both continuous and categorical features.

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.html
Timestamp: 2020-10-02T14:21:20.720Z
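
A construction sketch under the same assumptions; :max-depth, :max-bins and :min-instances-per-node are assumed to map onto the Spark params of the same (camelCase) names:

  (require '[zero-one.geni.ml :as ml])

  ;; Tree-shape controls, assumed to mirror Spark's maxDepth / maxBins /
  ;; minInstancesPerNode params.
  (def dt
    (ml/decision-tree-regressor
      {:max-depth              5
       :max-bins               32
       :min-instances-per-node 1}))

  ;; (def model (ml/fit training dt))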

fm-regressor (clj)

(fm-regressor params)

Factorization Machines learning algorithm for regression.
It supports normal gradient descent and the AdamW solver. The implementation is based upon:

S. Rendle, "Factorization Machines", 2010.

FM is able to estimate interactions even in problems with huge sparsity
(like advertising and recommendation systems).
The FM formula is:

  $$
  \begin{align}
  y = w_0 + \sum\limits^n_{i=1} w_i x_i +
    \sum\limits^n_{i=1} \sum\limits^n_{j=i+1} \langle v_i, v_j \rangle x_i x_j
  \end{align}
  $$

The first two terms denote the global bias and the linear term (the same as in linear
regression), and the last term denotes the pairwise interactions term. v_i describes
the i-th variable with k factors.

The FM regression model uses MSE loss, which can be solved by gradient descent, and
regularization terms like L2 are usually added to the loss function to prevent overfitting.

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/FMRegressor.html
Timestamp: 2020-10-02T14:21:21.102Z
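
A hedged sketch tying the params map to the formula above: :factor-size is assumed to set k, the number of factors per variable, and :solver to choose between "gd" and "adamW" as in the underlying Spark class:

  (require '[zero-one.geni.ml :as ml])

  ;; Assumed keys mirroring Spark's factorSize / solver / stepSize / regParam.
  (def fm
    (ml/fm-regressor
      {:factor-size 8          ; k factors per variable in the FM formula
       :solver      "adamW"    ; or "gd" for plain gradient descent
       :step-size   0.01
       :reg-param   0.01}))    ; L2-style regularization against overfitting

  ;; (def model (ml/fit training fm))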

gbt-regressor (clj)

(gbt-regressor params)

Gradient-Boosted Trees (GBTs) learning algorithm for regression.
It supports both continuous and categorical features.

The implementation is based upon: J.H. Friedman, "Stochastic Gradient Boosting", 1999.

Notes on Gradient Boosting vs. TreeBoost:
- This implementation is for Stochastic Gradient Boosting, not for TreeBoost.
- Both algorithms learn tree ensembles by minimizing loss functions.
- TreeBoost (Friedman, 1999) additionally modifies the outputs at tree leaf nodes
  based on the loss function, whereas the original gradient boosting method does not.
- When the loss is SquaredError, these methods give the same result, but they could
  differ for other loss functions.
- We expect to implement TreeBoost in the future:
  https://issues.apache.org/jira/browse/SPARK-4240

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/GBTRegressor.html
Timestamp: 2020-10-02T14:21:21.485Z
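
A construction sketch with assumed keys mirroring Spark's maxIter, maxDepth, stepSize and subsamplingRate params; a subsampling rate below 1.0 is what makes the boosting "stochastic":

  (require '[zero-one.geni.ml :as ml])

  ;; :max-iter is the number of boosting rounds (trees in the ensemble).
  (def gbt
    (ml/gbt-regressor
      {:max-iter         20
       :max-depth        5
       :step-size        0.1     ; learning rate
       :subsampling-rate 0.8}))  ; fraction of data per boosting round

  ;; (def model (ml/fit training gbt))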

generalised-linear-regression (clj)

(generalised-linear-regression params)

Fit a Generalized Linear Model
(see Generalized linear model (Wikipedia))
specified by giving a symbolic description of the linear
predictor (link function) and a description of the error distribution (family).
It supports "gaussian", "binomial", "poisson", "gamma" and "tweedie" as family.
Valid link functions for each family are listed below. The first link function of each
family is the default one.

- "gaussian" : "identity", "log", "inverse"
- "binomial" : "logit", "probit", "cloglog"
- "poisson"  : "log", "identity", "sqrt"
- "gamma"    : "inverse", "identity", "log"
- "tweedie"  : power link function specified through "linkPower". The default link power
  in the tweedie family is 1 - variancePower.

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.html
Timestamp: 2020-10-02T14:21:21.946Z

generalized-linear-regression (clj)

(generalized-linear-regression params)

Fit a Generalized Linear Model
(see Generalized linear model (Wikipedia))
specified by giving a symbolic description of the linear
predictor (link function) and a description of the error distribution (family).
It supports "gaussian", "binomial", "poisson", "gamma" and "tweedie" as family.
Valid link functions for each family are listed below. The first link function of each
family is the default one.

- "gaussian" : "identity", "log", "inverse"
- "binomial" : "logit", "probit", "cloglog"
- "poisson"  : "log", "identity", "sqrt"
- "gamma"    : "inverse", "identity", "log"
- "tweedie"  : power link function specified through "linkPower". The default link power
  in the tweedie family is 1 - variancePower.

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.html
Timestamp: 2020-10-02T14:21:21.946Z

glm (clj)

(glm params)

Fit a Generalized Linear Model
(see Generalized linear model (Wikipedia))
specified by giving a symbolic description of the linear
predictor (link function) and a description of the error distribution (family).
It supports "gaussian", "binomial", "poisson", "gamma" and "tweedie" as family.
Valid link functions for each family are listed below. The first link function of each
family is the default one.

- "gaussian" : "identity", "log", "inverse"
- "binomial" : "logit", "probit", "cloglog"
- "poisson"  : "log", "identity", "sqrt"
- "gamma"    : "inverse", "identity", "log"
- "tweedie"  : power link function specified through "linkPower". The default link power
  in the tweedie family is 1 - variancePower.

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.html
Timestamp: 2020-10-02T14:21:21.946Z
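
The three constructors above (generalised-linear-regression, generalized-linear-regression and glm) share this docstring and point at the same Spark class, so one sketch covers them all; the keys are assumed to mirror Spark's family, link, maxIter and regParam params:

  (require '[zero-one.geni.ml :as ml])

  ;; A Poisson regression with the (default) log link.
  (def poisson-glm
    (ml/generalized-linear-regression
      {:family    "poisson"
       :link      "log"
       :max-iter  25
       :reg-param 0.3}))

  ;; The tweedie family takes a :link-power instead of a :link, e.g.
  ;; (ml/glm {:family "tweedie" :variance-power 1.5 :link-power 0.0})

  ;; (def model (ml/fit training poisson-glm))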

isotonic-regression (clj)

(isotonic-regression params)

Isotonic regression.

Currently implemented using a parallelized pool-adjacent-violators algorithm.
Only the univariate (single feature) algorithm is supported.

Uses org.apache.spark.mllib.regression.IsotonicRegression.

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/IsotonicRegression.html
Timestamp: 2020-10-02T14:21:22.305Z
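
A construction sketch, assuming the keys mirror Spark's isotonic and featureIndex params:

  (require '[zero-one.geni.ml :as ml])

  ;; :isotonic true fits a non-decreasing sequence (false for antitonic);
  ;; :feature-index picks the single feature used when the features column
  ;; holds vectors.
  (def iso
    (ml/isotonic-regression
      {:isotonic      true
       :feature-index 0}))

  ;; (def model (ml/fit training iso))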

linear-regression (clj)

(linear-regression params)

Linear regression.

The learning objective is to minimize the specified loss function, with regularization.
This supports two kinds of loss:
- squaredError (a.k.a. squared loss)
- huber (a hybrid of squared error for relatively small errors and absolute error for
  relatively large ones, where we estimate the scale parameter from training data)

This supports multiple types of regularization:
- none (a.k.a. ordinary least squares)
- L2 (ridge regression)
- L1 (Lasso)
- L2 + L1 (elastic net)

The squared error objective function is:

  $$
  \begin{align}
  \min_{w}\frac{1}{2n}{\sum_{i=1}^n(X_{i}w - y_{i})^{2} +
  \lambda\left[\frac{1-\alpha}{2}{||w||_{2}}^{2} + \alpha{||w||_{1}}\right]}
  \end{align}
  $$

The huber objective function is:

  $$
  \begin{align}
  \min_{w, \sigma}\frac{1}{2n}{\sum_{i=1}^n\left(\sigma +
  H_m\left(\frac{X_{i}w - y_{i}}{\sigma}\right)\sigma\right) + \frac{1}{2}\lambda {||w||_2}^2}
  \end{align}
  $$

where

  $$
  \begin{align}
  H_m(z) = \begin{cases}
           z^2, & \text{if } |z| < \epsilon, \\
           2\epsilon|z| - \epsilon^2, & \text{otherwise}
           \end{cases}
  \end{align}
  $$

Note: Fitting with huber loss only supports none and L2 regularization.

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/LinearRegression.html
Timestamp: 2020-10-02T14:21:22.713Z
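
A fuller sketch modelled on geni's README fit pattern; the libsvm path is a placeholder, and :elastic-net-param is assumed to set the alpha that mixes the L1 and L2 terms in the elastic-net objective above:

  (require '[zero-one.geni.core :as g]
           '[zero-one.geni.ml :as ml])

  ;; Placeholder path; any dataset with features and label columns works.
  (def training
    (g/read-libsvm! "data/sample_linear_regression_data.txt"))

  ;; alpha = 0.8 mixes L1 and L2 as in the objective above; :loss "huber"
  ;; would switch objectives (huber supports only none and L2 regularization).
  (def lr
    (ml/linear-regression
      {:max-iter          10
       :reg-param         0.3
       :elastic-net-param 0.8}))

  (def model (ml/fit training lr))

  ;; The fitted model is a plain Spark LinearRegressionModel, so Java interop
  ;; applies: (.coefficients model), (.intercept model)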

random-forest-regressor (clj)

(random-forest-regressor params)

Random Forest learning algorithm for regression. It supports both continuous and categorical features.

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/RandomForestRegressor.html
Timestamp: 2020-10-02T14:21:23.298Z
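
A construction sketch with assumed keys mirroring Spark's numTrees, maxDepth and featureSubsetStrategy params:

  (require '[zero-one.geni.ml :as ml])

  ;; "auto" lets Spark pick how many features each tree considers per split.
  (def rf
    (ml/random-forest-regressor
      {:num-trees               50
       :max-depth               5
       :feature-subset-strategy "auto"}))

  ;; (def model (ml/fit training rf))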
