zero-one.geni.ml


aft-survival-regressionclj

(aft-survival-regression params)

Fit a parametric survival regression model named accelerated failure time (AFT) model
(see Accelerated failure time model (Wikipedia))
based on the Weibull distribution of the survival time.

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.html
Timestamp: 2020-10-02T14:21:20.345Z
source
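
A minimal usage sketch, not taken from the library's documentation: it assumes kebab-case parameter keys mirroring Spark's AFTSurvivalRegression params (e.g. :censor-col for censorCol) and a hypothetical dataframe training-df with "features", "label" and "censor" columns; fit and transform are documented further down this page.

  (require '[zero-one.geni.ml :as ml])

  ;; Build the estimator from a params map, fit it, and score the training data.
  (def aft       (ml/aft-survival-regression {:censor-col "censor"}))
  (def aft-model (ml/fit training-df aft))
  (ml/transform training-df aft-model)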

alsclj

(als params)
source

alternating-least-squaresclj

source

approx-nearest-neighboursclj

(approx-nearest-neighbours dataset model key-v n-nearest)
(approx-nearest-neighbours dataset model key-v n-nearest dist-col)
source

approx-similarity-joinclj

(approx-similarity-join dataset-a dataset-b model threshold)
(approx-similarity-join dataset-a dataset-b model threshold dist-col)
source

association-rulesclj

(association-rules model)
source

best-modelclj

(best-model model)
source

binariserclj

(binariser params)
source

binarizerclj

source

binary-classification-evaluatorclj

(binary-classification-evaluator params)
source

binary-summaryclj

(binary-summary model)
source

bisecting-k-meansclj

(bisecting-k-means params)
source

boundariesclj

(boundaries model)
source

bucketed-random-projection-lshclj

(bucketed-random-projection-lsh params)
source

bucketiserclj

(bucketiser params)
source

bucketizerclj

source

category-mapsclj

(category-maps model)
source

category-sizesclj

(category-sizes model)
source

chi-sq-selectorclj

(chi-sq-selector params)
source

chi-square-testclj

(chi-square-test dataframe features-col label-col)
source

cluster-centersclj

(cluster-centers model)
source

clustering-evaluatorclj

(clustering-evaluator params)
source

coefficient-matrixclj

(coefficient-matrix model)
source

coefficientsclj

(coefficients model)
source

corrcljmultimethod

source

count-vectoriserclj

(count-vectoriser params)
source

count-vectorizerclj

source

cross-validatorclj

(cross-validator {:keys [estimator evaluator estimator-param-maps num-folds seed
                         parallelism]})
source
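
A hedged tuning sketch built from the keys shown above; grid-spec is a hypothetical placeholder because the exact grid format accepted by param-grid (documented later on this page) is not shown here, and the empty params maps are assumed to fall back to Spark defaults.

  (require '[zero-one.geni.ml :as ml])

  (def cv
    (ml/cross-validator
     {:estimator            (ml/logistic-regression {})
      :evaluator            (ml/binary-classification-evaluator {})
      :estimator-param-maps (ml/param-grid grid-spec) ;; grid-spec: hypothetical grid definition
      :num-folds            3}))

  ;; Fitting runs k-fold cross-validation; best-model extracts the winning model.
  (def cv-model (ml/fit training-df cv))
  (ml/best-model cv-model)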

dctclj

source

decision-tree-classifierclj

(decision-tree-classifier params)
source

decision-tree-regressorclj

(decision-tree-regressor params)

Decision tree learning algorithm for regression.
It supports both continuous and categorical features.

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.html
Timestamp: 2020-10-02T14:21:20.720Z
source
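
A short sketch under the same assumptions as the earlier examples (kebab-case keys such as :max-depth for Spark's maxDepth, a hypothetical training-df); depth and feature-importances are accessors documented elsewhere on this page and are assumed to apply to the fitted model.

  (require '[zero-one.geni.ml :as ml])

  (def dt       (ml/decision-tree-regressor {:max-depth 5}))
  (def dt-model (ml/fit training-df dt))

  (ml/depth dt-model)                ;; depth of the learned tree
  (ml/feature-importances dt-model)  ;; per-feature importance vector
  (ml/transform training-df dt-model)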

depthclj

(depth model)
source

describe-topicsclj

source

discrete-cosine-transformclj

(discrete-cosine-transform params)
source

distributed?clj

source

elementwise-productclj

(elementwise-product params)
source

estimated-doc-concentrationclj

(estimated-doc-concentration model)
source

evaluateclj

(evaluate dataframe evaluator)
source

feature-hasherclj

(feature-hasher params)
source

feature-importancesclj

(feature-importances model)
source

features-colclj

source

find-frequent-sequential-patternsclj

(find-frequent-sequential-patterns dataset prefix-span)
source

find-patternsclj

source

fitclj

(fit dataframe estimator)
source

fm-classifierclj

(fm-classifier params)
source

fm-regressorclj

(fm-regressor params)

Factorization Machines learning algorithm for regression.
It supports normal gradient descent and the AdamW solver. The implementation is based upon:
S. Rendle. "Factorization machines." 2010.

FM is able to estimate interactions even in problems with huge sparsity
(like advertising and recommendation systems). The FM formula is:

  $$
  \begin{align}
  y = w_0 + \sum\limits^n_{i=1} w_i x_i +
    \sum\limits^n_{i=1} \sum\limits^n_{j=i+1} \langle v_i, v_j \rangle x_i x_j
  \end{align}
  $$

The first two terms denote the global bias and the linear term (the same as in linear regression),
and the last term denotes the pairwise interactions term. v_i describes the i-th variable
with k factors. The FM regression model uses MSE loss, which can be solved by gradient descent,
and regularization terms like L2 are usually added to the loss function to prevent overfitting.

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/FMRegressor.html
Timestamp: 2020-10-02T14:21:21.102Z
source
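
A hedged sketch; :factor-size and :step-size are assumed kebab-case counterparts of Spark's factorSize and stepSize, and training-df is a hypothetical dataframe with "features" and "label" columns.

  (require '[zero-one.geni.ml :as ml])

  ;; The factor size k corresponds to the length of the v_i vectors in the formula above.
  (def fm       (ml/fm-regressor {:factor-size 8 :step-size 0.01}))
  (def fm-model (ml/fit training-df fm))
  (ml/transform training-df fm-model)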

fp-growthclj

(fp-growth params)
source

freq-itemsetsclj

source

frequent-item-setsclj

(frequent-item-sets model)
source

frequent-pattern-growthclj

source

gaussian-mixtureclj

(gaussian-mixture params)
source

gaussians-dfclj

(gaussians-df model)
source

gbt-classifierclj

(gbt-classifier params)
source

gbt-regressorclj

(gbt-regressor params)

Gradient-Boosted Trees (GBTs) learning algorithm for regression.
It supports both continuous and categorical features.
The implementation is based upon: J.H. Friedman. "Stochastic Gradient Boosting." 1999.

Notes on Gradient Boosting vs. TreeBoost:
This implementation is for Stochastic Gradient Boosting, not for TreeBoost.
Both algorithms learn tree ensembles by minimizing loss functions.
TreeBoost (Friedman, 1999) additionally modifies the outputs at tree leaf nodes
based on the loss function, whereas the original gradient boosting method does not.
When the loss is SquaredError, these methods give the same result, but they could differ
for other loss functions.
We expect to implement TreeBoost in the future:
https://issues.apache.org/jira/browse/SPARK-4240

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/GBTRegressor.html
Timestamp: 2020-10-02T14:21:21.485Z
source
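
A hedged sketch with assumed kebab-case keys (:max-iter for maxIter, :max-depth for maxDepth) and a hypothetical training-df; trees and tree-weights are accessors documented elsewhere on this page and are assumed to work on the fitted ensemble.

  (require '[zero-one.geni.ml :as ml])

  (def gbt       (ml/gbt-regressor {:max-iter 20 :max-depth 4}))
  (def gbt-model (ml/fit training-df gbt))

  (ml/tree-weights gbt-model)  ;; weight of each boosting stage
  (ml/trees gbt-model)         ;; the individual decision trees
  (ml/transform training-df gbt-model)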

generalised-linear-regressionclj

(generalised-linear-regression params)

Fit a Generalized Linear Model
(see Generalized linear model (Wikipedia))
specified by giving a symbolic description of the linear
predictor (link function) and a description of the error distribution (family).
It supports "gaussian", "binomial", "poisson", "gamma" and "tweedie" as family.
Valid link functions for each family are listed below. The first link function of each family
is the default one.

"gaussian" : "identity", "log", "inverse"
"binomial" : "logit", "probit", "cloglog"
"poisson"  : "log", "identity", "sqrt"
"gamma"    : "inverse", "identity", "log"
"tweedie"  : power link function specified through "linkPower". The default link power in
             the tweedie family is 1 - variancePower.

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.html
Timestamp: 2020-10-02T14:21:21.946Z
source
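
A hedged sketch using the families and links listed above; the keys :family, :link and :max-iter are assumed kebab-case counterparts of Spark's params, and training-df is hypothetical. The same shape applies to the generalized-linear-regression and glm aliases below.

  (require '[zero-one.geni.ml :as ml])

  (def glr       (ml/generalised-linear-regression
                  {:family "gamma" :link "log" :max-iter 25}))
  (def glr-model (ml/fit training-df glr))

  (ml/coefficients glr-model)  ;; fitted coefficient vector
  (ml/intercept glr-model)
  (ml/transform training-df glr-model)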

generalized-linear-regressionclj

(generalized-linear-regression params)

Fit a Generalized Linear Model
(see Generalized linear model (Wikipedia))
specified by giving a symbolic description of the linear
predictor (link function) and a description of the error distribution (family).
It supports "gaussian", "binomial", "poisson", "gamma" and "tweedie" as family.
Valid link functions for each family are listed below. The first link function of each family
is the default one.

"gaussian" : "identity", "log", "inverse"
"binomial" : "logit", "probit", "cloglog"
"poisson"  : "log", "identity", "sqrt"
"gamma"    : "inverse", "identity", "log"
"tweedie"  : power link function specified through "linkPower". The default link power in
             the tweedie family is 1 - variancePower.

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.html
Timestamp: 2020-10-02T14:21:21.946Z
source

get-features-colclj

(get-features-col model)
source

get-input-colclj

(get-input-col model)
source

get-input-colsclj

(get-input-cols model)
source

get-label-colclj

(get-label-col model)
source

get-num-treesclj

(get-num-trees model)
source

get-output-colclj

(get-output-col model)
source

get-output-colsclj

(get-output-cols model)
source

get-prediction-colclj

(get-prediction-col model)
source

get-probability-colclj

(get-probability-col model)
source

get-raw-prediction-colclj

(get-raw-prediction-col model)
source

get-sizeclj

(get-size model)
source

get-thresholdsclj

(get-thresholds model)
source

glmclj

(glm params)

Fit a Generalized Linear Model
(see Generalized linear model (Wikipedia))
specified by giving a symbolic description of the linear
predictor (link function) and a description of the error distribution (family).
It supports "gaussian", "binomial", "poisson", "gamma" and "tweedie" as family.
Valid link functions for each family are listed below. The first link function of each family
is the default one.

"gaussian" : "identity", "log", "inverse"
"binomial" : "logit", "probit", "cloglog"
"poisson"  : "log", "identity", "sqrt"
"gamma"    : "inverse", "identity", "log"
"tweedie"  : power link function specified through "linkPower". The default link power in
             the tweedie family is 1 - variancePower.

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.html
Timestamp: 2020-10-02T14:21:21.946Z
source

gmmclj

source

hashing-tfclj

(hashing-tf params)
source

idfclj

(idf params)
source

idf-vectorclj

(idf-vector model)
source

imputerclj

(imputer params)
source

index-to-stringclj

(index-to-string params)
source

input-colclj

source

input-colsclj

source

interactionclj

(interaction params)
source

interceptclj

(intercept model)
source

intercept-vectorclj

(intercept-vector model)
source

is-distributedclj

(is-distributed model)
source

isotonic-regressionclj

(isotonic-regression params)

Isotonic regression.
Currently implemented using parallelized pool adjacent violators algorithm.
Only univariate (single feature) algorithm supported.
Uses org.apache.spark.mllib.regression.IsotonicRegression.

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/IsotonicRegression.html
Timestamp: 2020-10-02T14:21:22.305Z
source
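
A minimal sketch with a hypothetical training-df; the empty params map is assumed to fall back to Spark defaults, and boundaries is documented elsewhere on this page and is assumed to return the fitted model's boundary points.

  (require '[zero-one.geni.ml :as ml])

  (def iso       (ml/isotonic-regression {}))
  (def iso-model (ml/fit training-df iso))

  (ml/boundaries iso-model)  ;; boundaries of the fitted piecewise-linear function
  (ml/transform training-df iso-model)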

item-factorsclj

(item-factors model)
source

k-meansclj

(k-means params)
source

kolmogorov-smirnov-testclj

(kolmogorov-smirnov-test dataframe sample-col dist-name params)
source

label-colclj

source

labelsclj

(labels model)
source

latent-dirichlet-allocationclj

source

ldaclj

(lda params)
source

linear-regressionclj

(linear-regression params)

Linear regression.

The learning objective is to minimize the specified loss function, with regularization.
This supports two kinds of loss:
  squaredError (a.k.a. squared loss)
  huber (a hybrid of squared error for relatively small errors and absolute error for
    relatively large ones, and we estimate the scale parameter from training data)

This supports multiple types of regularization:
  none (a.k.a. ordinary least squares)
  L2 (ridge regression)
  L1 (Lasso)
  L2 + L1 (elastic net)

The squared error objective function is:

  $$
  \begin{align}
  \min_{w}\frac{1}{2n}{\sum_{i=1}^n(X_{i}w - y_{i})^{2} +
  \lambda\left[\frac{1-\alpha}{2}{||w||_{2}}^{2} + \alpha{||w||_{1}}\right]}
  \end{align}
  $$

The huber objective function is:

  $$
  \begin{align}
  \min_{w, \sigma}\frac{1}{2n}{\sum_{i=1}^n\left(\sigma +
  H_m\left(\frac{X_{i}w - y_{i}}{\sigma}\right)\sigma\right) + \frac{1}{2}\lambda {||w||_2}^2}
  \end{align}
  $$

where

  $$
  \begin{align}
  H_m(z) = \begin{cases}
           z^2, & \text{if } |z| < \epsilon, \\
           2\epsilon|z| - \epsilon^2, & \text{otherwise}
           \end{cases}
  \end{align}
  $$

Note: Fitting with huber loss only supports none and L2 regularization.

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/LinearRegression.html
Timestamp: 2020-10-02T14:21:22.713Z
source
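
A hedged sketch; :max-iter, :reg-param, :elastic-net-param and :loss are assumed kebab-case counterparts of Spark's maxIter, regParam, elasticNetParam and loss, the loss value comes from the docstring above, and training-df is hypothetical.

  (require '[zero-one.geni.ml :as ml])

  ;; Elastic net: alpha = 0.5 mixes the L1 and L2 penalties in the objective above.
  (def lr       (ml/linear-regression {:max-iter          50
                                       :reg-param         0.1
                                       :elastic-net-param 0.5
                                       :loss              "squaredError"}))
  (def lr-model (ml/fit training-df lr))

  (ml/coefficients lr-model)  ;; w in the objective above
  (ml/intercept lr-model)
  (ml/transform training-df lr-model)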

linear-svcclj

(linear-svc params)
source

load-methodclj

(load-method cls)
source

load-method?clj

(load-method? method)
source

log-likelihoodclj

(log-likelihood dataset model)
source

log-perplexityclj

(log-perplexity dataset model)
source

logistic-regressionclj

(logistic-regression params)
source

max-absclj

(max-abs model)
source

max-abs-scalerclj

(max-abs-scaler params)
source

meanclj

(mean model)
source

min-hash-lshclj

(min-hash-lsh params)
source

min-max-scalerclj

(min-max-scaler params)
source

mlp-classifierclj

(mlp-classifier params)
source

multiclass-classification-evaluatorclj

(multiclass-classification-evaluator params)
source

multilabel-classification-evaluatorclj

(multilabel-classification-evaluator params)
source

multilayer-perceptron-classifierclj

source

n-gramclj

(n-gram params)
source

naive-bayesclj

(naive-bayes params)
source

normaliserclj

(normaliser params)
source

normalizerclj

source

num-classesclj

(num-classes model)
source

num-featuresclj

(num-features model)
source

num-nodesclj

(num-nodes model)
source

one-hot-encoderclj

(one-hot-encoder params)
source

one-vs-restclj

(one-vs-rest params)
source

original-maxclj

(original-max model)
source

original-minclj

(original-min model)
source

output-colclj

source

output-colsclj

source

param-gridclj

(param-grid grids)
source

paramsclj

(params stage)
source

pcclj

(pc model)
source

pcaclj

(pca params)
source

piclj

(pi model)
source

pipelineclj

(pipeline & stages)
source
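
A hedged sketch of composing stages documented on this page into a single estimator; the column names, kebab-case parameter keys (:input-col, :output-col, :input-cols, :features-col, :label-col) and the dataframes training-df / test-df are assumptions, not part of the documented arglist.

  (require '[zero-one.geni.ml :as ml])

  (def pipe
    (ml/pipeline
     (ml/string-indexer   {:input-col "category" :output-col "category-index"})
     (ml/vector-assembler {:input-cols ["x1" "x2" "category-index"]
                           :output-col "features"})
     (ml/logistic-regression {:features-col "features" :label-col "label"})))

  ;; Fitting a pipeline fits each stage in order; transform applies the whole chain.
  (def pipe-model (ml/fit training-df pipe))
  (ml/transform test-df pipe-model)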

polynomial-expansionclj

(polynomial-expansion params)
source

power-iteration-clusteringclj

(power-iteration-clustering params)
source

prediction-colclj

source

prefix-spanclj

(prefix-span params)
source

principal-componentsclj

source

probability-colclj

source

quantile-discretiserclj

(quantile-discretiser params)
source

quantile-discretizerclj

source

random-forest-classifierclj

(random-forest-classifier params)
source

random-forest-regressorclj

(random-forest-regressor params)

Random Forest learning algorithm for regression.
It supports both continuous and categorical features.

Source: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/ml/regression/RandomForestRegressor.html
Timestamp: 2020-10-02T14:21:23.298Z
source
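
A hedged sketch with assumed kebab-case keys (:num-trees for numTrees, :max-depth for maxDepth) and a hypothetical training-df; get-num-trees and feature-importances are documented elsewhere on this page and are assumed to apply to the fitted model.

  (require '[zero-one.geni.ml :as ml])

  (def rf       (ml/random-forest-regressor {:num-trees 100 :max-depth 6}))
  (def rf-model (ml/fit training-df rf))

  (ml/get-num-trees rf-model)        ;; number of trees in the ensemble
  (ml/feature-importances rf-model)  ;; per-feature importance vector
  (ml/transform training-df rf-model)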

ranking-evaluatorclj

(ranking-evaluator params)
source

raw-prediction-colclj

source

read-stage!clj

(read-stage! model-cls path)
source

recommend-for-all-itemsclj

(recommend-for-all-items model num-users)
source

recommend-for-all-usersclj

(recommend-for-all-users model num-items)
source

recommend-for-item-subsetclj

(recommend-for-item-subset model items-df num-users)
source

recommend-for-user-subsetclj

(recommend-for-user-subset model users-df num-items)
source

recommend-itemsclj

(recommend-items model num-items)
(recommend-items model users-df num-items)
source

recommend-usersclj

(recommend-users model num-users)
(recommend-users model items-df num-users)
source

regex-tokeniserclj

(regex-tokeniser params)
source

regex-tokenizerclj

source

regression-evaluatorclj

(regression-evaluator params)
source

robust-scalerclj

(robust-scaler params)
source

root-nodeclj

(root-node model)
source

scaleclj

(scale model)
source

sql-transformerclj

(sql-transformer params)
source

stagesclj

(stages model)
source

standard-scalerclj

(standard-scaler params)
source

stdclj

(std model)
source

stop-words-removerclj

(stop-words-remover params)
source

string-indexerclj

(string-indexer params)
source

summaryclj

(summary model)
source

supported-optimisersclj

source

supported-optimizersclj

(supported-optimizers model)
source

surrogate-dfclj

(surrogate-df model)
source

thetaclj

(theta model)
source

thresholdsclj

source

tokeniserclj

(tokeniser params)
source

tokenizerclj

source

total-num-nodesclj

(total-num-nodes model)
source

train-validation-splitclj

(train-validation-split {:keys [estimator evaluator estimator-param-maps seed
                                parallelism]})
source

transformclj

(transform dataframe transformer)
source

tree-weightsclj

(tree-weights model)
source

treesclj

(trees model)
source

uidclj

(uid model)
source

user-factorsclj

(user-factors model)
source

vector->arrayclj

source

vector-assemblerclj

(vector-assembler params)
source

vector-indexerclj

(vector-indexer params)
source

vector-size-hintclj

(vector-size-hint params)
source

vector-to-arrayclj

(vector-to-array expr)
(vector-to-array expr dtype)
source

vocab-sizeclj

(vocab-size model)
source

vocabularyclj

(vocabulary model)
source

weightsclj

(weights model)
source

word2vecclj

(word2vec params)
source

write-native-model!clj

(write-native-model! model path)
source

write-stage!clj

(write-stage! stage path)
(write-stage! stage path options)
source

xgboost-classifierclj

(xgboost-classifier params)
source

xgboost-regressorclj

(xgboost-regressor params)
source
