(decision-tree-classifier params)
Decision tree learning algorithm (http://en.wikipedia.org/wiki/Decision_tree_learning) for classification. It supports both binary and multiclass labels, as well as both continuous and categorical features.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.html
Timestamp: 2020-10-19T01:55:55.948Z
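The params map configures Spark's DecisionTreeClassifier, the class at the Source link above. A minimal sketch of that estimator in Spark's own Scala API, assuming a training DataFrame with the usual label and features columns (the parameter values are illustrative, not recommendations):

    import org.apache.spark.ml.classification.DecisionTreeClassifier
    import org.apache.spark.sql.DataFrame

    // Assumed: "label" is a numeric class index, "features" is a vector column.
    def trainDecisionTree(training: DataFrame) = {
      val dt = new DecisionTreeClassifier()
        .setLabelCol("label")
        .setFeaturesCol("features")
        .setMaxDepth(5)      // limit tree depth to control overfitting
        .setImpurity("gini") // split criterion: "gini" or "entropy"
      dt.fit(training)       // returns a DecisionTreeClassificationModel
    }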
(fm-classifier params)
Factorization Machines learning algorithm for classification. It supports standard gradient descent and the AdamW solver.
The implementation is based upon:
S. Rendle. "Factorization machines" 2010.
FM is able to estimate interactions even in problems with huge sparsity (such as advertising and recommender systems). The FM formula is:

\hat{y} = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j

The FM classification model uses logistic loss, which can be minimized by gradient descent; regularization terms such as L2 are usually added to the loss function to prevent overfitting.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/ml/classification/FMClassifier.html
Timestamp: 2020-10-19T01:55:56.340Z
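For orientation, a sketch of the underlying Spark FMClassifier in Scala; the training DataFrame and all parameter values are illustrative assumptions:

    import org.apache.spark.ml.classification.FMClassifier
    import org.apache.spark.sql.DataFrame

    def trainFM(training: DataFrame) = {
      val fm = new FMClassifier()
        .setFactorSize(8)   // dimensionality of the factor vectors v_i
        .setSolver("adamW") // "gd" for plain gradient descent, "adamW" for AdamW
        .setStepSize(0.01)
        .setMaxIter(100)
        .setRegParam(0.01)  // L2 regularization strength
      fm.fit(training)
    }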
(gbt-classifier params)
Gradient-Boosted Trees (GBTs) (http://en.wikipedia.org/wiki/Gradient_boosting) learning algorithm for classification. It supports binary labels, as well as both continuous and categorical features.
The implementation is based upon: J.H. Friedman. "Stochastic Gradient Boosting." 1999.
Notes on Gradient Boosting vs. TreeBoost:
- This implementation is for Stochastic Gradient Boosting, not for TreeBoost.
- Both algorithms learn tree ensembles by minimizing loss functions.
- TreeBoost (Friedman, 1999) additionally modifies the outputs at tree leaf nodes based on the loss function, whereas the original gradient boosting method does not.
- When the loss is SquaredError, these methods give the same result, but they could differ for other loss functions.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/ml/classification/GBTClassifier.html
Timestamp: 2020-10-19T01:55:56.899Z
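A hedged sketch of the underlying Spark GBTClassifier in Scala (training DataFrame and parameter values assumed):

    import org.apache.spark.ml.classification.GBTClassifier
    import org.apache.spark.sql.DataFrame

    def trainGBT(training: DataFrame) = {
      val gbt = new GBTClassifier()
        .setMaxIter(20)          // number of boosting iterations, i.e. trees
        .setMaxDepth(5)
        .setStepSize(0.1)        // shrinkage (learning rate)
        .setSubsamplingRate(0.8) // stochastic GB: fraction of data per tree
      gbt.fit(training)          // note: binary labels only
    }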
(linear-svc params)
Linear SVM Classifier
This binary classifier optimizes the hinge loss using the OWLQN optimizer. It currently supports only L2 regularization.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/ml/classification/LinearSVC.html
Timestamp: 2020-10-19T01:55:57.279Z
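In terms of the underlying Spark LinearSVC class, a minimal Scala sketch (training DataFrame assumed):

    import org.apache.spark.ml.classification.LinearSVC
    import org.apache.spark.sql.DataFrame

    def trainSVC(training: DataFrame) = {
      val svc = new LinearSVC()
        .setMaxIter(100)
        .setRegParam(0.1) // L2 regularization, the only kind supported
      svc.fit(training)   // optimizes hinge loss with OWLQN
    }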
(logistic-regression params)
Logistic regression. Supports multinomial logistic (softmax) regression and binomial logistic regression.
This class supports fitting a traditional logistic regression model via LBFGS/OWLQN, as well as a bound (box) constrained logistic regression model via LBFGS-B.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/ml/classification/LogisticRegression.html
Timestamp: 2020-10-19T01:55:57.830Z
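A sketch of the underlying Spark LogisticRegression estimator in Scala (training DataFrame and parameter values assumed):

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.sql.DataFrame

    def trainLR(training: DataFrame) = {
      val lr = new LogisticRegression()
        .setMaxIter(100)
        .setRegParam(0.01)
        .setElasticNetParam(0.0) // 0.0 = pure L2, 1.0 = pure L1
        .setFamily("auto")       // "binomial", "multinomial", or "auto"
      lr.fit(training)
    }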
(mlp-classifier params)
Classifier trainer based on the Multilayer Perceptron. Each hidden layer uses a sigmoid activation function; the output layer uses softmax. The number of inputs has to be equal to the size of the feature vectors, and the number of outputs has to be equal to the total number of labels.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.html
Timestamp: 2020-10-19T01:55:58.225Z
(multilayer-perceptron-classifier params)
Classifier trainer based on the Multilayer Perceptron. Each hidden layer uses a sigmoid activation function; the output layer uses softmax. The number of inputs has to be equal to the size of the feature vectors, and the number of outputs has to be equal to the total number of labels.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.html
Timestamp: 2020-10-19T01:55:58.225Z
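Both this entry and the identical mlp-classifier entry above appear to wrap Spark's MultilayerPerceptronClassifier. A minimal Scala sketch; the layer sizes are illustrative and must match the data as described above:

    import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
    import org.apache.spark.sql.DataFrame

    def trainMLP(training: DataFrame) = {
      val mlp = new MultilayerPerceptronClassifier()
        // layers: input size (= feature vector size), two hidden layers,
        // output size (= number of labels)
        .setLayers(Array(4, 8, 8, 3))
        .setMaxIter(100)
        .setSeed(1234L)
      mlp.fit(training)
    }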
(naive-bayes params)
Naive Bayes classifier. It supports Multinomial NB, which can handle finitely supported discrete data. For example, by converting documents into TF-IDF vectors, it can be used for document classification. By making every vector binary (0/1) data, it can also be used as Bernoulli NB. The input feature values for Multinomial NB and Bernoulli NB must be nonnegative. Since 3.0.0, it supports Complement NB, which is an adaptation of Multinomial NB. Specifically, Complement NB uses statistics from the complement of each class to compute the model's coefficients. The inventors of Complement NB show empirically that its parameter estimates are more stable than those of Multinomial NB. Like Multinomial NB, the input feature values for Complement NB must be nonnegative. Since 3.0.0, it also supports Gaussian NB, which can handle continuous data.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/ml/classification/NaiveBayes.html
Timestamp: 2020-10-19T01:55:58.596Z
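A sketch of the underlying Spark NaiveBayes estimator in Scala; the model type and smoothing value are illustrative:

    import org.apache.spark.ml.classification.NaiveBayes
    import org.apache.spark.sql.DataFrame

    def trainNB(training: DataFrame) = {
      val nb = new NaiveBayes()
        // "multinomial" (default), "bernoulli", "complement", or "gaussian"
        .setModelType("multinomial")
        .setSmoothing(1.0) // additive (Laplace) smoothing
      nb.fit(training)     // features must be nonnegative except for "gaussian"
    }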
(one-vs-rest params)
Reduction of multiclass classification to binary classification. Performs the reduction using the one-against-all strategy: for a multiclass problem with k classes, k models are trained (one per class). Each example is scored against all k models, and the model with the highest score is picked to label the example.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/ml/classification/OneVsRest.html
Timestamp: 2020-10-19T01:55:58.960Z
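As a sketch against the underlying Spark OneVsRest class in Scala; logistic regression as the base classifier is an arbitrary choice here:

    import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}
    import org.apache.spark.sql.DataFrame

    def trainOVR(training: DataFrame) = {
      // Any binary classifier can serve as the base estimator.
      val base = new LogisticRegression().setMaxIter(10)
      val ovr  = new OneVsRest().setClassifier(base)
      ovr.fit(training) // fits one model per class; prediction picks the top score
    }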
(random-forest-classifier params)
Random Forest learning algorithm for classification. It supports both binary and multiclass labels, as well as both continuous and categorical features.
Source: https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/ml/classification/RandomForestClassifier.html
Timestamp: 2020-10-19T01:55:59.351Z
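Finally, a sketch of the underlying Spark RandomForestClassifier in Scala (training DataFrame and parameter values assumed):

    import org.apache.spark.ml.classification.RandomForestClassifier
    import org.apache.spark.sql.DataFrame

    def trainRF(training: DataFrame) = {
      val rf = new RandomForestClassifier()
        .setNumTrees(100)
        .setMaxDepth(5)
        .setFeatureSubsetStrategy("auto") // features considered at each split
        .setSeed(42L)
      rf.fit(training)
    }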