
clj-ml.classifiers

This namespace contains functions for building classifiers with several
classification algorithms: Bayes networks, multilayer perceptrons, decision trees,
and support vector machines are available. Some of these classifiers have incremental
versions, so they can be built without holding all the dataset instances in memory.

Functions for evaluating the resulting classifiers, using either cross-validation or a
separate evaluation dataset, are also provided.

A sample use of the API for classifiers is shown below:

 (use 'clj-ml.classifiers)
 ; dataset-set-class and make-instance are assumed to come from clj-ml.data
 (use 'clj-ml.data)

 ; Building a classifier using a  C4.5 decision tree
 (def *classifier* (make-classifier :decision-tree :c45))

 ; We set the class attribute for the loaded dataset.
 ; *dataset* is supposed to contain a set of instances.
 (dataset-set-class *dataset* 4)

 ; Training the classifier
 (classifier-train *classifier* *dataset*)

 ; We evaluate the classifier using a test dataset.
 ; *testset* is supposed to contain the instances held out for evaluation.
 (def *evaluation* (classifier-evaluate *classifier* :dataset *dataset* *testset*))

 ; We retrieve some data from the evaluation result
 (:kappa *evaluation*)
 (:root-mean-squared-error *evaluation*)
 (:precision *evaluation*)

 ; A trained classifier can be used to classify new instances
 (def *to-classify* (make-instance *dataset*  {:class :Iris-versicolor
                                               :petalwidth 0.2
                                               :petallength 1.4
                                               :sepalwidth 3.5
                                               :sepallength 5.1}))

 ; We retrieve the class assigned by the classifier, as a keyword
 (classifier-classify *classifier* *to-classify*)

 ; We label the instance with the class assigned by the classifier.
 ; This modifies *to-classify* and returns it.
 (classifier-label *classifier* *to-classify*)

A classifier can also be evaluated using cross-validation:

 (classifier-evaluate *classifier* :cross-validation *dataset* 10)

Finally, a classifier can be stored in a file for later use:

 (use 'clj-ml.utils)

 (serialize-to-file *classifier*
  "/Users/antonio.garrote/Desktop/classifier.bin")
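The sample above leaves `*dataset*` undefined. The following is a minimal sketch of the same train-and-classify flow with a tiny in-memory dataset, assuming clj-ml is on the classpath and that `make-dataset`, `dataset-set-class` and `make-instance` are provided by `clj-ml.data` with the shapes used here. The dataset contents and the names `ds`, `clf` and `unseen` are made up for illustration.

```clojure
(require '[clj-ml.data :refer [make-dataset dataset-set-class make-instance]]
         '[clj-ml.classifiers :refer [make-classifier classifier-train
                                      classifier-classify]])

;; Tiny made-up dataset: two numeric attributes and a nominal :kind attribute.
;; (Assumption: make-dataset takes a name, attribute specs and rows.)
(def ds
  (make-dataset "toy"
                [:length :width {:kind [:good :bad]}]
                [[12 34 :good]
                 [14 36 :good]
                 [24 53 :bad]
                 [26 55 :bad]]))

;; The class attribute is the third one (index 2).
(dataset-set-class ds 2)

;; Build and train a C4.5 decision tree on the toy data.
(def clf (make-classifier :decision-tree :c45))
(classifier-train clf ds)

;; Classify a new instance built against the same dataset.
(def unseen (make-instance ds {:length 13 :width 35 :kind :good}))
(classifier-classify clf unseen)
```

With so few instances the resulting tree is not meaningful; the point is only the order of the calls.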

classifier-classify

(classifier-classify classifier instance)

Classifies an instance using the provided classifier. Returns the class as a keyword.


classifier-copy

(classifier-copy classifier)

Performs a deep copy of the classifier.


classifier-copy-and-train

(classifier-copy-and-train classifier dataset)

Performs a deep copy of the classifier, trains the copy, and returns it.


classifier-evaluate (multimethod)

Evaluates a trained classifier using the provided dataset or cross-validation.
The first argument must be the classifier to evaluate; the second argument is
the kind of evaluation to perform.
Two kinds of evaluation are available: dataset and cross-validation. The values
for the second argument can be:

 - :dataset
 - :cross-validation

 * :dataset

 If dataset evaluation is desired, the function call must receive the keyword
 :dataset as the second parameter, and as third and fourth parameters the dataset
 used to train the classifier and the dataset used to evaluate it:

   (classifier-evaluate *classifier* :dataset *training* *evaluation*)

 * :cross-validation

 If cross-validation is desired, the function call must receive the keyword
 :cross-validation as the second parameter, and as third and fourth parameters the
 dataset used for training and the number of folds.

   (classifier-evaluate *classifier* :cross-validation *training* 10)

 An optional seed can be provided for generating the cross-validation folds.

   (classifier-evaluate *classifier* :cross-validation *training* 10 {:random-seed 29})

 The metrics available in the evaluation are listed below:

 - :correct
     Number of instances correctly classified
 - :incorrect
     Number of instances incorrectly classified
 - :unclassified
     Number of instances not classified
 - :percentage-correct
     Percentage of correctly classified instances
 - :percentage-incorrect
     Percentage of incorrectly classified instances
 - :percentage-unclassified
     Percentage of not classified instances
 - :error-rate
 - :mean-absolute-error
 - :relative-absolute-error
 - :root-mean-squared-error
 - :root-relative-squared-error
 - :correlation-coefficient
 - :average-cost
 - :kappa
     The kappa statistic
 - :kb-information
 - :kb-mean-information
 - :kb-relative-information
 - :sf-entropy-gain
 - :sf-mean-entropy-gain
 - :roc-area
 - :false-positive-rate
 - :false-negative-rate
 - :f-measure
 - :precision
 - :recall
 - :evaluation-object
     The underlying Weka Java object containing the evaluation
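Since the evaluation result is described as a set of keyword metrics, it can be inspected with ordinary map functions. The sketch below uses a hypothetical stand-in map with made-up numbers rather than a real call to `classifier-evaluate`, and shows how the three count metrics relate to fractions of the evaluated instances. The names `evaluation` and `summarize` are illustrative only.

```clojure
;; Hypothetical stand-in for the map returned by classifier-evaluate;
;; the numbers are made up for illustration only.
(def evaluation
  {:correct 140.0
   :incorrect 8.0
   :unclassified 2.0})

(defn summarize
  "Returns the count metrics as fractions of all evaluated instances."
  [{:keys [correct incorrect unclassified]}]
  (let [total (+ correct incorrect unclassified)]
    {:total total
     :fraction-correct (/ correct total)
     :fraction-incorrect (/ incorrect total)
     :fraction-unclassified (/ unclassified total)}))

;; Pick out only the metrics of interest.
(select-keys evaluation [:correct :incorrect])

;; Derive fractions from the raw counts.
(summarize evaluation)
```

With a real evaluation, the same keyword lookups apply to the other metrics listed above, such as `:kappa` or `:precision`.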

classifier-label

(classifier-label classifier instance)

Classifies a dataset instance and assigns it a label. The function returns the newly classified instance. This call is destructive: the instance passed as an argument is modified.


classifier-predict-numeric

(classifier-predict-numeric classifier instance)

Predicts the class attribute of an instance using the provided classifier. Returns the value as a floating-point value (e.g., for regression).


classifier-predict-probability

(classifier-predict-probability classifier instance)

Classifies an instance using the provided classifier. Returns the probability distribution across classes for the instance.

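A probability distribution is usually reduced to its most likely class. The sketch below assumes the distribution can be treated as a sequence of probabilities in the same order as the class labels; that ordering, the label vector, and the numbers are assumptions for illustration, not something this page states. `most-probable` is a hypothetical helper, not part of clj-ml.

```clojure
;; Hypothetical class labels and a made-up distribution, assumed to be
;; in the same order as the labels.
(def labels [:Iris-setosa :Iris-versicolor :Iris-virginica])
(def distribution [0.05 0.85 0.10])

(defn most-probable
  "Pairs each label with its probability and returns the
   [label probability] pair with the highest probability."
  [labels distribution]
  (apply max-key second (map vector labels distribution)))

(most-probable labels distribution)
```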

classifier-train

(classifier-train classifier dataset)

Trains a classifier with the given dataset as the training data.


classifier-update

(classifier-update classifier instance-s)

If the classifier is updateable, updates it with the given instance or set of instances.


make-classifier (multimethod)

Creates a new classifier for the given kind, algorithm, and options.

The first argument identifies the kind of classifier and the second
argument the algorithm to use, e.g. :decision-tree :c45.

The classifiers currently supported are:

  - :lazy :ibk
  - :decision-tree :c45
  - :decision-tree :boosted-stump
  - :decision-tree :M5P
  - :decision-tree :random-forest
  - :decision-tree :rotation-forest
  - :bayes :naive
  - :neural-network :multilayer-perceptron
  - :support-vector-machine :smo
  - :regression :linear
  - :regression :logistic
  - :regression :pace
  - :regression :pls

Optionally, a map of classifier-specific options can also be passed as
a third argument.

This is the description of the supported classifiers and the accepted
option parameters for each of them:

 * :lazy :ibk

   K-nearest neighbor classification.

   Parameters:

     - :inverse-weighted
         Neighbors will be weighted by the inverse of their distance when voting. (default equal weighting)
         Sample value: true
     - :similarity-weighted
         Neighbors will be weighted by their similarity when voting. (default equal weighting)
         Sample value: true
     - :no-normalization
         Turns off normalization.
         Sample value: true
     - :num-neighbors
         Set the number of nearest neighbors to use in prediction (default 1)
         Sample value: 3

 * :decision-tree :c45

   A classifier building a pruned or unpruned C4.5 decision tree using
   Weka's J48 implementation.

   Parameters:

     - :unpruned
         Use unpruned tree. Sample value: true
     - :reduce-error-pruning
         Sample value: true
     - :only-binary-splits
         Sample value: true
     - :no-raising
         Sample value: true
     - :no-cleanup
         Sample value: true
     - :laplace-smoothing
         For predicted probabilities. Sample value: true
     - :pruning-confidence
         Threshold for pruning. Default value: 0.25
     - :minimum-instances
         Minimum number of instances per leaf. Default value: 2
     - :pruning-number-folds
         Set number of folds for reduced error pruning. Default value: 3
     - :random-seed
         Seed for random data shuffling. Default value: 1

 * :bayes :naive

   Classifier based on Bayes' theorem with strong independence assumptions among the
   probabilistic variables.

   Parameters:

     - :kernel-estimator
         Use a kernel density estimator rather than a normal distribution. Sample value: true
     - :supervised-discretization
         Use supervised discretization to process numeric attributes (see the :supervised-discretize
         filter in the clj-ml.filters/make-filter function). Sample value: true

 * :neural-network :multilayer-perceptron

   Classifier built using a feedforward artificial neural network with three or more layers
   of neurons and nonlinear activation functions. It is able to distinguish data that is not
   linearly separable.

   Parameters:

     - :no-nominal-to-binary
         A :nominal-to-binary filter will not be applied automatically (see the
         :supervised-nominal-to-binary filter in the clj-ml.filters/make-filter function).
         Default value: false
     - :no-numeric-normalization
         A numeric class will not be normalized. Default value: false
     - :no-normalization
         No attribute will be normalized. Default value: false
     - :no-reset
         Resetting the network will not be allowed. Default value: false
     - :learning-rate-decay
         Learning rate decay will occur. Default value: false
     - :learning-rate
         Learning rate for the backpropagation algorithm. Value should be in the range [0,1].
         Default value: 0.3
     - :momentum
         Momentum rate for the backpropagation algorithm. Value should be in the range [0,1].
         Default value: 0.2
     - :epochs
         Number of epochs to train through. Default value: 500
     - :percentage-validation-set
         Percentage size of the validation set used to terminate training. If it is not zero,
         it takes precedence over the number of epochs to finish training. Values should be
         in the range [0,100]. Default value: 0
     - :random-seed
         Value of the seed for the random generator. Values should be longs greater than
         0. Default value: 1
     - :threshold-number-errors
         The number of consecutive errors allowed for validation testing before the network
         terminates. Values should be greater than 0. Default value: 20

 * :support-vector-machine :smo

   Support vector machine (SVM) classifier built using the sequential minimal optimization (SMO)
   training algorithm.

   Parameters:

     - :fit-logistic-models
         Fit logistic models to SVM outputs. Default value: false
     - :complexity-constant
         The complexity constant. Default value: 1
     - :tolerance
         Tolerance parameter. Default value: 1.0e-3
     - :epsilon-roundoff
         Epsilon round-off error. Default value: 1.0e-12
     - :folds-for-cross-validation
         Number of folds for the internal cross-validation. Sample value: 10
     - :random-seed
         Value of the seed for the random generator. Values should be longs greater than
         0. Default value: 1

 * :support-vector-machine :libsvm

   TODO

 * :regression :linear

   Parameters:

     - :attribute-selection
         Set the attribute selection method to use. 1 = None, 2 = Greedy. (default 0 = M5' method)
     - :keep-colinear
         Do not try to eliminate collinear attributes.
     - :ridge
         Set ridge parameter (default 1.0e-8).

 * :regression :logistic

   Parameters:

     - :max-iterations
         Set the maximum number of iterations (default -1, until convergence).
     - :ridge
         Set the ridge in the log-likelihood.
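The option map described above is passed as a third argument to `make-classifier`. The sketch below is a minimal illustration, assuming clj-ml is on the classpath. It uses only kinds, algorithms, and option keys listed in the docstring; the option values and the names `knn`, `tree` and `mlp` are arbitrary.

```clojure
(require '[clj-ml.classifiers :refer [make-classifier]])

;; k-nearest neighbours with k = 3 and inverse-distance weighted voting.
(def knn
  (make-classifier :lazy :ibk
                   {:num-neighbors 3
                    :inverse-weighted true}))

;; C4.5 tree with a lower pruning confidence (more aggressive pruning)
;; and at least 5 instances per leaf.
(def tree
  (make-classifier :decision-tree :c45
                   {:pruning-confidence 0.1
                    :minimum-instances 5}))

;; Multilayer perceptron with explicit backpropagation settings.
(def mlp
  (make-classifier :neural-network :multilayer-perceptron
                   {:learning-rate 0.1
                    :momentum 0.2
                    :epochs 1000}))
```

Options omitted from the map keep the defaults listed above.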

make-classifier-with

(make-classifier-with kind algorithm classifier-class options)