clj-ml-dev.classifiers

This namespace contains several functions for building classifiers using different
classification algorithms: Bayes networks, multilayer perceptron, decision tree or
support vector machines are available. Some of these classifiers have incremental
versions so they can be built without having all the dataset instances in memory.

Functions for evaluating the built classifiers, using either cross-validation or a
separate dataset, are also provided.

A sample use of the API for classifiers is shown below:

 (use 'clj-ml-dev.classifiers)

 ; Building a classifier using a  C4.5 decision tree
 (def *classifier* (make-classifier :decision-tree :c45))

 ; We set the class attribute for the loaded dataset.
 ; *dataset* is supposed to contain a set of instances.
 (dataset-set-class *dataset* 4)

 ; Training the classifier
 (classifier-train *classifier* *dataset*)

 ; We evaluate the classifier using a test dataset
 (def *evaluation* (classifier-evaluate *classifier* :dataset *dataset* *testset*))

 ; We retrieve some data from the evaluation result
 (:kappa *evaluation*)
 (:root-mean-squared-error *evaluation*)
 (:precision *evaluation*)

 ; A trained classifier can be used to classify new instances
 (def *to-classify* (make-instance *dataset*  {:class :Iris-versicolor
                                               :petalwidth 0.2
                                               :petallength 1.4
                                               :sepalwidth 3.5
                                               :sepallength 5.1}))

 ; We retrieve the index of the class value assigned by the classifier
 (classifier-classify *classifier* *to-classify*)

 ; We assign the class value chosen by the classifier to the instance
 ; and retrieve the modified instance
 (classifier-label *classifier* *to-classify*)

A classifier can also be evaluated using cross-validation:

 (classifier-evaluate *classifier* :cross-validation *dataset* 10)
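
To make the fold count in the call above concrete, here is a minimal sketch of how a dataset can be split into k folds. It is plain Clojure and only illustrates the idea: `k-fold-splits` is a hypothetical helper, not part of clj-ml-dev, and it uses simple round-robin assignment, whereas the underlying Weka evaluation may shuffle or stratify the data.

```clojure
;; Illustrative only: round-robin k-fold splitting with no shuffling or
;; stratification. `k-fold-splits` is a hypothetical helper.
(defn k-fold-splits
  "Returns a vector of k maps {:train [...] :test [...]} built from coll."
  [k coll]
  (let [indexed (map-indexed vector coll)]
    (vec
     (for [fold (range k)]
       {:test  (vec (for [[i x] indexed :when (= fold (mod i k))] x))
        :train (vec (for [[i x] indexed :when (not= fold (mod i k))] x))}))))

;; Ten instances and five folds: each fold tests on two instances
;; and trains on the remaining eight.
(def splits (k-fold-splits 5 (range 10)))

(first splits) ; => {:test [0 5], :train [1 2 3 4 6 7 8 9]}
```

Each instance appears in exactly one test fold, so every instance is used for testing once and for training k - 1 times.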

Finally, a classifier can be stored in a file for later use:

 (use 'clj-ml-dev.utils)

 (serialize-to-file *classifier*
  "/Users/antonio.garrote/Desktop/classifier.bin")

classifier-classify (clj)

(classifier-classify classifier instance)

Classifies an instance using the provided classifier. The value returned is the numeric index of the predicted class within the list of valid values for the class attribute.
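
As a rough illustration of how the returned number relates to a class label, the sketch below maps an index back to a label in plain Clojure. The label vector and the helper `class-index->label` are assumptions made for this example (the order of class values depends on the dataset); they are not part of clj-ml-dev.

```clojure
;; Assumption: the class attribute lists its values in this order.
(def class-labels [:Iris-setosa :Iris-versicolor :Iris-virginica])

(defn class-index->label
  "Maps the numeric value returned by a classifier to a class label."
  [labels index]
  (nth labels (int index)))

;; A returned value of 0.0 selects the first class value.
(class-index->label class-labels 0.0) ; => :Iris-setosa
```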


classifier-evaluate (clj, multimethod)

Evaluates a trained classifier using the provided dataset or cross-validation.
The first argument must be the classifier to evaluate, the second argument is
the kind of evaluation to do.
Two kinds of evaluation are available: dataset and cross-validation. The values
for the second argument can be:

 - :dataset
 - :cross-validation

 * :dataset

 If dataset evaluation is desired, the function call must receive as the second
 parameter the keyword :dataset and as third and fourth parameters the original
 dataset used to build the classifier and the dataset to evaluate it against:

   (classifier-evaluate *classifier* :dataset *training* *evaluation*)

 * :cross-validation

 If cross-validation is desired, the function call must receive as the second
 parameter the keyword :cross-validation and as third and fourth parameters the
 dataset used for training and the number of folds.

   (classifier-evaluate *classifier* :cross-validation *training* 10)

 The metrics available in the evaluation are listed below:

 - :correct
     Number of instances correctly classified
 - :incorrect
     Number of instances incorrectly classified
 - :unclassified
     Number of instances not classified
 - :percentage-correct
     Percentage of correctly classified instances
 - :percentage-incorrect
     Percentage of incorrectly classified instances
 - :percentage-unclassified
     Percentage of not classified instances
 - :error-rate
 - :mean-absolute-error
 - :relative-absolute-error
 - :root-mean-squared-error
 - :root-relative-squared-error
 - :correlation-coefficient
 - :average-cost
 - :kappa
     The kappa statistic
 - :kb-information
 - :kb-mean-information
 - :kb-relative-information
 - :sf-entropy-gain
 - :sf-mean-entropy-gain
 - :roc-area
 - :false-positive-rate
 - :false-negative-rate
 - :f-measure
 - :precision
 - :recall
 - :evaluation-object
     The underlying Weka Java object containing the evaluation
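
For readers unfamiliar with the :kappa metric, the following sketch computes Cohen's kappa from a confusion matrix in plain Clojure. It is a standalone illustration of the statistic, assuming a square matrix with actual classes as rows and predicted classes as columns; `cohens-kappa` is a hypothetical helper and not the code clj-ml-dev or Weka uses.

```clojure
(defn cohens-kappa
  "Computes Cohen's kappa from a square confusion matrix given as a vector
  of row vectors (rows = actual class, columns = predicted class)."
  [matrix]
  (let [total      (double (reduce + (map #(reduce + %) matrix)))
        n          (count matrix)
        ;; observed agreement: share of instances on the diagonal
        observed   (/ (reduce + (map #(get-in matrix [% %]) (range n))) total)
        row-totals (map #(reduce + %) matrix)
        col-totals (apply map + matrix)
        ;; agreement expected by chance from the row and column totals
        expected   (/ (reduce + (map * row-totals col-totals)) (* total total))]
    (/ (- observed expected) (- 1.0 expected))))

;; Made-up counts for illustration: 35 of 50 instances on the diagonal
;; gives an observed agreement of 0.7 against a chance agreement of 0.5,
;; so kappa is approximately 0.4.
(cohens-kappa [[20 5] [10 15]])
```

A value of 1.0 means perfect agreement, while values near 0 mean the classifier does no better than chance.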

classifier-label (clj)

(classifier-label classifier instance)

Classifies a dataset instance and assigns a label to it.
This function is similar to classifier-classify but,
instead of just returning the numeric identifier of the
predicted class, it changes the class value of that instance
to the one newly assigned by the classifier.

The function returns the newly classified instance.

This call is destructive: the instance passed as an argument
is modified.

 ; We create the instance to classify
 (def *to-classify* (make-instance *dataset*  {:class :Iris-versicolor
                                               :petalwidth 0.2
                                               :petallength 1.4
                                               :sepalwidth 3.5
                                               :sepallength 5.1}))

 ; We use the classifier to check the value for the class
 (classifier-classify *classifier* *to-classify*)
  >0.0

 ; We change the class for the instance according to the assigned class
 (classifier-label *classifier* *to-classify*)
  >#<Instance 5.1,3.5,1.4,0.2,Iris-setosa>

classifier-train (clj)

(classifier-train classifier dataset)

Trains a classifier with the given dataset as the training data.


classifier-update (clj)

(classifier-update classifier instance-s)

If the classifier is updateable, updates it with the given instance or set of instances.
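
A possible pattern for incremental training is to feed the classifier one batch of instances at a time. The sketch below shows that loop in plain Clojure; the update function is passed in as an argument so the example runs without Weka. `train-incrementally`, `fake-classifier` and `fake-update` are hypothetical names for illustration, and in real use the update function would presumably be classifier-update applied to an updateable classifier.

```clojure
(defn train-incrementally
  "Feeds each batch to the classifier through update-fn, one batch at a time,
  then returns the classifier."
  [update-fn classifier batches]
  (doseq [batch batches]
    (update-fn classifier batch))
  classifier)

;; Stand-in for an updateable classifier: an atom that only counts the
;; instances it has seen. A real classifier would learn from them.
(def fake-classifier (atom 0))

(defn fake-update [clf batch]
  (swap! clf + (count batch)))

;; Ten instances delivered in batches of at most three.
(train-incrementally fake-update fake-classifier (partition-all 3 (range 10)))

@fake-classifier ; => 10
```

Because only one batch is handled at a time, the full dataset never has to be held in memory at once.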


make-classifier (clj, multimethod)

Creates a new classifier for the given kind, algorithm and options.

   The first argument identifies the kind of classifier and the second
   argument the algorithm to use, e.g. :decision-tree :c45.

   The classifiers currently supported are:

     - :decision-tree :c45
     - :decision-tree :boosted-stump
     - :decision-tree :boosted-decision-tree
     - :decision-tree :M5P
     - :decision-tree :random-forest
;;     - :decision-tree :rotation-forest
     - :bayes :naive
     - :neural-network :multilayer-perceptron
     - :support-vector-machine :smo
     - :regression :linear
     - :regression :logistic
     - :regression :pace

   Optionally, a map of options can also be passed as an argument with
   a set of classifier specific options.

   This is the description of the supported classifiers and the accepted
   option parameters for each of them:

    * :decision-tree :c45

      A classifier building a pruned or unpruned C4.5 decision tree using
      Weka's J48 implementation.

      Parameters:

        - :unpruned
            Use unpruned tree. Sample value: true
        - :reduce-error-pruning
            Sample value: true
        - :only-binary-splits
            Sample value: true
        - :no-raising
            Sample value: true
        - :no-cleanup
            Sample value: true
        - :laplace-smoothing
            For predicted probabilities. Sample value: true
        - :pruning-confidence
            Threshold for pruning. Default value: 0.25
        - :minimum-instances
            Minimum number of instances per leaf. Default value: 2
        - :pruning-number-folds
            Set number of folds for reduced error pruning. Default value: 3
        - :random-seed
            Seed for random data shuffling. Default value: 1

    * :bayes :naive

      Classifier based on Bayes' theorem with strong independence assumptions among the
      probabilistic variables.

      Parameters:

        - :kernel-estimator
            Use kernel density estimator rather than normal. Sample value: true
        - :supervised-discretization
            Use supervised discretization to process numeric attributes (see :supervised-discretize
            filter in clj-ml-dev.filters/make-filter function). Sample value: true

    * :neural-network :multilayer-perceptron

      Classifier built using a feedforward artificial neural network with three or more layers
      of neurons and nonlinear activation functions. It is able to distinguish data that is not
      linearly separable.

      Parameters:

        - :no-nominal-to-binary
            A :nominal-to-binary filter will not be applied automatically (see :supervised-nominal-to-binary
            filter in clj-ml-dev.filters/make-filter function). Default value: false
        - :no-numeric-normalization
            A numeric class will not be normalized. Default value: false
        - :no-nomalization
            No attribute will be normalized. Default value: false
        - :no-reset
            Resetting the network will not be allowed. Default value: false
        - :learning-rate-decay
            Learning rate decay will occur. Default value: false
        - :learning-rate
            Learning rate for the backpropagation algorithm. Value should be in the range [0,1].
            Default value: 0.3
        - :momentum
            Momentum rate for the backpropagation algorithm. Value should be in the range [0,1].
            Default value: 0.2
        - :epochs
            Number of iterations to train through. Default value: 500
        - :percentage-validation-set
            Percentage size of validation set to use to terminate training. If it is not zero
            it takes precedence over the number of epochs to finish training. Values should be
            in the range [0,100]. Default value: 0
        - :random-seed
            Value of the seed for the random generator. Values should be longs greater than
            0. Default value: 1
        - :threshold-number-errors
            The number of consecutive errors allowed for validation testing before the network
            terminates. Values should be greater than 0. Default value: 20

    * :support-vector-machine :smo

      Support vector machine (SVM) classifier built using the sequential minimal optimization (SMO)
      training algorithm.

      Parameters:

        - :fit-logistic-models
            Fit logistic models to SVM outputs. Default value: false
        - :complexity-constant
            The complexity constant. Default value: 1
        - :tolerance
            Tolerance parameter. Default value: 1.0e-3
        - :epsilon-roundoff
            Epsilon round-off error. Default value: 1.0e-12
        - :folds-for-cross-validation
            Number of folds for the internal cross-validation. Sample value: 10
        - :random-seed
            Value of the seed for the random generator. Values should be longs greater than
            0. Default value: 1

    * :regression :linear

      Parameters:

        - :attribute-selection
            Set the attribute selection method to use. 1 = None, 2 = Greedy. (default 0 = M5' method)
        - :keep-colinear
            Do not try to eliminate colinear attributes.
        - :ridge
            Set ridge parameter (default 1.0e-8).

    * :regression :logistic

      Parameters:

        - :max-iterations
            Set the maximum number of iterations (default -1, until convergence).
        - :ridge
            Set the ridge in the log-likelihood.
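
Since several multilayer perceptron options have documented value ranges, it can help to check an options map before building the classifier. The sketch below does this in plain Clojure using the ranges listed above; `mlp-option-ranges` and `out-of-range-options` are hypothetical helpers, not part of clj-ml-dev, and the final make-classifier call is left as a comment because it needs the library on the classpath and assumes the options map is passed as the third argument.

```clojure
;; Documented ranges for some :neural-network :multilayer-perceptron options.
(def mlp-option-ranges
  {:learning-rate             [0 1]
   :momentum                  [0 1]
   :percentage-validation-set [0 100]})

(defn out-of-range-options
  "Returns the keys in options whose values fall outside the documented range.
  Options that are absent or have no documented range are ignored."
  [options]
  (for [[k [lo hi]] mlp-option-ranges
        :let [v (get options k)]
        :when (and (some? v) (not (<= lo v hi)))]
    k))

(def mlp-options {:learning-rate 0.2 :momentum 0.1 :epochs 1000})

(out-of-range-options mlp-options)          ; => ()
(out-of-range-options {:learning-rate 1.5}) ; => (:learning-rate)

;; With clj-ml-dev loaded, the checked map would then be passed along:
;; (make-classifier :neural-network :multilayer-perceptron mlp-options)
```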

make-classifier-with (clj)

(make-classifier-with kind algorithm classifier-class options)
