zensols.model.eval-classifier

A client entry point library to help with evaluating machine learning models. This library not only wraps the Weka library but also provides additional functionality like a two pass cross validation (see with-two-pass).

*default-set-type*clj

The default type of test, which is one of:

  • :cross-validation: run an N-fold cross validation (default)
  • :train-test: train the classifier and then evaluate

*throw-cross-validate*clj

If true, throw an exception during cross validation for any errors. Otherwise, the error is logged and cross-validation continues. This is useful when several classifiers are evaluated and some of them fail on the dataset, but you still want the results from the others.

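The "throw or log and continue" behaviour described above can be pictured with a minimal, self-contained sketch. It does not call this library: `*throw-on-error*`, `evaluate-all`, and `toy-eval` are hypothetical names standing in for the flag and for a per-classifier evaluation, and errors are printed to stderr rather than sent to a logger.

```clojure
;; Hypothetical stand-in for a flag such as *throw-cross-validate*.
(def ^:dynamic *throw-on-error*
  "When true, the first evaluation error is rethrown."
  false)

(defn evaluate-all
  "Apply eval-fn to each classifier name and collect the results in a map.
  When *throw-on-error* is false, a failing classifier is reported on
  stderr and skipped so the remaining results survive."
  [eval-fn classifier-names]
  (reduce (fn [results cname]
            (try
              (assoc results cname (eval-fn cname))
              (catch Exception e
                (if *throw-on-error*
                  (throw e)
                  (do (binding [*out* *err*]
                        (println "skipping" cname ":" (.getMessage e)))
                      results)))))
          {}
          classifier-names))

(defn toy-eval
  "Toy evaluation: fails for :bad and returns a made-up score otherwise."
  [cname]
  (if (= cname :bad)
    (throw (ex-info "classifier choked on dataset" {:classifier cname}))
    0.9))

(evaluate-all toy-eval [:j48 :bad :zeror])
;; => {:j48 0.9, :zeror 0.9}
```

Rebinding the flag with `(binding [*throw-on-error* true] ...)` makes the same call fail fast instead, which is the trade-off the docstring describes.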

analysis-fileclj

(analysis-file)
(analysis-file file-format)

compile-resultsclj

(compile-results classifier-sets feature-set-key)

Run cross-fold validation and compile into a nice results map sorted by performance.

See zensols.model.classifier/compile-results.

  • classifier-sets is a key in zensols.model.weka/*classifiers* or a constructed classifier (see zensols.model.weka/make-classifiers)
  • feature-set-key identifies which feature set to use (see :feature-sets-set in zensols.model.execute-classifier/with-model-conf)

create-modelclj

(create-model classifier-sets feature-set-key)

Create a model that can be trained. This runs cross fold validations to find the best classifier and feature set, and returns a result that can be used with train-model and subsequently write-model.

  • classifier-sets is a key in zensols.model.weka/*classifiers* or a constructed classifier (see zensols.model.weka/make-classifiers)
  • feature-set-key identifies which feature set to use (see :feature-sets-set in zensols.model.execute-classifier/with-model-conf)

See *throw-cross-validate*.

cross-fold-infoclj

(cross-fold-info)

Return information about the current fold for two-pass validations. See weka/*cross-fold-info*

display-featuresclj

(display-features & adb-keys)

Display features as configured in a model with zensols.model.execute-classifier/with-model-conf.

adb-keys are given to :create-feature-sets-fn as described in zensols.model.execute-classifier/with-model-conf. In addition it includes :max, which is the maximum number of instances to display.

eval-and-writeclj

(eval-and-write classifier-sets set-key)
(eval-and-write classifier-sets set-key file)

Perform a cross validation and write the results to an Excel formatted file.

See zensols.model.classifier/analysis-report-resource for where the file is written.

  • classifier-sets is a key in zensols.model.weka/*classifiers* or a constructed classifier (see zensols.model.weka/make-classifiers)
  • set-key identifies which feature set to use (see :feature-sets-set in zensols.model.execute-classifier/with-model-conf)

This uses eval-and-write-results to actually write the results.

See evaluations-file and *throw-cross-validate*.

eval-and-write-resultsclj

(eval-and-write-results results)
(eval-and-write-results results output-file)

Perform a cross validation and write the results to an Excel formatted file. The data from results is obtained with run-tests.

See eval-and-write and *throw-cross-validate*.

evaluations-fileclj

(evaluations-file)
(evaluations-file fname)

Return the default file used to create an evaluations file with eval-and-write.

executing-two-pass?clj

(executing-two-pass?)

Return true if we're currently using a two pass cross validation.

features-fileclj

(features-file)

Return the default file used to create the features output file with write-features.

print-best-resultsclj

(print-best-results classifier-sets feature-set-key)

Print the highest (best) scored cross validation information.

  • classifier-sets is a key in zensols.model.weka/*classifiers* or a constructed classifier (see zensols.model.weka/make-classifiers)
  • feature-set-key identifies which feature set to use (see :feature-sets-set in zensols.model.execute-classifier/with-model-conf)

See *throw-cross-validate*.

print-model-configclj

(print-model-config)

Pretty print the model configuration set with zensols.model.execute-classifier/with-model-conf.

read-arffclj

(read-arff)
(read-arff file)

Read the ARFF file configured with zensols.model.execute-classifier/with-model-conf. If file is given, use that file instead of getting it from zensols.model.classifier/analysis-report-resource.

read-modelclj

(read-model)

Read a model that was previously persisted to the file system.

See zensols.model.classifier/model-dir for where the model is read from.

run-testsclj

(run-tests classifier-sets feature-set-key)

Create result sets useful to functions like eval-and-write. This package was designed for most use cases to not have to use this function.

See *throw-cross-validate*.

terse-resultsclj

(terse-results classifier-sets
               feature-set-key
               &
               {:keys [only-stats?] :or {only-stats? true}})

Return terse cross-validation results in an array:

  • classifier name
  • weighted F-measure
  • feature-metas

The arguments are:

  • classifier-sets is a key in zensols.model.weka/*classifiers* or a constructed classifier (see zensols.model.weka/make-classifiers)
  • feature-set-key identifies which feature set to use (see :feature-sets-set in zensols.model.execute-classifier/with-model-conf)

Keys

  • :only-stats? if true only return statistic data

See *throw-cross-validate*.

test-train-series-fileclj

(test-train-series-file)
(test-train-series-file fname)

Return the default file used to create an evaluations file with eval-and-write.

train-modelclj

(train-model model & {:keys [set-type] :or {set-type *default-set-type*}})

Train a model created from create-model. The model is trained on the full available dataset. After the classifier is trained, you can save it to disk by calling write-model.

  • model is a model that was created with create-model

See *throw-cross-validate*.

train-test-resultsclj

(train-test-results classifier-sets feature-sets-key)

Test the performance of a model by training on a given set of data and evaluate on the test data.

See train-model for parameter details.

train-test-seriesclj

(train-test-series classifiers meta-set divide-ratio-config)

Test and train with different ratios and return the results. The return data is writable directly as an Excel file. However, you can also save it as a CSV with write-csv-train-test-series.

The keys are the classifier name and the values are the 2D result matrix.

See *throw-cross-validate*.

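To make the shape of such a result concrete, here is a minimal sketch in plain Clojure. It is not the library's implementation: `train-majority`, `accuracy`, `train-test-rows`, and `series` are hypothetical names, the column headers are invented, and the ratios are passed as a simple vector because the actual structure of divide-ratio-config is not documented here.

```clojure
;; Toy dataset: ten instances with labels :a or :b (illustrative only).
(def data
  (mapv (fn [i label] {:x i :label label})
        (range)
        [:a :b :a :a :b :a :b :a :a :a]))

(defn train-majority
  "Toy 'classifier': always predicts the most frequent training label."
  [train]
  (let [top (key (apply max-key val (frequencies (map :label train))))]
    (constantly top)))

(defn accuracy
  "Fraction of test instances whose label matches the prediction."
  [classify test]
  (/ (count (filter #(= (:label %) (classify %)) test))
     (double (count test))))

(defn train-test-rows
  "Build a 2D matrix: a header row, then one row per train ratio."
  [train-fn ratios instances]
  (into [["train-ratio" "train-size" "test-size" "accuracy"]]
        (for [r ratios
              :let [n (Math/round (double (* r (count instances))))
                    [train test] (split-at n instances)
                    classify (train-fn train)]]
          [r (count train) (count test) (accuracy classify test)])))

(defn series
  "Map each classifier name to its 2D result matrix."
  [classifiers ratios instances]
  (into {}
        (for [[cname train-fn] classifiers]
          [cname (train-test-rows train-fn ratios instances)])))

(series {"majority" train-majority} [0.5 0.8] data)
;; => {"majority" [["train-ratio" "train-size" "test-size" "accuracy"]
;;                 [0.5 5 5 0.8]
;;                 [0.8 8 2 1.0]]}
```

A map of name to row vectors like this translates naturally to one spreadsheet sheet or CSV section per classifier, which is consistent with the Excel and CSV outputs mentioned above.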

two-pass-modelclj

(two-pass-model model id-key anon-by-id-fn anons-fn)

Don't use this function--instead, use with-two-pass.

Create a two pass model, which should be merged with the model created with zensols.model.execute-classifier/with-model-conf.

See with-two-pass.

two-pass-test-instancesclj

(two-pass-test-instances insts train-state org folds fold)

Don't use this function--instead, use with-two-pass.

This is called by the zensols.model.weka namespace.

two-pass-train-instancesclj

(two-pass-train-instances insts state org folds fold)

Don't use this function--instead, use with-two-pass.

This is called by the zensols.model.weka namespace.

with-two-passclj/smacro

(with-two-pass model-conf opts & forms)

Like with-model-conf, but compute a context state (i.e. statistics needed by the model) on a per-fold basis when executing a cross fold validation.

The model-conf parameter is the same model used with zensols.model.execute-classifier/with-model-conf.

Description

Two pass validation is a term used in this library. During cross-validation the entire data set is evaluated and (usually) statistics or some other additional modeling happens.

For example, suppose you want to count words (think of a Naive Bayes spam filter). If you create features for the entire dataset before cross-validation, you're "cheating" because the features are based on data from the test folds, which the model should not see during training.

To get more accurate performance metrics you can provide functions that take the current training fold, compute your word counts and create your features. During the testing phase, the computed data is provided to create features based on only that (current) fold.

To use two pass validation, every feature set needs a unique key (not needed as a feature). This key is then given to a function during validation to get the corresponding feature set that is to be stitched in.

Note: this is only useful if:

  1. You want to use cross fold validation to test your model.
  2. Your model priors (context in implementation parlance) are computed during dataset preprocessing, and are thus needed to get reliable performance metrics.
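
The word-count scenario above can be sketched in plain Clojure without this library. Everything here is hypothetical and for illustration only: `corpus`, `word-counts` (the first pass, which builds the context), `features` (the second pass), `folds`, and `two-pass-features`. The point is that the context is rebuilt from the training documents of each fold, so held-out documents never contribute to their own features.

```clojure
(require '[clojure.string :as str])

;; Toy labeled corpus (illustrative data, not from the library).
(def corpus
  [{:id 0 :text "win cash now"      :label :spam}
   {:id 1 :text "cash prize win"    :label :spam}
   {:id 2 :text "meeting at noon"   :label :ham}
   {:id 3 :text "lunch meeting now" :label :ham}])

(defn word-counts
  "First pass: per-label word frequencies computed from docs only."
  [docs]
  (reduce (fn [ctx {:keys [text label]}]
            (update ctx label
                    #(merge-with + (or % {})
                                 (frequencies (str/split text #"\s+")))))
          {}
          docs))

(defn features
  "Second pass: score one document against a previously built context."
  [ctx {:keys [text]}]
  (let [words (str/split text #"\s+")]
    {:spam-hits (reduce + (map #(get-in ctx [:spam %] 0) words))
     :ham-hits  (reduce + (map #(get-in ctx [:ham %] 0) words))}))

(defn folds
  "Return k [train test] pairs; document i is held out in fold (mod i k)."
  [k docs]
  (for [i (range k)]
    [(vec (keep-indexed #(when (not= (mod %1 k) i) %2) docs))
     (vec (keep-indexed #(when (= (mod %1 k) i) %2) docs))]))

(defn two-pass-features
  "For each fold, build the context from the training docs only, then
  create features for the training and the held-out docs with it."
  [k docs]
  (for [[train test] (folds k docs)
        :let [ctx (word-counts train)]]
    {:train (mapv #(features ctx %) train)
     :test  (mapv #(features ctx %) test)}))
```

With this toy corpus and two folds, the first held-out document scores `{:spam-hits 2, :ham-hits 1}` when the context comes from its training fold, but `{:spam-hits 5, :ham-hits 1}` when the context is computed over the whole corpus, because its own words are then counted as evidence.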

Option Keys

In addition to all keys documented in zensols.model.execute-classifier/with-model-conf, the opts param is a map that also needs the following key/value pairs:

  • :id-key a function that takes a key as input and returns a feature set
  • :anon-by-id-fn is a function that takes a single integer argument of the annotation to retrieve by ID
  • :anons-fn is a function that retrieves all annotations
  • :create-two-pass-context-fn like :create-context-fn, as documented in zensols.model.execute-classifier/with-model-conf but called for two pass cross validation; this allows a more general context and a specific two pass context to be created for the unique needs of the model.

Example

(with-two-pass (create-model-config)
    {:id-key sf/id-key
     :anon-by-id-fn #(->> % adb/anon-by-id :instance)
     :anons-fn adb/anons}
  (with-feature-context (sf/create-context :anons-fn adb/anons
                                           :set-type :train-test)
    (ec/terse-results [:j48] :set-test-two-pass :only-stats? true)))

See a [working example](https://github.com/plandes/clj-example-nlp-ml/blob/master/src/clojure/zensols/example/sa_tp_eval.clj) for a more comprehensive code listing.

write-arffclj

(write-arff)
(write-arff file)

Write the ARFF file configured with zensols.model.execute-classifier/with-model-conf. If file is given, use that file instead of getting it from zensols.model.classifier/analysis-report-resource.

write-csv-train-test-seriesclj

(write-csv-train-test-series res)
(write-csv-train-test-series res out-file)

Write the results produced with train-test-series as a CSV file to the analysis directory.

write-featuresclj

(write-features)
(write-features file)

Write features as configured in a model with zensols.model.execute-classifier/with-model-conf to a CSV spreadsheet file.

See features-file for the default file.

For the non-zero-arg form, see zensols.model.execute-classifier/with-model-conf.

write-modelclj

(write-model model)
(write-model model name)

Persist/write the model to disk.

  • model is a model that was trained with train-model

See zensols.model.classifier/model-dir for information about where the model is written.
