A *client* entry point library to help with evaluating machine learning models. This library not only wraps the Weka library but also provides additional functionality like a two pass cross validation (see [[with-two-pass]]).
*default-set-type*
The default type of test, which is one of:

* `:cross-validation`: run an N-fold cross validation (default)
* `:train-test`: train the classifier and then evaluate
*throw-cross-validate*
If `true`, throw an exception for any error during cross validation. Otherwise, the error is logged and cross-validation continues. This is useful when several classifiers are used and some choke on the dataset, but you still want the other results.
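The throw-or-log behavior described above can be sketched in plain Clojure. This is only an illustration under stated assumptions, not the library's implementation: `*throw-on-error*`, `evaluate-all`, and `toy-eval` are hypothetical names, and errors are reported on standard error rather than through a logger.

```clojure
;; Hypothetical stand-in for the flag described above; NOT the library's var.
(def ^:dynamic *throw-on-error* false)

(defn evaluate-all
  "Apply eval-fn to each classifier key. Failures are rethrown when
  *throw-on-error* is true; otherwise they are reported and skipped."
  [eval-fn classifier-keys]
  (reduce (fn [results k]
            (try
              (conj results {:classifier k :score (eval-fn k)})
              (catch Exception e
                (if *throw-on-error*
                  (throw e)
                  (do (binding [*out* *err*]
                        (println "skipping" k "-" (.getMessage e)))
                      results)))))
          []
          classifier-keys))

;; Hypothetical evaluator that fails for one classifier.
(defn toy-eval [k]
  (if (= k :broken)
    (throw (ex-info "cannot handle dataset" {:classifier k}))
    0.9))
```

With the flag left at `false`, `(evaluate-all toy-eval [:j48 :broken :zeror])` still returns the results for `:j48` and `:zeror`. Wrapping the same call in `(binding [*throw-on-error* true] ...)` lets the exception propagate instead.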
(analysis-file)
(analysis-file file-format)
(compile-results classifier-sets feature-set-key)
Run cross-fold validation and compile the results into a map sorted by performance. See [[zensols.model.classifier/compile-results]].

* **classifier-sets** is a key in [[zensols.model.weka/*classifiers*]] or a constructed classifier (see [[zensols.model.weka/make-classifiers]])
* **feature-set-key** identifies which feature set to use (see **:feature-sets-set** in [[zensols.model.execute-classifier/with-model-conf]])
(create-model classifier-sets feature-set-key)
Create a model that can be trained. This runs cross-fold validation to find the best classifier and feature set, and returns a result that can be used with [[train-model]] and subsequently [[write-model]].

* **classifier-sets** is a key in [[zensols.model.weka/*classifiers*]] or a constructed classifier (see [[zensols.model.weka/make-classifiers]])
* **feature-set-key** identifies which feature set to use (see **:feature-sets-set** in [[zensols.model.execute-classifier/with-model-conf]])

See [[*throw-cross-validate*]].
(cross-fold-info)
Return information about the current fold for two-pass validations. See [[weka/*cross-fold-info*]].
(display-features & adb-keys)
Display features as configured in a model with [[zensols.model.execute-classifier/with-model-conf]].

**adb-keys** are given to `:create-feature-sets-fn` as described in [[zensols.model.execute-classifier/with-model-conf]]. In addition, it includes `:max`, which is the maximum number of instances to display.
(eval-and-write classifier-sets set-key)
(eval-and-write classifier-sets set-key file)
Perform a cross validation and write the results to an Excel formatted file. See [[zensols.model.classifier/analysis-report-resource]] for where the file is written.

* **classifier-sets** is a key in [[zensols.model.weka/*classifiers*]] or a constructed classifier (see [[zensols.model.weka/make-classifiers]])
* **set-key** identifies which feature set to use (see **:feature-sets-set** in [[zensols.model.execute-classifier/with-model-conf]])

This uses [[eval-and-write-results]] to actually write the results.

See [[evaluations-file]] and [[*throw-cross-validate*]].
(eval-and-write-results results)
(eval-and-write-results results output-file)
Perform a cross validation and write the results to an Excel formatted file. The data from **results** is obtained with [[run-tests]].

See [[eval-and-write]] and [[*throw-cross-validate*]].
(evaluations-file)
(evaluations-file fname)
Return the default file used to create an evaluations file with [[eval-and-write]].
(executing-two-pass?)
Return `true` if we're currently using a two pass cross validation.
(features-file)
Return the default file used to create the features output file with [[write-features]].
(print-best-results classifier-sets feature-set-key)
Print the highest (best) scored cross validation information.

* **classifier-sets** is a key in [[zensols.model.weka/*classifiers*]] or a constructed classifier (see [[zensols.model.weka/make-classifiers]])
* **feature-set-key** identifies which feature set to use (see **:feature-sets-set** in [[zensols.model.execute-classifier/with-model-conf]])

See [[*throw-cross-validate*]].
(print-model-config)
Pretty print the model configuration set with [[zensols.model.execute-classifier/with-model-conf]].
(read-arff)
(read-arff file)
Read the ARFF file configured with [[zensols.model.execute-classifier/with-model-conf]]. If **file** is given, use that file instead of getting it from [[zensols.model.classifier/analysis-report-resource]].
(read-model)
Read a model that was previously persisted to the file system.
See [[zensols.model.classifier/model-dir]] for where the model is read from.
(run-tests classifier-sets feature-set-key)
Create result sets useful to functions like [[eval-and-write]]. This package was designed so that most use cases do not need to call this function directly.

See [[*throw-cross-validate*]].
(terse-results classifier-sets feature-set-key & {:keys [only-stats?] :or {only-stats? true}})
Return terse cross-validation results in an array containing the classifier name, the weighted F-measure, and the feature-metas.

* **classifier-sets** is a key in [[zensols.model.weka/*classifiers*]] or a constructed classifier (see [[zensols.model.weka/make-classifiers]])
* **feature-set-key** identifies which feature set to use (see **:feature-sets-set** in [[zensols.model.execute-classifier/with-model-conf]])

## Keys

* **:only-stats?** if `true`, only return statistic data

See [[*throw-cross-validate*]].
(test-train-series-file)
(test-train-series-file fname)
Return the default file used to create an evaluations file with [[eval-and-write]].
(train-model model & {:keys [set-type] :or {set-type *default-set-type*}})
Train a model created from [[create-model]]. The model is trained on the full available dataset. After the classifier is trained, you can save it to disk by calling [[write-model]].

* **model** a model that was created with [[create-model]]

See [[*throw-cross-validate*]].
(train-test-results classifier-sets feature-sets-key)
Test the performance of a model by training on a given set of data and evaluating on the test data. See [[train-model]] for parameter details.
(train-test-series classifiers meta-set divide-ratio-config)
Test and train with different ratios and return the results. The returned data is writable directly as an Excel file. However, you can also save it as a CSV with [[write-csv-train-test-series]].

The keys are the classifier names and the values are the 2D result matrices.

See [[*throw-cross-validate*]].
(two-pass-model model id-key anon-by-id-fn anons-fn)
Don't use this function; instead, use [[with-two-pass]].

Create a two pass model, which should be merged with the model created with [[zensols.model.execute-classifier/with-model-conf]].
(two-pass-test-instances insts train-state org folds fold)
Don't use this function; instead, use [[with-two-pass]].

This is called by the [[zensols.model.weka]] namespace.
(two-pass-train-instances insts state org folds fold)
Don't use this function; instead, use [[with-two-pass]].

This is called by the [[zensols.model.weka]] namespace.
(with-two-pass model-conf opts & forms)
Like `with-model-conf`, but compute a context state (i.e. statistics needed by the model) on a per-fold basis when executing a cross-fold validation.

The `model-conf` parameter is the same model used with [[zensols.model.execute-classifier/with-model-conf]].

## Description

Two pass validation is a term used in this library. During cross-validation the entire data set is evaluated and (usually) statistics or some other additional modeling happens.

Take, for example, counting words (think Naive Bayes spam filter). If you create features for the entire dataset before cross-validation you're "cheating", because the features are based on data from the test folds, which the model should not have seen.

To get more accurate performance metrics you can provide functions that take the current training fold, compute your word counts, and create your features. During the testing phase, the computed data is provided to create features based on only that (current) fold.

To use two pass validation, every feature set needs a unique key (not needed as a feature). This key is then given to a function during validation to get the corresponding feature set that is to be *stitched* in.

**Note** This is *only* useful if:

1. You want to use cross fold validation to test your model.
2. Your model priors (*context* in implementation parlance) are composed from the dataset preprocessing and are thus needed to get reliable performance metrics.

## Option Keys

In addition to all keys documented in [[zensols.model.execute-classifier/with-model-conf]], the **opts** param is a map that also needs the following key/value pairs:

* **:id-key** a function that takes a key as input and returns a feature set
* **:anon-by-id-fn** is a function that takes a single integer argument of the annotation to retrieve by ID
* **:anons-fn** is a function that retrieves all annotations
* **:create-two-pass-context-fn** like `:create-context-fn`, as documented in [[zensols.model.execute-classifier/with-model-conf]] but called for two pass cross validation; this allows a more general context and a specific two pass context to be created for the unique needs of the model.

## Example

```
(with-two-pass (create-model-config)
    {:id-key sf/id-key
     :anon-by-id-fn #(->> % adb/anon-by-id :instance)
     :anons-fn adb/anons}
  (with-feature-context (sf/create-context :anons-fn adb/anons
                                           :set-type :train-test)
    (ec/terse-results [:j48] :set-test-two-pass :only-stats? true)))
```

See a [working example](https://github.com/plandes/clj-example-nlp-ml/blob/master/src/clojure/zensols/example/sa_tp_eval.clj) for a more comprehensive code listing.
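The leakage problem that motivates `with-two-pass` can be illustrated without the library. The following is a minimal, self-contained sketch in plain Clojure and not the library's implementation: the toy dataset and the names `word-counts`, `features`, `folds`, and `two-pass-features` are all hypothetical. It computes the word-count context from the training split of each fold only, then uses that context to create features for both splits.

```clojure
(require '[clojure.string :as str])

;; Hypothetical toy dataset: each instance carries a unique :id (the
;; "unique key" mentioned in the description), raw text, and a label.
(def instances
  [{:id 0 :text "cheap pills now"   :label :spam}
   {:id 1 :text "meeting at noon"   :label :ham}
   {:id 2 :text "cheap cheap offer" :label :spam}
   {:id 3 :text "lunch at noon"     :label :ham}])

(defn word-counts
  "First pass: compute the context (word frequencies) from training
  instances only."
  [train]
  (->> train
       (mapcat #(str/split (:text %) #"\s+"))
       frequencies))

(defn features
  "Second pass: create features for one instance from the per-fold context.
  The single feature is the summed training frequency of the instance's words."
  [context inst]
  {:id (:id inst)
   :known-word-count (->> (str/split (:text inst) #"\s+")
                          (map #(get context % 0))
                          (reduce +))})

(defn folds
  "Split instances into k folds round-robin by position; return a sequence
  of {:train [...] :test [...]} maps."
  [k insts]
  (let [indexed (map-indexed vector insts)]
    (for [fold (range k)]
      (let [test? (fn [idx] (= fold (mod idx k)))]
        {:train (mapv second (remove (comp test? first) indexed))
         :test  (mapv second (filter (comp test? first) indexed))}))))

(defn two-pass-features
  "For each fold, build the context from the training split only, then use
  it to create features for both the training and the test split."
  [k insts]
  (for [{:keys [train test]} (folds k insts)]
    (let [context (word-counts train)]
      {:context context
       :train (mapv #(features context %) train)
       :test  (mapv #(features context %) test)})))
```

Calling `(two-pass-features 2 instances)` yields one entry per fold. In the first fold the context is built from instances 1 and 3 only, so the word "cheap" from the held-out instances never contributes to any count. Counting over the whole dataset up front would let the test instances influence their own features.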
(write-arff)
(write-arff file)
Write the ARFF file configured with [[zensols.model.execute-classifier/with-model-conf]]. If **file** is given, use that file instead of getting it from [[zensols.model.classifier/analysis-report-resource]].
(write-csv-train-test-series res)
(write-csv-train-test-series res out-file)
Write the results produced with [[train-test-series]] as a CSV file to the analysis directory.
(write-features)
(write-features file)
Write features as configured in a model with [[zensols.model.execute-classifier/with-model-conf]] to a CSV spreadsheet file.

See [[features-file]] for the default file.

For the non-zero-arg form, see [[zensols.model.execute-classifier/with-model-conf]].
(write-model model)
(write-model model name)
Persist/write the model to disk.

* **model** a model that was trained with [[train-model]]

See [[zensols.model.classifier/model-dir]] for information about where the model is written.