Core functions for machine learning and pipeline execution.
Functions are re-exported from:
* scicloj.metamorph.ml.*
* scicloj.metamorph.core
(->pipeline ops)
(->pipeline config ops)
Create pipeline from declarative description.
(categorical value-vec)
Given a vector of categorical values, create a gridsearch definition.
(classification-accuracy lhs rhs)
correct/total. Model output is a sequence of probability distributions. label-seq is a sequence of values. The answer is considered correct if the key with the highest probability in the model output entry matches that label.
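The computation described above can be sketched in plain Clojure. This is only an illustration under stated assumptions, not the library's implementation: the names `most-probable-label`, `accuracy-sketch`, `loss-sketch` and `example-dists` are hypothetical, and each probability distribution is assumed to be a map of label to probability.

```clojure
;; Hypothetical helper: the key with the highest probability in one
;; distribution map such as {:cat 0.7 :dog 0.3}.
(defn most-probable-label
  [prob-dist]
  (key (apply max-key val prob-dist)))

;; correct/total over a seq of distributions and a seq of labels.
(defn accuracy-sketch
  [prob-dists labels]
  (let [correct (count (filter true?
                               (map #(= (most-probable-label %1) %2)
                                    prob-dists labels)))]
    (/ (double correct) (count labels))))

;; The matching loss is simply 1.0 minus the accuracy.
(defn loss-sketch
  [prob-dists labels]
  (- 1.0 (accuracy-sketch prob-dists labels)))

;; Made-up example data: the third prediction (:cat) misses its label (:dog).
(def example-dists [{:cat 0.7 :dog 0.3}
                    {:cat 0.2 :dog 0.8}
                    {:cat 0.6 :dog 0.4}])

(accuracy-sketch example-dists [:cat :dog :dog]) ; two of three are correct
```

When two keys tie for the highest probability, this sketch picks whichever `max-key` returns; how the library resolves ties is not stated here.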
(classification-loss lhs rhs)
1.0 - classification-accuracy.
(confusion-map predicted-labels labels)
(confusion-map predicted-labels labels normalize)
(confusion-map->ds conf-matrix-map)
(confusion-map->ds conf-matrix-map normalize)
(default-loss-fn dataset)
Given a dataset which must have exactly 1 inference target column, return a default loss fn. If the column is categorical, the loss is tech.v3.ml.loss/classification-loss; otherwise the loss is tech.v3.ml.loss/mae (mean absolute error).
(define-model! model-kwd
train-fn
predict-fn
{:keys [hyperparameters thaw-fn explain-fn options
documentation]})
Create a model definition. An ml model is a function that takes a dataset and an options map and returns a model. A model is something that, combined with a dataset, produces an inferred dataset.
(evaluate-pipelines pipe-fn-seq train-test-split-seq metric-fn loss-or-accuracy)
(evaluate-pipelines pipe-fn-seq
train-test-split-seq
metric-fn
loss-or-accuracy
options)
Evaluates the performance of a seq of metamorph pipelines, each of which is supposed to have a model as its last step that behaves correctly in modes :fit and :transform.
It calculates the loss, given as `metric-fn`, of each pipeline in `pipe-fn-seq` using all the train-test splits given in `train-test-split-seq`.
It runs the pipelines in mode :fit and in mode :transform for each pipe-fn in `pipe-fn-seq` and for each split in `train-test-split-seq`.
The function returns a seq of seqs of evaluation results per pipe-fn per train-test split.
* `pipe-fn-seq` needs to be a sequence of functions which follow the metamorph approach. They should take as input the metamorph context map, which has the dataset under key :metamorph/data, manipulate it as needed for the transformation pipeline, and read and write only to the context as needed. Functions of this type are typically produced by calling `scicloj.metamorph/pipeline`.
* `train-test-split-seq` needs to be a sequence of maps containing the train and test datasets (being tech.ml.dataset) at keys :train and :test. `tablecloth.api/split->seq` produces such splits.
* `metric-fn` Metric function to use. Typically coming from `tech.v3.ml.loss`.
* `loss-or-accuracy` Whether the metric-fn is a loss or an accuracy calculation. Can be :loss or :accuracy.
* `options` map controls some mainly performance-related parameters, which are:
  * `:result-dissoc-in-seq` - Controls how much information is returned for each cross validation. We call `dissoc-in` with every key path in this seq on the `fit-ctx` and `transform-ctx` before returning them. Default is
```
[[:fit-ctx :metamorph/data]
 [:fit-ctx :scicloj.metamorph.ml/target-ds]
 [:transform-ctx :metamorph/data]
 [:transform-ctx :scicloj.metamorph.ml/target-ds]
 [:transform-ctx :scicloj.metamorph.ml/feature-ds]]
```
  * `:return-best-pipeline-only` - Only return information of the best performing pipeline. Default is true.
  * `:return-best-crossvalidation-only` - Only return information of the best crossvalidation (per pipeline returned). Default is true.
  * `:map-fn` - Controls parallelism, i.e. whether map (:map) or pmap (:pmap) is used to map over the different pipelines. Default is :pmap.
  * `:evaluation-handler-fn` - Gets called once with the complete result of an evaluation step. Its return value is ignored and the default is a no-op.
This function also expects the ground truth of the target variable under a specific key in the context, `:scicloj.metamorph.ml/target-ds`.
See here for the simplest way to set this up: https://github.com/behrica/metamorph.ml/blob/main/README.md
The function `scicloj.ml.metamorph/model` does this correctly.
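To make the effect of `:result-dissoc-in-seq` more concrete, here is a minimal plain-Clojure sketch of removing nested key paths from a result map. It rests on assumptions: `dissoc-in` below is a hypothetical helper written for this illustration (the library's own version may differ), and `example-result` is a made-up stand-in for one evaluation result, not the real result structure.

```clojure
;; Hypothetical helper: remove the value at the nested key path `ks` from `m`.
;; Paths that do not exist in `m` are left alone.
(defn dissoc-in
  [m ks]
  (cond
    (= 1 (count ks))          (dissoc m (first ks))
    (contains? m (first ks))  (update m (first ks) dissoc-in (rest ks))
    :else                     m))

;; The default key paths listed above.
(def default-paths
  [[:fit-ctx :metamorph/data]
   [:fit-ctx :scicloj.metamorph.ml/target-ds]
   [:transform-ctx :metamorph/data]
   [:transform-ctx :scicloj.metamorph.ml/target-ds]
   [:transform-ctx :scicloj.metamorph.ml/feature-ds]])

;; Made-up result map; keywords stand in for (potentially large) datasets.
(def example-result
  {:fit-ctx       {:metamorph/data :train-ds
                   :model          :fitted-model}
   :transform-ctx {:metamorph/data                  :predictions
                   :scicloj.metamorph.ml/feature-ds :features}
   :metric        0.25})

;; Apply every path in turn; only the bulky dataset entries disappear.
(def trimmed (reduce dissoc-in example-result default-paths))
```

The point of the default is to drop the datasets, which dominate memory use, while keeping the rest of each context available for inspection.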
(explain model & [options])
Explain (if possible) an ml model. A model explanation is a model-specific map of data that usually indicates some level of mapping between features and importance
(hyperparameters model-kwd)
Get the hyperparameters for this model definition
(lift op & params)
Create a context-aware version of the given `op` function. `:metamorph/data` will be used as the first parameter. The result of the `op` function will be stored under `:metamorph/data`.
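The behaviour described above can be sketched in a few lines of plain Clojure. This is a sketch based only on the description, not a copy of the library source: `lift-sketch` and `add-to-all` are hypothetical names, and a plain vector stands in for a dataset.

```clojure
;; Turn a function of (data & params) into a function of the context map:
;; read :metamorph/data, call `op` with it as the first argument, and store
;; the result back under :metamorph/data.
(defn lift-sketch
  [op & params]
  (fn [ctx]
    (assoc ctx :metamorph/data
           (apply op (:metamorph/data ctx) params))))

;; Made-up op: add `n` to every element of the data.
(def add-to-all
  (lift-sketch (fn [data n] (mapv #(+ % n) data)) 10))

(add-to-all {:metamorph/data [1 2 3] :metamorph/mode :fit})
;; => {:metamorph/data [11 12 13], :metamorph/mode :fit}
```

All other keys in the context pass through unchanged, which is what lets ordinary dataset functions be used as pipeline steps.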
(linear start end)
(linear start end n-steps)
(linear start end n-steps res-dtype-or-space)
Create a gridsearch definition which does a linear search.
* `res-dtype-or-space` may be either a datatype keyword or a vector of categorical values.
(model-definition-names)
Return a list of all registered model definition names.
Map of model kwd to model definition
(options->model-def options)
Return the model definition that corresponds to the :model-type option.
(pipeline & ops)
(probability-distributions->labels prob-dists)
(sobol-gridsearch opt-map)
(sobol-gridsearch opt-map start-idx)
Given a map of key->values where some of the values are gridsearch definitions, produce a sequence of fully defined maps.
```clojure
user> (require '[tech.v3.ml.gridsearch :as ml-gs])
nil
user> (def opt-map {:a (ml-gs/categorical [:a :b :c])
                    :b (ml-gs/linear 0.01 1 10)
                    :c :not-searched})
user> opt-map
{:a
 {:tech.v3.ml.gridsearch/type :linear,
  :start 0.0,
  :end 2.0,
  :n-steps 3,
  :result-space [:a :b :c]}
 ...
user> (ml-gs/sobol-gridsearch opt-map)
({:a :b, :b 0.56, :c :not-searched}
 {:a :c, :b 0.22999999999999998, :c :not-searched}
 {:a :b, :b 0.78, :c :not-searched}
 ...
```
(thaw-model model)
(thaw-model model {:keys [thaw-fn]})
Thaw a model. Models returned from train may be 'frozen', meaning a 'thaw' operation is needed in order to use the model. This happens for you during predict, but you may also cache the 'thawed' model on the model map under the ':thawed-model' keyword in order to do fast predictions on small datasets.