Core functions for machine learning and pipeline execution.
Requiring this namespace also registers the models in:
* scicloj.ml.smile.classification
* scicloj.ml.smile.regression
* scicloj.ml.xgboost
Functions are re-exported from:
* scicloj.metamorph.ml.*
* scicloj.metamorph.core
(->pipeline ops)
(->pipeline config ops)
Create pipeline from declarative description.
(categorical value-vec)
Given a vector of categorical values, create a gridsearch definition.
(classification-accuracy lhs rhs)
correct/total. Model output is a sequence of probability distributions. label-seq is a sequence of values. The answer is considered correct if the key with the highest probability in the model output entry matches that label.
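The correct/total rule above can be sketched in plain Clojure. This is a minimal illustration, not the library's implementation: `toy-classification-accuracy`, the sample data, and the choice of which argument holds the model output are all assumptions made for this example.

```clojure
;; Hypothetical stand-in for the described behaviour; not the library function.
(defn toy-classification-accuracy
  "prob-dists: seq of maps from label to probability.
   labels: seq of expected labels. Returns correct/total as a double."
  [prob-dists labels]
  (let [;; pick the key with the highest probability in each distribution
        predicted (map (fn [dist] (key (apply max-key val dist))) prob-dists)
        correct   (count (filter true? (map = predicted labels)))]
    (/ (double correct) (count labels))))

(def model-output [{:cat 0.8 :dog 0.2}
                   {:cat 0.3 :dog 0.7}
                   {:cat 0.6 :dog 0.4}])
(def labels [:cat :dog :dog])

;; Two of the three predictions match their label.
(def accuracy (toy-classification-accuracy model-output labels))
;; The loss counterpart is simply 1.0 minus the accuracy.
(def loss (- 1.0 accuracy))
```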
(classification-loss lhs rhs)
1.0 - classification-accuracy.
(confusion-map predicted-labels labels)
(confusion-map predicted-labels labels normalize)
(confusion-map->ds conf-matrix-map)
(confusion-map->ds conf-matrix-map normalize)
(def-ctx varname)
Convenience macro for defining pipelined operations that bind the current value of the context to a var, for simple debugging purposes.
(default-loss-fn dataset)
Given a dataset which must have exactly 1 inference target column, return a default loss fn. If the column is categorical, the loss is tech.v3.ml.loss/classification-loss, else the loss is tech.v3.ml.loss/mae (mean absolute error).
(define-model! model-kwd
train-fn
predict-fn
{:keys [hyperparameters thaw-fn explain-fn options
documentation]})
Create a model definition. An ml model is a function that takes a dataset and an options map and returns a model. A model is something that, combined with a dataset, produces an inferred dataset.
(do-ctx f)
Apply f:: ctx -> any, ignore the result, leaving pipeline unaffected. Akin to using doseq for side-effecting operations like printing, visualization, or binding to vars for debugging.
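The "apply f, ignore the result" behaviour can be sketched in plain Clojure. `toy-do-ctx` is a hypothetical stand-in written for illustration and is not the library's code; the atom `seen` only exists to make the side effect observable.

```clojure
;; Hypothetical sketch of a side-effect-only pipeline step.
(defn toy-do-ctx
  "Returns a ctx -> ctx function that calls f for its side effect
   and passes the ctx through unchanged."
  [f]
  (fn [ctx]
    (f ctx)   ; result is deliberately discarded
    ctx))

;; Capture the current data for later inspection, as one might do when debugging.
(def seen (atom nil))
(def spy (toy-do-ctx (fn [ctx] (reset! seen (:metamorph/data ctx)))))

(def ctx-in {:metamorph/data [1 2 3]})
(def ctx-out (spy ctx-in))
```

After the call, `ctx-out` equals `ctx-in`, while `seen` holds the data that flowed through the step.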
(evaluate-pipelines pipe-fn-seq train-test-split-seq metric-fn loss-or-accuracy)
(evaluate-pipelines pipe-fn-or-decl-seq
train-test-split-seq
metric-fn
loss-or-accuracy
options)
Evaluates the performance of a seq of metamorph pipelines, which are supposed to have a model as their last step that behaves correctly in modes :fit and :transform.
It calculates the metric, given as `metric-fn`, of each pipeline in `pipe-fn-or-decl-seq` using all the train-test splits given in `train-test-split-seq`.
It runs the pipelines in mode :fit and in mode :transform for each pipeline in `pipe-fn-or-decl-seq` and for each split in `train-test-split-seq`.
The function returns a seq of seqs of evaluation results per pipe-fn per train-test split.
* `pipe-fn-or-decl-seq` needs to be a sequence of functions or pipeline declarations which follow the metamorph approach. They should take as input the metamorph context map, which has the dataset under key :metamorph/data, manipulate it as needed for the transformation pipeline, and read and write only to the context as needed. Functions of this type are typically produced by calling `scicloj.metamorph/pipeline`.
* `train-test-split-seq` needs to be a sequence of maps containing the train and test datasets (being tech.ml.dataset) at keys :train and :test. `tablecloth.api/split->seq` produces such splits.
* `metric-fn` Metric function to use. Typically coming from `tech.v3.ml.loss`.
* `loss-or-accuracy` Whether the metric-fn is a loss or an accuracy calculation. Can be :loss or :accuracy.
* `options` map controls some mainly performance-related parameters, which are:
  * `:result-dissoc-in-seq` - Controls how much information is returned for each cross validation. We call `dissoc-in` on every seq of this for the `fit-ctx` and `transform-ctx` before returning them. Default is
    ```
    [[:fit-ctx :metamorph/data]
     [:train-transform :ctx :metamorph/data]
     [:train-transform :ctx :scicloj.metamorph.ml/target-ds]
     [:train-transform :ctx :scicloj.metamorph.ml/feature-ds]
     [:test-transform :ctx :metamorph/data]
     [:test-transform :ctx :scicloj.metamorph.ml/target-ds]
     [:test-transform :ctx :scicloj.metamorph.ml/feature-ds]]
    ```
  * `:return-best-pipeline-only` - Only return information of the best performing pipeline. Default is true.
  * `:return-best-crossvalidation-only` - Only return information of the best crossvalidation (per pipeline returned). Default is true.
  * `:map-fn` - Controls parallelism, i.e. whether map (:map), pmap (:pmap) or mapv (:mapv) is used to map over the different pipelines. Default is :pmap.
  * `:evaluation-handler-fn` - Gets called once with the complete result of an individual evaluation step. Its result is ignored and its default is a no-op.
  * `:other-metrices` - Specifies other metrics to be calculated during evaluation.
This function also expects the ground truth of the target variable under a specific key in the context, `:scicloj.metamorph.ml/target-ds`.
See here for the simplest way to set this up: https://github.com/behrica/metamorph.ml/blob/main/README.md
The function [[scicloj.ml.metamorph/model]] does this correctly.
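To make the effect of `:result-dissoc-in-seq` more concrete, here is a minimal sketch in plain Clojure. The `dissoc-in` helper below is a hypothetical re-implementation written for this example, and the keys `:model` and `:metric` in the mock result are invented placeholders; real evaluation results contain more and different entries.

```clojure
;; Hypothetical helper, written for illustration only.
(defn dissoc-in
  "Removes the entry at the nested key path ks from map m.
   Leaves m untouched when the parent of the path is not a map."
  [m ks]
  (let [parent (butlast ks)
        k      (last ks)]
    (cond
      (empty? parent)            (dissoc m k)
      (map? (get-in m parent))   (update-in m parent dissoc k)
      :else                      m)))

;; Mock evaluation result with placeholder values instead of real datasets.
(def mock-result
  {:fit-ctx         {:metamorph/data :big-train-dataset
                     :model          :fitted-model}
   :train-transform {:ctx    {:metamorph/data :train-predictions}
                     :metric 0.9}})

;; A subset of the default paths listed above.
(def paths [[:fit-ctx :metamorph/data]
            [:train-transform :ctx :metamorph/data]])

;; Apply every path in turn; the bulky data entries are dropped.
(def trimmed (reduce dissoc-in mock-result paths))
```

The trimmed map keeps the small entries and drops the dataset entries, which is the memory-saving purpose the option describes.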
(explain model & [options])
Explain (if possible) an ml model. A model explanation is a model-specific map of data that usually indicates some level of mapping between features and importance
(fit data & ops)
Helper function which executes pipeline op(s) in mode :fit on the given data and returns the fitted ctx.
Main use is for cases in which the pipeline gets executed once and no model is part of the pipeline.
(fit-pipe data pipe-fn)
Helper function which executes pipeline op(s) in mode :fit on the given data and returns the fitted ctx.
Main use is for cases in which the pipeline gets executed once and no model is part of the pipeline.
(hyperparameters model-kwd)
Get the hyperparameters for this model definition
(lift op & params)
Create a context-aware version of the given `op` function. `:metamorph/data` will be used as the first parameter. The result of the `op` function will be stored under `:metamorph/data`.
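The lifting described above can be sketched in a few lines of plain Clojure. `toy-lift` is a hypothetical stand-in, not the library's implementation, and a plain vector is used here in place of a dataset.

```clojure
;; Hypothetical sketch of lifting a data -> data function to ctx -> ctx.
(defn toy-lift
  "Wraps op so that it reads its first argument from :metamorph/data
   and writes its result back under the same key."
  [op & params]
  (fn [ctx]
    (assoc ctx :metamorph/data
           (apply op (:metamorph/data ctx) params))))

;; A plain function on data, with one extra parameter.
(defn add-to-all [xs n] (mapv #(+ % n) xs))

(def lifted (toy-lift add-to-all 10))
(def lifted-result (lifted {:metamorph/data [1 2 3]
                            :metamorph/mode :fit}))
```

Other keys of the context, such as `:metamorph/mode`, pass through untouched; only `:metamorph/data` changes.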
(linear start end)
(linear start end n-steps)
(linear start end n-steps res-dtype-or-space)
Create a gridsearch definition which does a linear search.
* res-dtype-or-space may be either a datatype keyword or a vector of categorical values.
(model-definition-names)
Return a list of all registered model definition names.
Map of model kwd to model definition
(options->model-def options)
Return the model definition that corresponds to the :model-type option.
(pipe-it data & ops)
Takes a data object, executes the pipeline op(s) with it in :metamorph/data in mode :fit, and returns the content of :metamorph/data. Useful for executing a pipeline of pure data->data functions on some data.
(pipeline & ops)
Create a metamorph pipeline function out of operators.
`ops` are metamorph-compliant functions (basically fns which take a ctx as first argument).
This function returns a function which can be executed with a ctx as parameter.
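Conceptually, such a pipeline threads one context map through each operator in order. The following is a simplified sketch under that assumption; `toy-pipeline` and the two operators are hypothetical, and the real function may do additional bookkeeping that is omitted here.

```clojure
;; Hypothetical, simplified composition of ctx -> ctx functions.
(defn toy-pipeline
  "Returns a function that passes a ctx through every op, left to right."
  [& ops]
  (fn [ctx]
    (reduce (fn [acc op] (op acc)) ctx ops)))

;; Two toy operators working on a plain vector under :metamorph/data.
(defn double-data [ctx]
  (update ctx :metamorph/data (fn [xs] (mapv (partial * 2) xs))))

(defn sum-data [ctx]
  (update ctx :metamorph/data (fn [xs] (reduce + xs))))

(def pipe (toy-pipeline double-data sum-data))
(def pipe-result (pipe {:metamorph/data [1 2 3]}))
```

Here the data is first doubled to `[2 4 6]` and then summed, so `pipe-result` carries `12` under `:metamorph/data`.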
(predict dataset model)
Predict returns a dataset with only the predictions in it.
* For regression, a single column dataset is returned with the column named after the target.
* For classification, a dataset is returned with a float64 column for each target value and values that describe the probability distribution.
(sobol-gridsearch opt-map)
(sobol-gridsearch opt-map start-idx)
Given a map of key->values where some of the values are gridsearch definitions, produce a sequence of fully defined maps.
```clojure
user> (require '[tech.v3.ml.gridsearch :as ml-gs])
nil
user> (def opt-map {:a (ml-gs/categorical [:a :b :c])
                    :b (ml-gs/linear 0.01 1 10)
                    :c :not-searched})
user> opt-map
{:a
 {:tech.v3.ml.gridsearch/type :linear,
  :start 0.0,
  :end 2.0,
  :n-steps 3,
  :result-space [:a :b :c]}
 ...
user> (ml-gs/sobol-gridsearch opt-map)
({:a :b, :b 0.56, :c :not-searched}
 {:a :c, :b 0.22999999999999998, :c :not-searched}
 {:a :b, :b 0.78, :c :not-searched}
 ...
```
(thaw-model model)
(thaw-model model {:keys [thaw-fn]})
Thaw a model. Models returned from train may be 'frozen', meaning a 'thaw' operation is needed in order to use the model. This happens for you during predict, but you may also cache the 'thawed' model on the model map under the ':thawed-model' keyword in order to do fast predictions on small datasets.
(train dataset options)
Given a dataset and an options map produce a model. The model-type keyword in the options map selects which model definition to use to train the model. Returns a map containing at least:
* `:model-data` - the result of that definition's train-fn.
* `:options` - the options passed in.
* `:id` - new randomly generated UUID.
* `:feature-columns` - vector of column names.
* `:target-columns` - vector of column names.
(transform-pipe data pipe-fn ctx)
Helper function which executes the passed `pipe-fn` on the given `data` in mode :transform. It merges the data into the provided `ctx` while doing so.
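The interplay between a fit run and a later transform run can be sketched as follows. Everything here is a toy: `toy-fit-pipe`, `toy-transform-pipe`, the `center` op, and the key `:toy/mean` are hypothetical names, and the merge shown is an assumption about what "merges the data into the provided ctx" amounts to.

```clojure
;; Toy op: learns the mean in :fit mode, reuses the stored mean in :transform mode.
(defn center [ctx]
  (let [data (:metamorph/data ctx)
        mean (if (= :fit (:metamorph/mode ctx))
               (/ (reduce + data) (count data))
               (:toy/mean ctx))]          ; hypothetical key holding fitted state
    (assoc ctx
           :toy/mean mean
           :metamorph/data (mapv #(- % mean) data))))

;; Hypothetical helper: run the pipeline on data in mode :fit.
(defn toy-fit-pipe [data pipe-fn]
  (pipe-fn {:metamorph/data data :metamorph/mode :fit}))

;; Hypothetical helper: merge new data into the fitted ctx, run in mode :transform.
(defn toy-transform-pipe [data pipe-fn ctx]
  (pipe-fn (merge ctx {:metamorph/data data :metamorph/mode :transform})))

(def fitted (toy-fit-pipe [1 2 3] center))                  ; learns mean 2
(def transformed (toy-transform-pipe [4 6] center fitted))  ; reuses mean 2
```

Because the fitted context is carried into the transform run, the new data is centered with the mean learned during fitting rather than with its own mean.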