scicloj.ml.core

Core functions for machine learning and pipeline execution.

Functions are re-exported from:

  • scicloj.metamorph.ml.*
  • scicloj.metamorph.core

->pipeline

(->pipeline ops)
(->pipeline config ops)

Create pipeline from declarative description.

categorical

(categorical value-vec)

Given a vector of categorical values, create a gridsearch definition.

classification-accuracy

(classification-accuracy lhs rhs)

Returns correct/total. The model output is a sequence of probability distributions and label-seq is a sequence of values. An answer is considered correct if the key with the highest probability in the model output entry matches the corresponding label.
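
The rule described above can be sketched in plain Clojure. This is only an illustration of the idea, not the library's implementation: the helper names `most-likely-label` and `accuracy-sketch` are hypothetical, and the probability distributions are assumed to be maps from label to probability.

```clojure
;; Hypothetical helpers for illustration; not part of scicloj.ml.core.
(defn most-likely-label
  "Return the key with the highest probability in one distribution map."
  [prob-dist]
  (key (apply max-key val prob-dist)))

(defn accuracy-sketch
  "Fraction of entries whose most likely label equals the true label."
  [prob-dists labels]
  (let [hits (count (filter true?
                            (map #(= (most-likely-label %1) %2)
                                 prob-dists labels)))]
    (/ (double hits) (count labels))))

;; Made-up example data: four predictions, the third one is wrong.
(def prob-dists [{:cat 0.7 :dog 0.3}
                 {:cat 0.2 :dog 0.8}
                 {:cat 0.6 :dog 0.4}
                 {:cat 0.1 :dog 0.9}])
(def labels [:cat :dog :dog :dog])

(accuracy-sketch prob-dists labels)
;; => 0.75

;; The matching loss is simply 1.0 minus the accuracy.
(- 1.0 (accuracy-sketch prob-dists labels))
;; => 0.25
```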

classification-loss

(classification-loss lhs rhs)

1.0 - classification-accuracy.

confusion-map

(confusion-map predicted-labels labels)
(confusion-map predicted-labels labels normalize)

confusion-map->ds

(confusion-map->ds conf-matrix-map)
(confusion-map->ds conf-matrix-map normalize)

default-loss-fn

(default-loss-fn dataset)

Given a dataset, which must have exactly 1 inference target column, return a default loss fn. If the column is categorical, the loss is tech.v3.ml.loss/classification-loss; otherwise the loss is tech.v3.ml.loss/mae (mean absolute error).

default-result-dissoc-in-seq


define-model!

(define-model! model-kwd
               train-fn
               predict-fn
               {:keys [hyperparameters thaw-fn explain-fn options
                       documentation]})

Create a model definition. An ml model is a function that takes a dataset and an options map and returns a model. A model is something that, combined with a dataset, produces an inferred dataset.

evaluate-pipelines

(evaluate-pipelines pipe-fn-seq train-test-split-seq metric-fn loss-or-accuracy)
(evaluate-pipelines pipe-fn-seq
                    train-test-split-seq
                    metric-fn
                    loss-or-accuracy
                    options)

Evaluates the performance of a seq of metamorph pipelines, which are supposed to have a model as their last step that behaves correctly in modes :fit and :transform. It calculates the metric, given as metric-fn, of each pipeline in pipe-fn-seq using all the train-test splits given in train-test-split-seq.

It runs the pipelines in mode :fit and in mode :transform for each pipeline-fn in pipe-fn-seq for each split in train-test-split-seq.

The function returns a seq of seqs of evaluation results per pipe-fn per train-test split.

  • pipe-fn-seq needs to be a sequence of functions which follow the metamorph approach. They should take as input the metamorph context map, which has the dataset under key :metamorph/data, manipulate it as needed for the transformation pipeline, and read and write only to the context as needed. Functions of this type are typically produced by calling scicloj.metamorph.core/pipeline.

  • train-test-split-seq needs to be a sequence of maps containing the train and test datasets (each a tech.ml.dataset) at keys :train and :test. tablecloth.api/split->seq produces such splits.

  • metric-fn Metric function to use. Typically coming from tech.v3.ml.loss.

  • loss-or-accuracy Whether the metric-fn is a loss or an accuracy calculation. Can be :loss or :accuracy.

  • options map controls some mainly performance related parameters, which are:

    • :result-dissoc-in-seq - Controls how much information is returned for each cross validation. We call dissoc-in on every seq of this for the fit-ctx and transform-ctx before returning them. Default is
    [[:fit-ctx :metamorph/data]
     [:fit-ctx :scicloj.metamorph.ml/target-ds]
     [:transform-ctx :metamorph/data]
     [:transform-ctx :scicloj.metamorph.ml/target-ds]
     [:transform-ctx :scicloj.metamorph.ml/feature-ds]]
    
    • :return-best-pipeline-only - Only return information of the best performing pipeline. Default is true.
    • :return-best-crossvalidation-only - Only return information of the best crossvalidation (per pipeline returned). Default is true.
    • :map-fn - Controls parallelism, i.e. whether map (:map) or pmap (:pmap) is used to map over the different pipelines. Default is :pmap.
    • :evaluation-handler-fn - Gets called once with the complete result of an evaluation step. Its return value is ignored and the default is a noop.

This function also expects the ground truth of the target variable under a specific key in the context, :scicloj.metamorph.ml/target-ds. See here for the simplest way to set this up: https://github.com/behrica/metamorph.ml/blob/main/README.md

The function scicloj.ml.metamorph/model does this correctly.
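
The fit/transform loop described above can be sketched without any library dependencies. Everything here is a simplified assumption for illustration: the "datasets" are plain vectors of numbers, the "model" only remembers the training mean, the test data doubles as the ground truth (the real function reads it from :scicloj.metamorph.ml/target-ds), and the names `mean-model-pipe`, `evaluate-sketch` and `mae-sketch` are hypothetical.

```clojure
(defn mean [xs] (/ (reduce + 0.0 xs) (count xs)))

(defn mean-model-pipe
  "Toy pipeline fn in the metamorph style: context map in, context map out.
  In :fit it stores the training mean, in :transform it predicts that mean."
  [ctx]
  (case (:metamorph/mode ctx)
    :fit       (assoc ctx ::model (mean (:metamorph/data ctx)))
    :transform (assoc ctx ::prediction
                      (repeat (count (:metamorph/data ctx)) (::model ctx)))))

(defn mae-sketch
  "Mean absolute error between two equally long seqs of numbers."
  [predictions labels]
  (mean (map #(Math/abs (double (- %1 %2))) predictions labels)))

(defn evaluate-sketch
  "For each pipe-fn and each split: run :fit on :train, then :transform
  on :test (reusing the fitted context), and score the prediction.
  Returns a seq of seqs of results, per pipe-fn per split."
  [pipe-fns splits metric-fn]
  (for [pipe-fn pipe-fns]
    (for [{:keys [train test]} splits]
      (let [fit-ctx       (pipe-fn {:metamorph/data train
                                    :metamorph/mode :fit})
            transform-ctx (pipe-fn (assoc fit-ctx
                                          :metamorph/data test
                                          :metamorph/mode :transform))]
        {:metric (metric-fn (::prediction transform-ctx) test)}))))

;; Two made-up train/test splits.
(def splits [{:train [1.0 2.0 3.0] :test [2.0 4.0]}
             {:train [4.0 6.0]     :test [5.0]}])

(evaluate-sketch [mean-model-pipe] splits mae-sketch)
;; => (({:metric 1.0} {:metric 0.0}))
```

The sketch leaves out the parts that the options map controls, such as picking the best pipeline, trimming the returned contexts, and running pipelines in parallel.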

explain

(explain model & [options])

Explain (if possible) an ml model. A model explanation is a model-specific map of data that usually indicates some level of mapping between features and importance.

hyperparameters

(hyperparameters model-kwd)

Get the hyperparameters for this model definition

lift

(lift op & params)

Create context aware version of the given op function. :metamorph/data will be used as a first parameter.

Result of the op function will be stored under :metamorph/data
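
The behaviour described above can be approximated in a few lines of plain Clojure. This is a sketch of the idea only, under the assumption that the context is an ordinary map; `lift-sketch` and `add-to-all` are hypothetical names, not the library's code.

```clojure
(defn lift-sketch
  "Wrap `op` so that it reads its first argument from :metamorph/data
  and writes its result back under the same key."
  [op & params]
  (fn [ctx]
    (assoc ctx :metamorph/data (apply op (:metamorph/data ctx) params))))

;; An ordinary function of (data, n), lifted into a context-aware step.
(def add-to-all
  (lift-sketch (fn [data n] (mapv #(+ % n) data)) 10))

(add-to-all {:metamorph/data [1 2 3] :metamorph/mode :fit})
;; => {:metamorph/data [11 12 13], :metamorph/mode :fit}
```

All other keys in the context pass through untouched, which is what lets a lifted function sit between other pipeline steps.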

linear

(linear start end)
(linear start end n-steps)
(linear start end n-steps res-dtype-or-space)

Create a gridsearch definition which does a linear search.

  • res-dtype-or-space may be either a datatype keyword or a vector of categorical values.
mae

(mae predictions labels)

mean absolute error

model-definition-names

(model-definition-names)

Return a list of all registered model definition names.

model-definitions*

Map of model kwd to model definition

mse

(mse predictions labels)

mean squared error

options->model-def

(options->model-def options)

Return the model definition that corresponds to the :model-type option.

pipeline

(pipeline & ops)

probability-distributions->labels

(probability-distributions->labels prob-dists)

rmse

(rmse predictions labels)

root mean squared error
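
For reference, the three regression metrics in this namespace (mae, mse and rmse) follow the standard textbook definitions. A minimal plain-Clojure sketch is shown below, with hypothetical `-sketch` names and made-up numbers. The library versions take `predictions` and `labels` in the same order, but their implementation may differ.

```clojure
(defn mean [xs] (/ (reduce + 0.0 xs) (count xs)))

(defn mae-sketch
  "Mean absolute error: average of |prediction - label|."
  [predictions labels]
  (mean (map #(Math/abs (double (- %1 %2))) predictions labels)))

(defn mse-sketch
  "Mean squared error: average of (prediction - label)^2."
  [predictions labels]
  (mean (map #(let [d (- %1 %2)] (* d d)) predictions labels)))

(defn rmse-sketch
  "Root mean squared error: square root of the mean squared error."
  [predictions labels]
  (Math/sqrt (mse-sketch predictions labels)))

(def predictions [2.0 4.0 6.0])
(def labels      [1.0 4.0 8.0])

;; The errors are 1, 0 and -2.
(mae-sketch predictions labels)   ; => 1.0
(mse-sketch predictions labels)   ; (1 + 0 + 4) / 3, roughly 1.667
(rmse-sketch predictions labels)  ; square root of the above, roughly 1.291
```

Because rmse squares the errors before averaging, it penalises the single error of 2 more heavily than mae does.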

sobol-gridsearch

(sobol-gridsearch opt-map)
(sobol-gridsearch opt-map start-idx)

Given a map of key->values where some of the values are gridsearch definitions, produce a sequence of fully defined maps.

user> (require '[tech.v3.ml.gridsearch :as ml-gs])
nil
user> (def opt-map  {:a (ml-gs/categorical [:a :b :c])
                     :b (ml-gs/linear 0.01 1 10)
                     :c :not-searched})
user> opt-map
{:a
 {:tech.v3.ml.gridsearch/type :linear,
  :start 0.0,
  :end 2.0,
  :n-steps 3,
  :result-space [:a :b :c]}
  ...

user> (ml-gs/sobol-gridsearch opt-map)
({:a :b, :b 0.56, :c :not-searched}
 {:a :c, :b 0.22999999999999998, :c :not-searched}
 {:a :b, :b 0.78, :c :not-searched}
...
thaw-model

(thaw-model model)
(thaw-model model {:keys [thaw-fn]})

Thaw a model. Models returned from train may be 'frozen', meaning a 'thaw' operation is needed in order to use the model. This happens for you during predict, but you may also cache the 'thawed' model on the model map under the ':thawed-model' keyword in order to do fast predictions on small datasets.
