scicloj.metamorph.ml


augment clj

(augment model data)

Adds information about observations to a dataset.

Potential row names are these: https://raw.githubusercontent.com/scicloj/metamorph.ml/main/resources/columms-augment.edn

No other row names should be used. Each model will only return a small subset of possible rows.

A model might not implement this function, and then the dataset is returned unchanged.

default-loss-fn clj

(default-loss-fn dataset)

Given a dataset which must have exactly one inference target column, returns a default loss fn. If the column is categorical, the loss is tech.v3.ml.loss/classification-loss; otherwise it is tech.v3.ml.loss/mae (mean absolute error).
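
As a rough illustration of the two kinds of default loss, the sketch below reimplements them on plain Clojure sequences. The names `mae-sketch`, `classification-loss-sketch` and `pick-default-loss` are hypothetical and do not call tech.v3.ml.loss. The sketch assumes that a classification loss is the fraction of mismatched predictions, and it takes a boolean flag instead of inspecting a dataset.

```clojure
;; Hypothetical stand-ins, NOT the tech.v3.ml.loss implementations.

(defn mae-sketch
  "Mean absolute error of two equally long seqs of numbers."
  [predictions labels]
  (/ (reduce + (map (fn [p l] (Math/abs (double (- p l)))) predictions labels))
     (double (count labels))))

(defn classification-loss-sketch
  "Fraction of predictions that differ from their label (assumed definition)."
  [predictions labels]
  (/ (count (remove true? (map = predictions labels)))
     (double (count labels))))

(defn pick-default-loss
  "Mirrors the documented rule: categorical target -> classification loss,
  anything else -> mean absolute error."
  [categorical?]
  (if categorical? classification-loss-sketch mae-sketch))

(mae-sketch [1.0 2.0 3.0] [1.0 3.0 5.0])
((pick-default-loss true) [:a :b :a :b] [:a :b :b :b])
```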

default-result-dissoc-in-fn clj

(default-result-dissoc-in-fn result)

default-result-dissoc-in-seq clj

define-model! clj

(define-model! model-kwd
               train-fn
               predict-fn
               {:keys [hyperparameters thaw-fn explain-fn loglik-fn tidy-fn
                       glance-fn augment-fn options documentation unsupervised?]
                :as opts})

Create a model definition. An ml model is a function that takes a dataset and an options map and returns a model. A model is something that, combined with a dataset, produces an inferred dataset.
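
The idea of a model registry keyed by keyword can be sketched in a few lines of plain Clojure. Everything below is hypothetical: `registry-sketch`, `define-model-sketch!` and `options->model-def-sketch` only mimic the argument order shown above, and the signatures of the toy train and predict functions are invented for the example rather than taken from the library.

```clojure
;; A toy registry: model keyword -> model definition map.
(defonce registry-sketch (atom {}))

(defn define-model-sketch!
  "Stores a definition under model-kwd, merging the optional entries in opts."
  [model-kwd train-fn predict-fn opts]
  (swap! registry-sketch assoc model-kwd
         (assoc opts :train-fn train-fn :predict-fn predict-fn))
  :ok)

(defn options->model-def-sketch
  "Looks up the definition selected by the :model-type option."
  [options]
  (or (get @registry-sketch (:model-type options))
      (throw (ex-info "No model definition registered for :model-type"
                      {:model-type (:model-type options)}))))

;; Register a toy model that always predicts the training mean.
(define-model-sketch! :toy/mean
  (fn [values _options] (/ (reduce + values) (double (count values))))
  (fn [values mean] (repeat (count values) mean))
  {:documentation "Predicts the training mean."})
```

Looking up `{:model-type :toy/mean}` then returns the stored map, and an unknown keyword fails loudly instead of returning nil.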

ensemble-pipe clj

(ensemble-pipe pipes)

Creates an ensemble pipeline function out of various pipelines. The different predictions get combined via majority voting. Can be used in the same way as any other pipeline.
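
Majority voting itself is easy to picture on plain data. The following sketch is not the library's implementation; `majority-vote` and `combine-predictions` are hypothetical helpers that work on vectors of predicted labels, one vector per pipeline.

```clojure
(defn majority-vote
  "Returns the most frequent value among votes.
  Which value wins a tie is left unspecified in this sketch."
  [votes]
  (key (apply max-key val (frequencies votes))))

(defn combine-predictions
  "Takes one prediction vector per pipeline and votes row by row."
  [prediction-seqs]
  (apply mapv (fn [& votes] (majority-vote votes)) prediction-seqs))

;; Three pipelines predicting three rows each.
(combine-predictions [[:yes :no  :no]
                      [:yes :yes :no]
                      [:no  :yes :no]])
```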

evaluate-pipelines clj

(evaluate-pipelines pipe-fn-seq train-test-split-seq metric-fn loss-or-accuracy)
(evaluate-pipelines pipe-fn-or-decl-seq
                    train-test-split-seq
                    metric-fn
                    loss-or-accuracy
                    options)

Evaluates the performance of a seq of metamorph pipelines, which are supposed to have a model as the last step under key :model, which behaves correctly in modes :fit and :transform. The function scicloj.metamorph.ml/model is such a function.

This function calculates the accuracy or loss, as given by metric-fn, of each pipeline in pipe-fn-seq using all the train-test splits given in train-test-split-seq.

It runs each pipeline-fn in pipe-fn-seq in mode :fit and in mode :transform for each split in train-test-split-seq.

The function returns a seq of seqs of evaluation results per pipe-fn per train-test split. Each of the evaluation results is a context map, which is specified in the malli schema attached to this function.
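
The nested "per pipeline, per split" evaluation described above can be sketched without any library. This is a simplified illustration, not the real implementation: `mean-pipe`, `mean-abs-error` and `evaluate-sketch` are hypothetical, the data is a plain vector of target values instead of a dataset, and the :model and :predictions keys of the toy context are invented for the example.

```clojure
(defn mean-pipe
  "Toy pipeline function (ctx -> ctx). In :fit it stores the mean of the data
  under :model; in :transform it predicts that mean for every row."
  [{:metamorph/keys [mode data] :as ctx}]
  (case mode
    :fit       (assoc ctx :model (/ (reduce + data) (double (count data))))
    :transform (assoc ctx :predictions (repeat (count data) (:model ctx)))))

(defn mean-abs-error
  [predictions truth]
  (/ (reduce + (map (fn [p t] (Math/abs (double (- p t)))) predictions truth))
     (double (count truth))))

(defn evaluate-sketch
  "Returns a seq (per pipeline) of seqs (per split) of result maps."
  [pipe-fns splits metric-fn]
  (for [pipe-fn pipe-fns]
    (for [{:keys [train test]} splits]
      (let [;; fit once on the training data
            fitted (pipe-fn {:metamorph/mode :fit :metamorph/data train})
            ;; run the fitted pipeline in :transform and score the result
            score  (fn [data]
                     (let [ctx (pipe-fn (assoc fitted
                                               :metamorph/mode :transform
                                               :metamorph/data data))]
                       (metric-fn (:predictions ctx) data)))]
        {:train-transform {:metric (score train)}
         :test-transform  {:metric (score test)}}))))

(def results
  (evaluate-sketch [mean-pipe]
                   [{:train [1.0 2.0 3.0] :test [2.0 4.0]}]
                   mean-abs-error))
```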

  • pipe-fn-or-decl-seq needs to be a sequence of pipeline functions or pipeline declarations which follow the metamorph approach. These types of functions are typically produced by calling scicloj.metamorph/pipeline. Documentation is here:

  • train-test-split-seq needs to be a sequence of maps containing the train and test datasets (each a tech.ml.dataset) at keys :train and :test. tablecloth.api/split->seq produces such splits. Supervised models require both keys (:train and :test), while unsupervised models only use :train.

  • metric-fn is the metric function to use, typically coming from tech.v3.ml.loss. For supervised models the metric-fn receives the truth and the predicted values and should return a single double. The metric fn receives seqs without categorical maps: these get reverse-applied to the prediction, if present, before the values are passed to the metric fn. For unsupervised models the function receives the fitted ctx and should return a single double as well. This metric is used to sort and eventually filter the results, depending on the options (:return-best-pipeline-only and :return-best-crossvalidation-only). The notion of best comes from metric-fn combined with loss-or-accuracy.

  • loss-or-accuracy states whether the metric-fn is a loss or an accuracy calculation. Can be :loss or :accuracy. It decides the notion of the best model: with :loss, pipelines with a lower metric are better; with :accuracy, pipelines with a higher metric are better.

  • options is a map that controls some mainly performance-related parameters. This function can potentially produce a large amount of data, enough to make the JVM run out of memory. How much detail the function returns can be controlled with the parameters below. The defaults are quite aggressive in removing details, and this can be tuned towards more or less detail via:

    • :return-best-pipeline-only - Only return information of the best performing pipeline. Default is true.

    • :return-best-crossvalidation-only - Only return information of the best crossvalidation (per pipeline returned). Default is true.

    • :map-fn - Controls parallelism, i.e. whether map (:map), pmap (:pmap) or mapv (:mapv) is used to map over the different pipelines. Default is :map.

    • :evaluation-handler-fn - Gets called once with the complete result of an individual pipeline evaluation. It can be used to adapt the data returned for each evaluation and/or to perform side effects using the evaluation data. The result of this function is taken as the evaluation result. It needs to contain at least these two key paths: [:train-transform :metric] and [:test-transform :metric]. All other evaluation data can be removed, if desired.

      It can be used for side effects as well, like experiment tracking on disk. The passed-in evaluation result is a map with all information on the current evaluation, including the datasets used.

      The default handler function is scicloj.metamorph.ml/default-result-dissoc-in-fn, which removes the often large model object and the training data. identity can be used to get all evaluation data. scicloj.metamorph.ml/result-dissoc-in-seq--all reduces even more aggressively.

    • :other-metrices - Specifies other metrics to be calculated during evaluation.
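
Two of the ideas in this list, a custom evaluation handler and the notion of "best", can be sketched on plain maps. The helpers `keep-metrics-only` and `best-result` are hypothetical; the sketch assumes that results are ranked by the metric at [:test-transform :metric], which the text above does not state explicitly.

```clojure
(defn keep-metrics-only
  "A possible :evaluation-handler-fn: drops everything except the two
  key paths that the evaluation result must contain."
  [result]
  {:train-transform {:metric (get-in result [:train-transform :metric])}
   :test-transform  {:metric (get-in result [:test-transform :metric])}})

(defn best-result
  "Picks the best result: lowest metric for :loss, highest for :accuracy.
  Ranking by the test metric is an assumption of this sketch."
  [loss-or-accuracy results]
  (let [metric #(get-in % [:test-transform :metric])]
    (case loss-or-accuracy
      :loss     (apply min-key metric results)
      :accuracy (apply max-key metric results))))

(def example-results
  [{:train-transform {:metric 0.20} :test-transform {:metric 0.30} :model :large-object-1}
   {:train-transform {:metric 0.05} :test-transform {:metric 0.10} :model :large-object-2}
   {:train-transform {:metric 0.10} :test-transform {:metric 0.20} :model :large-object-3}])

(best-result :loss (map keep-metrics-only example-results))
```

Such a handler would be supplied in the options map under :evaluation-handler-fn.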

This function also expects the ground truth of the target variable in the context, under the key path :model :scicloj.metamorph.ml/target-ds. The function scicloj.ml.metamorph/model does this correctly. See here for the simplest way to set this up: https://github.com/behrica/metamorph.ml/blob/main/README.md

explain clj

(explain model & [options])

Explain (if possible) an ml model. A model explanation is a model-specific map of data that usually indicates some level of mapping between features and importance.

get-categorical-maps clj

(get-categorical-maps ds)

glance clj

(glance model)

Gives a glance at the model, returning a dataset with information about the entire model.

Potential row names are these: https://raw.githubusercontent.com/scicloj/metamorph.ml/main/resources/columms-glance.edn

No other row names should be used. Each model will only return a small subset of possible rows. The list of allowed row names might change over time.

A model might not implement this function, and then an empty dataset will be returned.

hyperparameters clj

(hyperparameters model-kwd)

Get the hyperparameters for this model definition

loglik clj

(loglik model y yhat)

model clj

(model options)

Executes a machine learning model in train/predict (depending on :mode) from the metamorph.ml model registry.

The model is passed between both invocations via the shared context ctx under a key (a step identifier) which is passed in key :metamorph/id and guaranteed to be unique for each pipeline step.

The function writes and reads into this common context key.

Options:

  • :model-type - Keyword for the model to use

Further options get passed to train functions and are model specific.

See here for an overview of the models built into scicloj.ml:

https://scicloj.github.io/scicloj.ml-tutorials/userguide-models.html

Other libraries might contribute other models, which are documented as part of the library.

metamorph                    | .
-----------------------------|-----------------------------------------------
Behaviour in mode :fit       | Calls scicloj.metamorph.ml/train using data in :metamorph/data and options, and stores the trained model in ctx under the key in :metamorph/id
Behaviour in mode :transform | Reads the trained model from ctx and calls scicloj.metamorph.ml/predict with the model in $id and data in :metamorph/data
Reads keys from ctx          | In mode :transform: reads the trained model to use for prediction from the key in :metamorph/id.
Writes keys to ctx           | In mode :fit: stores the trained model in key $id and writes feature-ds and target-ds before prediction into ctx at :scicloj.metamorph.ml/feature-ds / :scicloj.metamorph.ml/target-ds

See as well:

  • scicloj.metamorph.ml/train
  • scicloj.metamorph.ml/predict
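
The fit/transform behaviour described above can be pictured with a small stand-in written in plain Clojure. `model-step-sketch` is hypothetical and does not use the library: it only shows a step that stores its trained model under the value of :metamorph/id during :fit and reads it back during :transform. Writing the prediction back to :metamorph/data is an assumption of this sketch, and the feature-ds / target-ds bookkeeping is left out.

```clojure
(defn model-step-sketch
  "Builds a pipeline step (ctx -> ctx) from a toy train-fn and predict-fn."
  [train-fn predict-fn]
  (fn [{:metamorph/keys [id mode data] :as ctx}]
    (case mode
      ;; :fit -> train and remember the model under this step's unique id
      :fit       (assoc ctx id (train-fn data))
      ;; :transform -> look the model up again and predict with it
      :transform (assoc ctx :metamorph/data (predict-fn data (get ctx id))))))

(def step
  (model-step-sketch
   (fn [xs] (/ (reduce + xs) (double (count xs))))   ; "train": compute the mean
   (fn [xs mean] (mapv (constantly mean) xs))))      ; "predict": repeat the mean

(def fitted
  (step {:metamorph/id :step-1 :metamorph/mode :fit :metamorph/data [1 2 3]}))

(def predicted
  (step (assoc fitted :metamorph/mode :transform :metamorph/data [10 20])))
```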

model-definition-names clj

(model-definition-names)

Return a list of all registered model definition names.

model-definitions* clj

Map of model kwd to model definition

options->model-def clj

(options->model-def options)

Return the model definition that corresponds to the :model-type option.

predict clj

(predict dataset model)

Predict returns a dataset with only the predictions in it.

  • For regression, a single column dataset is returned with the column named after the target
  • For classification, a dataset is returned with a float64 column for each target value and values that describe the probability distribution.

Each implementing model should construct its prediction in the shape expressed by the :target-columns, :target-datatypes and :target-categorical-maps it receives.

Any implementing model needs to behave symmetrically between the 'datatype of the target columns of the training data' and the 'datatype of the prediction columns'. A model can decide not to accept certain datatypes in the target columns of the training data (and fail with an exception). But any model should try to minimize this and accept for categorical data:

  • all numeric types (:int32, :int64, :float32, :float64)
  • string
  • categorical maps

It needs to be symmetric and return the same datatype in prediction as it receives in training:

  • numeric in train -> same numeric in predict
  • string in train -> string in predict
  • categorical map in train -> equivalent categorical map in predict

ml/train passes the needed information about the train target column to the model implementation to do this.
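
The symmetry requirement can be illustrated with a toy round trip through a categorical mapping. The helpers below (`fit-categorical-map`, `encode`, `decode`) are hypothetical and only stand in for what a model implementation might do internally: map string labels to integer codes for training, then map predicted codes back so that the caller sees the same datatype it passed in.

```clojure
(require '[clojure.set :as set])

(defn fit-categorical-map
  "Assigns each distinct label an integer code, in order of first appearance."
  [labels]
  (zipmap (distinct labels) (range)))

(defn encode
  "Label -> code, as a model might do before training."
  [lookup labels]
  (mapv lookup labels))

(defn decode
  "Code -> label, as a model should do before returning predictions."
  [lookup codes]
  (mapv (set/map-invert lookup) codes))

(let [labels ["dog" "cat" "dog"]
      lookup (fit-categorical-map labels)]
  ;; strings in train -> strings in predict
  (decode lookup (encode lookup labels)))
```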

result-dissoc-in-seq--all clj

result-dissoc-in-seq--all-fn clj

(result-dissoc-in-seq--all-fn result)

result-dissoc-in-seq--ctxs clj

result-dissoc-in-seq-ctx-fn clj

(result-dissoc-in-seq-ctx-fn result)

score clj

(score predictions-ds trueth-ds target-column-name metric-fn other-metrices)

thaw-model clj

(thaw-model model)
(thaw-model model {:keys [thaw-fn] :as opts})

Thaw a model. Models returned from train may be 'frozen', meaning a 'thaw' operation is needed in order to use the model. This happens for you during predict, but you may also cache the 'thawed' model on the model map under the ':thawed-model' keyword in order to do fast predictions on small datasets.
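
Why caching the thawed model helps can be shown with a toy model whose "frozen" form is a string. All names here (`thaw-sketch`, `predict-sketch`, `thaw-calls`) are hypothetical; only the :thawed-model key comes from the description above, and :model-data is borrowed from the keys listed for train.

```clojure
;; Counts how often the (pretend-expensive) thaw step runs.
(def thaw-calls (atom 0))

(defn thaw-sketch
  "Stand-in for a model-specific thaw step: parses the frozen string form."
  [model]
  (swap! thaw-calls inc)
  (Double/parseDouble (:model-data model)))

(defn predict-sketch
  "Uses the cached :thawed-model when present, otherwise thaws on every call."
  [model xs]
  (let [coef (or (:thawed-model model) (thaw-sketch model))]
    (mapv #(* coef %) xs)))

(def frozen {:model-data "2.5"})

;; Thaw once up front and keep the result on the model map.
(def cached (assoc frozen :thawed-model (thaw-sketch frozen)))

(predict-sketch cached [1 2])
```

Predicting repeatedly with `cached` never thaws again, while each prediction with `frozen` pays the thaw cost.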

tidy clj

(tidy model)

Summarizes information about model components. Returns a dataset with rows from this list: https://raw.githubusercontent.com/scicloj/metamorph.ml/main/resources/columms-tidy.edn

No other row names should be used. Each model will only return a small subset of possible rows. The list of allowed row names might change over time.

A model might not implement this function, and then an empty dataset will be returned.

train clj

(train dataset options)

Given a dataset and an options map produce a model. The model-type keyword in the options map selects which model definition to use to train the model. Returns a map containing at least:

  • :model-data - the result of that definition's train-fn.
  • :options - the options passed in.
  • :id - new randomly generated UUID.
  • :feature-columns - vector of column names.
  • :target-columns - vector of column names.
  • :target-datatypes - map of target column names -> target column types
  • :target-categorical-maps - the categorical maps of the target columns, if present

A well-behaved model implementation should use :target-columns, :target-datatypes and :target-categorical-maps to construct its prediction dataset so that it matches the train data target column.
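
The shape of the returned map can be sketched on plain data. `train-sketch` below is hypothetical: it takes a map of column name -> values instead of a dataset, receives the target column names and the train function as explicit arguments, and records Java classes as datatypes, all of which are simplifications made for this example. :target-categorical-maps is omitted.

```clojure
(defn train-sketch
  "Builds a model map with the documented keys from plain column data."
  [columns target-columns options train-fn]
  (let [feature-columns (vec (remove (set target-columns) (keys columns)))]
    {:model-data       (train-fn columns options)
     :options          options
     :id               (java.util.UUID/randomUUID)
     :feature-columns  feature-columns
     :target-columns   (vec target-columns)
     ;; datatype of each target column, taken from its first value
     :target-datatypes (into {} (for [c target-columns]
                                  [c (type (first (get columns c)))]))}))

(def toy-model
  (train-sketch {:x [1 2] :y [3.0 4.0]}
                [:y]
                {:model-type :toy}
                (fn [_columns _options] :fitted)))
```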
