A Clojure library designed to provide pipelining operations.
It lets you express any data transformation and machine learning pipeline as a simple sequence of pure functions:
(def pipe
  (pipeline
   (select-columns [:Text :Score])
   (count-vectorize :Text :bow nlp/default-text->bow {})
   (bow->sparse-array :bow :bow-sparse #(nlp/->vocabulary-top-n % 1000))
   (set-inference-target :Score)
   (ds/select-columns [:bow-sparse :Score])
   (model {:p 1000
           :model-type :maxent-multinomial
           :sparse-column :bow-sparse})))
Several code examples for metamorph are available in the metamorph-examples repository.
A pipeline operation is a function which accepts the context as a map and returns a possibly modified context map.
Context is just a map where pipeline information is stored. There are three reserved keys which are supposed to help organize a pipeline:
:metamorph/data - the object which is subject to change and where the main data is stored. It can be anything: a dataset, a tensor, an object, whatever you want.
:metamorph/id - a unique operation number which is injected into the context just before the pipeline operation is called. This gives each pipeline operation an identity which can be used to store and restore private data in the context.
:metamorph/mode - additional context information which can be used to determine the pipeline phase. It can be added explicitly during pipeline creation.
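To illustrate how the reserved keys fit together, here is a minimal sketch in plain Clojure with no dependency on metamorph. The names `count-calls` and `run-ops` are hypothetical and exist only for this example; in particular, `run-ops` merely imitates the idea of injecting `:metamorph/id` before each call and is not the library's implementation.

```clojure
;; Hypothetical operation: counts how often it has been called and keeps
;; that private counter in the context under its own :metamorph/id.
(defn count-calls []
  (fn [{:metamorph/keys [id] :as ctx}]
    (update ctx id (fnil inc 0))))

;; Hypothetical runner (an assumption, not metamorph's code): numbers the
;; operations and injects the number as :metamorph/id before each call.
(defn run-ops [ctx ops]
  (reduce (fn [c [id op]]
            (op (assoc c :metamorph/id id)))
          ctx
          (map-indexed vector ops)))

(def ops [(count-calls) (count-calls)])

;; First run: each operation stores 1 under its own id (0 and 1).
(def ctx-1 (run-ops {:metamorph/data [1 2 3]} ops))

;; Second run on the returned context: each operation finds its own
;; counter again and increments it to 2.
(def ctx-2 (run-ops ctx-1 ops))
```

Because each operation only touches the key given by its own id, the two counters do not interfere, and `:metamorph/data` passes through unchanged.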
Different pipeline functions can work together if they agree on a common set of modes and act accordingly depending on the mode.
The main use case for this is pipelines which include a statistical model in some form. Here the model either gets fitted on the data (= learns from the data) or gets applied to data. For this common use case we define two standard modes, namely:
:fit - While the pipeline is in this mode, a function in the pipeline which contains a model should fit its model from the data; this is also called "training". It should also write the fitted model to the context under the key given in :metamorph/id, so that it can be used on the next pipeline run in mode :transform.
:transform - While the pipeline is in this mode, the function should read the fitted model from the key given in :metamorph/id and apply it to the data.
In machine learning terminology, these two modes are typically called train and predict. In metamorph we use the terms fit and transform as the generalisation.
Functions which only manipulate the data should simply behave the same in any mode, ignoring :metamorph/mode.
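The fit/transform convention can be sketched with a very small "model": the mean of a vector of numbers. This is plain Clojure under stated assumptions: `mean` and `center` are hypothetical names, and `:metamorph/id` is set by hand here, whereas in a real pipeline it would be injected for you.

```clojure
(defn mean [xs]
  (/ (reduce + xs) (count xs)))

;; Hypothetical mode-dependent operation: subtracts a mean from the data.
;; In :fit it learns the mean and stores it under its :metamorph/id;
;; in :transform it reads the stored mean back and reuses it.
(defn center []
  (fn [{:metamorph/keys [id data mode] :as ctx}]
    (case mode
      :fit       (let [m (mean data)]
                   (-> ctx
                       (assoc id m)
                       (assoc :metamorph/data (mapv #(- % m) data))))
      :transform (let [m (get ctx id)]
                   (assoc ctx :metamorph/data (mapv #(- % m) data))))))

(def op (center))

;; Fit on training data: the learned mean (2) is stored under key 0.
(def fitted
  (op {:metamorph/id 0 :metamorph/mode :fit :metamorph/data [1 2 3]}))

;; Transform new data with the fitted context: the stored mean is reused,
;; it is not recomputed from the new data.
(def transformed
  (op (assoc fitted :metamorph/mode :transform :metamorph/data [10 20])))
```

The design point is that the context returned by the `:fit` run is passed into the `:transform` run, which is how the fitted state travels between the two.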
All the steps of a metamorph pipeline are functions which must follow these conventions in order to work well together:
:metamorph/data is considered to be the main data object, which nearly all functions will interact with. A function which only interacts with this main data object must nevertheless return the whole context map, with the data at key :metamorph/data.
A typical skeleton of a compliant function looks like this:
(defn my-data-transform-function [any number of options]
  (fn [{:metamorph/keys [id data mode] :as ctx}]
    ;; do something with data, and possibly with id and mode,
    ;; and write the result back somewhere in the ctx, often to key `:metamorph/data`, but it could be any key
    ;; the assoc also makes sure that other data in ctx is left unchanged
    (assoc ctx :metamorph/data ......)))
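As one possible concrete instance of this skeleton, the following sketch defines a mode-independent operation on a vector of numbers. The name `scale-by` and the extra key `:some/other-key` are hypothetical and chosen only for illustration.

```clojure
;; Hypothetical compliant function: takes an option (factor) and returns
;; a function from context to context.
(defn scale-by [factor]
  (fn [{:metamorph/keys [data] :as ctx}]
    ;; only :metamorph/data is replaced; everything else in ctx is kept
    (assoc ctx :metamorph/data (mapv #(* factor %) data))))

(def result
  ((scale-by 10) {:metamorph/data  [1 2 3]
                  :some/other-key "untouched"}))
```

Here `result` holds `[10 20 30]` under `:metamorph/data`, while `:some/other-key` is returned as it was passed in.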
The following libraries provide metamorph compliant functions in a recent version:
library | purpose | link |
---|---|---|
tablecloth | dataset manipulation | https://github.com/scicloj/tablecloth |
tech.ml.dataset | dataset manipulation | https://github.com/techascent/tech.ml.dataset |
tech.ml | machine learning | https://github.com/techascent/tech.ml |
sklearn-clj | sklearn estimators as metamorph | https://github.com/scicloj/sklearn-clj |
Other libraries which do "data transformations" can decide to make their functions metamorph compliant. This does not require any dependency on metamorph, just the usage of the standard keys.
Functions can easily be lifted to become metamorph compliant. For this we have the function `metamorph/lift`.
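To show what lifting means conceptually, here is a sketch of how such a helper could be written in plain Clojure. The name `lift-sketch` is hypothetical, and this is an assumption about the general idea rather than the library's actual implementation.

```clojure
;; Hypothetical helper: wraps a function that takes the data object as its
;; first argument, so that it works on the context map instead.
(defn lift-sketch [f & args]
  (fn [ctx]
    (update ctx :metamorph/data #(apply f % args))))

;; A regular function on plain data: keep the first n elements.
(defn take-first [coll n]
  (vec (take n coll)))

(def lifted (lift-sketch take-first 2))

(def lifted-result
  (lifted {:metamorph/data [1 2 3] :metamorph/mode :fit}))
```

The lifted function changes only `:metamorph/data` (now `[1 2]`) and leaves the other keys of the context in place.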
A sister project, metamorph.ml, allows evaluating machine learning pipelines based on metamorph.
The metamorph concept is similar to the pipeline concept of sklearn, which also allows running a given pipeline in fit and transform.
But metamorph allows combining models with arbitrary transform functions, which don't need to be models.
We foresee that mainly two types of functions will be added to a pipeline:
Mode-independent functions: They only manipulate the main data object and ignore all other information in the context. They use neither :metamorph/mode nor :metamorph/id in the context map.
Mode-dependent functions: These functions behave differently depending on :metamorph/mode and will likely store data in the context map, which can be used by the same function in another mode or by other functions later in the pipeline.
Metamorph pipelines can be constructed either from a sequence of function calls via the function metamorph.core/pipeline or declaratively as a sequence of maps.
Both rely on the same functions.
See here for examples: https://github.com/scicloj/tablecloth/blob/pipelines/src/tablecloth/pipeline.clj
This should allow advanced use cases, like the generation of pipelines, which gives great flexibility for hyperparameter tuning in machine learning.
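The idea of describing pipelines as data and generating them can be sketched as follows. Everything here is hypothetical and independent of metamorph: the `registry`, the `build-pipeline` function and the `{:op ... :args ...}` map format are assumptions made for this example and are not the format expected by the library.

```clojure
;; Hypothetical registry: keyword -> constructor of a compliant function.
(def registry
  {:scale-by (fn [factor]
               (fn [ctx]
                 (update ctx :metamorph/data #(mapv (partial * factor) %))))
   :shift-by (fn [offset]
               (fn [ctx]
                 (update ctx :metamorph/data #(mapv (partial + offset) %))))})

;; Hypothetical builder: turns a sequence of maps into one function
;; from context to context by applying the operations in order.
(defn build-pipeline [steps]
  (let [ops (mapv (fn [{:keys [op args]}]
                    (apply (registry op) args))
                  steps)]
    (fn [ctx]
      (reduce (fn [c f] (f c)) ctx ops))))

;; Because the description is plain data, one pipeline per candidate
;; parameter value can be generated with an ordinary `for`.
(def candidates
  (into {}
        (for [factor [1 2 10]]
          [factor (build-pipeline [{:op :scale-by :args [factor]}
                                   {:op :shift-by :args [1]}])])))

(def tuned-result
  ((candidates 10) {:metamorph/data [1 2 3]}))
```

With factor 10, the data `[1 2 3]` becomes `[11 21 31]`. A tuning loop could run every entry of `candidates` and compare the results.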
tablecloth with the concept of fitted models and machine learning
To create a pipeline function you can use two functions:
metamorph.core/pipeline to make a pipeline function out of pipeline operators (= compliant functions as described above)
metamorph.core/->pipeline works as above, but uses declarative maps (which also describe compliant functions) to describe the pipeline
Compliant pipeline operations can either be created by "lifting" functions which work on the data object itself, or by using them from compliant libraries.
Most functions in tablecloth take a dataset as input in first position and return a dataset. This means they can be converted (lifted) into metamorph compliant functions with the function "metamorph.core/lift". (Tablecloth will make lifted versions of its functions available soon.)
In this short example, the main data object in the context is a simple string.
(require '[scicloj.metamorph.core :as morph]
         '[clojure.string])

;; a regular function which takes and returns a main object
(defn regular-function-to-be-lifted
  [main-object par1 par2]
  (str "Hey, " (clojure.string/upper-case main-object) " , I'm regular function! (pars: " par1 ", " par2 ")"))

;; we make a pipeline-fn using `lift` and the regular function
(def lifted-pipeline
  (morph/pipeline
   :anymode
   (morph/lift regular-function-to-be-lifted 1 2)))

;; lifted-pipeline is a regular Clojure function, taking the context as its first argument
(lifted-pipeline {:metamorph/data "main data project"})
;;->
#:metamorph{:data "Hey, MAIN DATA PROJECT , I'm regular function! (pars: 1, 2)"}
Copyright © 2021 Scicloj
This program and the accompanying materials are made available under the terms of the Eclipse Public License 2.0 which is available at http://www.eclipse.org/legal/epl-2.0.
This Source Code may also be made available under the following Secondary Licenses when the conditions for such availability set forth in the Eclipse Public License, v. 2.0 are satisfied: GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version, with the GNU Classpath Exception which is available at https://www.gnu.org/software/classpath/license.html.
Contributors: Carsten Behring, GenerateMe & samrose