Onyx Task Bundle for Implementing Data Processing Tasks in R
onyx-r provides an Onyx task bundle for running data processing tasks in R.
A typical use case is running R models (created via statistical or machine learning algorithms) in Onyx job workflows, at scale:
Each Onyx peer runs an Rserve instance, and each virtual peer holds a connection to its local Rserve instance.
onyx-r tasks are configured at job submit time through pure Clojure data in the Onyx catalog.
onyx-r tasks are implemented as pure R functions that take an Onyx segment as input and return a modified Onyx segment as output.
For this to work seamlessly, onyx-r automatically translates between Clojure and R data structures.
onyx-r tasks must be configured with the name of the R segment processing function to call.
When an onyx-r task is prepared for execution on a virtual peer through Onyx lifecycles, the task can be provided with R code to source, R data (in RData format exported from R via save) to load, and Clojure values to assign to R variables. These configuration options are also supplied by the user at job submit time through the Onyx catalog.
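The three preparation options can be pictured as a plain Clojure map. The sketch below is only an illustration, assuming the :source, :load and :assign keys that appear in the configuration example in this README. The helpers `byte-array?` and `r-task-options` are hypothetical and not part of onyx-r; they merely check the shape of each entry before the map is handed to a task.

```clojure
(defn byte-array?
  "True if x is a Java byte array (hypothetical helper, not part of onyx-r)."
  [x]
  (instance? (Class/forName "[B") x))

(defn r-task-options
  "Hypothetical helper (not part of onyx-r): builds the options map for an
  onyx-r task and checks the shape of each entry."
  [{:keys [source assign] rdata :load
    :or   {source [] rdata [] assign {}}}]
  (assert (every? string? source) ":source must hold R code strings")
  (assert (every? byte-array? rdata) ":load must hold byte arrays of RData")
  (assert (map? assign) ":assign must map R variable names to values")
  {:source (vec source)
   :load   (vec rdata)
   :assign assign})

(r-task-options
  {:source ["rfun <- function(segment) segment"]
   :assign {:bar 42}})
;; => {:source ["rfun <- function(segment) segment"], :load [], :assign {:bar 42}}
```

Because the options are ordinary data, they can be built, inspected and tested like any other Clojure value before the job is submitted.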
First, install Rserve on each Onyx peer as described at: https://www.rforge.net/Rserve/doc.html
onyx-r is available on Clojars. Add this dependency to the :dependencies vector of your Leiningen project.clj:
[sourcewerk/onyx-r "0.1.0-SNAPSHOT"]
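In context, the dependency vector sits inside :dependencies. The following is a minimal project.clj sketch, assuming a hypothetical project name `my-onyx-job` and an illustrative Clojure version. Any Onyx dependency your job needs is not shown here.

```clojure
;; project.clj (sketch; project name and Clojure version are assumptions)
(defproject my-onyx-job "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.8.0"]
                 ;; onyx-r coordinates as given in this README
                 [sourcewerk/onyx-r "0.1.0-SNAPSHOT"]])
```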
Start a local Rserve server as documented at: https://www.rforge.net/Rserve/doc.html#start
Then type lein test to run all tests for onyx-r.
The following Clojure code block shows how to configure an onyx-r task through add-task:
(add-task
  my-base-job
  (onyx-r.tasks.r/r-function
    :rfun   ; name of the Onyx task
    "rfun"  ; name of the R function to call
    {;; R code to source when the task is prepared for execution on a virtual peer
     :source ["rfun <- function(segment) list(segment = segment, assigned = c(bar, baz), loaded = testData)"]
     ;; RData to load when the task is prepared for execution on a virtual peer
     :load [(slurp-bytes "testData.RData")]
     ;; R variables to assign when the task is prepared for execution on a virtual peer
     :assign {:bar 42
              :baz "Hallo, Onyx!"}}
    batch-settings))
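To make the data flow in this example easier to follow, here is a plain-Clojure model of what the sourced R function computes. It is only an illustration: the name `rfun-model` is hypothetical, the `:test-data` value is a placeholder for whatever testData.RData contains, and how onyx-r actually maps the returned R list back to Clojure (key types, vectors versus scalars) is an assumption. One detail does come from R itself: c() coerces mixed numeric and character arguments to a character vector, which the sketch mimics with str.

```clojure
(defn rfun-model
  "Hypothetical Clojure stand-in for the R function
     rfun <- function(segment)
       list(segment = segment, assigned = c(bar, baz), loaded = testData)
  bar and baz play the role of the :assign values; test-data stands for the
  object restored from the :load bytes."
  [segment {:keys [bar baz test-data]}]
  {:segment  segment
   ;; R's c(42, "Hallo, Onyx!") yields a character vector, so both become strings
   :assigned (mapv str [bar baz])
   :loaded   test-data})

(rfun-model {:n 1}
            {:bar 42 :baz "Hallo, Onyx!" :test-data [1 2 3]}) ; placeholder data
;; => {:segment {:n 1}, :assigned ["42" "Hallo, Onyx!"], :loaded [1 2 3]}
```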
Use something like slurp-bytes to load RData files into the byte arrays expected by onyx-r's :load parameter:
(defn slurp-bytes
  "Slurp the bytes from a slurpable thing"
  [x]
  ;; Open both streams in with-open so the input stream is closed as well.
  (with-open [in  (clojure.java.io/input-stream x)
              out (java.io.ByteArrayOutputStream.)]
    (clojure.java.io/copy in out)
    (.toByteArray out)))
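A quick way to check such a helper is a round trip through a temporary file. The sketch below repeats slurp-bytes (in a variant that also closes the input stream) so that it is self-contained. The temporary file only stands in for a real .RData file, and its four-byte payload is arbitrary placeholder data, not valid RData.

```clojure
(require '[clojure.java.io :as io])

(defn slurp-bytes
  "Read everything from a slurpable thing (file, path string, URL, stream)
  into a byte array."
  [x]
  (with-open [in  (io/input-stream x)
              out (java.io.ByteArrayOutputStream.)]
    (io/copy in out)
    (.toByteArray out)))

;; Write a few bytes to a temporary file, read them back, compare.
(let [tmp     (java.io.File/createTempFile "testData" ".RData")
      payload (byte-array [1 2 3 4])]
  (try
    (io/copy payload tmp)
    (java.util.Arrays/equals payload (slurp-bytes tmp))
    (finally
      (.delete tmp))))
;; => true
```

Arrays/equals is used for the comparison because Clojure's = compares Java arrays by identity, not by content.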
The supplied demo jobs show how to use onyx-r's features in context.
Copyright © 2016 sourcewerk GmbH
Distributed under the Eclipse Public License, the same as Clojure and Onyx.
Commercial support is available through sourcewerk GmbH:
Email: info@sourcewerk.de