;;; in
;;; |
;;; increment
;;; |
;;; output
[[:in :increment] [:increment :out]]
We’ll take a quick overview of some terms you’ll see in the rest of this user guide.
A segment is the unit of data in Onyx, and it’s represented by a Clojure map. Segments represent the data flowing through the cluster. Segments are the only shape of data that Onyx allows you to emit between functions.
A task is the smallest unit of work in Onyx. It represents an activity of either input, processing, or output.
A workflow is the structural specification of an Onyx program. Its purpose is to articulate the paths that data flows through the cluster at runtime. It is specified via a directed, acyclic graph.
The workflow representation is a Clojure vector of vectors. Each inner vector contains exactly two elements, which are keywords. The keywords represent nodes in the graph, and the vector represents a directed edge from the first node to the second.
;;; in
;;; |
;;; increment
;;; |
;;; output
[[:in :increment] [:increment :out]]
;;; input
;;; /\
;;; processing-1 processing-2
;;; | |
;;; output-1 output-2
[[:input :processing-1]
[:input :processing-2]
[:processing-1 :output-1]
[:processing-2 :output-2]]
;;; input
;;; /\
;;; processing-1 processing-2
;;; \ /
;;; output
[[:input :processing-1]
[:input :processing-2]
[:processing-1 :output]
[:processing-2 :output]]
Example projects: flat-workflow, multi-output-workflow |
All inputs, outputs, reducers, and functions in a workflow must be described via a catalog. A catalog is a vector of maps, strikingly similar to Datomic’s schema. Configuration and docstrings are described in the catalog.
Example:
[{:onyx/name :in
:onyx/plugin :onyx.plugin.core-async/input
:onyx/type :input
:onyx/medium :core.async
:onyx/batch-size batch-size
:onyx/max-peers 1
:onyx/doc "Reads segments from a core.async channel"}
{:onyx/name :inc
:onyx/fn :onyx.peer.min-peers-test/my-inc
:onyx/type :function
:onyx/batch-size batch-size}
{:onyx/name :out
:onyx/plugin :onyx.plugin.core-async/output
:onyx/type :output
:onyx/medium :core.async
:onyx/batch-size batch-size
:onyx/max-peers 1
:onyx/doc "Writes segments to a core.async channel"}]
In contrast to a workflow, flow conditions specify on a segment-by-segment basis which direction data should flow determined by predicate functions. This is helpful for conditionally processing a segment based off of its content.
Example:
[{:flow/from :input-stream
:flow/to [:process-adults]
:flow/predicate :my.ns/adult?
:flow/doc "Emits segment if this segment is an adult."}
A function is a construct that receives segments and emits segments for further processing. It literally is a Clojure function.
A lifecycle is a construct that describes the lifetime of a task. There is an entire chapter devoted to lifecycles, but to be brief, a lifecycle allows you to hook in and execute arbitrary code at critical points during a task. A lifecycle carries a context map that you can merge results back into for use later.
Windows are a construct that partitions a possible unbounded sequence of data into finite pieces, allowing aggregations to be specified. This lets you treat an infinite sequence of data as if it were finite over a given period of time.
A plugin is a means for hooking into data sources to extract data as input and produce data as output. Onyx comes with a few plugins, but you can craft your own, too.
A sentinel is a value that can be pushed into Onyx to signal the end of
a stream of data. This effectively lets Onyx switch between streaming
and batching mode. The sentinel in Onyx is represented by the Clojure
keyword :done
.
A Peer is a node in the cluster responsible for processing data. A single "peer" refers to a physical machine, though we often use the terms peer and virtual peer interchangeably when the difference doesn’t matter.
A Virtual Peer refers to a single peer process running on a single physical machine. A single Virtual Peer executes at most one task at a time.
A job is the collection of a workflow, catalog, flow conditions, lifecycles, and execution parameters. A job is most coarse unit of work, and every task is associated with exactly one job - hence a peer can only be working at most one job at any given time.
Can you improve this documentation? These fine people already did:
vijaykiran & Lucas BradstreetEdit on GitHub
cljdoc is a website building & hosting documentation for Clojure/Script libraries
× close