Onyx plugin for Amazon S3.
In your project file:
[org.onyxplatform/onyx-amazon-s3 "0.14.5.0"]
For the input task, in your peer boot-up namespace:
(:require [onyx.plugin.s3-input])
Catalog entry:
{:onyx/name <<TASK_NAME>>
 :onyx/plugin :onyx.plugin.s3-input/input
 :onyx/type :input
 :onyx/medium :s3
 :onyx/batch-size 20
 :onyx/max-peers 1
 :s3/bucket "mybucket"
 :s3/prefix "filter-prefix/example/"
 :s3/deserializer-fn :my.ns/deserializer-fn
 :s3/buffer-size-bytes 10000000
 :onyx/doc "Reads segments from keys in an S3 bucket."}
Lifecycle entry:
{:lifecycle/task <<TASK_NAME>>
 :lifecycle/calls :onyx.plugin.s3-input/s3-input-calls}
| key | type | description | 
|---|---|---|
| :s3/bucket | string | The name of the S3 bucket to read objects from. | 
| :s3/deserializer-fn | keyword | A namespaced keyword pointing to a fully qualified function that will deserialize from bytes to segments. Currently only reading from newline-separated values is supported, thus the deserializer must deserialize line by line. | 
| :s3/prefix | string | Filter the keys to be read by a supplied prefix. | 
| :s3/file-key | string | When set, each segment includes the S3 key of the file from which its line was read, stored under the key given here. | 
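A minimal sketch of a function that could be referenced as :s3/deserializer-fn :my.ns/deserializer-fn. It assumes each line of the S3 object holds one EDN map, and, because the exact argument type is not stated above, it accepts the line either as a string or as raw bytes. The namespace and the EDN format are illustrative assumptions, not requirements of the plugin.

```clojure
(ns my.ns
  (:require [clojure.edn :as edn]))

(defn deserializer-fn
  "Turns one line of an S3 object into a segment (a map).
  Assumption: every line is a single EDN map, encoded as UTF-8."
  [line]
  (let [s (if (string? line)
            line
            ;; Assumption: anything that is not a string is a byte array.
            (String. ^bytes line "UTF-8"))]
    (edn/read-string s)))
```

With this in place, a line such as {:user "a" :n 1} in an object under the configured prefix would become the segment {:user "a" :n 1}.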
For the output task, in your peer boot-up namespace:
(:require [onyx.plugin.s3-output])
Catalog entry:
{:onyx/name <<TASK_NAME>>
 :onyx/plugin :onyx.plugin.s3-output/output
 :s3/bucket <<BUCKET_NAME>>
 :s3/encryption :none
 :s3/serializer-fn :my.ns/serializer-fn
 :s3/key-naming-fn :onyx.plugin.s3-output/default-naming-fn
 :s3/prefix "filter-prefix/example/"
 :s3/prefix-separator "/"
 :s3/serialize-per-element? false
 :s3/max-concurrent-uploads 20
 :onyx/type :output
 :onyx/medium :s3
 :onyx/batch-size 20
 :onyx/doc "Writes segments to s3 files, one file per batch"}
Segments received by this task are serialized to bytes by the function at
:s3/serializer-fn and written as one file per batch. Each file is placed at a
key in the bucket whose name is produced by the function at :s3/key-naming-fn,
which takes the Onyx event map and returns a string. The default naming function,
:onyx.plugin.s3-output/default-naming-fn, names keys using UTC time in the following format:
"yyyy-MM-dd-hh.mm.ss.SSS_batch_BATCH_UUID".
You can define :s3/encryption to be :aes256 if your S3 bucket has
encryption enabled. The default value is :none.
When :s3/serialize-per-element? is set to true, the serializer is called
on each individual segment rather than on the whole batch, and the serialized
results are separated by the string value set in :s3/serialize-per-element-separator.
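To illustrate the effect of per-element serialization, the sketch below joins individually serialized segments with a separator. It is not the plugin's implementation: per-element-payload is a hypothetical helper, and the per-element serializer is assumed to return a string here for readability.

```clojure
(require '[clojure.string :as str])

(defn per-element-payload
  "Hypothetical helper: serializes each segment on its own with
  serialize-one (segment -> string) and joins the results with separator."
  [serialize-one separator segments]
  (str/join separator (map serialize-one segments)))

;; Two segments, serialized with pr-str and separated by a newline.
(per-element-payload pr-str "\n" [{:a 1} {:a 2}])
;; => "{:a 1}\n{:a 2}"
```

Output in this shape, one serialized segment per line, matches the newline-separated format that the input task reads.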
An alternative upload mode can be selected via :s3/multi-upload, documented below.
When this option is used, the segments in a batch are partitioned into different objects by a grouping key.
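As a rough sketch of the two functions described above, the following defines a whole-batch serializer and a custom key naming function. The namespace, the EDN encoding, and the naming scheme are all illustrative assumptions. The naming function deliberately ignores the contents of the event map, since its keys are not described here.

```clojure
(ns my.ns
  (:import [java.util UUID]))

(defn serializer-fn
  "Serializes a whole batch of segments to UTF-8 bytes.
  Assumption: the batch is written as a single EDN vector."
  [segments]
  (.getBytes (pr-str (vec segments)) "UTF-8"))

(defn naming-fn
  "Hypothetical alternative to the default naming function.
  Takes the Onyx event map and returns a key string; this sketch
  ignores the event and relies on a random UUID for uniqueness."
  [event]
  (str "batch-" (UUID/randomUUID) ".edn"))
```

These would be wired into the catalog entry as :s3/serializer-fn :my.ns/serializer-fn and :s3/key-naming-fn :my.ns/naming-fn.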
Lifecycle entry:
{:lifecycle/task <<TASK_NAME>>
 :lifecycle/calls :onyx.plugin.s3-output/s3-output-calls}
| key | type | description | 
|---|---|---|
| :s3/bucket | string | The name of the S3 bucket to write to. | 
| :s3/serializer-fn | keyword | A namespaced keyword pointing to a fully qualified function that will serialize the batch of segments to bytes. | 
| :s3/key-naming-fn | keyword | A namespaced keyword pointing to a fully qualified function that will be supplied with the Onyx event map and produce an S3 key for the batch. | 
| :s3/prefix | string | A prefix to prepend to the keys generated by :s3/key-naming-fn. | 
| :s3/prefix-separator | string | A separator to add after :s3/prefix and before the result of :s3/key-naming-fn. Defaults to "/". | 
| :s3/multi-upload | boolean | Flag that causes the plugin to group the batch of segments by :s3/prefix-key, and upload an object per group. | 
| :s3/prefix-key | any | Used with :s3/multi-upload. The segment key whose value is used to group segments into prefixed objects. E.g. when grouping on :k, the batch [{:a 3 :k "batch1"} {:a 2 :k "batch2"}] will cause two objects to be uploaded, under the prefixes "batch1" and "batch2". | 
| :s3/content-type | string | Optional content type for value | 
| :s3/encryption | keyword | Optional server side encryption setting. One of :sse256 or :none. | 
| :s3/region | string | The S3 region to write objects to. | 
| :s3/endpoint-url | string | The S3 endpoint-url to connect to (for S3-compatible storage solutions). | 
| :s3/max-concurrent-uploads | integer | Maximum number of simultaneous uploads. | 
| :s3/serialize-per-element? | boolean | Flag for whether to serialize as an entire batch, or serialize per element and separate by newline characters. | 
| :s3/serialize-per-element-separator | string | String to separate per-element strings with. Defaults to the newline character. | 
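The grouping described for :s3/multi-upload and :s3/prefix-key in the table above can be pictured with plain Clojure. This is only an illustration of the partitioning, not the plugin's code; group-for-upload is a hypothetical helper, and :k stands in for whatever value is configured as :s3/prefix-key.

```clojure
(def batch
  [{:a 3 :k "batch1"}
   {:a 2 :k "batch2"}
   {:a 1 :k "batch1"}])

(defn group-for-upload
  "Hypothetical helper: partitions a batch of segments by the value
  found under prefix-key, giving one group per prefix."
  [prefix-key segments]
  (group-by #(get % prefix-key) segments))

(group-for-upload :k batch)
;; => {"batch1" [{:a 3, :k "batch1"} {:a 1, :k "batch1"}],
;;     "batch2" [{:a 2, :k "batch2"}]}
```

Each group would then correspond to one uploaded object under the matching prefix.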
Many thanks to AdGoji for allowing this work to be open sourced and contributed back to the Onyx Platform community.
Pull requests into the master branch are welcomed.
Copyright © 2017 Distributed Masonry LLC
Distributed under the Eclipse Public License, the same as Clojure.