Onyx plugin for Amazon S3.
In your project file:
[org.onyxplatform/onyx-amazon-s3 "0.14.5.0"]
In your peer boot-up namespace:
(:require [onyx.plugin.s3-input])
Catalog entry:
{:onyx/name task-name
:onyx/plugin :onyx.plugin.s3-input/input
:onyx/type :input
:onyx/medium :s3
:onyx/batch-size 20
:onyx/max-peers 1
:s3/bucket "mybucket"
:s3/prefix "filter-prefix/example/"
:s3/deserializer-fn :my.ns/deserializer-fn
:s3/buffer-size-bytes 10000000
:onyx/doc "Reads segments from keys in an S3 bucket."}
Lifecycle entry:
{:lifecycle/task <<TASK_NAME>>
:lifecycle/calls :onyx.plugin.s3-input/s3-input-calls}
key | type | description |
---|---|---|
:s3/bucket | string | The name of the s3 bucket to read objects from. |
:s3/deserializer-fn | keyword | A namespaced keyword pointing to a fully qualified function that will deserialize from bytes to segments. Currently only newline-separated values are supported, so the deserializer must deserialize line by line. |
:s3/prefix | string | Filter the keys to be read by a supplied prefix. |
:s3/file-key | string | When set, each segment includes, under this key, the S3 key of the file from which the segment's line was read. |
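As an illustration of the :s3/deserializer-fn option, here is a minimal sketch of a line-by-line deserializer that parses each line as EDN. The namespace my.ns and the function name match the catalog entry above, but the body is hypothetical. It assumes the plugin hands the function one line at a time, and accepts either a String or a byte array because the exact argument type is not stated here.

```clojure
(ns my.ns
  (:require [clojure.edn :as edn]))

;; Hypothetical deserializer: turns one line of an S3 object into a segment.
;; Assumption: the argument is a single line, given as a String or as bytes.
(defn deserializer-fn [line]
  (let [text (if (string? line)
               line
               (String. ^bytes line "UTF-8"))]
    ;; edn/read-string reads data only and never evaluates code.
    (edn/read-string text)))
```

With this sketch, (deserializer-fn "{:id 1}") returns the map {:id 1}.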
In your peer boot-up namespace:
(:require [onyx.plugin.s3-output])
Catalog entry:
{:onyx/name <<TASK_NAME>>
:onyx/plugin :onyx.plugin.s3-output/output
:s3/bucket <<BUCKET_NAME>>
:s3/encryption :none
:s3/serializer-fn :my.ns/serializer-fn
:s3/key-naming-fn :onyx.plugin.s3-output/default-naming-fn
:s3/prefix "filter-prefix/example/"
:s3/prefix-separator "/"
:s3/serialize-per-element? false
:s3/max-concurrent-uploads 20
:onyx/type :output
:onyx/medium :s3
:onyx/batch-size 20
:onyx/doc "Writes segments to s3 files, one file per batch"}
Segments received by this task are serialized to bytes by the :s3/serializer-fn and written as one file per batch. Each file is placed at a key in the bucket that is named by the function defined at :s3/key-naming-fn. This function takes an event map and returns a string. The default naming function, :onyx.plugin.s3-output/default-naming-fn, names keys using a UTC timestamp in the following format:
"yyyy-MM-dd-hh.mm.ss.SSS_batch_BATCH_UUID".
You can define :s3/encryption to be :aes256 if your S3 bucket has encryption enabled. The default value is :none.
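The :s3/serializer-fn and :s3/key-naming-fn options described above could be filled in along these lines. This is a sketch under assumptions: it supposes that, in the default whole-batch mode, the serializer receives the batch of segments as a collection, and uuid-naming-fn is a hypothetical name for a custom naming function that ignores the event map.

```clojure
(ns my.ns
  (:import [java.util UUID]))

;; Hypothetical batch serializer: prints the whole batch as one EDN form
;; and returns it as UTF-8 bytes.
(defn serializer-fn [segments]
  (.getBytes ^String (pr-str segments) "UTF-8"))

;; Hypothetical naming function: takes the Onyx event map and returns
;; a string to use as the S3 key. The event map is ignored here.
(defn uuid-naming-fn [_event]
  (str "batch-" (UUID/randomUUID) ".edn"))
```

To use the custom naming function, the catalog entry would set :s3/key-naming-fn to :my.ns/uuid-naming-fn.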
When :s3/serialize-per-element? is set to true, the serializer is called on each individual segment rather than on the whole batch, and the serialized elements are separated by the string value set in :s3/serialize-per-element-separator.
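To make the per-element mode concrete, the following is an illustrative model of the resulting file content, not the plugin's own code. The namespace and both function names are hypothetical. Whether the plugin also writes a separator after the last element is not specified here; this sketch only places separators between elements.

```clojure
(ns my.ns.per-element-sketch
  (:require [clojure.string :as str]))

;; In per-element mode the serializer is given ONE segment, not the batch.
(defn segment->bytes [segment]
  (.getBytes ^String (pr-str segment) "UTF-8"))

;; Illustrative model only: serialize each segment on its own and
;; join the results with the configured separator string.
(defn batch->text [segments separator]
  (str/join separator
            (map #(String. ^bytes (segment->bytes %) "UTF-8") segments)))
```

For example, (batch->text [{:a 1} {:a 2}] "\n") yields two lines, one EDN map per line.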
An alternative upload mode can be selected via :s3/multi-upload, documented below. When this option is used, segments are partitioned into different objects by a grouping key.
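The partitioning can be pictured with plain Clojure. This is a sketch of the grouping idea only, not the plugin's implementation. The namespace and partition-batch are hypothetical names, and :k stands in for whatever key is configured as the grouping key.

```clojure
(ns my.ns.multi-upload-sketch)

;; Illustrative only: group a batch of segments by the value found
;; under the grouping key, giving one group per uploaded object.
(defn partition-batch [prefix-key segments]
  (group-by #(get % prefix-key) segments))

(partition-batch :k [{:a 3 :k "batch1"} {:a 2 :k "batch2"}])
;; => {"batch1" [{:a 3, :k "batch1"}], "batch2" [{:a 2, :k "batch2"}]}
```

Each map entry corresponds to one object, uploaded under the prefix given by its group value.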
Lifecycle entry:
{:lifecycle/task <<TASK_NAME>>
:lifecycle/calls :onyx.plugin.s3-output/s3-output-calls}
key | type | description |
---|---|---|
:s3/bucket | string | The name of the s3 bucket to write to |
:s3/serializer-fn | keyword | A namespaced keyword pointing to a fully qualified function that will serialize the batch of segments to bytes |
:s3/key-naming-fn | keyword | A namespaced keyword pointing to a fully qualified function that will be supplied with the Onyx event map and must produce an S3 key for the batch. |
:s3/prefix | string | A prefix to prepend to the keys generated by :s3/key-naming-fn. |
:s3/prefix-separator | string | A separator to add after :s3/prefix and before the result of :s3/key-naming-fn. Defaults to "/". |
:s3/multi-upload | boolean | Flag that causes the plugin to group the batch of segments by :s3/prefix-key and upload an object per group. |
:s3/prefix-key | any | Used with :s3/multi-upload. The key by which segments are grouped into prefixed objects. For example, [{:a 3 :k "batch1"} {:a 2 :k "batch2"}] grouped by :k causes two objects to be uploaded, under the prefixes "batch1" and "batch2". |
:s3/content-type | string | Optional content type for value |
:s3/encryption | keyword | Optional server-side encryption setting. One of :aes256 or :none. |
:s3/region | string | The S3 region to write objects to. |
:s3/endpoint-url | string | The S3 endpoint-url to connect to (for S3-compatible storage solutions). |
:s3/max-concurrent-uploads | integer | Maximum number of simultaneous uploads. |
:s3/serialize-per-element? | boolean | Flag for whether to serialize the batch as a whole, or to serialize each element individually and join the results with :s3/serialize-per-element-separator. |
:s3/serialize-per-element-separator | string | String used to separate the per-element serialized values. Defaults to the newline character. |
Many thanks to AdGoji for allowing this work to be open sourced and contributed back to the Onyx Platform community.
Pull requests into the master branch are welcomed.
Copyright © 2017 Distributed Masonry LLC
Distributed under the Eclipse Public License, the same as Clojure.