
datasplash.api


->> (clj macro)

(->> nam input & body)

Creates and applies a single named PTransform from a sequence of transforms on a single PCollection. You can use it as you would use ->> in Clojure.

Example:
```
(ds/pt->> :transform-name input-pcollection
          (ds/map inc {:name :inc})
          (ds/filter even? {:name :even?}))
```

cogroup-by (clj)

(cogroup-by options specs)
(cogroup-by options specs reduce-fn)

Takes a specification of the join between pcolls and returns a PCollection of KVs (unless a :collector fn is given) with values being lists of values corresponding to the key-fn. The specification is a list of triples [pcoll f options].

- pcoll is a pcoll on which to join
- f is a joining function, used to produce the keys on which to join. Can be nil if the coll is already made up of KVs
- options is a map configuring each side of the join

Only one option is supported for now in join:

- :type -> :optional or :required to select between left and right join. Defaults to :optional

Example:
```
(ds/cogroup-by {:name :my-cogroup-by
                :collector (fn [[key [list-of-elts-from-pcoll1 list-of-elts-from-pcoll2]]]
                             [key (concat list-of-elts-from-pcoll1 list-of-elts-from-pcoll2)])}
               [[pcoll1 :id {:type :required}]
                [pcoll2 (fn [elt] (:foreign-key elt)) {:type :optional}]])
```

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/join/CoGroupByKey and for a different approach to joins see join-by

Available options:

- :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
- :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
- :collector => A collector fn to apply after the cogroup. The signature is
```
(fn [[key [list-of-elts-from-pcoll1 list-of-elts-from-pcoll2]]] ...)
```
- :name => Adds a name to the Transform.

combine (clj)

(combine f pcoll)
(combine f {:keys [coder key-coder value-coder] :as options} pcoll)

Applies a CombineFn or a Clojure function with equivalent arities to the PCollection of KVs.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Combine

Available options:

- :as-singleton-view => The transform returns a PCollectionView whose elements are the result of combining elements per-window in the input PCollection.
- :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
- :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
- :fanout => Uses an intermediate node to combine parts of the data to reduce load on the final global combine step. Can be either an integer or a fn from key to integer (for combine-by-key scope).
- :scope => Specifies the combiner scope of application | One of [:global :per-key] | Defaults to :global
- :without-defaults => Boolean indicating if the transform should attempt to provide a default value in the case of empty input.
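No example accompanies this docstring; as a minimal, hedged sketch (assuming ds aliases datasplash.api, and pcoll / kv-pcoll are pre-existing PCollections — per the docstring, a plain Clojure fn with the right arities is accepted directly):
```
;; Global combine: sums all elements of pcoll into a single value
(ds/combine + {:name :sum-all} pcoll)

;; Per-key combine over a PCollection of KVs
(ds/combine + {:scope :per-key :name :sum-per-key} kv-pcoll)
```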

combine-by (clj)

(combine-by key-fn f pcoll)
(combine-by key-fn f options pcoll)

Shortcut to easily group values in a PColl by a function and combine all the values by key according to a combiner fn. Returns a PCollection of KVs.

Example:
```
;; Returns a pcoll of two KVs, with false and true as keys, and the sum of even and odd numbers as values
(->> pcoll
     (ds/combine-by even? (ds/sum-fn) {:name :my-combine-by}))
```

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Combine and combine-fn for options about creating a combiner function (combine-fn is applied to the given Clojure fn if necessary; you do not need to call it yourself)

Available options:

- :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
- :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
- :fanout => Uses an intermediate node to combine parts of the data to reduce load on the final global combine step. Can be either an integer or a fn from key to integer (for combine-by-key scope).
- :key-coder => Coder to be used for encoding keys in the resulting KV PColl.
- :value-coder => Coder to be used for encoding values in the resulting KV PColl.
- :without-defaults => Boolean indicating if the transform should attempt to provide a default value in the case of empty input.

combine-fn (clj)

(combine-fn reducef)
(combine-fn reducef extractf)
(combine-fn reducef extractf combinef)
(combine-fn reducef extractf combinef initf)
(combine-fn reducef extractf combinef initf output-coder)
(combine-fn reducef extractf combinef initf output-coder acc-coder)

Returns a CombineFn instance from the given args. See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Combine.CombineFn.html

Arguments in order:

- reducef: adds an element to the accumulator: fn of two arguments, returns the updated accumulator
```
(fn [acc elt] (assoc acc (ds/key elt) (ds/val elt)))
```
- extractf: fn taking a single accumulator as arg and returning the final result. Defaults to identity
- combinef: fn taking a variable number of accumulators and returning a single merged accumulator. Defaults to using the reduce fn
```
(fn [& accs] (apply merge accs))
```
- initf: fn of 0 args, returns an empty accumulator. Defaults to the reduce fn with no args
```
(fn [] {})
```
- output-coder: coder for the resulting PCollection. Defaults to nippy-coder
- acc-coder: coder for the accumulator. Defaults to nippy-coder

This function is reminiscent of the reducers API. It has sensible defaults in order to reuse existing functions. For example, this is a perfectly valid combine-fn that sums all numbers in a pcoll:
```
(combine-fn +)
```
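To illustrate the fuller arities, here is a hedged sketch of a mean combiner; mean-fn is an illustrative name and the [sum count] accumulator shape is an assumption of this example, not part of the library:
```
;; Hypothetical mean combiner: the accumulator is a [sum count] pair
(def mean-fn
  (combine-fn
   (fn [[sum cnt] x] [(+ sum x) (inc cnt)])         ;; reducef
   (fn [[sum cnt]] (if (zero? cnt) 0 (/ sum cnt)))  ;; extractf
   (fn [& accs] (reduce (fn [[s1 c1] [s2 c2]]
                          [(+ s1 s2) (+ c1 c2)])
                        [0 0]
                        accs))                      ;; combinef
   (fn [] [0 0])))                                  ;; initf

(ds/combine mean-fn pcoll)
```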

concat (clj)

(concat options & colls)

Returns a single PCollection containing all the given pcolls. Accepts an option map as an optional first arg.

Example:
```
(ds/concat pcoll1 pcoll2)
(ds/concat {:name :concat-node} pcoll1 pcoll2)
```

cond->> (clj macro)

(cond->> nam input & body)

Creates and applies a single named PTransform from a sequence of transforms on a single PCollection according to the results of the given predicates. You can use it as you would use cond->> in Clojure.

Example:
```
(ds/cond->> :transform-name input-pcollection
            (:do-inc? config) (ds/map inc {:name :inc})
            (:do-filter? config) (ds/filter even? {:name :even?}))
```

context (clj)

(context)

In the context of a ParDo, contains the corresponding Context object.
See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/DoFn.ProcessContext.html

count-fn (clj)

(count-fn &
          {:keys [mapper predicate]
           :or {mapper (fn [_] 1) predicate (constantly true)}})
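count-fn carries no docstring; judging from its :mapper/:predicate options it appears to build a counting combiner. A hedged, unverified sketch of how it might be used:
```
;; Count every element of pcoll (assuming count-fn yields a combine-fn)
(ds/combine (ds/count-fn) pcoll)

;; Count only the even elements
(ds/combine (ds/count-fn :predicate even?) pcoll)
```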

defoptions (clj macro)

(defoptions interface-name specs)

distinct (clj)

(distinct pcoll)
(distinct options pcoll)

Removes duplicate elements from the PCollection.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/RemoveDuplicates.html

Example:
```
(ds/distinct pcoll)
```

Available options:

- :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
- :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.

filename-policy (clj)

(filename-policy options)

Creates a filename-policy object.

Examples:
```
(ds/filename-policy {:file-name "file"
                     :prefix "gs://toto/"
                     :suffix "json"})

;; with custom functions
(require '[clj-time.format :as tf])

(defn windowed-fn
  [shard-number shard-count ^BoundedWindow window _]
  (let [timestamp (tf/unparse (:date-hour-minute tf/formatters)
                              (.start window))]
    (str "file-" shard-number "of" shard-count "-" timestamp ".txt")))

(ds/filename-policy {:windowed-fn windowed-fn
                     :unwindowed-fn (fn [_ _ _] "file.txt")})
```

Available options:

- :file-name => sets the default filename prefix (only used when no custom function is set)
- :suffix => sets the default filename suffix (only used when no custom function is set)
- :unwindowed-fn => overrides the filename function for an unwindowed PCollection
- :windowed-fn => overrides the filename function for a windowed PCollection

filter (clj)

(filter pred pcoll)
(filter f options pcoll)

Returns a PCollection that only contains the items for which (pred item) returns true.

Example:
```
(ds/filter even? foo)
(ds/filter (fn [x] (even? (* x x))) foo)
```

Available options:

- :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
- :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
- :name => Adds a name to the Transform.
- :side-inputs => Adds a map of PCollectionViews as side inputs to the underlying ParDo Transform. They can be accessed there by key in the return of side-inputs fn.
- :side-outputs => Defines as a seq of keywords the output tags for the underlying ParDo Transform. The map fn should return a map with keys set to the same set of keywords.
- :without-coercion-to-clj => Avoids coercing Dataflow types to Clojure, like KV. Coercion will happen by default

fixed-windows (clj)

(fixed-windows width pcoll)
(fixed-windows width options pcoll)

Applies fixed windows to the input PCollection (useful for unbounded PCollections).

See https://cloud.google.com/dataflow/model/windowing#setting-fixed-time-windows

Example:
```
(require '[clj-time.core :as time])
(ds/fixed-windows (time/minutes 30) pcoll)
```

Available options:

- :accumulate-mode => Accumulation mode when a Trigger is fired (accumulate or discard)
- :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
- :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
- :name => Adds a name to the Transform.
- :trigger => Adds a Trigger to the Window.
- :with-allowed-lateness => Allows late data. Mandatory for custom triggers

flatten (clj)

(flatten pcoll)
(flatten options pcoll)

Returns a single PCollection containing all the elements of the pcolls in the given iterable.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Flatten.html

Example:
```
(ds/flatten [pcoll1 pcoll2 pcoll3])
```

frequencies (clj)

(frequencies pcoll)
(frequencies options pcoll)

Returns the frequency of each unique element of the input PCollection.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Count.html

Example:
```
(ds/frequencies pcoll)
```

Available options:

- :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
- :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.

frequencies-fn (clj)

(frequencies-fn &
                {:keys [mapper predicate]
                 :or {mapper identity predicate (constantly true)}})

from-edn (clj)

generate-input (clj)

(generate-input p)
(generate-input coll p)
(generate-input coll options p)

Generates a PCollection from the given collection. Also accepts empty collections. See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Create.html

Example:
```
(ds/generate-input (range 0 1000) pipeline)
```

Available options:

- :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
- :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.

get-pipeline-configuration (clj)

(get-pipeline-configuration)

Returns a map corresponding to the bean of configuration the current pipeline was run with. Must be called inside a function wrapping a ParDo, e.g. ds/map or ds/mapcat.

get-pipeline-options (clj)

(get-pipeline-options pipeline)

Returns a map corresponding to the bean of options the given pipeline was built with. Must be called on a PipelineWithOptions object produced by make-pipeline.
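A hedged sketch of reading the options back (the :jobName key is illustrative; the actual keys mirror the option bean's properties):
```
(let [p (ds/make-pipeline args)
      opts (ds/get-pipeline-options p)]
  ;; opts is a plain map built from the pipeline's options bean
  (println (:jobName opts)))
```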

group-by (clj)

(group-by f pcoll)
(group-by f {:keys [key-coder value-coder coder] :as options} pcoll)

Groups a PCollection by the result of calling (f item) for each item.

This produces a sequence of KV values, similar to using seq with a map. Each value will be a list of the values that match the key.

Example:
```
(ds/group-by :a foo)
(ds/group-by count foo)
```

Available options:

- :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
- :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.

group-by-key (clj)

(group-by-key pcoll)
(group-by-key {:keys [key-coder value-coder coder] :as options} pcoll)

Takes a KV PCollection as input and returns a KV PCollection as output of K to list of V.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/GroupByKey.html
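A minimal sketch (assuming ds aliases datasplash.api; the map-kv step and the :country/:city keys are illustrative):
```
;; Build a KV PCollection keyed by country, then collapse it so each
;; country maps to the list of its cities
(->> pcoll
     (ds/map-kv (fn [{:keys [country city]}] [country city]))
     (ds/group-by-key {:name :cities-by-country}))
```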

identity (clj)

(identity c)

Identity function for use in a ParDo.

join-by (clj)

(join-by options specs)
(join-by {nam :name :as options} specs join-fn)

Takes a specification of the join between pcolls and returns a PCollection of the cartesian product (the only difference from cogroup-by) of all elements joined according to the spec. The specification is a list of triples [pcoll f options].

- pcoll is a pcoll on which to join
- f is a joining function, used to produce the keys on which to join. Can be nil if the coll is already made up of KVs
- options is a map configuring each side of the join

Only one option is supported for now in join:

- :type -> :optional or :required to select between left and right join. Defaults to :optional

Example:
```
(ds/join-by {:name :my-join-by
             :collector (fn [elt1 elt2]
                          (merge elt1 elt2))}
            [[pcoll1 :id {:type :required}]
             [pcoll2 (fn [elt] (:foreign-key elt)) {:type :optional}]])
```

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/join/CoGroupByKey and for a different approach to joins see cogroup-by

Available options:

- :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
- :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
- :collector => A collector fn to apply after the join. The signature is like map, one element for each pcoll in the join.
- :name => Adds a name to the Transform.

juxt (clj)

(juxt & fns)

Creates a CombineFn that applies multiple combiners in one go and produces a vector of combined results. 'Sibling fusion' in Dataflow optimizes multiple independent combiners in the same way, but you might find juxt more concise.

Only works with functions created with combine-fn or native Clojure functions, not with native Dataflow CombineFns.

Example:
```
(ds/combine (ds/juxt + *) pcoll)
```

key (clj)

(key elt)

Returns the key part of a KV or MapEntry.

make-kv (clj)

(make-kv kv)
(make-kv k v)

Returns a KV object from the given arg(s), either [k v] or a MapEntry or seq of two elements.
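For instance (the key and value shown are illustrative):
```
(ds/make-kv :user-42 {:name "Ada"})    ;; from separate key and value
(ds/make-kv [:user-42 {:name "Ada"}])  ;; from a two-element seq
```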

make-kv-coder (clj)

(make-kv-coder)
(make-kv-coder k-coder v-coder)

Returns an instance of a KvCoder, using nippy for serialization by default.

make-nippy-coder (clj)

(make-nippy-coder)

Returns an instance of a CustomCoder using nippy for serialization.

make-partition-mapping (clj)

(make-partition-mapping coll)

make-pipeline (clj macro)

(make-pipeline & args)

Builds a Pipeline from command-line args and configuration. Also accepts a jobNameTemplate param, a string in which the following vars are interpolated:

- %A -> Application name
- %U -> User name
- %T -> Timestamp

The template %A-%U-%T is thus equivalent to the default jobName.
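A hedged end-to-end sketch (assuming ds aliases datasplash.api and that ds/run-pipeline from the same namespace executes the pipeline; the transform chain is illustrative):
```
(defn -main
  [& args]
  (let [p (ds/make-pipeline args)]
    (->> p
         (ds/generate-input (range 10))
         (ds/map inc {:name :inc}))
    (ds/run-pipeline p)))
```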

map (clj)

(map f pcoll)
(map f options pcoll)

Returns a PCollection of f applied to every item in the source PCollection. Function f should be a function of one argument.

Example:
```
(ds/map inc foo)
(ds/map (fn [x] (* x x)) foo)
```

Note: Unlike clojure.core/map, datasplash.api/map takes only one PCollection.

Available options:

- :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
- :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
- :name => Adds a name to the Transform.
- :side-inputs => Adds a map of PCollectionViews as side inputs to the underlying ParDo Transform. They can be accessed there by key in the return of side-inputs fn.
- :side-outputs => Defines as a seq of keywords the output tags for the underlying ParDo Transform. The map fn should return a map with keys set to the same set of keywords.
- :without-coercion-to-clj => Avoids coercing Dataflow types to Clojure, like KV. Coercion will happen by default

map-kvclj

(map-kv f pcoll)
(map-kv f options pcoll)

Returns a KV PCollection of f applied to every item in the source PCollection.
Function f should be a function of one argument and return seq of keys/values.

Example:
```
(ds/map-kv (fn [{:keys [month revenue]}] [month revenue]) foo)
```

Note: Unlike clojure.core/map, datasplash.api/map-kv takes only one PCollection.

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :name => Adds a name to the Transform.
  - :side-inputs => Adds a map of PCollectionViews as side inputs to the underlying ParDo Transform. They can be accessed there by key in the return of side-inputs fn.
  - :side-outputs => Defines as a seq of keywords the output tags for the underlying ParDo Transform. The map fn should return a map with keys set to the same set of keywords.
  - :without-coercion-to-clj => Avoids coercing Dataflow types to Clojure, like KV. Coercion will happen by default
sourceraw docstring

mapcatclj

(mapcat f pcoll)
(mapcat f options pcoll)

Returns the result of concatenating (flattening) the results of applying
f to each item in the PCollection. Thus f should return a Clojure or Java collection.

Example:
```
(ds/mapcat (fn [x] [(dec x) x (inc x)]) foo)
```

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :name => Adds a name to the Transform.
  - :side-inputs => Adds a map of PCollectionViews as side inputs to the underlying ParDo Transform. They can be accessed there by key in the return of side-inputs fn.
  - :side-outputs => Defines as a seq of keywords the output tags for the underlying ParDo Transform. The map fn should return a map with keys set to the same set of keywords.
  - :without-coercion-to-clj => Avoids coercing Dataflow types to Clojure, like KV. Coercion will happen by default
sourceraw docstring

max-fnclj

(max-fn &
        {:keys [mapper predicate]
         :or {mapper identity predicate (constantly true)}})
source

mean-fnclj

(mean-fn &
         {:keys [mapper predicate]
          :or {mapper identity predicate (constantly true)}})
source

min-fnclj

(min-fn &
        {:keys [mapper predicate]
         :or {mapper identity predicate (constantly true)}})
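These aggregation builders (along with [[mean-fn]], [[max-fn]] and [[sum-fn]]) produce combining functions from an optional :mapper and :predicate. A hedged sketch of one plausible use — assuming ds/combine accepts the resulting fn, and that elements are maps carrying a hypothetical :price key:
```
;; Hypothetical: keep only elements with a :price, map each element
;; to its :price, then take the minimum of the mapped values.
(ds/combine (ds/min-fn :mapper :price :predicate (comp some? :price))
            {:name :min-price}
            pcoll)
```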
source

pardoclj

(pardo f pcoll)
(pardo f options pcoll)

Uses a raw pardo-fn as a ParDo transform.
Function f should be a function of one argument, the Pardo$Context object.

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :name => Adds a name to the Transform.
  - :side-inputs => Adds a map of PCollectionViews as side inputs to the underlying ParDo Transform. They can be accessed there by key in the return of side-inputs fn.
  - :side-outputs => Defines as a seq of keywords the output tags for the underlying ParDo Transform. The map fn should return a map with keys set to the same set of keywords.
  - :without-coercion-to-clj => Avoids coercing Dataflow types to Clojure, like KV. Coercion will happen by default
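A minimal sketch, assuming the context exposes Beam's usual `.element` and `.output` methods (as on DoFn.ProcessContext); :inc-raw is a hypothetical name:
```
(ds/pardo
 (fn [c]
   ;; read the current element from the context and emit a result
   (.output c (inc (.element c))))
 {:name :inc-raw}
 pcoll)
```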
sourceraw docstring

partition-byclj

(partition-by f num pcoll)
(partition-by f num options pcoll)

Partitions the content of pcoll according to the PartitionFn.
See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Partition.
The partition function is given two arguments: the current element and the number of partitions.

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :name => Adds a name to the Transform.
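A hedged sketch splitting a PCollection of numbers into three partitions by remainder (:by-remainder is a hypothetical name):
```
;; The partition fn receives the element and the partition count,
;; and must return an index in [0, num-partitions).
(ds/partition-by (fn [elt num-partitions] (mod elt num-partitions))
                 3
                 {:name :by-remainder}
                 pcoll)
```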
sourceraw docstring

partition-fnclj

(partition-fn f)

Returns a Partition.PartitionFn if possible
sourceraw docstring

pt->>cljmacro

(pt->> nam input & body)

Creates and applies a single named PTransform from a sequence of transforms on a single PCollection. You can use it as you would use ->> in Clojure.

Example:
```
(ds/pt->> :transform-name input-pcollection
          (ds/map inc {:name :inc})
          (ds/filter even? {:name :even?}))
```
sourceraw docstring

ptransformcljmacro

(ptransform nam input & body)

Generates a PTransform with the given name, apply signature and body. Should rarely be used in user code, see [[pt->>]] for the more general use case in application code.

Example (actual implementation of the group-by transform):
```
(ptransform
 :group-by
 [^PCollection pcoll]
 (->> pcoll
      (ds/with-keys f opts)
      (ds/group-by-key opts)))
```
sourceraw docstring

read-edn-fileclj

(read-edn-file from p)
(read-edn-file from options p)

Reads a PCollection of edn strings from disk or Google Storage, with records separated by newlines.
See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Read.html
Example:
```
(read-edn-file "gs://target/path" pcoll)
```

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :compression-type => Choose compression type. :auto by default. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  - :delimiter => Specify delimiter
  - :empty-match-treatment => Options for allowing or disallowing filepatterns that match no resources | One of (:allow :allow-if-wildcard :disallow)
  - :many-files => Hints that the filepattern specified matches a very large number of files.
  - :watch-new-files => Watches for new files arriving and handles them in streaming mode.
sourceraw docstring

read-edn-filesclj

(read-edn-files from)
(read-edn-files options from)

Reads multiple EDN files from a PCollection of Strings from disk or Google Storage, with records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.ReadFiles.html

Example:
```
(->> (ds/generate-input ["gs://target/path" "gs://target/another-path"] pipeline)
     (read-edn-files))
```

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :compression-type => Choose compression type. :auto by default. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  - :delimiter => Specify delimiter
  - :empty-match-treatment => Options for allowing or disallowing filepatterns that match no resources | One of (:allow :allow-if-wildcard :disallow)
  - :many-files => Hints that the filepattern specified matches a very large number of files.
  - :watch-new-files => Watches for new files arriving and handles them in streaming mode.
sourceraw docstring

read-json-fileclj

(read-json-file from p)
(read-json-file from {:keys [key-fn return-type] :as options} p)

Reads a PCollection of JSON strings from disk or Google Storage, with records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Read.html and https://github.com/dakrone/cheshire#decoding for details on options.

Example:
```
(read-json-file "gs://target/path" pcoll)
```

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :compression-type => Choose compression type. :auto by default. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  - :delimiter => Specify delimiter
  - :empty-match-treatment => Options for allowing or disallowing filepatterns that match no resources | One of (:allow :allow-if-wildcard :disallow)
  - :key-fn => Selects a policy for parsing map keys. If true, keywordizes the keys. If given a fn, uses it to transform each map key.
  - :many-files => Hints that the filepattern specified matches a very large number of files.
  - :return-type => Allows passing in a function to specify what kind of types to return.
  - :watch-new-files => Watches for new files arriving and handles them in streaming mode.
sourceraw docstring

read-json-filesclj

(read-json-files from)
(read-json-files {:keys [key-fn return-type] :as options} from)

Reads multiple JSON files from a PCollection of Strings from disk or Google Storage, with records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.ReadFiles.html

Example:
```
(->> (ds/generate-input ["gs://target/path" "gs://target/another-path"] pipeline)
     (read-json-files))
```

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :key-fn => Selects a policy for parsing map keys. If true, keywordizes the keys. If given a fn, uses it to transform each map key.
  - :return-type => Allows passing in a function to specify what kind of types to return.
sourceraw docstring

read-text-fileclj

(read-text-file from p)
(read-text-file from options p)

Reads a PCollection of Strings from disk or Google Storage, with records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Read.html

Example:
```
(read-text-file "gs://target/path" pcoll)
```

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :compression-type => Choose compression type. :auto by default. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  - :delimiter => Specify delimiter
  - :empty-match-treatment => Options for allowing or disallowing filepatterns that match no resources | One of (:allow :allow-if-wildcard :disallow)
  - :many-files => Hints that the filepattern specified matches a very large number of files.
  - :watch-new-files => Watches for new files arriving and handles them in streaming mode.
sourceraw docstring

read-text-filesclj

(read-text-files from)
(read-text-files options from)

Reads multiple text files from a PCollection of Strings from disk or Google Storage, with records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.ReadFiles.html

Example:
```
(->> (ds/generate-input ["gs://target/path" "gs://target/another-path"] pipeline)
     (read-text-files))
```

sourceraw docstring

run-pipelineclj

(run-pipeline topology)

Runs the computation for a given pipeline or PCollection.
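A minimal sketch of building, running and waiting on a pipeline (see [[wait-pipeline-result]]); the args vector is assumed to hold command-line options:
```
(let [p (ds/make-pipeline args)]
  (->> (ds/generate-input [1 2 3] p)
       (ds/map inc {:name :inc}))
  ;; run-pipeline returns a PipelineResult; block until it finishes
  (ds/wait-pipeline-result (ds/run-pipeline p)))
```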
sourceraw docstring

safe-execcljmacro

(safe-exec & body)

Executes body while trying to sanely require missing namespaces if the runtime is not yet properly loaded for Clojure in distributed mode. Always wrap a try block with this if you intend to eat every Exception produced.
```
(ds/map (fn [elt]
          (try
            (ds/safe-exec (dangerous-parse-fn elt))
            (catch Exception e
              (log/error e "parsing error"))))
        pcoll)
```
sourceraw docstring

safe-exec-cfgcljmacro

(safe-exec-cfg config & body)

Like [[safe-exec]], but takes a map as first argument containing the name of the ptransform for better error messages
sourceraw docstring

sampleclj

(sample size pcoll)
(sample size {:keys [scope] :as options} pcoll)

Takes samples of the elements in a PCollection, or samples of the values associated with each key in a PCollection of KVs.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Sample.html

Example:
```
(ds/sample 100 {:scope :per-key} pcoll)
```

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :scope => Scope given to the combining operation. One of (:globally :per-key).
sourceraw docstring

session-windowsclj

(session-windows gap pcoll)
(session-windows gap options pcoll)

Applies a session window to divide a PCollection (useful for unbounded PCollections).

See https://cloud.google.com/dataflow/model/windowing#setting-session-windows

Example:
```
(require '[clj-time.core :as time])
(ds/session-windows (time/minutes 10) pcoll)
```

Available options:

  - :accumulate-mode => Accumulate mode when a Trigger is fired (accumulate or discard)
  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :name => Adds a name to the Transform.
  - :trigger => Adds a Trigger to the Window.
  - :with-allowed-lateness => Allow late data. Mandatory for custom trigger
sourceraw docstring

sfnclj

(sfn f)

Returns an instance of SerializableFunction equivalent to f.
sourceraw docstring

side-inputsclj

(side-inputs)

In the context of a ParDo, returns the corresponding side inputs as a map from names to values.

Example:
```
(let [input (ds/generate-input [1 2 3 4 5] p)
      side-input (ds/view (ds/generate-input [{1 :a 2 :b 3 :c 4 :d 5 :e}] p))
      proc (ds/map (fn [x] (get-in (ds/side-inputs) [:mapping x]))
                   {:side-inputs {:mapping side-input}} input)])
```
sourceraw docstring

side-outputsclj

(side-outputs & kvs)

Returns multiple outputs keyed by keyword.

Example:
```
(let [input (ds/generate-input [1 2 3 4 5] p)
      ;; simple and multi are pcolls with their respective elements
      {:keys [simple multi]} (ds/map (fn [x] (ds/side-outputs :simple x :multi (* x 10)))
                                     {:side-outputs [:simple :multi]} input)])
```
sourceraw docstring

sliding-windowsclj

(sliding-windows width step pcoll)
(sliding-windows width step options pcoll)

Applies a sliding window to divide a PCollection (useful for unbounded PCollections).

See https://cloud.google.com/dataflow/model/windowing#setting-sliding-time-windows

Example:
```
(require '[clj-time.core :as time])
(ds/sliding-windows (time/minutes 30) (time/seconds 5) pcoll)
```

Available options:

  - :accumulate-mode => Accumulate mode when a Trigger is fired (accumulate or discard)
  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :name => Adds a name to the Transform.
  - :trigger => Adds a Trigger to the Window.
  - :with-allowed-lateness => Allow late data. Mandatory for custom trigger
sourceraw docstring

sum-fnclj

(sum-fn &
        {:keys [mapper predicate]
         :or {mapper identity predicate (constantly true)}})
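Like [[max-fn]], [[mean-fn]] and [[min-fn]], this builds a combining function; a hedged sketch assuming ds/combine accepts it and that elements carry a hypothetical :amount key:
```
;; Sum the :amount of every element in the PCollection.
(ds/combine (ds/sum-fn :mapper :amount) {:name :total-amount} pcoll)
```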
source

to-ednclj

source

valclj

(val elt)

Returns the value part of a KV or MapEntry.
sourceraw docstring

viewclj

(view pcoll)
(view {:keys [type] :or {type :singleton} :as options} pcoll)

Produces a View out of a PCollection, to be consumed later, for example as a side input. See https://cloud.google.com/dataflow/java-sdk/JavaDoc/

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :default => Sets a default value for SingletonView
  - :type => Type of View | One of [:singleton :iterable :list :map :multi-map] | Defaults to :singleton
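A hedged sketch of a map-typed view consumed as a side input — mapping-pcoll, input-pcoll and the :mapping key are hypothetical, and the map view is assumed to support lookup with get:
```
(let [lookup (ds/view {:type :map} mapping-pcoll)]
  (ds/map (fn [x] (get (:mapping (ds/side-inputs)) x))
          {:side-inputs {:mapping lookup}}
          input-pcoll))
```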
sourceraw docstring

wait-pipeline-resultclj

(wait-pipeline-result pip-res)
(wait-pipeline-result pip-res timeout)

Blocks until this PipelineResult finishes. Returns the final state.
sourceraw docstring

with-keysclj

(with-keys f pcoll)
(with-keys f {:keys [key-coder value-coder coder] :as options} pcoll)

Returns a PCollection of KV by applying f on each element of the input PCollection and using the return value as the key and the element as the value.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/WithKeys.html

Example:
```
(with-keys even? pcoll)
```

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :key-coder => Coder to be used for encoding keys in the resulting KV PColl.
  - :value-coder => Coder to be used for encoding values in the resulting KV PColl.
sourceraw docstring

with-timestampclj

(with-timestamp timestamp result)

Returns element(s) with the given timestamp as Timestamp. Anything that can be coerced by clj-time can be given as input. It can be nested inside a (side-outputs) or outside (in which case it applies to all results). Example:

(ds/map (fn [e] (ds/with-timestamp (clj-time.core/now) (* 2 e))) pcoll)
Returns element(s) with the given timestamp as Timestamp. Anything that can be coerced by clj-time can be given as input.
 It can be nested inside a `(side-outputs)` or outside (in which case it applies to all results).
 Example:
```
(ds/map (fn [e] (ds/with-timestamp (clj-time.core/now) (* 2 e))) pcoll)
```
sourceraw docstring

write-edn-fileclj

(write-edn-file to pcoll)
(write-edn-file to options pcoll)

Writes a PCollection of data to disk or Google Storage, with edn records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Write.html

Example:

(write-edn-file "gs://target/path" pcoll)

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :compression-type => Choose compression type. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  • :dynamic-fn => Uses the dynamic write to change destination file according to the content of the item
  • :file-format => Choose file format.
  • :naming-fn => Uses the naming fn
  • :num-shards => Selects the desired number of output shards (file fragments). 0 to let the system decide (recommended).
  • :prefix => Uses the given filename prefix.
  • :suffix => Uses the given filename suffix.
  • :temp-directory => Use temp directory when using Filename Policy as output (see filename-policy fn)
Writes a PCollection of data to disk or Google Storage, with edn records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Write.html

Example:
```
(write-edn-file "gs://target/path" pcoll)
```

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :compression-type => Choose compression type. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  - :dynamic-fn => Uses the dynamic write to change destination file according to the content of the item
  - :file-format => Choose file format.
  - :naming-fn => Uses the naming fn
  - :num-shards => Selects the desired number of output shards (file fragments). 0 to let the system decide (recommended).
  - :prefix => Uses the given filename prefix.
  - :suffix => Uses the given filename suffix.
  - :temp-directory => Use temp directory when using Filename Policy as output (see filename-policy fn)
sourceraw docstring

write-json-fileclj

(write-json-file to pcoll)
(write-json-file to options pcoll)

Writes a PCollection of data to disk or Google Storage, with JSON records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Write.html and https://github.com/dakrone/cheshire#encoding for details on options

Example:

(write-json-file "gs://target/path" pcoll)

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :compression-type => Choose compression type. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  • :date-format => Pattern for encoding java.util.Date objects. Defaults to yyyy-MM-dd'T'HH:mm:ss'Z'
  • :dynamic-fn => Uses the dynamic write to change destination file according to the content of the item
  • :escape-non-ascii => Escape non-ASCII characters in the generated JSON.
  • :file-format => Choose file format.
  • :key-fn => Generate JSON and munge keys with a custom function.
  • :naming-fn => Uses the naming fn
  • :num-shards => Selects the desired number of output shards (file fragments). 0 to let the system decide (recommended).
  • :prefix => Uses the given filename prefix.
  • :suffix => Uses the given filename suffix.
  • :temp-directory => Use temp directory when using Filename Policy as output (see filename-policy fn)
Writes a PCollection of data to disk or Google Storage, with JSON records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Write.html and https://github.com/dakrone/cheshire#encoding for details on options

Example:
```
(write-json-file "gs://target/path" pcoll)
```

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :compression-type => Choose compression type. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  - :date-format => Pattern for encoding java.util.Date objects. Defaults to yyyy-MM-dd'T'HH:mm:ss'Z'
  - :dynamic-fn => Uses the dynamic write to change destination file according to the content of the item
  - :escape-non-ascii => Escape non-ASCII characters in the generated JSON.
  - :file-format => Choose file format.
  - :key-fn => Generate JSON and munge keys with a custom function.
  - :naming-fn => Uses the naming fn
  - :num-shards => Selects the desired number of output shards (file fragments). 0 to let the system decide (recommended).
  - :prefix => Uses the given filename prefix.
  - :suffix => Uses the given filename suffix.
  - :temp-directory => Use temp directory when using Filename Policy as output (see filename-policy fn)
sourceraw docstring

write-text-fileclj

(write-text-file to {:keys [dynamic? dynamic-fn file-format] :as options} pcoll)

Writes a PCollection of Strings to disk or Google Storage, with records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Write.html

Example:

(write-text-file "gs://target/path" pcoll)

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :compression-type => Choose compression type. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  • :dynamic-fn => Uses the dynamic write to change destination file according to the content of the item
  • :file-format => Choose file format.
  • :naming-fn => Uses the naming fn
  • :num-shards => Selects the desired number of output shards (file fragments). 0 to let the system decide (recommended).
  • :prefix => Uses the given filename prefix.
  • :suffix => Uses the given filename suffix.
  • :temp-directory => Use temp directory when using Filename Policy as output (see filename-policy fn)
Writes a PCollection of Strings to disk or Google Storage, with records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Write.html

Example:
```
(write-text-file "gs://target/path" pcoll)
```

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :compression-type => Choose compression type. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  - :dynamic-fn => Uses the dynamic write to change destination file according to the content of the item
  - :file-format => Choose file format.
  - :naming-fn => Uses the naming fn
  - :num-shards => Selects the desired number of output shards (file fragments). 0 to let the system decide (recommended).
  - :prefix => Uses the given filename prefix.
  - :suffix => Uses the given filename suffix.
  - :temp-directory => Use temp directory when using Filename Policy as output (see filename-policy fn)
sourceraw docstring
