Clojure only.

datasplash.core


*pipeline-builder-caller*clj

source

->combine-fnclj

(->combine-fn f)

Returns a CombineFn if f is not one already.

source

->optionsclj

(->options o)
source

accumulation-mode-enumclj

source

apply-transformclj

(apply-transform pcoll
                 transform
                 schema
                 {:keys [coder coll-name side-outputs checkpoint] :as options})

Applies the PTransform to the given pcoll, with options applied according to schema.

source

base-combine-schemaclj

source

base-schemaclj

source

clean-filenameclj

(clean-filename s)

Clean filename for transform name building purpose.

source

clj->kvclj

(clj->kv obj)

Coerce from Clojure data to KV objects

source

cogroup-byclj

(cogroup-by options specs)
(cogroup-by options specs reduce-fn)

Takes a specification of the join between pcolls and returns a PCollection of KVs (unless a :collector fn is given) with values being lists of values corresponding to the key-fn. The specification is a list of triples [pcoll f options].

  • pcoll is a pcoll on which to join
  • f is a joining function, used to produce the keys on which to join. Can be nil if the coll is already made up of KVs
  • options is a map configuring each side of the join

Only one option is supported for now in join:

  • :type -> :optional or :required to select between left and right join. Defaults to :optional

Example:

```
(ds/cogroup-by {:name :my-cogroup-by
                :collector (fn [[key [list-of-elts-from-pcoll1 list-of-elts-from-pcoll2]]]
                             [key (concat list-of-elts-from-pcoll1 list-of-elts-from-pcoll2)])}
               [[pcoll1 :id {:type :required}]
                [pcoll2 (fn [elt] (:foreign-key elt)) {:type :optional}]])
```

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/join/CoGroupByKey; for a different approach to joins, see join-by.

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :collector => A collector fn to apply after the cogroup. The signature is
```
(fn [[key [list-of-elts-from-pcoll1 list-of-elts-from-pcoll2]]] ...)
```
  • :name => Adds a name to the Transform.
source

cogroup-by-schemaclj

source

cogroup-by-transformclj

(cogroup-by-transform {:keys [collector] :as options})
source

cogroup-transformclj

(cogroup-transform {:keys [join-nil?] nam :name :as options})
source

combineclj

(combine f pcoll)
(combine f {:keys [coder key-coder value-coder] :as options} pcoll)

Applies a CombineFn or a Clojure function with equivalent arities to the PCollection of KVs.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Combine
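
Example (a sketch; `pcoll` is a hypothetical input, and a plain Clojure fn such as + is wrapped via combine-fn automatically):

```
;; global sum of all elements
(ds/combine + pcoll)

;; per-key sum on a PCollection of KVs
(ds/combine + {:scope :per-key} pcoll)
```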

Available options:

  • :as-singleton-view => The transform returns a PCollectionView whose elements are the result of combining elements per-window in the input PCollection.
  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :fanout => Uses an intermediate node to combine parts of the data to reduce load on the final global combine step. Can be either an integer or a fn from key to integer (for combine-by-key scope).
  • :scope => Specifies the combiner scope of application | One of [:global :per-key] | Defaults to :global
  • :without-defaults => Boolean indicating if the transform should attempt to provide a default value in the case of empty input.
source

combine-byclj

(combine-by key-fn f pcoll)
(combine-by key-fn f options pcoll)

Shortcut to easily group values in a PColl by a function and combine all the values by key according to a combiner fn. Returns a PCollection of KVs.

Example:

```
;; Returns a pcoll of two KVs, with false and true as keys, and the sum of even? and odd? numbers as values
(->> pcoll
     (ds/combine-by even? (ds/sum-fn) {:name :my-combine-by}))
```

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Combine and combine-fn for options on creating a combiner function (combine-fn is applied to the given Clojure fn if necessary; you do not need to call it yourself).

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :fanout => Uses an intermediate node to combine parts of the data to reduce load on the final global combine step. Can be either an integer or a fn from key to integer (for combine-by-key scope).
  • :key-coder => Coder to be used for encoding keys in the resulting KV PColl.
  • :value-coder => Coder to be used for encoding values in the resulting KV PColl.
  • :without-defaults => Boolean indicating if the transform should attempt to provide a default value in the case of empty input.
source

combine-fnclj

(combine-fn reducef)
(combine-fn reducef extractf)
(combine-fn reducef extractf combinef)
(combine-fn reducef extractf combinef initf)
(combine-fn reducef extractf combinef initf output-coder)
(combine-fn reducef extractf combinef initf output-coder acc-coder)

Returns a CombineFn instance from given args. See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Combine.CombineFn.html

Arguments in order:

  • reducef: adds an element to the accumulator; fn of two arguments, returns the updated accumulator
```
(fn [acc elt] (assoc acc (ds/key elt) (ds/val elt)))
```
  • extractf: fn taking a single accumulator as arg and returning the final result. Defaults to identity
  • combinef: fn taking a variable number of accumulators and returning a single merged accumulator. Defaults to using the reduce fn
```
(fn [& accs] (apply merge accs))
```
  • initf: fn of 0 args, returns an empty accumulator. Defaults to the reduce fn with no args
```
(fn [] {})
```
  • output-coder: coder for the resulting PCollection. Defaults to nippy-coder
  • acc-coder: coder for the accumulator. Defaults to nippy-coder

This function is reminiscent of the reducers API. It has sensible defaults so that existing functions can be reused. For example, this is a perfectly valid combine-fn that sums all numbers in a pcoll:

```
(combine-fn +)
```
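
A fuller sketch (hypothetical): a mean combiner with an explicit [sum count] accumulator:

```
(combine-fn
 (fn [[sum cnt] x] [(+ sum x) (inc cnt)])  ;; reducef: fold one element into the accumulator
 (fn [[sum cnt]] (/ sum cnt))              ;; extractf: final result from the accumulator
 (fn [& accs]                              ;; combinef: merge any number of accumulators
   (reduce (fn [[s1 c1] [s2 c2]] [(+ s1 s2) (+ c1 c2)]) accs))
 (fn [] [0 0]))                            ;; initf: the empty accumulator
```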
source

combine-schemaclj

source

compression-type-enumclj

source

contextclj

(context)
In the context of a ParDo, contains the corresponding Context object.
See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/DoFn.ProcessContext.html
source

count-fnclj

(count-fn &
          {:keys [mapper predicate]
           :or {mapper (fn [_] 1) predicate (constantly true)}})
source

create-timestampclj

(create-timestamp)
source

dconcatclj

(dconcat options & colls)

Returns a single PCollection containing all the given pcolls. Accepts an option map as an optional first arg.

Example:

```
(ds/concat pcoll1 pcoll2)
(ds/concat {:name :concat-node} pcoll1 pcoll2)
```
source

ddistinctclj

(ddistinct pcoll)
(ddistinct options pcoll)

Removes duplicate elements from a PCollection.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/RemoveDuplicates.html

Example:

```
(ds/distinct pcoll)
```

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
source

dfilterclj

(dfilter pred pcoll)
(dfilter f options pcoll)

Returns a PCollection that only contains the items for which (pred item) returns true.

Example:

```
(ds/filter even? foo)
(ds/filter (fn [x] (even? (* x x))) foo)
```

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :name => Adds a name to the Transform.
  • :side-inputs => Adds a map of PCollectionViews as side inputs to the underlying ParDo Transform. They can be accessed there by key in the return of side-inputs fn.
  • :side-outputs => Defines as a seq of keywords the output tags for the underlying ParDo Transform. The map fn should return a map with keys set to the same set of keywords.
  • :without-coercion-to-clj => Avoids coercing Dataflow types to Clojure, like KV. Coercion will happen by default
source

dflattenclj

(dflatten pcoll)
(dflatten options pcoll)

Returns a single PCollection containing all the elements of the pcolls in the given iterable.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Flatten.html

Example:

```
(ds/flatten [pcoll1 pcoll2 pcoll3])
```
source

dfrequenciesclj

(dfrequencies pcoll)
(dfrequencies options pcoll)

Returns the frequency of each unique element of the input PCollection.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Count.html

Example:

```
(ds/frequencies pcoll)
```

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
source

dgroup-byclj

(dgroup-by f pcoll)
(dgroup-by f {:keys [key-coder value-coder coder] :as options} pcoll)

Groups a PCollection by the result of calling (f item) for each item.

This produces a sequence of KV values, similar to using seq with a map. Each value will be a list of the values that match the key.

Example:

```
(ds/group-by :a foo)
(ds/group-by count foo)
```

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
source

didentityclj

(didentity c)

Identity function for use in a ParDo

source

djuxtclj

(djuxt & fns)

Creates a CombineFn that applies multiple combiners in one go and produces a vector of combined results. 'Sibling fusion' in Dataflow optimizes multiple independent combiners in the same way, but you might find juxt more concise.

Only works with functions created with combine-fn or native Clojure functions, not with native Dataflow CombineFns.

Example:

```
(ds/combine (ds/juxt + *) pcoll)
```
source

dkeyclj

(dkey elt)

Returns the key part of a KV or MapEntry.
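
Example (a quick REPL sketch using make-kv):

```
(dkey (make-kv :a 1)) ;; => :a
(dval (make-kv :a 1)) ;; => 1
```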

source

dmapclj

(dmap f pcoll)
(dmap f options pcoll)

Returns a PCollection of f applied to every item in the source PCollection. Function f should be a function of one argument.

Example:

```
(ds/map inc foo)
(ds/map (fn [x] (* x x)) foo)
```

Note: Unlike clojure.core/map, datasplash.api/map takes only one PCollection.

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :name => Adds a name to the Transform.
  • :side-inputs => Adds a map of PCollectionViews as side inputs to the underlying ParDo Transform. They can be accessed there by key in the return of side-inputs fn.
  • :side-outputs => Defines as a seq of keywords the output tags for the underlying ParDo Transform. The map fn should return a map with keys set to the same set of keywords.
  • :without-coercion-to-clj => Avoids coercing Dataflow types to Clojure, like KV. Coercion will happen by default
source

dmapcatclj

(dmapcat f pcoll)
(dmapcat f options pcoll)

Returns the result of applying concat, i.e. flattening, to the results of applying f to each item in the PCollection. Thus f should return a Clojure or Java collection.

Example:

```
(ds/mapcat (fn [x] [(dec x) x (inc x)]) foo)
```

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :name => Adds a name to the Transform.
  • :side-inputs => Adds a map of PCollectionViews as side inputs to the underlying ParDo Transform. They can be accessed there by key in the return of side-inputs fn.
  • :side-outputs => Defines as a seq of keywords the output tags for the underlying ParDo Transform. The map fn should return a map with keys set to the same set of keywords.
  • :without-coercion-to-clj => Avoids coercing Dataflow types to Clojure, like KV. Coercion will happen by default
source

dofnclj

(dofn f)
(dofn f
      {:keys [start-bundle finish-bundle without-coercion-to-clj side-inputs
              side-outputs name window-fn]
       :or {start-bundle (fn [_] nil)
            finish-bundle (fn [_] nil)
            window-fn (fn [_] nil)}
       :as opts})

Returns an instance of DoFn from the given Clojure fn.

source

dpartition-byclj

(dpartition-by f num pcoll)
(dpartition-by f num options pcoll)

Partitions the content of pcoll according to the PartitionFn. See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Partition. The partition function is given two arguments: the current element and the number of partitions.
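
Example (a sketch; `pcoll` is a hypothetical PCollection of non-negative ints, each routed to the partition index returned by the fn):

```
(dpartition-by (fn [x num-partitions] (mod x num-partitions)) 3 pcoll)
```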

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :name => Adds a name to the Transform.
source

dtrycljmacro

(dtry & body)

Just like try except it wraps the body in a safe-exec
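
Example (a sketch, assuming catch clauses pass through as with try):

```
(dtry
  (Long/parseLong "not-a-number")
  (catch NumberFormatException _ nil))
```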

source

dvalclj

(dval elt)

Returns the value part of a KV or MapEntry.

source

empty-match-treatment-enumclj

source

filename-policyclj

(filename-policy options)

Creates a filename-policy object.

Examples:

```
(ds/filename-policy {:file-name "file"
                     :prefix "gs://toto/"
                     :suffix "json"})

;; with custom functions
(require '[clj-time.format :as tf])

(def file-name "file") ;; defined here so windowed-fn below is self-contained

(defn windowed-fn
  [shard-number shard-count ^BoundedWindow window _]
  (let [timestamp (tf/unparse (:date-hour-minute tf/formatters)
                              (.start window))]
    (str file-name "-" shard-number "of" shard-count "-" timestamp ".txt")))

(ds/filename-policy {:windowed-fn windowed-fn
                     :unwindowed-fn (fn [_ _ _] "file.txt")})
```

Available options:

  • :file-name => set the default filename prefix (only used when no custom function is set)
  • :suffix => set the default filename suffix (only used when no custom function is set)
  • :unwindowed-fn => override the filename function for unwindowed PCollection
  • :windowed-fn => override the filename function for windowed PCollection
source

filename-schemaclj

source

filter-fnclj

(filter-fn f)

Returns a function that corresponds to a Clojure filter operation inside a ParDo

source

fixed-windowsclj

(fixed-windows width pcoll)
(fixed-windows width options pcoll)

Applies a fixed window to the input PCollection (useful for unbounded PCollections).

See https://cloud.google.com/dataflow/model/windowing#setting-fixed-time-windows

Example:

```
(require '[clj-time.core :as time])
(ds/fixed-windows (time/minutes 30) pcoll)
```

Available options:

  • :accumulate-mode => Accumulate mode when a Trigger is fired (accumulate or discard)
  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :name => Adds a name to the Transform.
  • :trigger => Adds a Trigger to the Window.
  • :with-allowed-lateness => Allow late data. Mandatory for custom trigger
source

frequencies-fnclj

(frequencies-fn &
                {:keys [mapper predicate]
                 :or {mapper identity predicate (constantly true)}})
source

from-ednclj

source

generate-inputclj

(generate-input coll p)
(generate-input coll options p)

Generates a PCollection from the given collection. See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Create.html

Example:

```
(ds/generate-input (range 0 1000) pipeline)
```

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
source

get-element-from-contextclj

(get-element-from-context c)

Get element from context in ParDo while applying relevant Clojure type conversions

source

get-hostnameclj

source

get-pipeline-configurationclj

(get-pipeline-configuration)

Returns a map corresponding to the bean of configuration the current pipeline was run with. Must be called inside a function wrapping a ParDo, e.g. ds/map or ds/mapcat.

source

get-pipeline-optionsclj

(get-pipeline-options pipeline)

Returns a map corresponding to the bean of options the given pipeline was built with. Must be called on a PipelineWithOptions object produced by make-pipeline.
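
Example (a sketch; `args` is a hypothetical seq of CLI arguments, as in a -main fn):

```
(let [p (make-pipeline args)]
  (get-pipeline-options p)) ;; => map of the option bean's properties
```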

source

greedy-emit-cogbkresultclj

(greedy-emit-cogbkresult raw-values size idx tag context)
source

greedy-read-cogbkresultclj

(greedy-read-cogbkresult raw-values tag)
source

group-by-keyclj

(group-by-key pcoll)
(group-by-key {:keys [key-coder value-coder coder] :as options} pcoll)

Takes a KV PCollection as input and returns a KV PCollection as output of K to list of V.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/GroupByKey.html
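
Example (a sketch; with-keys pairs each element with a key, as in the ptransform example below):

```
(->> pcoll
     (ds/with-keys :id {:name :key-by-id})
     (ds/group-by-key))
```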

source

interface->classclj

(interface->class itf)
source

job-name-templateclj

(job-name-template tpl args)
source

join-byclj

(join-by options specs)
(join-by {nam :name :as options} specs join-fn)

Takes a specification of the join between pcolls and returns a PCollection of the cartesian product (the only difference from cogroup-by) of all elements joined according to the spec. The specification is a list of triples [pcoll f options].

  • pcoll is a pcoll on which to join
  • f is a joining function, used to produce the keys on which to join. Can be nil if the coll is already made up of KVs
  • options is a map configuring each side of the join

Only one option is supported for now in join:

  • :type -> :optional or :required to select between left and right join. Defaults to :optional

Example:

```
(ds/join-by {:name :my-join-by
             :collector (fn [elt1 elt2]
                          (merge elt1 elt2))}
            [[pcoll1 :id {:type :required}]
             [pcoll2 (fn [elt] (:foreign-key elt)) {:type :optional}]])
```

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/join/CoGroupByKey; for a different approach to joins, see cogroup-by.

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :collector => A collector fn to apply after the join. The signature is like map, one element for each pcoll in the join.
  • :name => Adds a name to the Transform.
source

join-by-schemaclj

source

json-reader-schemaclj

source

json-writer-schemaclj

source

kv->cljclj

(kv->clj kv)

Coerce from KV to Clojure MapEntry

source

kv-coder-schemaclj

source

make-group-specsclj

(make-group-specs specs)
source

make-keyed-pcollection-tupleclj

(make-keyed-pcollection-tuple pcolls)
source

make-kvclj

(make-kv kv)
(make-kv k v)

Returns a KV object from the given arg(s), either [k v] or a MapEntry or seq of two elements.
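
Example:

```
(make-kv :a 1)    ;; from a key and a value
(make-kv [:a 1])  ;; from a MapEntry or two-element seq
```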

source

make-kv-coderclj

(make-kv-coder)
(make-kv-coder k-coder v-coder)

Returns an instance of a KvCoder, using nippy by default for serialization.
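
Example (a sketch):

```
(make-kv-coder)                                        ;; nippy for both keys and values
(make-kv-coder (make-nippy-coder) (make-nippy-coder))  ;; explicit key and value coders
```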

source

make-nippy-coderclj

(make-nippy-coder)

Returns an instance of a CustomCoder using nippy for serialization
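
Example (a sketch; passing it explicitly as the :coder option of a transform):

```
(ds/map inc {:coder (ds/make-nippy-coder)} pcoll)
```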

source

make-partition-mappingclj

(make-partition-mapping coll)
source

make-pipelinecljmacro

(make-pipeline & args)

Builds a Pipeline from command-line args and configuration. Also accepts a jobNameTemplate param, a string in which the following vars are interpolated:

  • %A -> Application name
  • %U -> User name
  • %T -> Timestamp

This means the template %A-%U-%T is equivalent to the default jobName.
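
Example (a minimal sketch of a main entry point):

```
(defn -main
  [& args]
  (let [p (ds/make-pipeline args)]
    (->> (ds/generate-input (range 10) p)
         (ds/map inc {:name :inc}))
    (ds/run-pipeline p)))
```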

source

make-pipeline*clj

(make-pipeline* arg)
(make-pipeline* arg1 arg2)
(make-pipeline* itf str-args kw-args)
source

map-fnclj

(map-fn f)

Returns a function that corresponds to a Clojure map operation inside a ParDo

source

map-kvclj

(map-kv f pcoll)
(map-kv f options pcoll)

Returns a KV PCollection of f applied to every item in the source PCollection. Function f should be a function of one argument and return a seq of two elements, the key and the value.

Example:

```
(ds/map-kv (fn [{:keys [month revenue]}] [month revenue]) foo)
```

Note: Unlike clojure.core/map, datasplash.api/map-kv takes only one PCollection.

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :name => Adds a name to the Transform.
  • :side-inputs => Adds a map of PCollectionViews as side inputs to the underlying ParDo Transform. They can be accessed there by key in the return of side-inputs fn.
  • :side-outputs => Defines as a seq of keywords the output tags for the underlying ParDo Transform. The map fn should return a map with keys set to the same set of keywords.
  • :without-coercion-to-clj => Avoids coercing Dataflow types to Clojure, like KV. Coercion will happen by default
source

map-kv-fnclj

(map-kv-fn f)

Returns a function that corresponds to a Clojure map operation inside a ParDo coercing to KV the return

source

map-opclj

(map-op transform {:keys [isomorph? kv?] :as base-options})
source

mapcat-fnclj

(mapcat-fn f)

Returns a function that corresponds to a Clojure mapcat operation inside a ParDo

source

max-fnclj

(max-fn &
        {:keys [mapper predicate]
         :or {mapper identity predicate (constantly true)}})
source

mean-fnclj

(mean-fn &
         {:keys [mapper predicate]
          :or {mapper identity predicate (constantly true)}})
source

min-fnclj

(min-fn &
        {:keys [mapper predicate]
         :or {mapper identity predicate (constantly true)}})
source

named-schemaclj

source

output-to-contextclj

(output-to-context context result)
(output-to-context tx context result)
source

output-value!clj

(output-value! context entity bindings)
source

pardoclj

(pardo f pcoll)
(pardo f options pcoll)

Uses a raw pardo-fn as a ppardo transform. Function f should be a function of one argument, the Pardo$Context object.
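
Example (a sketch; .element and .output are methods of the underlying Dataflow ProcessContext):

```
(pardo (fn [c] (.output c (inc (.element c))))
       {:name :inc-raw}
       pcoll)
```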

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :name => Adds a name to the Transform.
  • :side-inputs => Adds a map of PCollectionViews as side inputs to the underlying ParDo Transform. They can be accessed there by key in the return of side-inputs fn.
  • :side-outputs => Defines as a seq of keywords the output tags for the underlying ParDo Transform. The map fn should return a map with keys set to the same set of keywords.
  • :without-coercion-to-clj => Avoids coercing Dataflow types to Clojure, like KV. Coercion will happen by default
source

pardo-fnclj

(pardo-fn f)

Returns a function that uses the raw ProcessContext from ParDo

source

pardo-schemaclj

source

partition-fnclj

(partition-fn f)

Returns a Partition.PartitionFn if possible

source

pcolltuple->mapclj

(pcolltuple->map pcolltuple)
source

pt->>cljmacro

(pt->> nam input & body)

Creates and applies a single named PTransform from a sequence of transforms on a single PCollection. You can use it as you would use ->> in Clojure.

Example:

```
(ds/pt->> :transform-name input-pcollection
          (ds/map inc {:name :inc})
          (ds/filter even? {:name :even?}))
```
source

pt-cond->>cljmacro

(pt-cond->> nam input & body)

Creates and applies a single named PTransform from a sequence of transforms on a single PCollection according to the results of the given predicates. You can use it as you would use cond->> in Clojure.

Example:

```
(ds/pt-cond->> :transform-name input-pcollection
               (:do-inc? config) (ds/map inc {:name :inc})
               (:do-filter? config) (ds/filter even? {:name :even?}))
```
source

ptransformcljmacro

(ptransform nam input & body)

Generates a PTransform with the given name, apply signature and body. Should rarely be used in user code; see pt->> for the more general use case in application code.

Example (actual implementation of the group-by transform):

```
(ptransform
 :group-by
 [^PCollection pcoll]
 (->> pcoll
      (ds/with-keys f opts)
      (ds/group-by-key opts)))
```
source

read-edn-fileclj

(read-edn-file from p)
(read-edn-file from options p)

Reads a PCollection of edn strings from disk or Google Storage, with records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Read.html

Example:

```
(read-edn-file "gs://target/path" pcoll)
```

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :compression-type => Choose compression type. :auto by default. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  • :delimiter => Specify delimiter
  • :empty-match-treatment => Options for allowing or disallowing filepatterns that match no resources | One of (:allow :allow-if-wildcard :disallow)
  • :many-files => Hints that the filepattern specified matches a very large number of files.
  • :watch-new-files => Watches for new files arriving and handles them in streaming mode
source

read-json-fileclj

(read-json-file from p)
(read-json-file from {:keys [key-fn return-type] :as options} p)

Reads a PCollection of JSON strings from disk or Google Storage, with records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Read.html and https://github.com/dakrone/cheshire#decoding for details on options.

Example:

```
(read-json-file "gs://target/path" pcoll)
```

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :compression-type => Choose compression type. :auto by default. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  • :delimiter => Specify delimiter
  • :empty-match-treatment => Options for allowing or disallowing filepatterns that match no resources | One of (:allow :allow-if-wildcard :disallow)
  • :key-fn => Selects a policy for parsing map keys. If true, keywordizes the keys. If given a fn, uses it to transform each map key.
  • :many-files => Hints that the filepattern specified matches a very large number of files.
  • :return-type => Allows passing in a function to specify what kind of types to return.
  • :watch-new-files => Watches for new files arriving and handles them in streaming mode
source

read-text-fileclj

(read-text-file from p)
(read-text-file from options p)

Reads a PCollection of Strings from disk or Google Storage, with records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Read.html

Example:

```
(read-text-file "gs://target/path" pcoll)
```

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :compression-type => Choose compression type. :auto by default. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  • :delimiter => Specify delimiter
  • :empty-match-treatment => Options for allowing or disallowing filepatterns that match no resources | One of (:allow :allow-if-wildcard :disallow)
  • :many-files => Hints that the filepattern specified matches a very large number of files.
  • :watch-new-files => Watches for new files arriving and handles them in streaming mode
source

required-nsclj

source

reverse-mapclj

(reverse-map m)
source

run-pipelineclj

(run-pipeline topology)

Run the computation for a given pipeline or PCollection.

source

safe-execcljmacro

(safe-exec & body)

Executes body while trying to sanely require missing ns if the runtime is not yet properly loaded for Clojure in distributed mode. Always wrap a try block with this if you intend to eat every Exception produced.

```
(ds/map (fn [elt]
          (try
            (ds/safe-exec (dangerous-parse-fn elt))
            (catch Exception e
              (log/error e "parsing error"))))
        pcoll)
```
source

safe-exec-cfgcljmacro

(safe-exec-cfg config & body)

Like safe-exec, but takes a map as first argument containing the name of the ptransform, for better error messages

source

sampleclj

(sample size pcoll)
(sample size {:keys [scope] :as options} pcoll)

Takes samples of the elements in a PCollection, or samples of the values associated with each key in a PCollection of KVs.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Sample.html

Example:

```
(ds/sample 100 {:scope :per-key} pcoll)
```

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :scope => Scope given to the combining operation. One of (:globally :per-key).
source

scoped-ops-schemaclj

source

select-enum-option-fnclj

(select-enum-option-fn option-name enum-map action)
source

session-windowsclj

(session-windows gap pcoll)
(session-windows gap options pcoll)

Apply a Session window to divide a PCollection (useful for unbounded PCollections).

See https://cloud.google.com/dataflow/model/windowing#setting-session-windows

Example:

```
(require '[clj-time.core :as time])
(ds/session-windows (time/minutes 10) pcoll)
```

Available options:

  • :accumulate-mode => Accumulate mode when a Trigger is fired (accumulate or discard)
  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :name => Adds a name to the Transform.
  • :trigger => Adds a Trigger to the Window.
  • :with-allowed-lateness => Allow late data. Mandatory for custom trigger
source

sfnclj

(sfn f)

Returns an instance of SerializableFunction equivalent to f.

Returns an instance of SerializableFunction equivalent to f.
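
A minimal sketch: sfn is handy when passing a plain Clojure function to a raw Beam/Dataflow API that expects a SerializableFunction:
```
;; Wrap a Clojure fn; the result implements SerializableFunction,
;; so the Java SDK can serialize it and invoke it via .apply.
(def add-one (ds/sfn inc))
;; (.apply add-one 41) ;; => 42
```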
sourceraw docstring

side-inputsclj

(side-inputs)

In the context of a ParDo, returns the corresponding side inputs as a map from names to values.

Example:

    (let [input (ds/generate-input [1 2 3 4 5] p)
          side-input (ds/view (ds/generate-input [{1 :a 2 :b 3 :c 4 :d 5 :e}] p))
          proc (ds/map (fn [x] (get-in (ds/side-inputs) [:mapping x]))
                       {:side-inputs {:mapping side-input}} input)])
In the context of a ParDo, returns the corresponding side inputs as a map from names to values.

  Example:
```
    (let [input (ds/generate-input [1 2 3 4 5] p)
          side-input (ds/view (ds/generate-input [{1 :a 2 :b 3 :c 4 :d 5 :e}] p))
          proc (ds/map (fn [x] (get-in (ds/side-inputs) [:mapping x]))
                       {:side-inputs {:mapping side-input}} input)])
```
sourceraw docstring

side-outputsclj

(side-outputs & kvs)

Returns multiple outputs keyed by keyword. Example:

(let [input (ds/generate-input [1 2 3 4 5] p)
   ;; simple and multi are pcolls holding their respective elements
   {:keys [simple multi]} (ds/map (fn [x] (ds/side-outputs :simple x :multi (* x 10)))
                                  {:side-outputs [:simple :multi]} input)])
Returns multiple outputs keyed by keyword.
   Example:
   ```
(let [input (ds/generate-input [1 2 3 4 5] p)
      ;; simple and multi are pcolls holding their respective elements
      {:keys [simple multi]} (ds/map (fn [x] (ds/side-outputs :simple x :multi (* x 10)))
                                     {:side-outputs [:simple :multi]} input)])
   ```
sourceraw docstring

sliding-windowsclj

(sliding-windows width step pcoll)
(sliding-windows width step options pcoll)

Applies a sliding window to divide a PCollection (useful for unbounded PCollections).

See https://cloud.google.com/dataflow/model/windowing#setting-sliding-time-windows

Example:

(require '[clj-time.core :as time])
(ds/sliding-windows (time/minutes 30) (time/seconds 5) pcoll)

Available options:

  • :accumulate-mode => Accumulate mode when a Trigger is fired (accumulate or discard)
  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :name => Adds a name to the Transform.
  • :trigger => Adds a Trigger to the Window.
  • :with-allowed-lateness => Allows late data. Mandatory for custom triggers
Applies a sliding window to divide a PCollection (useful for unbounded PCollections).

See https://cloud.google.com/dataflow/model/windowing#setting-sliding-time-windows

Example:
```
(require '[clj-time.core :as time])
(ds/sliding-windows (time/minutes 30) (time/seconds 5) pcoll)
```

Available options:

  - :accumulate-mode => Accumulate mode when a Trigger is fired (accumulate or discard)
  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :name => Adds a name to the Transform.
  - :trigger => Adds a Trigger to the Window.
  - :with-allowed-lateness => Allows late data. Mandatory for custom triggers
sourceraw docstring

split-pathclj

(split-path p)
source

sum-fnclj

(sum-fn &
        {:keys [mapper predicate]
         :or {mapper identity predicate (constantly true)}})
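
sum-fn has no docstring; judging from the signature alone, it presumably builds a summing combine fn over (mapper elt) for elements passing predicate. A purely hypothetical sketch:
```
;; Hypothetical: sum the :price of elements whose price is positive,
;; assuming the result is usable wherever a combine fn is expected.
(def total-price
  (ds/sum-fn :mapper :price
             :predicate (comp pos? :price)))
```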
source

tapplyclj

(tapply pcoll nam tr)
source

text-reader-schemaclj

source

text-writer-schemaclj

source

to-ednclj

source

try-derefcljmacro

(try-deref at)
source

unloaded-ns-from-exclj

(unloaded-ns-from-ex e)
source

unwrap-ex-infocljmacro

(unwrap-ex-info e)
source

val->cljclj

(val->clj kv)
source

viewclj

(view pcoll)
(view {:keys [type] :or {type :singleton} :as options} pcoll)

Produces a View out of a PColl, e.g. for later consumption as a side input. See https://cloud.google.com/dataflow/java-sdk/JavaDoc/

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :default => Sets a default value for SingletonView
  • :type => Type of View | One of [:singleton :iterable :list :map :multi-map] | Defaults to :singleton
Produces a View out of a PColl, e.g. for later consumption as a side input. See https://cloud.google.com/dataflow/java-sdk/JavaDoc/

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :default => Sets a default value for SingletonView
  - :type => Type of View | One of [:singleton :iterable :list :map :multi-map] | Defaults to :singleton
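
Two sketches exercising the listed options; the values are illustrative:
```
;; Singleton view with a fallback value for an empty PColl.
(ds/view {:type :singleton :default {}} pcoll)

;; Map-typed view, e.g. for lookup-style side inputs (the input
;; is assumed to be a PColl of KVs).
(ds/view {:type :map} kv-pcoll)
```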
sourceraw docstring

view-schemaclj

source

wait-pipeline-resultclj

(wait-pipeline-result pip-res)
(wait-pipeline-result pip-res timeout)

Blocks until this PipelineResult finishes. Returns the final state.

Blocks until this PipelineResult finishes. Returns the final state.
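
A minimal sketch; the unit of the timeout argument is not documented here, so only the one-argument form is shown:
```
;; Block until the pipeline finishes, then inspect the final state.
(let [state (ds/wait-pipeline-result pip-res)]
  (println "Pipeline finished with state:" state))
```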
sourceraw docstring

window-schemaclj

source

with-keysclj

(with-keys f pcoll)
(with-keys f {:keys [key-coder value-coder coder] :as options} pcoll)

Returns a PCollection of KVs by applying f to each element of the input PCollection, using the return value as the key and the element as the value.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/WithKeys.html

Example:

(with-keys even? pcoll)

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :key-coder => Coder to be used for encoding keys in the resulting KV PColl.
  • :value-coder => Coder to be used for encoding values in the resulting KV PColl.
Returns a PCollection of KVs by applying f to each element of the input PCollection, using the return value as the key and the element as the value.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/WithKeys.html

Example:
```
(with-keys even? pcoll)
```

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :key-coder => Coder to be used for encoding keys in the resulting KV PColl.
  - :value-coder => Coder to be used for encoding values in the resulting KV PColl.
sourceraw docstring

with-optsclj

(with-opts schema opts ptransform)
source

with-opts-docstrclj

(with-opts-docstr doc-string & schemas)
source

with-timestampclj

(with-timestamp timestamp result)

Returns element(s) with the given timestamp as a Timestamp. Anything that can be coerced by clj-time can be given as input. It can be nested inside a (side-outputs) or placed outside (in which case it applies to all results). Example:

(ds/map (fn [e] (ds/with-timestamp (clj-time.core/now) (* 2 e))) pcoll)
Returns element(s) with the given timestamp as a Timestamp. Anything that can be coerced by clj-time can be given as input.
 It can be nested inside a `(side-outputs)` or placed outside (in which case it applies to all results).
 Example:
```
(ds/map (fn [e] (ds/with-timestamp (clj-time.core/now) (* 2 e))) pcoll)
```
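
The nesting mentioned above, sketched with hypothetical output names: each side output can carry its own timestamped value.
```
(ds/map (fn [e]
          (ds/side-outputs :raw e
                           :stamped (ds/with-timestamp (clj-time.core/now) e)))
        {:side-outputs [:raw :stamped]}
        pcoll)
```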
sourceraw docstring

write-edn-fileclj

(write-edn-file to pcoll)
(write-edn-file to options pcoll)

Writes a PCollection of data to disk or Google Storage, with edn records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Write.html

Example:

(write-edn-file "gs://target/path" pcoll)

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :compression-type => Choose compression type. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  • :dynamic-fn => Uses dynamic writes to change the destination file according to the content of each item
  • :file-format => Choose file format.
  • :naming-fn => Uses the given naming fn to name output files.
  • :num-shards => Selects the desired number of output shards (file fragments). 0 to let the system decide (recommended).
  • :prefix => Uses the given filename prefix.
  • :suffix => Uses the given filename suffix.
  • :temp-directory => Uses the given temp directory when a filename policy is used for output (see the filename-policy fn)
Writes a PCollection of data to disk or Google Storage, with edn records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Write.html

Example:
```
(write-edn-file "gs://target/path" pcoll)
```

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :compression-type => Choose compression type. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  - :dynamic-fn => Uses dynamic writes to change the destination file according to the content of each item
  - :file-format => Choose file format.
  - :naming-fn => Uses the given naming fn to name output files.
  - :num-shards => Selects the desired number of output shards (file fragments). 0 to let the system decide (recommended).
  - :prefix => Uses the given filename prefix.
  - :suffix => Uses the given filename suffix.
  - :temp-directory => Uses the given temp directory when a filename policy is used for output (see the filename-policy fn)
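
A sketch exercising a few of the listed options; the values are illustrative:
```
(ds/write-edn-file "gs://target/path"
                   {:num-shards 0        ;; let the runner decide
                    :compression-type :gzip
                    :suffix ".edn"}
                   pcoll)
```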
sourceraw docstring

write-json-fileclj

(write-json-file to pcoll)
(write-json-file to options pcoll)

Writes a PCollection of data to disk or Google Storage, with JSON records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Write.html and https://github.com/dakrone/cheshire#encoding for details on options

Example:

(write-json-file "gs://target/path" pcoll)

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :compression-type => Choose compression type. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  • :date-format => Pattern for encoding java.util.Date objects. Defaults to yyyy-MM-dd'T'HH:mm:ss'Z'
  • :dynamic-fn => Uses dynamic writes to change the destination file according to the content of each item
  • :escape-non-ascii => Escapes non-ASCII characters in the generated JSON.
  • :file-format => Choose file format.
  • :key-fn => Munges keys with a custom function when generating JSON.
  • :naming-fn => Uses the given naming fn to name output files.
  • :num-shards => Selects the desired number of output shards (file fragments). 0 to let the system decide (recommended).
  • :prefix => Uses the given filename prefix.
  • :suffix => Uses the given filename suffix.
  • :temp-directory => Uses the given temp directory when a filename policy is used for output (see the filename-policy fn)
Writes a PCollection of data to disk or Google Storage, with JSON records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Write.html and https://github.com/dakrone/cheshire#encoding for details on options

Example:
```
(write-json-file "gs://target/path" pcoll)
```

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :compression-type => Choose compression type. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  - :date-format => Pattern for encoding java.util.Date objects. Defaults to yyyy-MM-dd'T'HH:mm:ss'Z'
  - :dynamic-fn => Uses dynamic writes to change the destination file according to the content of each item
  - :escape-non-ascii => Escapes non-ASCII characters in the generated JSON.
  - :file-format => Choose file format.
  - :key-fn => Munges keys with a custom function when generating JSON.
  - :naming-fn => Uses the given naming fn to name output files.
  - :num-shards => Selects the desired number of output shards (file fragments). 0 to let the system decide (recommended).
  - :prefix => Uses the given filename prefix.
  - :suffix => Uses the given filename suffix.
  - :temp-directory => Uses the given temp directory when a filename policy is used for output (see the filename-policy fn)
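
A sketch with the cheshire-backed options; the key-fn shown (keyword keys to plain strings) is illustrative:
```
(ds/write-json-file "gs://target/path"
                    {:key-fn name
                     :date-format "yyyy-MM-dd"
                     :num-shards 0}
                    pcoll)
```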
sourceraw docstring

write-text-fileclj

(write-text-file to {:keys [dynamic? dynamic-fn] :as options} pcoll)

Writes a PCollection of Strings to disk or Google Storage, with records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Write.html

Example:

    (write-text-file "gs://target/path" pcoll)

Available options:

  • :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  • :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  • :compression-type => Choose compression type. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  • :dynamic-fn => Uses dynamic writes to change the destination file according to the content of each item
  • :file-format => Choose file format.
  • :naming-fn => Uses the given naming fn to name output files.
  • :num-shards => Selects the desired number of output shards (file fragments). 0 to let the system decide (recommended).
  • :prefix => Uses the given filename prefix.
  • :suffix => Uses the given filename suffix.
  • :temp-directory => Uses the given temp directory when a filename policy is used for output (see the filename-policy fn)
Writes a PCollection of Strings to disk or Google Storage, with records separated by newlines.

See https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Write.html

  Example:
```
    (write-text-file "gs://target/path" pcoll)
```

Available options:

  - :checkpoint => Given a path, will store the resulting pcoll at this path in edn to facilitate dev/debug.
  - :coder => Uses a specific Coder for the results of this transform. Usually defaults to some form of nippy-coder.
  - :compression-type => Choose compression type. | One of (:auto :bzip2 :gzip :zip :deflate :uncompressed)
  - :dynamic-fn => Uses dynamic writes to change the destination file according to the content of each item
  - :file-format => Choose file format.
  - :naming-fn => Uses the given naming fn to name output files.
  - :num-shards => Selects the desired number of output shards (file fragments). 0 to let the system decide (recommended).
  - :prefix => Uses the given filename prefix.
  - :suffix => Uses the given filename suffix.
  - :temp-directory => Uses the given temp directory when a filename policy is used for output (see the filename-policy fn)
sourceraw docstring

write-text-file-by-transformclj

(write-text-file-by-transform output-transform f mapping to options)
source
