Liking cljdoc? Tell your friends :D

cascalog.cascading.operations


add-opclj

(add-op flow fn)

Accepts a generator and a function from pipe to pipe and applies the operation to the active head pipe.

Accepts a generator and a function from pipe to pipe and applies
the operation to the active head pipe.
raw docstring

aggclj

(agg agg-fn in-fields out-fields)

Returns in instance of IAggregator that adds a reduce-side-only aggregation to its supplied pipe.

Returns in instance of IAggregator that adds a reduce-side-only
aggregation to its supplied pipe.
raw docstring

aggregate-modeclj

(aggregate-mode aggregators force-reduce?)

Accepts a sequence of aggregators and a boolean force-reduce? flag and returns a keyword representing the aggregation type.

Accepts a sequence of aggregators and a boolean force-reduce? flag
and returns a keyword representing the aggregation type.
raw docstring

aggregator?clj

(aggregator? x)

assemblycljmacro

(assembly args & ops)

bufferclj

(buffer buffer-fn in-fields out-fields)

buffer?clj

(buffer? x)

bufferiterclj

(bufferiter buffer-fn in-fields out-fields)

build-join-groupcljmacro

(build-join-group group-op group-name pipes group-fields decl-fields join)

build-tripletclj

(build-triplet gen join-fields)

cascalog-joinclj

(cascalog-join gen-seq join-fields options)

checkpoint*clj

(checkpoint* flow)

Forces results of the flow so far to be written to temp file, ensuring a M/R job boundary at this point in the flow.

TODO: allow specifying a tap for checkpoint

Forces results of the flow so far to be written to temp file, ensuring
 a M/R job boundary at this point in the flow.

TODO: allow specifying a tap for checkpoint
raw docstring

co-group*clj

(co-group* flows
           group-fields
           &
           {:keys [decl-fields aggs reducers join name] :or {join :inner}})

constant-substitutionsclj

(constant-substitutions vars)

Returns a 2-vector of the form

[new variables, {map of newvars to values to substitute}]

Returns a 2-vector of the form

[new variables, {map of newvars to values to substitute}]
raw docstring

debug*clj

(debug* flow)

Prints all tuples that pass through the StdOut.

Prints all tuples that pass through the StdOut.
raw docstring

declared-fieldsclj

(declared-fields join-fields renames infields)

Accepts a sequence of join fields and a sequence of field-seqs (each containing the join-fields, presumably) and returns a full vector of unique field names, suitable for the return value of a co-group.

Accepts a sequence of join fields and a sequence of
field-seqs (each containing the join-fields, presumably) and returns
a full vector of unique field names, suitable for the return value
of a co-group.
raw docstring

defopcljmacro

(defop f-name & tail)

Defines a flow operation.

Defines a flow operation.
raw docstring

discard*clj

(discard* flow drop-fields)

Discard the supplied fields.

Discard the supplied fields.
raw docstring

eachclj

(each flow f from-fields to-fields)

Accepts a flow, a function from result fields => cascading Function, input fields and output fields and returns a new flow.

Accepts a flow, a function from result fields => cascading
Function, input fields and output fields and returns a new flow.
raw docstring

ensure-projectclj

(ensure-project gen-seq)

Makes sure that the declared fields are in the proper order.

Makes sure that the declared fields are in the proper order.
raw docstring

fields-to-keepclj

(fields-to-keep gen-seq)

We want to keep the out-field of Existence nodes and all available fields of the Inner and Outer nodes.

We want to keep the out-field of Existence nodes and all available
fields of the Inner and Outer nodes.
raw docstring

filter*clj

(filter* flow op-var in-fields)

filter-nullable-varsclj

(filter-nullable-vars flow all-fields)

If there are any nullable variables present in the output, filter nulls out now.

If there are any nullable variables present in the output, filter
nulls out now.
raw docstring

generate-join-fieldsclj

(generate-join-fields numfields numpipes)

group-by*clj

(group-by* flow
           group-fields
           aggs
           &
           {:keys [reducers spill-threshold sort-fields reverse? reduce-only
                   name]
            :or {spill-threshold 0}})

Applies a grouping operation to the supplied generator.

Applies a grouping operation to the supplied generator.
raw docstring

hash-join*clj

(hash-join* flows
            join-fields
            &
            {:keys [join decl-fields name] :or {join :inner}})

Performs a map-side join of flows on join-fields. By default does an inner join, but callers can specify a join type using :join keyword argument, which can be :inner, :outer, or a Cascading Joiner implementation.

Note: full or right outer joins have odd behavior in hash joins. See Cascading documentation for details.

Performs a map-side join of flows on join-fields. By default
does an inner join, but callers can specify a join type using
:join keyword argument, which can be :inner, :outer, or a
Cascading Joiner implementation.

Note: full or right outer joins have odd behavior in hash joins.
      See Cascading documentation for details.
raw docstring

hash-join-manyclj

(hash-join-many flow-joins decl-fields)

Takes a sequence of [pipe, join-fields, join-type] triplets along with other hash-join arguments and performs a mixed join. Allowed join types are :inner, :outer, and :exists. The first entry must be of join type :inner.

Takes a sequence of [pipe, join-fields, join-type] triplets along
with other hash-join arguments and performs a mixed join. Allowed
join types are :inner, :outer, and :exists. The first entry must
be of join type :inner.
raw docstring

hash-join-with-tinyclj

(hash-join-with-tiny larger-flow fields1 tiny-flow fields2)

IAggregateBycljprotocol

aggregate-byclj

(aggregate-by _)

IAggregatorcljprotocol

add-aggregatorclj

(add-aggregator _ pipe)

IBuffercljprotocol

add-bufferclj

(add-buffer _ pipe)

identity*clj

(identity* flow input output)

Mirrors the supplied set of input fields into the output fields.

Mirrors the supplied set of input fields into the output fields.
raw docstring

in-branchclj

(in-branch flow f)
(in-branch flow name f)

Accepts a temporary name and a function from flow => flow and performs the operation within a renamed branch.

Accepts a temporary name and a function from flow => flow and
performs the operation within a renamed branch.
raw docstring

insert*clj

(insert* flow & field-v-pairs)

Accepts a flow and alternating field/value pairs and inserts these items into the flow.

Accepts a flow and alternating field/value pairs and inserts these
items into the flow.
raw docstring

insert-subsclj

(insert-subs flow sub-m)

join->joinerclj

(join->joiner join)

Converts the supplier joiner instance or keyword to a Cascading Joiner.

Converts the supplier joiner instance or keyword to a Cascading
Joiner.
raw docstring

join-fields-selectorclj

(join-fields-selector num-fields)

Returns a selector that's used to go pull out groups from the join that aren't all nil.

Returns a selector that's used to go pull out groups from the join
that aren't all nil.
raw docstring

join-manyclj

(join-many flow-joins decl-fields options)

Takes a sequence of [pipe, join-fields, join-type] triplets along with other co-group arguments and performs a mixed join. Allowed join types are :inner, :outer, and :exists.

Takes a sequence of [pipe, join-fields, join-type] triplets along
with other co-group arguments and performs a mixed join. Allowed
join types are :inner, :outer, and :exists.
raw docstring

join-with-largerclj

(join-with-larger smaller-flow
                  fields1
                  larger-flow
                  fields2
                  group-fields
                  aggs
                  &
                  opts)

join-with-smallerclj

(join-with-smaller larger-flow fields1 smaller-flow fields2 & opts)

lazy-generatorclj

(lazy-generator tmp-path [tuple :as l-seq])

Returns a cascalog generator on the supplied sequence of tuples. lazy-generator serializes each item in the lazy sequence into a sequencefile located at the supplied temporary directory and returns a tap for the data in that directory.

It's recommended to wrap queries that use this tap with cascalog.cascading.io/with-fs-tmp; for example,

(with-fs-tmp [_ tmp-dir] (let [lazy-tap (lazy-generator tmp-dir lazy-seq)] (?<- (stdout) [?field1 ?field2 ... etc] (lazy-tap ?field1 ?field2) ...)))

Returns a cascalog generator on the supplied sequence of
tuples. `lazy-generator` serializes each item in the lazy sequence
into a sequencefile located at the supplied temporary directory and returns
a tap for the data in that directory.

It's recommended to wrap queries that use this tap with
`cascalog.cascading.io/with-fs-tmp`; for example,

  (with-fs-tmp [_ tmp-dir]
    (let [lazy-tap (lazy-generator tmp-dir lazy-seq)]
      (?<- (stdout)
           [?field1 ?field2 ... etc]
           (lazy-tap ?field1 ?field2)
           ...)))
raw docstring

left-hash-join-with-tinyclj

(left-hash-join-with-tiny larger-flow fields1 tiny-flow fields2)

left-join-with-largerclj

(left-join-with-larger smaller-flow
                       fields1
                       larger-flow
                       fields2
                       aggs
                       &
                       {:as opts})

left-join-with-smallerclj

(left-join-with-smaller larger-flow fields1 smaller-flow fields2 aggs & opts)

lift-pipesclj

(lift-pipes flows)

logicallyclj

(logically gen in-fields out-fields f)

Accepts a flow, input fields, output fields and a function that accepts the same things and allows for the following features:

Any variables not prefixed with !, !! or ? are treated as constants in the flow. This allows for (map* flow + 10 ["?a"] ["?b"]) to work properly and clean up its fields without hassle.

Any non-nullable output variables (prefixed with ?) are removed from the flow.

Duplicate input fields are allowed. It is currently NOT allowed to output one of the input variables. In Cascalog, this triggers an implicit filter; this needs to be supplied at another layer.

Accepts a flow, input fields, output fields and a function that
accepts the same things and allows for the following features:

Any variables not prefixed with !, !! or ? are treated as constants
in the flow. This allows for (map* flow + 10 ["?a"] ["?b"]) to
work properly and clean up its fields without hassle.

Any non-nullable output variables (prefixed with ?) are removed from
the flow.

Duplicate input fields are allowed. It is currently NOT allowed to
output one of the input variables. In Cascalog, this triggers an
implicit filter; this needs to be supplied at another layer.
raw docstring

map*clj

(map* flow op-var in-fields out-fields)

mapcat*clj

(mapcat* flow op-var in-fields out-fields)

multigroupclj

(multigroup pairs declared-group-vars op out-fields)

Take a sequence of pairs of [pipe, join-fields]

Take a sequence of pairs of [pipe, join-fields]
raw docstring

name-flowclj

(name-flow gen name)

Assigns a new name to the clojure flow.

Assigns a new name to the clojure flow.
raw docstring

new-pipe-nameclj

(new-pipe-name joined-seq)

no-overlap?clj

(no-overlap? large small)

parallel-aggclj

(parallel-agg agg-fn in-fields out-fields & {:keys [init-var present-var]})

Creates a parallel aggregation operation.

Creates a parallel aggregation operation.
raw docstring

parallel-agg?clj

(parallel-agg? x)

REDUCER-KEYclj


rename*clj

(rename* flow new-fields)
(rename* flow old-fields new-fields)

rename old-fields to new-fields.

rename old-fields to new-fields.
raw docstring

rename-pipeclj

(rename-pipe gen)
(rename-pipe gen name)

replace-join-fieldsclj

(replace-join-fields join-fields join-renames fields)

sample*clj

(sample* flow percent)
(sample* flow percent seed)

Sample some percentage of elements within this pipe. percent should be between 0.00 (0%) and 1.00 (100%) you can provide a seed to get reproducible results.

Sample some percentage of elements within this pipe. percent should
be between 0.00 (0%) and 1.00 (100%) you can provide a seed to get
reproducible results.
raw docstring

select*clj

(select* flow keep-fields)

Remove all but the supplied fields from the given flow.

Remove all but the supplied fields from the given flow.
raw docstring

set-reducersclj

(set-reducers pipe reducers)

Set the number of reducers for this step in the pipe.

Set the number of reducers for this step in the pipe.
raw docstring

substitute-ifclj

(substitute-if pred subfn aseq)

Returns [newseq {map of newvals to oldvals}]

Returns [newseq {map of newvals to oldvals}]
raw docstring

trap*clj

(trap* flow trap)

Applies a trap to the current branch of the supplied flow.

Applies a trap to the current branch of the supplied flow.
raw docstring

union*clj

(union* & flows)

Merges the supplied flows and ensures uniqueness of the resulting tuples.

Merges the supplied flows and ensures uniqueness of the resulting
tuples.
raw docstring

uniqueclj

(unique flow)
(unique flow unique-fields & options)

Performs a unique on the input pipe by the supplied fields.

Performs a unique on the input pipe by the supplied fields.
raw docstring

unique-aggregatorclj

(unique-aggregator)

with-constantsclj

(with-constants gen in-fields f)

Allows constant substitution on inputs.

Allows constant substitution on inputs.
raw docstring

with-duplicate-inputsclj

(with-duplicate-inputs flow from-fields f)

Accepts a flow, some fields, and a function from (flow, unique-fields, new-fields) => flow and appropriately handles duplicate entries inside of the fields.

The fields passed to the supplied function will be guaranteed unique. New fields are passed as a third option to the supplying function, which may decide to call (discard* delta) if the fields are still around.

Accepts a flow, some fields, and a function from (flow,
unique-fields, new-fields) => flow and appropriately handles
duplicate entries inside of the fields.

The fields passed to the supplied function will be guaranteed
unique. New fields are passed as a third option to the supplying
function, which may decide to call (discard* delta) if the fields
are still around.
raw docstring

with-trap*clj

(with-trap* flow trap f)

Applies a trap to everything that occurs within the supplied function of flow => flow.

Applies a trap to everything that occurs within the supplied
function of flow => flow.
raw docstring

write*clj

(write* flow sink)

cljdoc is a website building & hosting documentation for Clojure/Script libraries

× close