(add-op flow fn)
Accepts a generator and a function from pipe to pipe and applies the operation to the active head pipe.
Accepts a generator and a function from pipe to pipe and applies the operation to the active head pipe.
(agg agg-fn in-fields out-fields)
Returns in instance of IAggregator that adds a reduce-side-only aggregation to its supplied pipe.
Returns in instance of IAggregator that adds a reduce-side-only aggregation to its supplied pipe.
(aggregate-mode aggregators force-reduce?)
Accepts a sequence of aggregators and a boolean force-reduce? flag and returns a keyword representing the aggregation type.
Accepts a sequence of aggregators and a boolean force-reduce? flag and returns a keyword representing the aggregation type.
(aggregator? x)
(assembly args & ops)
(buffer buffer-fn in-fields out-fields)
(buffer? x)
(bufferiter buffer-fn in-fields out-fields)
(build-join-group group-op group-name pipes group-fields decl-fields join)
(build-triplet gen join-fields)
(cascalog-join gen-seq join-fields options)
(checkpoint* flow)
Forces results of the flow so far to be written to temp file, ensuring a M/R job boundary at this point in the flow.
TODO: allow specifying a tap for checkpoint
Forces results of the flow so far to be written to temp file, ensuring a M/R job boundary at this point in the flow. TODO: allow specifying a tap for checkpoint
(co-group* flows
group-fields
&
{:keys [decl-fields aggs reducers join name] :or {join :inner}})
(constant-substitutions vars)
Returns a 2-vector of the form
[new variables, {map of newvars to values to substitute}]
Returns a 2-vector of the form [new variables, {map of newvars to values to substitute}]
(debug* flow)
Prints all tuples that pass through the StdOut.
Prints all tuples that pass through the StdOut.
(declared-fields join-fields renames infields)
Accepts a sequence of join fields and a sequence of field-seqs (each containing the join-fields, presumably) and returns a full vector of unique field names, suitable for the return value of a co-group.
Accepts a sequence of join fields and a sequence of field-seqs (each containing the join-fields, presumably) and returns a full vector of unique field names, suitable for the return value of a co-group.
(discard* flow drop-fields)
Discard the supplied fields.
Discard the supplied fields.
(each flow f from-fields to-fields)
Accepts a flow, a function from result fields => cascading Function, input fields and output fields and returns a new flow.
Accepts a flow, a function from result fields => cascading Function, input fields and output fields and returns a new flow.
(ensure-project gen-seq)
Makes sure that the declared fields are in the proper order.
Makes sure that the declared fields are in the proper order.
(fields-to-keep gen-seq)
We want to keep the out-field of Existence nodes and all available fields of the Inner and Outer nodes.
We want to keep the out-field of Existence nodes and all available fields of the Inner and Outer nodes.
(filter* flow op-var in-fields)
(filter-nullable-vars flow all-fields)
If there are any nullable variables present in the output, filter nulls out now.
If there are any nullable variables present in the output, filter nulls out now.
(generate-join-fields numfields numpipes)
(group-by* flow
group-fields
aggs
&
{:keys [reducers spill-threshold sort-fields reverse? reduce-only
name]
:or {spill-threshold 0}})
Applies a grouping operation to the supplied generator.
Applies a grouping operation to the supplied generator.
(hash-join* flows
join-fields
&
{:keys [join decl-fields name] :or {join :inner}})
Performs a map-side join of flows on join-fields. By default does an inner join, but callers can specify a join type using :join keyword argument, which can be :inner, :outer, or a Cascading Joiner implementation.
Note: full or right outer joins have odd behavior in hash joins. See Cascading documentation for details.
Performs a map-side join of flows on join-fields. By default does an inner join, but callers can specify a join type using :join keyword argument, which can be :inner, :outer, or a Cascading Joiner implementation. Note: full or right outer joins have odd behavior in hash joins. See Cascading documentation for details.
(hash-join-many flow-joins decl-fields)
Takes a sequence of [pipe, join-fields, join-type] triplets along with other hash-join arguments and performs a mixed join. Allowed join types are :inner, :outer, and :exists. The first entry must be of join type :inner.
Takes a sequence of [pipe, join-fields, join-type] triplets along with other hash-join arguments and performs a mixed join. Allowed join types are :inner, :outer, and :exists. The first entry must be of join type :inner.
(hash-join-with-tiny larger-flow fields1 tiny-flow fields2)
(aggregate-by _)
(add-aggregator _ pipe)
(add-buffer _ pipe)
(identity* flow input output)
Mirrors the supplied set of input fields into the output fields.
Mirrors the supplied set of input fields into the output fields.
(in-branch flow f)
(in-branch flow name f)
Accepts a temporary name and a function from flow => flow and performs the operation within a renamed branch.
Accepts a temporary name and a function from flow => flow and performs the operation within a renamed branch.
(insert* flow & field-v-pairs)
Accepts a flow and alternating field/value pairs and inserts these items into the flow.
Accepts a flow and alternating field/value pairs and inserts these items into the flow.
(insert-subs flow sub-m)
(join->joiner join)
Converts the supplier joiner instance or keyword to a Cascading Joiner.
Converts the supplier joiner instance or keyword to a Cascading Joiner.
(join-fields-selector num-fields)
Returns a selector that's used to go pull out groups from the join that aren't all nil.
Returns a selector that's used to go pull out groups from the join that aren't all nil.
(join-many flow-joins decl-fields options)
Takes a sequence of [pipe, join-fields, join-type] triplets along with other co-group arguments and performs a mixed join. Allowed join types are :inner, :outer, and :exists.
Takes a sequence of [pipe, join-fields, join-type] triplets along with other co-group arguments and performs a mixed join. Allowed join types are :inner, :outer, and :exists.
(join-with-larger smaller-flow
fields1
larger-flow
fields2
group-fields
aggs
&
opts)
(join-with-smaller larger-flow fields1 smaller-flow fields2 & opts)
(lazy-generator tmp-path [tuple :as l-seq])
Returns a cascalog generator on the supplied sequence of
tuples. lazy-generator
serializes each item in the lazy sequence
into a sequencefile located at the supplied temporary directory and returns
a tap for the data in that directory.
It's recommended to wrap queries that use this tap with
cascalog.cascading.io/with-fs-tmp
; for example,
(with-fs-tmp [_ tmp-dir] (let [lazy-tap (lazy-generator tmp-dir lazy-seq)] (?<- (stdout) [?field1 ?field2 ... etc] (lazy-tap ?field1 ?field2) ...)))
Returns a cascalog generator on the supplied sequence of tuples. `lazy-generator` serializes each item in the lazy sequence into a sequencefile located at the supplied temporary directory and returns a tap for the data in that directory. It's recommended to wrap queries that use this tap with `cascalog.cascading.io/with-fs-tmp`; for example, (with-fs-tmp [_ tmp-dir] (let [lazy-tap (lazy-generator tmp-dir lazy-seq)] (?<- (stdout) [?field1 ?field2 ... etc] (lazy-tap ?field1 ?field2) ...)))
(left-hash-join-with-tiny larger-flow fields1 tiny-flow fields2)
(left-join-with-larger smaller-flow
fields1
larger-flow
fields2
aggs
&
{:as opts})
(left-join-with-smaller larger-flow fields1 smaller-flow fields2 aggs & opts)
(lift-pipes flows)
(logically gen in-fields out-fields f)
Accepts a flow, input fields, output fields and a function that accepts the same things and allows for the following features:
Any variables not prefixed with !, !! or ? are treated as constants in the flow. This allows for (map* flow + 10 ["?a"] ["?b"]) to work properly and clean up its fields without hassle.
Any non-nullable output variables (prefixed with ?) are removed from the flow.
Duplicate input fields are allowed. It is currently NOT allowed to output one of the input variables. In Cascalog, this triggers an implicit filter; this needs to be supplied at another layer.
Accepts a flow, input fields, output fields and a function that accepts the same things and allows for the following features: Any variables not prefixed with !, !! or ? are treated as constants in the flow. This allows for (map* flow + 10 ["?a"] ["?b"]) to work properly and clean up its fields without hassle. Any non-nullable output variables (prefixed with ?) are removed from the flow. Duplicate input fields are allowed. It is currently NOT allowed to output one of the input variables. In Cascalog, this triggers an implicit filter; this needs to be supplied at another layer.
(map* flow op-var in-fields out-fields)
(mapcat* flow op-var in-fields out-fields)
(multigroup pairs declared-group-vars op out-fields)
Take a sequence of pairs of [pipe, join-fields]
Take a sequence of pairs of [pipe, join-fields]
(name-flow gen name)
Assigns a new name to the clojure flow.
Assigns a new name to the clojure flow.
(new-pipe-name joined-seq)
(no-overlap? large small)
(parallel-agg agg-fn in-fields out-fields & {:keys [init-var present-var]})
Creates a parallel aggregation operation.
Creates a parallel aggregation operation.
(parallel-agg? x)
(rename* flow new-fields)
(rename* flow old-fields new-fields)
rename old-fields to new-fields.
rename old-fields to new-fields.
(rename-pipe gen)
(rename-pipe gen name)
(replace-join-fields join-fields join-renames fields)
(sample* flow percent)
(sample* flow percent seed)
Sample some percentage of elements within this pipe. percent should be between 0.00 (0%) and 1.00 (100%) you can provide a seed to get reproducible results.
Sample some percentage of elements within this pipe. percent should be between 0.00 (0%) and 1.00 (100%) you can provide a seed to get reproducible results.
(select* flow keep-fields)
Remove all but the supplied fields from the given flow.
Remove all but the supplied fields from the given flow.
(set-reducers pipe reducers)
Set the number of reducers for this step in the pipe.
Set the number of reducers for this step in the pipe.
(substitute-if pred subfn aseq)
Returns [newseq {map of newvals to oldvals}]
Returns [newseq {map of newvals to oldvals}]
(trap* flow trap)
Applies a trap to the current branch of the supplied flow.
Applies a trap to the current branch of the supplied flow.
(union* & flows)
Merges the supplied flows and ensures uniqueness of the resulting tuples.
Merges the supplied flows and ensures uniqueness of the resulting tuples.
(unique flow)
(unique flow unique-fields & options)
Performs a unique on the input pipe by the supplied fields.
Performs a unique on the input pipe by the supplied fields.
(unique-aggregator)
(with-constants gen in-fields f)
Allows constant substitution on inputs.
Allows constant substitution on inputs.
(with-duplicate-inputs flow from-fields f)
Accepts a flow, some fields, and a function from (flow, unique-fields, new-fields) => flow and appropriately handles duplicate entries inside of the fields.
The fields passed to the supplied function will be guaranteed unique. New fields are passed as a third option to the supplying function, which may decide to call (discard* delta) if the fields are still around.
Accepts a flow, some fields, and a function from (flow, unique-fields, new-fields) => flow and appropriately handles duplicate entries inside of the fields. The fields passed to the supplied function will be guaranteed unique. New fields are passed as a third option to the supplying function, which may decide to call (discard* delta) if the fields are still around.
(with-trap* flow trap f)
Applies a trap to everything that occurs within the supplied function of flow => flow.
Applies a trap to everything that occurs within the supplied function of flow => flow.
(write* flow sink)
cljdoc is a website building & hosting documentation for Clojure/Script libraries
× close