(aggregate-columns ds-or-seq colname agg-map & [options])
(count-distinct colname)
(count-distinct colname op-space)
(distinct colname)
(distinct colname finalizer)
Create a reducer that returns the set of distinct values in the column.
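The one-line description leaves the reducer's lifecycle implicit. Below is a plain-Clojure sketch of what a distinct reduction computes over a sequence of datasets. It does not use the library: datasets are modeled as maps of column name to vector, and `distinct-reducer` and `reduce-datasets` are hypothetical names. Each dataset yields a partial set, the partial sets are merged with a union, and the optional finalizer runs once on the merged result.

```clojure
(ns example.distinct-sketch
  (:require [clojure.set :as set]))

;; Hypothetical model of a reducer as a plain map of functions; this is NOT the
;; tech.v3.datatype.IndexReduction interface, only an illustration of the flow.
(defn distinct-reducer
  "Sketch of a reducer collecting the distinct values of `colname`.
  `finalizer` (optional) post-processes the merged set."
  ([colname] (distinct-reducer colname identity))
  ([colname finalizer]
   {:reduce-dataset (fn [ds] (set (get ds colname))) ; partial result per dataset
    :merge-fn       set/union                         ; combine partial results
    :finalize       finalizer}))                      ; runs once at the end

(defn reduce-datasets
  "Run a sketch reducer over `ds-seq`, a sequence of maps of column -> vector."
  [{:keys [reduce-dataset merge-fn finalize]} ds-seq]
  (finalize (reduce merge-fn #{} (map reduce-dataset ds-seq))))

(def ds1 {:sym ["MSFT" "IBM" "MSFT"]})
(def ds2 {:sym ["AAPL" "IBM"]})

(reduce-datasets (distinct-reducer :sym) [ds1 ds2])       ;; => set of "MSFT", "IBM", "AAPL"
(reduce-datasets (distinct-reducer :sym count) [ds1 ds2]) ;; => 3
```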
(distinct-int32 colname)
(distinct-int32 colname finalizer)
Get the set of distinct items, given that you know every value fits in int32 space. The optional finalizer allows you to post-process the result.
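As a rough illustration of the contract (values must fit in int32, optional finalizer applied at the end), here is a plain-Clojure sketch. It is hypothetical and does not reproduce the library's internal set representation: it collects values into an ordinary sorted set and relies on `clojure.core/int`, which throws when a long is outside the int32 range. Datasets are modeled as maps of column name to vector.

```clojure
(ns example.distinct-int32-sketch)

(defn distinct-int32-sketch
  "Hypothetical sketch: collect the distinct values of `colname` across
  `ds-seq` (maps of column -> vector), requiring each value to fit in int32.
  `finalizer` (optional) post-processes the resulting sorted set."
  ([colname ds-seq] (distinct-int32-sketch colname identity ds-seq))
  ([colname finalizer ds-seq]
   (finalizer
    (into (sorted-set)
          (comp (mapcat #(get % colname))
                ;; clojure.core/int throws IllegalArgumentException when the
                ;; value is outside the int32 range.
                (map int))
          ds-seq))))

(distinct-int32-sketch :id [{:id [3 1 3]} {:id [2 1]}])       ;; => #{1 2 3}
(distinct-int32-sketch :id count [{:id [3 1 3]} {:id [2 1]}]) ;; => 3
```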
(first-value colname)
(group-by-column-agg colname agg-map ds-seq)
(group-by-column-agg colname agg-map options ds-seq)
Group a sequence of datasets by a column and aggregate down into a new dataset.

* colname - Either a single scalar column name or a vector of column names to group by.
* agg-map - Map of result column name to reducer. All values in the agg map must be instances of `tech.v3.datatype.IndexReduction`. Column datatypes are inferred from the finalized result of the first reduction, with nil indicating an object column.

Options:

* `:map-initial-capacity` - Initial hash-map capacity. Resizing hash maps is expensive, so set this to something reasonable for your data. Defaults to 100000.
* `:index-filter` - A function that, given a dataset, produces a function from long index to boolean. Only indexes for which that function returns true are added to the aggregation. For very large datasets, this is somewhat faster than filtering before the aggregation.

Example:

```clojure
user> (require '[tech.v3.dataset :as ds])
nil
user> (require '[tech.v3.dataset.reductions :as ds-reduce])
nil
user> (def stocks (ds/->dataset "test/data/stocks.csv" {:key-fn keyword}))
#'user/stocks
user> (ds-reduce/group-by-column-agg
       :symbol
       {:symbol (ds-reduce/first-value :symbol)
        :price-avg (ds-reduce/mean :price)
        :price-sum (ds-reduce/sum :price)}
       [stocks stocks stocks])
:symbol-aggregation [5 3]:

| :symbol |   :price-avg | :price-sum |
|---------|--------------|------------|
|    MSFT |  24.73674797 |    9127.86 |
|     IBM |  91.26121951 |   33675.39 |
|    AAPL |  64.73048780 |   23885.55 |
|    GOOG | 415.87044118 |   84837.57 |
|    AMZN |  47.98707317 |   17707.23 |

tech.v3.dataset.reductions-test> (def tstds
                                   (ds/->dataset {:a ["a" "a" "a" "b" "b" "b" "c" "d" "e"]
                                                  :b [22 21 22 44 42 44 77 88 99]}))
#'tech.v3.dataset.reductions-test/tstds
tech.v3.dataset.reductions-test> (ds-reduce/group-by-column-agg
                                  [:a :b] {:a (ds-reduce/first-value :a)
                                           :b (ds-reduce/first-value :b)
                                           :c (ds-reduce/row-count)}
                                  [tstds tstds tstds])
:tech.v3.dataset.reductions/_temp_col-aggregation [7 3]:

| :a | :b | :c |
|----|---:|---:|
|  a | 21 |  3 |
|  a | 22 |  6 |
|  b | 42 |  3 |
|  b | 44 |  6 |
|  c | 77 |  3 |
|  d | 88 |  3 |
|  e | 99 |  3 |
```
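To make the grouping semantics concrete without the library, the following is a hypothetical plain-Clojure model of the same computation. Datasets are maps of column name to vector, each "reducer" is a map of `:init`, `:step` and `:finalize` functions, and every name here (`group-by-agg-sketch`, `first-value-sketch`, `row-count-sketch`) is invented for illustration; it is not how `tech.v3.dataset` implements `group-by-column-agg`. It supports a single column or a vector of columns as the group key and an optional `:index-filter` with the shape described above. Run on the `tstds` data from the docstring, it produces the same `:a`, `:b` and `:c` values as the seven-row table.

```clojure
(ns example.group-by-sketch)

;; Hypothetical model: a "dataset" is a map of column name -> vector and a
;; "reducer" is a map of :init, :step and :finalize. Not library code.
(defn first-value-sketch [colname]
  {:init     ::none
   ;; keep the first value seen for the group, ignore the rest
   :step     (fn [acc ds idx] (if (= acc ::none) (get-in ds [colname idx]) acc))
   :finalize identity})

(defn row-count-sketch []
  {:init     0
   :step     (fn [acc _ds _idx] (inc acc))
   :finalize identity})

(defn group-by-agg-sketch
  "Group the rows of every dataset in `ds-seq` by `colname` (a keyword or a
  vector of keywords) and fold each group through the reducers in `agg-map`.
  `:index-filter`, if given, maps a dataset to a predicate on row indexes."
  ([colname agg-map ds-seq] (group-by-agg-sketch colname agg-map {} ds-seq))
  ([colname agg-map {:keys [index-filter]} ds-seq]
   (let [cols   (if (vector? colname) colname [colname])
         init   (into {} (map (fn [[k r]] [k (:init r)])) agg-map)
         ;; advance every reducer's accumulator for one row index
         step   (fn [accs ds idx]
                  (reduce-kv (fn [m k r] (assoc m k ((:step r) (get m k) ds idx)))
                             (or accs init)
                             agg-map))
         add-ds (fn [groups ds]
                  (let [keep? (if index-filter (index-filter ds) (constantly true))
                        n     (count (get ds (first cols)))]
                    (reduce (fn [groups idx]
                              (if (keep? idx)
                                (update groups (mapv #(get-in ds [% idx]) cols)
                                        step ds idx)
                                groups))
                            groups
                            (range n))))
         groups (reduce add-ds {} ds-seq)]
     ;; Sorted only to make this sketch's output deterministic.
     (->> (sort-by key groups)
          (mapv (fn [[_ accs]]
                  (reduce-kv (fn [m k r] (assoc m k ((:finalize r) (get accs k))))
                             {}
                             agg-map)))))))

(def tstds {:a ["a" "a" "a" "b" "b" "b" "c" "d" "e"]
            :b [22 21 22 44 42 44 77 88 99]})

(group-by-agg-sketch [:a :b]
                     {:a (first-value-sketch :a)
                      :b (first-value-sketch :b)
                      :c (row-count-sketch)}
                     [tstds tstds tstds])
;; => [{:a "a" :b 21 :c 3} {:a "a" :b 22 :c 6} {:a "b" :b 42 :c 3}
;;     {:a "b" :b 44 :c 6} {:a "c" :b 77 :c 3} {:a "d" :b 88 :c 3}
;;     {:a "e" :b 99 :c 3}]
```

Passing `{:index-filter (fn [ds] (fn [idx] (not= "a" (get-in ds [:a idx]))))}` as the options map drops the two `"a"` groups, leaving five rows.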
(mean colname)
Create a double consumer that produces the mean of the column.
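When a mean is aggregated over several datasets, as `group-by-column-agg` does, the partial state has to carry both a sum and a count; averaging per-dataset averages gives the wrong answer when datasets differ in size. The following hypothetical plain-Clojure sketch shows that idea only; it is not the library's implementation, and returning `NaN` for an empty input is a choice made for the sketch.

```clojure
(ns example.mean-sketch)

;; Hypothetical sketch: keep a running [sum count] pair rather than a running
;; average, so partial results from several datasets combine exactly.
(defn mean-partial
  "Partial state for one dataset's column values: [sum count]."
  [values]
  [(reduce + 0.0 (map double values)) (count values)])

(defn merge-mean [[s1 n1] [s2 n2]]
  [(+ s1 s2) (+ n1 n2)])

(defn finalize-mean [[s n]]
  ;; NaN for an empty input is this sketch's choice
  (if (zero? n) Double/NaN (/ s n)))

(defn mean-sketch [colname ds-seq]
  (->> ds-seq
       (map #(mean-partial (get % colname)))
       (reduce merge-mean [0.0 0])
       finalize-mean))

(mean-sketch :price [{:price [1 2 3]} {:price [10]}]) ;; => 4.0
```

Averaging the two per-dataset means instead would give (2.0 + 10.0) / 2 = 6.0, which is why the count travels with the sum.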
(row-count)
Create a simple reducer that returns the number of times `reduceIndex` was called, that is, the number of row indexes reduced into the result.
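Because the count is driven by the row indexes the reducer is handed, rows excluded before reduction never contribute. The plain-Clojure sketch below illustrates that relationship under stated assumptions: datasets are maps of column name to vector, the row count of a dataset is the length of its first column, and `row-count-sketch` with its `index-filter` argument is a hypothetical stand-in, not library code.

```clojure
(ns example.row-count-sketch)

;; Hypothetical sketch (not the library's code): add one for every row index
;; handed to the reducer, so indexes rejected by the filter never count.
(defn row-count-sketch
  "Count the row indexes reduced across `ds-seq` (maps of column -> vector).
  `index-filter` maps a dataset to a predicate on row indexes."
  ([ds-seq] (row-count-sketch (fn [_ds] (constantly true)) ds-seq))
  ([index-filter ds-seq]
   (reduce (fn [total ds]
             (let [keep?  (index-filter ds)
                   ;; assume every column has the same length
                   n-rows (if (seq ds) (count (val (first ds))) 0)]
               ;; one increment per row index that passes the filter
               (+ total (count (filter keep? (range n-rows))))))
           0
           ds-seq)))

(row-count-sketch [{:a [1 2 3]} {:a [4 5]}])                  ;; => 5
(row-count-sketch (fn [_ds] even?) [{:a [1 2 3]} {:a [4 5]}]) ;; => 3
```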
(sum colname)
Create a double consumer that sums the values of the column.
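The term "double consumer" suggests that column values are read as doubles, so integer columns still produce a double sum (compare the `:price-sum` column in the `group-by-column-agg` example). The sketch below assumes exactly that and nothing more; `sum-sketch` is a hypothetical plain-Clojure function over datasets modeled as maps of column name to vector.

```clojure
(ns example.sum-sketch)

;; Hypothetical sketch: coerce every value to double before adding, which is
;; what we assume "double consumer" implies. Not library code.
(defn sum-sketch [colname ds-seq]
  (transduce (comp (mapcat #(get % colname)) ; all values of the column, in order
                   (map double))             ; consume each value as a double
             +
             0.0
             ds-seq))

(sum-sketch :price [{:price [1 2]} {:price [0.5]}]) ;; => 3.5
```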