(aggregate-columns ds-or-seq colname agg-map & [options])

(count-distinct colname)
(count-distinct colname op-space)

(distinct colname)
(distinct colname finalizer)

Create a reducer that will return a set of values.
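As a rough illustration of what a set-returning reducer computes, the sketch below models datasets as plain Clojure maps from column name to vector and folds one column across several datasets into a set. The names `distinct-values` and `count-distinct-values` are hypothetical. Treating a distinct count as "distinct, then a `count` finalizer" is an assumption made for this model, not a description of how the library implements `count-distinct`.

```clojure
;; Hypothetical model only: a "dataset" here is a map of column name -> vector.
(defn distinct-values
  "Set of the values of `colname` across every dataset in `ds-seq`,
  passed through the optional `finalizer`."
  ([colname ds-seq] (distinct-values colname identity ds-seq))
  ([colname finalizer ds-seq]
   (finalizer
    (reduce (fn [acc ds] (into acc (get ds colname)))
            #{}
            ds-seq))))

;; In this model, counting distinct values is the same reduction finalized with `count`.
(defn count-distinct-values [colname ds-seq]
  (distinct-values colname count ds-seq))

(def ds1 {:symbol ["MSFT" "IBM" "MSFT"]})
(def ds2 {:symbol ["AAPL" "IBM"]})

(count-distinct-values :symbol [ds1 ds2])                 ;; => 3
(distinct-values :symbol (comp vec sort) [ds1 ds2])       ;; => ["AAPL" "IBM" "MSFT"]
```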
(distinct-int32 colname)
(distinct-int32 colname finalizer)

Get the set of distinct items when you know every value fits within the int32 range. The optional finalizer allows you to post-process the result.
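One reason a known int32 value space can help is that distinct integers can then be recorded as bits in a bitmap instead of as boxed entries in a hash set. The sketch below assumes that idea and uses `java.util.BitSet` purely for illustration; whether `distinct-int32` uses a bitmap internally, and what type its finalizer receives, is not stated here. `distinct-int32-values` and `bitset->vec` are hypothetical names, and because `BitSet` rejects negative indexes the sketch also assumes non-negative values.

```clojure
(import 'java.util.BitSet)

;; Hypothetical model of an int32 distinct reduction: each value sets one bit,
;; so repeated values cost nothing extra. BitSet throws on negative indexes,
;; so this sketch assumes non-negative int values.
(defn distinct-int32-values
  ([values] (distinct-int32-values values identity))
  ([values finalizer]
   (let [bits (BitSet.)]
     (doseq [v values]
       (.set bits (int v)))
     ;; The finalizer post-processes the accumulated bitmap.
     (finalizer bits))))

(defn bitset->vec
  "Example finalizer: the set bits as an ascending vector."
  [^BitSet bits]
  (vec (.toArray (.stream bits))))

(distinct-int32-values [3 1 3 7] bitset->vec)               ;; => [1 3 7]
(distinct-int32-values [3 1 3 7] #(.cardinality ^BitSet %))  ;; => 3
```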
(first-value colname)

(group-by-column-agg colname agg-map ds-seq)
(group-by-column-agg colname agg-map options ds-seq)

Group a sequence of datasets by one or more columns and aggregate the groups into a new dataset.
* `colname` - Either a single scalar column name or a vector of column names to group by.
* `agg-map` - Map from result column name to reducer. All values in the agg map must be
instances of `tech.v3.datatype.IndexReduction`. The datatype of each result column is inferred from
the finalized result of the first reduction, with nil indicating an object column.
Options:
* `:map-initial-capacity` - Initial hash-map capacity. Resizing hash maps is expensive, so set this
to a reasonable estimate of the number of groups. Defaults to 100000.
* `:index-filter` - A function that, given a dataset, returns a predicate from a long row index
to boolean. Only rows whose index satisfies the predicate are added to the
aggregation. For very large datasets this is a bit faster than filtering the datasets before
aggregating.
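The grouping and `:index-filter` behavior described above can be modeled in plain Clojure to make the data flow concrete. This is a sketch under stated assumptions, not the library's implementation: a "dataset" is a map of column name to vector, and a reducer is a hypothetical map of `:init`, `:step`, and `:finalize` functions standing in for `IndexReduction`. The names `group-by-agg`, `row-count-reducer`, and `sum-reducer` are illustrative only. The argument order mirrors `(group-by-column-agg colname agg-map options ds-seq)`, and the filter is invoked once per dataset to produce a per-index predicate, as the option describes.

```clojure
;; Hypothetical plain-Clojure model of grouped aggregation; not library code.
(defn row-count-reducer []
  {:init (constantly 0)
   :step (fn [acc _ds _idx] (inc acc))
   :finalize identity})

(defn sum-reducer [colname]
  {:init (constantly 0.0)
   :step (fn [acc ds idx] (+ acc (double (get-in ds [colname idx]))))
   :finalize identity})

(defn group-by-agg
  "Group the rows of every dataset in `ds-seq` by `colname` and run each
  reducer in `agg-map` per group. The optional `:index-filter` maps a dataset
  to a predicate on row indexes; rows it rejects are never reduced."
  [colname agg-map {:keys [index-filter]} ds-seq]
  (let [init-accs (fn [] (into {} (map (fn [[k r]] [k ((:init r))])) agg-map))
        step-accs (fn [accs ds idx]
                    (reduce-kv (fn [m k r] (update m k (:step r) ds idx))
                               accs
                               agg-map))
        groups (reduce
                (fn [groups ds]
                  ;; Build the per-index predicate once per dataset.
                  (let [keep? (if index-filter (index-filter ds) (constantly true))]
                    (reduce (fn [groups idx]
                              (if (keep? idx)
                                (let [k (get-in ds [colname idx])]
                                  (assoc groups k (step-accs (or (get groups k) (init-accs)) ds idx)))
                                groups))
                            groups
                            (range (count (get ds colname))))))
                {}
                ds-seq)]
    ;; Finalize every accumulator once all datasets have been consumed.
    (into {}
          (map (fn [[k accs]]
                 [k (into {} (map (fn [[col r]] [col ((:finalize r) (get accs col))])) agg-map)]))
          groups)))

(def tiny-ds {:symbol ["MSFT" "IBM" "MSFT"] :price [10.0 100.0 30.0]})
(def aggs {:n (row-count-reducer) :total (sum-reducer :price)})

(group-by-agg :symbol aggs {} [tiny-ds tiny-ds])
;; => {"MSFT" {:n 4, :total 80.0}, "IBM" {:n 2, :total 200.0}}

(group-by-agg :symbol aggs
              {:index-filter (fn [ds] (fn [idx] (> (get-in ds [:price idx]) 20.0)))}
              [tiny-ds tiny-ds])
;; => {"IBM" {:n 2, :total 200.0}, "MSFT" {:n 2, :total 60.0}}
```

Filtering inside the reduction avoids materializing a filtered copy of each dataset, which is consistent with the option's note about very large datasets.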
Example:
```clojure
user> (require '[tech.v3.dataset :as ds])
nil
user> (require '[tech.v3.dataset.reductions :as ds-reduce])
nil
user> (def stocks (ds/->dataset "test/data/stocks.csv" {:key-fn keyword}))
#'user/stocks
user> (ds-reduce/group-by-column-agg
       :symbol
       {:symbol (ds-reduce/first-value :symbol)
        :price-avg (ds-reduce/mean :price)
        :price-sum (ds-reduce/sum :price)}
       [stocks stocks stocks])
:symbol-aggregation [5 3]:
| :symbol | :price-avg | :price-sum |
|---------|--------------|------------|
| MSFT | 24.73674797 | 9127.86 |
| IBM | 91.26121951 | 33675.39 |
| AAPL | 64.73048780 | 23885.55 |
| GOOG | 415.87044118 | 84837.57 |
| AMZN | 47.98707317 | 17707.23 |
user> (def tstds
        (ds/->dataset {:a ["a" "a" "a" "b" "b" "b" "c" "d" "e"]
                       :b [22 21 22 44 42 44 77 88 99]}))
#'user/tstds
user> (ds-reduce/group-by-column-agg
       [:a :b]
       {:a (ds-reduce/first-value :a)
        :b (ds-reduce/first-value :b)
        :c (ds-reduce/row-count)}
       [tstds tstds tstds])
:tech.v3.dataset.reductions/_temp_col-aggregation [7 3]:
| :a | :b | :c |
|----|---:|---:|
| a | 21 | 3 |
| a | 22 | 6 |
| b | 42 | 3 |
| b | 44 | 6 |
| c | 77 | 3 |
| d | 88 | 3 |
| e | 99 | 3 |
```

(mean colname)

Create a double consumer which will produce a mean of the column.
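Because `group-by-column-agg` reduces over a sequence of datasets, a mean has to combine partial results correctly. One common way to keep a mean mergeable is to carry a running sum and count and divide only when finalizing. The sketch below models that in plain Clojure; `mean-step`, `mean-merge`, `mean-finalize`, and `column-mean` are hypothetical names, and this is not claimed to be how `mean` is implemented.

```clojure
;; Hypothetical model: partial state is [sum count]; values are consumed as doubles.
(defn mean-step [[sum n] x]
  [(+ sum (double x)) (inc n)])

;; Partials from different datasets combine by adding sums and counts.
(defn mean-merge [[s1 n1] [s2 n2]]
  [(+ s1 s2) (+ n1 n2)])

;; Divide once at the end; an empty input has no defined mean.
(defn mean-finalize [[sum n]]
  (if (zero? n) Double/NaN (/ sum n)))

(defn column-mean
  "Mean of `colname` across every dataset in `ds-seq`, where a dataset is
  modeled as a map of column name -> vector."
  [colname ds-seq]
  (->> ds-seq
       (map #(reduce mean-step [0.0 0] (get % colname)))
       (reduce mean-merge [0.0 0])
       mean-finalize))

(column-mean :price [{:price [1 2 3]} {:price [4]}]) ;; => 2.5
```

Averaging the two per-dataset means instead would give 3.0, which is why the sketch merges sums and counts rather than means.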
(row-count)

Create a simple reducer that returns the number of times `reduceIndex` was called, which is the number of rows it reduced.
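Because `row-count` ignores column values and only counts calls, its result for each group is that group's row count. As a plain-Clojure cross-check (not library code), the `:c` column of the `[:a :b]` example above can be reproduced with `frequencies` over the same rows. The names `tstds-rows` and `row-counts` are introduced only for this sketch.

```clojure
;; Rebuild the example's rows as [a b] pairs; each pair is one group key.
(def tstds-rows
  (map vector
       ["a" "a" "a" "b" "b" "b" "c" "d" "e"]
       [22 21 22 44 42 44 77 88 99]))

;; The example aggregates three copies of the same dataset.
(def row-counts (frequencies (apply concat (repeat 3 tstds-rows))))

(get row-counts ["a" 22]) ;; => 6
(get row-counts ["a" 21]) ;; => 3
(count row-counts)        ;; => 7, matching the 7 rows of the aggregated result
```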
(sum colname)

Create a double consumer which will sum the values.
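Reading "double consumer" as meaning every value is consumed as a `double`, a sum over an integer column would come back as a double. The sketch below models that reading in plain Clojure with `transduce`; `double-sum` is a hypothetical name, datasets are again modeled as maps of column name to vector, and the coercion behavior is an assumption about the library rather than a documented guarantee.

```clojure
;; Hypothetical model: concatenate one column across datasets, coerce every
;; value to double, and add them up starting from 0.0.
(defn double-sum [colname ds-seq]
  (transduce (comp (mapcat #(get % colname))
                   (map double))
             +
             0.0
             ds-seq))

(double-sum :b [{:b [22 21 22]} {:b [44]}]) ;; => 109.0
```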