Holds the last dataset result in an interactive REPL session. Automatically bound by datajure.nrepl/wrap-dt middleware. Like Clojure's *1, but only for tech.v3.dataset results.
(asc col)
Sort-spec helper: ascending order on col. Use in :order-by.
(between start-col end-col)
Returns a column selector that selects all columns positioned between start-col and end-col (inclusive). Both endpoints must exist in the dataset. Intended for use with :select in dt.
Example: (dt ds :select (between :month-01 :month-12))
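The positional, inclusive selection described above can be sketched on a plain vector of column names. This is only an illustration, not the library's implementation: between-cols is a hypothetical helper, and the sketch assumes start-col appears before end-col.

```clojure
(defn between-cols
  "Hypothetical helper: returns the column names positioned from start-col
  through end-col (inclusive). Throws if either endpoint is missing.
  Assumes start-col comes before end-col in col-names."
  [col-names start-col end-col]
  (let [cols (vec col-names)
        i    (.indexOf cols start-col)
        j    (.indexOf cols end-col)]
    (when (or (neg? i) (neg? j))
      (throw (ex-info "Both endpoints must exist in the dataset"
                      {:start start-col :end end-col})))
    ;; subvec's end index is exclusive, so add one to include end-col
    (subvec cols i (inc j))))

(between-cols [:id :month-01 :month-02 :month-03 :total] :month-01 :month-03)
;; => [:month-01 :month-02 :month-03]
```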
(count* col)
Count of non-nil values in a column. Asterisk-suffixed to avoid shadowing clojure.core/count. Distinct from N (total rows) and count-distinct (unique non-nil values).
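The difference between the three counts can be seen on a plain Clojure vector standing in for a column. The vector and var names below are made up for illustration, and the expressions use clojure.core only, not the library's own functions.

```clojure
;; A stand-in column with two nils and one repeated value.
(def mass [3.2 nil 4.1 3.2 nil 5.0])

;; Total rows, nils included: the quantity N reports.
(def total-rows (count mass))

;; Non-nil values only: the quantity count* reports.
(def non-nil-count (count (remove nil? mass)))

;; Unique non-nil values: the quantity count-distinct reports.
(def distinct-count (count (distinct (remove nil? mass))))

[total-rows non-nil-count distinct-count]
;; => [6 4 3]
```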
(cut col-kw n)
Equal-count (quantile) binning: assigns each value in a column to a bin in 1..n based on its percentile rank among non-nil values.
Breakpoints are the 100/n, 200/n, ..., (n-1)*100/n percentiles of the non-nil values. Bin assignment is right-open (binarySearch), so every value lands in exactly one bin in [1, n]. nil values produce nil.
Complements xbar (equal-width bins). Use inside #dt/e:
  (dt ds :set {:quintile #dt/e (cut :mass 5)})
  (dt ds :where #dt/e (= (cut :mass 4) 1)) ;; bottom quartile
Note: cut requires whole-column context and cannot be used as a standalone row-level function in :by. Use #dt/e (cut :col n) for all use cases.
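A minimal sketch of the binning rule on a plain sequence may help. It is an illustration under stated assumptions, not the library's code: cut-sketch, percentile and breakpoints are hypothetical names, percentiles use linear interpolation (the library's percentile method may differ), and a linear scan stands in for binarySearch.

```clojure
(defn percentile
  "p-th percentile of a sorted, non-empty vector, using linear interpolation
  between neighbouring values (an assumption of this sketch)."
  [sorted p]
  (let [pos  (* (/ p 100.0) (dec (count sorted)))
        lo   (long (Math/floor pos))
        hi   (min (inc lo) (dec (count sorted)))
        frac (- pos lo)]
    (+ (nth sorted lo) (* frac (- (nth sorted hi) (nth sorted lo))))))

(defn breakpoints
  "The n-1 breakpoints at the 100/n, 200/n, ..., (n-1)*100/n percentiles of
  the non-nil values. Returns an empty vector when every value is nil."
  [xs n]
  (let [sorted (vec (sort (remove nil? xs)))]
    (if (empty? sorted)
      []
      (mapv #(percentile sorted (* % (/ 100.0 n))) (range 1 n)))))

(defn cut-sketch
  "Assigns each value a bin in 1..n; nil stays nil."
  [xs n]
  (let [breaks (breakpoints xs n)]
    (mapv (fn [v]
            (when (some? v)
              ;; Right-open bins: a value equal to a breakpoint is counted
              ;; past it, so it lands in the higher bin.
              (inc (count (take-while #(<= % v) breaks)))))
          xs)))

(cut-sketch [10 20 nil 30 40] 2)
;; => [1 1 nil 2 2]
```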
(desc col)
Sort-spec helper: descending order on col. Use in :order-by.
(dt dataset & {:keys [where set agg by select order-by within-order]})
Query a dataset. Supported keywords: :where, :set, :agg, :by, :select, :order-by, :within-order.
:where - filter rows. Accepts a #dt/e expression or a plain fn of a row map.
:set - derive/update columns. Accepts a map or a vector of pairs. When :set contains win/* functions, window mode is activated: with :by, windows are computed within groups; without :by, the whole dataset is one partition.
:agg - collapse to summary. Accepts a map or a vector of pairs. Use N for row count.
:by - grouping for :agg or :set (partitioned window mode). Vector of keywords or fn of row.
:within-order - sort within each partition (or the whole dataset) before :set or :agg runs. Useful for window functions (win/lag, win/cumsum, ...) and for order-sensitive aggregations (first-val, last-val, OHLC patterns).
  With :set and :by: sorts within each group before window computation.
  With :set and no :by: sorts the whole dataset before window computation.
  With :agg and :by: sorts within each group before aggregation.
  With :agg and no :by: sorts the whole dataset before aggregation.
:select - keep columns. Accepts a vector of kws, a single kw, [:not kw ...], a regex, a predicate fn, or a map {old-kw new-kw} for rename-on-select.
:order-by - sort rows. Accepts a vector of (asc :col)/(desc :col) specs, or bare keywords (default asc). Evaluated after all other steps.
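To make the ordering rules concrete, the case ":agg with :by plus :within-order, followed by :order-by" can be imitated on a plain vector of row maps. This is a sketch in clojure.core only: open-close-by-sym and the trades data are hypothetical, and the library's actual evaluation is not shown here.

```clojure
(def trades
  [{:sym "A" :time 3 :price 12.0}
   {:sym "A" :time 1 :price 10.0}
   {:sym "B" :time 2 :price 20.0}
   {:sym "A" :time 2 :price 11.0}
   {:sym "B" :time 1 :price 21.0}])

(defn open-close-by-sym
  "Hypothetical stand-in for grouping by :sym, sorting each group by :time
  before aggregating, and sorting the summary rows at the very end."
  [rows]
  (->> (group-by :sym rows)                          ; the :by step
       (map (fn [[sym group]]
              (let [ordered (sort-by :time group)]   ; the :within-order step
                {:sym   sym
                 :open  (:price (first ordered))     ; order-sensitive aggregation
                 :close (:price (last ordered))})))
       (sort-by :sym)                                ; :order-by runs last
       vec))

(open-close-by-sym trades)
;; => [{:sym "A", :open 10.0, :close 12.0} {:sym "B", :open 21.0, :close 20.0}]
```

Without the per-group sort, the first and last prices would depend on the input row order, which is why order-sensitive aggregations need :within-order.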
Column maximum. Full-name alias for dfn/reduce-max. Asterisk-suffixed to avoid shadowing clojure.core/max.
Column mean. Full-name alias for dfn/mean.
Column median. Full-name alias for dfn/median.
Column minimum. Full-name alias for dfn/reduce-min.
Asterisk-suffixed to avoid shadowing clojure.core/min.
Row count aggregation helper. Use as a value in :agg maps.
Terse alias matching data.table/q convention. See also nrow for
a more discoverable full name.
Row count aggregation helper. Use as a value in :agg maps.
Full-name alias for users who prefer readability over terseness.
Equivalent to N.
(pass-nil f & guard-cols)
Wraps a row-level fn to return nil if any of the specified guard columns are nil/missing in the row. Prevents crashes when plain fns encounter missing values in :set or :where.
Usage: (pass-nil #(Integer/parseInt (:x-str %)) :x-str)
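The guarding behaviour can be sketched as a small higher-order function over row maps. pass-nil-sketch is a hypothetical name used for illustration, and the library's own implementation may differ.

```clojure
(defn pass-nil-sketch
  "Returns a fn of a row map that calls f only when every guard column is
  present and non-nil in the row; otherwise returns nil."
  [f & guard-cols]
  (fn [row]
    (when (every? #(some? (get row %)) guard-cols)
      (f row))))

(def parse-x (pass-nil-sketch #(Integer/parseInt (:x-str %)) :x-str))

(parse-x {:x-str "42"}) ;; => 42
(parse-x {:x-str nil})  ;; => nil, instead of an exception from parseInt
(parse-x {})            ;; => nil, the column is missing entirely
```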
(qtile col-kw n)
Quantile bucketing: produces a :by grouping that bins each row's value in col-kw into one of n equal-count bins based on its percentile rank among non-nil values. Inspired by R's cut and Stata's xtile.
Breakpoints are computed once from the entire dataset passed to dt, at the 100/n, 200/n, ..., (n-1)*100/n percentiles. Each row is then assigned to a bin in [1, n] via right-open comparison. nil input values produce nil keys (their own group).
Companion to xbar (equal-width bins). Use #dt/e (cut :col n) for the same semantics in :set / :where / :agg contexts (cut also supports a :from option for reference-subpopulation breakpoints; qtile currently does not).
Result column name defaults to <col>-q<n> (e.g. :mass-q5 for quintile bins of :mass). Override by attaching {:datajure/col :your-name} metadata to the qtile result via a second call, or compose with the standard :by of keywords.
Usage:
  ;; Quintile buckets of market cap
  (dt stocks :by [(qtile :mktcap 5)]
             :agg {:n N :mean-ret #dt/e (mn :ret)})
  ;; Per-date quintile buckets combined with an exact key
  (dt stocks :by [:date (qtile :mktcap 5)]
             :agg {:mean-ret #dt/e (mn :ret)})
  ;; Equivalent inside #dt/e (column derivation, not grouping):
  (dt stocks :set {:q #dt/e (cut :mktcap 5)})
(rename dataset col-map)
Rename columns in a dataset without dropping any. col-map is {old-kw new-kw}.
(reset-notes!)
Reset shown info notes. Useful for testing.
Column standard deviation. Full-name alias for dfn/standard-deviation.
Column sum. Full-name alias for dfn/sum.
Column variance. Full-name alias for dfn/variance.
(xbar col-kw width)
(xbar col-kw width unit)
Floor-division bucketing: rounds a column value down to a multiple of width. Inspired by q's xbar operator.
For numeric columns: (xbar :price 10) → floor(:price / 10) * 10
For temporal columns: (xbar :time 5 :minutes) → floor to the preceding 5-minute boundary
Supported temporal units: :seconds, :minutes, :hours, :days, :weeks
Primary use case: computed :by grouping for time-series bar generation.
Usage:
  ;; Numeric bucketing in :by
  (dt ds :by [(xbar :price 10)] :agg {:n N :avg #dt/e (mn :volume)})
  ;; 5-minute OHLCV bars
  (-> trades
      (dt :order-by [(asc :time)])
      (dt :by [(xbar :time 5 :minutes) :sym]
          :agg {:open  #dt/e (first-val :price)
                :close #dt/e (last-val :price)
                :vol   #dt/e (sm :size)
                :n     N}))
  ;; Also usable inside #dt/e as a column derivation:
  (dt ds :set {:bucket #dt/e (xbar :price 5)})
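The floor-division arithmetic can be sketched on single values. xbar-num and xbar-instant are hypothetical names, not library functions. Passing nil through and aligning temporal buckets to the Unix epoch in UTC are assumptions of this sketch.

```clojure
(import 'java.time.Instant)

(defn xbar-num
  "floor(x / width) * width for one numeric value, returned as a double.
  Returning nil for nil input is an assumption of this sketch."
  [x width]
  (when (some? x)
    (* width (Math/floor (/ x (double width))))))

(defn xbar-instant
  "Floors an Instant to the start of its width-minutes bucket, with buckets
  aligned to the Unix epoch (an assumption of this sketch)."
  [^Instant t width-minutes]
  (let [step (long (* width-minutes 60 1000))   ; bucket width in milliseconds
        ms   (.toEpochMilli t)]
    (Instant/ofEpochMilli (* step (Math/floorDiv ms step)))))

(xbar-num 27 10)   ;; => 20.0
(xbar-num -3 10)   ;; => -10.0, floor rounds toward negative infinity
(xbar-instant (Instant/parse "2024-01-01T09:37:42Z") 5)
;; an Instant equal to 2024-01-01T09:35:00Z
```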