datajure.core

Clojure only.

*dt*
asc
between
count*
cut
desc
dt
max*
mean
median
min*
N
nrow
pass-nil
qtile
rename
reset-notes!
stddev
sum
variance
xbar

*dt*^clj

Holds the last dataset result in an interactive REPL session.
Automatically bound by datajure.nrepl/wrap-dt middleware.
Like Clojure's *1, but only for tech.v3.dataset results.

asc^clj

(asc col)

Sort-spec helper: ascending order on col. Use in :order-by.

between^clj

(between start-col end-col)

Returns a column selector that selects all columns positionally between
start-col and end-col (inclusive). Both endpoints must exist in the dataset.
Intended for use with :select in dt.

Example:
  (dt ds :select (between :month-01 :month-12))
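
The positional rule can be pictured on the dataset's ordered column names. What follows is a plain-Clojure sketch, not datajure's implementation. `between-cols` is a hypothetical helper, and it assumes start-col appears before end-col.

```clojure
(defn between-cols
  "Hypothetical sketch: columns from start-col to end-col (inclusive),
  in dataset order. Throws if either endpoint is missing."
  [cols start-col end-col]
  (let [cols (vec cols)
        i    (.indexOf cols start-col)   ; -1 when absent
        j    (.indexOf cols end-col)]
    (when (or (neg? i) (neg? j))
      (throw (ex-info "Both endpoints must exist in the dataset"
                      {:start-col start-col :end-col end-col})))
    ;; assumes i <= j; subvec throws otherwise
    (subvec cols i (inc j))))

(between-cols [:id :month-01 :month-02 :month-03 :total] :month-01 :month-03)
;; => [:month-01 :month-02 :month-03]
```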

count*^clj

(count* col)

Count of non-nil values in a column.
Asterisk-suffixed to avoid shadowing `clojure.core/count`.
Distinct from N (total rows) and count-distinct (unique non-nil values).
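
The three counts are easiest to tell apart on a column that contains nils and repeated values. The sketch below uses a plain Clojure vector as a stand-in for a dataset column and relies only on clojure.core. It mirrors the semantics described above and is not datajure's code.

```clojure
;; A vector standing in for one column; nil marks a missing value.
(def col [3 nil 5 3 nil 7])

(def total-rows     (count col))                          ; like N: 6
(def non-nil-count  (count (remove nil? col)))            ; like count*: 4
(def distinct-count (count (distinct (remove nil? col)))) ; like count-distinct: 3
```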

cut^clj

(cut col-kw n)

Equal-count (quantile) binning — assigns each value in a column to a bin
in 1..n based on its percentile rank among non-nil values.

Breakpoints are the 100/n, 200/n, ..., (n-1)*100/n percentiles of the
non-nil values. Bin assignment is right-open (binarySearch), so every
value lands in exactly one bin in [1, n]. nil values produce nil.

Complements xbar (equal-width bins). Use inside #dt/e:

  (dt ds :set {:quintile #dt/e (cut :mass 5)})
  (dt ds :where #dt/e (= (cut :mass 4) 1))   ;; bottom quartile

Note: cut needs whole-column context, so it cannot be used as a standalone
row-level function in :by. Use #dt/e (cut :col n) in :set / :where / :agg;
for quantile grouping in :by, use qtile.
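
Below is a minimal plain-Clojure sketch of the binning rule, applied to a vector of values. Two of its choices are assumptions, not documented datajure behavior. First, percentiles use linear interpolation between order statistics (R's default `quantile` type 7). Second, bins are found with a linear scan instead of binarySearch. `cut-sketch` is a hypothetical name.

```clojure
(defn- percentile
  "Linearly interpolated p-th percentile (0 < p < 100) of a sorted vector."
  [sorted p]
  (let [h  (* (dec (count sorted)) (/ p 100.0))
        lo (long (Math/floor h))
        hi (min (inc lo) (dec (count sorted)))]
    (+ (nth sorted lo) (* (- h lo) (- (nth sorted hi) (nth sorted lo))))))

(defn cut-sketch
  "Assign each value of col to a bin in 1..n by percentile rank among
  non-nil values; nil values stay nil."
  [col n]
  (let [sorted (vec (sort (remove nil? col)))
        ;; breakpoints at the 100/n, 200/n, ..., (n-1)*100/n percentiles
        breaks (if (seq sorted)
                 (mapv #(percentile sorted (/ (* 100.0 %) n)) (range 1 n))
                 [])]
    (mapv (fn [v]
            (when (some? v)
              ;; right-open bins: a value equal to a breakpoint goes to the upper bin
              (inc (count (filter #(<= % v) breaks)))))
          col)))

(cut-sketch [10 20 30 40 nil 50 60 70 80] 4)
;; => [1 1 2 2 nil 3 3 4 4]
```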

desc^clj

(desc col)

Sort-spec helper: descending order on col. Use in :order-by.
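
The following plain-Clojure sketch shows how a vector of sort specs could combine into a single ordering. The values returned by `asc` and `desc` are not documented here, so the `[:asc :col]` / `[:desc :col]` pairs are a hypothetical stand-in. As in `:order-by`, a bare keyword defaults to ascending.

```clojure
(defn- normalize-spec
  "Hypothetical: a bare keyword means ascending."
  [spec]
  (if (keyword? spec) [:asc spec] spec))

(defn spec-comparator
  "Row comparator built from sort specs; earlier specs take priority."
  [specs]
  (let [specs (mapv normalize-spec specs)]
    (fn [a b]
      (or (some (fn [[dir col]]
                  (let [c (compare (get a col) (get b col))
                        c (if (= dir :desc) (- c) c)]
                    (when-not (zero? c) c)))   ; nil lets the next spec decide
                specs)
          0))))

(sort (spec-comparator [:sym [:desc :px]])
      [{:sym "B" :px 1} {:sym "A" :px 1} {:sym "A" :px 2}])
;; => ({:sym "A", :px 2} {:sym "A", :px 1} {:sym "B", :px 1})
```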

dt^clj

(dt dataset & {:keys [where set agg by select order-by within-order]})

Query a dataset. Supported keywords: :where, :set, :agg, :by, :select, :order-by, :within-order.

:where         - filter rows. Accepts #dt/e expression or plain fn of row map.
:set           - derive/update columns. Accepts map or vector-of-pairs.
                 When :set contains win/* functions, window mode is activated —
                 with :by, computes within groups; without :by, whole dataset is one partition.
:agg           - collapse to summary. Accepts map or vector-of-pairs. Use N for row count.
:by            - grouping for :agg or :set (partitioned window mode). Vector of keywords or fn of row.
:within-order  - sort within each partition (or whole dataset) before :set or :agg runs.
                 Useful for window functions (win/lag, win/cumsum, ...) and for
                 order-sensitive aggregations (first-val, last-val, OHLC patterns).
                 With :set and :by: sorts within each group before window computation.
                 With :set and no :by: sorts whole dataset before window computation.
                 With :agg and :by: sorts within each group before aggregation.
                 With :agg and no :by: sorts whole dataset before aggregation.
:select        - keep columns. Accepts: vector of kws, single kw, [:not kw ...],
                 regex, predicate fn, or map {old-kw new-kw} for rename-on-select.
:order-by      - sort rows. Accepts a vector of (asc :col)/(desc :col) specs,
                 or bare keywords (default asc). Evaluated after all other steps.
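
The sketch below uses a vector of row maps to illustrate :by combined with :within-order and :agg: each group is sorted first, then aggregated. It shows only this order of steps. `agg-sketch` is hypothetical and says nothing about how dt is implemented on columnar datasets.

```clojure
(defn agg-sketch
  "Group rows by by-cols, sort each group by order-col (like :within-order),
  then merge the group key with (agg-fn sorted-group)."
  [rows by-cols order-col agg-fn]
  (->> (group-by #(select-keys % by-cols) rows)
       (map (fn [[k grp]] (merge k (agg-fn (sort-by order-col grp)))))
       ;; sort output by key only so the result is deterministic
       (sort-by #(mapv % by-cols))))

(def trades
  [{:sym "A" :time 3 :price 12}
   {:sym "B" :time 1 :price 50}
   {:sym "A" :time 1 :price 10}
   {:sym "A" :time 2 :price 11}])

;; Group by :sym, order each group by :time, then take first/last price.
(agg-sketch trades [:sym] :time
            (fn [grp] {:open  (:price (first grp))
                       :close (:price (last grp))
                       :n     (count grp)}))
;; => ({:sym "A", :open 10, :close 12, :n 3} {:sym "B", :open 50, :close 50, :n 1})
```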

max*^clj

Column maximum. Full-name alias for `dfn/reduce-max`.
Asterisk-suffixed to avoid shadowing `clojure.core/max`.

mean^clj

Column mean. Full-name alias for `dfn/mean`.

median^clj

Column median. Full-name alias for `dfn/median`.

min*^clj

Column minimum. Full-name alias for `dfn/reduce-min`.
Asterisk-suffixed to avoid shadowing `clojure.core/min`.

N^clj

Row count aggregation helper. Use as a value in :agg maps.
Terse alias matching data.table/q convention. See also `nrow` for
a more discoverable full name.

nrow^clj

Row count aggregation helper. Use as a value in :agg maps.
Full-name alias for users who prefer readability over terseness.
Equivalent to `N`.

pass-nil^clj

(pass-nil f & guard-cols)

Wraps a row-level fn to return nil if any of the specified guard columns
are nil/missing in the row. Prevents crashes when plain fns encounter
missing values in :set or :where.

Usage: (pass-nil #(Integer/parseInt (:x-str %)) :x-str)
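
The guard behavior can be re-created in a few lines of plain Clojure, with rows modelled as maps. `pass-nil-sketch` is a hypothetical stand-in written from the description above, not datajure's source.

```clojure
(defn pass-nil-sketch
  "Wrap row fn f so it returns nil when any guard column is nil or absent."
  [f & guard-cols]
  (fn [row]
    (when (every? #(some? (get row %)) guard-cols)
      (f row))))

;; Unwrapped, (Integer/parseInt nil) would throw NumberFormatException.
(def parse-x (pass-nil-sketch #(Integer/parseInt (:x-str %)) :x-str))

(mapv parse-x [{:x-str "42"} {:x-str nil} {}])
;; => [42 nil nil]
```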

qtile^clj

(qtile col-kw n)

Quantile bucketing — produces a :by grouping that bins each row's value
in col-kw into one of n equal-count bins based on its percentile rank among
non-nil values. Inspired by R's `cut` and Stata's `xtile`.

Breakpoints are computed once from the entire dataset passed to `dt`, at the
100/n, 200/n, ..., (n-1)*100/n percentiles. Each row is then assigned to a
bin in [1, n] via right-open comparison. nil input values produce nil keys
(their own group).

Companion to `xbar` (equal-width bins). Use `#dt/e (cut :col n)` for the
same semantics in :set / :where / :agg contexts (cut also supports a :from
option for reference-subpopulation breakpoints; qtile currently does not).

Result column name defaults to `<col>-q<n>` (e.g. :mass-q5 for quintile bins
of :mass). Override by attaching `{:datajure/col :your-name}` metadata to
the qtile result via a second call, or compose with the standard `:by` of
keywords.

Usage:
  ;; Quintile buckets of market cap
  (dt stocks :by [(qtile :mktcap 5)]
      :agg {:n N :mean-ret #dt/e (mn :ret)})

  ;; Per-date quintile buckets combined with an exact key
  (dt stocks :by [:date (qtile :mktcap 5)]
      :agg {:mean-ret #dt/e (mn :ret)})

  ;; Equivalent inside #dt/e (column derivation, not grouping):
  (dt stocks :set {:q #dt/e (cut :mktcap 5)})
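
Once breakpoints exist, the grouping step can be sketched in plain Clojure; the breakpoints could come from a percentile computation like the one described under cut. `qtile-groups` is hypothetical. Representing each group key as a `{<col>-q<n> bin}` map is an assumption made for illustration.

```clojure
(defn qtile-groups
  "Group rows by the quantile bin of (col-kw row), given the n-1 breakpoints
  computed once over the whole input. Rows with a nil value form their own group."
  [rows col-kw breaks]
  (let [n   (inc (count breaks))
        k   (keyword (str (name col-kw) "-q" n))   ; default name <col>-q<n>
        bin (fn [v]
              (when (some? v)
                ;; right-open bins, as with cut
                (inc (count (filter #(<= % v) breaks)))))]
    (group-by (fn [row] {k (bin (get row col-kw))}) rows)))

(set (keys (qtile-groups [{:mass 1} {:mass 5} {:mass nil} {:mass 9}] :mass [3 7])))
;; => #{{:mass-q3 1} {:mass-q3 2} {:mass-q3 3} {:mass-q3 nil}}
```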

rename^clj

(rename dataset col-map)

Rename columns in a dataset without dropping any.
col-map is {old-kw new-kw}.
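
On a single row map, the difference between rename and the {old-kw new-kw} map form of :select (described under dt) resembles clojure.set/rename-keys with and without a preceding select-keys. This is an analogy only: datajure operates on dataset columns, not on row maps.

```clojure
(require 'clojure.set)

(def row {:id 1 :px 9.5 :qty 3})

;; Like rename: every column survives; only mapped names change.
(clojure.set/rename-keys row {:px :price})
;; => {:id 1, :qty 3, :price 9.5}

;; Like :select {:px :price}: only the mapped columns survive, renamed.
(clojure.set/rename-keys (select-keys row [:px]) {:px :price})
;; => {:price 9.5}
```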

reset-notes!^clj

(reset-notes!)

Reset shown info notes. Useful for testing.

stddev^clj

Column standard deviation. Full-name alias for `dfn/standard-deviation`.

sum^clj

Column sum. Full-name alias for `dfn/sum`.

variance^clj

Column variance. Full-name alias for `dfn/variance`.

xbar^clj

(xbar col-kw width)

(xbar col-kw width unit)

Floor-division bucketing: floors a column value down to the largest multiple of width that does not exceed it.
Inspired by q's xbar operator.

For numeric columns: (xbar :price 10) → floor(:price / 10) * 10
For temporal columns: (xbar :time 5 :minutes) → floor to the start of the enclosing 5-minute interval

Supported temporal units: :seconds, :minutes, :hours, :days, :weeks

Primary use case: computed :by grouping for time-series bar generation.

Usage:
  ;; Numeric bucketing in :by
  (dt ds :by [(xbar :price 10)] :agg {:n N :avg #dt/e (mn :volume)})

  ;; 5-minute OHLCV bars
  (-> trades
      (dt :order-by [(asc :time)])
      (dt :by [(xbar :time 5 :minutes) :sym]
          :agg {:open  #dt/e (first-val :price)
                :close #dt/e (last-val :price)
                :vol   #dt/e (sm :size)
                :n     N}))

  ;; Also usable inside #dt/e as a column derivation:
  (dt ds :set {:bucket #dt/e (xbar :price 5)})
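
The bucketing arithmetic can be sketched in plain Clojure. `xbar-num` and `xbar-minutes` are hypothetical helpers. Anchoring temporal buckets to the Unix epoch (UTC) and using java.time.Instant are assumptions made for illustration, not documented datajure behavior.

```clojure
(import '(java.time Instant))

(defn xbar-num
  "floor(v / width) * width; floorDiv keeps integer results for integer inputs."
  [v width]
  (if (and (integer? v) (integer? width))
    (* width (Math/floorDiv (long v) (long width)))
    (* width (Math/floor (/ (double v) (double width))))))

(defn xbar-minutes
  "Floor an Instant to the start of its n-minute interval, counted from the epoch."
  [^Instant t n]
  (let [step (long (* 60 n))]
    (Instant/ofEpochSecond (* step (Math/floorDiv (.getEpochSecond t) step)))))

(xbar-num 37 10)    ;; => 30
(xbar-num -3 10)    ;; => -10 (floors toward negative infinity)
(xbar-num 7.5 2.0)  ;; => 6.0
(xbar-minutes (Instant/parse "2024-01-02T09:37:12Z") 5)
;; an Instant equal to 2024-01-02T09:35:00Z
```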
