Core sorted-merge algorithm for as-of join.
Part 1 — asof-search / asof-indices:
asof-search — binary search for the last right index where value <= target.
asof-indices — two-pointer merge over pre-sorted, pre-grouped vectors;
public utility, not used internally by asof-match.
Part 2 — asof-match: full key-handling layer. Groups right rows by exact
keys, sorts within each group, runs asof-search (binary search) per left
row. Returns a lazy sequence of [left-row-idx right-row-idx-or-nil] pairs.
Part 3 — build-result: assembles a tech.v3.dataset from the index pairs.
Left columns always present in original order; right non-key columns
appended (nil-filled for unmatched rows). Conflicting non-key column
names suffixed :right.<n>.
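The asof-search contract from Part 1 (last right index whose value is <= target) is a standard binary search. A minimal illustrative reimplementation of that contract — not the library's actual code — might look like:

```clojure
;; Find the last index in a sorted vector whose value is <= target,
;; or nil if every value exceeds the target.
(defn asof-search* [sorted-vals target]
  (loop [lo 0, hi (dec (count sorted-vals)), ans nil]
    (if (> lo hi)
      ans
      (let [mid (quot (+ lo hi) 2)]
        (if (<= (nth sorted-vals mid) target)
          (recur (inc mid) hi mid)   ; candidate found; look right for a later one
          (recur lo (dec mid) ans)))))) ; too big; look left

(asof-search* [1 3 5] 4) ;; => 1
(asof-search* [1 3 5] 0) ;; => nil
```

The nil return corresponds to an unmatched left row, which asof-match surfaces as a [left-row-idx nil] pair and build-result fills with nils.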
Rich Clay/Kindly notebook viewers for Datajure datasets and expressions.
Clay integration uses the Kindly convention — values are annotated with
:kind/hiccup metadata so any Kindly-compatible tool (Clay, Portal, etc.)
renders them as rich HTML.
Two usage modes:
1. Explicit wrapping (always works, no install step):
(dc/view ds) ;; rich dataset table
(dc/view-expr #dt/e (/ :mass 1000)) ;; expression display
(dc/view-describe (du/describe ds)) ;; enhanced describe
2. Auto-rendering via install! (registers Kindly advisor):
(dc/install!)
;; Now all datasets and #dt/e exprs auto-render in Clay notebooks
Usage in a Clay notebook:
(ns my-notebook
(:require [datajure.clay :as dc]
[datajure.core :as core]
[scicloj.clay.v2.api :as clay]))
(dc/install!)
(core/dt ds :by [:species] :agg {:n core/N})
Rich Clerk notebook viewers for Datajure datasets and expressions.
Usage — require and call install! at the top of your notebook:
(ns my-notebook
(:require [datajure.clerk :as dc]
[datajure.core :as core]
[nextjournal.clerk :as clerk]))
(dc/install!)
This registers custom viewers that automatically render:
- tech.v3.dataset datasets as rich HTML tables with column types
- #dt/e AST nodes as readable expressions
- du/describe output with conditional formatting
Opt-in short aliases for power users. Refer just the symbols you want:
(require '[datajure.concise :refer [mn sm mx mi N dt fst lst wa ws]])
Aggregation helpers (operate on column vectors — use in :agg plain fns
or directly):
  mn    = mean
  sm    = sum
  md    = median
  sd    = standard-deviation
  ct    = count (element count)
  nuniq = count-distinct
  fst   = first-val (first element)
  lst   = last-val (last element)
  wa    = wavg (weighted average)
  ws    = wsum (weighted sum)
  mx    = max* (column maximum)
  mi    = min* (column minimum)
Statistical transforms (use inside #dt/e expressions as stat/* — these
are direct refs to the runtime fns, useful outside #dt/e):
  standardize = stat/stat-standardize
  demean      = stat/stat-demean
  winsorize   = stat/stat-winsorize
Column selectors:
  between = positional range selector (core/between)
Everything else re-exported from datajure.core:
  N, dt, asc, desc, rename, pass-nil
AST definition and compiler for #dt/e expressions.
#dt/e is a reader tag that produces an AST map. datajure.core interprets
these ASTs when executing dt queries. This namespace handles:
- AST node constructors
- compile-expr: AST -> fn of dataset -> column/scalar
- Reader tag handler registered via resources/data_readers.clj (primary)
and register-reader! / alter-var-root (AOT/script fallback)
Op names are stored as keywords in the AST (e.g. :and, :>, :+) rather than
symbols, because the Clojure compiler tries to resolve symbols in literal
data structures. Since and/or/not are macros, the compiler rejects
'Can't take value of a macro' when it encounters them as bare symbols in
the map values returned by the reader tag. Keywords are self-evaluating
and avoid this entirely.
Nil-safety rules (matching spec):
- Comparison ops with nil arg -> false column (all rows false)
- Arithmetic ops with nil arg -> nil (becomes missing when stored in dataset)
These rules only activate when a Clojure nil literal appears in an expression.
Dataset columns with missing values are handled natively by dfn.
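The "Can't take value of a macro" problem motivating the keyword convention is ordinary Clojure behavior, demonstrable without datajure:

```clojure
;; A bare macro symbol cannot appear as a value in a literal data structure:
;;   {:op and}   ; => CompilerException: Can't take value of a macro: #'clojure.core/and
;; A keyword is self-evaluating, so the same shape compiles fine:
(def node {:op :and, :args [:a :b]})

(:op node) ;; => :and
```

This is why the reader tag emits :and, :>, :+ and so on, and the compiler maps them back to the corresponding column operations.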
Unified file I/O for datajure. Dispatches on file extension.
Natively supports: CSV, TSV, nippy (and .gz variants of all three).
Parquet, xlsx/xls, and Arrow require optional extra dependencies.
Join functions for datajure. Wraps tech.v3.dataset.join/pd-merge for
regular joins and datajure.asof for as-of joins.
Reshape functions for datajure. Currently provides wide->long (melt).
Row-wise (cross-column) function implementations for datajure. Each
function takes multiple columns and returns a single column of the same
length. These operate across columns within a single row, complementing
window functions (which operate down a single column).
Nil conventions (matching spec):
- row-sum: nil treated as 0 (like R rowSums(na.rm=TRUE))
- row-mean, row-min, row-max: skip nil
- All return nil when every input is nil
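The stated nil conventions for row-sum can be sketched with a hypothetical helper (not the library's implementation): nil counts as 0, unless every input is nil, in which case the result is nil.

```clojure
;; Per-row sum over one row's values from several columns.
(defn row-sum* [& vals]
  (when (some some? vals)          ; all-nil row -> nil
    (reduce + (map #(or % 0) vals)))) ; nil treated as 0

(row-sum* 1 nil 2) ;; => 3
(row-sum* nil nil) ;; => nil
```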
Statistical transform functions for use inside #dt/e expressions.
These functions operate on column vectors (dtype readers) and return a column of the same length. They are the runtime implementations for stat/* symbols parsed by datajure.expr.
All functions are nil-safe: nil values in the input are skipped when computing reference statistics (mean, sd, percentiles), and nil inputs produce nil outputs in the returned column.
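The nil-safety rule can be illustrated with a standardize sketch. This is a hypothetical helper following the documented semantics (it assumes sample standard deviation), not the library's stat-standardize:

```clojure
;; z-score each value against mean/sd of the non-nil values;
;; nil inputs stay nil in the output, and length is preserved.
(defn standardize* [xs]
  (let [present  (remove nil? xs)
        n        (count present)
        mu       (/ (reduce + present) n)
        variance (/ (reduce + (map (fn [x] (let [d (- x mu)] (* d d))) present))
                    (dec n))
        sd       (Math/sqrt variance)]
    (map (fn [x] (when (some? x) (/ (- x mu) sd))) xs)))

(standardize* [1 2 3]) ;; => (-1.0 0.0 1.0)
```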
Data cleaning and exploration utilities for datajure. Standalone
functions that operate on datasets directly and thread naturally. Not
part of dt — these complement it for common data preparation tasks.
Window function implementations for datajure. Each function takes a
column (dtype reader/vector) and returns a column of the same length.
These are called per-partition by the expr compiler when processing
:win AST nodes in window mode (:by + :set).
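The window-function contract — column in, same-length column out, applied per partition — can be illustrated with a hypothetical cumulative-sum window fn (nil handling omitted for brevity; not the library's code):

```clojure
;; reductions yields n+1 running totals (including the seed 0);
;; rest drops the seed, preserving the input length.
(defn cum-sum* [xs]
  (rest (reductions + 0 xs)))

(cum-sum* [1 2 3]) ;; => (1 3 6)
```

In window mode the compiler would apply such a fn independently to each :by partition's slice of the column.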