criterium.stats.kde

Clojure only.

Kernel Density Estimation utilities.

Provides ISJ (Improved Sheather-Jones) bandwidth selection, Gaussian kernel density estimation, bootstrap confidence bands, and mode finding.

All functions require typed arrays (DoubleArray, LongArray).

acr-test^clj

(acr-test data
          k
          {:keys [n-bootstrap n-points tol cached-critical-bandwidth
                  rng-factory]
           :or {n-bootstrap 200
                n-points 512
                tol 1.0E-6
                rng-factory (fn* [] (random/make-well-rng-1024a))}})

ACR test for H0: at most k modes.

Combines critical bandwidth and excess mass approaches from
Ameijeiras-Alonso, Crujeiras, and Rodríguez-Casal (2019).

Unlike Silverman's test, which uses the mode count as its test statistic,
ACR uses the excess mass, which gives better calibration.

Parameters:
- data: typed array of sample values
- k: number of modes under H0
- opts: optional map with:
  - :n-bootstrap (default 200)
  - :n-points (default 512)
  - :tol (for critical bandwidth search, default 1e-6)
  - :cached-critical-bandwidth (optional pre-computed bandwidth, skips search if provided)
  - :rng-factory: 0-arity fn returning WellRng1024a (default: make-well-rng-1024a)

Returns map with:
- :k - number of modes tested
- :excess-mass - observed excess mass statistic
- :critical-bandwidth - critical bandwidth h_k (smallest bandwidth giving at most k modes)
- :p-value - proportion of bootstrap excess masses >= observed

Reference: Ameijeiras-Alonso et al. (2019) 'Mode testing, critical
bandwidth and excess mass' TEST 28, 900-919

Requires a typed array (DoubleArray or LongArray).
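
The :p-value above is a tail proportion over the bootstrap excess masses. A minimal sketch, assuming the bootstrap statistics have already been computed (the helper name `bootstrap-p-value` is hypothetical and not part of this namespace):

```clojure
(defn bootstrap-p-value
  "Hypothetical helper: fraction of bootstrap statistics that are
  greater than or equal to the observed statistic."
  [observed boot-stats]
  (/ (count (filter #(>= % observed) boot-stats))
     (double (count boot-stats))))

;; (bootstrap-p-value 0.3 [0.1 0.2 0.3 0.4]) ;=> 0.5
```

Some bootstrap tests instead report (1 + #{T* >= T}) / (B + 1), which is never exactly zero; the docstring only says "proportion", so the plain ratio is used here.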

count-modes^clj

(count-modes data bandwidth n-points)

Count number of modes in KDE with given bandwidth.
Creates a grid and counts local maxima in the density estimate.
Requires a typed array (DoubleArray or LongArray).
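
A rough sketch of mode counting, under stated assumptions: the grid spans [min - 3h, max + 3h] and the KDE is evaluated directly rather than via FFT. `count-modes-sketch` is a hypothetical name, and the library's grid extent may differ.

```clojure
(defn count-modes-sketch
  "Hypothetical: count strict interior local maxima of a Gaussian KDE
  evaluated at n-points evenly spaced points on [min - 3h, max + 3h]."
  [data bandwidth n-points]
  (let [xs   (mapv double data)
        h    (double bandwidth)
        lo   (- (apply min xs) (* 3.0 h))
        hi   (+ (apply max xs) (* 3.0 h))
        step (/ (- hi lo) (dec n-points))
        ;; the 1/(n*h*sqrt(2*pi)) factor is omitted: it does not move maxima
        dens (mapv (fn [j]
                     (let [x (+ lo (* j step))]
                       (reduce + (map (fn [xi]
                                        (let [u (/ (- x xi) h)]
                                          (Math/exp (* -0.5 u u))))
                                      xs))))
                   (range n-points))]
    ;; a mode is a grid point strictly higher than both neighbours
    (count (filter (fn [j] (and (> (dens j) (dens (dec j)))
                                (> (dens j) (dens (inc j)))))
                   (range 1 (dec n-points))))))
```

Two well-separated clusters give two modes at a small bandwidth and merge into one mode once h is large relative to their separation.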

critical-bandwidth^clj

(critical-bandwidth data
                    k
                    {:keys [tol n-points h-max data-bounds]
                     :or {tol 1.0E-6 n-points 512}})

Find smallest bandwidth giving at most k modes via binary search.

Returns the critical bandwidth h_k, which is the smallest bandwidth
such that the KDE has at most k modes.

Parameters:
- data: typed array of sample values
- k: maximum number of modes
- opts: optional map with :tol (tolerance, default 1e-6),
        :n-points (grid size, default 512),
        :h-max (optional upper bound, useful when computing h_k knowing h_{k-1}),
        :data-bounds (optional [min max], avoids redundant data scan)

Requires a typed array (DoubleArray or LongArray).
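
The binary search can be sketched independently of the KDE by passing in a mode-count function (hypothetically `#(count-modes data % 512)`). This assumes the mode count never increases with h and that the upper bound already yields at most k modes; `critical-bandwidth-sketch` and its explicit bounds are illustrative, not the library's signature.

```clojure
(defn critical-bandwidth-sketch
  "Hypothetical bisection for the smallest h in [h-lo, h-hi] (to within
  tol) such that (mode-count h) <= k. Invariant: hi always has <= k modes."
  [mode-count k h-lo h-hi tol]
  (loop [lo (double h-lo)
         hi (double h-hi)]
    (if (<= (- hi lo) tol)
      hi
      (let [mid (* 0.5 (+ lo hi))]
        (if (<= (mode-count mid) k)
          (recur lo mid)     ; mid already has at most k modes: search lower
          (recur mid hi))))))  ; too many modes: more smoothing is needed
```

Returning the upper end of the final interval guarantees the result itself satisfies the "at most k modes" condition.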

critical-bandwidths^clj

(critical-bandwidths data max-k opts)

Compute critical bandwidths for k=1 to max-k efficiently.

Uses h_{k-1} as an upper bound for h_k, since h_k <= h_{k-1} (more modes
require less smoothing). This avoids redundant binary search iterations
when testing multiple k values.

Parameters:
- data: typed array of sample values
- max-k: maximum number of modes to compute bandwidth for
- opts: optional map with :tol (tolerance), :n-points (grid size)

Returns a map from k to critical bandwidth h_k.
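
A sketch of the warm-started search described above, again taking a hypothetical mode-count function of h. Each h_{k-1} becomes the upper bound for the next search; the function names and the explicit lower bound are assumptions of this example.

```clojure
(defn- bisect-critical
  "Smallest h in [lo, hi] (to within tol) with (mode-count h) <= k,
  assuming mode-count is non-increasing in h and (mode-count hi) <= k."
  [mode-count k lo hi tol]
  (loop [lo (double lo)
         hi (double hi)]
    (if (<= (- hi lo) tol)
      hi
      (let [mid (* 0.5 (+ lo hi))]
        (if (<= (mode-count mid) k)
          (recur lo mid)
          (recur mid hi))))))

(defn critical-bandwidths-sketch
  "Hypothetical: sorted map k -> h_k for k = 1..max-k, reusing each
  h_{k-1} as the upper bound of the search for h_k."
  [mode-count max-k h-lo h-max tol]
  (loop [k     1
         upper h-max
         acc   (sorted-map)]
    (if (> k max-k)
      acc
      (let [hk (bisect-critical mode-count k h-lo upper tol)]
        (recur (inc k) hk (assoc acc k hk))))))
```

Because every later search starts from a narrower interval, it needs fewer bisection steps to reach the same tolerance.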

dct-ii^clj

(dct-ii data)

Discrete Cosine Transform Type II.
O(n log n) implementation using FFT.

DCT-II formula: X_k = sum_{n=0}^{N-1} x_n * cos(π/N * (n + 0.5) * k)

Returns array of n DCT-II coefficients.

dct-ii-direct^clj

(dct-ii-direct data)

Discrete Cosine Transform Type II.
Direct O(n²) implementation. Retained for testing/reference.

DCT-II formula: X_k = sum_{n=0}^{N-1} x_n * cos(π/N * (n + 0.5) * k)

Returns vector of N DCT coefficients.
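
The formula above translates directly into code. This is an illustrative O(N²) sketch on Clojure vectors (`dct-ii-direct-sketch` is a hypothetical name), useful mainly as a reference to test a fast implementation against:

```clojure
(defn dct-ii-direct-sketch
  "Hypothetical direct DCT-II: X_k = sum_n x_n * cos(pi/N * (n + 0.5) * k)."
  [xs]
  (let [x (vec xs)
        N (count x)]
    (mapv (fn [k]
            (reduce + 0.0
                    (map-indexed (fn [n xn]
                                   (* xn (Math/cos (* (/ Math/PI N) (+ n 0.5) k))))
                                 x)))
          (range N))))
```

A constant input puts all of its energy in X_0: for `[1 1 1 1]`, X_0 is 4.0 and the remaining coefficients are zero up to rounding.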

dct-via-fft^clj

(dct-via-fft data)

Discrete Cosine Transform Type II via FFT.
O(n log n) implementation using 4N-point FFT.

Algorithm: Place input at odd positions in 4N array,
compute FFT, extract real parts.

Returns array of n DCT-II coefficients.
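
A self-contained sketch of the 4N-point construction, using a small recursive radix-2 FFT written for this example (so N must be a power of two). Placing x_n at index 2n+1 makes FFT entry k equal to sum_n x_n exp(-iπ(n + 0.5)k/N), whose real part is exactly X_k. The index layout is an assumption about what "odd positions" means, and the library may also mirror the input; both function names are hypothetical.

```clojure
(defn- fft
  "Recursive radix-2 Cooley-Tukey FFT. xs is a vector of [re im] pairs
  whose length is a power of two; returns a vector of [re im] pairs."
  [xs]
  (let [n (count xs)]
    (if (= n 1)
      xs
      (let [evens (fft (vec (take-nth 2 xs)))
            odds  (fft (vec (take-nth 2 (rest xs))))
            half  (quot n 2)
            ;; t_k = exp(-2*pi*i*k/n) * odds_k
            tw    (mapv (fn [k [ore oim]]
                          (let [a (/ (* -2.0 Math/PI k) n)
                                c (Math/cos a)
                                s (Math/sin a)]
                            [(- (* c ore) (* s oim))
                             (+ (* s ore) (* c oim))]))
                        (range half) odds)]
        ;; X_k = E_k + t_k and X_{k + n/2} = E_k - t_k
        (into (mapv (fn [[er ei] [tr ti]] [(+ er tr) (+ ei ti)]) evens tw)
              (mapv (fn [[er ei] [tr ti]] [(- er tr) (- ei ti)]) evens tw))))))

(defn dct-via-fft-sketch
  "Hypothetical DCT-II via a 4N-point FFT: x_n goes to index 2n+1 of a
  zero-filled array; Re(FFT)_k for k < N equals X_k."
  [xs]
  (let [x        (vec xs)
        N        (count x)
        buf      (reduce (fn [v [n xn]] (assoc v (inc (* 2 n)) [(double xn) 0.0]))
                         (vec (repeat (* 4 N) [0.0 0.0]))
                         (map-indexed vector x))
        spectrum (fft buf)]
    (mapv first (subvec spectrum 0 N))))
```

Only the first N real parts are kept; the remaining 3N FFT outputs are redundant for this transform.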

excess-mass^clj

(excess-mass data k)

(excess-mass data
             k
             {:keys [rng-factory]
              :or {rng-factory (fn* [] (random/make-well-rng-1024a))}})

Compute excess mass statistic for testing k modes.

The excess mass test statistic is max_λ{E_{n,k+1}(P_n,λ) - E_{n,k}(P_n,λ)},
where E_{n,k}(P_n,λ) = sup{∑_{m=1}^{k} (P_n(C_m) - λ|C_m|)} over k disjoint
intervals C_1, ..., C_k with endpoints at data points.

Parameters:
- data: sample values (will be sorted internally)
- k: number of modes to test (tests H0: at most k modes)
- opts: optional map with:
  - :rng-factory: 0-arity fn returning WellRng1024a for jitter (default: make-well-rng-1024a)

Returns map with:
- :statistic - the excess mass test statistic
- :k - number of modes tested
- :n - sample size

Reference: Müller, D.W. and Sawitzki, G. (1991) 'Excess Mass Estimates
and Tests for Multimodality' JASA 86, 738-746

find-modes^clj

(find-modes grid density)

Find modes (local maxima) in a density estimate.

Returns vector of maps with :location and :density for each mode,
sorted by density (highest first).
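
A minimal sketch of local-maximum detection on a sampled density. It assumes strict comparisons with both neighbours and ignores the two end points and flat plateaus; `find-modes-sketch` is hypothetical and may treat these edge cases differently from the library.

```clojure
(defn find-modes-sketch
  "Hypothetical: interior grid points whose density is strictly greater
  than both neighbours, as {:location :density} maps, highest first."
  [grid density]
  (let [g (vec grid)
        d (vec density)]
    (->> (range 1 (dec (count d)))
         (filter (fn [i] (and (> (d i) (d (dec i)))
                              (> (d i) (d (inc i))))))
         (map (fn [i] {:location (g i) :density (d i)}))
         (sort-by :density >)
         vec)))
```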

gaussian-kde^clj

(gaussian-kde data bandwidth grid)

Compute Gaussian kernel density estimate at grid points.

Parameters:
- data: typed array of sample values (DoubleArray or LongArray)
- bandwidth: kernel bandwidth (h)
- grid: vector of evaluation points

Returns vector of density values at each grid point.
Requires a typed array (DoubleArray or LongArray).

gaussian-kde-direct^clj

(gaussian-kde-direct data bandwidth grid)

Compute Gaussian kernel density estimate at grid points.
Direct O(n×m) implementation. Retained for testing/reference.

Parameters:
- data: typed array of sample values (DoubleArray or LongArray)
- bandwidth: kernel bandwidth (h)
- grid: vector of evaluation points

Returns vector of density values at each grid point.
Requires a typed array (DoubleArray or LongArray).
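
The direct estimator is f(x) = 1/(n h) * sum_i φ((x - x_i)/h), with φ the standard normal density. An illustrative sketch (hypothetical name, returning a vector as the docstring describes):

```clojure
(defn gaussian-kde-direct-sketch
  "Hypothetical O(n*m) Gaussian KDE evaluated at each point of grid."
  [^doubles data bandwidth grid]
  (let [n    (alength data)
        h    (double bandwidth)
        ;; 1 / (n * h * sqrt(2*pi)) normalises the sum of kernels
        norm (/ 1.0 (* n h (Math/sqrt (* 2.0 Math/PI))))]
    (mapv (fn [x]
            (let [x (double x)]
              ;; sum_i exp(-u^2/2) with u = (x - x_i)/h
              (* norm (areduce data i acc 0.0
                               (let [u (/ (- x (aget data i)) h)]
                                 (+ acc (Math/exp (* -0.5 u u))))))))
          grid)))
```

Every grid point touches every sample, which is what makes this O(n×m); summed over a fine, wide grid the result integrates to about 1.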

gaussian-kde-fft^clj

(gaussian-kde-fft data bandwidth grid)

Compute Gaussian kernel density estimate using FFT-based convolution.
O(n + m log m), where n is the sample size and m the grid size
(binning is O(n); the FFTs are O(m log m)).

Algorithm:
1. Extend grid by 4h on each side to avoid circular convolution edge effects
2. Bin data onto extended grid using linear interpolation
3. FFT the binned histogram
4. Multiply by Gaussian kernel in frequency domain
5. IFFT to get density estimate
6. Extract the central portion corresponding to original grid

Parameters:
- data: typed array of sample values (DoubleArray or LongArray)
- bandwidth: kernel bandwidth (h)
- grid: array of evaluation points

Returns array of density values at each grid point.
Requires a typed array (DoubleArray or LongArray).
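
Steps 3 to 5 rely on the convolution theorem. As a hedged worked equation (the notation c_l for the binned weights, Δ for the grid spacing, y_m for grid points and M for the extended grid size is ours, not the library's; indices are taken modulo M), the binned estimate and its approximate frequency-domain form are:

```latex
\hat f(y_m) = \sum_{l=0}^{M-1} c_l \, K_h\big((m-l)\Delta\big),
\qquad K_h(u) = \frac{1}{h\sqrt{2\pi}}\, e^{-u^2/(2h^2)}

\hat f(y_m) \approx \frac{1}{\Delta}\,
  \mathrm{IDFT}\!\left[\, C_j \, e^{-h^2 \omega_j^2/2} \,\right]_m,
\qquad C = \mathrm{DFT}(c),
\qquad \omega_j = \frac{2\pi j'}{M\Delta},
\quad j' = \begin{cases} j & j \le M/2 \\ j - M & j > M/2 \end{cases}
```

The approximation replaces the DFT of the sampled kernel by Δ^{-1} times its continuous Fourier transform e^{-h²ω²/2}, which is accurate when Δ is small relative to h; padding by 4h (step 1) keeps the wrap-around of the circular convolution small. Whether the library applies the 1/Δ factor in this form or normalises differently is not stated in the docstring.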

isj-bandwidth^clj

(isj-bandwidth data)

Improved Sheather-Jones bandwidth selector.

Uses the DCT-based algorithm of Botev et al. to select a bandwidth
that works well for multimodal distributions.

Falls back to Silverman's rule if ISJ doesn't converge or gives
an unreasonable result (bandwidth > half the data range).

Requires a typed array (DoubleArray or LongArray).
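
The fallback rule reads as a small decision function. In this sketch `isj-fn` and `silverman-fn` are hypothetical stand-ins for the two selectors, and a nil return from `isj-fn` is assumed to signal non-convergence:

```clojure
(defn choose-bandwidth
  "Hypothetical: prefer the ISJ estimate, falling back to Silverman's rule
  when ISJ returns nil (taken here to mean non-convergence) or a bandwidth
  larger than half the data range."
  [data isj-fn silverman-fn]
  (let [half-range (/ (- (apply max data) (apply min data)) 2.0)
        h          (isj-fn data)]
    (if (and (some? h) (<= h half-range))
      h
      (silverman-fn data))))
```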

kde^clj

(kde data)

(kde data
     {:keys [n-points bandwidth n-bootstrap alpha rng-factory]
      :or {n-points 512
           n-bootstrap 200
           alpha 0.05
           rng-factory (fn* [] (random/make-well-rng-1024a))}})

Compute KDE analysis on sample data.

Parameters:
- data: typed array of sample values (DoubleArray or LongArray)
- opts: optional map with:
  - :n-points: grid size (default 512)
  - :bandwidth: override bandwidth (default: ISJ selection)
  - :n-bootstrap: bootstrap samples for confidence bands (default 200)
  - :alpha: significance level (default 0.05, i.e. 95% confidence bands)
  - :rng-factory: 0-arity fn returning WellRng1024a (default: make-well-rng-1024a)

Returns map with:
- :type :criterium/kde
- :bandwidth: selected or provided bandwidth
- :grid: evaluation points
- :density: density values at grid points
- :lower-band: lower confidence band
- :upper-band: upper confidence band
- :n: sample size

Note: Mode detection is now a separate analysis step. Use silverman-test
and mode-confidence-intervals for statistical mode analysis.

Requires a typed array (DoubleArray or LongArray).

kde-bootstrap-sample^clj

(kde-bootstrap-sample data bandwidth grid rng)

Generate a bootstrap sample of KDE density at fixed grid points.
Uses the same bandwidth for all bootstrap iterations.
rng is a mutable WellRng1024a instance.
Requires a typed array (DoubleArray or LongArray).
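
A sketch of one bootstrap replicate: resample with replacement, then evaluate the KDE at the same grid with the same bandwidth. `java.util.Random` stands in for the library's WellRng1024a here (an assumption made to keep the example self-contained), and the function name is hypothetical.

```clojure
(defn kde-bootstrap-sample-sketch
  "Hypothetical: resample data with replacement (same size) using rng,
  then evaluate a Gaussian KDE with the fixed bandwidth at grid."
  [^doubles data bandwidth grid ^java.util.Random rng]
  (let [n    (alength data)
        h    (double bandwidth)
        boot (double-array n)
        _    (dotimes [i n]
               (aset boot i (aget data (.nextInt rng n))))  ; draw with replacement
        norm (/ 1.0 (* n h (Math/sqrt (* 2.0 Math/PI))))]
    (mapv (fn [x]
            (let [x (double x)]
              (* norm (areduce boot i acc 0.0
                               (let [u (/ (- x (aget boot i)) h)]
                                 (+ acc (Math/exp (* -0.5 u u))))))))
          grid)))
```

Keeping the bandwidth fixed across replicates means the spread of the replicates reflects sampling variability only, not bandwidth reselection.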

kde-confidence-bands^clj

(kde-confidence-bands data bandwidth grid)

(kde-confidence-bands data
                      bandwidth
                      grid
                      {:keys [n-bootstrap alpha rng-factory]
                       :or {n-bootstrap 200
                            alpha 0.05
                            rng-factory (fn* [] (random/make-well-rng-1024a))}})

Compute bootstrap confidence bands for KDE.

Parameters:
- data: original sample data
- bandwidth: kernel bandwidth
- grid: evaluation grid points
- opts: optional map with:
  - :n-bootstrap: number of bootstrap samples (default 200)
  - :alpha: significance level (default 0.05, giving 95% bands)
  - :rng-factory: 0-arity fn returning WellRng1024a (default: make-well-rng-1024a)

Returns map with :lower and :upper vectors.
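
Given B bootstrap density curves, percentile bands take quantiles at each grid point separately. A hedged sketch (hypothetical name; nearest-rank quantiles are an assumption, since the docstring does not say how the library computes percentiles):

```clojure
(defn pointwise-percentile-bands
  "Hypothetical: given B bootstrap density vectors (each as long as the
  grid), return {:lower :upper} holding the alpha/2 and 1 - alpha/2
  nearest-rank quantiles at every grid point."
  [boot-densities alpha]
  (let [B       (count boot-densities)
        m       (count (first boot-densities))
        ;; nearest-rank index for probability p, clamped to [0, B-1]
        rank    (fn [p] (-> (Math/ceil (* (double p) B)) long dec (max 0) (min (dec B))))
        lo-idx  (rank (/ alpha 2.0))
        hi-idx  (rank (- 1.0 (/ alpha 2.0)))
        ;; one sorted column of bootstrap values per grid point
        columns (for [j (range m)]
                  (vec (sort (map #(nth % j) boot-densities))))]
    {:lower (mapv #(nth % lo-idx) columns)
     :upper (mapv #(nth % hi-idx) columns)}))
```

Bands built this way are pointwise: each grid point has roughly 1 - alpha coverage on its own, not jointly across the whole curve.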

linear-bin^clj

(linear-bin data grid)

Bin data onto a regular grid using linear interpolation.
Returns vector of bin weights that sum to 1.0.

Each data point splits its weight between the two adjacent bins, and the
share given to each bin falls off linearly with distance from its center.

Requires a typed array (DoubleArray or LongArray).
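
A minimal sketch of linear binning on a regular grid (hypothetical name). A point at x between g_j and g_{j+1} gives 1 - t to bin j and t to bin j+1, with t = (x - g_j)/Δ. Points outside the grid are clamped to the end bins, which is an assumption of this example:

```clojure
(defn linear-bin-sketch
  "Hypothetical linear binning onto a regular grid; returns weights
  that sum to 1.0."
  [^doubles data grid]
  (let [grid  (vec grid)
        m     (count grid)
        g0    (double (first grid))
        delta (/ (- (double (peek grid)) g0) (dec m))
        n     (alength data)
        w     (double-array m)]
    (dotimes [i n]
      (let [pos (/ (- (aget data i) g0) delta)
            pos (max 0.0 (min (double (dec m)) pos))  ; clamp (assumption)
            j   (min (int (Math/floor pos)) (- m 2))  ; left bin index
            t   (- pos j)]                            ; fraction toward bin j+1
        (aset w j (+ (aget w j) (- 1.0 t)))
        (aset w (inc j) (+ (aget w (inc j)) t))))
    (mapv #(/ % n) w)))
```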

locate-modes^clj

(locate-modes data
              k
              {:keys [n-points tol cached-critical-bandwidth data-bounds]
               :or {n-points 512 tol 1.0E-6}})

Find mode and antimode locations using critical bandwidth.

Given target number of modes k, finds the critical bandwidth h_k
(smallest bandwidth with at most k modes) and locates both modes
(local maxima) and antimodes (local minima) at that bandwidth.

Parameters:
- data: typed array of sample values
- k: target number of modes
- opts: optional map with :n-points (grid size, default 512),
        :tol (tolerance, default 1e-6),
        :cached-critical-bandwidth (optional pre-computed bandwidth),
        :data-bounds (optional [min max], avoids redundant data scan)

Returns:
{:modes [{:location, :density} ...] - sorted by location
 :antimodes [{:location, :density} ...] - sorted by location
 :critical-bandwidth h_k}

Requires a typed array (DoubleArray or LongArray).

mode-confidence-intervals^clj

(mode-confidence-intervals data bandwidth grid n-modes)

(mode-confidence-intervals data
                           bandwidth
                           grid
                           n-modes
                           {:keys [n-bootstrap alpha rng-factory]
                            :or {n-bootstrap 200
                                 alpha 0.05
                                 rng-factory
                                   (fn* [] (random/make-well-rng-1024a))}})

Compute bootstrap confidence intervals for mode locations.

Parameters:
- data: original sample data (typed array)
- bandwidth: kernel bandwidth
- grid: evaluation grid
- n-modes: number of modes to track
- opts: optional map with:
  - :n-bootstrap: number of bootstrap samples (default 200)
  - :alpha: significance level (default 0.05)
  - :rng-factory: 0-arity fn returning WellRng1024a (default: make-well-rng-1024a)

Returns vector of mode CIs, each with :location, :ci-lower, :ci-upper.

Requires a typed array (DoubleArray or LongArray).

silverman-bandwidth^clj

(silverman-bandwidth data)

Silverman's rule of thumb bandwidth selector.
h = 0.9 * min(σ, IQR/1.34) * n^(-1/5)

A simple fallback when ISJ doesn't converge.
Requires a typed array (DoubleArray or LongArray).
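
The rule of thumb in code, as a sketch. The docstring does not specify the standard-deviation denominator or the quartile convention; this example assumes n - 1 and linear interpolation between order statistics, so its output may differ slightly from the library's. The function name is hypothetical.

```clojure
(defn silverman-bandwidth-sketch
  "Hypothetical sketch of h = 0.9 * min(sigma, IQR/1.34) * n^(-1/5).
  Assumes the n-1 standard deviation and linearly interpolated quartiles."
  [^doubles data]
  (let [n        (alength data)
        ^doubles sorted (doto (aclone data) (java.util.Arrays/sort))
        mean     (/ (areduce sorted i acc 0.0 (+ acc (aget sorted i))) n)
        variance (/ (areduce sorted i acc 0.0
                             (let [d (- (aget sorted i) mean)] (+ acc (* d d))))
                    (dec n))
        sigma    (Math/sqrt variance)
        ;; p-quantile by linear interpolation between order statistics
        quantile (fn [p]
                   (let [pos  (* p (dec n))
                         lo   (int (Math/floor pos))
                         hi   (min (dec n) (inc lo))
                         frac (- pos lo)]
                     (+ (aget sorted lo)
                        (* frac (- (aget sorted hi) (aget sorted lo))))))
        iqr      (- (quantile 0.75) (quantile 0.25))]
    (* 0.9 (min sigma (/ iqr 1.34)) (Math/pow n -0.2))))
```

For the data 1, 2, 3, 4, 5 this gives σ ≈ 1.581 and IQR/1.34 ≈ 1.493, so the IQR term wins and h ≈ 0.974.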

silverman-bootstrap-sample^clj

(silverman-bootstrap-sample data bandwidth rng)

Generate a smoothed bootstrap sample for Silverman's test.

Uses rescaled bootstrap from Silverman (1981):
y_i = mean + (X*_i - mean + h * epsilon_i) / sqrt(1 + h²/σ²)

This rescaling makes the variance of the bootstrap sample match that of
the original data in expectation.
Returns a DoubleArray.
rng is a mutable WellRng1024a instance.
Requires a typed array (DoubleArray or LongArray).
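
A sketch of the rescaled smoothed bootstrap. It assumes σ² is the plug-in (1/n) variance, which makes the expected variance of y equal to that of the data, and uses `java.util.Random` in place of WellRng1024a; the function name is hypothetical.

```clojure
(defn silverman-bootstrap-sample-sketch
  "Hypothetical: y_i = mean + (X*_i - mean + h*eps_i) / sqrt(1 + h^2/sigma^2),
  with X*_i drawn with replacement and eps_i standard normal."
  [^doubles data bandwidth ^java.util.Random rng]
  (let [n     (alength data)
        h     (double bandwidth)
        mean  (/ (areduce data i acc 0.0 (+ acc (aget data i))) n)
        ;; plug-in variance (1/n), i.e. the variance of a resampled value
        var-n (/ (areduce data i acc 0.0
                          (let [d (- (aget data i) mean)] (+ acc (* d d))))
                 n)
        scale (Math/sqrt (+ 1.0 (/ (* h h) var-n)))
        out   (double-array n)]
    (dotimes [i n]
      (let [x*  (aget data (.nextInt rng n))  ; resample with replacement
            eps (.nextGaussian rng)]          ; standard normal jitter
        (aset out i (+ mean (/ (+ (- x* mean) (* h eps)) scale)))))
    out))
```

Without the division by the scale factor, the added kernel noise would inflate the variance to σ² + h², which biases the test toward finding extra modes less often.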

silverman-test^clj

(silverman-test data
                k
                {:keys [n-bootstrap n-points alpha tol cached-critical-bandwidth
                        rng-factory]
                 :or {n-bootstrap 200
                      n-points 512
                      alpha 0.05
                      tol 1.0E-6
                      rng-factory (fn* [] (random/make-well-rng-1024a))}})

Silverman's bootstrap test for H0: at most k modes.

Tests the null hypothesis that the underlying density has at most k modes.
For k=1, applies Hall-York asymptotic correction for better calibration.
For k>1, uses standard bootstrap (may be conservative).

Parameters:
- data: typed array of sample values
- k: number of modes under H0
- opts: optional map with:
  - :n-bootstrap (default 200)
  - :n-points (default 512)
  - :alpha (significance level for the Hall-York correction, default 0.05)
  - :tol (for critical bandwidth search, default 1e-6)
  - :cached-critical-bandwidth (optional pre-computed bandwidth, skips search if provided)
  - :rng-factory: 0-arity fn returning WellRng1024a (default: make-well-rng-1024a)

Returns map with:
- :k - number of modes tested
- :critical-bandwidth - critical bandwidth h_k (smallest bandwidth giving at most k modes)
- :p-value - proportion of bootstrap samples with > k modes
- :corrected? - whether Hall-York correction was applied

Requires a typed array (DoubleArray or LongArray).
