criterium.stats.tail

Clojure only.

exceedances-over-threshold
gpd-cdf
gpd-mle
gpd-pdf
gpd-quantile
hill-estimator
hill-estimator-default-k-range
mean-residual-life
mean-residual-life-default-thresholds
tail-ratios

Tail statistics for extreme value analysis.

Provides functions for analyzing distribution tails, including:
- Hill estimator for tail index estimation
- Generalized Pareto Distribution (GPD) fitting and functions
- Mean residual life for threshold selection
- Tail ratios from percentiles

All functions requiring sample data accept typed arrays (ITypedArray).

References:
- Hill (1975), A Simple General Approach to Inference About the Tail of a Distribution
- Grimshaw (1993), Computing Maximum Likelihood Estimates for the GPD
- Coles (2001), An Introduction to Statistical Modeling of Extreme Values

exceedances-over-threshold

(exceedances-over-threshold samples threshold)

Extract excesses over the given threshold.
Returns a new DoubleArray containing (y - threshold) for each y > threshold.

In extreme value theory, the 'exceedance' or 'excess' over a threshold u
is defined as Y = X - u for observations X > u. These excesses are modeled
by the Generalized Pareto Distribution (GPD).

Parameters:
  samples - typed array of sample values
  threshold - threshold value u

Returns DoubleArray of excesses (y - threshold for y > threshold).
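
The definition Y = X - u for X > u can be illustrated with a small standalone sketch. It uses a plain Clojure vector rather than the library's typed arrays, and the name `excesses-sketch` is hypothetical, so it shows the computation only and is not the library's implementation.

```clojure
;; Hypothetical helper: excesses over a threshold on a plain vector.
;; Values equal to the threshold are excluded (strict y > threshold).
(defn excesses-sketch [samples threshold]
  (into []
        (comp (filter #(> % threshold))   ; keep observations above u
              (map #(- % threshold)))     ; shift them to excesses y - u
        samples))

(excesses-sketch [1.0 2.5 4.0 7.0] 2.0)
;; => [0.5 2.0 5.0]
```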

gpd-cdf

(gpd-cdf xi sigma)

Cumulative distribution function for the Generalized Pareto Distribution.

For exceedances y > 0:
  F(y; ξ, σ) = 1 - (1 + ξy/σ)^(-1/ξ)  if ξ ≠ 0
  F(y; 0, σ) = 1 - exp(-y/σ)           if ξ = 0

Parameters:
  xi - shape parameter ξ
  sigma - scale parameter σ (must be positive)

Returns a function F(y) that computes P(Y ≤ y).
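
As a standalone sketch of the two formula branches, the following mirrors the "returns a function" shape described above. The name `gpd-cdf-sketch` is hypothetical, and the handling of y ≤ 0 and of y beyond the upper endpoint for ξ < 0 is an assumption of this sketch, not documented library behavior.

```clojure
;; Hypothetical sketch of the GPD CDF as a closure over xi and sigma.
(defn gpd-cdf-sketch [xi sigma]
  (fn [y]
    (cond
      (<= y 0.0) 0.0                                   ; assumption: no mass at or below 0
      (zero? xi) (- 1.0 (Math/exp (- (/ y sigma))))    ; exponential case, xi = 0
      :else
      (let [z (+ 1.0 (/ (* xi y) sigma))]
        (if (<= z 0.0)
          1.0                                          ; assumption: past -sigma/xi when xi < 0
          (- 1.0 (Math/pow z (/ -1.0 xi))))))))

((gpd-cdf-sketch 0.5 1.0) 2.0)
;; => 0.75
```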

gpd-mle

(gpd-mle exceedances)

(gpd-mle exceedances opts)

Maximum likelihood estimation for the Generalized Pareto Distribution.

Uses Grimshaw's (1993) algorithm with profile likelihood optimization.

Parameters:
  exceedances - typed array of exceedance values (amounts by which
                observations exceed the threshold, y - u).
                All values must be positive (exceedances, not raw data)
  opts - optional map with:
    :max-iter - maximum iterations (default 100)
    :tol - convergence tolerance (default 1e-8)
    :xi-min - minimum ξ to search (default -0.5)
    :xi-max - maximum ξ to search (default 2.0)

Returns map with:
  :xi - shape parameter estimate
  :sigma - scale parameter estimate
  :log-likelihood - maximized log-likelihood
  :converged? - whether optimization converged
  :n - number of exceedances

Throws if exceedances is empty or contains non-positive values.
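
To make the :log-likelihood value concrete, the sketch below evaluates the standard GPD log-likelihood, ℓ(ξ, σ) = -n log σ - (1 + 1/ξ) Σ log(1 + ξyᵢ/σ), with the ξ = 0 limit -n log σ - Σyᵢ/σ. It only evaluates the objective on a plain vector; it does not implement Grimshaw's algorithm, and `gpd-log-likelihood-sketch` is a hypothetical name.

```clojure
;; Hypothetical sketch: GPD log-likelihood for positive exceedances ys.
;; Yields NaN if some y lies outside the support for the given xi and sigma.
(defn gpd-log-likelihood-sketch [xi sigma ys]
  (let [n (count ys)]
    (if (zero? xi)
      ;; exponential limit
      (- (* -1.0 n (Math/log sigma))
         (/ (reduce + ys) sigma))
      (- (* -1.0 n (Math/log sigma))
         (* (+ 1.0 (/ 1.0 xi))
            (reduce + (map #(Math/log (+ 1.0 (/ (* xi %) sigma))) ys)))))))

;; With xi fixed at 0 the maximizing sigma is the sample mean (2.0 here),
;; so the value at 2.0 exceeds the values at 1.0 and 3.0.
(map #(gpd-log-likelihood-sketch 0.0 % [1.0 2.0 3.0]) [1.0 2.0 3.0])
```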

gpd-pdf

(gpd-pdf xi sigma)

Probability density function for the Generalized Pareto Distribution.

For exceedances y > 0:
  f(y; ξ, σ) = (1/σ) * (1 + ξy/σ)^(-1/ξ - 1)  if ξ ≠ 0
  f(y; 0, σ) = (1/σ) * exp(-y/σ)               if ξ = 0

Support:
  y ≥ 0           if ξ ≥ 0
  0 ≤ y ≤ -σ/ξ    if ξ < 0

Parameters:
  xi - shape parameter ξ (can be negative, zero, or positive)
  sigma - scale parameter σ (must be positive)

Returns a function f(y) that computes the density at y.
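
The density and its support rule can be sketched as follows. This is a standalone illustration with the hypothetical name `gpd-pdf-sketch`; returning 0.0 outside the support is an assumption of the sketch rather than documented behavior.

```clojure
;; Hypothetical sketch of the GPD density as a closure over xi and sigma.
(defn gpd-pdf-sketch [xi sigma]
  (fn [y]
    (let [z (+ 1.0 (/ (* xi y) sigma))]
      (cond
        ;; outside the support: y < 0, or y > -sigma/xi when xi < 0
        (or (< y 0.0) (< z 0.0)) 0.0
        (zero? xi) (/ (Math/exp (- (/ y sigma))) sigma)
        :else (/ (Math/pow z (- (/ -1.0 xi) 1.0)) sigma)))))

((gpd-pdf-sketch 0.5 1.0) 2.0)   ;; => 0.125
((gpd-pdf-sketch -0.5 1.0) 3.0)  ;; => 0.0 (upper endpoint is -sigma/xi = 2)
```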

gpd-quantile

(gpd-quantile xi sigma)

Quantile function (inverse CDF) for the Generalized Pareto Distribution.

For probability p ∈ [0, 1] (Q(1) is infinite when ξ ≥ 0):
  Q(p; ξ, σ) = (σ/ξ) * ((1-p)^(-ξ) - 1)  if ξ ≠ 0
  Q(p; 0, σ) = -σ * log(1-p)              if ξ = 0

Parameters:
  xi - shape parameter ξ
  sigma - scale parameter σ (must be positive)

Returns a function Q(p) that computes the p-th quantile.
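
A standalone sketch of the quantile formula, under the hypothetical name `gpd-quantile-sketch`, shows how it inverts the CDF: for ξ = 0.5 and σ = 1, F(2) = 0.75, so Q(0.75) should be 2.

```clojure
;; Hypothetical sketch of the GPD quantile function (inverse CDF).
(defn gpd-quantile-sketch [xi sigma]
  (fn [p]
    (if (zero? xi)
      (* -1.0 sigma (Math/log (- 1.0 p)))                       ; xi = 0 branch
      (* (/ sigma xi) (- (Math/pow (- 1.0 p) (- xi)) 1.0)))))   ; xi != 0 branch

((gpd-quantile-sketch 0.5 1.0) 0.75)
;; approximately 2.0
```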

hill-estimator

(hill-estimator sorted-samples k-range)

Compute the Hill estimator for tail index across a range of k values.

The Hill estimator for the k largest order statistics is:
  H_k = (1/k) * Σᵢ₌₁ᵏ log(X_{(n-i+1)} / X_{(n-k)})

where X_{(i)} is the i-th order statistic (sorted ascending).

The tail index α is estimated as 1/H_k. Smaller α means a heavier tail
(α < 2 implies infinite variance).

Parameters:
  sorted-samples - typed array of samples sorted in ascending order
  k-range - sequence of k values to compute estimates for
            (each k uses the k largest observations)

Returns vector of maps {:k k :estimate H_k :tail-index (1/H_k)}
for each k in k-range where computation is valid.

Notes:
- k must be >= 1 and < n (sample size)
- Returns an empty vector if sorted-samples has fewer than 2 elements
- Requires sorted input (ascending order)
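
The index arithmetic in H_k is easy to get wrong, so here is a standalone sketch on a plain, ascending, positive-valued vector. The names `hill-sketch` and `hill-range-sketch` are hypothetical, and silently skipping invalid k values is an assumption based on the notes above, not a statement about the library's code.

```clojure
;; Hypothetical sketch: Hill estimate for one k on an ascending vector.
;; With 0-based indexing, X_(n-k) is element n-k-1 and the k largest
;; observations are elements n-k .. n-1.
(defn hill-sketch [sorted-xs k]
  (let [xs (vec sorted-xs)
        n  (count xs)]
    (when (and (>= k 1) (< k n))
      (let [x-ref (nth xs (- n k 1))
            top-k (subvec xs (- n k))
            h     (/ (reduce + (map #(Math/log (/ % x-ref)) top-k)) k)]
        {:k k :estimate h :tail-index (/ 1.0 h)}))))

;; Apply over a range of k, dropping k values that are not valid.
(defn hill-range-sketch [sorted-xs k-range]
  (vec (keep #(hill-sketch sorted-xs %) k-range)))

(hill-range-sketch [1.0 2.0 4.0 8.0] [1 2 3 4])
;; three results (k = 4 is skipped); for k = 2 the estimate is 1.5 * ln 2
```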

hill-estimator-default-k-range

(hill-estimator-default-k-range n)

Compute default k range for Hill estimator.
Uses k from 10 to min(n/2, 500) with step size based on n.

Parameters:
  n - sample size

Returns sequence of k values.

mean-residual-life

(mean-residual-life sorted-samples threshold-range)

Compute mean residual life (mean excess) over a range of thresholds.

The mean residual life at threshold u is:
  e(u) = E[X - u | X > u]

For GPD data, e(u) is linear in u with slope ξ/(1-ξ) (for ξ < 1).
A threshold above which e(u) becomes approximately linear suggests
a good choice for peaks-over-threshold (POT) analysis.

Parameters:
  sorted-samples - typed array of samples sorted in ascending order
  threshold-range - sequence of threshold values to evaluate

Returns vector of maps {:threshold u :mrl e(u) :n-exceed count}
where n-exceed is the number of observations exceeding u.
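
The empirical version of e(u) is the average excess among observations above u. The standalone sketch below computes it on a plain vector and returns maps shaped like the ones described above. The name `mrl-sketch` is hypothetical, and omitting thresholds with no exceedances is an assumption of the sketch.

```clojure
;; Hypothetical sketch: empirical mean residual life per threshold.
(defn mrl-sketch [samples thresholds]
  (vec
    (for [u thresholds
          :let [excesses (for [x samples :when (> x u)] (- x u))
                n-exceed (count excesses)]
          :when (pos? n-exceed)]          ; assumption: skip empty thresholds
      {:threshold u
       :mrl       (/ (reduce + excesses) (double n-exceed))
       :n-exceed  n-exceed})))

(mrl-sketch [1.0 2.0 3.0 4.0 10.0] [2.0 3.5])
;; second entry: {:threshold 3.5, :mrl 3.5, :n-exceed 2}
```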

mean-residual-life-default-thresholds

(mean-residual-life-default-thresholds sorted-samples)

(mean-residual-life-default-thresholds sorted-samples n-points)

Compute default threshold range for MRL plot.
Uses quantiles from 50th to 95th percentile.

Parameters:
  sorted-samples - typed array of samples sorted in ascending order
  n-points - number of threshold points (default 20)

Returns sequence of threshold values.

tail-ratios

(tail-ratios percentiles)

Compute tail ratios from percentile values.

Tail ratios indicate how heavy the distribution tail is.
Higher ratios suggest heavier tails.

Parameters:
  percentiles - map of percentile values with keys like :p95, :p99, :p999
                or numeric keys like 0.95, 0.99, 0.999

Returns map with:
  :p99-p95 - ratio of 99th to 95th percentile
  :p999-p99 - ratio of 99.9th to 99th percentile
  :p999-p95 - ratio of 99.9th to 95th percentile (if all present)

Returns nil for ratios where required percentiles are missing.
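
One reading of the description above, sketched as standalone code: look up each percentile under either key style and emit nil for any ratio whose inputs are missing. The name `tail-ratios-sketch` is hypothetical, and the exact lookup order between keyword and numeric keys is an assumption.

```clojure
;; Hypothetical sketch: tail ratios from a percentile map.
(defn tail-ratios-sketch [percentiles]
  (let [p95   (or (:p95 percentiles) (get percentiles 0.95))
        p99   (or (:p99 percentiles) (get percentiles 0.99))
        p999  (or (:p999 percentiles) (get percentiles 0.999))
        ;; nil when either percentile is absent
        ratio (fn [hi lo] (when (and hi lo) (/ (double hi) (double lo))))]
    {:p99-p95  (ratio p99 p95)
     :p999-p99 (ratio p999 p99)
     :p999-p95 (ratio p999 p95)}))

(tail-ratios-sketch {:p95 10.0 :p99 20.0 :p999 80.0})
;; => {:p99-p95 2.0, :p999-p99 4.0, :p999-p95 8.0}
```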
