Tail statistics for extreme value analysis.
Provides functions for analyzing distribution tails, including:
- Hill estimator for tail index estimation
- Generalized Pareto Distribution (GPD) fitting and functions
- Mean residual life for threshold selection
- Tail ratios from percentiles
All functions requiring sample data accept typed arrays (ITypedArray).
References:
- Hill (1975), A Simple General Approach to Inference About the Tail of a Distribution
- Grimshaw (1993), Computing Maximum Likelihood Estimates for the GPD
- Coles (2001), An Introduction to Statistical Modeling of Extreme Values
(exceedances-over-threshold samples threshold)
Extract excesses over the given threshold. Returns a new DoubleArray containing (y - threshold) for each y > threshold.
In extreme value theory, the 'exceedance' or 'excess' over a threshold u is defined as Y = X - u for observations X > u. These excesses are modeled by the Generalized Pareto Distribution (GPD).
Parameters:
  samples - typed array of sample values
  threshold - threshold value u
Returns DoubleArray of excesses (y - threshold for y > threshold).
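As a rough illustration of the definition Y = X - u, the following plain-Clojure sketch computes excesses from an ordinary sequence. `excesses-sketch` is a hypothetical name introduced for this example; it is not the library function and does not depend on ITypedArray.

```clojure
;; Hypothetical stand-in for illustration only, not the library's implementation.
(defn excesses-sketch
  "Returns a double array holding (x - u) for every x in samples with x > u."
  [samples u]
  (double-array (keep (fn [x] (when (> x u) (- x u))) samples)))

;; Values at or below the threshold are dropped; the rest are shifted by u.
(vec (excesses-sketch [1.0 2.5 4.0 7.5] 2.5))
;; => [1.5 5.0]
```

Note that an observation equal to the threshold is not counted as an exceedance, matching the strict inequality y > threshold.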
(gpd-cdf xi sigma)
Cumulative distribution function for the Generalized Pareto Distribution.
For exceedances y > 0:
  F(y; ξ, σ) = 1 - (1 + ξy/σ)^(-1/ξ)   if ξ ≠ 0
  F(y; 0, σ) = 1 - exp(-y/σ)           if ξ = 0
Parameters:
  xi - shape parameter ξ
  sigma - scale parameter σ (must be positive)
Returns a function F(y) that computes P(Y ≤ y).
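The two-case formula can be sketched as a closure in plain Clojure. `gpd-cdf-sketch` is a hypothetical name; the handling of y ≤ 0 and of values beyond the upper endpoint -σ/ξ (for ξ < 0) is an assumption of this sketch, not documented behavior of `gpd-cdf`.

```clojure
;; Illustrative sketch only; edge-case handling is assumed, not taken from the library.
(defn gpd-cdf-sketch
  "Returns F(y) for a GPD with shape xi and scale sigma."
  [xi sigma]
  (fn [y]
    (cond
      (<= y 0.0) 0.0                                   ; no mass at or below zero
      (zero? xi) (- 1.0 (Math/exp (- (/ y sigma))))    ; exponential case
      :else (let [z (+ 1.0 (/ (* xi y) sigma))]
              (if (<= z 0.0)
                1.0                                    ; past the upper endpoint when xi < 0
                (- 1.0 (Math/pow z (/ -1.0 xi))))))))

;; With xi = 0.5 and sigma = 1: 1 - (1 + 0.5 * 2)^(-2) = 0.75
((gpd-cdf-sketch 0.5 1.0) 2.0)
;; => 0.75
```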
(gpd-mle exceedances)
(gpd-mle exceedances opts)
Maximum likelihood estimation for the Generalized Pareto Distribution.
Uses Grimshaw's (1993) algorithm with profile likelihood optimization.
Parameters:
  exceedances - typed array of excesses over the threshold (x - u, not raw data); all values must be positive
  opts - optional map with:
    :max-iter - maximum iterations (default 100)
    :tol - convergence tolerance (default 1e-8)
    :xi-min - minimum ξ to search (default -0.5)
    :xi-max - maximum ξ to search (default 2.0)
Returns map with:
  :xi - shape parameter estimate
  :sigma - scale parameter estimate
  :log-likelihood - maximized log-likelihood
  :converged? - whether optimization converged
  :n - number of exceedances
Throws if exceedances is empty or contains non-positive values.
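To make the quantity being maximized concrete, here is a minimal sketch of the GPD log-likelihood in plain Clojure. It is not Grimshaw's algorithm and not the library's code; `gpd-log-likelihood-sketch` is a hypothetical helper, and returning negative infinity for parameters that put an observation outside the support is an assumption of this sketch.

```clojure
;; Log-likelihood of excesses ys under GPD(xi, sigma):
;;   xi = 0:  -n log(sigma) - sum(y) / sigma
;;   xi /= 0: -n log(sigma) - (1 + 1/xi) * sum(log(1 + xi*y/sigma))
(defn gpd-log-likelihood-sketch
  [xi sigma ys]
  (let [n (count ys)]
    (if (zero? xi)
      (- (- (* n (Math/log sigma)))
         (/ (reduce + ys) sigma))
      (let [zs (map #(+ 1.0 (/ (* xi %) sigma)) ys)]
        (if (some #(<= % 0.0) zs)
          Double/NEGATIVE_INFINITY   ; some y lies outside the support
          (- (- (* n (Math/log sigma)))
             (* (+ 1.0 (/ 1.0 xi))
                (reduce + (map #(Math/log %) zs)))))))))

;; Exponential case: -2 * log(1) - (1 + 2) / 1
(gpd-log-likelihood-sketch 0.0 1.0 [1.0 2.0])
;; => -3.0
```

An estimator would search for the (ξ, σ) pair that maximizes this value; a profile-likelihood approach reduces that to a one-dimensional search.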
(gpd-pdf xi sigma)
Probability density function for the Generalized Pareto Distribution.
For exceedances y > 0:
  f(y; ξ, σ) = (1/σ) * (1 + ξy/σ)^(-1/ξ - 1)   if ξ ≠ 0
  f(y; 0, σ) = (1/σ) * exp(-y/σ)               if ξ = 0
Support:
  y ≥ 0             if ξ ≥ 0
  0 ≤ y ≤ -σ/ξ      if ξ < 0
Parameters:
  xi - shape parameter ξ (can be negative, zero, or positive)
  sigma - scale parameter σ (must be positive)
Returns a function f(y) that computes the density at y.
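The density and its support rule can be sketched together in plain Clojure. `gpd-pdf-sketch` is a hypothetical name; returning 0.0 outside the support is an assumption of this sketch rather than documented behavior of `gpd-pdf`.

```clojure
;; Illustrative sketch only; out-of-support behavior is assumed.
(defn gpd-pdf-sketch
  "Returns f(y) for a GPD with shape xi and scale sigma."
  [xi sigma]
  (fn [y]
    (let [z (+ 1.0 (/ (* xi y) sigma))]
      (cond
        (neg? y)   0.0                                   ; below the support
        (zero? xi) (/ (Math/exp (- (/ y sigma))) sigma)  ; exponential case
        (<= z 0.0) 0.0                                   ; beyond -sigma/xi when xi < 0
        :else      (/ (Math/pow z (- (/ -1.0 xi) 1.0)) sigma)))))

;; With xi = 0.5 and sigma = 1: (1 + 0.5 * 2)^(-3) = 0.125
((gpd-pdf-sketch 0.5 1.0) 2.0)
;; => 0.125
```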
(gpd-quantile xi sigma)
Quantile function (inverse CDF) for the Generalized Pareto Distribution.
For probability p ∈ [0, 1] (Q(1) is infinite when ξ ≥ 0):
  Q(p; ξ, σ) = (σ/ξ) * ((1-p)^(-ξ) - 1)   if ξ ≠ 0
  Q(p; 0, σ) = -σ * log(1-p)              if ξ = 0
Parameters:
  xi - shape parameter ξ
  sigma - scale parameter σ (must be positive)
Returns a function Q(p) that computes the p-th quantile.
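A minimal plain-Clojure sketch of the quantile formula follows. `gpd-quantile-sketch` is a hypothetical name, and no validation of p is attempted here.

```clojure
;; Illustrative sketch only, not the library's implementation.
(defn gpd-quantile-sketch
  "Returns Q(p), the inverse CDF of a GPD with shape xi and scale sigma."
  [xi sigma]
  (fn [p]
    (if (zero? xi)
      (- (* sigma (Math/log (- 1.0 p))))                       ; exponential case
      (* (/ sigma xi) (- (Math/pow (- 1.0 p) (- xi)) 1.0)))))

;; With xi = 0.5 and sigma = 1: 2 * (0.25^(-0.5) - 1) = 2, so the
;; 0.75 quantile is 2, consistent with F(2) = 0.75 for the same parameters.
((gpd-quantile-sketch 0.5 1.0) 0.75)
```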
(hill-estimator sorted-samples k-range)
Compute the Hill estimator for tail index across a range of k values.
The Hill estimator for the k largest order statistics is:
  H_k = (1/k) * Σᵢ₌₁ᵏ log(X_{(n-i+1)} / X_{(n-k)})
where X_{(i)} is the i-th order statistic (sorted ascending).
The tail index α is estimated as 1/H_k. A smaller α means a heavier tail; α < 2 implies infinite variance.
Parameters:
  sorted-samples - typed array of samples sorted in ascending order
  k-range - sequence of k values to compute estimates for (each k uses the k largest observations)
Returns vector of maps {:k k :estimate H_k :tail-index (1/H_k)} for each k in k-range where computation is valid.
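The index arithmetic in the formula is easy to get wrong, so here is a minimal plain-Clojure sketch for a single k. `hill-sketch` is a hypothetical name; it assumes strictly positive samples (the logarithm of the ratio is otherwise undefined) and returns nil for an invalid k, which is a choice made for this example only.

```clojure
;; Illustrative sketch only; assumes positive, ascending-sorted samples.
(defn hill-sketch
  "Hill estimate H_k and tail index 1/H_k for one value of k."
  [sorted-samples k]
  (let [xs (vec sorted-samples)
        n  (count xs)]
    (when (and (>= k 1) (< k n))
      (let [x-nk (nth xs (- n k 1))           ; X_(n-k), i.e. 0-based index n-k-1
            top  (subvec xs (- n k))          ; the k largest: X_(n-k+1) .. X_(n)
            h    (/ (reduce + (map #(Math/log (/ % x-nk)) top)) k)]
        {:k k :estimate h :tail-index (/ 1.0 h)}))))

;; For [1 2 4 8] and k = 2: H_2 = (log(4/2) + log(8/2)) / 2 = 1.5 * log 2
(hill-sketch [1.0 2.0 4.0 8.0] 2)
```

Mapping this function over several k values and plotting :tail-index against :k gives the usual Hill plot, where a stable region suggests a reasonable k.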
Notes:
- k must be >= 1 and < n (sample size)
- Returns empty vector if samples has fewer than 2 elements
- Requires sorted input (ascending order)
(hill-estimator-default-k-range n)
Compute default k range for Hill estimator. Uses k from 10 to min(n/2, 500) with step size based on n.
Parameters:
  n - sample size
Returns sequence of k values.
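The bounds described above can be sketched as follows. The documentation does not state how the step is derived from n, so this hypothetical `k-range-sketch` takes the step as an explicit argument; treating the upper bound as inclusive and using integer division for n/2 are also assumptions.

```clojure
;; Illustrative sketch only; the step rule and bound inclusivity are assumed.
(defn k-range-sketch
  "k values from 10 up to min(n/2, 500), spaced by a caller-supplied step."
  [n step]
  (range 10 (inc (min (quot n 2) 500)) step))

(k-range-sketch 100 10)
;; => (10 20 30 40 50)
```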
(mean-residual-life sorted-samples threshold-range)
Compute mean residual life (mean excess) over a range of thresholds.
The mean residual life at threshold u is:
  e(u) = E[X - u | X > u]
For GPD data with ξ < 1, e(u) = (σ + ξu)/(1 - ξ), which is linear in u with slope ξ/(1-ξ). A threshold above which e(u) becomes approximately linear suggests a good choice for peaks-over-threshold (POT) analysis.
Parameters:
  sorted-samples - typed array of samples sorted in ascending order
  threshold-range - sequence of threshold values to evaluate
Returns vector of maps {:threshold u :mrl e(u) :n-exceed count} where n-exceed is the number of observations exceeding u.
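The empirical counterpart of e(u) is simply the average excess among observations above u. A minimal plain-Clojure sketch for one threshold follows; `mrl-sketch` is a hypothetical name, and returning nil when nothing exceeds u is an assumption of this example.

```clojure
;; Illustrative sketch only, not the library's implementation.
(defn mrl-sketch
  "Empirical mean excess over threshold u, with the exceedance count."
  [samples u]
  (let [excesses (keep #(when (> % u) (- % u)) samples)
        n        (count excesses)]
    (when (pos? n)
      {:threshold u
       :mrl       (/ (reduce + excesses) n)
       :n-exceed  n})))

;; Observations above 2.0 are 3.0 and 5.0, with excesses 1.0 and 3.0.
(mrl-sketch [1.0 2.0 3.0 5.0] 2.0)
;; => {:threshold 2.0, :mrl 2.0, :n-exceed 2}
```

Estimates at high thresholds rest on few exceedances, so the :n-exceed count is worth inspecting before reading too much into the right-hand end of an MRL plot.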
(mean-residual-life-default-thresholds sorted-samples)
(mean-residual-life-default-thresholds sorted-samples n-points)
Compute default threshold range for MRL plot. Uses quantiles from 50th to 95th percentile.
Parameters:
  sorted-samples - typed array of samples sorted in ascending order
  n-points - number of threshold points (default 20)
Returns sequence of threshold values.
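One way such a threshold grid could be built is sketched below. The even spacing of probabilities between 0.50 and 0.95 and the floor-index quantile rule are both assumptions of this hypothetical `default-thresholds-sketch`; the library may space or interpolate its quantiles differently.

```clojure
;; Illustrative sketch only; spacing and quantile rule are assumed.
(defn default-thresholds-sketch
  "Empirical quantiles of sorted-samples at n-points probabilities in [0.5, 0.95]."
  ([sorted-samples] (default-thresholds-sketch sorted-samples 20))
  ([sorted-samples n-points]
   (let [xs   (vec sorted-samples)
         n    (count xs)
         step (/ (- 0.95 0.5) (max 1 (dec n-points)))]
     (for [i (range n-points)
           :let [p   (+ 0.5 (* i step))
                 idx (min (dec n) (long (Math/floor (* p n))))]]
       (nth xs idx)))))

;; 20 thresholds starting at the sample median.
(count (default-thresholds-sketch (map double (range 100))))
;; => 20
```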
(tail-ratios percentiles)
Compute tail ratios from percentile values.
Tail ratios indicate how heavy the distribution tail is. Higher ratios suggest heavier tails.
Parameters:
  percentiles - map of percentile values with keys like :p95, :p99, :p999 or numeric keys like 0.95, 0.99, 0.999
Returns map with:
  :p99-p95 - ratio of 99th to 95th percentile
  :p999-p99 - ratio of 99.9th to 99th percentile
  :p999-p95 - ratio of 99.9th to 95th percentile (if all present)
Returns nil for ratios where required percentiles are missing.
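The ratio map and its nil behavior can be sketched in plain Clojure as below. `tail-ratios-sketch` is a hypothetical name that handles only the keyword-key form; the guard against a zero denominator is an assumption added for this example.

```clojure
;; Illustrative sketch only; supports keyword keys (:p95, :p99, :p999) and
;; assumes a zero denominator should yield nil.
(defn tail-ratios-sketch
  [{:keys [p95 p99 p999]}]
  (letfn [(ratio [numer denom]
            (when (and numer denom (not (zero? denom)))
              (/ numer denom)))]
    {:p99-p95  (ratio p99 p95)
     :p999-p99 (ratio p999 p99)
     :p999-p95 (ratio p999 p95)}))

;; :p999 is missing, so both ratios that need it come back as nil.
(tail-ratios-sketch {:p95 10.0 :p99 20.0})
;; => {:p99-p95 2.0, :p999-p99 nil, :p999-p95 nil}
```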