fastmath.stats.bootstrap

Clojure only.

bootstrap
bootstrap-stats
ci-basic
ci-bc
ci-bca
ci-normal
ci-percentile
ci-studentized
ci-t
jackknife
jackknife+

Bootstrap methods and confidence intervals


bootstrap^clj

(bootstrap input)

(bootstrap input params-or-statistic)

(bootstrap input
           statistic
           {:keys [rng samples size method antithetic? dimensions include?
                   multi?]
            :or {samples 500}
            :as params})

Generates bootstrap samples from a given dataset or probabilistic model for resampling purposes.

This function supports both **nonparametric bootstrap** (resampling directly from the data)
and **parametric bootstrap** (resampling from a statistical model estimated from or
provided for the data). It can optionally apply a statistic function to the original
data and each sample, returning summary statistics for the bootstrap distribution.

The primary input can be:

*   A sequence of data values (for nonparametric bootstrap).
*   A map containing:
    *   `:data`: The sequence of data values.
    *   `:model`: An optional model for parametric bootstrap. If not provided,
        a default discrete distribution is built from the data (see `:distribution`
        and `:smoothing`).

The function offers various parameters to control the sampling process and
model generation.

Parameters:

*   `input` (sequence or map): The data source. Can be a sequence of numbers
    or a map containing `:data` and optionally `:model`. For multidimensional
    data (when `:dimensions` is `:multi`), it can be a sequence of sequences.
*   `statistic` (function, optional): A function that takes a sequence of data
    and returns a single numerical value (e.g., `fastmath.stats/mean`,
    `fastmath.stats/median`). If provided, `bootstrap-stats` is called on the
    results.
*   `params` (map, optional): An options map to configure the bootstrap process.
    Keys include:
    *   `:samples` (long, default: 500): The number of bootstrap samples to generate.
    *   `:size` (long, optional): The size of each individual bootstrap sample.
        Defaults to the size of the original data.
    *   `:method` (keyword, optional): Specifies the sampling method.
        *   `nil` (default): Standard random sampling with replacement.
        *   `:jackknife`: Performs leave-one-out jackknife resampling (ignores
            `:samples` and `:size`).
        *   `:jackknife+`: Performs positive jackknife resampling (duplicates
            each observation once; ignores `:samples`).
        *   Other keywords are passed to `fastmath.random/->seq` for sampling
            from a distribution (only relevant if a `:model` is used or built).
    *   `:rng` (random number generator, optional): An instance of a random number
        generator (see `fastmath.random/rng`). A default JVM RNG is used if not provided.
    *   `:smoothing` (keyword, optional): Applies smoothing to the bootstrap process.
        *   `:kde`: Uses Kernel Density Estimation to smooth the empirical distribution
            before sampling. Accepts `:kernel` (default `:gaussian`) and,
            optionally, `:bandwidth` (auto-estimated by default).
        *   `:gaussian`: Adds random noise drawn from N(0, standard error) to each
            resampled value.
    *   `:distribution` (keyword, default: `:real-discrete-distribution`): The type
        of discrete distribution to build automatically from the data if no explicit
        `:model` is provided. Other options include `:integer-discrete-distribution`
        (for integer data) and `:categorical-distribution` (for any data type).
    *   `:dimensions` (keyword, optional): If set to `:multi`, treats the input
        `:data` as a sequence of sequences (multidimensional data). Models are
        built or used separately for each dimension, and samples are generated
        as sequences of vectors.
    *   `:antithetic?` (boolean, default: `false`): If `true`, uses antithetic sampling
        for variance reduction (paired samples are generated as `x` and `1-x` from a uniform
        distribution, then transformed by the inverse CDF of the model). Requires sampling
        from a distribution model.
    *   `:include?` (boolean, default: `false`): If `true`, the original dataset
        is included as one of the samples in the output collection.
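
The two jackknife variants listed under `:method` can be sketched in plain Clojure. This is an illustration of the resampling schemes only, not the library's implementation: the helper names `leave-one-out` and `duplicate-one` are hypothetical, and the order of samples (or the position of the duplicated value) in the library's output may differ.

```clojure
(defn leave-one-out
  "Leave-one-out jackknife: n samples, each omitting one observation."
  [xs]
  (let [v (vec xs)]
    (for [i (range (count v))]
      (into (subvec v 0 i) (subvec v (inc i))))))

(defn duplicate-one
  "Positive jackknife: n samples, each repeating one observation once."
  [xs]
  (let [v (vec xs)]
    (for [i (range (count v))]
      (conj v (v i)))))

(leave-one-out [1 2 3])
;; => ([2 3] [1 3] [1 2])

(duplicate-one [1 2 3])
;; => ([1 2 3 1] [1 2 3 2] [1 2 3 3])
```

Both schemes are deterministic and produce exactly one sample per observation, which is why `:samples` has no effect on them.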

Model for parametric bootstrap:

The `:model` parameter in the input map can be:

*   Any `fastmath.random` distribution object (e.g., `(r/distribution :normal {:mu 0 :sd 1})`).
*   Any 0-arity function that returns a random sample when called.

If `:model` is omitted from the input map, a default discrete distribution
(`:real-discrete-distribution` by default, see `:distribution` param) is built
from the `:data`. Smoothing options (`:smoothing`) apply to this automatically
built model.

When `:dimensions` is `:multi`, `:model` should be a sequence of models, one for
each dimension.

Returns:

*   If `statistic` is provided: A map containing the original input map augmented
    with analysis results from `bootstrap-stats` (e.g., `:t0`, `:ts`, `:bias`,
    `:mean`, `:stddev`).
*   If `statistic` is `nil`: A map containing the original input map augmented
    with the generated bootstrap samples in the `:samples` key. The `:samples`
    value is a collection of sequences, where each inner sequence is one
    bootstrap sample. If `:dimensions` is `:multi`, samples are sequences of vectors.
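
A minimal usage sketch, assuming a fastmath version that ships `fastmath.stats.bootstrap` is on the classpath. The data values and the aliases `stats` and `boot` are illustrative; the calls follow the signatures documented on this page.

```clojure
(require '[fastmath.stats :as stats]
         '[fastmath.stats.bootstrap :as boot])

;; Illustrative data; any sequence of numbers works.
(def data [2.1 3.4 1.9 5.6 4.2 3.8 2.7 4.9])

;; Nonparametric bootstrap of the mean with 1000 resamples.
;; Passing a statistic means bootstrap-stats is applied to the result.
(def result (boot/bootstrap data stats/mean {:samples 1000}))

;; Summary keys added by bootstrap-stats.
(select-keys result [:t0 :bias :stddev])

;; 95% percentile interval, returned as [lower-bound upper-bound t0].
(boot/ci-percentile result 0.05)
```

Because resampling is random, the interval changes from run to run unless an `:rng` is supplied in the options map.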

See also [[jackknife]], [[jackknife+]], [[bootstrap-stats]],
[[ci-normal]], [[ci-basic]], [[ci-percentile]], [[ci-bc]], [[ci-bca]],
[[ci-studentized]], [[ci-t]].

bootstrap-stats^clj

(bootstrap-stats {:keys [data samples] :as input} statistic)

Calculates summary statistics for bootstrap results.

Takes bootstrap output (typically from [[bootstrap]]) and a statistic function,
computes the statistic on the original data (`t0`) and on each bootstrap sample (`ts`),
and derives various descriptive statistics from the distribution of `ts`.

Parameters:

* `boot-data` (map): A map containing:
    * `:data`: The original dataset.
    * `:samples`: A collection of bootstrap samples (e.g., from [[bootstrap]]).
    * (optional) other keys like `:model` from bootstrap generation.
* `statistic` (function): A function that accepts a sequence of data and returns
  a single numerical statistic (e.g., `fastmath.stats/mean`, `fastmath.stats/median`).

Returns a map which is the input `boot-data` augmented with bootstrap analysis results:

* `:statistic`: The statistic function applied.
* `:t0`: The statistic calculated on the original `:data`.
* `:ts`: A sequence of the statistic calculated on each bootstrap sample in `:samples`.
* `:bias`: The estimated bias of the statistic: `mean(:ts) - :t0`.
* `:mean`, `:median`, `:variance`, `:stddev`, `:sem`: Descriptive statistics
  (mean, median, variance, standard deviation, standard error of the mean)
  calculated from the distribution of `:ts`.

This function prepares the results for calculating various bootstrap
confidence intervals (e.g., [[ci-normal]], [[ci-percentile]], etc.).
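
The core of the computation described above can be sketched in plain Clojure. This is a simplified illustration, not the library's code: `summarize` and `mean` are hypothetical helpers, and only `:t0`, `:ts`, `:mean` and `:bias` are derived.

```clojure
(defn mean [xs] (/ (reduce + 0.0 xs) (count xs)))

(defn summarize
  "Adds :t0, :ts, :mean and :bias to a map holding :data and :samples."
  [{:keys [data samples] :as boot-data} statistic]
  (let [t0 (statistic data)            ; statistic on the original data
        ts (mapv statistic samples)    ; statistic on each bootstrap sample
        m  (mean ts)]
    (assoc boot-data :t0 t0 :ts ts :mean m :bias (- m t0))))

(def result
  (summarize {:data    [1.0 2.0 3.0 4.0]
              :samples [[1.0 1.0 2.0 4.0]
                        [2.0 3.0 4.0 4.0]
                        [2.0 2.0 4.0 4.0]]}
             mean))

(select-keys result [:t0 :ts :mean :bias])
;; => {:t0 2.5, :ts [2.0 3.25 3.0], :mean 2.75, :bias 0.25}
```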

ci-basic^clj

(ci-basic boot-data)

(ci-basic boot-data alpha)

(ci-basic {:keys [t0 ts]} alpha estimation-strategy)

Calculates the Basic (also known as reverse percentile or pivotal) bootstrap confidence interval.

This method assumes that the distribution of the bootstrap replicates (`:ts`)
centered around the original statistic (`:t0`) approximates the distribution of
the original statistic (`:t0`) centered around the true parameter value (`t`).

The interval is constructed using the quantiles of the bootstrap replicates (`:ts`)
relative to the original statistic (`:t0`). Specifically, the lower bound is
`2 * :t0 - q_upper` and the upper bound is `2 * :t0 - q_lower`, where `q_lower` and
`q_upper` are the `alpha/2` and `1 - alpha/2` quantiles of `:ts`, respectively.
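
The formula above can be sketched in plain Clojure. The `quantile` helper is an assumption: it uses simple linear interpolation, which may not match the library's `:legacy` strategy, and `basic-interval` is a hypothetical name. A large `alpha` of `0.5` is used only to keep the arithmetic easy to follow.

```clojure
(defn quantile
  "Linear-interpolation quantile (an assumption; the library's :legacy
  strategy may give slightly different values)."
  [xs p]
  (let [v  (vec (sort xs))
        h  (* (dec (count v)) (double p))
        lo (long (Math/floor h))
        hi (min (inc lo) (dec (count v)))]
    (+ (v lo) (* (- h lo) (- (v hi) (v lo))))))

(defn basic-interval
  "Reflects the quantiles of ts around t0: [2*t0 - q_upper, 2*t0 - q_lower, t0]."
  [{:keys [t0 ts]} alpha]
  (let [q-lower (quantile ts (/ alpha 2.0))
        q-upper (quantile ts (- 1.0 (/ alpha 2.0)))]
    [(- (* 2.0 t0) q-upper) (- (* 2.0 t0) q-lower) t0]))

;; q_lower = 2.0 and q_upper = 4.0, so the interval is [7.0 - 4.0, 7.0 - 2.0].
(basic-interval {:t0 3.5 :ts [1.0 2.0 3.0 4.0 5.0]} 0.5)
;; => [3.0 5.0 3.5]
```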

Parameters:

* `boot-data` (map): A map containing bootstrap results, typically from [[bootstrap-stats]].
  Requires keys:
    * `:t0` (double): The statistic calculated on the original data.
    * `:ts` (sequence of numbers): The statistic calculated on each bootstrap sample.
* `alpha` (double, optional): The significance level for the interval.
  Defaults to `0.05` (for a 95% CI). The interval is based on the `alpha/2`
  and `1 - alpha/2` quantiles of the `:ts` distribution.
* `estimation-strategy` (keyword, optional): Specifies the quantile estimation strategy
  used to calculate the quantiles of `:ts`. Defaults to `:legacy`. See [[quantiles]]
  for available options (e.g., `:r1` through `:r9`).

Returns a vector `[lower-bound, upper-bound, t0]`.

* `lower-bound` (double): The lower limit of the confidence interval.
* `upper-bound` (double): The upper limit of the confidence interval.
* `t0` (double): The statistic calculated on the original data (from `boot-data`).

See also [[bootstrap-stats]] for input preparation and other confidence interval methods:
[[ci-normal]], [[ci-percentile]], [[ci-bc]], [[ci-bca]], [[ci-studentized]], [[ci-t]], [[quantiles]].

ci-bc^clj

(ci-bc boot-data)

(ci-bc boot-data alpha)

(ci-bc {:keys [t0 ts]} alpha estimation-strategy)

Calculates the Bias-Corrected (BC) bootstrap confidence interval.

This method adjusts the standard Percentile bootstrap interval ([[ci-percentile]])
to account for potential bias in the statistic's distribution. The correction
is based on the proportion of bootstrap replicates of the statistic (`:ts`)
that are less than the statistic calculated on the original data (`:t0`).

The procedure involves:
1.  Calculating a bias correction factor ($z_0$) based on the empirical cumulative
    distribution function (CDF) of the bootstrap replicates at the point of the
    original statistic ($z_0 = \Phi^{-1}(\text{Proportion of } t^* < t_0)$,
    where $\Phi^{-1}$ is the inverse standard normal CDF).
2.  Shifting the standard normal quantiles corresponding to the desired confidence
    level ($\alpha/2$ and $1-\alpha/2$) by $z_0$.
3.  Finding the corresponding quantiles in the distribution of bootstrap
    replicates (`:ts`) based on these shifted probabilities.
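
In the textbook formulation of the BC interval, the three steps above take the following form, where $B$ is the number of bootstrap replicates, $z_p = \Phi^{-1}(p)$, and the shift applied to the normal quantiles amounts to $2 z_0$. This is the standard form and is given for orientation; the library's implementation may differ in details such as tie handling.

$$
z_0 = \Phi^{-1}\!\left(\frac{\#\{t^*_b < t_0\}}{B}\right), \qquad
p_{\text{lower}} = \Phi\!\left(2 z_0 + z_{\alpha/2}\right), \qquad
p_{\text{upper}} = \Phi\!\left(2 z_0 + z_{1-\alpha/2}\right)
$$

The interval bounds are then the $p_{\text{lower}}$ and $p_{\text{upper}}$ quantiles of `:ts`. When exactly half of the replicates fall below $t_0$, $z_0 = 0$ and the result coincides with the percentile interval.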

Parameters:

* `boot-data` (map): A map containing bootstrap results, typically from [[bootstrap-stats]].
  Requires keys:
    * `:t0` (double): The statistic calculated on the original data.
    * `:ts` (sequence of numbers): The statistic calculated on each bootstrap sample.
* `alpha` (double, optional): The significance level for the interval.
  Defaults to `0.05` (for a 95% CI). The interval is based on quantiles of the
  `:ts` distribution, adjusted by the bias correction factor.
* `estimation-strategy` (keyword, optional): Specifies the quantile estimation strategy
  used to calculate the final interval bounds from `:ts` after applying corrections.
  Defaults to `:legacy`. See [[quantiles]] for available options (e.g., `:r1` through `:r9`).

Returns a vector `[lower-bound, upper-bound, t0]`.

* `lower-bound` (double): The lower limit of the confidence interval.
* `upper-bound` (double): The upper limit of the confidence interval.
* `t0` (double): The statistic calculated on the original data (from `boot-data`).

See also [[bootstrap-stats]] for input preparation and other confidence interval methods:
[[ci-normal]], [[ci-basic]], [[ci-percentile]], [[ci-bca]], [[ci-studentized]], [[ci-t]], [[quantiles]].

ci-bca^clj

(ci-bca boot-data)

(ci-bca boot-data alpha)

(ci-bca {:keys [t0 ts data statistic]} alpha estimation-strategy)

Calculates the Bias-Corrected and Accelerated (BCa) bootstrap confidence interval.

The BCa interval corrects for both bias and skewness in the distribution of
the bootstrap statistic replicates. It is generally considered more accurate
than the percentile and BC intervals, particularly when the bootstrap
distribution is skewed.

The calculation requires two components:
1.  A **bias correction factor** ($z_0$) based on the proportion of bootstrap
    replicates less than the original statistic ($t_0$).
2.  An **acceleration factor** ($a$) which quantifies the rate of change of the
    standard error of the statistic with respect to the true parameter value.

The function uses one of two methods to calculate the acceleration factor:

*   **Jackknife method**: If the input `boot-data` map contains the original
    `:data` and the `:statistic` function used to compute `:t0` and `:ts`,
    the acceleration factor is estimated using the jackknife method (by computing
    the statistic on leave-one-out jackknife samples).
*   **Empirical method**: If `:data` or `:statistic` is missing from `boot-data`,
    the acceleration factor is estimated empirically from the distribution of
    the bootstrap replicates (`:ts`) using its skewness.
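
The jackknife estimate of the acceleration factor can be sketched in plain Clojure using the textbook formula, the sum of cubed deviations divided by six times the sum of squared deviations raised to the power 3/2. This is an illustration under that assumption, not the library's code; `jackknife-acceleration` and `mean` are hypothetical helpers.

```clojure
(defn mean [xs] (/ (reduce + 0.0 xs) (count xs)))

(defn jackknife-acceleration
  "a = sum(d^3) / (6 * sum(d^2)^(3/2)), where d is the mean of the
  leave-one-out statistics minus each leave-one-out statistic."
  [statistic data]
  (let [v      (vec data)
        ;; statistic on each leave-one-out sample
        thetas (for [i (range (count v))]
                 (statistic (into (subvec v 0 i) (subvec v (inc i)))))
        m      (mean thetas)
        ds     (map #(- m %) thetas)
        num    (reduce + (map #(* % % %) ds))
        den    (* 6.0 (Math/pow (reduce + (map #(* % %) ds)) 1.5))]
    (/ num den)))

;; Symmetric data: the cubed deviations cancel, so a = 0.0.
(jackknife-acceleration mean [1.0 2.0 3.0])

;; Right-skewed data: a is positive.
(jackknife-acceleration mean [1.0 2.0 10.0])
```

With $a = 0$ the BCa interval reduces to the BC interval, so the acceleration only matters when the leave-one-out statistics are asymmetric.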

Parameters:

* `boot-data` (map): A map containing bootstrap results, typically from [[bootstrap-stats]].
  Requires keys:
    * `:t0` (double): The statistic calculated on the original data.
    * `:ts` (sequence of numbers): The statistic calculated on each bootstrap sample.
  May optionally include:
    * `:data` (sequence): The original dataset (required for jackknife acceleration).
    * `:statistic` (function): The function used to calculate the statistic (required for jackknife acceleration).
* `alpha` (double, optional): The significance level for the interval.
  Defaults to `0.05` (for a 95% CI). The BCa method uses quantiles of the
  normal distribution and the bootstrap replicates, adjusted by the bias
  and acceleration factors.
* `estimation-strategy` (keyword, optional): Specifies the quantile estimation strategy
  used to calculate the quantiles of the bootstrap replicates (`:ts`) for the
  final interval bounds after applying corrections. Defaults to `:legacy`.
  See [[quantiles]] for available options (e.g., `:r1` through `:r9`).

Returns a vector `[lower-bound, upper-bound, t0]`.

* `lower-bound` (double): The lower limit of the confidence interval.
* `upper-bound` (double): The upper limit of the confidence interval.
* `t0` (double): The statistic calculated on the original data (from `boot-data`).

See also [[bootstrap-stats]] for input preparation and other confidence interval methods:
[[ci-normal]], [[ci-basic]], [[ci-percentile]], [[ci-bc]], [[ci-studentized]], [[ci-t]], [[jackknife]], [[quantiles]].

ci-normal^clj

(ci-normal boot-data)

(ci-normal {:keys [t0 ts stddev bias]} alpha)

Calculates a Normal (Gaussian) approximation bias-corrected confidence interval.

This method assumes the distribution of the bootstrap replicates of the statistic (`:ts`)
is approximately normal. It computes a confidence interval centered around the
mean of the bootstrap statistics, adjusted by the estimated bias (`mean(:ts) - :t0`),
and uses the standard error of the bootstrap statistics for scaling.
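
For orientation, the textbook bias-corrected normal interval has the form below, where $\hat{b}$ is the estimated bias, $\hat{\sigma}$ is the standard deviation of the bootstrap replicates, and $z_{1-\alpha/2}$ is a standard normal quantile. Whether the library applies the correction in exactly this form is an assumption; check the source before relying on the sign convention.

$$
\left[\, t_0 - \hat{b} - z_{1-\alpha/2}\,\hat{\sigma},\;\; t_0 - \hat{b} + z_{1-\alpha/2}\,\hat{\sigma} \,\right],
\qquad \hat{b} = \operatorname{mean}(t^*) - t_0
$$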

Parameters:

* `boot-data` (map): A map containing bootstrap results. Typically produced by [[bootstrap-stats]].
  Requires keys:
    * `:t0` (double): The statistic calculated on the original data.
    * `:ts` (sequence of numbers): The statistic calculated on each bootstrap sample.
  May optionally include pre-calculated `:stddev` (standard deviation of `:ts`)
  and `:bias` for efficiency.
* `alpha` (double, optional): The significance level for the interval.
  Defaults to `0.05` (for a 95% CI). The interval is based on the `alpha/2`
  and `1 - alpha/2` quantiles of the standard normal distribution.

Returns a vector `[lower-bound, upper-bound, t0]`.

* `lower-bound` (double): The lower limit of the confidence interval.
* `upper-bound` (double): The upper limit of the confidence interval.
* `t0` (double): The statistic calculated on the original data (from `boot-data`).

See also [[bootstrap-stats]] for input preparation and other confidence interval methods:
[[ci-basic]], [[ci-percentile]], [[ci-bc]], [[ci-bca]], [[ci-studentized]], [[ci-t]].

ci-percentile^clj

(ci-percentile boot-data)

(ci-percentile boot-data alpha)

(ci-percentile {:keys [t0 ts]} alpha estimation-strategy)

Calculates the Percentile bootstrap confidence interval.

This is the simplest bootstrap confidence interval method. It directly uses
the quantiles of the bootstrap replicates of the statistic (`:ts`) as the
confidence interval bounds.

For a confidence level of `1 - alpha`, the interval is formed by taking the
`alpha/2` and `1 - alpha/2` quantiles of the distribution of bootstrap
replicates (`:ts`).
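
A plain Clojure sketch of the same idea. The `quantile` helper is an assumption: it uses simple linear interpolation, which may not match the library's `:legacy` strategy, and `percentile-interval` is a hypothetical name. A large `alpha` of `0.5` is used only to keep the arithmetic easy to follow.

```clojure
(defn quantile
  "Linear-interpolation quantile (an assumption; the library's :legacy
  strategy may give slightly different values)."
  [xs p]
  (let [v  (vec (sort xs))
        h  (* (dec (count v)) (double p))
        lo (long (Math/floor h))
        hi (min (inc lo) (dec (count v)))]
    (+ (v lo) (* (- h lo) (- (v hi) (v lo))))))

(defn percentile-interval
  "Uses the alpha/2 and 1 - alpha/2 quantiles of ts directly as bounds."
  [{:keys [t0 ts]} alpha]
  [(quantile ts (/ alpha 2.0))
   (quantile ts (- 1.0 (/ alpha 2.0)))
   t0])

(percentile-interval {:t0 3.0 :ts [1.0 2.0 3.0 4.0 5.0]} 0.5)
;; => [2.0 4.0 3.0]
```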

Parameters:

* `boot-data` (map): A map containing bootstrap results, typically from [[bootstrap-stats]].
  Requires keys:
    * `:t0` (double): The statistic calculated on the original data.
    * `:ts` (sequence of numbers): The statistic calculated on each bootstrap sample.
* `alpha` (double, optional): The significance level for the interval.
  Defaults to `0.05` (for a 95% CI). The interval is based on the `alpha/2`
  and `1 - alpha/2` quantiles of the `:ts` distribution.
* `estimation-strategy` (keyword, optional): Specifies the quantile estimation strategy
  used to calculate the quantiles of `:ts`. Defaults to `:legacy`. See [[quantiles]]
  for available options (e.g., `:r1` through `:r9`).

Returns a vector `[lower-bound, upper-bound, t0]`.

* `lower-bound` (double): The `alpha/2` quantile of `:ts`.
* `upper-bound` (double): The `1 - alpha/2` quantile of `:ts`.
* `t0` (double): The statistic calculated on the original data (from `boot-data`).

See also [[bootstrap-stats]] for input preparation and other confidence interval methods:
[[ci-normal]], [[ci-basic]], [[ci-bc]], [[ci-bca]], [[ci-studentized]], [[ci-t]], [[quantiles]].

source raw docstring

ci-studentized^clj

(ci-studentized boot-data)

(ci-studentized boot-data alpha)

(ci-studentized {:keys [t0 ts data samples]} alpha estimation-strategy)

Calculates the Studentized (or Bootstrap-t) confidence interval.

This method is based on the distribution of the studentized pivotal quantity (statistic(sample) - statistic(data)) / standard_error(statistic(sample)). It estimates the quantiles of this distribution from the bootstrap replicates, then uses them to construct a confidence interval around the statistic calculated on the original data (:t0). The interval is scaled by the standard error estimated from the original data (stddev(:data)).

Parameters:

boot-data (map): A map containing bootstrap results and necessary inputs. This map typically comes from bootstrap-stats and is augmented with :data and :samples from the original bootstrap call if they are not already present. Requires the following keys:
- :t0 (double): The statistic calculated on the original data.
- :ts (sequence of numbers): The statistic calculated on each bootstrap sample.
- :data (sequence): The original dataset used for bootstrapping. Needed to estimate the standard error of the statistic for scaling the interval.
- :samples (collection of sequences): The collection of bootstrap samples. Needed to calculate the standard error of the statistic for each bootstrap sample.
alpha (double, optional): The significance level for the interval. Defaults to 0.05 (for a 95% CI). The interval is based on the alpha/2 and 1 - alpha/2 quantiles of the studentized bootstrap replicates.
estimation-strategy (keyword, optional): Specifies the quantile estimation strategy used to calculate the quantiles of the studentized replicates. Defaults to :legacy. See quantiles for available options (e.g., :r1 through :r9).

Returns a vector [lower-bound, upper-bound, t0].

lower-bound (double): The lower limit of the confidence interval.
upper-bound (double): The upper limit of the confidence interval.
t0 (double): The statistic calculated on the original data (from boot-data).

See also bootstrap-stats for input preparation and other confidence interval methods: ci-normal, ci-basic, ci-percentile, ci-bc, ci-bca, ci-t, stats/stddev, stats/quantiles.
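The pivotal quantity and the scaling step can be sketched in plain Clojure. This is a minimal illustration under stated assumptions, not the library's code: `mean*`, `stddev*`, `studentized-replicates`, and `studentized-ci-sketch` are hypothetical names, the sample standard deviation (n - 1 denominator) is assumed, and the interval is assembled in the textbook bootstrap-t form, which may differ in detail from ci-studentized. The two quantiles of the studentized replicates are passed in rather than estimated.

```clojure
(defn mean*
  "Hypothetical helper: arithmetic mean."
  [xs]
  (/ (reduce + 0.0 xs) (count xs)))

(defn stddev*
  "Hypothetical helper: sample standard deviation (n - 1 denominator)."
  [xs]
  (let [m  (mean* xs)
        ss (reduce + 0.0 (map #(let [d (- % m)] (* d d)) xs))]
    (Math/sqrt (/ ss (dec (count xs))))))

(defn studentized-replicates
  "z_i = (t_i - t0) / stddev(sample_i), one value per bootstrap sample."
  [{:keys [t0 ts samples]}]
  (map (fn [t s] (/ (- t t0) (stddev* s))) ts samples))

(defn studentized-ci-sketch
  "Textbook bootstrap-t interval. q-lo and q-hi are assumed to be the
  alpha/2 and 1 - alpha/2 quantiles of the studentized replicates."
  [{:keys [t0 data]} q-lo q-hi]
  (let [s (stddev* data)]
    [(- t0 (* q-hi s)) (- t0 (* q-lo s)) t0]))

;; A tiny hand-built boot-data map (statistic: mean)
(def boot-data
  {:t0 3.0
   :ts [2.0 4.0]
   :data [1 2 3 4 5]
   :samples [[1 2 3] [3 4 5]]})

(studentized-replicates boot-data)
(studentized-ci-sketch boot-data -1.0 1.0)
```

The upper quantile feeds the lower bound and vice versa, which is what lets the interval be asymmetric around :t0 when the studentized replicates are skewed.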


ci-t^clj

(ci-t boot-data)

(ci-t {:keys [t0 ts stddev]} alpha)

Calculates a confidence interval based on Student's t-distribution, centered at the original statistic value.

This method constructs a confidence interval centered at the statistic calculated on the original data (:t0). The width of the interval is determined by the standard deviation of the bootstrap replicates (:ts), scaled by a critical value from a Student's t-distribution. The degrees of freedom for the t-distribution are based on the number of bootstrap replicates (count(:ts) - 1).

This interval does not explicitly use the Studentized bootstrap pivotal quantity. Instead, it applies a standard t-interval structure using components derived from the bootstrap results and the original data.

Parameters:

boot-data (map): A map containing bootstrap results, typically from bootstrap-stats. Requires keys:
- :t0 (double): The statistic calculated on the original data.
- :ts (sequence of numbers): The statistic calculated on each bootstrap sample.
May optionally include a pre-calculated :stddev (standard deviation of :ts) for efficiency.
alpha (double, optional): The significance level for the interval. Defaults to 0.05 (for a 95% CI). The interval is based on the alpha/2 and 1 - alpha/2 quantiles of the Student's t-distribution with count(:ts) - 1 degrees of freedom.

Returns a vector [lower-bound, upper-bound, t0].

lower-bound (double): The lower limit of the confidence interval.
upper-bound (double): The upper limit of the confidence interval.
t0 (double): The statistic calculated on the original data (from boot-data).

See also bootstrap-stats for input preparation and other confidence interval methods: ci-normal, ci-basic, ci-percentile, ci-bc, ci-bca, ci-studentized.
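Read as a formula, the description above corresponds to a symmetric interval around t0. The notation is introduced here for illustration and is not part of the library: B stands for count(:ts), s_{ts} for the standard deviation of :ts, and t_{p, \nu} for the p-quantile of Student's t-distribution with \nu degrees of freedom.

```latex
\left[\; t_0 - t_{1-\alpha/2,\,B-1}\, s_{ts},\;\; t_0 + t_{1-\alpha/2,\,B-1}\, s_{ts} \;\right]
```

Because the t-distribution is symmetric, the alpha/2 quantile is the negative of the 1 - alpha/2 quantile, so a single critical value suffices.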


jackknife^clj

(jackknife vs)

Generates a set of samples from a given sequence using the jackknife leave-one-out method.

For an input sequence vs of size n, this method creates n samples. Each sample is formed by removing a single observation from the original sequence.

Parameters:

vs (sequence): The input data sequence.

Returns a sequence of sequences. The i-th inner sequence is vs with the i-th element removed.

These samples are commonly used for estimating the bias and standard error of a statistic (e.g., via bootstrap-stats).
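A leave-one-out construction can be sketched in plain Clojure. `jackknife-sketch` is a hypothetical name used for illustration; the library's own implementation may build the samples differently.

```clojure
(defn jackknife-sketch
  "Leave-one-out samples: the i-th result omits the i-th element of vs."
  [vs]
  (let [v (vec vs)]
    (for [i (range (count v))]
      ;; everything before index i, followed by everything after it
      (concat (subvec v 0 i) (subvec v (inc i))))))

(jackknife-sketch [1 2 3])
;; => ((2 3) (1 3) (1 2))
```

Each of the n samples has size n - 1.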


jackknife+^clj

(jackknife+ vs)

Generates a set of samples from a sequence using the 'jackknife positive' method.

For an input sequence vs of size n, this method creates n samples. Each sample is the original sequence plus one additional copy of a single observation, so each sample has size n+1.

Parameters:

vs (sequence): The input data sequence.

Returns a sequence of sequences. The i-th inner sequence is vs with an additional copy of the i-th element of vs.

This method is used in specific resampling techniques for estimating bias and variance of a statistic.
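The duplication step can be sketched in plain Clojure. `jackknife+-sketch` is a hypothetical name, and appending the extra copy at the end of the sample is an assumption made here; the documentation does not say where the library places it.

```clojure
(defn jackknife+-sketch
  "The i-th result is vs with one extra copy of its i-th element.
  Assumption: the copy is appended at the end."
  [vs]
  (let [v (vec vs)]
    (map #(conj v %) v)))

(jackknife+-sketch [1 2 3])
;; => ([1 2 3 1] [1 2 3 2] [1 2 3 3])
```

For order-insensitive statistics such as the mean or median, the position of the duplicated element does not affect the result.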
