Statistics functions.

* Descriptive statistics
* Correlation / covariance
* Outliers
* Confidence intervals
* Extents
* Effect size
* Student's t-test
* Histogram
* ACF/PACF
* Bootstrap
* Binary measures

All functions are backed by the Apache Commons Math or SMILE libraries. All of them work with Clojure sequences.

### Descriptive statistics

The all-in-one function [[stats-map]] returns a map containing:

* `:Size` - size of the samples, `(count ...)`
* `:Min` - [[minimum]] value
* `:Max` - [[maximum]] value
* `:Range` - range of values
* `:Mean` - [[mean]]/average
* `:Median` - [[median]], see also: [[median-3]]
* `:Mode` - [[mode]], see also: [[modes]]
* `:Q1` - first quartile, use: [[percentile]], [[quartile]]
* `:Q3` - third quartile, use: [[percentile]], [[quartile]]
* `:Total` - [[sum]] of all samples
* `:SD` - sample standard deviation
* `:Variance` - variance
* `:MAD` - [[median-absolute-deviation]]
* `:SEM` - standard error of the mean
* `:LAV` - lower adjacent value, use: [[adjacent-values]]
* `:UAV` - upper adjacent value, use: [[adjacent-values]]
* `:IQR` - interquartile range, `(- q3 q1)`
* `:LOF` - lower outer fence, `(- q1 (* 3.0 iqr))`
* `:UOF` - upper outer fence, `(+ q3 (* 3.0 iqr))`
* `:LIF` - lower inner fence, `(- q1 (* 1.5 iqr))`
* `:UIF` - upper inner fence, `(+ q3 (* 1.5 iqr))`
* `:Outliers` - list of [[outliers]], samples which lie outside the inner fences (`:LIF`, `:UIF`)
* `:Kurtosis` - [[kurtosis]]
* `:Skewness` - [[skewness]]

Note: [[percentile]] and [[quartile]] support 10 different interpolation strategies. See [docs](http://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/stat/descriptive/rank/Percentile.html)
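The quartile and fence definitions in the list above can be illustrated with a short plain-Clojure sketch. It does not call the library. The helper `quantile-r7` is hypothetical and uses R-7 style linear interpolation. The library's default `:legacy` estimator can produce different quartiles, so the numbers here are illustrative only.

```clojure
;; Hypothetical helper: R-7 linear-interpolation quantile on sorted data.
;; This is NOT the library's default :legacy estimator.
(defn quantile-r7 [vs q]
  (let [xs (vec (sort vs))
        h  (* (dec (count xs)) q)          ; fractional 0-based position
        lo (long (Math/floor h))
        hi (min (dec (count xs)) (inc lo))]
    (+ (xs lo) (* (- h lo) (- (xs hi) (xs lo))))))

;; Quartiles, IQR, and the inner (1.5 * IQR) and outer (3.0 * IQR) fences.
(defn fences [vs]
  (let [q1  (quantile-r7 vs 0.25)
        q3  (quantile-r7 vs 0.75)
        iqr (- q3 q1)]
    {:Q1 q1 :Q3 q3 :IQR iqr
     :LIF (- q1 (* 1.5 iqr)) :UIF (+ q3 (* 1.5 iqr))
     :LOF (- q1 (* 3.0 iqr)) :UOF (+ q3 (* 3.0 iqr))}))
```

For `[1 2 3 4 5 6 7 8 9 100]` this sketch gives Q1 = 3.25, Q3 = 7.75, IQR = 4.5, inner fences -3.5 and 14.5, and outer fences -10.25 and 21.25.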
(acf data)
(acf data lags)
Calculate the ACF (autocorrelation function) for a given number of lags or for a list of lags. If `lags` is omitted, the function returns the maximum possible number of lags. See also [[acf-ci]], [[pacf]], [[pacf-ci]]
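Below is a minimal sketch of the textbook sample autocorrelation. Each lag-`k` sum of products of deviations from the mean is divided by the lag-0 sum of squares. The name `acf-sketch` is hypothetical. The library's exact normalization is an assumption and has not been checked against this code.

```clojure
(defn acf-sketch
  "Sample autocorrelation of xs for every lag in lags (hypothetical helper)."
  [xs lags]
  (let [n     (count xs)
        mu    (/ (reduce + xs) (double n))
        d     (mapv #(- % mu) xs)                    ; deviations from the mean
        denom (reduce + (map #(* % %) d))]           ; lag-0 sum of squares
    (for [k lags]
      ;; pair d[t] with d[t+k]; map stops at the shorter sequence
      (/ (reduce + (map * d (drop k d))) denom))))
```

For `[1 2 3 4 5]` and lags `[0 1 2]` the sketch returns `(1.0 0.4 -0.1)`. Lag 0 is always 1.0.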
(acf-ci data lags)
(acf-ci data lags alpha)
[[acf]] with added confidence interval data. `:cis` contains a list of calculated confidence intervals, one for every lag.
(adjacent-values vs)
(adjacent-values vs estimation-strategy)
(adjacent-values vs q1 q3)
Lower and upper adjacent values (LAV and UAV).

Let Q1 be the 25th percentile and Q3 the 75th percentile. IQR is `(- Q3 Q1)`.

* LAV is the smallest value which is greater than or equal to the LIF = `(- Q1 (* 1.5 IQR))`.
* UAV is the largest value which is lower than or equal to the UIF = `(+ Q3 (* 1.5 IQR))`.
* The third value is the median of the samples.

The optional `estimation-strategy` argument changes the estimation type used for quantile calculations. See [[estimation-strategies]].
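The definitions above can be sketched for the `(adjacent-values vs q1 q3)` arity, where the quartiles are supplied by the caller. This is a hypothetical helper, not the library's code. Returning `[LAV UAV median]` follows the description above.

```clojure
(defn adjacent-values-sketch
  "Returns [LAV UAV median] for the given quartiles q1 and q3."
  [vs q1 q3]
  (let [iqr (- q3 q1)
        lif (- q1 (* 1.5 iqr))                   ; lower inner fence
        uif (+ q3 (* 1.5 iqr))                   ; upper inner fence
        xs  (vec (sort vs))
        n   (count xs)
        med (if (odd? n)
              (xs (quot n 2))
              (/ (+ (xs (dec (quot n 2))) (xs (quot n 2))) 2.0))]
    [(first (filter #(>= % lif) xs))             ; smallest value >= LIF
     (last (filter #(<= % uif) xs))              ; largest value <= UIF
     med]))
```

For `[1 2 3 4 5 6 7 8 9 100]` with Q1 = 3.25 and Q3 = 7.75, the fences are -3.5 and 14.5, so the result is `[1 9 5.5]`.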
(ameasure group1 group2)
Vargha-Delaney A measure for two populations, `group1` and `group2`.
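The Vargha-Delaney A measure is the probability that a value drawn from the first group exceeds one drawn from the second, with ties counted as one half. This sketch assumes that direction (`group1` over `group2`). It compares all pairs, which is quadratic but easy to check. It is a related fact that Cliff's delta (see [[cliffs-delta]]) equals `2A - 1`. The name `ameasure-sketch` is hypothetical.

```clojure
(defn ameasure-sketch
  "A = P(X > Y) + 0.5 * P(X = Y) over all pairs (x from group1, y from group2)."
  [group1 group2]
  (let [cmps (for [x group1 y group2] (compare x y))
        gt   (count (filter pos? cmps))
        eq   (count (filter zero? cmps))]
    (/ (+ gt (* 0.5 eq)) (double (count cmps)))))
```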
(binary-measures truth prediction)
(binary-measures truth prediction true-value)
Subset of binary measures. See [[binary-measures-all]]. The following keys are returned: `[:tp :tn :fp :fn :accuracy :fdr :f-measure :fall-out :precision :recall :sensitivity :specificity :prevalance]`
(binary-measures-all truth prediction)
(binary-measures-all truth prediction true-value)
Collection of binary measures.

* `truth` - list of ground truth values
* `prediction` - list of predicted values
* `true-value` - optional, what is treated as true in `truth` and `prediction`

`true-value` can be one of:

* `nil` - values are treated as booleans
* any sequence - values from the sequence will be treated as `true`
* map - conversion will be done according to the provided map (if there is no corresponding key, the value is treated as `false`)

https://en.wikipedia.org/wiki/Precision_and_recall
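This sketch shows how the confusion-matrix counts and a few of the derived ratios relate. It covers only the `true-value = nil` case, where values are treated as booleans. The function name is hypothetical, and only a handful of the keys listed above are computed.

```clojure
(defn binary-measures-sketch
  "Confusion counts plus a few derived ratios; values are coerced to booleans."
  [truth prediction]
  (let [freq  (frequencies (map vector (map boolean truth) (map boolean prediction)))
        tp    (get freq [true true] 0)
        tn    (get freq [false false] 0)
        fp    (get freq [false true] 0)
        fneg  (get freq [true false] 0)      ; avoid shadowing clojure.core/fn
        total (+ tp tn fp fneg)]
    {:tp tp :tn tn :fp fp :fn fneg
     :accuracy    (/ (+ tp tn) (double total))
     :precision   (/ tp (double (+ tp fp)))
     :recall      (/ tp (double (+ tp fneg)))   ; a.k.a. sensitivity
     :specificity (/ tn (double (+ tn fp)))}))
```

With truth `[true true false false true]` and prediction `[true false false true true]`, there are 2 true positives, 1 true negative, 1 false positive and 1 false negative. Accuracy is therefore 0.6.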
(bootstrap vs)
(bootstrap vs samples)
(bootstrap vs samples size)
Generate a set of samples of a given size from the provided data. `samples` (the number of samples) defaults to 50 and `size` defaults to 1000.
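Bootstrap resampling draws elements with replacement. The minimal sketch below uses a seeded `java.util.Random` so the output is reproducible. The function name and the explicit `seed` argument are hypothetical. The library's own random source and argument order are not assumed here.

```clojure
(defn bootstrap-sketch
  "Draws `samples` resamples of `size` elements each, with replacement."
  [vs samples size seed]
  (let [v   (vec vs)
        n   (count v)
        rng (java.util.Random. (long seed))]
    (vec (repeatedly samples
                     (fn [] (vec (repeatedly size #(v (.nextInt rng n)))))))))
```

Each resample contains only elements of the input, possibly repeated.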
(bootstrap-ci vs)
(bootstrap-ci vs alpha)
(bootstrap-ci vs alpha samples)
(bootstrap-ci vs alpha samples stat-fn)
Bootstrap method to calculate a confidence interval. `alpha` defaults to 0.98 and `samples` to 1000. The last parameter is the statistical function to measure (default: [[mean]]). Returns the confidence interval and the value of the statistical function.
(ci vs)
(ci vs alpha)
Confidence interval for the given data based on Student's t distribution. `alpha` defaults to 0.98. The last returned value is the mean.
(cliffs-delta group1 group2)
Cliff's delta effect size for ordinal data.
(cohens-d group1 group2)
Cohen's d effect size for two groups, using the square root of the mean of the variances as the pooled SD.
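Following the description above, the pooled SD is `sqrt((var1 + var2) / 2)` with sample (n - 1) variances. The sign convention `mean1 - mean2` is an assumption. The helpers `mean*`, `variance*` and `cohens-d-sketch` are hypothetical.

```clojure
(defn mean* [xs] (/ (reduce + xs) (double (count xs))))

;; Sample variance with the n - 1 divisor.
(defn variance* [xs]
  (let [m (mean* xs)]
    (/ (reduce + (map #(let [d (- % m)] (* d d)) xs))
       (dec (count xs)))))

(defn cohens-d-sketch
  "(mean1 - mean2) / sqrt of the mean of the two sample variances."
  [group1 group2]
  (/ (- (mean* group1) (mean* group2))
     (Math/sqrt (/ (+ (variance* group1) (variance* group2)) 2.0))))
```

For `[1 2 3 4 5]` and `[3 4 5 6 7]`, both variances are 2.5, so d = -2 / sqrt(2.5), about -1.265.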
(cohens-d-corrected group1 group2)
Cohen's d corrected for small group size
(cohens-f2 group1 group2)
(cohens-f2 type group1 group2)
Cohen's f2, by default based on `eta-sq`. Possible `type` values are: `:eta` (default), `:omega` and `:epsilon`.
(cohens-q r1 r2)
(cohens-q group1 group2a group2b)
(cohens-q group1a group2a group1b group2b)
Comparison of two correlations.

Arity:

* 2 - compare two correlation values
* 3 - compare the correlation of `group1` and `group2a` with the correlation of `group1` and `group2b`
* 4 - compare the correlation of the first two arguments with the correlation of the last two arguments
(cohens-w group1 group2)
Cohen's W effect size for discrete data.
(correlation vs1 vs2)
Correlation of two sequences.
(count= vs1 vs2)
Count equal values in both seqs.
(covariance vs1 vs2)
Covariance of two sequences.
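This sketch shows sample covariance (n - 1 divisor) and Pearson correlation, which is covariance divided by the product of the standard deviations. The n - 1 divisor is an assumption about what [[covariance]] uses. Both function names are hypothetical.

```clojure
(defn covariance-sketch
  "Sample covariance of two equally long sequences (n - 1 divisor)."
  [vs1 vs2]
  (let [n  (count vs1)
        m1 (/ (reduce + vs1) (double n))
        m2 (/ (reduce + vs2) (double n))]
    (/ (reduce + (map (fn [x y] (* (- x m1) (- y m2))) vs1 vs2))
       (dec n))))

(defn pearson-sketch
  "Pearson correlation: cov(x, y) / sqrt(var(x) * var(y))."
  [vs1 vs2]
  (/ (covariance-sketch vs1 vs2)
     (Math/sqrt (* (covariance-sketch vs1 vs1) (covariance-sketch vs2 vs2)))))
```

For `[1 2 3 4 5]` and `[2 4 6 8 10]` the covariance is 5.0 and the correlation is 1.0.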
(covariance-matrix vss)
Generate a covariance matrix from a sequence of sequences (row order).
(cramers-v group1 group2)
Cramer's V effect size for discrete data.
(cramers-v-corrected group1 group2)
Corrected Cramer's V
(estimate-bins vs)
(estimate-bins vs bins-or-estimate-method)
Estimate the number of bins for a histogram. Possible methods are: `:sqrt`, `:sturges`, `:rice`, `:doane`, `:scott` and `:freedman-diaconis` (default).
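Three of the methods have simple closed forms:

* square root: `ceil(sqrt n)`
* Sturges: `ceil(log2 n) + 1`
* Rice: `ceil(2 * n^(1/3))`

The sketch below computes these textbook rules. The library's exact rounding is an assumption. The data-dependent rules (`:doane`, `:scott`, and the default Freedman-Diaconis rule, whose bin width is `2 * IQR * n^(-1/3)`) are left out. The function name is hypothetical.

```clojure
(defn estimate-bins-sketch
  "Textbook bin-count rules for n samples."
  [n]
  (let [n (double n)]
    {:sqrt    (long (Math/ceil (Math/sqrt n)))
     :sturges (inc (long (Math/ceil (/ (Math/log n) (Math/log 2.0)))))
     :rice    (long (Math/ceil (* 2.0 (Math/cbrt n))))}))
```

For 100 samples this gives 10, 8 and 10 bins respectively.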
(extent vs)
Return the extent (min, max, mean) of a sequence.
(geomean vs)
Geometric mean, for positive values only.
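The positivity requirement follows from computing the geometric mean as the exponential of the mean logarithm, since the logarithm is undefined for non-positive values. This is a minimal sketch with a hypothetical name.

```clojure
(defn geomean-sketch
  "exp(mean(log x)); all values must be positive."
  [vs]
  (Math/exp (/ (reduce + (map #(Math/log (double %)) vs))
               (count vs))))
```

`(geomean-sketch [2 8])` is 4 up to floating-point rounding.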
(glass-delta group1 group2)
Glass's delta effect size for two groups
(hedges-g group1 group2)
Hedges's g effect size for two groups
(hedges-g* group1 group2)
Less biased Hedges's g effect size for two groups
(hedges-g-corrected group1 group2)
Cohen's d corrected for small group size
(histogram vs)
(histogram vs bins-or-estimate-method)
(histogram vs bins [mn mx])
Calculate a histogram. Returns a map with keys:

* `:size` - number of bins
* `:step` - distance between bins
* `:bins` - list of pairs of range lower value and number of hits
* `:min` - min value
* `:max` - max value
* `:samples` - number of used samples

For estimation methods check [[estimate-bins]].
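The sketch below builds an equal-width histogram with the same map shape, with `:bins` as `[lower-bound count]` pairs. It assumes a fixed bin count and that the maximum value falls into the last bin. It also assumes the data has at least two distinct values. How the library handles bin edges is not assumed. The function name is hypothetical.

```clojure
(defn histogram-sketch
  "Equal-width histogram over [min, max] with `bins` bins."
  [vs bins]
  (let [mn     (double (apply min vs))
        mx     (double (apply max vs))
        step   (/ (- mx mn) bins)
        ;; bin index; the maximum value is clamped into the last bin
        idx    (fn [x] (min (dec bins) (long (Math/floor (/ (- x mn) step)))))
        counts (frequencies (map idx vs))]
    {:size bins :step step :min mn :max mx :samples (count vs)
     :bins (mapv (fn [i] [(+ mn (* i step)) (get counts i 0)]) (range bins))}))
```

For `[1 2 2 3 4]` and 3 bins, the step is 1.0 and `:bins` is `[[1.0 1] [2.0 2] [3.0 2]]`.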
(iqr vs)
(iqr vs estimation-strategy)
Interquartile range.
(jensen-shannon-divergence vs1 vs2)
Jensen-Shannon divergence of two sequences.
(kendall-correlation vs1 vs2)
Kendall's correlation of two sequences.
(kullback-leibler-divergence vs1 vs2)
Kullback-Leibler divergence of two sequences.
(kurtosis vs)
(kurtosis vs typ)
Calculate kurtosis from a sequence. Possible types: `:G2` (default), `:g2`, `:excess` or `:kurt`.
Alias for [[median-absolute-deviation]]
(mad-extent vs)
Return the median -/+ [[median-absolute-deviation]], together with the median itself.
(median vs)
Calculate median of `vs`. See [[median-3]].
(median-3 a b c)
Median of three values. See [[median]].
(median-absolute-deviation vs)
Calculate the MAD (median absolute deviation).
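The MAD is the median of the absolute deviations from the median. This sketch returns the raw value without any consistency constant such as 1.4826. Whether the library applies such a constant is not assumed. Both helper names are hypothetical.

```clojure
(defn median-sketch [vs]
  (let [xs (vec (sort vs)) n (count xs) m (quot n 2)]
    (if (odd? n)
      (double (xs m))
      (/ (+ (xs (dec m)) (xs m)) 2.0))))

(defn mad-sketch
  "median(|x - median(vs)|), unscaled."
  [vs]
  (let [med (median-sketch vs)]
    (median-sketch (map #(Math/abs (- % med)) vs))))
```

For `[1 1 2 2 4 6 9]` the median is 2. The absolute deviations sort to `[0 0 1 1 2 4 7]`, so the MAD is 1.0.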
(mode vs)
Find the value that appears most often in a dataset `vs`. See also [[modes]].
(modes vs)
Find the values that appear most often in a dataset `vs`. Returns a sequence of all the most frequent values, in increasing order. See also [[mode]].
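The behaviour described above (all values tied for the highest count, sorted ascending) can be sketched with `frequencies`. The function name is hypothetical.

```clojure
(defn modes-sketch
  "All values sharing the highest frequency, in increasing order."
  [vs]
  (let [freqs (frequencies vs)
        top   (apply max (vals freqs))]
    (sort (keep (fn [[v c]] (when (= c top) v)) freqs))))
```

`(modes-sketch [1 2 2 3 3])` returns `(2 3)`.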
(moment vs)
(moment vs order)
(moment vs
order
{:keys [absolute? center mean?]
:or {absolute? false center nil mean? true}})
Calculate a moment (central and/or absolute) of the given order (default: 2).

Additional parameters as a map:

* `:absolute?` - use absolute values of the differences (default: `false`)
* `:mean?` - return the mean (proper moment) or just the sum of differences (default: `true`)
* `:center` - the central value (default: `nil`, i.e. the mean)
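The option map can be read as follows:

* take each difference `x - center`
* optionally take its absolute value
* raise it to `order`
* sum, and divide by n when `:mean?` is true

This is a minimal sketch with a hypothetical name, mirroring the documented options rather than the library's code.

```clojure
(defn moment-sketch
  ([vs order] (moment-sketch vs order {}))
  ([vs order {:keys [absolute? center mean?] :or {absolute? false mean? true}}]
   (let [n (count vs)
         c (or center (/ (reduce + vs) (double n)))   ; default center: the mean
         f (if absolute? #(Math/abs (double %)) identity)
         s (reduce + (map #(Math/pow (f (- % c)) order) vs))]
     (if mean? (/ s n) s))))
```

For `[1 2 3 4 5]` the second central moment is 2.0 and the third is 0.0. With `{:mean? false}` the second-order sum is 10.0.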
(outliers vs)
(outliers vs estimation-strategy)
(outliers vs q1 q3)
Find outliers defined as values outside the inner fences.

Let Q1 be the 25th percentile and Q3 the 75th percentile. IQR is `(- Q3 Q1)`.

* LIF (Lower Inner Fence) equals `(- Q1 (* 1.5 IQR))`.
* UIF (Upper Inner Fence) equals `(+ Q3 (* 1.5 IQR))`.

Returns a sequence.

The optional `estimation-strategy` argument changes the estimation type used for quantile calculations. See [[estimation-strategies]].
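For the `(outliers vs q1 q3)` arity, the filtering step with the `1.5 * IQR` fences given in the formulas above can be sketched directly. The function name is hypothetical.

```clojure
(defn outliers-sketch
  "Values strictly below Q1 - 1.5*IQR or strictly above Q3 + 1.5*IQR."
  [vs q1 q3]
  (let [iqr (- q3 q1)
        lf  (- q1 (* 1.5 iqr))
        uf  (+ q3 (* 1.5 iqr))]
    (filter #(or (< % lf) (> % uf)) vs)))
```

With Q1 = 3.25 and Q3 = 7.75, the fences are -3.5 and 14.5, so only 100 is reported for `[1 2 3 4 5 6 7 8 9 100]`.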
(pacf data)
(pacf data lags)
Calculate the PACF (partial autocorrelation function) for a given number of lags. If `lags` is omitted, the function returns the maximum possible number of lags. `pacf` also returns lag `0` (which is `0.0`). See also [[acf]], [[acf-ci]], [[pacf-ci]]
(pacf-ci data lags)
(pacf-ci data lags alpha)
[[pacf]] with added confidence interval data.
(pearson-correlation vs1 vs2)
Pearson's correlation of two sequences.
(pearson-r group1 group2)
Pearson `r` correlation coefficient
(percentile vs p)
(percentile vs p estimation-strategy)
Calculate a percentile of `vs`. Percentile `p` is from the range 0-100. See [docs](http://commons.apache.org/proper/commons-math/javadocs/api-3.4/org/apache/commons/math3/stat/descriptive/rank/Percentile.html). Optionally you can provide `estimation-strategy` to change the interpolation method used for selecting values. Default is `:legacy`. See more [here](http://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/stat/descriptive/rank/Percentile.EstimationType.html). See also [[quantile]].
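The sketch below follows our reading of the Commons Math `LEGACY` estimation type, which is the default. It computes the position `p * (n + 1) / 100` in the sorted data and interpolates linearly between its neighbours. Positions below 1 clamp to the minimum, and positions at or above n clamp to the maximum. The function name is hypothetical, and the sketch has not been checked against the library's output.

```clojure
(defn percentile-legacy-sketch
  "Legacy-style percentile, p in 0-100."
  [vs p]
  (let [xs   (vec (sort vs))
        n    (count xs)
        pos  (/ (* p (inc n)) 100.0)     ; 1-based fractional position
        fpos (Math/floor pos)
        i    (long fpos)]
    (cond
      (< pos 1.0) (double (xs 0))
      (>= pos n)  (double (xs (dec n)))
      :else (let [lower (xs (dec i))
                  upper (xs i)]
              (+ lower (* (- pos fpos) (- upper lower)))))))
```

For `[1 2 3 4 5 6 7 8 9 100]` this gives 2.75 at p = 25, 5.5 at p = 50, and 8.25 at p = 75.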
(percentile-bc-extent vs)
(percentile-bc-extent vs p)
(percentile-bc-extent vs p1 p2)
(percentile-bc-extent vs p1 p2 estimation-strategy)
Return the bias corrected percentile range and mean for bootstrap samples. See https://projecteuclid.org/euclid.ss/1032280214

`p` - calculates the extent of bias corrected `p` and `100-p` (default: `p=2.5`)

Set `estimation-strategy` to `:r7` to get the same result as in R `coxed::bca`.
(percentile-bca-extent vs)
(percentile-bca-extent vs p)
(percentile-bca-extent vs p1 p2)
(percentile-bca-extent vs p1 p2 estimation-strategy)
(percentile-bca-extent vs p1 p2 accel estimation-strategy)
Return the bias corrected percentile range and mean for bootstrap samples. Also accounts for variance variations through the acceleration parameter. See https://projecteuclid.org/euclid.ss/1032280214

`p` - calculates the extent of bias corrected `p` and `100-p` (default: `p=2.5`)

Set `estimation-strategy` to `:r7` to get the same result as in R `coxed::bca`.
(percentile-extent vs)
(percentile-extent vs p)
(percentile-extent vs p1 p2)
(percentile-extent vs p1 p2 estimation-strategy)
Return the percentile range and the median.

`p` - calculates the extent of `p` and `100-p` (default: `p=25`)
(percentiles vs ps)
(percentiles vs ps estimation-strategy)
Calculate percentiles of `vs`. Percentiles `ps` are a sequence of values from the range 0-100. See [docs](http://commons.apache.org/proper/commons-math/javadocs/api-3.4/org/apache/commons/math3/stat/descriptive/rank/Percentile.html). Optionally you can provide `estimation-strategy` to change the interpolation method used for selecting values. Default is `:legacy`. See more [here](http://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/stat/descriptive/rank/Percentile.EstimationType.html). See also [[quantile]].
(population-stddev vs)
(population-stddev vs u)
Calculate population standard deviation of `vs`. See [[stddev]].
(population-variance vs)
(population-variance vs u)
Calculate population variance of `vs`. See [[variance]].
(psnr vs1 vs2)
(psnr vs1 vs2 max-value)
Peak signal-to-noise ratio. `max-value` is the maximum possible value (default: max from `vs1` and `vs2`).
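The sketch below uses the standard PSNR definition, `10 * log10(max-value^2 / MSE)`, where MSE is the mean squared difference between the two sequences. We assume the library uses the same definition. The function name is hypothetical, and `max-value` is required here rather than defaulted.

```clojure
(defn psnr-sketch
  "10 * log10(max-value^2 / MSE) for two equally long sequences."
  [vs1 vs2 max-value]
  (let [mse (/ (reduce + (map #(let [d (- %1 %2)] (* d d)) vs1 vs2))
               (double (count vs1)))]
    ;; identical inputs give MSE = 0 and therefore an infinite PSNR
    (* 10.0 (Math/log10 (/ (* max-value max-value) mse)))))
```

An MSE of 1.0 with `max-value` 10 gives 20 dB.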
(quantile vs q)
(quantile vs q estimation-strategy)
Calculate a quantile of `vs`. Quantile `q` is from the range 0.0-1.0. See [docs](http://commons.apache.org/proper/commons-math/javadocs/api-3.4/org/apache/commons/math3/stat/descriptive/rank/Percentile.html) for the interpolation strategy. Optionally you can provide `estimation-strategy` to change the interpolation method used for selecting values. Default is `:legacy`. See more [here](http://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/stat/descriptive/rank/Percentile.EstimationType.html). See also [[percentile]].
(quantiles vs qs)
(quantiles vs qs estimation-strategy)
Calculate quantiles of `vs`. `qs` is a sequence of values from the range 0.0-1.0. See [docs](http://commons.apache.org/proper/commons-math/javadocs/api-3.4/org/apache/commons/math3/stat/descriptive/rank/Percentile.html) for the interpolation strategy. Optionally you can provide `estimation-strategy` to change the interpolation method used for selecting values. Default is `:legacy`. See more [here](http://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/stat/descriptive/rank/Percentile.EstimationType.html). See also [[percentiles]].
(r2-determination group1 group2)
Coefficient of determination
(skewness vs)
(skewness vs typ)
Calculate skewness from sequence. Possible types: `:G1` (default), `:g1` (`:pearson`), `:b1`, `:B1` (`:yule`), `:B3`, `:skew`, `:mode` or `:median`.
(spearman-correlation vs1 vs2)
Spearman's correlation of two sequences.
(standardize vs)
Normalize samples to have mean = 0 and stddev = 1.
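Standardization maps each value to `(x - mean) / sd`. The sketch below assumes the sample standard deviation (n - 1 divisor) is used. Which one the library uses is not assumed. The function name is hypothetical.

```clojure
(defn standardize-sketch
  "z-scores: (x - mean) / sample standard deviation."
  [vs]
  (let [n  (count vs)
        m  (/ (reduce + vs) (double n))
        sd (Math/sqrt (/ (reduce + (map #(let [d (- % m)] (* d d)) vs))
                         (dec n)))]
    (mapv #(/ (- % m) sd) vs)))
```

The result has mean 0 and sample variance 1, up to floating-point rounding.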
(stats-map vs)
(stats-map vs estimation-strategy)
Calculate several statistics of `vs` and return them as a map. The optional `estimation-strategy` argument changes the estimation type used for quantile calculations. See [[estimation-strategies]].
(stddev vs)
(stddev vs u)
Calculate standard deviation of `vs`. See [[population-stddev]].
(trim vs)
(trim vs quantile)
(trim vs quantile estimation-strategy)
(trim vs low high nan)
Return trimmed data. Trimming is done using quantiles; the default `quantile` is 0.2.
(tschuprows-t group1 group2)
Tschuprow's T effect size for discrete data.
(ttest-one-sample xs)
(ttest-one-sample xs
{:keys [alpha sides mu]
:or {alpha 0.05 sides :two-sided mu 0.0}})
One-sample Student's t-test

* `alpha` - significance level (default: `0.05`)
* `sides` - one of: `:two-sided`, `:one-sided-less` (short: `:one-sided`) or `:one-sided-greater`
* `mu` - hypothesized mean (default: `0.0`)
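The test statistic behind a one-sample t-test is `(mean - mu) / (s / sqrt n)`, where `s` is the sample standard deviation. The sketch below computes only this statistic. The p-value and confidence interval require the t distribution's CDF, which is not part of the JDK and is left out. The function name is hypothetical.

```clojure
(defn t-one-sample-sketch
  "t = (mean - mu) / (s / sqrt n), with s the sample standard deviation."
  [xs mu]
  (let [n  (count xs)
        m  (/ (reduce + xs) (double n))
        s2 (/ (reduce + (map #(let [d (- % m)] (* d d)) xs)) (dec n))]
    (/ (- m mu) (Math/sqrt (/ s2 n)))))
```

For `[1 2 3 4 5]` and `mu` 0, t = 3 / sqrt(0.5), about 4.243.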
(ttest-two-samples xs ys)
(ttest-two-samples
xs
ys
{:keys [alpha sides mu paired? equal-variances?]
:or {alpha 0.05 sides :two-sided mu 0.0 paired? false equal-variances? false}
:as params})
Two-sample Student's t-test

* `alpha` - significance level (default: `0.05`)
* `sides` - one of: `:two-sided`, `:one-sided-less` (short: `:one-sided`) or `:one-sided-greater`
* `mu` - hypothesized difference of means (default: `0.0`)
* `paired?` - unpaired or paired test, boolean (default: `false`)
* `equal-variances?` - unequal or equal variances, boolean (default: `false`)
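With the defaults (`paired? false`, `equal-variances? false`), the textbook statistic is Welch's `t = (mean1 - mean2 - mu) / sqrt(s1^2/n1 + s2^2/n2)`. The sketch below computes only this statistic. The Welch-Satterthwaite degrees of freedom and the p-value are omitted. The function name is hypothetical.

```clojure
(defn welch-t-sketch
  "Welch's t statistic for unpaired samples with unequal variances."
  [xs ys mu]
  (let [stats (fn [v]
                (let [n  (count v)
                      m  (/ (reduce + v) (double n))
                      s2 (/ (reduce + (map #(let [d (- % m)] (* d d)) v)) (dec n))]
                  [n m s2]))                 ; size, mean, sample variance
        [nx mx vx] (stats xs)
        [ny my vy] (stats ys)]
    (/ (- mx my mu) (Math/sqrt (+ (/ vx nx) (/ vy ny))))))
```

For `[1 2 3 4 5]` and `[3 4 5 6 7]` with `mu` 0, both variances are 2.5, so t = -2 / sqrt(0.5 + 0.5) = -2.0.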
(variance vs)
(variance vs u)
Calculate variance of `vs`. See [[population-variance]].
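The difference between [[variance]] and [[population-variance]] is the divisor: n - 1 for the sample estimate and n for the population. This sketch assumes the optional `u` argument is a precomputed mean that replaces the computed one. That reading of `u` is an assumption. Both function names are hypothetical.

```clojure
(defn variance-sketch
  "Sample variance (n - 1 divisor); u is an optional precomputed mean."
  ([vs] (variance-sketch vs (/ (reduce + vs) (double (count vs)))))
  ([vs u] (/ (reduce + (map #(let [d (- % u)] (* d d)) vs))
             (double (dec (count vs))))))

(defn population-variance-sketch
  "Population variance (n divisor); u is an optional precomputed mean."
  ([vs] (population-variance-sketch vs (/ (reduce + vs) (double (count vs)))))
  ([vs u] (/ (reduce + (map #(let [d (- % u)] (* d d)) vs))
             (double (count vs)))))
```

For `[2 4 4 4 5 5 7 9]`, the squared deviations from the mean 5 sum to 32. The population variance is therefore 4.0 (standard deviation 2.0), and the sample variance is 32/7.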
(winsor vs)
(winsor vs quantile)
(winsor vs quantile estimation-strategy)
(winsor vs low high nan)
Return winsorized data. Winsorizing is done using quantiles; the default `quantile` is 0.2.
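Trimming and winsorizing differ only in what happens to values beyond the bounds: trimming drops them, winsorizing clamps them to the nearest bound. The sketch below takes explicit `low`/`high` bounds, in the spirit of the `(trim vs low high nan)` and `(winsor vs low high nan)` arities. It assumes inclusive bounds, ignores the `nan` argument, and does not compute quantile-based bounds. Both function names are hypothetical.

```clojure
;; Keep only values within [low, high] (inclusive bounds assumed).
(defn trim-sketch [vs low high]
  (filter #(<= low % high) vs))

;; Clamp values into [low, high] instead of dropping them.
(defn winsor-sketch [vs low high]
  (map #(-> % (max low) (min high)) vs))
```

With bounds 2 and 4, `[1 2 3 4 100]` trims to `(2 3 4)` and winsorizes to `(2 2 3 4 4)`.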