fastmath.stats

Clojure only.

Statistics functions.

Descriptive statistics.
Correlation / covariance
Outliers
Confidence intervals
Extents
Effect size
Student's t-test
Histogram
ACF/PACF
Bootstrap
Binary measures

All functions are backed by Apache Commons Math or SMILE libraries. All work with Clojure sequences.

Descriptive statistics

The all-in-one function stats-map returns a map containing:

:Size - size of the samples, (count ...)
:Min - minimum value
:Max - maximum value
:Range - range of values
:Mean - mean/average
:Median - median, see also: median-3
:Mode - mode, see also: modes
:Q1 - first quartile, use: percentile, quantile
:Q3 - third quartile, use: percentile, quantile
:Total - sum of all samples
:SD - sample standard deviation
:Variance - variance
:MAD - median-absolute-deviation
:SEM - standard error of mean
:LAV - lower adjacent value, use: adjacent-values
:UAV - upper adjacent value, use: adjacent-values
:IQR - interquartile range, (- q3 q1)
:LOF - lower outer fence, (- q1 (* 3.0 iqr))
:UOF - upper outer fence, (+ q3 (* 3.0 iqr))
:LIF - lower inner fence, (- q1 (* 1.5 iqr))
:UIF - upper inner fence, (+ q3 (* 1.5 iqr))
:Outliers - list of outliers, samples which are outside outer fences
:Kurtosis - kurtosis
:Skewness - skewness
:SecMoment - second central moment, use: second-moment

Note: percentile and quantile support 10 different interpolation strategies. See the Apache Commons Math Percentile docs: http://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/stat/descriptive/rank/Percentile.html

acf^clj

(acf data)

(acf data lags)

Calculate the ACF (autocorrelation function) for a given number of lags or a list of lags.

If lags is omitted, the function returns the maximum possible number of lags.

acf-ci^clj

(acf-ci data lags)

(acf-ci data lags alpha)

acf with added confidence interval data.

:cis contains the list of confidence intervals calculated for each lag.

adjacent-values^clj

(adjacent-values vs)

(adjacent-values vs estimation-strategy)

(adjacent-values vs q1 q3)

Lower and upper adjacent values (LAV and UAV).

Let Q1 be the 25th percentile and Q3 the 75th percentile. IQR is (- Q3 Q1).

LAV is the smallest value greater than or equal to the LIF = (- Q1 (* 1.5 IQR)).
UAV is the largest value less than or equal to the UIF = (+ Q3 (* 1.5 IQR)).
The third returned value is the median of the samples.

The optional estimation-strategy argument changes the estimation type used for quantile calculations. See estimation-strategies-list.
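The fence and adjacent-value definitions above can be sketched in plain Clojure. This is only an illustration of the formulas, not the fastmath implementation: the names fences and adjacent are hypothetical, the quartiles are passed in by hand, the median is left out, and at least one sample is assumed to lie inside the fences.

```clojure
;; Illustrative sketch; `fences` and `adjacent` are hypothetical names,
;; not functions from fastmath.stats.
(defn fences
  "Inner fences for the given quartiles q1 and q3."
  [q1 q3]
  (let [iqr (- q3 q1)]
    {:lif (- q1 (* 1.5 iqr))
     :uif (+ q3 (* 1.5 iqr))}))

(defn adjacent
  "Return [lav uav]: the smallest and largest samples within the inner fences.
  Assumes at least one sample lies inside the fences."
  [vs q1 q3]
  (let [{:keys [lif uif]} (fences q1 q3)
        inside (filter #(<= lif % uif) vs)]
    [(apply min inside) (apply max inside)]))

;; With q1 = 2.5 and q3 = 7.5 the fences are -5.0 and 15.0,
;; so the value 100 is excluded.
(adjacent [1 2 3 4 5 6 7 8 100] 2.5 7.5)
;; => [1 8]
```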

ameasure^clj

(ameasure group1 group2)

Vargha-Delaney A measure for two populations, group1 and group2.
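For orientation, the textbook pairwise definition of the Vargha-Delaney A measure can be sketched as below. The name vd-a is hypothetical, and this brute-force version is an assumption about the definition only; ameasure itself may compute the value differently (for example via ranks).

```clojure
;; Illustrative sketch; `vd-a` is a hypothetical name.
;; A = (#(a > b) + 0.5 * #(a = b)) / (m * n) over all pairs (a, b).
(defn vd-a
  [group1 group2]
  (let [score (fn [a b]
                (cond (> a b) 1.0
                      (== a b) 0.5
                      :else 0.0))
        total (reduce + (for [a group1, b group2] (score a b)))]
    (/ total (* (count group1) (count group2)))))

(vd-a [1 2 3] [1 2 3]) ;; => 0.5, no difference between the groups
(vd-a [4 5 6] [1 2 3]) ;; => 1.0, every value in group1 is larger
```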

binary-measures^clj

(binary-measures truth prediction)

(binary-measures truth prediction true-value)

Subset of binary measures. See binary-measures-all.

The following keys are returned: [:tp :tn :fp :fn :accuracy :fdr :f-measure :fall-out :precision :recall :sensitivity :specificity :prevalance]

binary-measures-all^clj

(binary-measures-all truth prediction)

(binary-measures-all truth prediction true-value)

Collection of binary measures.

truth - list of ground truth values
prediction - list of predicted values
true-value - optional, defines what counts as true in truth and prediction

true-value can be one of:

nil - values are treated as booleans
any sequence - values from the sequence are treated as true
map - conversion is done according to the provided map (if there is no corresponding key, the value is treated as false)

https://en.wikipedia.org/wiki/Precision_and_recall
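To make the :tp, :tn, :fp and :fn counts concrete, here is a minimal plain-Clojure sketch for the case where values are treated as booleans. The names confusion and precision-recall are hypothetical, not fastmath functions, and the sketch assumes the denominators are non-zero.

```clojure
;; Illustrative sketch; `confusion` and `precision-recall` are hypothetical names.
(defn confusion
  "Count :tp :tn :fp :fn for two equal-length sequences of booleans."
  [truth prediction]
  (frequencies
   (map (fn [t p]
          (cond (and t p)             :tp
                (and (not t) (not p)) :tn
                p                     :fp
                :else                 :fn))
        truth prediction)))

(defn precision-recall
  "Precision and recall from the confusion counts.
  Assumes (+ tp fp) and (+ tp fn) are non-zero."
  [truth prediction]
  (let [{tp :tp fp :fp fneg :fn
         :or {tp 0 fp 0 fneg 0}} (confusion truth prediction)]
    {:precision (/ tp (+ tp fp))
     :recall    (/ tp (+ tp fneg))}))

(precision-recall [true true false false true]
                  [true false false true true])
;; => {:precision 2/3, :recall 2/3}
```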

bootstrap^clj

(bootstrap vs)

(bootstrap vs samples)

(bootstrap vs samples size)

Generate a set of samples of a given size from the provided data.

samples defaults to 50 and size defaults to 1000.
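The idea of bootstrap resampling can be sketched as follows. This assumes sampling with replacement, which is the usual bootstrap convention but is not stated in the docstring; resample and bootstrap-sketch are hypothetical names, not the fastmath implementation.

```clojure
;; Illustrative sketch; `resample` and `bootstrap-sketch` are hypothetical names.
(defn resample
  "Draw `size` values from `vs` uniformly at random, with replacement (assumption)."
  [vs size]
  (let [v (vec vs)]
    (vec (repeatedly size #(rand-nth v)))))

(defn bootstrap-sketch
  "Generate `samples` resampled collections, each of length `size`."
  [vs samples size]
  (vec (repeatedly samples #(resample vs size))))

;; 50 resamples of 1000 values each, mirroring the documented defaults.
(def resamples (bootstrap-sketch [1 2 3 4 5] 50 1000))
```

Each resample contains only values present in the input, so a statistic such as the mean can then be computed per resample to estimate its variability.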

bootstrap-ci^clj

(bootstrap-ci vs)

(bootstrap-ci vs alpha)

(bootstrap-ci vs alpha samples)

(bootstrap-ci vs alpha samples stat-fn)

Bootstrap method for calculating a confidence interval.

alpha defaults to 0.98 and samples to 1000. The last parameter, stat-fn, is the statistical function to measure (default: mean).

Returns the confidence interval and the value of the statistical function.

ci^clj

(ci vs)

(ci vs alpha)

Confidence interval for the given data, based on Student's t-distribution. alpha defaults to 0.98.

The last returned value is the mean.

cliffs-delta^clj

(cliffs-delta group1 group2)

Cliff's delta effect size

cohens-d^clj

(cohens-d group1 group2)

Cohen's d effect size for two groups

cohens-d-orig^clj

(cohens-d-orig group1 group2)

Original version of Cohen's d effect size for two groups

correlation^clj

(correlation vs1 vs2)

Correlation of two sequences.

covariance^clj

(covariance vs1 vs2)

Covariance of two sequences.

covariance-matrix^clj

(covariance-matrix vss)

Generate covariance matrix from seq of seqs. Row order.

demean^clj

(demean vs)

Subtract mean from sequence

estimate-bins^clj

(estimate-bins vs)

(estimate-bins vs bins-or-estimate-method)

Estimate number of bins for histogram.

Possible methods are: :sqrt :sturges :rice :doane :scott :freedman-diaconis (default).
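As a rough guide to what three of these methods compute, the commonly cited rules for :sqrt, :sturges and :rice are sketched below in plain Clojure. The function names are hypothetical and the rounding is an assumption; the fastmath implementation may differ in detail.

```clojure
;; Illustrative sketch of the standard textbook rules; names are hypothetical.
;; n is the number of samples.
(defn bins-sqrt
  "Square-root rule: ceil(sqrt(n))."
  [n]
  (long (Math/ceil (Math/sqrt n))))

(defn bins-sturges
  "Sturges' rule: ceil(log2(n)) + 1."
  [n]
  (inc (long (Math/ceil (/ (Math/log n) (Math/log 2.0))))))

(defn bins-rice
  "Rice rule: ceil(2 * n^(1/3))."
  [n]
  (long (Math/ceil (* 2.0 (Math/cbrt n)))))

(map #(% 100) [bins-sqrt bins-sturges bins-rice])
;; => (10 8 10)
```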

estimation-strategies-list^clj

extent^clj

(extent vs)

Return extent (min, max, mean) values from sequence

glass-delta^clj

(glass-delta group1 group2)

Glass's delta effect size for two groups

hedges-g^clj

(hedges-g group1 group2)

Hedges's g effect size for two groups

hedges-g*^clj

(hedges-g* group1 group2)

Less biased Hedges's g effect size for two groups

histogram^clj

(histogram vs)

(histogram vs bins-or-estimate-method)

(histogram vs bins [mn mx])

Calculate histogram.

Returns map with keys:

:size - number of bins
:step - distance between bins
:bins - list of pairs of range lower value and number of hits
:min - min value
:max - max value
:samples - number of used samples

For estimation methods check estimate-bins.
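The shape of the returned map can be illustrated with a small equal-width binning sketch. histogram-sketch is a hypothetical name, the bin count is given explicitly, and the handling of the maximum value (clamped into the last bin) is an assumption rather than documented fastmath behaviour. The sketch also assumes the maximum is strictly greater than the minimum.

```clojure
;; Illustrative sketch; `histogram-sketch` is a hypothetical name.
(defn histogram-sketch
  "Equal-width histogram of `vs` with `bins` bins. Assumes max > min."
  [vs bins]
  (let [mn     (double (apply min vs))
        mx     (double (apply max vs))
        step   (/ (- mx mn) bins)
        ;; bin index for x; the maximum value is clamped into the last bin
        idx    (fn [x] (min (dec bins) (long (/ (- x mn) step))))
        counts (frequencies (map idx vs))]
    {:size    bins
     :step    step
     :bins    (mapv (fn [i] [(+ mn (* i step)) (get counts i 0)])
                    (range bins))
     :min     mn
     :max     mx
     :samples (count vs)}))

(:bins (histogram-sketch [0 1 2 3 4 5 6 7 8 9 10] 5))
;; => [[0.0 2] [2.0 2] [4.0 2] [6.0 2] [8.0 3]]
```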

iqr^clj

(iqr vs)

(iqr vs estimation-strategy)

Interquartile range.

jensen-shannon-divergence^clj

(jensen-shannon-divergence vs1 vs2)

Jensen-Shannon divergence of two sequences.

kendall-correlation^clj

(kendall-correlation vs1 vs2)

Kendall's correlation of two sequences.

kullback-leibler-divergence^clj

(kullback-leibler-divergence vs1 vs2)

Kullback-Leibler divergence of two sequences.

kurtosis^clj

(kurtosis vs)

Calculate kurtosis from sequence.

mad-extent^clj

(mad-extent vs)

Return the extent median -/+ median-absolute-deviation, followed by the median.

maximum^clj

(maximum vs)

Maximum value from sequence.

mean^clj

(mean vs)

Calculate mean of vs

median^clj

(median vs)

Calculate median of vs. See median-3.

median-3^clj

(median-3 a b c)

Median of three values. See median.

median-absolute-deviation^clj

(median-absolute-deviation vs)

Calculate MAD (median absolute deviation).

minimum^clj

(minimum vs)

Minimum value from sequence.

mode^clj

(mode vs)

Find the value that appears most often in a dataset vs.

modes^clj

(modes vs)

Find the values that appear most often in a dataset vs.

Returns a sequence of all the most frequent values in increasing order.
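The behaviour described above can be sketched with frequencies. modes-sketch is a hypothetical name and this is not the fastmath implementation, only an illustration of collecting every value that shares the highest count and sorting the result.

```clojure
;; Illustrative sketch; `modes-sketch` is a hypothetical name.
(defn modes-sketch
  "All values sharing the highest frequency, in increasing order.
  Assumes `vs` is non-empty."
  [vs]
  (let [freqs (frequencies vs)
        top   (apply max (vals freqs))]
    (sort (for [[v c] freqs :when (= c top)] v))))

;; 1 and 3 both appear twice, 2 appears once.
(modes-sketch [3 1 3 2 1])
;; => (1 3)
```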

outliers^clj

(outliers vs)

(outliers vs estimation-strategy)

(outliers vs q1 q3)

Find outliers, defined as values lying outside the fences below.

Let Q1 be the 25th percentile and Q3 the 75th percentile. IQR is (- Q3 Q1).

LIF (lower inner fence) equals (- Q1 (* 1.5 IQR)).
UIF (upper inner fence) equals (+ Q3 (* 1.5 IQR)).

Returns a sequence.

The optional estimation-strategy argument changes the estimation type used for quantile calculations. See estimation-strategies-list.

pacf^clj

(pacf data)

(pacf data lags)

Calculate the PACF (partial autocorrelation function) for a given number of lags.

If lags is omitted, the function returns the maximum possible number of lags.

pacf also returns lag 0 (which is 0.0).

pacf-ci^clj

(pacf-ci data lags)

(pacf-ci data lags alpha)

pacf with added confidence interval data.

pearson-correlation^clj

(pearson-correlation vs1 vs2)

Pearson's correlation of two sequences.

percentile^clj

(percentile vs p)

(percentile vs p estimation-strategy)

Calculate the percentile of vs.

The percentile p is in the range 0-100.

Optionally, provide estimation-strategy to change the interpolation method used for selecting values. The default is :legacy. See the Apache Commons Math Percentile docs for the available strategies.

percentile-extent^clj

(percentile-extent vs)

(percentile-extent vs p)

(percentile-extent vs p1 p2)

(percentile-extent vs p1 p2 estimation-strategy)

Return percentile range and median.

p - calculates extent of p and 100-p (default: p=25)

percentiles^clj

(percentiles vs ps)

(percentiles vs ps estimation-strategy)

Calculate percentiles of vs.

ps is a sequence of values from the range 0-100.

Optionally, provide estimation-strategy to change the interpolation method used for selecting values. The default is :legacy. See the Apache Commons Math Percentile docs for the available strategies.

population-stddev^clj

(population-stddev vs)

(population-stddev vs u)

Calculate population standard deviation of vs.

See stddev.

population-variance^clj

(population-variance vs)

(population-variance vs u)

Calculate population variance of vs.

See variance.

quantile^clj

(quantile vs q)

(quantile vs q estimation-strategy)

Calculate the quantile of vs.

The quantile q is in the range 0.0-1.0.

Optionally, provide estimation-strategy to change the interpolation method used for selecting values. The default is :legacy. See the Apache Commons Math Percentile docs for the available strategies.

quantiles^clj

(quantiles vs qs)

(quantiles vs qs estimation-strategy)

Calculate quantiles of vs.

qs is a sequence of values from the range 0.0-1.0.

Optionally, provide estimation-strategy to change the interpolation method used for selecting values. The default is :legacy. See the Apache Commons Math Percentile docs for the available strategies.

second-moment^clj

(second-moment vs)

Calculate second moment from sequence.

It's a sum of squared deviations from the sample mean
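Taking the description literally, the sum of squared deviations from the sample mean can be written as below. second-moment-sketch is a hypothetical name, and the sketch follows the wording of this docstring (a sum, with no division by the sample count); it is not the fastmath implementation.

```clojure
;; Illustrative sketch; `second-moment-sketch` is a hypothetical name.
(defn second-moment-sketch
  "Sum of squared deviations of `vs` from their mean."
  [vs]
  (let [mean (/ (reduce + vs) (double (count vs)))]
    (reduce + (map (fn [x]
                     (let [d (- x mean)]
                       (* d d)))
                   vs))))

;; mean is 2.5; squared deviations are 2.25, 0.25, 0.25, 2.25
(second-moment-sketch [1 2 3 4])
;; => 5.0
```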

sem^clj

(sem vs)

Standard error of mean

sem-extent^clj

(sem-extent vs)

Return the extent mean -/+ sem, followed by the mean.

skewness^clj

(skewness vs)

Calculate skewness from sequence.

spearman-correlation^clj

(spearman-correlation vs1 vs2)

Spearman's correlation of two sequences.

standardize^clj

(standardize vs)

Normalize samples to have mean = 0 and stddev = 1.
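A minimal sketch of this transformation in plain Clojure follows. standardize-sketch is a hypothetical name, and the use of the sample standard deviation (n - 1 denominator) is an assumption; the docstring does not say which variant standardize uses.

```clojure
;; Illustrative sketch; `standardize-sketch` is a hypothetical name.
(defn standardize-sketch
  "Shift `vs` to mean 0 and scale to standard deviation 1.
  Uses the sample standard deviation (n - 1 denominator), which is an assumption.
  Requires at least two distinct values."
  [vs]
  (let [n    (count vs)
        mean (/ (reduce + vs) (double n))
        ss   (reduce + (map (fn [x] (let [d (- x mean)] (* d d))) vs))
        sd   (Math/sqrt (/ ss (dec n)))]
    (map #(/ (- % mean) sd) vs)))

;; mean is 2.0 and the sample standard deviation is 1.0
(standardize-sketch [1 2 3])
;; => (-1.0 0.0 1.0)
```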

stats-map^clj

(stats-map vs)

(stats-map vs estimation-strategy)

Calculate several statistics of vs and return them as a map.

The optional estimation-strategy argument changes the estimation type used for quantile calculations. See estimation-strategies-list.

stddev^clj

(stddev vs)

(stddev vs u)

Calculate standard deviation of vs.

See population-stddev.

stddev-extent^clj

(stddev-extent vs)

Return the extent mean -/+ stddev, followed by the mean.

sum^clj

(sum vs)

Sum of all vs values.

ttest-one-sample^clj

(ttest-one-sample xs)

(ttest-one-sample xs
                  {:keys [alpha sides mu]
                   :or {alpha 0.05 sides :two-sided mu 0.0}})

One-sample Student's t-test

alpha - significance level (default: 0.05)
sides - one of: :two-sided, :one-sided-less (short: :one-sided) or :one-sided-greater
mu - mean (default: 0.0)

ttest-two-samples^clj

(ttest-two-samples xs ys)

(ttest-two-samples
  xs
  ys
  {:keys [alpha sides mu paired? equal-variances?]
   :or {alpha 0.05 sides :two-sided mu 0.0 paired? false equal-variances? false}
   :as params})

Two-sample Student's t-test

alpha - significance level (default: 0.05)
sides - one of: :two-sided, :one-sided-less (short: :one-sided) or :one-sided-greater
mu - mean (default: 0.0)
paired? - unpaired or paired test, boolean (default: false)
equal-variances? - unequal or equal variances, boolean (default: false)

variance^clj

(variance vs)

(variance vs u)

Calculate variance of vs.

See population-variance.
