Liking cljdoc? Tell your friends :D
Clojure only.

fastmath.stats

Statistics functions.

  • Descriptive statistics.
  • Correlation / covariance
  • Outliers
  • Confidence intervals
  • Extents
  • Effect size
  • Tests
  • Histogram
  • ACF/PACF
  • Bootstrap (see fastmath.stats.bootstrap)
  • Binary measures

Functions are backed by Apache Commons Math or SMILE libraries. All work with Clojure sequences.

Descriptive statistics

All in one function stats-map contains:

  • :Size - size of the samples, (count ...)
  • :Min - minimum value
  • :Max - maximum value
  • :Range - range of values
  • :Mean - mean/average
  • :Median - median, see also: median-3
  • :Mode - mode, see also: modes
  • :Q1 - first quartile, use: percentile, [[quartile]]
  • :Q3 - third quartile, use: percentile, [[quartile]]
  • :Total - sum of all samples
  • :SD - sample standard deviation
  • :Variance - variance
  • :MAD - median-absolute-deviation
  • :SEM - standard error of mean
  • :LAV - lower adjacent value, use: adjacent-values
  • :UAV - upper adjacent value, use: adjacent-values
  • :IQR - interquartile range, (- q3 q1)
  • :LOF - lower outer fence, (- q1 (* 3.0 iqr))
  • :UOF - upper outer fence, (+ q3 (* 3.0 iqr))
  • :LIF - lower inner fence, (- q1 (* 1.5 iqr))
  • :UIF - upper inner fence, (+ q3 (* 1.5 iqr))
  • :Outliers - list of outliers, samples which are outside outer fences
  • :Kurtosis - kurtosis
  • :Skewness - skewness

Note: percentile and [[quartile]] can have 10 different interpolation strategies. See docs

#### Statistics functions.

* Descriptive statistics.
* Correlation / covariance
* Outliers
* Confidence intervals
* Extents
* Effect size
* Tests
* Histogram
* ACF/PACF
* Bootstrap (see `fastmath.stats.bootstrap`)
* Binary measures

Functions are backed by Apache Commons Math or SMILE libraries. All work with Clojure sequences.

##### Descriptive statistics

All in one function [[stats-map]] contains:

* `:Size` - size of the samples, `(count ...)`
* `:Min` - [[minimum]] value
* `:Max` - [[maximum]] value
* `:Range` - range of values
* `:Mean` - [[mean]]/average
* `:Median` - [[median]], see also: [[median-3]]
* `:Mode` - [[mode]], see also: [[modes]]
* `:Q1` - first quartile, use: [[percentile]], [[quartile]]
* `:Q3` - third quartile, use: [[percentile]], [[quartile]]
* `:Total` - [[sum]] of all samples
* `:SD` - sample standard deviation
* `:Variance` - variance
* `:MAD` - [[median-absolute-deviation]]
* `:SEM` - standard error of mean
* `:LAV` - lower adjacent value, use: [[adjacent-values]]
* `:UAV` - upper adjacent value, use: [[adjacent-values]]
* `:IQR` - interquartile range, `(- q3 q1)`
* `:LOF` - lower outer fence, `(- q1 (* 3.0 iqr))`
* `:UOF` - upper outer fence, `(+ q3 (* 3.0 iqr))`
* `:LIF` - lower inner fence, `(- q1 (* 1.5 iqr))`
* `:UIF` - upper inner fence, `(+ q3 (* 1.5 iqr))`
* `:Outliers` - list of [[outliers]], samples which are outside outer fences
* `:Kurtosis` - [[kurtosis]]
* `:Skewness` - [[skewness]]

Note: [[percentile]] and [[quartile]] can have 10 different interpolation strategies. See [docs](http://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/stat/descriptive/rank/Percentile.html)
raw docstring

->confusion-matrixclj

(->confusion-matrix confusion-matrix)
(->confusion-matrix actual prediction)
(->confusion-matrix actual prediction encode-true)
(->confusion-matrix tp fn fp tn)

Convert input to confusion matrix

Convert input to confusion matrix
sourceraw docstring

acfclj

(acf data)
(acf data lags)

Calculate acf (autocorrelation function) for given number of lags or a list of lags.

If lags is omitted function returns maximum possible number of lags.

See also acf-ci, pacf, pacf-ci

Calculate acf (autocorrelation function) for given number of lags or a list of lags.

If lags is omitted function returns maximum possible number of lags.

See also [[acf-ci]], [[pacf]], [[pacf-ci]]
sourceraw docstring

acf-ciclj

(acf-ci data)
(acf-ci data lags)
(acf-ci data lags alpha)

acf with added confidence interval data.

:cis contains list of calculated ci for every lag.

[[acf]] with added confidence interval data.

`:cis` contains list of calculated ci for every lag.
sourceraw docstring

ad-test-one-sampleclj

(ad-test-one-sample xs)
(ad-test-one-sample xs distribution-or-ys)
(ad-test-one-sample xs
                    distribution-or-ys
                    {:keys [sides kernel bandwidth]
                     :or {sides :one-sided-greater kernel :gaussian}})

Anderson-Darling test

Anderson-Darling test
sourceraw docstring

adjacent-valuesclj

(adjacent-values vs)
(adjacent-values vs estimation-strategy)
(adjacent-values vs q1 q3 m)

Lower and upper adjacent values (LAV and UAV).

Let Q1 is 25-percentile and Q3 is 75-percentile. IQR is (- Q3 Q1).

  • LAV is smallest value which is greater or equal to the LIF = (- Q1 (* 1.5 IQR)).
  • UAV is largest value which is lower or equal to the UIF = (+ Q3 (* 1.5 IQR)).
  • third value is a median of samples

Optional estimation-strategy argument can be set to change quantile calculations estimation type. See [[estimation-strategies]].

Lower and upper adjacent values (LAV and UAV).

Let Q1 is 25-percentile and Q3 is 75-percentile. IQR is `(- Q3 Q1)`.

* LAV is smallest value which is greater or equal to the LIF = `(- Q1 (* 1.5 IQR))`.
* UAV is largest value which is lower or equal to the UIF = `(+ Q3 (* 1.5 IQR))`.
* third value is a median of samples


Optional `estimation-strategy` argument can be set to change quantile calculations estimation type. See [[estimation-strategies]].
sourceraw docstring

ameasureclj

(ameasure [group1 group2])
(ameasure group1 group2)

Vargha-Delaney A measure for two populations a and b

Vargha-Delaney A measure for two populations a and b
sourceraw docstring

binary-measuresclj

(binary-measures confusion-matrix)
(binary-measures actual prediction)
(binary-measures actual prediction true-value)
(binary-measures tp fn fp tn)

Subset of binary measures. See binary-measures-all.

Following keys are returned: [:tp :tn :fp :fn :accuracy :fdr :f-measure :fall-out :precision :recall :sensitivity :specificity :prevalence]

Subset of binary measures. See [[binary-measures-all]].

Following keys are returned: `[:tp :tn :fp :fn :accuracy :fdr :f-measure :fall-out :precision :recall :sensitivity :specificity :prevalence]`
sourceraw docstring

binary-measures-allclj

(binary-measures-all confusion-matrix)
(binary-measures-all actual prediction)
(binary-measures-all actual prediction true-value)
(binary-measures-all tp fn fp tn)

Collection of binary measures.

Arguments:

  • confusion-matrix - either map or sequence with [:tp :fn :fp :tn] values

or

  • actual - list of ground truth values
  • prediction - list of predicted values
  • true-value - optional, true/false encoding, what is true in truth and prediction

true-value can be one of:

  • nil - values are treating as booleans
  • any sequence - values from sequence will be treated as true
  • map - conversion will be done according to provided map (if there is no correspondin key, value is treated as false)
  • any predicate

https://en.wikipedia.org/wiki/Precision_and_recall

Collection of binary measures.

Arguments:
* `confusion-matrix` - either map or sequence with `[:tp :fn :fp :tn]` values

or

* `actual` - list of ground truth values
* `prediction` - list of predicted values
* `true-value` - optional, true/false encoding, what is true in `truth` and `prediction`

`true-value` can be one of:

* `nil` - values are treating as booleans
* any sequence - values from sequence will be treated as `true`
* map - conversion will be done according to provided map (if there is no correspondin key, value is treated as `false`)
* any predicate

https://en.wikipedia.org/wiki/Precision_and_recall
sourceraw docstring

binomial-ciclj

(binomial-ci number-of-successes number-of-trials)
(binomial-ci number-of-successes number-of-trials method)
(binomial-ci number-of-successes number-of-trials method alpha)

Return confidence interval for a binomial distribution.

Possible methods are:

  • :asymptotic (normal aproximation, based on central limit theorem), default
  • :agresti-coull
  • :clopper-pearson
  • :wilson
  • :prop.test - one sample proportion test
  • :cloglog
  • :logit
  • :probit
  • :arcsine
  • :all - apply all methods and return a map of triplets

Default alpha is 0.05

Returns a triple [lower ci, upper ci, p=successes/trials]

Return confidence interval for a binomial distribution.

Possible methods are:
* `:asymptotic` (normal aproximation, based on central limit theorem), default
* `:agresti-coull`
* `:clopper-pearson`
* `:wilson`
* `:prop.test` - one sample proportion test
* `:cloglog`
* `:logit`
* `:probit`
* `:arcsine`
* `:all` - apply all methods and return a map of triplets

Default alpha is 0.05

Returns a triple [lower ci, upper ci, p=successes/trials]
sourceraw docstring

binomial-ci-methodsclj

source

binomial-testclj

(binomial-test xs)
(binomial-test xs maybe-params)
(binomial-test number-of-successes
               number-of-trials
               {:keys [alpha p ci-method sides]
                :or {alpha 0.05 p 0.5 ci-method :asymptotic sides :two-sided}})

Binomial test

  • alpha - significance level (default: 0.05)
  • sides - one of: :two-sided (default), :one-sided-less (short: :one-sided) or :one-sided-greater
  • ci-method - see binomial-ci-methods
  • p - tested probability
Binomial test

* `alpha` - significance level (default: `0.05`)
* `sides` - one of: `:two-sided` (default), `:one-sided-less` (short: `:one-sided`) or `:one-sided-greater`
* `ci-method` - see [[binomial-ci-methods]]
* `p` - tested probability
sourceraw docstring

bootstrapcljdeprecated

(bootstrap vs)
(bootstrap vs samples)
(bootstrap vs samples size)

Generate set of samples of given size from provided data.

Default samples is 200, number of size defaults to sample size.

Generate set of samples of given size from provided data.

Default `samples` is 200, number of `size` defaults to sample size.
sourceraw docstring

bootstrap-cicljdeprecated

(bootstrap-ci vs)
(bootstrap-ci vs alpha)
(bootstrap-ci vs alpha samples)
(bootstrap-ci vs alpha samples stat-fn)

Bootstrap method to calculate confidence interval.

Alpha defaults to 0.98, samples to 1000. Last parameter is statistical function used to measure, default: mean.

Returns ci and statistical function value.

Bootstrap method to calculate confidence interval.

Alpha defaults to 0.98, samples to 1000.
Last parameter is statistical function used to measure, default: [[mean]].

Returns ci and statistical function value.
sourceraw docstring

brown-forsythe-testclj

(brown-forsythe-test xss)
(brown-forsythe-test xss params)
source

chisq-testclj

(chisq-test contingency-table-or-xs)
(chisq-test contingency-table-or-xs params)

Chi square test, a power divergence test for lambda 1.0

Power divergence test.

First argument should be one of:

  • contingency table
  • sequence of counts (for goodness of fit)
  • sequence of data (for goodness of fit against distribution)

For goodness of fit there are two options:

  • comparison of observed counts vs expected probabilities or weights (:p)
  • comparison of data against given distribution (:p), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use :bins option to control histogram creation.

Options are:

Chi square test, a power divergence test for `lambda` 1.0

Power divergence test.

  First argument should be one of:
  
  * contingency table
  * sequence of counts (for goodness of fit)
  * sequence of data (for goodness of fit against distribution)

  For goodness of fit there are two options:
  
  * comparison of observed counts vs expected probabilities or weights (`:p`)
  * comparison of data against given distribution (`:p`), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use `:bins` option to control histogram creation.

  Options are:
  
  * `:lambda` - test type:
      * `1.0` - [[chisq-test]]
      * `0.0` - [[multinomial-likelihood-ratio-test]]
      * `-1.0` - [[minimum-discrimination-information-test]]
      * `-2.0` - [[neyman-modified-chisq-test]]
      * `-0.5` - [[freeman-tukey-test]]
      * `2/3` - [[cressie-read-test]] - default
  * `:p` - probabilites, weights or distribution object.
  * `:alpha` - significance level (default: 0.05)
  * `:ci-sides` - confidence interval sides (default: `:two-sided`)
  * `:sides`  - p-value sides (`:two-sided`, `:one-side-greater` - default, `:one-side-less`)
  * `:bootstrap-samples` - number of samples to estimate confidence intervals (default: 1000)
  * `:ddof` - delta degrees of freedom, adjustment for dof (default: 0.0)
  * `:bins` - number of bins or estimator name for histogram
sourceraw docstring

ciclj

(ci vs)
(ci vs alpha)

T-student based confidence interval for given data. Alpha value defaults to 0.05.

Last value is mean.

T-student based confidence interval for given data. Alpha value defaults to 0.05.

Last value is mean.
sourceraw docstring

cliffs-deltaclj

(cliffs-delta [group1 group2])
(cliffs-delta group1 group2)

Cliff's delta effect size for ordinal data.

Cliff's delta effect size for ordinal data.
sourceraw docstring

coefficient-matrixclj

(coefficient-matrix vss)
(coefficient-matrix vss measure-fn)
(coefficient-matrix vss measure-fn symmetric?)

Generate coefficient (correlation, covariance, any two arg function) matrix from seq of seqs. Row order.

Default method: pearson-correlation

Generate coefficient (correlation, covariance, any two arg function) matrix from seq of seqs. Row order.

Default method: pearson-correlation
sourceraw docstring

cohens-dclj

(cohens-d [group1 group2])
(cohens-d group1 group2)
(cohens-d group1 group2 method)

Cohen's d effect size for two groups

Cohen's d effect size for two groups
sourceraw docstring

cohens-d-correctedclj

(cohens-d-corrected [group1 group2])
(cohens-d-corrected group1 group2)
(cohens-d-corrected group1 group2 method)

Cohen's d corrected for small group size

Cohen's d corrected for small group size
sourceraw docstring

cohens-fclj

(cohens-f [group1 group2])
(cohens-f group1 group2)
(cohens-f group1 group2 type)

Cohens f, sqrt of Cohens f2.

Possible type values are: :eta (default), :omega and :epsilon.

Cohens f, sqrt of Cohens f2.

Possible `type` values are: `:eta` (default), `:omega` and `:epsilon`.
sourceraw docstring

cohens-f2clj

(cohens-f2 [group1 group2])
(cohens-f2 group1 group2)
(cohens-f2 group1 group2 type)

Cohens f2, by default based on eta-sq.

Possible type values are: :eta (default), :omega and :epsilon.

Cohens f2, by default based on `eta-sq`.

Possible `type` values are: `:eta` (default), `:omega` and `:epsilon`.
sourceraw docstring

cohens-kappaclj

(cohens-kappa contingency-table)
(cohens-kappa group1 group2)

Cohens kappa

Cohens kappa
sourceraw docstring

cohens-qclj

(cohens-q r1 r2)
(cohens-q group1 group2a group2b)
(cohens-q group1a group2a group1b group2b)

Comparison of two correlations.

Arity:

  • 2 - compare two correlation values
  • 3 - compare correlation of group1 and group2a with correlation of group1 and group2b
  • 4 - compare correlation of first two arguments with correlation of last two arguments
Comparison of two correlations.

Arity:

* 2 - compare two correlation values
* 3 - compare correlation of `group1` and `group2a` with correlation of `group1` and `group2b`
* 4 - compare correlation of first two arguments with correlation of last two arguments
sourceraw docstring

cohens-u2clj

(cohens-u2 [group1 group2])
(cohens-u2 group1 group2)
(cohens-u2 group1 group2 estimation-strategy)

Cohen's U2, the proportion of one of the groups that exceeds the same proportion in the other group.

Cohen's U2, the proportion of one of the groups that exceeds the same proportion in the other group.
sourceraw docstring

cohens-u3clj

(cohens-u3 [group1 group2])
(cohens-u3 group1 group2)
(cohens-u3 group1 group2 estimation-strategy)

Cohen's U3, the proportion of the second group that is smaller than the median of the first group.

Cohen's U3, the proportion of the second group that is smaller than the median of the first group.
sourceraw docstring

cohens-wclj

(cohens-w contingency-table)
(cohens-w group1 group2)

Cohen's W effect size for discrete data.

Cohen's W effect size for discrete data.
sourceraw docstring

contingency-2x2-measuresclj

(contingency-2x2-measures & args)
source

contingency-2x2-measures-allclj

(contingency-2x2-measures-all map-or-seq)
(contingency-2x2-measures-all [a b] [c d])
(contingency-2x2-measures-all a b c d)
source

contingency-tableclj

(contingency-table & seqs)

Returns frequencies map of tuples built from seqs.

Returns frequencies map of tuples built from seqs.
sourceraw docstring

contingency-table->marginalsclj

(contingency-table->marginals ct)
source

correlationclj

(correlation [vs1 vs2])
(correlation vs1 vs2)

Correlation of two sequences.

Correlation of two sequences.
sourceraw docstring

correlation-matrixclj

(correlation-matrix vss)
(correlation-matrix vss measure)

Generate correlation matrix from seq of seqs. Row order.

Possible measures: :pearson (default), :kendall, :spearman.

Generate correlation matrix from seq of seqs. Row order.

Possible measures: `:pearson` (default), `:kendall`, `:spearman`.
sourceraw docstring

count=clj

(count= [vs1 vs2-or-val])
(count= vs1 vs2-or-val)

Count equal values in both seqs. Same as L0

Count equal values in both seqs. Same as [[L0]]
sourceraw docstring

covarianceclj

(covariance [vs1 vs2])
(covariance vs1 vs2)

Covariance of two sequences.

Covariance of two sequences.
sourceraw docstring

covariance-matrixclj

(covariance-matrix vss)

Generate covariance matrix from seq of seqs. Row order.

Generate covariance matrix from seq of seqs. Row order.
sourceraw docstring

cramers-cclj

(cramers-c contingency-table)
(cramers-c group1 group2)

Cramer's C effect size for discrete data.

Cramer's C effect size for discrete data.
sourceraw docstring

cramers-vclj

(cramers-v contingency-table)
(cramers-v group1 group2)

Cramer's V effect size for discrete data.

Cramer's V effect size for discrete data.
sourceraw docstring

cramers-v-correctedclj

(cramers-v-corrected contingency-table)
(cramers-v-corrected group1 group2)

Corrected Cramer's V

Corrected Cramer's V
sourceraw docstring

cressie-read-testclj

(cressie-read-test contingency-table-or-xs)
(cressie-read-test contingency-table-or-xs params)

Cressie-Read test, a power divergence test for lambda 2/3

Power divergence test.

First argument should be one of:

  • contingency table
  • sequence of counts (for goodness of fit)
  • sequence of data (for goodness of fit against distribution)

For goodness of fit there are two options:

  • comparison of observed counts vs expected probabilities or weights (:p)
  • comparison of data against given distribution (:p), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use :bins option to control histogram creation.

Options are:

Cressie-Read test, a power divergence test for `lambda` 2/3

Power divergence test.

  First argument should be one of:
  
  * contingency table
  * sequence of counts (for goodness of fit)
  * sequence of data (for goodness of fit against distribution)

  For goodness of fit there are two options:
  
  * comparison of observed counts vs expected probabilities or weights (`:p`)
  * comparison of data against given distribution (`:p`), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use `:bins` option to control histogram creation.

  Options are:
  
  * `:lambda` - test type:
      * `1.0` - [[chisq-test]]
      * `0.0` - [[multinomial-likelihood-ratio-test]]
      * `-1.0` - [[minimum-discrimination-information-test]]
      * `-2.0` - [[neyman-modified-chisq-test]]
      * `-0.5` - [[freeman-tukey-test]]
      * `2/3` - [[cressie-read-test]] - default
  * `:p` - probabilites, weights or distribution object.
  * `:alpha` - significance level (default: 0.05)
  * `:ci-sides` - confidence interval sides (default: `:two-sided`)
  * `:sides`  - p-value sides (`:two-sided`, `:one-side-greater` - default, `:one-side-less`)
  * `:bootstrap-samples` - number of samples to estimate confidence intervals (default: 1000)
  * `:ddof` - delta degrees of freedom, adjustment for dof (default: 0.0)
  * `:bins` - number of bins or estimator name for histogram
sourceraw docstring

demeanclj

(demean vs)

Subtract mean from sequence

Subtract mean from sequence
sourceraw docstring

dissimilarityclj

(dissimilarity method P-observed Q-expected)
(dissimilarity method
               P-observed
               Q-expected
               {:keys [bins probabilities? epsilon log-base power]
                :or
                  {probabilities? true epsilon 1.0E-6 log-base m/E power 2.0}})

Various PDF distance between two histograms (frequencies) or probabilities.

Q can be a distribution object. Then, histogram will be created out of P.

Arguments:

  • method - distance method
  • P-observed - frequencies, probabilities or actual data (when Q is a distribution)
  • Q-expected - frequencies, probabilities or distribution object (when P is a data)

Options:

  • :probabilities? - should P/Q be converted to a probabilities, default: true.
  • :epsilon - small number which replaces 0.0 when division or logarithm is used`
  • :log-base - base for logarithms, default: e
  • :power - exponent for :minkowski distance, default: 2.0
  • :bins - number of bins or bins estimation method, see histogram.

The list of methods: :euclidean, :city-block, :manhattan, :chebyshev, :minkowski, :sorensen, :gower, :soergel, :kulczynski, :canberra, :lorentzian, :non-intersection, :wave-hedges, :czekanowski, :motyka, :tanimoto, :jaccard, :dice, :bhattacharyya, :hellinger, :matusita, :squared-chord, :euclidean-sq, :squared-euclidean, :pearson-chisq, :chisq, :neyman-chisq, :squared-chisq, :symmetric-chisq, :divergence, :clark, :additive-symmetric-chisq, :kullback-leibler, :jeffreys, :k-divergence, :topsoe, :jensen-shannon, :jensen-difference, :taneja, :kumar-johnson, :avg

See more: Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions by Sung-Hyuk Cha

Various PDF distance between two histograms (frequencies) or probabilities.

Q can be a distribution object. Then, histogram will be created out of P.

Arguments:

* `method` - distance method
* `P-observed` - frequencies, probabilities or actual data (when Q is a distribution)
* `Q-expected` - frequencies, probabilities or distribution object (when P is a data)

Options:

* `:probabilities?` - should P/Q be converted to a probabilities, default: `true`.
* `:epsilon` - small number which replaces `0.0` when division or logarithm is used`
* `:log-base` - base for logarithms, default: `e`
* `:power` - exponent for `:minkowski` distance, default: `2.0`
* `:bins` - number of bins or bins estimation method, see [[histogram]].

The list of methods: `:euclidean`, `:city-block`, `:manhattan`, `:chebyshev`, `:minkowski`, `:sorensen`, `:gower`, `:soergel`, `:kulczynski`, `:canberra`, `:lorentzian`, `:non-intersection`, `:wave-hedges`, `:czekanowski`, `:motyka`, `:tanimoto`, `:jaccard`, `:dice`, `:bhattacharyya`, `:hellinger`, `:matusita`, `:squared-chord`, `:euclidean-sq`, `:squared-euclidean`, `:pearson-chisq`, `:chisq`, `:neyman-chisq`, `:squared-chisq`, `:symmetric-chisq`, `:divergence`, `:clark`, `:additive-symmetric-chisq`, `:kullback-leibler`, `:jeffreys`, `:k-divergence`, `:topsoe`, `:jensen-shannon`, `:jensen-difference`, `:taneja`, `:kumar-johnson`, `:avg`

See more: Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions by Sung-Hyuk Cha
sourceraw docstring

epsilon-sqclj

(epsilon-sq [group1 group2])
(epsilon-sq group1 group2)

Less biased R2

Less biased R2
sourceraw docstring

estimate-binsclj

(estimate-bins vs)
(estimate-bins vs bins-or-estimate-method)

Estimate number of bins for histogram.

Possible methods are: :sqrt :sturges :rice :doane :scott :freedman-diaconis (default).

The number returned is not higher than number of samples.

Estimate number of bins for histogram.

Possible methods are: `:sqrt` `:sturges` `:rice` `:doane` `:scott` `:freedman-diaconis` (default).

The number returned is not higher than number of samples.
sourceraw docstring

estimation-strategies-listclj

List of estimation strategies for percentile/quantile functions.

List of estimation strategies for [[percentile]]/[[quantile]] functions.
sourceraw docstring

eta-sqclj

(eta-sq [group1 group2])
(eta-sq group1 group2)

R2, coefficient of determination

R2, coefficient of determination
sourceraw docstring

extentclj

(extent vs)

Return extent (min, max, mean) values from sequence

Return extent (min, max, mean) values from sequence
sourceraw docstring

f-testclj

(f-test xs ys)
(f-test xs ys {:keys [sides alpha] :or {sides :two-sided alpha 0.05}})

Variance F-test of two samples.

  • alpha - significance level (default: 0.05)
  • sides - one of: :two-sided (default), :one-sided-less (short: :one-sided) or :one-sided-greater
Variance F-test of two samples.

* `alpha` - significance level (default: `0.05`)
* `sides` - one of: `:two-sided` (default), `:one-sided-less` (short: `:one-sided`) or `:one-sided-greater` 
sourceraw docstring

fligner-killeen-testclj

(fligner-killeen-test xss)
(fligner-killeen-test xss {:keys [sides] :or {sides :one-sided-greater}})
source

freeman-tukey-testclj

(freeman-tukey-test contingency-table-or-xs)
(freeman-tukey-test contingency-table-or-xs params)

Freeman-Tukey test, a power divergence test for lambda -0.5

Power divergence test.

First argument should be one of:

  • contingency table
  • sequence of counts (for goodness of fit)
  • sequence of data (for goodness of fit against distribution)

For goodness of fit there are two options:

  • comparison of observed counts vs expected probabilities or weights (:p)
  • comparison of data against given distribution (:p), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use :bins option to control histogram creation.

Options are:

Freeman-Tukey test, a power divergence test for `lambda` -0.5

Power divergence test.

  First argument should be one of:
  
  * contingency table
  * sequence of counts (for goodness of fit)
  * sequence of data (for goodness of fit against distribution)

  For goodness of fit there are two options:
  
  * comparison of observed counts vs expected probabilities or weights (`:p`)
  * comparison of data against given distribution (`:p`), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use `:bins` option to control histogram creation.

  Options are:
  
  * `:lambda` - test type:
      * `1.0` - [[chisq-test]]
      * `0.0` - [[multinomial-likelihood-ratio-test]]
      * `-1.0` - [[minimum-discrimination-information-test]]
      * `-2.0` - [[neyman-modified-chisq-test]]
      * `-0.5` - [[freeman-tukey-test]]
      * `2/3` - [[cressie-read-test]] - default
  * `:p` - probabilites, weights or distribution object.
  * `:alpha` - significance level (default: 0.05)
  * `:ci-sides` - confidence interval sides (default: `:two-sided`)
  * `:sides`  - p-value sides (`:two-sided`, `:one-side-greater` - default, `:one-side-less`)
  * `:bootstrap-samples` - number of samples to estimate confidence intervals (default: 1000)
  * `:ddof` - delta degrees of freedom, adjustment for dof (default: 0.0)
  * `:bins` - number of bins or estimator name for histogram
sourceraw docstring

geomeanclj

(geomean vs)

Geometric mean for positive values only

Geometric mean for positive values only
sourceraw docstring

glass-deltaclj

(glass-delta [group1 group2])
(glass-delta group1 group2)

Glass's delta effect size for two groups

Glass's delta effect size for two groups
sourceraw docstring

harmeanclj

(harmean vs)

Harmonic mean

Harmonic mean
sourceraw docstring

hedges-gclj

(hedges-g [group1 group2])
(hedges-g group1 group2)

Hedges's g effect size for two groups

Hedges's g effect size for two groups
sourceraw docstring

hedges-g*clj

(hedges-g* [group1 group2])
(hedges-g* group1 group2)

Less biased Hedges's g effect size for two groups, J term correction.

Less biased Hedges's g effect size for two groups, J term correction.
sourceraw docstring

hedges-g-correctedclj

(hedges-g-corrected [group1 group2])
(hedges-g-corrected group1 group2)

Cohen's d corrected for small group size

Cohen's d corrected for small group size
sourceraw docstring

histogramclj

(histogram vs)
(histogram vs bins-or-estimate-method)
(histogram vs bins-or-estimate-method [mn mx])

Calculate histogram.

Returns map with keys:

  • :size - number of bins
  • :step - distance between bins
  • :bins - list of pairs of range lower value and number of hits
  • :min - min value
  • :max - max value
  • :samples - number of used samples

For estimation methods check estimate-bins.

If difference between min and max values is 0, number of bins is set to 1.

Calculate histogram.

Returns map with keys:

* `:size` - number of bins
* `:step` - distance between bins
* `:bins` - list of pairs of range lower value and number of hits
* `:min` - min value
* `:max` - max value
* `:samples` - number of used samples

For estimation methods check [[estimate-bins]].

If difference between min and max values is `0`, number of bins is set to 1.
sourceraw docstring

hpdi-extentclj

(hpdi-extent vs)
(hpdi-extent vs size)

Higher Posterior Density interval + median.

size parameter is the target probability content of the interval.

Higher Posterior Density interval + median.

`size` parameter is the target probability content of the interval.
sourceraw docstring

inner-fence-extentclj

(inner-fence-extent vs)
(inner-fence-extent vs estimation-strategy)

Returns LIF, UIF and median

Returns LIF, UIF and median
sourceraw docstring

iqrclj

(iqr vs)
(iqr vs estimation-strategy)

Interquartile range.

Interquartile range.
sourceraw docstring

jensen-shannon-divergencecljdeprecated

(jensen-shannon-divergence [vs1 vs2])
(jensen-shannon-divergence vs1 vs2)

Jensen-Shannon divergence of two sequences.

Jensen-Shannon divergence of two sequences.
sourceraw docstring

kendall-correlationclj

(kendall-correlation [vs1 vs2])
(kendall-correlation vs1 vs2)

Kendall's correlation of two sequences.

Kendall's correlation of two sequences.
sourceraw docstring

kruskal-testclj

(kruskal-test xss)
(kruskal-test xss {:keys [sides] :or {sides :right}})

Kruskal-Wallis rank sum test.

Kruskal-Wallis rank sum test.
sourceraw docstring

ks-test-one-sampleclj

(ks-test-one-sample xs)
(ks-test-one-sample xs distribution-or-ys)
(ks-test-one-sample xs
                    distribution-or-ys
                    {:keys [sides kernel bandwidth distinct?]
                     :or {sides :two-sided kernel :gaussian distinct? true}})

One sample Kolmogorov-Smirnov test

One sample Kolmogorov-Smirnov test
sourceraw docstring

ks-test-two-samplesclj

(ks-test-two-samples xs ys)
(ks-test-two-samples xs
                     ys
                     {:keys [sides distinct?]
                      :or {sides :two-sided distinct? true}})

Two samples Kolmogorov-Smirnov test

Two samples Kolmogorov-Smirnov test
sourceraw docstring

kullback-leibler-divergencecljdeprecated

(kullback-leibler-divergence [vs1 vs2])
(kullback-leibler-divergence vs1 vs2)

Kullback-Leibler divergence of two sequences.

Kullback-Leibler divergence of two sequences.
sourceraw docstring

kurtosisclj

(kurtosis vs)
(kurtosis vs typ)

Calculate kurtosis from sequence.

Possible typs: :G2 (default), :g2 (or :excess), :geary or :kurt.

Calculate kurtosis from sequence.

Possible typs: `:G2` (default), `:g2` (or `:excess`), `:geary` or `:kurt`.
sourceraw docstring

L0clj

Count equal values in both seqs. Same as [[count==]]

Count equal values in both seqs. Same as [[count==]]
sourceraw docstring

L1clj

(L1 [vs1 vs2-or-val])
(L1 vs1 vs2-or-val)

Manhattan distance

Manhattan distance
sourceraw docstring

L2clj

(L2 [vs1 vs2-or-val])
(L2 vs1 vs2-or-val)

Euclidean distance

Euclidean distance
sourceraw docstring

L2sqclj

(L2sq [vs1 vs2-or-val])
(L2sq vs1 vs2-or-val)

Squared euclidean distance

Squared euclidean distance
sourceraw docstring

levene-testclj

(levene-test xss)
(levene-test xss
             {:keys [sides statistic scorediff]
              :or {sides :one-sided-greater statistic mean scorediff abs}})
source

LInfclj

(LInf [vs1 vs2-or-val])
(LInf vs1 vs2-or-val)

Chebyshev distance

Chebyshev distance
sourceraw docstring

madclj

Alias for [[median-absolute-deviation]]
sourceraw docstring

mad-extentclj

(mad-extent vs)

-/+ median-absolute-deviation and median

 -/+ median-absolute-deviation and median
sourceraw docstring

maeclj

(mae [vs1 vs2-or-val])
(mae vs1 vs2-or-val)

Mean absolute error

Mean absolute error
sourceraw docstring

mapeclj

(mape [vs1 vs2-or-val])
(mape vs1 vs2-or-val)

Mean absolute percentage error

Mean absolute percentage error
sourceraw docstring

maximumclj

(maximum vs)

Maximum value from sequence.

Maximum value from sequence.
sourceraw docstring

mccclj

(mcc ct)
(mcc group1 group2)

Matthews correlation coefficient also known as phi coefficient.

Matthews correlation coefficient also known as phi coefficient.
sourceraw docstring

meclj

(me [vs1 vs2-or-val])
(me vs1 vs2-or-val)

Mean error

Mean error
sourceraw docstring

meanclj

(mean vs)

Calculate mean of vs

Calculate mean of `vs`
sourceraw docstring

mean-absolute-deviationclj

(mean-absolute-deviation vs)
(mean-absolute-deviation vs center)

Calculate mean absolute deviation

Calculate mean absolute deviation
sourceraw docstring

means-ratioclj

(means-ratio [group1 group2])
(means-ratio group1 group2)
(means-ratio group1 group2 adjusted?)

Means ratio

Means ratio
sourceraw docstring

means-ratio-correctedclj

(means-ratio-corrected [group1 group2])
(means-ratio-corrected group1 group2)

Bias correced means ratio

Bias correced means ratio
sourceraw docstring

medianclj

(median vs)
(median vs estimation-strategy)

Calculate median of vs. See median-3.

Calculate median of `vs`. See [[median-3]].
sourceraw docstring

median-3clj

(median-3 a b c)

Median of three values. See median.

Median of three values. See [[median]].
sourceraw docstring

median-absolute-deviationclj

(median-absolute-deviation vs)
(median-absolute-deviation vs center)
(median-absolute-deviation vs center estimation-strategy)

Calculate MAD

Calculate MAD
sourceraw docstring

minimumclj

(minimum vs)

Minimum value from sequence.

Minimum value from sequence.
sourceraw docstring

minimum-discrimination-information-testclj

(minimum-discrimination-information-test contingency-table-or-xs)
(minimum-discrimination-information-test contingency-table-or-xs params)

Minimum discrimination information test, a power divergence test for lambda -1.0

Power divergence test.

First argument should be one of:

  • contingency table
  • sequence of counts (for goodness of fit)
  • sequence of data (for goodness of fit against distribution)

For goodness of fit there are two options:

  • comparison of observed counts vs expected probabilities or weights (:p)
  • comparison of data against given distribution (:p), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use :bins option to control histogram creation.

Options are:

Minimum discrimination information test, a power divergence test for `lambda` -1.0

Power divergence test.

  First argument should be one of:
  
  * contingency table
  * sequence of counts (for goodness of fit)
  * sequence of data (for goodness of fit against distribution)

  For goodness of fit there are two options:
  
  * comparison of observed counts vs expected probabilities or weights (`:p`)
  * comparison of data against given distribution (`:p`), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use `:bins` option to control histogram creation.

  Options are:
  
  * `:lambda` - test type:
      * `1.0` - [[chisq-test]]
      * `0.0` - [[multinomial-likelihood-ratio-test]]
      * `-1.0` - [[minimum-discrimination-information-test]]
      * `-2.0` - [[neyman-modified-chisq-test]]
      * `-0.5` - [[freeman-tukey-test]]
      * `2/3` - [[cressie-read-test]] - default
  * `:p` - probabilites, weights or distribution object.
  * `:alpha` - significance level (default: 0.05)
  * `:ci-sides` - confidence interval sides (default: `:two-sided`)
  * `:sides`  - p-value sides (`:two-sided`, `:one-side-greater` - default, `:one-side-less`)
  * `:bootstrap-samples` - number of samples to estimate confidence intervals (default: 1000)
  * `:ddof` - delta degrees of freedom, adjustment for dof (default: 0.0)
  * `:bins` - number of bins or estimator name for histogram
sourceraw docstring

modeclj

(mode vs)
(mode vs method)
(mode vs method opts)

Find the value that appears most often in a dataset vs.

For sample from continuous distribution, three algorithms are possible:

  • :histogram - calculated from histogram
  • :kde - calculated from KDE
  • :pearson - mode = mean-3(median-mean)
  • :default - discrete mode

Histogram accepts optional :bins (see histogram). KDE method accepts :kde for kernel name (default :gaussian) and :bandwidth (auto). Pearson can accept :estimation-strategy for median.

See also modes.

Find the value that appears most often in a dataset `vs`.

For sample from continuous distribution, three algorithms are possible:
* `:histogram` - calculated from [[histogram]]
* `:kde` - calculated from KDE
* `:pearson` - mode = mean-3(median-mean)
* `:default` - discrete mode

Histogram accepts optional `:bins` (see [[histogram]]). KDE method accepts `:kde` for kernel name (default `:gaussian`) and `:bandwidth` (auto). Pearson can accept `:estimation-strategy` for median.

See also [[modes]].
sourceraw docstring

modesclj

(modes vs)
(modes vs method)
(modes vs method opts)

Find the values that appears most often in a dataset vs.

Returns sequence with all most appearing values in increasing order.

See also mode.

Find the values that appears most often in a dataset `vs`.

Returns sequence with all most appearing values in increasing order.

See also [[mode]].
sourceraw docstring

momentclj

(moment vs)
(moment vs order)
(moment vs order {:keys [absolute? center mean? normalize?] :or {mean? true}})

Calculate moment (central or/and absolute) of given order (default: 2).

Additional parameters as a map:

  • :absolute? - calculate sum as absolute values (default: false)
  • :mean? - returns mean (proper moment) or just sum of differences (default: true)
  • :center - value of center (default: nil = mean)
  • :normalize? - apply normalization by standard deviation to the order power
Calculate moment (central or/and absolute) of given order (default: 2).

Additional parameters as a map:

* `:absolute?` - calculate sum as absolute values (default: `false`)
* `:mean?` - returns mean (proper moment) or just sum of differences (default: `true`)
* `:center` - value of center (default: `nil` = mean)
* `:normalize?` - apply normalization by standard deviation to the order power
sourceraw docstring

mseclj

(mse [vs1 vs2-or-val])
(mse vs1 vs2-or-val)

Mean squared error

Mean squared error
sourceraw docstring

multinomial-likelihood-ratio-testclj

(multinomial-likelihood-ratio-test contingency-table-or-xs)
(multinomial-likelihood-ratio-test contingency-table-or-xs params)

Multinomial likelihood ratio test, a power divergence test for lambda 0.0

Power divergence test.

First argument should be one of:

  • contingency table
  • sequence of counts (for goodness of fit)
  • sequence of data (for goodness of fit against distribution)

For goodness of fit there are two options:

  • comparison of observed counts vs expected probabilities or weights (:p)
  • comparison of data against given distribution (:p), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use :bins option to control histogram creation.

Options are:

Multinomial likelihood ratio test, a power divergence test for `lambda` 0.0

Power divergence test.

  First argument should be one of:
  
  * contingency table
  * sequence of counts (for goodness of fit)
  * sequence of data (for goodness of fit against distribution)

  For goodness of fit there are two options:
  
  * comparison of observed counts vs expected probabilities or weights (`:p`)
  * comparison of data against given distribution (`:p`), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use `:bins` option to control histogram creation.

  Options are:
  
  * `:lambda` - test type:
      * `1.0` - [[chisq-test]]
      * `0.0` - [[multinomial-likelihood-ratio-test]]
      * `-1.0` - [[minimum-discrimination-information-test]]
      * `-2.0` - [[neyman-modified-chisq-test]]
      * `-0.5` - [[freeman-tukey-test]]
      * `2/3` - [[cressie-read-test]] - default
  * `:p` - probabilites, weights or distribution object.
  * `:alpha` - significance level (default: 0.05)
  * `:ci-sides` - confidence interval sides (default: `:two-sided`)
  * `:sides`  - p-value sides (`:two-sided`, `:one-side-greater` - default, `:one-side-less`)
  * `:bootstrap-samples` - number of samples to estimate confidence intervals (default: 1000)
  * `:ddof` - delta degrees of freedom, adjustment for dof (default: 0.0)
  * `:bins` - number of bins or estimator name for histogram
sourceraw docstring

neyman-modified-chisq-testclj

(neyman-modified-chisq-test contingency-table-or-xs)
(neyman-modified-chisq-test contingency-table-or-xs params)

Neyman modifield chi square test, a power divergence test for lambda -2.0

Power divergence test.

First argument should be one of:

  • contingency table
  • sequence of counts (for goodness of fit)
  • sequence of data (for goodness of fit against distribution)

For goodness of fit there are two options:

  • comparison of observed counts vs expected probabilities or weights (:p)
  • comparison of data against given distribution (:p), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use :bins option to control histogram creation.

Options are:

Neyman modifield chi square test, a power divergence test for `lambda` -2.0

Power divergence test.

  First argument should be one of:
  
  * contingency table
  * sequence of counts (for goodness of fit)
  * sequence of data (for goodness of fit against distribution)

  For goodness of fit there are two options:
  
  * comparison of observed counts vs expected probabilities or weights (`:p`)
  * comparison of data against given distribution (`:p`), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use `:bins` option to control histogram creation.

  Options are:
  
  * `:lambda` - test type:
      * `1.0` - [[chisq-test]]
      * `0.0` - [[multinomial-likelihood-ratio-test]]
      * `-1.0` - [[minimum-discrimination-information-test]]
      * `-2.0` - [[neyman-modified-chisq-test]]
      * `-0.5` - [[freeman-tukey-test]]
      * `2/3` - [[cressie-read-test]] - default
  * `:p` - probabilites, weights or distribution object.
  * `:alpha` - significance level (default: 0.05)
  * `:ci-sides` - confidence interval sides (default: `:two-sided`)
  * `:sides`  - p-value sides (`:two-sided`, `:one-side-greater` - default, `:one-side-less`)
  * `:bootstrap-samples` - number of samples to estimate confidence intervals (default: 1000)
  * `:ddof` - delta degrees of freedom, adjustment for dof (default: 0.0)
  * `:bins` - number of bins or estimator name for histogram
sourceraw docstring

omega-sqclj

(omega-sq [group1 group2])
(omega-sq group1 group2)

Adjusted R2

Adjusted R2
sourceraw docstring

one-way-anova-testclj

(one-way-anova-test xss)
(one-way-anova-test xss {:keys [sides] :or {sides :one-sided-greater}})
source

outer-fence-extentclj

(outer-fence-extent vs)
(outer-fence-extent vs estimation-strategy)

Returns LOF, UOF and median

Returns LOF, UOF and median
sourceraw docstring

outliersclj

(outliers vs)
(outliers vs estimation-strategy)
(outliers vs q1 q3)

Find outliers defined as values outside inner fences.

Let Q1 is 25-percentile and Q3 is 75-percentile. IQR is (- Q3 Q1).

  • LIF (Lower Inner Fence) equals (- Q1 (* 1.5 IQR)).
  • UIF (Upper Inner Fence) equals (+ Q3 (* 1.5 IQR)).

Returns sequence.

Optional estimation-strategy argument can be set to change quantile calculations estimation type. See [[estimation-strategies]].

Find outliers defined as values outside inner fences.

Let Q1 is 25-percentile and Q3 is 75-percentile. IQR is `(- Q3 Q1)`.

* LIF (Lower Inner Fence) equals `(- Q1 (* 1.5 IQR))`.
* UIF (Upper Inner Fence) equals `(+ Q3 (* 1.5 IQR))`.

Returns sequence.

Optional `estimation-strategy` argument can be set to change quantile calculations estimation type. See [[estimation-strategies]].
sourceraw docstring

p-overlapclj

(p-overlap [group1 group2])
(p-overlap group1 group2)
(p-overlap group1
           group2
           {:keys [kde bandwidth min-iterations steps]
            :or {kde :gaussian min-iterations 3 steps 500}})

Overlapping index, kernel density approximation

Overlapping index, kernel density approximation
sourceraw docstring

p-valueclj

(p-value stat)
(p-value distribution stat)
(p-value distribution stat sides)

Calculate p-value for given distribution (default: N(0,1)), stat and sides (one of :two-sided, :one-sided-greater or :one-sided-less/:one-sided).

Calculate p-value for given distribution (default: N(0,1)), `stat`  and sides (one of `:two-sided`, `:one-sided-greater` or `:one-sided-less`/`:one-sided`).
sourceraw docstring

pacfclj

(pacf data)
(pacf data lags)

Caluclate pacf (partial autocorrelation function) for given number of lags.

If lags is omitted function returns maximum possible number of lags.

pacf returns also lag 0 (which is 0.0).

See also acf, acf-ci, pacf-ci

Caluclate pacf (partial autocorrelation function) for given number of lags.

If lags is omitted function returns maximum possible number of lags.

`pacf` returns also lag `0` (which is `0.0`).

See also [[acf]], [[acf-ci]], [[pacf-ci]]
sourceraw docstring

pacf-ciclj

(pacf-ci data)
(pacf-ci data lags)
(pacf-ci data lags alpha)

pacf with added confidence interval data.

[[pacf]] with added confidence interval data.
sourceraw docstring

pearson-correlationclj

(pearson-correlation [vs1 vs2])
(pearson-correlation vs1 vs2)

Pearson's correlation of two sequences.

Pearson's correlation of two sequences.
sourceraw docstring

pearson-rclj

(pearson-r [group1 group2])
(pearson-r group1 group2)

Pearson r correlation coefficient

Pearson `r` correlation coefficient
sourceraw docstring

percentileclj

(percentile vs p)
(percentile vs p estimation-strategy)

Calculate percentile of a vs.

Percentile p is from range 0-100.

See docs.

Optionally you can provide estimation-strategy to change interpolation methods for selecting values. Default is :legacy. See more here

See also quantile.

Calculate percentile of a `vs`.

Percentile `p` is from range 0-100.

See [docs](http://commons.apache.org/proper/commons-math/javadocs/api-3.4/org/apache/commons/math3/stat/descriptive/rank/Percentile.html).

Optionally you can provide `estimation-strategy` to change interpolation methods for selecting values. Default is `:legacy`. See more [here](http://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/stat/descriptive/rank/Percentile.EstimationType.html)

See also [[quantile]].
sourceraw docstring

percentile-bc-extentclj

(percentile-bc-extent vs)
(percentile-bc-extent vs p)
(percentile-bc-extent vs p1 p2)
(percentile-bc-extent vs p1 p2 estimation-strategy)

Return bias corrected percentile range and mean for bootstrap samples. See https://projecteuclid.org/euclid.ss/1032280214

p - calculates extent of bias corrected p and 100-p (default: p=2.5)

Set estimation-strategy to :r7 to get the same result as in R coxed::bca.

Return bias corrected percentile range and mean for bootstrap samples.
See https://projecteuclid.org/euclid.ss/1032280214

`p` - calculates extent of bias corrected `p` and `100-p` (default: `p=2.5`)

Set `estimation-strategy` to `:r7` to get the same result as in R `coxed::bca`.
sourceraw docstring

percentile-bca-extentclj

(percentile-bca-extent vs)
(percentile-bca-extent vs p)
(percentile-bca-extent vs p1 p2)
(percentile-bca-extent vs p1 p2 estimation-strategy)
(percentile-bca-extent vs p1 p2 accel estimation-strategy)

Return bias corrected percentile range and mean for bootstrap samples. Also accounts for variance variations throught the accelaration parameter. See https://projecteuclid.org/euclid.ss/1032280214

p - calculates extent of bias corrected p and 100-p (default: p=2.5)

Set estimation-strategy to :r7 to get the same result as in R coxed::bca.

Return bias corrected percentile range and mean for bootstrap samples. Also accounts for variance
 variations throught the accelaration parameter.
See https://projecteuclid.org/euclid.ss/1032280214

`p` - calculates extent of bias corrected `p` and `100-p` (default: `p=2.5`)

Set `estimation-strategy` to `:r7` to get the same result as in R `coxed::bca`.
sourceraw docstring

percentile-extentclj

(percentile-extent vs)
(percentile-extent vs p)
(percentile-extent vs p1 p2)
(percentile-extent vs p1 p2 estimation-strategy)

Return percentile range and median.

p - calculates extent of p and 100-p (default: p=25)

Return percentile range and median.

`p` - calculates extent of `p` and `100-p` (default: `p=25`)
sourceraw docstring

percentilesclj

(percentiles vs)
(percentiles vs ps)
(percentiles vs ps estimation-strategy)

Calculate percentiles of a vs.

Percentiles are sequence of values from range 0-100.

See docs.

Optionally you can provide estimation-strategy to change interpolation methods for selecting values. Default is :legacy. See more here

See also quantile.

Calculate percentiles of a `vs`.

Percentiles are sequence of values from range 0-100.

See [docs](http://commons.apache.org/proper/commons-math/javadocs/api-3.4/org/apache/commons/math3/stat/descriptive/rank/Percentile.html).

Optionally you can provide `estimation-strategy` to change interpolation methods for selecting values. Default is `:legacy`. See more [here](http://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/stat/descriptive/rank/Percentile.EstimationType.html)

See also [[quantile]].
sourceraw docstring

piclj

(pi vs)
(pi vs size)
(pi vs size estimation-strategy)

Returns PI as a map, quantile intervals based on interval size.

Quantiles are (1-size)/2 and 1-(1-size)/2

Returns PI as a map, quantile intervals based on interval size.

Quantiles are `(1-size)/2` and `1-(1-size)/2`
sourceraw docstring

pi-extentclj

(pi-extent vs)
(pi-extent vs size)
(pi-extent vs size estimation-strategy)

Returns PI extent, quantile intervals based on interval size + median.

Quantiles are (1-size)/2 and 1-(1-size)/2

Returns PI extent, quantile intervals based on interval size + median.

Quantiles are `(1-size)/2` and `1-(1-size)/2`
sourceraw docstring

pooled-stddevclj

(pooled-stddev groups)
(pooled-stddev groups method)

Calculate pooled standard deviation for samples and method

Calculate pooled standard deviation for samples and method
sourceraw docstring

pooled-varianceclj

(pooled-variance groups)
(pooled-variance groups method)

Calculate pooled variance for samples and method.

Methods:

  • :unbiased - sqrt of weighted average of variances (default)
  • :biased - biased version of :unbiased
  • :avg - sqrt of average of variances
Calculate pooled variance for samples and method.

Methods:
* `:unbiased` - sqrt of weighted average of variances (default)
* `:biased` - biased version of `:unbiased`
* `:avg` - sqrt of average of variances
sourceraw docstring

population-stddevclj

(population-stddev vs)
(population-stddev vs u)

Calculate population standard deviation of vs.

See stddev.

Calculate population standard deviation of `vs`.

See [[stddev]].
sourceraw docstring

population-varianceclj

(population-variance vs)
(population-variance vs u)

Calculate population variance of vs.

See variance.

Calculate population variance of `vs`.

See [[variance]].
sourceraw docstring

power-divergence-testclj

(power-divergence-test contingency-table-or-xs)
(power-divergence-test contingency-table-or-xs
                       {:keys [lambda ci-sides sides p alpha bootstrap-samples
                               ddof bins]
                        :or {lambda m/TWO_THIRD
                             sides :one-sided-greater
                             ci-sides :two-sided
                             alpha 0.05
                             bootstrap-samples 1000
                             ddof 0}})

Power divergence test.

First argument should be one of:

  • contingency table
  • sequence of counts (for goodness of fit)
  • sequence of data (for goodness of fit against distribution)

For goodness of fit there are two options:

  • comparison of observed counts vs expected probabilities or weights (:p)
  • comparison of data against given distribution (:p), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use :bins option to control histogram creation.

Options are:

Power divergence test.

First argument should be one of:

* contingency table
* sequence of counts (for goodness of fit)
* sequence of data (for goodness of fit against distribution)

For goodness of fit there are two options:

* comparison of observed counts vs expected probabilities or weights (`:p`)
* comparison of data against given distribution (`:p`), in this case histogram from data is created and compared to distribution PDF in bins ranges. Use `:bins` option to control histogram creation.

Options are:

* `:lambda` - test type:
    * `1.0` - [[chisq-test]]
    * `0.0` - [[multinomial-likelihood-ratio-test]]
    * `-1.0` - [[minimum-discrimination-information-test]]
    * `-2.0` - [[neyman-modified-chisq-test]]
    * `-0.5` - [[freeman-tukey-test]]
    * `2/3` - [[cressie-read-test]] - default
* `:p` - probabilites, weights or distribution object.
* `:alpha` - significance level (default: 0.05)
* `:ci-sides` - confidence interval sides (default: `:two-sided`)
* `:sides`  - p-value sides (`:two-sided`, `:one-side-greater` - default, `:one-side-less`)
* `:bootstrap-samples` - number of samples to estimate confidence intervals (default: 1000)
* `:ddof` - delta degrees of freedom, adjustment for dof (default: 0.0)
* `:bins` - number of bins or estimator name for histogram
sourceraw docstring

powmeanclj

(powmean vs power)

Generalized power mean

Generalized power mean
sourceraw docstring

psnrclj

(psnr [vs1 vs2-or-val])
(psnr vs1 vs2-or-val)
(psnr vs1 vs2-or-val max-value)

Peak signal to noise, max-value is maximum possible value (default: max from vs1 and vs2)

Peak signal to noise, `max-value` is maximum possible value (default: max from `vs1` and `vs2`)
sourceraw docstring

quantileclj

(quantile vs q)
(quantile vs q estimation-strategy)

Calculate quantile of a vs.

Quantile q is from range 0.0-1.0.

See docs for interpolation strategy.

Optionally you can provide estimation-strategy to change interpolation methods for selecting values. Default is :legacy. See more here

See also percentile.

Calculate quantile of a `vs`.

Quantile `q` is from range 0.0-1.0.

See [docs](http://commons.apache.org/proper/commons-math/javadocs/api-3.4/org/apache/commons/math3/stat/descriptive/rank/Percentile.html) for interpolation strategy.

Optionally you can provide `estimation-strategy` to change interpolation methods for selecting values. Default is `:legacy`. See more [here](http://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/stat/descriptive/rank/Percentile.EstimationType.html)

See also [[percentile]].
sourceraw docstring

quantile-extentclj

(quantile-extent vs)
(quantile-extent vs q)
(quantile-extent vs q1 q2)
(quantile-extent vs q1 q2 estimation-strategy)

Return quantile range and median.

q - calculates extent of q and 1.0-q (default: q=0.25)

Return quantile range and median.

`q` - calculates extent of `q` and `1.0-q` (default: `q=0.25`)
sourceraw docstring

quantilesclj

(quantiles vs)
(quantiles vs qs)
(quantiles vs qs estimation-strategy)

Calculate quantiles of a vs.

Quantilizes is sequence with values from range 0.0-1.0.

See docs for interpolation strategy.

Optionally you can provide estimation-strategy to change interpolation methods for selecting values. Default is :legacy. See more here

See also percentiles.

Calculate quantiles of a `vs`.

Quantilizes is sequence with values from range 0.0-1.0.

See [docs](http://commons.apache.org/proper/commons-math/javadocs/api-3.4/org/apache/commons/math3/stat/descriptive/rank/Percentile.html) for interpolation strategy.

Optionally you can provide `estimation-strategy` to change interpolation methods for selecting values. Default is `:legacy`. See more [here](http://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/stat/descriptive/rank/Percentile.EstimationType.html)

See also [[percentiles]].
sourceraw docstring

r2clj

(r2 [vs1 vs2-or-val])
(r2 vs1 vs2-or-val)

R2

R2
sourceraw docstring

r2-determinationclj

(r2-determination [group1 group2])
(r2-determination group1 group2)

Coefficient of determination

Coefficient of determination
sourceraw docstring

rank-epsilon-sqclj

(rank-epsilon-sq xs)

Effect size for Kruskal-Wallis test

Effect size for Kruskal-Wallis test
sourceraw docstring

rank-eta-sqclj

(rank-eta-sq xs)

Effect size for Kruskal-Wallis test

Effect size for Kruskal-Wallis test
sourceraw docstring

rescaleclj

(rescale vs)
(rescale vs low high)

Lineary rascale data to desired range, [0,1] by default

Lineary rascale data to desired range, [0,1] by default
sourceraw docstring

rmseclj

(rmse [vs1 vs2-or-val])
(rmse vs1 vs2-or-val)

Root mean squared error

Root mean squared error
sourceraw docstring

robust-standardizeclj

(robust-standardize vs)
(robust-standardize vs q)

Normalize samples to have median = 0 and MAD = 1.

If q argument is used, scaling is done by quantile difference (Q_q, Q_(1-q)). Set 0.25 for IQR.

Normalize samples to have median = 0 and MAD = 1.

If `q` argument is used, scaling is done by quantile difference (Q_q, Q_(1-q)). Set 0.25 for IQR.
sourceraw docstring

rows->contingency-tableclj

(rows->contingency-table xss)
source

rssclj

(rss [vs1 vs2-or-val])
(rss vs1 vs2-or-val)

Residual sum of squares

Residual sum of squares
sourceraw docstring

second-momentcljdeprecated

source

semclj

(sem vs)

Standard error of mean

Standard error of mean
sourceraw docstring

sem-extentclj

(sem-extent vs)

-/+ sem and mean

 -/+ sem and mean
sourceraw docstring

similarityclj

(similarity method P-observed Q-expected)
(similarity method
            P-observed
            Q-expected
            {:keys [bins probabilities? epsilon]
             :or {probabilities? true epsilon 1.0E-6}})

Various PDF similarities between two histograms (frequencies) or probabilities.

Q can be a distribution object. Then, histogram will be created out of P.

Arguments:

  • method - distance method
  • P-observed - frequencies, probabilities or actual data (when Q is a distribution)
  • Q-expected - frequencies, probabilities or distribution object (when P is a data)

Options:

  • :probabilities? - should P/Q be converted to a probabilities, default: true.
  • :epsilon - small number which replaces 0.0 when division or logarithm is used`
  • :bins - number of bins or bins estimation method, see histogram.

The list of methods: :intersection, :czekanowski, :motyka, :kulczynski, :ruzicka, :inner-product, :harmonic-mean, :cosine, :jaccard, :dice, :fidelity, :squared-chord

See more: Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions by Sung-Hyuk Cha

Various PDF similarities between two histograms (frequencies) or probabilities.

Q can be a distribution object. Then, histogram will be created out of P.

Arguments:

* `method` - distance method
* `P-observed` - frequencies, probabilities or actual data (when Q is a distribution)
* `Q-expected` - frequencies, probabilities or distribution object (when P is a data)

Options:

* `:probabilities?` - should P/Q be converted to a probabilities, default: `true`.
* `:epsilon` - small number which replaces `0.0` when division or logarithm is used`
* `:bins` - number of bins or bins estimation method, see [[histogram]].

The list of methods: `:intersection`, `:czekanowski`, `:motyka`, `:kulczynski`, `:ruzicka`, `:inner-product`, `:harmonic-mean`, `:cosine`, `:jaccard`, `:dice`, `:fidelity`, `:squared-chord`

See more: Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions by Sung-Hyuk Cha
sourceraw docstring

skewnessclj

(skewness vs)
(skewness vs typ)

Calculate skewness from sequence.

Possible types: :G1 (default), :g1 (:pearson), :b1, :B1 (:yule), :B3, :skew, :mode or :median.

Calculate skewness from sequence.

Possible types: `:G1` (default), `:g1` (`:pearson`), `:b1`, `:B1` (`:yule`), `:B3`, `:skew`, `:mode` or `:median`.
sourceraw docstring

spanclj

(span vs)

Width of the sample, maximum value minus minimum value

Width of the sample, maximum value minus minimum value
sourceraw docstring

spearman-correlationclj

(spearman-correlation [vs1 vs2])
(spearman-correlation vs1 vs2)

Spearman's correlation of two sequences.

Spearman's correlation of two sequences.
sourceraw docstring

standardizeclj

(standardize vs)

Normalize samples to have mean = 0 and stddev = 1.

Normalize samples to have mean = 0 and stddev = 1.
sourceraw docstring

stats-mapclj

(stats-map vs)
(stats-map vs estimation-strategy)

Calculate several statistics of vs and return as map.

Optional estimation-strategy argument can be set to change quantile calculations estimation type. See [[estimation-strategies]].

Calculate several statistics of `vs` and return as map.

Optional `estimation-strategy` argument can be set to change quantile calculations estimation type. See [[estimation-strategies]].
sourceraw docstring

stddevclj

(stddev vs)
(stddev vs u)

Calculate standard deviation of vs.

See population-stddev.

Calculate standard deviation of `vs`.

See [[population-stddev]].
sourceraw docstring

stddev-extentclj

(stddev-extent vs)

-/+ stddev and mean

 -/+ stddev and mean
sourceraw docstring

sumclj

(sum vs)

Sum of all vs values.

Sum of all `vs` values.
sourceraw docstring

t-test-one-sampleclj

(t-test-one-sample xs)
(t-test-one-sample xs m)

One sample Student's t-test

  • alpha - significance level (default: 0.05)
  • sides - one of: :two-sided, :one-sided-less (short: :one-sided) or :one-sided-greater
  • mu - mean (default: 0.0)
One sample Student's t-test

* `alpha` - significance level (default: `0.05`)
* `sides` - one of: `:two-sided`, `:one-sided-less` (short: `:one-sided`) or `:one-sided-greater`
* `mu` - mean (default: `0.0`)
sourceraw docstring

t-test-two-samplesclj

(t-test-two-samples xs ys)
(t-test-two-samples xs
                    ys
                    {:keys [paired? equal-variances?]
                     :or {paired? false equal-variances? false}
                     :as params})

Two samples Student's t-test

  • alpha - significance level (default: 0.05)
  • sides - one of: :two-sided (default), :one-sided-less (short: :one-sided) or :one-sided-greater
  • mu - mean (default: 0.0)
  • paired? - unpaired or paired test, boolean (default: false)
  • equal-variances? - unequal or equal variances, boolean (default: false)
Two samples Student's t-test

* `alpha` - significance level (default: `0.05`)
* `sides` - one of: `:two-sided` (default), `:one-sided-less` (short: `:one-sided`) or `:one-sided-greater`
* `mu` - mean (default: `0.0`)
* `paired?` - unpaired or paired test, boolean (default: `false`)
* `equal-variances?` - unequal or equal variances, boolean (default: `false`)
sourceraw docstring

trimclj

(trim vs)
(trim vs quantile)
(trim vs quantile estimation-strategy)
(trim vs low high nan)

Return trimmed data. Trim is done by using quantiles, by default is set to 0.2.

Return trimmed data. Trim is done by using quantiles, by default is set to 0.2.
sourceraw docstring

tschuprows-tclj

(tschuprows-t contingency-table)
(tschuprows-t group1 group2)

Tschuprows T effect size for discrete data

Tschuprows T effect size for discrete data
sourceraw docstring

ttest-one-samplecljdeprecated

source

ttest-two-samplescljdeprecated

source

varianceclj

(variance vs)
(variance vs u)

Calculate variance of vs.

See population-variance.

Calculate variance of `vs`.

See [[population-variance]].
sourceraw docstring

variationclj

(variation vs)

Coefficient of variation CV = stddev / mean

Coefficient of variation CV = stddev / mean
sourceraw docstring

weighted-kappaclj

(weighted-kappa contingency-table)
(weighted-kappa contingency-table weights)

Cohen's weighted kappa for indexed contingency table

Cohen's weighted kappa for indexed contingency table
sourceraw docstring

winsorclj

(winsor vs)
(winsor vs quantile)
(winsor vs quantile estimation-strategy)
(winsor vs low high nan)

Return winsorized data. Trim is done by using quantiles, by default is set to 0.2.

Return winsorized data. Trim is done by using quantiles, by default is set to 0.2.
sourceraw docstring

wmeanclj

(wmean vs)
(wmean vs weights)

Weighted mean

Weighted mean
sourceraw docstring

wmedianclj

(wmedian vs ws)
(wmedian vs ws method)

Weighted median.

Calculation is done using interpolation. There are three methods:

  • :linear - linear interpolation, default
  • :step - step interpolation
  • :average - average of ties

Based on spatstat.geom::weighted.quantile from R.

Weighted median.

Calculation is done using interpolation. There are three methods:
* `:linear` - linear interpolation, default
* `:step` - step interpolation
* `:average` - average of ties

Based on `spatstat.geom::weighted.quantile` from R.
sourceraw docstring

wmw-oddsclj

(wmw-odds [group1 group2])
(wmw-odds group1 group2)

Wilcoxon-Mann-Whitney odds

Wilcoxon-Mann-Whitney odds
sourceraw docstring

wquantileclj

(wquantile vs ws q)
(wquantile vs ws q method)

Weighted quantile.

Calculation is done using interpolation. There are three methods:

  • :linear - linear interpolation, default
  • :step - step interpolation
  • :average - average of ties

Based on spatstat.geom::weighted.quantile from R.

Weighted quantile.

Calculation is done using interpolation. There are three methods:
* `:linear` - linear interpolation, default
* `:step` - step interpolation
* `:average` - average of ties

Based on `spatstat.geom::weighted.quantile` from R.
sourceraw docstring

wquantilesclj

(wquantiles vs ws)
(wquantiles vs ws qs)
(wquantiles vs ws qs method)

Weighted quantiles.

Calculation is done using interpolation. There are three methods:

  • :linear - linear interpolation, default
  • :step - step interpolation
  • :average - average of ties

Based on spatstat.geom::weighted.quantile from R.

Weighted quantiles.

Calculation is done using interpolation. There are three methods:
* `:linear` - linear interpolation, default
* `:step` - step interpolation
* `:average` - average of ties

Based on `spatstat.geom::weighted.quantile` from R.
sourceraw docstring

z-test-one-sampleclj

(z-test-one-sample xs)
(z-test-one-sample xs m)

One sample z-test

  • alpha - significance level (default: 0.05)
  • sides - one of: :two-sided, :one-sided-less (short: :one-sided) or :one-sided-greater
  • mu - mean (default: 0.0)
One sample z-test

* `alpha` - significance level (default: `0.05`)
* `sides` - one of: `:two-sided`, `:one-sided-less` (short: `:one-sided`) or `:one-sided-greater`
* `mu` - mean (default: `0.0`)
sourceraw docstring

z-test-two-samplesclj

(z-test-two-samples xs ys)
(z-test-two-samples xs
                    ys
                    {:keys [paired? equal-variances?]
                     :or {paired? false equal-variances? false}
                     :as params})

Two samples z-test

  • alpha - significance level (default: 0.05)
  • sides - one of: :two-sided (default), :one-sided-less (short: :one-sided) or :one-sided-greater
  • mu - mean (default: 0.0)
  • paired? - unpaired or paired test, boolean (default: false)
  • equal-variances? - unequal or equal variances, boolean (default: false)
Two samples z-test

* `alpha` - significance level (default: `0.05`)
* `sides` - one of: `:two-sided` (default), `:one-sided-less` (short: `:one-sided`) or `:one-sided-greater`
* `mu` - mean (default: `0.0`)
* `paired?` - unpaired or paired test, boolean (default: `false`)
* `equal-variances?` - unequal or equal variances, boolean (default: `false`)
sourceraw docstring

cljdoc is a website building & hosting documentation for Clojure/Script libraries

× close