tesser.math

Clojure only.

correlation
correlation+count
correlation+count-matrix
correlation-matrix
covariance
covariance-matrix
digest
fuse-matrix
mean
standard-deviation
sum
variance

Folds over numbers! Calculate sums, means, variances, standard deviations, covariances, and linear correlations, and matrices thereof, plus quantile and histogram estimates backed by probabilistic QDigests.

correlation^clj

(correlation & args)

Like correlation+count, but only returns the correlation.

correlation+count^clj

(correlation+count fx fy)

(correlation+count fx fy fold__3373__auto__)

Given two functions, (fx input) and (fy input), each of which returns a number, estimates the unbiased linear correlation coefficient between fx and fy over inputs. Ignores any records where fx or fy is nil. If no records have values for both fx and fy, the correlation is nil. See http://mathworld.wolfram.com/CorrelationCoefficient.html.

This function returns a map of correlation and count, like

{:correlation 0.34 :count 142}

which is useful for significance testing.
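
As one illustration of why the count matters, the map can be turned into a t statistic with n - 2 degrees of freedom, using the standard formula t = r * sqrt((n - 2) / (1 - r^2)). This is a minimal sketch in plain Clojure; the function name t-statistic is hypothetical and not part of tesser.math, and returning nil for degenerate inputs is a choice made for this sketch.

```clojure
(defn t-statistic
  "Hypothetical helper: converts a {:correlation r :count n} map into a
  t statistic with (n - 2) degrees of freedom. Returns nil when the
  statistic is undefined (no correlation, n <= 2, or |r| = 1)."
  [{r :correlation n :count}]
  (when (and r
             (> n 2)
             (< (Math/abs (double r)) 1.0))
    (* r (Math/sqrt (/ (- n 2)
                       (- 1.0 (* r r)))))))

;; Using the example map from above:
(t-statistic {:correlation 0.34 :count 142})
;; roughly 4.28
```

The resulting value would then be compared against a t distribution with n - 2 degrees of freedom to obtain a p-value.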

correlation+count-matrix^clj

(correlation+count-matrix & args)

Given a map of key names to functions that extract values for those keys from an input, computes the correlations for each of the n^2 key pairs, returning a map of name pairs to their correlations and counts. See correlation+count. For example:

(m/correlation+count-matrix {:name-length #(.length (:name %))
                             :age         :age
                             :num-cats    (comp count :cats)})

will, when executed, return a map like

{[:name-length :age]      {:count 150 :correlation 0.56}
 [:name-length :num-cats] {:count 150 :correlation 0.95}
 ...}

correlation-matrix^clj

(correlation-matrix & args)

Like correlation+count-matrix, but returns just correlation coefficients instead of maps of :correlation and :count.

covariance^clj

(covariance fx fy)

(covariance fx fy fold__3373__auto__)

Given two functions of an input (fx input) and (fy input), each of which returns a number, estimates the unbiased covariance of those functions over inputs.

Ignores any inputs where (fx input) or (fy input) are nil. If no inputs have both x and y, returns nil.
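
To make the definition concrete, here is a minimal single-pass-per-statistic sketch in plain Clojure of the unbiased (n - 1) sample covariance and the linear correlation derived from it, skipping inputs where either value is nil. It does not use tesser and is not how the library computes these folds; the names pairs, avg, sample-covariance and sample-correlation are hypothetical, and returning nil for fewer than two usable inputs is an assumption of this sketch.

```clojure
(defn pairs
  "Extracts [x y] pairs, skipping inputs where either value is nil."
  [fx fy inputs]
  (for [in inputs
        :let [x (fx in)
              y (fy in)]
        :when (and (some? x) (some? y))]
    [x y]))

(defn avg
  "Arithmetic mean of a non-empty collection, as a double."
  [xs]
  (/ (reduce + 0.0 xs) (count xs)))

(defn sample-covariance
  "Unbiased sample covariance of (fx input) and (fy input).
  Returns nil when fewer than two inputs have both values."
  [fx fy inputs]
  (let [ps (pairs fx fy inputs)
        n  (count ps)]
    (when (> n 1)
      (let [mx (avg (map first ps))
            my (avg (map second ps))]
        (/ (reduce + (map (fn [[x y]] (* (- x mx) (- y my))) ps))
           (dec n))))))

(defn sample-correlation
  "Linear correlation: covariance divided by the product of the two
  standard deviations. Returns nil when it is undefined."
  [fx fy inputs]
  (let [ps  (pairs fx fy inputs)
        cov (sample-covariance first second ps)
        vx  (sample-covariance first first ps)
        vy  (sample-covariance second second ps)]
    (when (and cov (pos? vx) (pos? vy))
      (/ cov (Math/sqrt (* vx vy))))))

(def people [{:x 1 :y 2} {:x 2 :y 4} {:x 3 :y 6} {:x 4 :y nil}])

(sample-covariance :x :y people)  ; => 2.0 (the nil record is skipped)
(sample-correlation :x :y people) ; => 1.0
```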

covariance-matrix^clj

(covariance-matrix & args)

Given a map of key names to functions that extract values for those keys from an input, computes the covariance for each of the n^2 key pairs, returning a map of name pairs to their covariance. For example:

(m/covariance-matrix {:name-length #(.length (:name %))
                      :age         :age
                      :num-cats    (comp count :cats)})

digest^clj

(digest digest-generator)

(digest digest-generator fold__3373__auto__)

You've got a set of numeric inputs and want to know their quantiles, distribution, histogram, etc. This fold takes numeric inputs and produces a statistical estimate of their distribution.

digest takes a function that returns a tesser.quantiles/Digest. The fold returns an instance of that digest.

For example, to compute an HDRHistogram digest over doubles (or longs, rationals, etc.):

(def digest (->> (m/digest q/hdr-histogram)
                 (t/tesser [[1 1 1 2 3]])))
; => #<DoubleHistogram ...>

To specify options for the digest, just use partial or (fn [] ...):

(m/digest (partial q/hdr-histogram {:significant-value-digits 4
                                    :highest-to-lowest-value-ratio 1e6}))

DoubleHistogram, like many quantile estimators, only works over positive values. To cover positives and negatives together, use tesser.quantiles/dual:

(m/digest #(q/dual q/hdr-histogram {:significant-value-digits 2}))

Once you've computed a digest, you can find a particular quantile using tesser.quantiles/quantile:

(q/quantile digest 0)   ; => 1.0
(q/quantile digest 0.5) ; => 1.0
(q/quantile digest 4/5) ; => 2.0009765625
(q/quantile digest 1)   ; => 3.0009765625

The total number of points in the sample:

(q/point-count digest) ; => 5

Minima and maxima:

(q/min digest) ; => 1.0
(q/max digest) ; => 3.0009765625

Or find the distribution of values less than or equal to each point, with resolution given by the internal granularity of the digest:

(q/distribution digest)
; => ([1.0 3] [2.0009765625 1] [3.0009765625 1])

(q/cumulative-distribution digest)
; => ([1.0 3] [2.0009765625 4] [3.0009765625 5])
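
The relationship between these two outputs can be sketched in plain Clojure: each cumulative count is the running total of the per-point counts. The helper name cumulative is hypothetical and not part of tesser.quantiles; it only illustrates how the two results above relate, not how the digest computes them.

```clojure
(defn cumulative
  "Hypothetical helper: turns a seq of [point count] pairs into a seq of
  [point running-total] pairs."
  [dist]
  (let [points (map first dist)
        totals (reductions + (map second dist))]
    (map vector points totals)))

(cumulative [[1.0 3] [2.0009765625 1] [3.0009765625 1]])
;; => ([1.0 3] [2.0009765625 4] [3.0009765625 5])
```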

You don't have to return the whole digest; any of these derivative operations can be merged directly into the fold via tesser.core/post-combine.

(->> (m/digest q/hdr-histogram)
     (t/post-combine #(q/quantile % 1/2))
     (t/tesser [[1 2 2 3 3 3 3 3 3 3 3]]))
; => 3.0009765625

You may also use tesser.cardinality/hll to estimate the cardinality of a set. HLL+ (HyperLogLogPlus) is a probabilistic data structure that estimates set cardinality in very little memory, at the cost of some accuracy.

The HLL digest can be used like the histograms above:

(def digest (->> (m/digest cardinality/hll)
                 (t/tesser [[1 1 1 1 1 1 2 2 2 3 3 4 5]])))
; => #<HyperLogLogPlus...>

Getting the cardinality out through a post-combine step:

(->> (m/digest cardinality/hll)
     (t/post-combine #(q/point-count %))
     (t/tesser [[1 2 2 3 3 3 3 3 3 3 3]]))
; => 3

I want to emphasize that depending on the size of your data, its distribution, and the number of digests you want to compute, you may need different digest algorithms and widely varying tuning parameters. Until we have a better grasp of the space/error tradeoffs here, I won't choose defaults for you.

fuse-matrix^clj

(fuse-matrix fold keymap & [downstream])

Given:

1. A function like covariance that takes two functions of an input and yields a fold, and
2. A map of key names to functions that extract values for those keys from an input,

fuse-matrix computes that fold over each pair of keys, returning a map of name pairs to the result of that pairwise fold over the inputs. You can think of this like an N^2 version of fuse.

mean^clj

(mean)

(mean fold__3373__auto__)

Finds the arithmetic mean of numeric inputs.

standard-deviation^clj

(standard-deviation & [f])

Estimates the standard deviation of numeric inputs.

sum^clj

(sum)

(sum fold__3373__auto__)

Finds the sum of numeric elements.

variance^clj

(variance)

(variance fold__3373__auto__)

Unbiased variance estimation. Given numeric inputs, returns their variance.
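
To show how a variance can be computed as a fold over independent chunks, here is a minimal sketch in plain Clojure: each chunk is reduced to [n mean m2] with Welford's update, and partial results are merged with the standard pairwise combination formula. This is an illustration of the general technique, not a description of tesser's implementation; the names chunk-stats, merge-stats and unbiased-variance are hypothetical, and returning nil for fewer than two inputs is an assumption of this sketch.

```clojure
(defn chunk-stats
  "Reduces one chunk of numbers to [n mean m2], where m2 is the sum of
  squared deviations from the mean (Welford's update)."
  [xs]
  (reduce (fn [[n mean m2] x]
            (let [n'    (inc n)
                  delta (- x mean)
                  mean' (+ mean (/ delta n'))]
              [n' mean' (+ m2 (* delta (- x mean')))]))
          [0 0.0 0.0]
          xs))

(defn merge-stats
  "Combines two [n mean m2] summaries into the summary of their union."
  [[n1 mean1 ss1] [n2 mean2 ss2]]
  (let [n (+ n1 n2)]
    (if (zero? n)
      [0 0.0 0.0]
      (let [delta (- mean2 mean1)]
        [n
         (+ mean1 (/ (* delta n2) n))
         (+ ss1 ss2 (/ (* delta delta n1 n2) n))]))))

(defn unbiased-variance
  "Unbiased (n - 1) variance over a collection of chunks.
  Returns nil when there are fewer than two inputs in total."
  [chunks]
  (let [[n _ m2] (reduce merge-stats [0 0.0 0.0] (map chunk-stats chunks))]
    (when (> n 1)
      (/ m2 (dec n)))))

(unbiased-variance [[1 2 3] [4 5 6]]) ; => 3.5
```

Because merge-stats is associative, the chunks can be summarized in any grouping before being combined. The square root of this value gives the corresponding standard deviation.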
