tesser.math

Folds over numbers! Calculate sums, means, variance, standard deviation, covariance and linear correlations, and matrices thereof, plus quantile and histogram estimates backed by probabilistic QDigests.

correlation

(correlation & args)

Like correlation+count, but only returns the correlation.

correlation+count

(correlation+count fx fy)
(correlation+count fx fy fold__3373__auto__)

Given two functions, (fx input) and (fy input), each of which returns a number, estimates the unbiased linear correlation coefficient between fx and fy over inputs. Ignores any records where fx or fy is nil. If there are no records with values for both fx and fy, the correlation is nil. See http://mathworld.wolfram.com/CorrelationCoefficient.html.

This function returns a map of correlation and count, like

{:correlation 0.34 :count 142}

which is useful for significance testing.
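
To make the nil handling and the shape of the result concrete, here is a minimal plain-Clojure sketch of a Pearson correlation over records. It does not use tesser and is not the library's implementation: the name correlation+count-sketch is hypothetical, and returning {:correlation nil :count 0} when no record has both values is an assumption of this example.

```clojure
(defn correlation+count-sketch
  "Hypothetical helper: Pearson correlation of (fx r) and (fy r) over records,
  skipping any record where either value is nil."
  [fx fy records]
  (let [pairs (for [r records
                    :let [x (fx r), y (fy r)]
                    :when (and (some? x) (some? y))]
                [x y])
        n     (count pairs)]
    (if (zero? n)
      ;; Assumption of this sketch: no usable records means a nil correlation.
      {:correlation nil :count 0}
      (let [mean-x (/ (reduce + (map first pairs)) (double n))
            mean-y (/ (reduce + (map second pairs)) (double n))
            dxs    (map #(- (first %) mean-x) pairs)
            dys    (map #(- (second %) mean-y) pairs)
            sxy    (reduce + (map * dxs dys))   ; sum of cross products
            sxx    (reduce + (map * dxs dxs))   ; sum of squared x deviations
            syy    (reduce + (map * dys dys))]  ; sum of squared y deviations
        {:correlation (/ sxy (Math/sqrt (* sxx syy)))
         :count       n}))))

(correlation+count-sketch :x :y
                          [{:x 1 :y 2} {:x 2 :y 4} {:x 3 :y 6} {:x 4 :y nil}])
;; => {:correlation 1.0, :count 3}
```

The last record is dropped because its :y is nil, so the count is 3 rather than 4. That count is what a significance test would use as its sample size.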

correlation+count-matrix

(correlation+count-matrix & args)

Given a map of key names to functions that extract values for those keys from an input, computes the correlations for each of the n^2 key pairs, returning a map of name pairs to their correlations and counts. See correlation+count. For example:

(m/correlation+count-matrix {:name-length #(.length (:name %))
                             :age         :age
                             :num-cats    (comp count :cats)})

will, when executed, return a map like

{[:name-length :age]      {:count 150 :correlation 0.56}
 [:name-length :num-cats] {:count 150 :correlation 0.95}
 ...}

correlation-matrix

(correlation-matrix & args)

Like correlation+count-matrix, but returns just correlation coefficients instead of maps of :correlation and :count.

covariance

(covariance fx fy)
(covariance fx fy fold__3373__auto__)

Given two functions of an input (fx input) and (fy input), each of which returns a number, estimates the unbiased covariance of those functions over inputs.

Ignores any inputs where (fx input) or (fy input) is nil. If no inputs have both x and y, returns nil.
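
One way to see how a covariance can be computed as a fold over chunks is the single-pass, mergeable co-moment update sketched below. This is plain Clojure, not tesser's code: cov-step, cov-merge, and covariance-sketch are hypothetical names, "unbiased" is assumed to mean the usual n - 1 normalization, and the guard for a single pair is a choice made for this example.

```clojure
(defn cov-step
  "Reducer: fold one [x y] pair into the accumulator [n mean-x mean-y c],
  where c is the running sum of cross deviations."
  [[n mx my c] [x y]]
  (let [n'  (inc n)
        dx  (- x mx)
        mx' (+ mx (/ dx n'))
        my' (+ my (/ (- y my) n'))]
    [n' mx' my' (+ c (* dx (- y my')))]))

(defn cov-merge
  "Combiner: merge the accumulators of two disjoint chunks."
  [[na mxa mya ca :as a] [nb mxb myb cb :as b]]
  (cond
    (zero? na) b
    (zero? nb) a
    :else
    (let [n  (+ na nb)
          dx (- mxb mxa)
          dy (- myb mya)]
      [n
       (+ mxa (/ (* dx nb) n))
       (+ mya (/ (* dy nb) n))
       (+ ca cb (/ (* dx dy na nb) n))])))

(defn covariance-sketch
  "Covariance of (fx i) and (fy i) over a collection of chunks, skipping
  inputs where either value is nil. Returns nil if no input has both."
  [fx fy chunks]
  (let [init      [0 0.0 0.0 0.0]
        pairs     (fn [chunk]
                    (for [i chunk
                          :let [x (fx i), y (fy i)]
                          :when (and (some? x) (some? y))]
                      [x y]))
        [n _ _ c] (reduce cov-merge init
                          (map #(reduce cov-step init (pairs %)) chunks))]
    (when (pos? n)
      ;; n - 1 normalization; max avoids dividing by zero for a single pair.
      (/ c (max 1 (dec n))))))

(covariance-sketch :x :y [[{:x 1 :y 2} {:x 2 :y 4}]
                          [{:x 3 :y 6} {:x nil :y 9}]])
;; => 2.0
```

Because each chunk is reduced independently and the accumulators are merged afterwards, the chunks could be processed in any order or in parallel and still yield the same estimate, up to floating-point rounding.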

covariance-matrix

(covariance-matrix & args)

Given a map of key names to functions that extract values for those keys from an input, computes the covariance for each of the n^2 key pairs, returning a map of name pairs to their covariance. For example:

(m/covariance-matrix {:name-length #(.length (:name %))
                      :age         :age
                      :num-cats    (comp count :cats)})

digest

(digest digest-generator)
(digest digest-generator fold__3373__auto__)

You've got a set of numeric inputs and want to know their quantiles, distribution, histogram, etc. This fold takes numeric inputs and produces a statistical estimate of their distribution.

digest takes a function that returns a tesser.quantiles/Digest. The fold returns an instance of that digest.

For example, to compute an HDRHistogram digest over doubles (or longs, rationals, etc.):

(require '[tesser.core :as t] '[tesser.math :as m]
         '[tesser.quantiles :as q])

(def digest (->> (m/digest q/hdr-histogram)
                 (t/tesser [[1 1 1 1 1 1 2 2 2 3 3 4 5]])))
; => #<DoubleHistogram ...>

To specify options for the digest, just use partial or (fn [] ...):

(m/digest (partial q/hdr-histogram {:significant-value-digits 4
                                    :highest-to-lowest-value-ratio 1e6}))

DoubleHistogram, like many quantile estimators, only works over positive values. To cover positives and negatives together, use tesser.quantiles/dual:

(m/digest #(q/dual q/hdr-histogram {:significant-value-digits 2}))

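Once you've computed a digest, you can find a particular quantile using tesser.quantiles/quantile. (The results shown in the following examples are consistent with a five-point sample such as [[1 1 1 2 3]], not with the thirteen inputs used above.)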

(q/quantile digest 0)   ; => 1.0
(q/quantile digest 0.5) ; => 1.0
(q/quantile digest 4/5) ; => 2.0009765625
(q/quantile digest 1)   ; => 3.0009765625

The total number of points in the sample:

(q/point-count digest) ; => 5

Minima and maxima:

(q/min digest) ; => 1.0
(q/max digest) ; => 3.0009765625

Or find the distribution of values less than or equal to each point, with resolution given by the internal granularity of the digest:

(q/distribution digest)
; => ([1.0 3] [2.0009765625 1] [3.0009765625 1])

(q/cumulative-distribution digest)
; => ([1.0 3] [2.0009765625 4] [3.0009765625 5])

You don't have to return the whole digest; any of these derivative operations can be merged directly into the fold via tesser.core/post-combine.

(->> (m/digest q/hdr-histogram)
     (t/post-combine #(q/quantile % 1/2))
     (t/tesser [[1 2 2 3 3 3 3 3 3 3 3]]))
; => 3.0009765625

You may also use tesser.cardinality/hll to estimate the cardinality of a set. HLL+ uses a probabilistic data structure to compute set cardinality in very little memory, at some cost in accuracy.

The HLL digest can be used like the histograms above:

(def digest (->> (m/digest cardinality/hll)
                 (t/tesser [[1 1 1 1 1 1 2 2 2 3 3 4 5]])))
; => #<HyperLogLogPlus...>

Getting the cardinality out through a post-combine step:

(->> (m/digest cardinality/hll)
     (t/post-combine #(q/point-count %))
     (t/tesser [[1 2 2 3 3 3 3 3 3 3 3]]))
; => 3

I want to emphasize that depending on the size of your data, its distribution, and the number of digests you want to compute, you may need different digest algorithms and widely varying tuning parameters. Until we have a better grasp of the space/error tradeoffs here, I won't choose defaults for you.

fuse-matrix

(fuse-matrix fold keymap & [downstream])

Given:

  1. A function like covariance that takes two functions of an input and yields a fold, and
  2. A map of key names to functions that extract values for those keys from an input,

fuse-matrix computes that fold over each pair of keys, returning a map of name pairs to the result of that pairwise fold over the inputs. You can think of this like an N^2 version of fuse.
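
The shape of the result, a map from key pairs to pairwise results, can be illustrated with the plain-Clojure sketch below. It computes each pair eagerly over a collection instead of composing folds, so it only mirrors the output structure described above; pairwise-sketch and both-present are hypothetical names, and enumerating every ordered pair is an assumption based on the "N^2" wording.

```clojure
(defn pairwise-sketch
  "Hypothetical helper: apply (f fx fy inputs) to every ordered pair of keys
  in keymap and collect the results under [key-x key-y]."
  [f keymap inputs]
  (into {}
        (for [[kx fx] keymap
              [ky fy] keymap]
          [[kx ky] (f fx fy inputs)])))

;; A toy pairwise function: how many inputs have both values present.
(defn both-present [fx fy inputs]
  (count (filter #(and (some? (fx %)) (some? (fy %))) inputs)))

(pairwise-sketch both-present
                 {:age :age, :cats (comp seq :cats)}
                 [{:age 30 :cats ["Tom"]} {:age 41 :cats []} {:cats ["Kit"]}])
;; => {[:age :age] 2, [:age :cats] 1, [:cats :age] 1, [:cats :cats] 2}
```

Swapping both-present for a covariance or correlation function of the same shape would give the corresponding matrix.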

mean

(mean)
(mean fold__3373__auto__)

Finds the arithmetic mean of numeric inputs.
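
As a rough illustration of why a mean folds well over chunked input, the sketch below reduces each chunk to a [sum count] pair, combines the pairs, and divides only at the end. It is plain Clojure and does not use tesser; the name mean-sketch and the choice to return nil for empty input are assumptions made for this example, not documented library behavior.

```clojure
(defn mean-sketch
  "Hypothetical helper: arithmetic mean over a collection of chunks."
  [chunks]
  (let [reduce-chunk (fn [chunk] [(reduce + 0 chunk) (count chunk)])
        ;; Combine the per-chunk [sum count] pairs element-wise.
        [sum n]      (reduce (fn [[s1 n1] [s2 n2]] [(+ s1 s2) (+ n1 n2)])
                             [0 0]
                             (map reduce-chunk chunks))]
    (when (pos? n)
      (double (/ sum n)))))

(mean-sketch [[1 2 3] [4 5 6]])
;; => 3.5
```

Averaging the per-chunk means directly would be wrong when chunks differ in size, which is why the sum and the count are carried separately until the final step.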

standard-deviation

(standard-deviation & [f])

Estimates the standard deviation of numeric inputs.

sum

(sum)
(sum fold__3373__auto__)

Finds the sum of numeric elements.

variance

(variance)
(variance fold__3373__auto__)

Unbiased variance estimation. Given numeric inputs, returns their variance.
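
For reference, the usual unbiased (sample) variance divides the sum of squared deviations by n - 1 instead of n. The plain-Clojure sketch below shows that calculation in two passes; it is not tesser's implementation, variance-sketch is a hypothetical name, and the handling of empty and single-element input is an assumption of this example.

```clojure
(defn variance-sketch
  "Hypothetical helper: sample variance of a collection of numbers."
  [xs]
  (let [n (count xs)]
    (when (pos? n)
      (let [mean (/ (reduce + xs) (double n))
            ;; Sum of squared deviations from the mean.
            ss   (reduce + (map #(let [d (- % mean)] (* d d)) xs))]
        ;; n - 1 normalization; max avoids dividing by zero when n = 1.
        (/ ss (max 1 (dec n)))))))

(variance-sketch [1 2 3 4])
;; => 1.6666666666666667

;; The matching standard deviation is the square root of the variance.
(Math/sqrt (variance-sketch [1 2 3 4]))
```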
