Liking cljdoc? Tell your friends :D

jepsen.random

Pluggable generation of random values.

Pluggable

First, randomness should be pluggable. In normal tests, standard Clojure (rand-int) and friends are just fine. But in tests, it's nice if those can be replaced by a deterministic seed. When running in a hypervisor like Antithesis, you want to draw entropy from a special SDK, so it can intentionally send you down interesting paths.

Fast

Second, it should be reasonably fast. We'd like to ideally generate ~100 K ops/sec, and each operation might need to draw, say, 10 random values, which is 1 M values/sec. Basic Clojure (rand-int 5) on my ThreadRipper takes ~37 ns. Clojure's data.generators takes ~35 ns. Our thread-local implementation, backed by a LXM splittable random, takes just ~33 ns.

Thread-safe

We want everyone in the Jepsen universe to be able to draw random values from this namespace without coordinating. This implies generating values should be thread-safe.

Stateful

This namespace must be stateful. We'd like callers to simply be able to call (r/int 5) and get 2.

Pure, splittable random seeds ala test.check are nice, but they a.) come with a performance penalty, and b.) require threading that random state through essentially every function call and return. This is not only complex, but adds additional destructuring overhead at each call boundary.

The main advantage of stateful random generators is determinism across threads, but this is not a major concern in Jepsen. In normal test runs, we don't care about reproducibility. In Antithesis, the entire thread schedule is deterministic, so we're free to share state across threads and trust that Antithesis Will Take Care Of It. In tests, we're generally drawing entropy from a single thread. It'd be nice to have thread-safe random generators, but it's not critical.

Determinism

In single-threaded contexts, we want to be able to seed randomness and have reproducible tests. Doing this across threads is not really important--if we were being rigorous we could thread a splittable random down through every thread spawned, but there's a LOT of threaded code in Jepsen and it doesn't all know about us. More to the point, our multithreaded code is usually a.) non-random, or b.) doing IO, which we can't control. Having determinism for a single thread gets us a reasonable 'bang for our buck'.

Special Distributions

Jepsen needs some random things that aren't well supported by the normal java.util.Random, clojure.core, or data.generators functions. In particular, we like to do:

  1. Zipfian distributions: lots of small things, but sometimes very large things.
  2. Weighted choices: 90% reads, 5% writes, 5% deletes.
  3. Special values: over-represent maxima, minima, and zero, to stress codepaths that might treat them differently.

Usage

Here are common Clojure functions and their equivalents in this namespace:

rand rand/double rand-int rand/long rand-nth rand/nth shuffle rand/shuffle

You can also generate values from common distributions:

rand/bool Returns true or false, optionally with a probability rand/exp Exponential distribution rand/geometric Geometric distribution rand/zipf Zipfian distribution rand/weighted Discrete values with given weights

You can take random permutations and subsets (really, ordered prefixes of permutations) of collections with:

rand/shuffle rand/nonempty-subset

There are two macros for randomly branching control flow:

rand/branch rand/weighted-branch

To re-bind randomness to a specifically seeded RNG, use:

(jepsen.random/with-seed 5 (jepsen.random/long) ; Returns the same value every time (call-stuff-using-jepsen.random ...) ; This holds for the whole body

This changes a global variable jepsen.random/rng and is NOT THREAD SAFE. Do not use with-seed concurrently. It's fine to spawn threads within the body, but if those threads are spawned in a nondeterministic order, then their calls to jepsen.random will also be nondeterministic.

Pluggable generation of random values.

## Pluggable

First, randomness should be pluggable. In normal tests, standard Clojure
`(rand-int)` and friends are just fine. But in tests, it's nice if those can
be replaced by a deterministic seed. When running in a hypervisor like
Antithesis, you want to draw entropy from a special SDK, so it can
intentionally send you down interesting paths.

## Fast

Second, it should be *reasonably* fast. We'd like to ideally generate ~100 K
ops/sec, and each operation might need to draw, say, 10 random values, which
is 1 M values/sec. Basic Clojure (rand-int 5) on my ThreadRipper takes ~37
ns. Clojure's data.generators takes ~35 ns. Our thread-local implementation,
backed by a LXM splittable random, takes just ~33 ns.

## Thread-safe

We want everyone in the Jepsen universe to be able to draw random values from
this namespace without coordinating. This implies generating values should be
thread-safe.

## Stateful

This namespace must be stateful. We'd like callers to simply be able to call
`(r/int 5)` and get 2.

Pure, splittable random seeds ala `test.check` are nice, but they a.) come
with a performance penalty, and b.) require threading that random state
through essentially every function call and return. This is not only complex,
but adds additional destructuring overhead at each call boundary.

The main advantage of stateful random generators is determinism across
threads, but this is not a major concern in Jepsen. In normal test runs, we
don't care about reproducibility. In Antithesis, the entire thread schedule
is deterministic, so we're free to share state across threads and trust that
Antithesis Will Take Care Of It. In tests, we're generally drawing entropy
from a single thread. It'd be *nice* to have thread-safe random generators,
but it's not critical.

## Determinism

In single-threaded contexts, we want to be able to seed randomness and have
reproducible tests. Doing this across threads is not really important--if we
were being rigorous we could thread a splittable random down through every
thread spawned, but there's a LOT of threaded code in Jepsen and it doesn't
all know about us. More to the point, our multithreaded code is usually a.)
non-random, or b.) doing IO, which we can't control. Having determinism for a
single thread gets us a reasonable 'bang for our buck'.

## Special Distributions

Jepsen needs some random things that aren't well supported by the normal
java.util.Random, clojure.core, or data.generators functions. In particular,
we like to do:

1. Zipfian distributions: lots of small things, but sometimes very
   large things.
2. Weighted choices: 90% reads, 5% writes, 5% deletes.
3. Special values: over-represent maxima, minima, and zero, to stress
   codepaths that might treat them differently.

## Usage

Here are common Clojure functions and their equivalents in this namespace:

  rand        rand/double
  rand-int    rand/long
  rand-nth    rand/nth
  shuffle     rand/shuffle

You can also generate values from common distributions:

  rand/bool       Returns true or false, optionally with a probability
  rand/exp        Exponential distribution
  rand/geometric  Geometric distribution
  rand/zipf       Zipfian distribution
  rand/weighted   Discrete values with given weights

You can take random permutations and subsets (really, ordered prefixes of
permutations) of collections with:

  rand/shuffle
  rand/nonempty-subset

There are two macros for randomly branching control flow:

  rand/branch
  rand/weighted-branch

To re-bind randomness to a specifically seeded RNG, use:

(jepsen.random/with-seed 5
  (jepsen.random/long)                  ; Returns the same value every time
  (call-stuff-using-jepsen.random ...)  ; This holds for the whole body

This changes a global variable `jepsen.random/rng` and is NOT THREAD SAFE. Do
not use `with-seed` concurrently. It's fine to spawn threads within the body,
but if those threads are spawned in a nondeterministic order, then their
calls to jepsen.random will also be nondeterministic.
raw docstring

all-zero-doubles?clj

(all-zero-doubles? xs)

Takes a double array and returns true iff it contains only 0.0. We use this as a sentinel value for weight arrays.

Takes a double array and returns true iff it contains only 0.0. We use this
as a sentinel value for weight arrays.
raw docstring

boolclj

(bool)
(bool p)

Generates a boolean. With no arguments, has a 50% chance of being true. Optionally takes a probability (in [0,1]) of being true.

Generates a boolean. With no arguments, has a 50% chance of being true.
Optionally takes a probability (in [0,1]) of being true.
raw docstring

branchcljmacro

(branch & forms)

Like a random cond: takes several forms and evaluates one of them at random. With zero arguments, returns nil.

Like a random `cond`: takes several forms and evaluates one of them at
random. With zero arguments, returns nil.
raw docstring

clojure-randomclj

(clojure-random)

A random source that uses Clojure's built-ins.

A random source that uses Clojure's built-ins.
raw docstring

data-generators-randomclj

(data-generators-random)

A random source that uses data.generators.

A random source that uses data.generators.
raw docstring

doubleclj

(double)
(double upper)
(double lower upper)

Generates a uniformly distributed double in [lower, upper).

Generates a uniformly distributed double in [lower, upper).
raw docstring

double-weighted-indexclj

(double-weighted-index weights)
(double-weighted-index total-weight weights)

Takes a total weight and an array of double weights for a weighted discrete distribution. Generates a random long index into those weights, with probability proportionate to that index's weight. Returns -1 when total-weight is 0.

Note that total-weight is optional and should always be (reduce + weights). It's only an argument so you can sum once, rather than every call.

Takes a total weight and an array of double weights for a weighted discrete
distribution. Generates a random long index into those weights, with
probability proportionate to that index's weight. Returns -1 when
total-weight is 0.

Note that `total-weight` is optional and should always be (reduce + weights).
It's only an argument so you can sum once, rather than every call.
raw docstring

expclj

(exp lambda)

Generates a exponentially distributed random double with rate parameter lambda.

Generates a exponentially distributed random double with rate parameter
lambda.
raw docstring

geometricclj

(geometric p)

Generates a geometrically distributed random long with probability p. Specifically, these approximate the number of independent Bernoulli trial failures before the first success, supported on 0, 1, 2, .... We do this because--as with all our discrete distributions--we often want to use these values as indices into vectors, which are zero-indexed.

Generates a geometrically distributed random long with probability p.
Specifically, these approximate the number of independent Bernoulli trial
failures before the first success, supported on 0, 1, 2, .... We do this
because--as with all our discrete distributions--we often want to use these
values as indices into vectors, which are zero-indexed.
raw docstring

longclj

(long)
(long upper)
(long lower upper)

Generates a uniformly distributed primitive long in [lower, upper)

Generates a uniformly distributed primitive long in [lower, upper)
raw docstring

nonempty-subsetclj

(nonempty-subset coll)

A non-empty prefix of a permutation of the given collection. Lengths are uniformly distributed. This is technically not a subset; if there are duplicates in the input collection, there may be duplicates in the output; nor is the output a set. Still, we use this constantly for subset-like things, like 'take a random collection of some nodes from the test'.

Returns nil if collection is empty. Otherwise, returns at least singleton prefices, and up to full permutations of coll itself.

A non-empty prefix of a permutation of the given collection. Lengths are
uniformly distributed. This is technically not a subset; if there are
duplicates in the input collection, there may be duplicates in the output;
nor is the output a set. Still, we use this constantly for subset-like
things, like 'take a random collection of some nodes from the test'.

Returns nil if collection is empty. Otherwise, returns at least singleton
prefices, and up to full permutations of coll itself.
raw docstring

nthclj

(nth coll)

Like clojure's rand-nth, but uses our RNG.

Like clojure's rand-nth, but uses our RNG.
raw docstring

nth-emptyclj

(nth-empty coll)

Like rand-nth, but returns nil for empty collections.

Like rand-nth, but returns `nil` for empty collections.
raw docstring

rngclj

This is our default implementation of randomness. We don't use a dynamic var here because it turns out to add significant overhead (~30 ns), roughly halving speed, and because when you're choosing a different random implementation, you want it everywhere, not just in one thread tree.

To select a different random generator, use

(with-redefs [r/rng (r/thread-local-random some-seed)]
  ...)
This is our default implementation of randomness. We don't use a dynamic var
here because it turns out to add significant overhead (~30 ns), roughly
halving speed, and because when you're choosing a different random
implementation, you want it *everywhere*, not just in one thread tree.

To select a different random generator, use

    (with-redefs [r/rng (r/thread-local-random some-seed)]
      ...)
raw docstring

shuffleclj

(shuffle coll)

Like Clojure shuffle, but based on our RNG. We use a Fisher-Yates shuffle here; see http://en.wikipedia.org/wiki/Fisher–Yates_shuffle#The_modern_algorithm

Like Clojure shuffle, but based on our RNG. We use a Fisher-Yates shuffle
here; see
http://en.wikipedia.org/wiki/Fisher–Yates_shuffle#The_modern_algorithm
raw docstring

thread-local-randomclj

(thread-local-random)
(thread-local-random seed)

Constructs a ThreadLocalRandom. Right now we use an L64X128MixRandom, and split off a fresh instance for every thread that asks for entropy.

Constructs a ThreadLocalRandom. Right now we use an L64X128MixRandom, and
split off a fresh instance for every thread that asks for entropy.
raw docstring

weightedclj

(weighted choices)

Picks a random selection out of several weighted choices. Takes a map of values to weights, like so:

{:foo 5 :bar 3 :baz 2}

This returns :foo half of the time, :bar 30% of the time, and :baz 20% of the time. This is slow; see weighted-choice-fn for a faster variant.

Picks a random selection out of several weighted choices. Takes a map of
values to weights, like so:

  {:foo 5
   :bar 3
   :baz 2}

This returns :foo half of the time, :bar 30% of the time, and :baz 20% of the
time. This is slow; see weighted-choice-fn for a faster variant.
raw docstring

weighted-branchcljmacro

(weighted-branch & branches)

Like branch, but where the probability of each branch is proportionate to its weight. Like weighted, but since this is a macro, it only evaluates a single branch, not all of them. For example, to print :x 30% of the time, and :y 70%...

(weighted-branch 3 (prn :x) 7 (prn :y))

Weights must be run-time constant, but can be arbitrary expressions. Weights are evaluated on the first invocation of the macro's code, and cached thereafter. This makes weighted-branch significantly faster than weighted.

Like `branch`, but where the probability of each branch is proportionate to
its weight. Like `weighted`, but since this is a macro, it only evaluates a
single branch, not all of them. For example, to print :x 30% of the time, and
:y 70%...

(weighted-branch
3 (prn :x)
7 (prn :y))

Weights must be run-time constant, but can be arbitrary expressions. Weights
are evaluated on the first invocation of the macro's code, and cached
thereafter. This makes `weighted-branch` significantly faster than
`weighted`.
raw docstring

weighted-fnclj

(weighted-fn wcs)

Like weighted-choice, but returns a function which, when invoked with no arguments, returns a random value. This version is significantly faster than weighted-choice, as it can pre-compute key values.

Like weighted-choice, but returns a *function* which, when invoked with no
arguments, returns a random value. This version is significantly faster than
weighted-choice, as it can pre-compute key values.
raw docstring

with-rngcljmacro

(with-rng rng & body)

Evaluates body with the given random number generator. Not thread-safe; takes effect globally.

Evaluates body with the given random number generator. Not thread-safe;
takes effect globally.
raw docstring

with-seedcljmacro

(with-seed seed & body)

Evaluates body with the random generator re-bound to a determistic PRNG with the given seed. Not thread-safe; takes effect globally.

Evaluates body with the random generator re-bound to a determistic PRNG with
the given seed. Not thread-safe; takes effect globally.
raw docstring

zipfclj

(zipf n)
(zipf skew n)

Selects a Zipfian-distributed integer in [0, n) with a given skew factor. Adapted from the rejection sampling technique in https://jasoncrease.medium.com/rejection-sampling-the-zipf-distribution-6b359792cffa.

Note that the standard Zipfian ranges from (0, n], but you almost always want to use a Zipfian with a zero-indexed collection, so we emit [0, n) instead. Add one if you want the standard Zipfian.

Selects a Zipfian-distributed integer in [0, n) with a given skew factor.
Adapted from the rejection sampling technique in
https://jasoncrease.medium.com/rejection-sampling-the-zipf-distribution-6b359792cffa.

Note that the standard Zipfian ranges from (0, n], but you almost always want
to use a Zipfian with a zero-indexed collection, so we emit [0, n) instead.
Add one if you want the standard Zipfian.
raw docstring

zipf-b-inverse-cdfclj

(zipf-b-inverse-cdf skew t p)

Inverse cumulative distribution function for the zipfian bounding function used in zipf.

Inverse cumulative distribution function for the zipfian bounding function
used in `zipf`.
raw docstring

zipf-default-skewclj

When we choose zipf-distributed things, what skew do we generally pick? The algorithm has a singularity at 1, so we choose a slightly larger value.

When we choose zipf-distributed things, what skew do we generally pick? The
algorithm has a singularity at 1, so we choose a slightly larger value.
raw docstring

cljdoc builds & hosts documentation for Clojure/Script libraries

Keyboard shortcuts
Ctrl+kJump to recent docs
Move to previous article
Move to next article
Ctrl+/Jump to the search field
× close