jepsen.generator

# In a Nutshell

Generators tell Jepsen what to do during a test. Generators are purely
functional objects which support two functions: `op` and `update`. `op`
produces operations for Jepsen to perform: it takes a test and context
object, and yields:

- nil if the generator is exhausted
- :pending if the generator doesn't know what to do yet
- [op, gen'], where `op` is the next operation this generator would like to
  execute, and `gen'` is the state of the generator that would result if `op`
  were evaluated. Ops must be a jepsen.history.Op.

`update` allows generators to evolve as events occur, such as when an
operation is invoked or completed. For instance, `update` allows a generator
to emit read operations *until* at least one succeeds.

Maps, sequences, and functions are all generators, allowing you to write all
kinds of generators using existing Clojure tooling. This namespace provides
additional transformations and combinators for building more complex
generators.

# Migrating From Classic Generators

The old jepsen.generator namespace used mutable state everywhere, and was
plagued by race conditions. jepsen.generator.pure provides a similar API, but
its purely functional approach has several advantages:

- Pure generators shouldn't deadlock or throw weird interrupted exceptions.
  These issues have plagued classic generators; I've done my best to make
  incremental improvements, but the problems seem unavoidable.

- Pure generators can respond to completion operations, which means you can
  write things like 'keep trying x until y occurs' without sharing complex
  mutable state with clients.

- Sequences are pure generators out of the box; no more juggling gen/seq
  wrappers. Use existing Clojure sequence transformations to build complex
  behaviors.

- Pure generators provide an explicit 'I don't know yet' state, which is
  useful when you know future operations might come, but don't know when or
  what they are.

- Pure generators do not rely on dynamic state; their arguments are all
  explicit. They are deterministically testable.

- Pure generators allow new combinators like (any gen1 gen2 gen3), which
  returns the first operation from any of several generators; this approach
  was impossible in classic generators.

- Pure generators have an explicit, deterministic model of time, rather than
  relying on thread scheduler constructs like Thread/sleep.

- Certain constructs, like gen/sleep and gen/log in classic generators, could
  not be composed in sequences readily; pure generators provide a regular
  composition language.

- Constructs like gen/each, which were fragile in classic generators and
  relied on macro magic, are now simple functions.

- Pure generators are significantly simpler to implement and test than
  classic generators, though they do require careful thought.

There are some notable tradeoffs, including:

- Pure generators perform all generator-related computation on a single
  thread, and create additional garbage due to their pure functional approach.
  However, realistic generator tests yield rates over 20,000 operations/sec,
  which seems more than sufficient for Jepsen's purposes.

- The API is subtly different. In my experience teaching hundreds of
  engineers to write Jepsen tests, users typically cite the generator API as
  one of Jepsen's best features. I've tried to preserve as much of its shape
  as possible, while sanding off rough edges and cleaning up inconsistencies.
  Some functions have the same shape but different semantics: `stagger`, for
  instance, now takes a *total* rather than a *per-thread* rate. Some
  infrequently-used generators have not been ported, to keep the API smaller.

- `update` and contexts are not a full replacement for mutable state. We
  think they should suffice for most practical uses, and controlled use of
  mutable shared state is still possible.

- You can use impure functions, e.g. randomness, as impure generators (and
  we encourage it!). However, it's possible I haven't fully thought through
  the implications of this choice; the semantics may evolve over time.

When migrating old to new generators, keep in mind:

- `gen/seq` and `gen/seq-all` are unnecessary; any Clojure sequence is
  already a pure generator. `gen/seq` didn't just turn sequences into
  generators; it also ensured that only one operation was consumed from each.
  This is now explicit: use `(map gen.pure/once coll)` instead of
  `(gen/seq coll)`, and `coll` instead of `(gen/seq-all coll)`. Where the
  sequence is of one-shot generators already, there's no need to wrap elements
  with `gen/once`: instead of `(gen/seq [{:f :read} {:f :write}])`, you can
  write `[{:f :read} {:f :write}]` directly.

- Functions return generators, not just operations, which makes it easier to
  express sequences of operations like 'pick the current leader, isolate it,
  then kill that same node, then restart that node.' Use
  `#(gen/once {:f :write, :value (rand-int 5)})` instead of
  `(fn [] {:f :write, :value (rand-int 5)})`.

- `stagger`, `delay`, etc. now take total rates, rather than the rate per
  thread.

- `delay-til` is gone. It should come back; I just haven't written it yet.
  Defining what exactly delay-til means is... surprisingly tricky.

- `each` used to mean 'on each process', but in practice what users generally
  wanted was 'on each thread'--on each process had a tendency to result in
  unexpected infinite loops when ops crashed. `each-thread` is probably what
  you want instead.

- Instead of using `jepsen.generator/*threads*`, etc., use helper functions
  like `some-free-process`.

- Functions now take zero args (f) or a test and context map (f test ctx),
  rather than (f test process).

- Maps are one-shot generators by default, rather than emitting themselves
  indefinitely. This streamlines the most common use cases:

    - (map (fn [x] {:f :write, :value x}) (range)) produces a series of
      distinct, monotonically increasing writes

    - (fn [] {:f :inc, :value (rand-int 5)}) produces a series of random
      increments, rather than a series where every value is the *same*
      (randomly selected) value.

  When migrating, you can drop most uses of gen/once around maps, and
  introduce (repeat ...) where you want to repeat an operation more than once.
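
Because plain maps and sequences are generators, much of a migrated generator
can be built and inspected with nothing but clojure.core. The following is a
minimal sketch that calls no Jepsen functions; the names `writes` and `reads`
are illustrative only.

```clojure
;; Plain Clojure data only; nothing here calls into Jepsen.

;; An unbounded lazy series of distinct writes, one map per value.
(def writes
  (map (fn [x] {:f :write, :value x}) (range)))

;; The same read map, repeated. Each map in the sequence is its own
;; one-shot generator.
(def reads
  (repeat {:f :read}))

;; Both are lazy and unbounded, so bound them with take before printing
;; at the REPL.
(take 3 writes)
;; => ({:f :write, :value 0} {:f :write, :value 1} {:f :write, :value 2})

(take 2 reads)
;; => ({:f :read} {:f :read})
```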

# In More Detail

A Jepsen history is a list of operations--invocations and completions. A
generator's job is to specify what invocations to perform, and when. In a
sense, a generator *becomes* a history as Jepsen incrementally applies it to
a database.

Naively, we might define a history as a fixed sequence of invocations to
perform at certain times, but this is impossible: we have only a fixed set of
threads, and they may not be free to perform our operations. A thread must be
*free* in order to perform an operation.

Time, too, is a dependency. When we schedule an operation to occur once per
second, we mean that only once a certain time has passed can the next
operation begin.

There may also be dependencies between threads. Perhaps only after a nemesis
has initiated a network partition does our client perform a particular read.
We want the ability to hold until a certain operation has begun.

Conceptually, then, a generator is a *graph* of events, some of which have
not yet occurred. Some events are invocations: these are the operations the
generator will provide to clients. Some events are completions: these are
provided by clients to the generator. Other events are temporal: a certain
time has passed.

This graph has some invocations which are *ready* to perform. When we have a
ready invocation, we apply the invocation using the client, obtain a
completion, and apply the completion back to the graph, obtaining a new
graph.

## By Example

Perform a single read:

  {:f :read}

Perform a series of random writes:

  (fn [] {:f :write, :value (rand-int 5)})

Perform 10 random writes. This is regular clojure.core/repeat:

  (repeat 10 (fn [] {:f :write, :value (rand-int 5)}))

Perform a sequence of 50 unique writes. We use regular Clojure sequence
functions here:

  (->> (range)
       (map (fn [x] {:f :write, :value x}))
       (take 50))

Write 3, then (possibly concurrently) read:

  [{:f :write, :value 3} {:f :read}]

Since these might execute concurrently, the read might not observe the write.
To wait for the write to complete first:

  (gen/phases {:f :write, :value 3}
              {:f :read})

Have each thread independently perform a single increment, then read:

  (gen/each-thread [{:f :inc} {:f :read}])

Reserve 5 threads for reads and 10 threads for increments, and have the
remaining threads reset a counter:

  (gen/reserve 5  (repeat {:f :read})
               10 (repeat {:f :inc})
                  (repeat {:f :reset}))

Perform a random mixture of unique writes and reads, randomly timed, at
roughly 10 Hz, for 30 seconds:

  (->> (gen/mix [(repeat {:f :read})
                 (map (fn [x] {:f :write, :value x}) (range))])
       (gen/stagger 1/10)
       (gen/time-limit 30))

While that's happening, have the nemesis alternate between
breaking and repairing something roughly every 5 seconds:

  (->> (gen/mix [(repeat {:f :read})
                 (map (fn [x] {:f :write, :value x}) (range))])
       (gen/stagger 1/10)
       (gen/nemesis (->> (cycle [{:f :break}
                                 {:f :repair}])
                         (gen/stagger 5)))
       (gen/time-limit 30))

Follow this by a single nemesis repair (along with an informational log
message), wait 10 seconds for recovery, then have each thread perform reads
until that thread sees at least one OK operation.

  (gen/phases (->> (gen/mix [(repeat {:f :read})
                             (map (fn [x] {:f :write, :value x}) (range))])
                   (gen/stagger 1/10)
                   (gen/nemesis (->> (cycle [{:f :break}
                                             {:f :repair}])
                                     (gen/stagger 5)))
                   (gen/time-limit 30))
              (gen/log "Recovering")
              (gen/nemesis {:f :repair})
              (gen/sleep 10)
              (gen/log "Final read")
              (gen/clients (gen/each-thread (gen/until-ok {:f :read}))))

## Contexts

A *context* is a map which provides information about the state of the world
to generators. For instance, a generator might need to know the number of
threads which will ask it for operations. It can get that number from the
*context*. Users can add their own values to the context map, which allows
two generators to share state. When one generator calls another, it can pass
a modified version of the context, which allows us to write generators that,
say, run two independent workloads, each with their own concurrency and
thread mappings.

The standard context mappings, which are provided by Jepsen when invoking the
top-level generator, and can be expected by every generator, are defined in
jepsen.generator.context. They include some stock fields:

    :time           The current Jepsen linear time, in nanoseconds

Additional fields (e.g. :threads, :free-threads, etc) are present for
bookkeeping, but should not be interfered with or accessed directly: contexts
are performance-sensitive and for optimization reasons their internal
structure is somewhat complex. Use the functions `all-threads`,
`thread->process`, `some-free-process`, etc. See jepsen.generator.context for
these functions, which are also imported here in jepsen.generator.
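
As a rough illustration of how a generator might read from its context, here
is a sketch of a function generator of the `(f test ctx)` form. It makes two
assumptions: a plain map stands in for the context so the code runs without
Jepsen (real contexts should come from jepsen.generator.context), and the
`:my-counter` key and the function name are hypothetical.

```clojure
;; A function generator of the (f test ctx) form. It reads the special
;; :time key and a user-defined key (hypothetical name :my-counter).
(defn timestamped-write
  [test ctx]
  {:f     :write
   :value [(:time ctx) (get ctx :my-counter 0)]})

;; Stand-in context: a plain map, used only so the function can be called
;; at the REPL without Jepsen.
(def toy-ctx {:time 1234, :my-counter 7})

(timestamped-write {} toy-ctx)
;; => {:f :write, :value [1234 7]}

;; Handing a modified context to a wrapped generator is ordinary assoc.
(timestamped-write {} (assoc toy-ctx :my-counter 8))
;; => {:f :write, :value [1234 8]}
```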

## Fetching an operation

We use `(op gen test context)` to ask the generator for the next invocation
that we can process.

The result can take three forms:

- The generator may return `nil`, which means the generator is done, and
  there is nothing more to do. Once a generator does this, it must never
  return anything other than `nil`, even if the context changes.
- The generator may return :pending, which means there might be more
  ops later, but it can't tell yet.
- The generator may return an operation, in which case:
  - If its time is in the past, we can evaluate it now
  - If its time is in the future, we wait until either:
    - The time arrives
    - Circumstances change (e.g. we update the generator)

But (op gen test context) returns more than just an operation; it also
returns the *subsequent state* of the generator, if that operation were to be
performed. The two are bundled into a tuple.

  (op gen test context) => [op gen']      ; known op
                           [:pending gen] ; unsure
                           nil            ; exhausted
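
A caller has to tell these three shapes apart, and for a known op, compare
its time against the current time. The following is a minimal sketch of that
decision, assuming only the shapes listed above; `next-action` and the
keywords it returns are hypothetical names, not part of Jepsen.

```clojure
(defn next-action
  "Decides what a caller could do with the result of (op gen test context),
  given the current time `now`."
  [result now]
  (cond
    (nil? result)               :done             ; generator is exhausted
    (= :pending (first result)) :wait-for-update  ; nothing to do yet
    :else
    (let [[op _gen'] result]
      (if (<= (:time op) now)
        :invoke-now                               ; op's time has arrived
        :wait-until-time))))                      ; op is in the future

(next-action nil 10)                                 ; => :done
(next-action [:pending :some-gen] 10)                ; => :wait-for-update
(next-action [{:f :read, :time 5} :next-gen] 10)     ; => :invoke-now
(next-action [{:f :read, :time 15} :next-gen] 10)    ; => :wait-until-time
```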

The analogous operations for sequences are (first) and (next); why do we
couple them here? Why not use the update mechanism strictly to evolve state?
Because the behavior in sequences is relatively simple: next always moves
forward one item, whereas only *some* updates actually cause systems to move
forward. Seqs always do the same thing in response to `next`, whereas
generators may do different things depending on context. Moreover, Jepsen
generators are often branched, rather than linearly wrapped, as sequences
are, resulting in questions about *which branch* needs to be updated.

When I tried to strictly separate implementations of (op) and (update), it
resulted in every update call attempting to determine whether this particular
generator did or did not emit the given invocation event. This is
*remarkably* tricky to do well, and winds up relying on all kinds of
non-local assumptions about the behavior of the generators you wrap, and
those which wrap you.

## Updating a generator

We still want the ability to respond to invocations and completions, e.g. by
tracking that information in context variables. Therefore, in addition to
(op) returning a new generator, we have a separate function, (update gen test
context event), which allows generators to react to changing circumstances.
Updates happen whenever:

- We invoke an operation (e.g. one that the generator just gave us)
- We complete an operation

Updates use a context with a specific relationship to the event:

- The context :time is equal to the event :time
- The free processes set reflects the state after the event has taken place;
  e.g. if the event is an invoke, the thread is listed as no longer free; if
  the event is a completion, the thread is listed as free.
- The worker map reflects the process which that thread worker was executing
  at the time the event occurred.

See jepsen.generator.context for more.
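
To make the op/update split concrete, here is a toy model of the 'emit reads
until one succeeds' idea. It is not Jepsen's protocol or its `until-ok`
implementation: the generator is modeled as a plain map with a hypothetical
`:done?` flag, and `reads-op` and `reads-update` are illustrative names.

```clojure
(defn reads-op
  "Yields [op gen'] with a read until the toy generator is done, then nil."
  [gen ctx]
  (when-not (:done? gen)
    [{:type :invoke, :f :read, :time (:time ctx)} gen]))

(defn reads-update
  "Marks the toy generator done once an :ok completion is seen. Every
  other event leaves it unchanged."
  [gen event]
  (if (= :ok (:type event))
    (assoc gen :done? true)
    gen))

(def g0 {:done? false})
(def g1 (reads-update g0 {:type :invoke, :f :read}))  ; still emitting
(def g2 (reads-update g1 {:type :fail,   :f :read}))  ; still emitting
(def g3 (reads-update g2 {:type :ok,     :f :read}))  ; exhausted

(some? (reads-op g2 {:time 10}))  ; => true
(reads-op g3 {:time 11})          ; => nil
```

Each update returns a new value rather than mutating the old one, so `g2`
still emits reads even after `g3` has been derived from it.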

## Default implementations

Nil is a valid generator; it ignores updates and always yields nil for
operations.

IPersistentMaps are generators which ignore updates and return exactly one
operation which looks like the map itself, but with default values for time,
process, and type provided based on the context. This means you can write a
generator like

  {:f :write, :value 2}

and it will generate a single op like

  {:type :invoke, :process 3, :time 1234, :f :write, :value 2}

To produce an infinite series of ops drawn from the same map, use

  (repeat {:f :write, :value 2})

Sequences are generators which assume the elements of the sequence are
themselves generators. They ignore updates, and return all operations from
the first generator in the sequence, then all operations from the second, and
so on.
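
The behavior of nil, maps, and sequences described above can be approximated
in a few lines of plain Clojure. This is a toy sketch, not Jepsen's
implementation: it ignores `:pending`, processes, and updates, takes a plain
map as its context, and `fill-in`, `toy-op`, and `toy-ops` are hypothetical
names.

```clojure
(defn fill-in
  "Adds default :type and :time to a map. (:process is omitted here.)"
  [m ctx]
  (merge {:type :invoke, :time (:time ctx)} m))

(defn toy-op
  "Returns [op gen'] for the next op, or nil when gen is exhausted."
  [gen ctx]
  (cond
    (nil? gen) nil
    (map? gen) [(fill-in gen ctx) nil]     ; one op, then exhausted
    (sequential? gen)
    (when-let [[g & more] (seq gen)]
      (if-let [[op g'] (toy-op g ctx)]
        [op (cons g' more)]                 ; first element may have more ops
        (recur more ctx)))))                ; first element exhausted: move on

(defn toy-ops
  "Drains a finite toy generator into a vector of ops."
  [gen ctx]
  (loop [gen gen, ops []]
    (if-let [[op gen'] (toy-op gen ctx)]
      (recur gen' (conj ops op))
      ops)))

(map :f (toy-ops [{:f :write, :value 3} nil [{:f :read} {:f :read}]]
                 {:time 0}))
;; => (:write :read :read)
```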

Functions are generators which ignore updates and can take either test and
context as arguments, or no args. Functions should be *mostly* pure, but some
creative impurity is probably OK. For instance, returning randomized :values
for maps is probably all right. I don't know the laws! What is this, Haskell?

When a function is used as a generator, its return value is used as a
generator; that generator is used until exhausted, and then the function is
called again to produce a new generator. For instance:

  ; Produces a series of different random writes, e.g. 1, 5, 2, 3...
  (fn [] {:f :write, :value (rand-int 5)})

  ; Alternating write/read ops, e.g. write 2, read, write 5, read, ...
  (fn [] (map gen/once [{:f :write, :value (rand-int 5)}
                        {:f :read}]))

Promises and delays are generators which ignore updates, yield :pending until
realized, then are replaced by whatever generator they contain. Delays are
not evaluated until they *could* produce an op, so you can include them in
sequences, phases, etc., and they'll be evaluated only once prior ops have
been consumed.
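
The 'pending until realized' behavior of promises rests on
`clojure.core/realized?`. Here is a toy sketch of just that check: it returns
a bare `:pending` or the delivered value, rather than the tuples a real
generator would produce, and `deferred-op` is a hypothetical name.

```clojure
(defn deferred-op
  "Toy model: :pending until the promise is delivered, then whatever
  generator was delivered."
  [p]
  (if (realized? p)
    @p
    :pending))

(def p (promise))

(deferred-op p)          ; => :pending
(deliver p {:f :read})
(deferred-op p)          ; => {:f :read}
```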

jepsen.generator.context

jepsen.generator.test

This namespace contains functions for testing generators. See the jepsen.generator-test namespace in the test/ directory for a concrete example of how these functions can be used.

NOTE: While the simulate function is considered stable at this point, the others might still be subject to change -- use with care and expect possible breakage in future releases.

jepsen.generator.translation-table

We burn a lot of time in hashcode and map manipulation for thread names, which are mostly integers 0...n, but sometimes non-integer names like :nemesis. It's nice to be able to represent thread state internally as purely integers. To do this, we compute a one-time translation table which lets us map those names to integers and vice-versa.
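A minimal sketch of such a table in plain Clojure follows. The function names and the `{:names ..., :indices ...}` shape are assumptions made for illustration; the real namespace may use different signatures and a more compact representation.

```clojure
(defn translation-table
  "Builds a two-way mapping between thread names and dense integer indices.
  Integer threads 0..n-1 keep their own number; named threads follow."
  [int-thread-count named-threads]
  (let [names (vec (concat (range int-thread-count) named-threads))]
    {:names   names                      ; index -> name, by position
     :indices (zipmap names (range))}))  ; name -> index

(defn name->index [table thread-name]
  (get (:indices table) thread-name))

(defn index->name [table i]
  (nth (:names table) i))

(def table (translation-table 3 [:nemesis]))
```

Here `:nemesis` maps to index 3, so per-thread state can live in an array or vector indexed by small integers instead of a hash map keyed by mixed names.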

jepsen.random

Pluggable generation of random values.

Pluggable

First, randomness should be pluggable. In normal tests, standard Clojure (rand-int) and friends are just fine. But in tests, it's nice if those can be replaced by a deterministic seed. When running in a hypervisor like Antithesis, you want to draw entropy from a special SDK, so it can intentionally send you down interesting paths.

Fast

Second, it should be reasonably fast. We'd like to ideally generate ~100 K ops/sec, and each operation might need to draw, say, 10 random values, which is 1 M values/sec. Basic Clojure (rand-int 5) on my ThreadRipper takes ~37 ns. Clojure's data.generators takes ~35 ns. Our thread-local implementation, backed by an LXM splittable random, takes just ~33 ns.
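The arithmetic behind these figures can be worked through directly. The numbers below are the ones quoted above; the var names are illustrative only.

```clojure
(def ops-per-sec 100000)    ; target throughput
(def values-per-op 10)      ; random draws per operation
(def values-per-sec (* ops-per-sec values-per-op))

(def ns-per-value 33)       ; quoted cost of one draw
;; Nanoseconds spent drawing random values per second of wall-clock time.
(def rng-ns-per-sec (* values-per-sec ns-per-value))

;; Fraction of one core consumed by random number generation.
(def rng-cpu-fraction (/ rng-ns-per-sec 1e9))
```

At the quoted cost, 1 M draws per second comes to 33 ms of CPU time per second, or roughly 3% of a single core.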

Thread-safe

We want everyone in the Jepsen universe to be able to draw random values from this namespace without coordinating. This implies generating values should be thread-safe.

Stateful

This namespace must be stateful. We'd like callers to simply be able to call (r/int 5) and get 2.

Pure, splittable random seeds ala test.check are nice, but they a.) come with a performance penalty, and b.) require threading that random state through essentially every function call and return. This is not only complex, but adds additional destructuring overhead at each call boundary.

The main advantage of pure, splittable random generators is determinism across threads, but this is not a major concern in Jepsen. In normal test runs, we don't care about reproducibility. In Antithesis, the entire thread schedule is deterministic, so we're free to share state across threads and trust that Antithesis Will Take Care Of It. In tests, we're generally drawing entropy from a single thread. It'd be nice to have random generators that are deterministic across threads, but it's not critical.

Determinism

In single-threaded contexts, we want to be able to seed randomness and have reproducible tests. Doing this across threads is not really important--if we were being rigorous we could thread a splittable random down through every thread spawned, but there's a LOT of threaded code in Jepsen and it doesn't all know about us. More to the point, our multithreaded code is usually a.) non-random, or b.) doing IO, which we can't control. Having determinism for a single thread gets us a reasonable 'bang for our buck'.
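Single-threaded determinism from a seed can be sketched with `java.util.Random`. This is a stand-in: it is not the LXM generator this namespace uses, and `draw-longs` is a hypothetical helper.

```clojure
(defn draw-longs
  "Draws n longs in [0, bound) from a fresh RNG created with the given seed."
  [seed n bound]
  (let [rng (java.util.Random. seed)]
    ;; vec forces all n draws immediately, on the calling thread.
    (vec (repeatedly n #(long (.nextInt rng bound))))))

;; Two runs with the same seed on one thread see the same sequence.
(def run-a (draw-longs 5 10 100))
(def run-b (draw-longs 5 10 100))
```

`run-a` and `run-b` are equal because each run consumes its own RNG in a fixed order. Once several threads draw from a shared RNG, the interleaving decides who gets which value, which is the cross-thread case the text above sets aside.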

Special Distributions

Jepsen needs some random things that aren't well supported by the normal java.util.Random, clojure.core, or data.generators functions. In particular, we like to do:

Zipfian distributions: lots of small things, but sometimes very large things.
Weighted choices: 90% reads, 5% writes, 5% deletes.
Special values: over-represent maxima, minima, and zero, to stress codepaths that might treat them differently.
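To make the weighted-choice idea concrete, here is a small sketch in plain Clojure. It is not the signature of this namespace's weighted function; `weighted-choice` is a hypothetical helper that takes the random number `u` as an argument so the logic is easy to test. It assumes a non-empty map of non-negative weights.

```clojure
(defn weighted-choice
  "Picks a value from weights (a non-empty map of value -> non-negative
  weight), using u, a number in [0, 1), as the source of randomness."
  [weights u]
  (let [total  (reduce + (vals weights))
        target (* u total)]
    ;; Walk the cumulative weights until we pass the target.
    (loop [[[v w] & more] (seq weights), acc 0]
      (let [acc (+ acc w)]
        (if (or (< target acc) (empty? more))
          v
          (recur more acc))))))

;; 90% reads, 5% writes, 5% deletes, as in the list above.
(def op-weights (array-map :read 90, :write 5, :delete 5))
```

In practice `u` would come from a random source, for example `(weighted-choice op-weights (rand))`.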

Usage

Here are common Clojure functions and their equivalents in this namespace:

  rand        rand/double
  rand-int    rand/long
  rand-nth    rand/nth
  shuffle     rand/shuffle

You can also generate values from common distributions:

  rand/bool       Returns true or false, optionally with a probability
  rand/exp        Exponential distribution
  rand/geometric  Geometric distribution
  rand/zipf       Zipfian distribution
  rand/weighted   Discrete values with given weights

You can take random permutations and subsets (really, ordered prefixes of permutations) of collections with:

  rand/shuffle
  rand/nonempty-subset

There are two macros for randomly branching control flow:

  rand/branch
  rand/weighted-branch

To re-bind randomness to a specifically seeded RNG, use:

(jepsen.random/with-seed 5
  (jepsen.random/long)                   ; Returns the same value every time
  (call-stuff-using-jepsen.random ...))  ; This holds for the whole body

This changes a global variable jepsen.random/rng and is NOT THREAD SAFE. Do not use with-seed concurrently. It's fine to spawn threads within the body, but if those threads are spawned in a nondeterministic order, then their calls to jepsen.random will also be nondeterministic.
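The reason a global rebind is not thread safe can be seen in a toy version. The sketch below is an assumption-laden stand-in, not this namespace's internals: `global-rng`, `call-with-seed`, and `draw-long` are hypothetical names, and `java.util.Random` replaces the real RNG.

```clojure
;; Hypothetical stand-in for a single global RNG shared by all callers.
(def global-rng (atom (java.util.Random.)))

(defn call-with-seed
  "Swaps in a seeded RNG, calls f, then restores the previous RNG. Since it
  mutates shared global state, two concurrent calls would overwrite each
  other's RNG and restore the wrong one."
  [seed f]
  (let [previous @global-rng]
    (reset! global-rng (java.util.Random. seed))
    (try
      (f)
      (finally
        (reset! global-rng previous)))))

(defn draw-long
  "Draws a long in [0, bound) from whatever RNG is currently installed."
  [bound]
  (long (.nextInt ^java.util.Random @global-rng (int bound))))
```

Two sequential calls such as `(call-with-seed 5 #(draw-long 1000))` return the same value, and the previous RNG is back in place afterwards. Nothing in this design isolates one caller's seed from another's, which matches the warning above.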
