Liking cljdoc? Tell your friends :D

jepsen.generator.pure

A Jepsen history is a list of operations--invocations and completions. A generator's job is to specify what invocations to perform, and when. In a sense, a generator becomes a history as Jepsen incrementally applies it to a real system.

Naively, we might define a history as a fixed sequence of invocations to perform at certain times, but this is impossible: we have only a fixed set of threads, and they may not be free to perform our operations. A thread must be free in order to perform an operation.

Time, too, is a dependency. When we schedule an operation to occur once per second, we mean that only once a certain time has passed can the next operation begin.

There may also be dependencies between threads. Perhaps only after a nemesis has initiated a network partition does our client perform a particular read. We want the ability to hold until a certain operation has begun.

Conceptually, then, a generator is a graph of events, some of which have not yet occurred. Some events are invocations: these are the operations the generator will provide to clients. Some events are completions: these are provided by clients to the generator. Other events are temporal: a certain time has passed.

This graph has some invocations which are ready to perform. When we have a ready invocation, we apply the invocation as an input to the graph, obtaining a new graph, and hand the operation to the relevant client.

Contexts

A context is a map which provides context for generators. For instance, a generator might need to know the number of threads which will ask it for operations. It can get that number from the context. Users can add their own values to the context map, which allows two generators to share state. When one generator calls another, it can pass a modified version of the context, which allows us to write generators that, say, run two independent workloads, each with their own concurrency and thread mappings.

The standard context mappings, which are provided by Jepsen when invoking the top-level generator, and can be expected by every generator, are:

:time           The current Jepsen linear time, in nanoseconds
:free-threads   A collection of idle threads which could perform work
:workers        A map of thread identifiers to process identifiers

Fetching an operation

We use (op gen test context) to ask the generator for the next invocation that we can process.

The operation can have three forms:

  • The generator may return nil, which means the generator is done, and there is nothing more to do. Once a generator does this, it must never return anything other than nil, even if the context changes.
  • The generator may return :pending, which means there might be more ops later, but it can't tell yet.
  • The generator may return an operation, in which case:
    • If its time is in the past, we can evaluate it now
    • If its time is in the future, we wait until either:
      • The time arrives
      • Circumstances change (e.g. we update the generator)

But (op gen test context) returns more than just an operation; it also returns the subsequent state of the generator, if that operation were to be emitted. The two are bundled into a tuple.

(op gen test context) => [op gen'] ; known op [:pending gen] ; unsure nil ; exhausted

The analogous operations for sequences are (first) and (next); why do we couple them here? Why not use the update mechanism strictly to evolve state? Because the behavior in sequences is relatively simple: next always moves forward one item, whereas only some updates actually cause systems to move forward. Seqs always do the same thing in response to next, whereas generators may do different things depending on context. Moreover, Jepsen generators are often branched, rather than linearly wrapped, as sequences are, resulting in questions about which branch needs to be updated.

When I tried to strictly separate implementations of (op) and (update), it resulted in every update call attempting to determine whether this particular generator did or did not emit the given invocation event. This is remarkably tricky to do well, and winds up relying on all kinds of non-local assumptions about the behavior of the generators you wrap, and those which wrap you.

Updating a generator

We still want the ability to respond to invocations and completions, e.g. by tracking that information in context variables. Therefore, in addition to (op) returning a new generator, we have a separate function, (update gen test context event), which allows generators to react to changing circumstances.

  • We invoke an operation (e.g. one that the generator just gave us)
  • We complete an operation

Updates use a context with a specific relationship to the event:

  • The context :time is equal to the event :time
  • The free processes and worker maps reflect those prior to the event taking place. This ensures that generators can examine the worker map to identify which thread performed the given operation.

TODO: this is not true yet. Fix this.

Default implementations

Nil is a valid generator; it ignores updates and always yields nil for operations.

IPersistentMaps are generators which ignore updates and return operations which look like the map itself, but with default values for time, process, and type provided based on the context. This means you can write a generator like

{:f :write, :value 2}

and it will generate ops like

{:type :invoke, :process 3, :time 1234, :f :write, :value 2}

Sequences are generators which assume the elements of the sequence are themselves generators. They ignore updates, and return all operations from the first generator in the sequence, then all operations from the second, and so on. They do not synchronize.

Functions are generators which ignore updates and can take either test and context as arguments, or no args. Functions should be mostly pure, but some creative impurity is probably OK. For instance, returning randomized :values for maps is probably all right. I don't know the laws! What is this, Haskell?

Functions can return two things:

  • nil: signifies that the function generator is exhausted.
  • a tuple of [op gen]: passed through directly; the gen replaces the fn
  • a map: the map is treated as a generator, which lets it fill in a process, time, etc.

In the future, we might consider:

  • Returning any other generator: the function could be replaced by that generator, allowing us to compute generators lazily?
A Jepsen history is a list of operations--invocations and completions. A
generator's job is to specify what invocations to perform, and when. In a
sense, a generator *becomes* a history as Jepsen incrementally applies it to
a real system.

Naively, we might define a history as a fixed sequence of invocations to
perform at certain times, but this is impossible: we have only a fixed set of
threads, and they may not be free to perform our operations. A thread must be
*free* in order to perform an operation.

Time, too, is a dependency. When we schedule an operation to occur once per
second, we mean that only once a certain time has passed can the next
operation begin.

There may also be dependencies between threads. Perhaps only after a nemesis
has initiated a network partition does our client perform a particular read.
We want the ability to hold until a certain operation has begun.

Conceptually, then, a generator is a *graph* of events, some of which have
not yet occurred. Some events are invocations: these are the operations the
generator will provide to clients. Some events are completions: these are
provided by clients to the generator. Other events are temporal: a certain
time has passed.

This graph has some invocations which are *ready* to perform. When we have a
ready invocation, we apply the invocation as an input to the graph, obtaining
a new graph, and hand the operation to the relevant client.

## Contexts

A *context* is a map which provides context for generators. For instance, a
generator might need to know the number of threads which will ask it for
operations. It can get that number from the *context*. Users can add their
own values to the context map, which allows two generators to share state.
When one generator calls another, it can pass a modified version of the
context, which allows us to write generators that, say, run two independent
workloads, each with their own concurrency and thread mappings.

The standard context mappings, which are provided by Jepsen when invoking the
top-level generator, and can be expected by every generator, are:

    :time           The current Jepsen linear time, in nanoseconds
    :free-threads   A collection of idle threads which could perform work
    :workers        A map of thread identifiers to process identifiers

## Fetching an operation

We use `(op gen test context)` to ask the generator for the next invocation
that we can process.

The operation can have three forms:

- The generator may return `nil`, which means the generator is done, and
  there is nothing more to do. Once a generator does this, it must never
  return anything other than `nil`, even if the context changes.
- The generator may return :pending, which means there might be more
  ops later, but it can't tell yet.
- The generator may return an operation, in which case:
  - If its time is in the past, we can evaluate it now
  - If its time is in the future, we wait until either:
    - The time arrives
    - Circumstances change (e.g. we update the generator)

But (op gen test context) returns more than just an operation; it also
returns the *subsequent state* of the generator, if that operation were to be
emitted. The two are bundled into a tuple.

(op gen test context) => [op gen']      ; known op
                         [:pending gen] ; unsure
                         nil            ; exhausted

The analogous operations for sequences are (first) and (next); why do we
couple them here? Why not use the update mechanism strictly to evolve state?
Because the behavior in sequences is relatively simple: next always moves
forward one item, whereas only *some* updates actually cause systems to move
forward. Seqs always do the same thing in response to `next`, whereas
generators may do different things depending on context. Moreover, Jepsen
generators are often branched, rather than linearly wrapped, as sequences
are, resulting in questions about *which branch* needs to be updated.

When I tried to strictly separate implementations of (op) and (update), it
resulted in every update call attempting to determine whether this particular
generator did or did not emit the given invocation event. This is
*remarkably* tricky to do well, and winds up relying on all kinds of
non-local assumptions about the behavior of the generators you wrap, and
those which wrap you.

## Updating a generator

We still want the ability to respond to invocations and completions, e.g. by
tracking that information in context variables. Therefore, in addition to
(op) returning a new generator, we have a separate function, (update gen test
context event), which allows generators to react to changing circumstances.

- We invoke an operation (e.g. one that the generator just gave us)
- We complete an operation

Updates use a context with a specific relationship to the event:

- The context :time is equal to the event :time
- The free processes and worker maps reflect those *prior* to the event
taking place. This ensures that generators can examine the worker map to
identify which thread performed the given operation.

TODO: this is not true yet. Fix this.

## Default implementations

Nil is a valid generator; it ignores updates and always yields nil for
operations.

IPersistentMaps are generators which ignore updates and return operations
which look like the map itself, but with default values for time, process,
and type provided based on the context. This means you can write a generator
like

{:f :write, :value 2}

and it will generate ops like

{:type :invoke, :process 3, :time 1234, :f :write, :value 2}

Sequences are generators which assume the elements of the sequence are
themselves generators. They ignore updates, and return all operations from
the first generator in the sequence, then all operations from the second, and
so on. They do not synchronize.

Functions are generators which ignore updates and can take either test and
context as arguments, or no args. Functions should be *mostly* pure, but some
creative impurity is probably OK. For instance, returning randomized :values
for maps is probably all right. I don't know the laws! What is this, Haskell?

Functions can return two things:

- nil: signifies that the function generator is exhausted.
- a tuple of [op gen]: passed through directly; the gen replaces the fn
- a map: the map is treated as a generator, which lets it fill in a process,
         time, etc.

In the future, we might consider:

- Returning any other generator: the function could be *replaced* by that
generator, allowing us to compute generators lazily?
raw docstring

all-processesclj

(all-processes context)

Given a context, returns all processes currently being executed by threads.

Given a context, returns all processes currently being executed by threads.
raw docstring

all-threadsclj

(all-threads context)

Given a context, returns a collection of all threads.

Given a context, returns a collection of all threads.
raw docstring

anyclj

(any & gens)

Takes multiple generators and binds them together. Operations are taken from any generator. Updates are propagated to all generators.

Takes multiple generators and binds them together. Operations are taken from
any generator. Updates are propagated to all generators.
raw docstring

clientsclj

(clients client-gen)
(clients client-gen nemesis-gen)

In the single-arity form, wraps a generator such that only clients request operations from it. In its two-arity form, combines a generator of client operations and a generator for nemesis operations into one. When the process requesting an operation is :nemesis, routes to the nemesis generator; otherwise to the client generator.

In the single-arity form, wraps a generator such that only clients
request operations from it. In its two-arity form, combines a generator of
client operations and a generator for nemesis operations into one. When the
process requesting an operation is :nemesis, routes to the nemesis generator;
otherwise to the client generator.
raw docstring

delay-tilclj

(delay-til dt gen)

Given a time dt in seconds, and an underlying generator gen, constructs a generator which aligns invocations to intervals of dt seconds.

Given a time dt in seconds, and an underlying generator gen, constructs a
generator which aligns invocations to intervals of dt seconds.
raw docstring

dissoc-vecclj

(dissoc-vec v i)

Cut a single index out of a vector, returning a vector one shorter, without the element at that index.

Cut a single index out of a vector, returning a vector one shorter, without
the element at that index.
raw docstring

each-threadclj

(each-thread gen)

Takes a generator. Constructs a generator which maintains independent copies of that generator for every thread. Each generator sees exactly one thread in its free process list. Updates are propagated to the generator for the thread which emitted the operation.

Takes a generator. Constructs a generator which maintains independent copies
of that generator for every thread. Each generator sees exactly one thread in
its free process list. Updates are propagated to the generator for the thread
which emitted the operation.
raw docstring

f-mapclj

(f-map f-map g)

Takes a function f-map converting op functions (:f op) to other functions, and a generator g. Returns a generator like g, but where fs are replaced according to f-map. Useful for composing generators together for use with a composed nemesis.

Takes a function `f-map` converting op functions (:f op) to other functions,
and a generator `g`. Returns a generator like `g`, but where fs are replaced
according to `f-map`. Useful for composing generators together for use with a
composed nemesis.
raw docstring

filterclj

(filter f gen)

A generator which filters operations from an underlying generator, passing on only those which match (f op). Like map, :pending and nil operations bypass the filter.

A generator which filters operations from an underlying generator, passing
on only those which match (f op). Like `map`, :pending and nil operations
bypass the filter.
raw docstring

free-processesclj

(free-processes context)

Given a context, returns a collection of processes which are not actively processing invocations.

Given a context, returns a collection of processes which are not actively
processing invocations.
raw docstring

free-threadsclj

(free-threads context)

Given a context, returns a collection of threads that are not actively processing invocations.

Given a context, returns a collection of threads that are not actively
processing invocations.
raw docstring

Generatorcljprotocol

opclj

(op gen test context)

updateclj

(update gen test context event)

Updates the generator to reflect an event having taken place

Updates the generator to reflect an event having taken place

ignore-updatesclj

(ignore-updates gen)

Wraps a generator. Any call to update is ignored, returning this generator with no changes.

It's not clear if this actually confers any performance advantage right now.

Wraps a generator. Any call to `update` is ignored, returning this
generator with no changes.

It's not clear if this actually confers any performance advantage right now.
raw docstring

limitclj

(limit remaining gen)

Wraps a generator and ensures that it returns at most limit operations. Propagates every update to the underlying generator.

Wraps a generator and ensures that it returns at most `limit` operations.
Propagates every update to the underlying generator.
raw docstring

logclj

(log msg)

A generator which, when asked for an operation, logs a message and yields nil.

A generator which, when asked for an operation, logs a message and yields
nil.
raw docstring

mapclj

(map f gen)

A generator which wraps another generator g, transforming operations it generates with (f op test process), of if that fails, (f op). When the underlying generator yields :pending or nil, this generator does too, without calling f. Passes updates to underlying generator.

A generator which wraps another generator g, transforming operations it
generates with (f op test process), of if that fails, (f op). When the
underlying generator yields :pending or nil, this generator does too, without
calling `f`. Passes updates to underlying generator.
raw docstring

mixclj

(mix gens)

A random mixture of several generators. Takes a collection of generators and chooses between them uniformly. Ignores updates; some users create broad (hundreds of generators) mixes.

To be precise, a mix behaves like a sequence of one-time, randomly selected generators from the given collection. This is efficient and prevents multiple generators from competing for the next slot, making it hard to control the mixture of operations.

A random mixture of several generators. Takes a collection of generators and
chooses between them uniformly. Ignores updates; some users create broad
(hundreds of generators) mixes.

To be precise, a mix behaves like a sequence of one-time, randomly selected
generators from the given collection. This is efficient and prevents multiple
generators from competing for the next slot, making it hard to control the
mixture of operations.
raw docstring

nemesisclj

(nemesis nemesis-gen)
(nemesis nemesis-gen client-gen)

In the single-arity form, wraps a generator such that only the nemesis requests operations from it. In its two-arity form, combines a generator of client operations and a generator for nemesis operations into one. When the process requesting an operation is :nemesis, routes to the nemesis generator; otherwise to the client generator.

In the single-arity form, wraps a generator such that only the nemesis
requests operations from it. In its two-arity form, combines a generator of
client operations and a generator for nemesis operations into one. When the
process requesting an operation is :nemesis, routes to the nemesis generator;
otherwise to the client generator.
raw docstring

next-processclj

(next-process context thread)

When a process being executed by a thread crashes, this function returns the next process for a given thread. You should probably only use this with the global context, because it relies on the size of the :workers map.

When a process being executed by a thread crashes, this function returns the
next process for a given thread. You should probably only use this with the
*global* context, because it relies on the size of the `:workers` map.
raw docstring

onclj


on-threadsclj

(on-threads f gen)

Wraps a generator, restricting threads which can use it to only those threads which satisfy (f thread). Alters the context passed to the underlying generator: it will only include free threads and workers satisfying f. Updates are passed on only when the thread performing the update matches f.

Wraps a generator, restricting threads which can use it to only those
threads which satisfy (f thread). Alters the context passed to the underlying
generator: it will only include free threads and workers satisfying f.
Updates are passed on only when the thread performing the update matches f.
raw docstring

on-threads-contextclj

(on-threads-context f ctx)

Helper function to transform contexts for OnThreads.

Helper function to transform contexts for OnThreads.
raw docstring

onceclj

(once gen)

Emits only a single item from the underlying generator.

Emits only a single item from the underlying generator.
raw docstring

phasesclj

(phases & generators)

Takes several generators, and constructs a generator which evaluates everything from the first generator, then everything from the second, and so on.

Takes several generators, and constructs a generator which evaluates
everything from the first generator, then everything from the second, and so
on.
raw docstring

process->threadclj

(process->thread context process)

Takes a context and a process, and returns the thread which is executing that process.

Takes a context and a process, and returns the thread which is executing
that process.
raw docstring

process-limitclj

(process-limit n gen)

Takes a generator and returns a generator with bounded concurrency--it emits operations for up to n distinct processes, but no more.

Specifically, we track the set of all processes in a context's workers map: the underlying generator can return operations only from contexts such that the union of all processes across all such contexts has cardinality at most n. Tracking the union of all possible processes, rather than just those processes actually performing operations, prevents the generator from "trickling" at the end of a test, i.e. letting only one or two processes continue to perform ops, rather than the full concurrency of the test.

Takes a generator and returns a generator with bounded concurrency--it emits
operations for up to n distinct processes, but no more.

Specifically, we track the set of all processes in a context's `workers` map:
the underlying generator can return operations only from contexts such that
the union of all processes across all such contexts has cardinality at most
`n`. Tracking the union of all *possible* processes, rather than just those
processes actually performing operations, prevents the generator from
"trickling" at the end of a test, i.e. letting only one or two processes
continue to perform ops, rather than the full concurrency of the test.
raw docstring

random-int-seqclj

(random-int-seq)
(random-int-seq seed)

Generates a reproducible sequence of random longs, given a random seed. If seed is not provided, taken from (rand-int)).

Generates a reproducible sequence of random longs, given a random seed. If
seed is not provided, taken from (rand-int)).
raw docstring

sleepclj

(sleep)

Informally, pauses for dt seconds before yielding nil for an operation. Formally, this is sort of a weird one, because everything here is pure, and there's no notion of blocking. We can delay operations, which have times, but delaying nil, which does NOT have a time, isn't straightforward.

What we'll need to do, later, is invent a special type of operation map which delays the scheduler and is not passed to clients. Or just refuse to implement this at all. It's nicely readable, but it's semantically kind of weird (Who sleeps? When?) compared to delaying a subsequent operation, which has a well defined behavior.

Informally, pauses for dt seconds before yielding `nil` for an operation.
Formally, this is sort of a weird one, because everything here is pure, and
there's no notion of blocking. We can delay operations, which have times, but
delaying *nil*, which does NOT have a time, isn't straightforward.

What we'll need to do, later, is invent a special type of operation map which
delays the scheduler and is *not* passed to clients. Or just refuse to
implement this at all. It's nicely readable, but it's semantically kind of
weird (Who sleeps? When?) compared to delaying a subsequent operation, which
has a well defined behavior.
raw docstring

soonest-op-vecclj

(soonest-op-vec pair1 pair2)

Takes two [op, ...] vectors, and returns the vector whose op occurs first. Op maps occur before those which are :pending. :pending occurs before nil.

We use vectors here because you may want to pass [op, gen'] pairs, or possibly encode additional information into the vector, so you can, for instance, identify which of several generators was the next one.

Takes two [op, ...] vectors, and returns the vector whose op occurs first.
Op maps occur before those which are :pending. :pending occurs before `nil`.

We use vectors here because you may want to pass [op, gen'] pairs, or
possibly encode additional information into the vector, so you can, for
instance, identify *which* of several generators was the next one.
raw docstring

staggerclj

(stagger dt gen)

Wraps a generator. Operations from that generator are delayed by a uniform random time between 0 to 2 * dt.

Note that unlike jepsen's original version of stagger, this delay applies to all operations, not to each thread independently. If your old stagger dt is 10, and your concurrency is 5, your new stagger dt should be 2.

Wraps a generator. Operations from that generator are delayed by a uniform
random time between 0 to 2 * dt.

Note that unlike jepsen's original version of `stagger`, this delay applies
to *all* operations, not to each thread independently. If your old stagger
dt is 10, and your concurrency is 5, your new stagger dt should be 2.
raw docstring

synchronizeclj

(synchronize gen)

Takes a generator, and waits for all workers to be free before it begins.

Takes a generator, and waits for all workers to be free before it begins.
raw docstring

thenclj

(then a b)

Generator A, synchronize, then generator B. Note that this takes its arguments backwards: b comes before a. Why? Because it reads better in ->> composition. You can say:

(->> (fn [] {:f :write :value 2})
     (limit 3)
     (then (once {:f :read})))
Generator A, synchronize, then generator B. Note that this takes its
arguments backwards: b comes before a. Why? Because it reads better in ->>
composition. You can say:

    (->> (fn [] {:f :write :value 2})
         (limit 3)
         (then (once {:f :read})))
raw docstring

time-limitclj

(time-limit dt gen)

Takes a time in seconds, and an underlying generator. Once this emits an operation (taken from that underlying generator), will only emit operations for dt seconds.

Takes a time in seconds, and an underlying generator. Once this emits an
operation (taken from that underlying generator), will only emit operations
for dt seconds.
raw docstring

validateclj

(validate gen)

Validates the well-formedness of operations emitted from the underlying generator.

Validates the well-formedness of operations emitted from the underlying
generator.
raw docstring

cljdoc is a website building & hosting documentation for Clojure/Script libraries

× close