riemann.config

Riemann config files are eval'd in the context of this namespace. Includes streams, client, email, logging, and graphite; the common functions used in config. Provides a default core and functions ((tcp|udp)-server, streams, index, reinject) which operate on that core.


riemann.deps

Riemann's dependency resolution system expresses stateful relationships between events. Dependencies are expressed as Rules; a Rule is a statement about the relationship between a particular event and the current state of the index.

Maps are rules which specify that their keys and values should be present in some event in the index. {} will match any non-empty index. {:service "a" :state "ok"} will match an index which has {:service "a" :state "ok" :metric 123}, and so on.

(all & rules) matches only if all rules match.

(any & rules) matches if any of the rules match.

(localhost & rules) states that all child rules must have the same host as the event of interest.

(depends a & bs) means that if a matches the current event (and only the current event, not the full index), every rule in bs must match the current event and index.
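
To make the matching semantics above concrete, here is a minimal sketch in plain Clojure. It is not the riemann.deps implementation: the index is modelled as a plain sequence of event maps, and `map-rule`, `all-of`, `any-of`, and `sample-index` are hypothetical names chosen for illustration.

```clojure
;; Hypothetical helpers, not part of riemann.deps.
;; The index is modelled as a plain sequence of event maps.

(defn map-rule
  "Returns a predicate on an index: true when some single event
  contains every key/value pair of m."
  [m]
  (fn [index]
    (boolean
      (some (fn [event]
              (every? (fn [[k v]] (= v (get event k))) m))
            index))))

(defn all-of
  "Matches only if every rule matches the index."
  [& rules]
  (fn [index] (every? #(% index) rules)))

(defn any-of
  "Matches if at least one rule matches the index."
  [& rules]
  (fn [index] (boolean (some #(% index) rules))))

(def sample-index
  [{:host "h1" :service "a" :state "ok" :metric 123}
   {:host "h1" :service "b" :state "critical"}])

;; An empty map rule matches any non-empty index, but not an empty one.
((map-rule {}) sample-index)
((map-rule {}) [])
```

Note that a map rule must be satisfied by a single event: in this sketch `{:service "a" :state "critical"}` does not match `sample-index`, even though each pair appears in some event.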


riemann.expiration

Many places in Riemann need to understand whether the events they're working with are currently valid, and whether a given host/service combo has expired. The expiration tracker provides a stateful data structure for tracking new events, figuring out when expirations should be emitted, and calling back when they need to occur.


riemann.folds

Functions for combining states.

Folds usually come in two variants: a friendly version like sum, and a strict version like sum*. Strict variants will throw when one of their events is nil, missing a metric, or otherwise invalid. In typical use, however, you won't have all the necessary information to pass on an event. Friendly variants will do their best to ignore these error conditions where sensible, returning partially complete events or nil instead of throwing.

Called with an empty list, folds which would return a single event return nil.
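
The difference between the two variants can be sketched in plain Clojure as follows. This is an illustration under assumptions, not the riemann.folds source: `strict-sum` and `friendly-sum` are hypothetical stand-ins, and basing the result on the first valid event is a choice made for this example.

```clojure
;; Hypothetical stand-ins for a strict and a friendly fold.

(defn strict-sum
  "Sums :metric over events. Throws if any event is nil or lacks a
  numeric metric. Returns nil for an empty list."
  [events]
  (when (seq events)
    (doseq [e events]
      (when-not (and e (number? (:metric e)))
        (throw (ex-info "invalid event" {:event e}))))
    (assoc (first events) :metric (reduce + (map :metric events)))))

(defn friendly-sum
  "Sums :metric over the events that have a numeric metric, silently
  skipping the rest. Returns nil when nothing usable remains."
  [events]
  (let [valid (filter #(and % (number? (:metric %))) events)]
    (when (seq valid)
      (assoc (first valid) :metric (reduce + (map :metric valid))))))

;; The friendly variant skips the nil and the metric-less event.
(friendly-sum [{:service "a" :metric 1} nil {:service "a" :metric 2} {:service "a"}])
```

Calling `strict-sum` on the same input throws, because the second element is nil.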


riemann.index

Maintains a stateful index of events by [host, service] key. Can be queried to return the most recent indexed events matching some expression. Can expire events which have exceeded their TTL. Presently the only implementation of the index protocol is backed by a NonBlockingHashMap, but I plan to add an HSQLDB backend as well.

Indexes must extend three protocols:

  • Index: indexing and querying events
  • Seqable: returning a list of events
  • Service: lifecycle management
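
As a rough illustration of the [host, service] keying and TTL expiry described above, the following sketch uses an atom holding a plain map. It does not implement the Index, Seqable, or Service protocols; `make-index`, `index-event!`, `find-events`, and `expire!` are hypothetical names, and the default TTL of 60 seconds is an assumption made for this example.

```clojure
;; Hypothetical in-memory index, for illustration only.

(defn make-index [] (atom {}))

(defn index-event!
  "Stores event under its [host service] key, replacing any older one."
  [idx event]
  (swap! idx assoc [(:host event) (:service event)] event)
  event)

(defn find-events
  "Returns the indexed events for which pred is truthy."
  [idx pred]
  (filter pred (vals @idx)))

(defn expire!
  "Removes events whose :time + :ttl lies before now, and returns
  them marked with :state \"expired\"."
  [idx now]
  (let [expired? (fn [e] (< (+ (:time e) (:ttl e 60)) now))
        dead     (filterv expired? (vals @idx))]
    (doseq [e dead]
      (swap! idx dissoc [(:host e) (:service e)]))
    (mapv #(assoc % :state "expired") dead)))

(def idx (make-index))
(index-event! idx {:host "a" :service "cpu" :metric 1 :time 0 :ttl 10})
(index-event! idx {:host "a" :service "cpu" :metric 2 :time 5 :ttl 10})
(index-event! idx {:host "b" :service "cpu" :metric 9 :time 0 :ttl 2})
```

After these calls the sketch holds two entries: the second event for host "a" has replaced the first, since both share the same key.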

riemann.kafka

Receives events from and forwards events to Kafka.


riemann.plugin

Simple plugin loader for riemann.

Riemann already offers the ability to load jar files in the classpath through its initialization scripts. This namespace allows walking the classpath, looking for small meta files distributed with plugins which indicate the namespaces to require.

This allows load-plugins or load-plugin to make new functions and streams available in the config file without resorting to manual requires.

The meta file distributed with plugins is expected to be an EDN file containing at least two keys:

  • plugin: the plugin name
  • require: the namespace to load

The namespace will be made available using the plugin name's symbol value.
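
A small sketch of what reading such a meta file might look like. The file contents, the use of strings for both values, and the helper `plugin-entry` are assumptions made for illustration; only the two keys come from the description above.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical meta file contents; the plugin and namespace names
;; are made up for this example.
(def sample-meta
  "{:plugin \"example\" :require \"example.plugin.core\"}")

(defn plugin-entry
  "Parses the EDN text and returns the alias symbol derived from the
  plugin name, together with the namespace symbol to load."
  [edn-text]
  (let [m (edn/read-string edn-text)]
    {:alias     (symbol (name (:plugin m)))
     :namespace (symbol (name (:require m)))}))

(plugin-entry sample-meta)
```

A loader could then require the namespace under that alias, which is one way to read "made available using the plugin name's symbol value".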


riemann.pool

A generic thread-safe resource pool.


riemann.pubsub

Provides publish-subscribe handling of events. Publishers push events onto a channel, which has n subscribers. Each subscriber subscribes to a channel with an optional predicate function. Events which match the predicate are sent to the subscriber.
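
The channel, subscriber, and predicate relationship can be sketched in a few lines of plain Clojure. This is not the riemann.pubsub code; `make-registry`, `add-subscriber!`, and `publish-event!` are hypothetical names, and the registry is simply an atom mapping channel names to subscriber lists.

```clojure
;; Hypothetical registry: channel -> vector of {:pred ... :f ...}.

(defn make-registry [] (atom {}))

(defn add-subscriber!
  "Subscribes f to channel. Without a predicate, f receives every event."
  ([registry channel f]
   (add-subscriber! registry channel (constantly true) f))
  ([registry channel pred f]
   (swap! registry update channel (fnil conj []) {:pred pred :f f})
   nil))

(defn publish-event!
  "Sends event to each subscriber of channel whose predicate matches."
  [registry channel event]
  (doseq [{:keys [pred f]} (get @registry channel)]
    (when (pred event)
      (f event))))

(def reg (make-registry))
(def all-seen (atom []))
(def crit-seen (atom []))

(add-subscriber! reg :events #(swap! all-seen conj %))
(add-subscriber! reg :events #(= "critical" (:state %)) #(swap! crit-seen conj %))

(publish-event! reg :events {:service "a" :state "ok"})
(publish-event! reg :events {:service "b" :state "critical"})
```

Here the first subscriber sees both events, while the second sees only the critical one.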


riemann.query

The query parser. Parses strings into ASTs, and converts ASTs to functions which match events.


riemann.repl

The riemann REPL server is a bit of a special case. Since it controls almost every aspect of Riemann--and can shut those aspects down--it needs to live above them. While you usually start a repl server from the config file, it is not bound to the usual config lifecycle and won't be shut down or interrupted during config reload.


riemann.streams

The streams namespace aims to provide a comprehensive set of widely applicable, combinable tools for building more complex streams.

Streams are functions which accept events or, in some cases, lists of events.

Streams typically do one or more of the following.

  • Filter events.
  • Transform events.
  • Combine events over time.
  • Apply events to other streams.
  • Forward events to other services.

Most streams accept, after their initial arguments, any number of streams as children. These are known as children or "child streams" of the stream. The children are typically invoked sequentially; any exceptions thrown are caught, logged, and optionally forwarded to *exception-stream*. Return values of children are ignored.

Events are backed by a map (e.g. {:service "foo" :metric 3.5}), so any function that accepts maps will work with events. Common functions like prn can be used as a child stream.

Some common patterns for defining child streams are (fn [e] (println e)) and (partial log :info).
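
The parent and child relationship described above can be sketched without Riemann itself. In this illustration, `call-children` and `above` are hypothetical helpers: the first invokes each child in order and catches exceptions (printing them rather than forwarding them anywhere), and the second is a simple filtering stream that passes matching events to its children.

```clojure
;; Hypothetical helpers, written in plain Clojure for illustration.

(defn call-children
  "Invokes each child with event, in order. An exception in one child
  is caught and printed so that later children still run."
  [children event]
  (doseq [child children]
    (try
      (child event)
      (catch Exception e
        (println "child stream threw:" (.getMessage e))))))

(defn above
  "Returns a stream that passes events whose metric exceeds threshold
  to its children. Return values of children are ignored."
  [threshold & children]
  (fn [event]
    (when (> (or (:metric event) 0) threshold)
      (call-children children event))))

(def seen (atom []))

(def s
  (above 3
         (fn [e] (throw (ex-info "boom" {})))   ; a failing child
         (fn [e] (swap! seen conj e))))         ; still runs afterwards

(s {:service "foo" :metric 3.5})
(s {:service "foo" :metric 1})
```

Only the first event reaches the children, and the failing child does not prevent the second child from recording it.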


riemann.streams.pure

Riemann streams have performed exceptionally well, but their design scope was intentionally limited. Users consistently request:

  • Consistent behavior with respect to time intervals
  • Persisting stream state across reloads and process restarts
  • Replicating streams across multiple nodes

These suggest:

  • Streams should be atomically coerceable to data structures for storage and exchange between nodes.

  • Streams need globally unique identifiers so we can handle multiple streams feeding into the same child stream.

  • Streams should be deterministic functions of their input events, instead of depending on wall-clock time.

  • Streams should be as pure as possible--side effects should be explicitly identified and controlled for distribution and replayability.

And we wish to preserve:

  • Performance: tens of millions of events per second is nice.

  • Implicit parallelism: it's easy to get determinism in a single thread, but we can and should do better.

  • The existing stream syntax, including macros like (by).

Useful existing assumptions:

  • Events are roughly time-ordered, and we have no problem rejecting events that appear too far outside our latency window.

New assumptions:

  • Events are uniquely identified by [host, service, time, seq] identifiers.

  • Events are deterministically partitioned into time windows. Windows are monotonically advancing; all streams see windows in strictly sequential order.

Ideas on Time

Peter Alvaro has convinced me of the obvious (lol) idea that a distributed stream processor must have some way to seal the events--to tell when a given stream has seen all inputs for a particular interval, so that it can in turn send state downstream.

Presently, sealing is performed as a global side effect from the riemann.time scheduler, which causes weird latency anomalies--for instance, chaining 3 windows together introduces 3x window latency, and if multiple streams are recombined we'll happily interleave events from different times. Also, this fucks with determinism.

I think the right move is to thread sealing information through the same paths that connect streams to one another, but this introduces a new problem. Right now, Riemann streams only have links to their children, not their parents. In order for a stream to know it's received all its inputs, it must know how many producers are sending it messages.

We could do this by sending an 'initiation pulse' through the topology, where each parent informs its child that it exists. Happily, producer relationships are local, not global: a stream only has to know about its immediate producers.

A difficulty: (by) creates producers dynamically. I THINK this is the only place where Riemann topologies shift at runtime. Can we figure out a solution that works for (by)?

A specific case: (by) has one child stream c0, which has events for the current window w0. (by) receives an event e which creates a new child stream c1. All child streams unify at a final stream f.

  /--- c0 ---\
by            f
  \--- c1 ---/

Case 1: e falls in w0. Create child c1 and inform it that the current window is w0. c1 sends an initiation pulse for w0 to f, causing f to increment its producer count for w0. If by can only seal w0 once c1's initiation pulse has completely propagated, f will not seal w0 until c1 has had a chance to forward its w0 events to f as well.

Case 2: e falls in w1. Seal w0 and inform c0 as usual. Move to window w1, create c1 and proceed as in Case 1.
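
One possible reading of the producer-count idea in Case 1, sketched in plain Clojure. Nothing here exists in Riemann: `make-sink`, `initiate!`, and `seal!` are hypothetical names, and the sketch ignores the ordering and locking concerns discussed in these notes. It only shows the bookkeeping at f: a window counts as sealed once every producer that announced itself has also sealed it.

```clojure
;; Hypothetical bookkeeping for the final stream f.
;; State shape: window -> {:producers n, :sealed n}

(defn make-sink [] (atom {}))

(defn initiate!
  "Records an initiation pulse: one more producer for window."
  [sink window]
  (swap! sink update-in [window :producers] (fnil inc 0))
  nil)

(defn seal!
  "Records a sealing pulse from one producer. Returns true once every
  known producer has sealed the window."
  [sink window]
  (let [state (swap! sink update-in [window :sealed] (fnil inc 0))
        {:keys [producers sealed]} (get state window)]
    (= producers sealed)))

(def f (make-sink))
(initiate! f :w0)   ; c0 announces itself for w0
(initiate! f :w0)   ; c1 is created later and announces itself too
```

With both producers registered, the first `(seal! f :w0)` returns false and the second returns true, so f does not seal w0 before c1 has had its turn.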

Implication: initiation and sealing pulses require synchronous acknowledgement--we have to wait for the initiation to propagate all the way downstream to make sure that every downstream node is aware of its new producer. It's fine to emit events concurrently (cuz the stream may be able to do parallel processing) so long as sealing is strictly ordered.

Another implication: Sealing pulses must be ordered w.r.t. the event stream. We have to acquire something like a ReaderWriterLock to ensure no threads are operating on our children while we seal them. That means acquiring a readlock on every. single. event. Fuck.

Alternate strategy: queues between everything. We are going for CSP invariants, after all, and that's what Akka does! That means locks AND allocations and pointer chasing for every event. Fuck no.

Alternate strategy: most configs involve tens of branches. Branches can execute in parallel: we could enforce a single-thread-per-branch rule. When branches recombine, though, we gotta introduce a queue or a lock. Seems less bad. OTOH, if only one thread can ever execute a given stream, we can stop using atoms and move all stream state to volatile mutables, which could dramatically reduce memory barriers. And the initiation pulses tell us when we need locks: if you only have one producer, you don't need to lock.

If we use queues and enforce strict thread affinity for streams, we could get away with unsynchronized mutables, not just volatile ones.

UGH


No vars found in this namespace.

riemann.test

Fast, end-to-end, repeatable testing for entire Riemann configs. Provides a tap macro which taps the event stream and (in testing mode) records all events that flow through that stream. Provides a variant of deftest that initiates controlled time and sets up a fresh result set for taps, and a function inject! to apply events to streams and see what each tap received.


riemann.time

Clocks and scheduled tasks. Provides functions for getting the current time and running functions (Tasks) at specific times and periods. Includes a threadpool for task execution, controlled by (start!) and (stop!).


riemann.time.controlled

Provides controllable periodic and deferred execution. Calling (advance! delta-in-seconds) moves the clock forward, triggering events that would have occurred, in sequence.
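
The idea of a clock that only moves when told to can be sketched as follows. This is not the riemann.time.controlled implementation: `make-clock`, `schedule-at!`, and `advance-by!` are hypothetical names, only one-shot (deferred) tasks are modelled, and the delta-based argument follows the description above, so check the real function's docstring for its exact argument semantics.

```clojure
;; Hypothetical controlled clock: time moves only via advance-by!.

(defn make-clock [] (atom {:now 0 :tasks []}))

(defn schedule-at!
  "Registers f to run once the clock reaches time t."
  [clock t f]
  (swap! clock update :tasks conj {:t t :f f})
  nil)

(defn advance-by!
  "Moves the clock forward by delta seconds, running the tasks that
  fall due in time order. Tasks scheduled while advancing are only
  considered on the next call. Returns the new time."
  [clock delta]
  (let [target (+ (:now @clock) delta)
        due?   #(<= (:t %) target)
        due    (sort-by :t (filter due? (:tasks @clock)))]
    (swap! clock update :tasks #(vec (remove due? %)))
    (doseq [{:keys [t f]} due]
      (swap! clock update :now max t)   ; the task observes its own time
      (f))
    (swap! clock assoc :now target)
    target))

(def clock (make-clock))
(def log (atom []))
(schedule-at! clock 5 #(swap! log conj :late))
(schedule-at! clock 2 #(swap! log conj :early))
```

Advancing this clock by 3 runs only the task due at time 2. Advancing by another 3 runs the task due at time 5 and leaves the clock at 6.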


riemann.transport

Functions shared by several transports. Some Netty-specific pieces live here, since Netty is the preferred way of providing transports.


riemann.transport.debug

It is very dark. You are likely to be eaten by a grue.


riemann.transport.tcp

Accepts messages from external sources. Associated with a core. Sends incoming events to the core's streams, queries the core's index for states.


riemann.transport.udp

Accepts messages from external sources. Associated with a core. Sends incoming events to the core's streams, queries the core's index for states.

