skyscraper.traverse

Parallelized context tree traversal.

First, some definitions (sketchy – some details omitted):

1. A _handler_ is a function taking a map and returning a seq of
   maps (or a symbol naming such a function).
2. A _context_ is a map that may contain a special key,
   `::handler`, describing a handler that you may run on it.
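For illustration, here is a minimal synchronous handler and a root context built from plain maps. The function `halve`, the var `root` and the `:n` key are hypothetical. Only the handler key comes from this namespace, and it is spelled out in full as `:skyscraper.traverse/handler` because writing `::handler` outside `skyscraper.traverse` would resolve to a different keyword.

```clojure
;; Hypothetical example: `halve`, `root` and :n are made up for illustration.
;; :skyscraper.traverse/handler is the key written `::handler` in this ns.
(defn halve
  "A synchronous handler: takes a context (a map) and returns a seq of
  child contexts. Children whose :n is still greater than 1 carry the
  handler again and will be expanded further; the others carry no
  handler and are therefore leaves."
  [{:keys [n] :as context}]
  (for [part [(quot n 2) (- n (quot n 2))]]
    (if (> part 1)
      (assoc context :n part :skyscraper.traverse/handler halve)
      (-> context
          (assoc :n part)
          (dissoc :skyscraper.traverse/handler)))))

;; A root context: a plain map carrying the handler to run on it.
(def root {:n 5 :skyscraper.traverse/handler halve})
```

Running the root's handler on it, `((:skyscraper.traverse/handler root) root)`, yields two child contexts with `:n` 2 and 3, each still carrying the handler; once `:n` reaches 1 the context has no handler and is a leaf.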

Now imagine that we have a root context. We can run its `::handler`
on it, obtaining a series of child contexts. If these contexts in
turn contain their own `::handler`s, we can invoke each on its
associated context, obtaining another series of grandchild contexts.
Repeatedly applying this process gives rise to a tree, called a
_context tree_.

We call that tree _implicit_ because it is never reified as a whole
in the process; rather, its nodes are computed individually.
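To make the idea of an implicit tree concrete, the following single-threaded sketch walks a context tree depth first and collects its leaves. It is not how `skyscraper.traverse` works internally: there is no core.async, no parallelism, and only synchronous handlers are supported. The names `leaves`, `two-children` and `example-root` are hypothetical, and resolving symbol handlers with `requiring-resolve` (Clojure 1.10+) is an assumption made for this sketch.

```clojure
;; Sequential sketch only, NOT the library's implementation.
;; `leaves`, `two-children` and `example-root` are hypothetical names.
(defn leaves
  "Walks the implicit context tree rooted at `context` depth first and
  returns a seq of its leaves (contexts without a handler). A node's
  children are computed only when the walk reaches that node."
  [context]
  (if-let [handler (:skyscraper.traverse/handler context)]
    ;; A handler may also be a namespace-qualified symbol naming a function.
    (let [f (if (symbol? handler) (requiring-resolve handler) handler)]
      (mapcat leaves (f context)))
    [context]))

;; A tiny handler: every node has two children until depth 2,
;; so the tree rooted at `example-root` has 4 leaves.
(defn two-children [{:keys [depth]}]
  (for [i [0 1]]
    (cond-> {:depth (inc depth) :i i}
      (< (inc depth) 2) (assoc :skyscraper.traverse/handler two-children))))

(def example-root {:depth 0 :skyscraper.traverse/handler two-children})

(count (leaves example-root)) ;; => 4
```

Only the leaves are kept as the walk proceeds, so no data structure representing the whole tree is ever built.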

This ns implements context tree traversal parallelized using core.async,
with the following provisos:

- A handler can be either synchronous (in which case it is a function
  taking a context and returning a seq of contexts) or asynchronous (in
  which case it takes a seq of contexts and a callback, should return
  immediately, and should arrange for that callback to be called with
  a list of resulting contexts once they are ready). Whether a handler
  is synchronous or asynchronous is determined by the context's
  `::call-protocol`.

- The traversal supports context priorities, letting you control the
  order in which the context tree nodes are visited. Priorities are
  specified by the `::priority` context key: the lower the number, the
  higher the priority. They are taken into account only when the
  `:prioritize?` option of `launch` is true.
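The effect of priorities on visiting order can be sketched with a `java.util.PriorityQueue` as the work queue. This is a sequential illustration, not the library's scheduler: `visit-order` and `example-tree` are hypothetical, the `:prioritize?` option is ignored, and contexts without a `::priority` are assumed to have priority 0.

```clojure
;; Sequential sketch, NOT skyscraper.traverse's scheduler.
;; Assumption: a missing :skyscraper.traverse/priority counts as 0.
(import 'java.util.PriorityQueue)

(defn visit-order
  "Returns the contexts of the tree rooted at `root` in the order a
  single worker would visit them if it always picked the pending
  context with the lowest :skyscraper.traverse/priority."
  [root]
  (let [prio  #(get % :skyscraper.traverse/priority 0)
        queue (PriorityQueue. 11 (comparator #(< (prio %1) (prio %2))))]
    (.add queue root)
    (loop [visited []]
      (if-let [ctx (.poll queue)]          ; nil once the queue is empty
        (do (when-let [handler (:skyscraper.traverse/handler ctx)]
              (doseq [child (handler ctx)]
                (.add queue child)))
            (recur (conj visited ctx)))
        visited))))

(def example-tree
  {:name :root
   :skyscraper.traverse/handler
   (fn [_]
     [{:name :b :skyscraper.traverse/priority 2}
      {:name :a :skyscraper.traverse/priority 1
       :skyscraper.traverse/handler
       (fn [_] [{:name :c :skyscraper.traverse/priority 3}])}])})

(map :name (visit-order example-tree)) ;; => (:root :a :b :c)
```

Here `:b` (priority 2) is visited before `:c` (priority 3) even though `:c` was discovered through `:a`, because the queue always yields the pending context with the lowest number. In this sketch, contexts with equal priorities come out in no guaranteed order.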

capture-errors (clj, macro)

(capture-errors context & body)

close-all! (clj)

(close-all! channels)

Closes channels used by the traversal process. Call this function
after `wait!` returns.

default-options (clj)

launch (clj)

(launch seed options)

Launches a parallel tree traversal. Spins up a number of core.async
threads that actually perform it, then immediately returns a map of
channels used to orchestrate the process – most importantly,
`:terminate-chan` will be closed when the process completes.

`options` is a map that may include:

  :leaf-chan     a channel where seqs of tree leaves will be put
                 (default nil)
  :item-chan     a channel where seqs of tree nodes will be put
                 (default nil)
  :parallelism   number of worker threads to create (default 4)
  :prioritize?   take into account ::priority values (default false)

To wait until traversal is complete, use `wait!`. Also, remember to
use `close-all!` to close the channels returned by this
function. See `traverse!` or `chan->seq` for an example of how to
put it together.

leaf-seq (clj)

(leaf-seq seed options)

Returns a lazy seq of leaf nodes from a tree traversal. Any channels
created will be automatically closed when the seq is fully consumed.
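The cleanup behaviour described above (resources released once the seq is fully consumed) can be illustrated by analogy with a blocking queue and an end-of-stream sentinel. This sketch does not use core.async and is not the library's implementation; `queue->seq`, `::done`, `closed?`, `leaf-queue` and `leaves-seq` are hypothetical names.

```clojure
;; Analogy only: a BlockingQueue plus a sentinel stands in for a
;; core.async channel. All names here are hypothetical.
(import '(java.util.concurrent BlockingQueue LinkedBlockingQueue))

(defn queue->seq
  "Returns a lazy seq of the items taken from `q`. When the ::done
  sentinel is taken, calls `cleanup!` and ends the seq, so cleanup
  happens only after the seq has been fully consumed."
  [^BlockingQueue q cleanup!]
  (lazy-seq
    (let [x (.take q)]                  ; blocks until an item is available
      (if (= x ::done)
        (do (cleanup!) nil)
        (cons x (queue->seq q cleanup!))))))

;; Usage: a producer has already put two leaves and the sentinel.
(def closed? (atom false))
(def leaf-queue (LinkedBlockingQueue.))
(doseq [x [:leaf-1 :leaf-2 ::done]] (.put leaf-queue x))
(def leaves-seq (queue->seq leaf-queue #(reset! closed? true)))
```

Defining `leaves-seq` realizes nothing, so `@closed?` stays false until the seq is walked to the end (for example with `doall`). In this sketch, `cleanup!` never runs if the consumer abandons the seq before reaching the sentinel.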

traverse! (clj)

(traverse! seed options)

Traverses a tree and returns after the process is complete.
Parameters are the same as in `launch`.

wait! (clj)

(wait! {:keys [terminate-chan]})

Waits until the traversal started by `launch` is complete, i.e. until
its `:terminate-chan` is closed.