pegasus.process

The crawler has two major parts:

  • the pipeline: a crawl task, from fetching a URL to saving its payload, is a pipeline
  • the pipeline components: the individual operations in the pipeline, such as downloading the URL, extracting links, and updating crawler state.

This namespace contains definitions and examples.

add-transducer (clj)

(add-transducer in xf parallelism)

initialize-component-configs (clj)

(initialize-component-configs orig-config)

initialize-pipeline (clj)

(initialize-pipeline config)

A pipeline is a sequence of keywords (kws), and fn-map maps each keyword to its implementation. Components typically read from one channel and write to another. The first component is always the one that talks to the queue; the last component is the writer.
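
The keyword-to-implementation idea can be illustrated with a minimal sketch. This is not the pegasus source: the component keywords (:fetch, :extract, :write), their bodies, and the run-task helper are all hypothetical, and the channels that connect real components are replaced here by plain function calls for brevity.

```clojure
;; Hypothetical fn-map: each pipeline keyword maps to an implementation.
;; The component bodies are stand-ins, not real pegasus components.
(def fn-map
  {:fetch   (fn [task] (assoc task :body (str "<html>" (:url task) "</html>")))
   :extract (fn [task] (assoc task :links []))
   :write   (fn [task] (assoc task :written? true))})

;; The pipeline itself is just an ordered collection of keywords.
(def pipeline [:fetch :extract :write])

(defn run-task
  "Threads one task through every component named in `pipeline`,
  looking each keyword up in `fn-map`. Assumption: in pegasus the
  hand-off happens over channels rather than direct calls."
  [pipeline fn-map task]
  (reduce (fn [acc kw] ((get fn-map kw) acc)) task pipeline))
```

Calling `(run-task pipeline fn-map {:url "http://example.com"})` returns the task map enriched by each stage in order. Keeping the pipeline as data makes it easy to reorder or swap components without touching their implementations.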

is-config? (clj)

(is-config? a-config)

Simple heuristics to check whether a map is still a config: it must be a map, it must contain a pipeline key, and its entries must have values (i.e. no mixins).
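
A rough sketch of these heuristics, under stated assumptions: the function name config-like? is hypothetical, the pipeline key is assumed to be the keyword :pipeline, and "entries must have values" is read as "no nil values". The real is-config? may check more or differently.

```clojure
(defn config-like?
  "Hypothetical approximation of the heuristics described above;
  not the actual pegasus implementation."
  [x]
  (and (map? x)                    ; must be a map
       (contains? x :pipeline)     ; must contain a pipeline key (assumed :pipeline)
       (every? some? (vals x))))   ; every entry must have a non-nil value
```

For example, `(config-like? {:pipeline [:fetch] :seeds ["http://example.com"]})` passes all three checks, while a vector or a map without :pipeline does not.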

PipelineComponentProtocol (clj, protocol)

A pipeline component protocol. A pipeline component is responsible for setting up state (creating directories and that sort of thing), being a member of the pipeline, and then cleaning up when the crawl is supposed to end.

initialize - called with an existing config.

clean (clj)

(clean this config)

initialize (clj)

(initialize this config)

run (clj)

(run this obj config)
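
The three-phase lifecycle (set up state, take part in the pipeline, clean up) might look like the following sketch. To stay self-contained it defines a stand-in protocol, ComponentSketch, that mirrors the three signatures listed above instead of requiring pegasus. The LinkCounter record and the choice to have initialize and clean return a config map are assumptions, not documented behaviour.

```clojure
;; Stand-in protocol mirroring the signatures of PipelineComponentProtocol.
(defprotocol ComponentSketch
  (initialize [this config])
  (run [this obj config])
  (clean [this config]))

;; Hypothetical component that counts the links it sees.
(defrecord LinkCounter []
  ComponentSketch
  ;; Set up state; assumed here to return the (possibly updated) config.
  (initialize [this config]
    (assoc config :link-count (atom 0)))
  ;; Do the per-task work, then pass the object on unchanged.
  (run [this obj config]
    (swap! (:link-count config) + (count (:links obj)))
    obj)
  ;; Tear down whatever initialize created.
  (clean [this config]
    (dissoc config :link-count)))
```

A component built this way keeps its state in the config, so the same record can be initialized once, run against many tasks, and cleaned up when the crawl ends.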

run-process (clj)

(run-process component process-schema in-chan parallelism crawl-config)

run-with-input (clj)

(run-with-input obj config process-schema component)