The crawler has two major parts:
- the pipeline: a crawl task, from URL to saved payload, is a pipeline
- the pipeline components: each operation in the pipeline, such as downloading the URL, extracting links, updating crawler state and so on

This namespace contains definitions and examples.
(initialize-pipeline config)
A pipeline contains keywords; fn-map maps each keyword to its implementation. Components (typically) read from one channel and write to another. The first component is fixed: it is the component that speaks to a queue. The last component is the writer.
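The keyword-to-implementation lookup described above can be sketched roughly as follows. This is only an illustration: the stage keywords, the contents of `fn-map`, and the `run-pipeline` helper are all hypothetical, and plain function calls stand in for the channels so the sketch needs nothing beyond clojure.core.

```clojure
;; Hypothetical sketch: none of these names are taken from the library.

;; The pipeline is an ordered collection of keywords. The first stage
;; stands in for the queue-facing component, the last for the writer.
(def pipeline [:frontier :download :extract :writer])

;; fn-map: keyword -> implementation. Each stage here takes a state map
;; and returns an updated one (the real components use channels instead).
(def fn-map
  {:frontier (fn [state] (assoc state :url "http://example.com/"))
   :download (fn [state] (assoc state :body "<a href=\"/next\">next</a>"))
   :extract  (fn [state]
               (assoc state :links
                      (mapv second (re-seq #"href=\"([^\"]+)\"" (:body state)))))
   :writer   (fn [state] (assoc state :written? true))})

(defn run-pipeline
  "Threads `init` through the stages named in `pipeline`,
  looking each keyword up in `fn-map`."
  [pipeline fn-map init]
  (reduce (fn [state kw] ((get fn-map kw) state)) init pipeline))

(def result (run-pipeline pipeline fn-map {}))
```

Swapping an implementation then only means replacing one entry in `fn-map`; the pipeline itself stays a list of names.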
(is-config? a-config)
Simple heuristics to check whether a map is still a config: it must be a map; it must contain a pipeline key; its entries must have values - no mixins i.e.
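A check along these lines might look like the sketch below. `config-like?` is a hypothetical stand-in, not the library's `is-config?`; the `:pipeline` keyword and the reading of "entries must have values" as "no nil values" are both assumptions.

```clojure
(defn config-like?
  "Hypothetical stand-in for the heuristics described above."
  [m]
  (and (map? m)                   ; must be a map
       (contains? m :pipeline)    ; must contain a pipeline key (assumed :pipeline)
       (every? some? (vals m))))  ; every entry has a value (assumed: non-nil)

(def ok-config    (config-like? {:pipeline [:frontier :writer]}))  ; => true
(def no-pipeline  (config-like? {:seeds []}))                      ; => false
(def nil-entry    (config-like? {:pipeline [:frontier] :w nil}))   ; => false
(def not-a-map    (config-like? [:pipeline]))                      ; => false
```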
A pipeline component protocol. A pipeline component is responsible for setting up state (creating directories and the like), taking part in the pipeline, and then cleaning up when the crawl is supposed to end.

initialize - called with an existing config.
(clean this config)
(initialize this config)
(run this obj config)
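As an illustration of the three-method lifecycle, here is a minimal sketch of a protocol with the same method shapes and one toy implementation. The protocol name `PipelineComponent`, the record `CountingComponent`, and the return values (in particular, that `initialize` returns a config) are assumptions made for the example, not the library's definitions.

```clojure
;; Hypothetical protocol mirroring the signatures listed above.
(defprotocol PipelineComponent
  (initialize [this config] "Set up state; called with an existing config.")
  (run [this obj config] "Process one object flowing through the pipeline.")
  (clean [this config] "Release state when the crawl ends."))

;; Toy component that counts the objects it has seen.
(defrecord CountingComponent [seen]
  PipelineComponent
  (initialize [_ config]
    (reset! seen 0)
    (assoc config :counter-ready? true)) ; assumption: returns the config
  (run [_ obj config]
    (swap! seen inc)
    obj)                                 ; passes the object on unchanged
  (clean [_ config]
    (reset! seen nil)
    nil))

(def component (->CountingComponent (atom nil)))
(def config (initialize component {:pipeline [:counter]}))
(run component {:url "http://example.com/"} config)
(def seen-after-run @(:seen component))
(clean component config)
```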