- In addition to `scrape`, a function that returns a lazy sequence of nodes, there is an alternative, non-lazy, imperative interface (`scrape!`) that treats producing new results as side-effects.
- `:parse-fn` and `:http-options` can now be provided either per-page or globally. (Thanks to Alexander Solovyov for the suggestion.)
- `process-fn`.
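As a sketch of the difference between the two interfaces (the seed map and the `:root` processor here are hypothetical, and option handling may vary between versions):

```clojure
(require '[skyscraper.core :as sky])

;; an initial context pointing at a hypothetical :root processor
(def seed [{:url "https://example.com/" :processor :root}])

;; scrape: pure, returns a lazy sequence of scraped contexts
(def results (sky/scrape seed))

;; scrape!: eager and imperative; new results are produced as side-effects
(sky/scrape! seed)
```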
- The `skyscraper` namespace has been renamed to `skyscraper.core`.
- `defprocessor` now takes a keyword name, and registers a function in the global registry instead of defining it. This means that it's no longer possible to call one processor from another: if you need that, define `process-fn` as a named function.
- `:processor` keys are now expected to be keywords.
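Under the keyword-based scheme, a processor definition might look like the following sketch (the `:listing` name, cache template, and extraction logic are made up):

```clojure
(require '[skyscraper.core :refer [defprocessor]])

;; Shared logic goes into a named function, since one processor can no
;; longer call another directly.
(defn extract-listing [res context]
  ;; hypothetical extraction: return child contexts to scrape next
  [{:url "https://example.com/page/2" :processor :listing}])

;; Registers the processor under the keyword :listing in the global
;; registry instead of defining a var.
(defprocessor :listing
  :cache-template "listing/:page"
  :process-fn extract-listing)
```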
- `scrape` no longer guarantees the order in which the site will be scraped. In particular, two different invocations of `scrape` are not guaranteed to return the scraped data in the same order. If you need that guarantee, set `parallelism` and `max-connections` to 1.
- `get-cache-keys` has been removed. If you want the same effect, include `:cache-key` in the desired contexts.
- `:only` now doesn't barf on keys not appearing in the seed.
- `MemoryCache`.
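So a deterministic run would be requested along these lines (assuming both settings are passed as keyword options to `scrape`):

```clojure
;; serialize the scrape so results come back in a stable order
(skyscraper.core/scrape seed
                        :parallelism 1
                        :max-connections 1)
```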
- `download` now supports arbitrarily many retries.
- Added `get-cache-keys`.
- `scrape` and friends can now accept a keyword as the first argument.
- Cache keys can now be specified explicitly (via the `:cache-key` key in the context).
- New `scrape` options: `:only` and `:postprocess`.
- `scrape-csv` now accepts an `:all-keys` argument and has been rewritten using a helper function, `save-dataset-to-csv`.
- Added `scrape-csv`.
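For illustration, CSV output with every encountered key as a column might be requested like this (the argument order and output filename are assumptions):

```clojure
;; writes the scraped dataset to a CSV file; :all-keys true asks for a
;; column for every key seen in any context
(skyscraper.core/scrape-csv seed "output.csv" :all-keys true)
```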
- Processors can now be marked as `:updatable`; `scrape` now has an `:update` option.
- New `scrape` option: `:retries`.
- Fixed a bug where scraping huge sites could throw an `OutOfMemoryError` (`scrape` no longer holds onto the head of the lazy seq it produces).
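A sketch of how the two pieces fit together (the `:news-list` processor and its clauses are illustrative, and `:updatable` is assumed to be a `defprocessor` clause):

```clojure
(require '[skyscraper.core :as sky :refer [defprocessor]])

;; an updatable processor: re-scrapes can refresh its pages
(defprocessor :news-list
  :updatable true
  :cache-template "news/:page"
  :process-fn (fn [res context]
                ;; hypothetical extraction
                []))

;; re-run the scrape, refreshing pages of updatable processors
;; even when they are already cached
(sky/scrape seed :update true)
```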
- The `processed-cache` option to `scrape` now works as advertised.
- New `scrape` option: `:html-cache`. (Thanks to ayato-p.)
- New `defprocessor` clauses: `:url-fn` and `:cache-key-fn`.
- Contexts now contain the `:url` key.
- Processors (`process-fn` functions) can now access the current context.
- Added the `decode-body-headers` feature.
- `scrape` now supports a `http-options` argument to override HTTP options (e.g., timeouts).
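For example, timeouts could be overridden with clj-http-style options (the specific option names come from clj-http and are an assumption here):

```clojure
;; timeouts are in milliseconds
(skyscraper.core/scrape seed
                        :http-options {:socket-timeout 10000
                                       :connection-timeout 5000})
```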