(fetch-content query)
Fetches coordinates from the Common Crawl Index Server, along with their content from AWS.

Takes a `query` map, described in https://github.com/webrecorder/pywb/wiki/CDX-Server-API#api-reference

Additionally, the `:cdx-api` query key can specify the index server endpoint. If `:cdx-api` is not provided, the endpoint from the most recent crawl is used; it can be accessed with `(common-crawl-utils.config/get-most-recent-cdx-api)`.

;; To fetch all content for a host from the most recent crawl
(fetch-content {:url "http://www.cnn.com" :matchType "host"})

;; To fetch a limited number of coordinates with content
(take 10 (fetch-content {:url "http://www.cnn.com" :matchType "host"}))
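
The docstring mentions the `:cdx-api` key but does not show it in use. The following is a minimal sketch, assuming that `fetch-content` is already referred into the current namespace and that the endpoint follows the usual Common Crawl index URL pattern. The crawl ID in the URL is only an illustrative choice, not something taken from this library's documentation.

```clojure
;; Query map pinned to a specific crawl instead of the most recent one.
;; ASSUMPTION: the crawl ID below is illustrative; substitute the crawl you need.
(def query
  {:url       "http://www.cnn.com"
   :matchType "host"
   :cdx-api   "https://index.commoncrawl.org/CC-MAIN-2019-09-index"})

;; The call needs network access and the library on the classpath,
;; so it is wrapped in `comment` to be evaluated by hand at the REPL.
(comment
  (take 10 (fetch-content query)))
```

Leaving `:cdx-api` out of the map falls back to the endpoint of the most recent crawl, as described above.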
(fetch-content-async
{:keys [coordinate-chan content-chan close?]
:as query
:or {coordinate-chan (chan) content-chan (chan) close? true}})