
common-crawl-utils.fetcher


fetch-content

(fetch-content query)

Fetches coordinates from the Common Crawl Index Server, along with their content from AWS.

Takes a query map, as described in https://github.com/webrecorder/pywb/wiki/CDX-Server-API#api-reference

Additionally, the :cdx-api query key can specify the index server endpoint. If :cdx-api is not provided, the endpoint of the most recent crawl is used; it can be accessed with (common-crawl-utils.config/get-most-recent-cdx-api).

;; To fetch all content for a host from the most recent crawl
(fetch-content {:url "http://www.cnn.com" :matchType "host"})

;; To fetch a limited number of coordinates with content
(take 10 (fetch-content {:url "http://www.cnn.com" :matchType "host"}))
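The `:cdx-api` key described above can also be set explicitly to pin a query to one crawl. The following is a minimal sketch, not taken from the library's own documentation: the crawl ID in `pinned-endpoint` is only illustrative, and the endpoint URL format is assumed to match what `get-most-recent-cdx-api` returns.

```clojure
(require '[common-crawl-utils.config :as config]
         '[common-crawl-utils.fetcher :as fetcher])

;; Endpoint that fetch-content falls back to when :cdx-api is absent.
(def default-endpoint (config/get-most-recent-cdx-api))

;; Assumption: an index endpoint for one specific crawl.
;; The crawl ID is illustrative; substitute the crawl you need.
(def pinned-endpoint "https://index.commoncrawl.org/CC-MAIN-2019-09-index")

;; Realize only the first three results so the whole host is not downloaded.
(def results
  (doall
    (take 3 (fetcher/fetch-content {:url       "http://www.cnn.com"
                                    :matchType "host"
                                    :cdx-api   pinned-endpoint}))))

(println "Default endpoint:" default-endpoint)
(println "Fetched" (count results) "coordinates with content")
```

Wrapping the call in `take` follows the docstring's own example; `doall` forces the network requests to happen at definition time rather than on first use.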


fetch-content-async

(fetch-content-async
  {:keys [coordinate-chan content-chan close?]
   :as query
   :or {coordinate-chan (chan) content-chan (chan) close? true}})
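This function has no docstring, so the sketch below only illustrates how the signature might be used. It rests on several assumptions: that `chan` is `clojure.core.async/chan`, that fetched results are put on `:content-chan`, and that this channel is closed once the query is exhausted when `close?` is left at its default of `true`. The `:limit` key is a CDX Server API query parameter, used here to keep the run small.

```clojure
(require '[clojure.core.async :as async]
         '[common-crawl-utils.fetcher :as fetcher])

(defn collect-content
  "Starts an async fetch and gathers results until the content channel
  closes or no item arrives within timeout-ms (a hypothetical helper)."
  [query timeout-ms]
  (let [content-chan (async/chan 10)]
    ;; Assumption: results are delivered on the channel we pass in.
    (fetcher/fetch-content-async (assoc query :content-chan content-chan))
    (loop [results []]
      ;; alts!! yields nil when the channel closes or the timeout fires.
      (let [[item _] (async/alts!! [content-chan (async/timeout timeout-ms)])]
        (if (some? item)
          (recur (conj results item))
          results)))))

(def results
  (collect-content {:url "http://www.cnn.com" :matchType "host" :limit 5}
                   60000))

(println "Received" (count results) "items")
```

Supplying your own buffered channel lets the caller control back-pressure; the per-item timeout guards against blocking forever if the channel is never closed.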

fetch-single-coordinate-content

(fetch-single-coordinate-content coordinate)
