(fetch query)
Issues an HTTP request to the Common Crawl Index Server and returns a lazy sequence of content coordinates.
Takes a `query` map, described in https://github.com/webrecorder/pywb/wiki/CDX-Server-API#api-reference
Additionally, the `:cdx-api` query key can specify the index server endpoint.
If `:cdx-api` is not provided, the endpoint from the most recent crawl is used; it can be accessed with `(common-crawl-utils.config/get-most-recent-cdx-api)`.

;; To fetch all coordinates for a host from the most recent crawl
(fetch {:url "http://www.cnn.com" :matchType "host"})

;; To fetch a limited number of coordinates
(take 10 (fetch {:url "http://www.cnn.com" :matchType "host"}))
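
Because `fetch` returns a lazy sequence, wrapping it in `take` should limit how much of the index is actually read. The sketch below illustrates that idea without touching the network. `fake-page`, `fake-fetch` and `page-requests` are hypothetical stand-ins invented for this example, not part of the library, and the library's real paging behaviour may differ.

```clojure
;; Hypothetical stand-in for the index server:
;; counts how many "pages" have been requested so far.
(def page-requests (atom 0))

(defn fake-page
  "Pretends to download page `n` and returns five made-up coordinates."
  [n]
  (swap! page-requests inc)
  (map (fn [i] {:page n :index i}) (range 5)))

(defn fake-fetch
  "Lazy sequence of made-up coordinates. A page is requested only
  when a consumer actually reaches it."
  ([] (fake-fetch 0))
  ([n] (lazy-seq (concat (fake-page n) (fake-fetch (inc n))))))

;; Realize only the first ten coordinates of an unbounded sequence.
(def first-ten (doall (take 10 (fake-fetch))))

(println (count first-ten) "coordinates realized after"
         @page-requests "page requests")
```

Although `fake-fetch` describes an endless sequence, only the pages needed to produce ten items are requested. The same reasoning is why `(take 10 (fetch ...))` is a cheap way to sample results.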
(fetch-async {:keys [coordinate-chan limit close?]
              :as query
              :or {coordinate-chan (chan) close? true}})
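
The signature above relies on Clojure map destructuring with `:or` defaults. The following sketch shows how such defaults behave in general. `describe-query` is a hypothetical function written for this illustration, and the keyword `:default-chan` stands in for the `(chan)` default so that the example needs no extra dependencies. It says nothing about what `fetch-async` does with these values.

```clojure
(defn describe-query
  "Hypothetical helper that mirrors the destructuring shape of the
  signature above and returns the bound values for inspection."
  [{:keys [coordinate-chan limit close?]
    :as query
    :or {coordinate-chan :default-chan   ; stand-in for (chan)
         close? true}}]
  {:coordinate-chan coordinate-chan
   :limit           limit
   :close?          close?
   :query           query})

;; Keys missing from the map fall back to the :or defaults;
;; keys without a default (here :limit) are bound to nil.
(println (describe-query {:url "http://www.cnn.com"}))

;; An explicitly supplied value wins over the default, even when it is false.
(println (describe-query {:url "http://www.cnn.com" :close? false}))
```

Note that the map bound by `:as query` is the caller's original map: the `:or` defaults are applied to the local bindings only and are not merged back into `query`.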
(get-most-recent-cdx-api {:keys [index-collinfo]
                          :as query
                          :or {index-collinfo constants/index-collinfo}})