
common-crawl-utils.coordinates


call-cdx-api

(call-cdx-api {:keys [cdx-api timeout] :as query})
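`call-cdx-api` has no docstring here. Its signature shows that it destructures `:cdx-api` and `:timeout` from the same query map that `fetch` takes. That suggests, as an assumption not confirmed by the source, that the remaining keys are sent to the index server as CDX query parameters. The following is a minimal, dependency-free sketch of that idea. `query->request-url` and `url-encode` are hypothetical helpers, not part of common-crawl-utils. Treating `:timeout` as an HTTP client option and defaulting `output` to `json` are also assumptions.

```clojure
(require '[clojure.string :as str])
(import '(java.net URLEncoder))

(defn- url-encode
  "Percent-encodes a keyword's name, or any other value's string form."
  [x]
  (URLEncoder/encode (if (keyword? x) (name x) (str x)) "UTF-8"))

(defn query->request-url
  "HYPOTHETICAL helper (not part of common-crawl-utils): builds a CDX Server
  GET URL from a query map. :cdx-api is the endpoint and :timeout is assumed
  to be an HTTP client option, so both are removed before encoding. output
  defaults to json (an assumption) unless the caller sets it."
  [{:keys [cdx-api] :as query}]
  (let [params (merge {:output "json"} (dissoc query :cdx-api :timeout))]
    (str cdx-api "?"
         ;; sort parameter names so the resulting URL is deterministic
         (str/join "&" (for [[k v] (sort-by (comp name key) params)]
                         (str (url-encode k) "=" (url-encode v)))))))

;; Usage with an illustrative Common Crawl endpoint:
(query->request-url {:cdx-api   "https://index.commoncrawl.org/CC-MAIN-2019-04-index"
                     :url       "http://www.cnn.com"
                     :matchType "host"
                     :timeout   5000})
;; => "https://index.commoncrawl.org/CC-MAIN-2019-04-index?matchType=host&output=json&url=http%3A%2F%2Fwww.cnn.com"
```

Parameter names such as `matchType`, `output` and `url` come from the pywb CDX Server API reference that `fetch` points to. The HTTP request itself is deliberately left out.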

cdx-params

fetch

(fetch query)

Issues an HTTP request to the Common Crawl Index Server and returns a lazy sequence of content coordinates.

Takes a `query` map, as described in https://github.com/webrecorder/pywb/wiki/CDX-Server-API#api-reference

Additionally, the `:cdx-api` query key can specify the index server endpoint.
If `:cdx-api` is not provided, the endpoint of the most recent crawl is used; that
endpoint can be accessed with `(common-crawl-utils.config/get-most-recent-cdx-api)`.

;; To fetch all coordinates for a host from the most recent crawl
(fetch {:url "http://www.cnn.com" :matchType "host"})

;; To fetch a limited number of coordinates
(take 10 (fetch {:url "http://www.cnn.com" :matchType "host"}))
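Because `fetch` returns a lazy sequence, `(take 10 (fetch ...))` only consumes as much as it needs. How the library achieves this is not shown on this page. The following sketches one common way to get that behaviour over a paged API. The pywb CDX Server API supports a `page` parameter, and this namespace also exposes `get-total-pages`. `lazy-pages`, `fake-fetch-page` and `requested-pages` are hypothetical names, and the stub stands in for a real HTTP request.

```clojure
(defn lazy-pages
  "Lazy sequence of every record on pages page, page+1, ..., (dec total-pages).
  fetch-page-fn is only called when the consumer reaches that page."
  [fetch-page-fn total-pages page]
  (lazy-seq
    (when (< page total-pages)
      (concat (fetch-page-fn page)
              (lazy-pages fetch-page-fn total-pages (inc page))))))

;; HYPOTHETICAL stand-in for an HTTP request to the index server:
;; every page holds 5 fake records, and each call is logged.
(def requested-pages (atom []))

(defn fake-fetch-page [page]
  (swap! requested-pages conj page)
  (mapv (fn [i] {:page page :offset i}) (range 5)))

;; Consuming 7 records out of a nominal 100 pages only touches pages 0 and 1.
(doall (take 7 (lazy-pages fake-fetch-page 100 0)))
@requested-pages
;; => [0 1]
```

The sequence is built with explicit `lazy-seq` recursion rather than `(mapcat fetch-page-fn (range total-pages))`. `range` yields a chunked sequence, so the `mapcat` version may call `fetch-page-fn` for a whole chunk of 32 pages before the first record is used.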

fetch-async

(fetch-async {:keys [coordinate-chan limit close?]
              :as query
              :or {coordinate-chan (chan) close? true}})

get-most-recent-cdx-api

(get-most-recent-cdx-api {:keys [index-collinfo]
                          :as query
                          :or {index-collinfo constants/index-collinfo}})

get-total-pages

(get-total-pages {:keys [cdx-api] :as query})