common-crawl-utils.reader


get-cdx-urls (clj)

(get-cdx-urls id)

get-urls (clj)

(get-urls path)

get-warc-urls (clj)

(get-warc-urls id)

read-coordinates (clj)

(read-coordinates)
(read-coordinates id)

Given the ID of a Crawl (for example, "CC-MAIN-2019-35"), returns a sequence of Common Crawl Coordinates records. When called without an ID, reads coordinates from the latest Crawl.

read-warc (clj)

(read-warc)
(read-warc id)

Given the ID of a Crawl (for example, "CC-MAIN-2019-35"), returns a sequence of WARC records. When called without an ID, reads WARCs from the latest Crawl.

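Since read-coordinates and read-warc both return sequences covering a whole Crawl, it may help to look at only the first few records while exploring. The sketch below is one possible way to do that. The names sample-records and fake-reader are hypothetical helpers, not part of common-crawl-utils, and the record shape produced by fake-reader is invented for illustration. Whether the library's sequences are lazy is an assumption; the docstrings do not say.

```clojure
;; Hypothetical helper: realize at most n records from a reader function.
;; reader-fn is expected to follow the arities documented above:
;; zero arguments (latest Crawl) or one argument (a Crawl ID).
(defn sample-records
  ([reader-fn n]
   (doall (take n (reader-fn))))
  ([reader-fn n crawl-id]
   (doall (take n (reader-fn crawl-id)))))

;; Stand-in reader with the same arities, so the sketch runs without
;; network access. The map keys here are made up, not the library's.
(defn fake-reader
  ([] (fake-reader "latest"))
  ([crawl-id]
   (map (fn [i] {:crawl crawl-id :index i}) (range))))

(sample-records fake-reader 2 "CC-MAIN-2019-35")
;; => ({:crawl "CC-MAIN-2019-35", :index 0} {:crawl "CC-MAIN-2019-35", :index 1})

;; With the library on the classpath, the same helper could presumably be
;; pointed at the real readers (not evaluated here):
(comment
  (require '[common-crawl-utils.reader :as reader])
  (sample-records reader/read-coordinates 5)
  (sample-records reader/read-warc 5 "CC-MAIN-2019-35"))
```

Passing the reader as an argument keeps the helper independent of which record type is being read, so the same call shape works for coordinates and WARC records.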
