A lightweight concurrent web crawler written in Clojure!
(require '[net.cgrand.enlive-html :as html])
(def hacker-news-url "https://news.ycombinator.com")
(defn headlines
"Retrieves Hacker News headlines"
[url]
(-> url
fetch-dom
(html/select [:td.title :a.storylink])
(as-> nodes (map html/text nodes))
println))
; Retrieve Hacker News headlines on first few pages!
(crawl {:url hacker-news-url
:handler headlines
:link-selector [:a.morelink]
:max-depth 3})
(require '[net.cgrand.enlive-html :as html])
(def amazon-url "https://www.amazon.ca/Echo-Dot-3rd-gen-Charcoal/product-reviews/B07PDHT5XP/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews")
(defn reviews
"Retrieves Amazon reviews"
[url]
(-> url
fetch-dom
(html/select [:div.review :span.review-text])
(as-> reviews (map html/text reviews))
(as-> reviews (map clojure.string/trim reviews))
println))
; Retrieve Amazon reviews on first few pages!
(crawl {:url amazon-url
:handler reviews
:link-selector [:li.a-last :a]
:max-depth 3})
Can you improve this documentation?Edit on GitHub
cljdoc is a website building & hosting documentation for Clojure/Script libraries
× close