(distributed-cache-files job-conf)
Given the job configuration, retrieve the list of file paths we've stored in the distributed cache. These paths differ on each node, so each mapper needs to ask for the locations during its setup phase.
(file-path->lines path)
Given a Path, return the lines of the file as a realized sequence.
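The reason the sequence has to be realized can be shown with a minimal sketch. It assumes a local `java.io.File` rather than a Hadoop Path, and `local-file->lines` is a hypothetical stand-in, not the library's implementation.

```clojure
(require '[clojure.java.io :as io])

(defn local-file->lines
  "Hypothetical stand-in for file-path->lines. Reads a local file
   (an assumption; the real function takes a Hadoop Path) and returns
   its lines as a fully realized sequence."
  [file]
  (with-open [rdr (io/reader file)]
    ;; line-seq is lazy, so doall forces every line to be read
    ;; before with-open closes the reader.
    (doall (line-seq rdr))))
```

Without the `doall`, the lazy sequence would be consumed only after the reader had been closed, and reading it would fail.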
(make-ignore-word? job-conf)
Given the job configuration, read out the list of files that we've supplied to the distributed cache, and create a predicate that will return false for any strings that are contained in our files.
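As a rough sketch of the predicate-building step, the following follows the docstring's wording literally: the returned function yields false for any listed string. The name `make-word-predicate` and its input (the already-read lines of each cached file) are assumptions for illustration; the library's own function takes the job configuration and may differ.

```clojure
(defn make-word-predicate
  "Hypothetical sketch. lines-per-file is a collection of line
   sequences, one per cached file (an assumed input shape).
   Returns a predicate that is false for any listed string."
  [lines-per-file]
  ;; Collect every line from every file into one set for fast lookup.
  (let [listed (set (apply concat lines-per-file))]
    (fn [word]
      (not (contains? listed word)))))
```

For example, `((make-word-predicate [["the" "a"] ["and"]]) "the")` is false, while the same predicate applied to `"hadoop"` is true.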
(my-map key value)
We'll emit a 1, paired with the word, for each word that we don't end up ignoring.
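The emit step might look roughly like the sketch below. It is not the library's code: `emit-word-counts` and its `keep?` parameter (a predicate that is true for words to emit) are hypothetical, and splitting on whitespace is an assumed tokenization.

```clojure
(defn emit-word-counts
  "Hypothetical sketch of the map step. Splits line on whitespace
   (an assumed tokenization) and returns a [word 1] pair for each
   word that the keep? predicate accepts."
  [keep? line]
  (for [word (re-seq #"\S+" line)
        :when (keep? word)]
    [word 1]))
```

Calling `(emit-word-counts (complement #{"the"}) "the quick the fox")` produces the pairs `["quick" 1]` and `["fox" 1]`.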
(my-map-setup context)
When we set up the mapper, bind our ignore-word? var to a function built from the cached files that contain the words to ignore.
(my-reduce key values-fn)
Group by word and sum the values.
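A minimal sketch of the summing step follows. It assumes that `values-fn` is a zero-argument function returning the sequence of counts for the key, and that the reducer returns a sequence of key/value pairs. The name `sum-counts` is hypothetical.

```clojure
(defn sum-counts
  "Hypothetical sketch of the reduce step. Assumes values-fn is a
   zero-argument function that returns the counts for word.
   Returns a single [word total] pair wrapped in a vector."
  [word values-fn]
  [[word (reduce + (values-fn))]])
```

With three emitted ones for a word, `(sum-counts "fox" (fn [] [1 1 1]))` returns `[["fox" 3]]`.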