org.clojars.punit-naik.clj-ml.lsh


band-hash (clj)

(band-hash band-size minhash-list)

Takes the minhash signature of a string and partitions it according to `band-size`. Each band (partition) is then hashed, since similar strings will tend to have at least one matching hashed band.
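
The banding step described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `band-hash-sketch` is a hypothetical name, and both the use of `partition-all` (which keeps a shorter trailing band) and Clojure's built-in `hash` are assumptions.

```clojure
(defn band-hash-sketch
  "Hypothetical stand-in: splits a minhash signature into bands of
   band-size values and hashes each band."
  [band-size minhash-list]
  (->> minhash-list
       (partition-all band-size)   ; assumption: keep a shorter last band
       (mapv (comp hash vec))))    ; one hash value per band

;; Two made-up signatures that agree only on their first band.
(def sig-a [3 7 1 9 4 2])
(def sig-b [3 7 5 8 6 0])

(band-hash-sketch 2 sig-a) ; three hashed bands
(band-hash-sketch 2 sig-b) ; first hashed band equals that of sig-a
```

Two strings become candidate duplicates when any of their hashed bands match, which is why a single shared band is enough here.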

compare-records (clj)

(compare-records records)

Compares each record (string) in a list against the others using `org.clojars.punit-naik.clj-ml.utils.string/reversed-levenstein-distance`.
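
A pairwise comparison of this kind might look like the sketch below. Everything here is an assumption for illustration: `compare-records-sketch` and `char-overlap` are hypothetical names, the scoring function is passed in explicitly (the library's `compare-records` takes only `records`), and `char-overlap` is a toy character-set overlap score swapped in for `reversed-levenstein-distance`, whose exact behaviour is not documented on this page.

```clojure
(defn char-overlap
  "Toy similarity score (NOT the library's distance function):
   shared distinct characters divided by all distinct characters.
   Assumes at least one of the strings is non-empty."
  [s1 s2]
  (let [a (set s1)
        b (set s2)]
    (/ (count (filter a b))     ; characters present in both strings
       (count (into a b)))))    ; characters present in either string

(defn compare-records-sketch
  "Hypothetical stand-in: scores every unordered pair of records once."
  [score-fn records]
  (let [v (vec records)]
    (for [i (range (count v))
          j (range (inc i) (count v))]
      [(v i) (v j) (score-fn (v i) (v j))])))

(compare-records-sketch char-overlap ["abc" "abd" "xyz"])
;; => (["abc" "abd" 1/2] ["abc" "xyz" 0] ["abd" "xyz" 0])
```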

find-possible-duplicates (clj)

(find-possible-duplicates shingle-size
                          hash-count
                          band-size
                          match-threshold
                          data)

Takes a collection of strings (`data`) and finds the similar strings in the collection.

hash-n-times (clj)

(hash-n-times sh-list n)

Hashes a list of shingles `n` times.
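
One way to read "hashing `n` times" is to apply `n` differently seeded hash functions to every shingle. The sketch below assumes that reading. `hash-n-times-sketch` is a hypothetical name, the seeding scheme (hashing a `[seed shingle]` pair with Clojure's built-in `hash`) is an assumption, and so is the output shape of one `n`-element list per shingle.

```clojure
(defn hash-n-times-sketch
  "Hypothetical stand-in: for every shingle, returns n hash values,
   one per seed in 0..n-1."
  [sh-list n]
  (mapv (fn [shingle]
          ;; assumption: pairing the shingle with a seed simulates
          ;; n independent hash functions
          (mapv (fn [seed] (hash [seed shingle])) (range n)))
        sh-list))

(def hashed (hash-n-times-sketch ["ab" "bc" "cd"] 4))

(count hashed)         ; one inner list per shingle
(count (first hashed)) ; n hash values in each inner list
```

Under this reading all inner lists have the same size, which is the input shape the `min-hash` docstring asks for.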

merge-candidates (clj)

(merge-candidates candidate-list)

merge-candidates-recursive (clj)

(merge-candidates-recursive candidate-list)

min-hash (clj)

(min-hash hash-values)

Takes lists of hashed values, all of the same size, and finds the minimum hash value at each position `i` across the lists. The resulting single list of hash values is the minhash signature of the string.
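
The position-wise minimum described above can be sketched in one line. `min-hash-sketch` is a hypothetical name and the input numbers are made up; the sketch also assumes at least one input list, since `mapv` needs a collection to map over.

```clojure
(defn min-hash-sketch
  "Hypothetical stand-in: takes equally sized lists of hash values and
   returns the minimum found at each position."
  [hash-values]
  ;; mapv over several collections calls min with the i-th element
  ;; of every list, giving one minimum per position
  (apply mapv min hash-values))

(min-hash-sketch [[5 2 9]
                  [3 8 1]
                  [7 4 6]])
;; => [3 2 1]
```

Each position of the result corresponds to one hash function, so the signature length equals the size of the input lists.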

