A tiny library wrapping the language-detection library that can be used to determine the language of a particular piece of text.
Here are the supported languages
There were not any easily usable language detection libraries, and while the language-detection library was good, it was not packaged well and was short on documentation. Also, Clojure needed a nice way to use it.
Add this to your project.clj:
[cld "0.1.0"]
(ns foo
  (:require [cld.core :as lang]))
;; This loads the default language profiles. 99% of the time you will
;; want to use this; the other 1% of the time you can use
;; (lang/load-profiles "/path/to/profilesdir") to load whichever
;; profiles you'd like to use.
;;
;; Calling this multiple times only loads the profiles once. However,
;; calling load-profiles multiple times will result in an Exception if
;; the profiles have already been loaded.
(lang/default-init!)
(lang/detect "Clojure is a sweet language.")
;; A tuple of the detected language and a language-probability map is returned:
;; => ["en" {"en" "0.7142847692020113", "nl" "0.28571303555752214"}]
(lang/detect "ただしその発表の時にお約束していたとおり")
;; => ["ja" {"ja" "0.9999999913100619"}]
(lang/detect "Le directeur de campagne de François Hollande réagit à l'entrée en campagne de John Doe")
;; => ["fr" {"fr" "0.9999964521882916"}]
;; A Reader can also be passed instead of a string:
(lang/detect (clojure.java.io/reader "/tmp/foo"))
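The example output above shows the probabilities as strings rather than numbers. A minimal sketch of turning such a result into something easier to compare against a threshold follows. `parse-result` is a hypothetical helper, not part of cld, and the sample value is copied from the output shown above rather than produced by a live call.

```clojure
(defn parse-result
  "Hypothetical helper (not part of cld): converts a [language probabilities]
  tuple with string probabilities into a map with numeric probabilities."
  [[language probabilities]]
  {:language language
   :probabilities (into {}
                        (map (fn [[code p]] [code (Double/parseDouble p)]))
                        probabilities)})

(def sample-result
  ;; Copied from the example output above, not from a live call.
  ["en" {"en" "0.7142847692020113", "nl" "0.28571303555752214"}])

(def parsed (parse-result sample-result))

(:language parsed)
;; => "en"

;; With numeric values, the top probability can be checked against a threshold.
(> (get-in parsed [:probabilities "en"]) 0.5)
;; => true
```

This assumes every value in the probability map is a string that `Double/parseDouble` accepts, as in the outputs shown here.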
detect also supports a map of options. Here are the options:
{:smoothing n              ;; defaults to 0.5
 :max-length n             ;; defaults to reading the entire stream or string
 :verbose true             ;; defaults to false, prints to stdout
 :prior-map {"en" 0.1123   ;; A map of prior probabilities
             "fr" 0.0091
             "ja" 0.2330}}
(lang/detect "This is english, Un corps de femme a été retrouvé")
;; => ["fr" {"fr" "0.8571405683231152", "en" "0.14285930685987672"}]
(lang/detect "This is english, Un corps de femme a été retrouvé" {:max-length 10})
;; => ["en" {"en" "0.999996754400581"}]
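One way to read the two results above: with `:max-length 10`, presumably only the start of the input is examined. The sketch below uses plain `subs` to show what the first 10 characters of that string are. That the limit counts characters from the beginning of the input is an assumption here, not something the text above states.

```clojure
(def text "This is english, Un corps de femme a été retrouvé")

;; The first 10 characters contain only English text, which would be
;; consistent with the "en" result shown above (assuming the limit is
;; applied from the start of the input).
(def prefix (subs text 0 10))

prefix
;; => "This is en"
```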
Licensed under the Apache License, Version 2.0.
I don't actually know what any of the French or Japanese sentences mean. There shouldn't be anything offensive in there, but my apologies if there is.