TinySegmenter is a Clojure library that splits Japanese text into words. This is needed because written Japanese does not put spaces between words.
This library is a Clojure port of the TinySegmenter JavaScript library by Taku Kudo (taku@chasen.org). The port is based on the Python 3 version of TinySegmenter.
The library exports a single function, segment, which takes a string (or any char sequence) as an argument. For example:
(= (segment "私の名前はFelixです")
["私" "の" "名前" "は" "Felix" "です"])
This project is distributed under the BSD 3-Clause License, just like the original version by Taku Kudo. See the LICENSE file for more information.