Clojure wrapper for the Japanese Morphological Analyzer MeCab.
A minimal wrapper around the SWIG-generated Java bindings for MeCab. Currently tested with all varieties of UniDic and IPAdic; support for other dictionaries is planned.
clj-mecab requires MeCab (0.996) to be installed and on your path; the mecab-config binary is used to find your MeCab configuration.
On Debian:
apt-get install mecab mecab-utils libmecab-java libmecab-jni unidic-mecab
On macOS:
brew install mecab mecab-unidic
Note that on macOS you will need to install the Maven dependencies manually (see next section).
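After installing, it can help to confirm that both binaries are actually visible on your path. The following is a minimal sketch, not part of clj-mecab; the helper name check_tool is made up for illustration, and it only relies on the standard `command -v` builtin and the `--version` and `--dicdir` options of mecab-config.

```shell
#!/bin/sh
# check_tool is a hypothetical helper: prints "ok: <path>" if the command
# is on PATH, otherwise prints "missing: <name>" and returns 1.
check_tool() {
    if found=$(command -v "$1" 2>/dev/null); then
        printf 'ok: %s\n' "$found"
    else
        printf 'missing: %s\n' "$1"
        return 1
    fi
}

# "|| true" keeps the script going when a tool is absent.
check_tool mecab || true
check_tool mecab-config || true

# If mecab-config is available, show the version and dictionary directory it reports.
if command -v mecab-config >/dev/null 2>&1; then
    mecab-config --version
    mecab-config --dicdir
fi
```

If either line reports "missing", fix your path or installation before continuing with the Maven steps.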
You also need the Java JNI (SWIG) bindings for your installed version of MeCab in your local Maven repository (~/.m2).
This can be accomplished by:
mvn install:install-file -DgroupId=org.chasen -DartifactId=mecab -Dpackaging=jar -Dversion=0.996 -Dfile=/usr/share/java/mecab/MeCab.jar -DgeneratePom=true
Where /usr/share/java/mecab/MeCab.jar should point to the generated jar on your system.
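If you are not sure where the jar ended up, a search along these lines may help. This is only a sketch: the function name find_mecab_jar is hypothetical, and the two directories searched are assumptions about common install locations, so adjust them for your system.

```shell
#!/bin/sh
# find_mecab_jar is a hypothetical helper: prints every file named MeCab.jar
# found under the given directories. Directories that do not exist are skipped.
find_mecab_jar() {
    for dir in "$@"; do
        [ -d "$dir" ] && find "$dir" -name 'MeCab.jar' -type f 2>/dev/null
    done
    return 0
}

# Assumed search locations; these are guesses, not paths guaranteed by any package.
find_mecab_jar /usr/share/java /usr/local/share/java
```

Pass whichever path it prints as the -Dfile argument of the mvn command.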
You will also need to manually download cmecab-java and install it into your local Maven repo:
wget https://github.com/takscape/cmecab-java/releases/download/2.1.0/cmecab-java-2.1.0.tar.gz
tar xzf cmecab-java-2.1.0.tar.gz
mvn install:install-file -DgroupId=net.moraleboost.cmecab-java -DartifactId=cmecab-java -Dpackaging=jar -Dversion=2.1.0 -Dfile=cmecab-java-2.1.0/cmecab-java-2.1.0.jar -DgeneratePom=true
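To confirm that both jars landed in the local repository, you can check for them at the paths Maven derives from the coordinates. The sketch below assumes the default repository location ~/.m2/repository and the standard layout (dots in the group ID become directory separators); m2_jar_path is a made-up helper name.

```shell
#!/bin/sh
# m2_jar_path is a hypothetical helper. Usage: m2_jar_path REPO GROUP ARTIFACT VERSION
# Prints REPO/<group with dots as slashes>/ARTIFACT/VERSION/ARTIFACT-VERSION.jar
m2_jar_path() {
    group_dir=$(printf '%s' "$2" | tr '.' '/')
    printf '%s/%s/%s/%s/%s-%s.jar\n' "$1" "$group_dir" "$3" "$4" "$3" "$4"
}

# Assumes the default local repository location.
repo="$HOME/.m2/repository"

for coords in "org.chasen mecab 0.996" "net.moraleboost.cmecab-java cmecab-java 2.1.0"; do
    # Intentional word splitting: each entry is "groupId artifactId version".
    set -- $coords
    jar=$(m2_jar_path "$repo" "$1" "$2" "$3")
    if [ -f "$jar" ]; then
        echo "installed: $jar"
    else
        echo "not found: $jar"
    fi
done
```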
If you are installing MeCab from source: MeCab depends on CRF++, so install that first.
wget http://crfpp.googlecode.com/files/CRF%2B%2B-0.58.tar.gz
tar xzf CRF++-0.58.tar.gz
cd CRF++-0.58 && ./configure && make -j4 && make install && cd ..
Next, install MeCab.
wget http://mecab.googlecode.com/files/mecab-0.996.tar.gz
tar xzf mecab-0.996.tar.gz
cd mecab-0.996 && ./configure --with-charset=utf8 --enable-utf8-only && make -j4 && make install && cd ..
And at least one dictionary:
IPAdic:
wget http://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz
tar xzf mecab-ipadic-2.7.0-20070801.tar.gz
cd mecab-ipadic-2.7.0-20070801 && ./configure --with-charset=utf8 && make -j4 && make install && cd ..
UniDic:
curl -O http://unidic.ninjal.ac.jp/dictionaries/UniDic-gendai/stable/zip/unidic-cwj-2.2.0.zip
unzip -x unidic-cwj-2.2.0.zip
cd unidic-cwj-2.2.0 && install -d $(mecab-config --dicdir)/unidic-cwj && install -m 644 dicrc *.bin *.dic $(mecab-config --dicdir)/unidic-cwj && cd ..
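Once a dictionary is installed, a quick way to see what MeCab can find is to list the dictionary directory and parse a test sentence. This is a rough sketch: list_dicts is a hypothetical helper, and it assumes that every installed dictionary directory contains a dicrc file (as in the install commands above). The `-d` option of mecab selects the dictionary directory.

```shell
#!/bin/sh
# list_dicts is a hypothetical helper: prints the name of each subdirectory
# of the given directory that contains a dicrc file.
list_dicts() {
    for d in "$1"/*/; do
        [ -f "${d}dicrc" ] && basename "$d"
    done
    return 0
}

if command -v mecab-config >/dev/null 2>&1; then
    dicdir=$(mecab-config --dicdir)
    list_dicts "$dicdir"
    # Parse one sentence with an explicitly chosen dictionary.
    # unidic-cwj matches the UniDic install step; use ipadic if you installed that instead.
    if [ -d "$dicdir/unidic-cwj" ]; then
        echo "こんにちは、世界!" | mecab -d "$dicdir/unidic-cwj"
    fi
else
    echo "mecab-config not found on PATH"
fi
```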
Include in :dependencies in your project.clj:
[clj-mecab "0.4.12"]
Interactive use:
$ boot repl
(require '[clj-mecab.parse :as mecab])
(mecab/parse-sentence "こんにちは、世界!")
[{:orth "こんにちは", :f-type "*", :i-type "*", ...} {:orth "、", :f-type "*", :i-type "*", ...} {:orth "世界", :f-type "*", :i-type "*", ...} ...]
Several features are planned for future versions:
Copyright © 2013-2019 Bor Hodošček
Distributed under the Eclipse Public License, the same as Clojure, as well as the 3-clause BSD license.