A Clojure library for MeCab, the Japanese morphological analyzer.
$ echo すもももももももものうち | mecab
すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
の 助詞,連体化,*,*,*,*,の,ノ,ノ
うち 名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
project.clj
[jah524/clojure-mecab "0.3.0"]
(use 'clojure-mecab)
(parse "すもももももももものうち")
;=> [["すもも" "名詞" "一般" "*" "*" "*" "*" "すもも" "スモモ" "スモモ"]
; ["も" "助詞" "係助詞" "*" "*" "*" "*" "も" "モ" "モ"]
; ["もも" "名詞" "一般" "*" "*" "*" "*" "もも" "モモ" "モモ"]
; ["も" "助詞" "係助詞" "*" "*" "*" "*" "も" "モ" "モ"]
; ["もも" "名詞" "一般" "*" "*" "*" "*" "もも" "モモ" "モモ"]
; ["の" "助詞" "連体化" "*" "*" "*" "*" "の" "ノ" "ノ"]
; ["うち" "名詞" "非自立" "副詞可能" "*" "*" "*" "うち" "ウチ" "ウチ"]]
(extract-words "すもももももももものうち")
;=> ["すもも" "も" "もも" "も" "もも" "の" "うち"]
(extract-words "すもももももももものうち" ["名詞"] [])
;=> ["すもも" "もも" "もも" "うち"]
(extract-words "すもももももももものうち" [] ["名詞"])
;=> ["も" "も" "の"]
(extract-words "すもももももももものうち" ["名詞"] ["非自立"])
;=> ["すもも" "もも" "もも"]
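The include and exclude arguments in the calls above appear to match against the part-of-speech columns of each parsed row. The following is a minimal sketch of that behavior, not the library's actual implementation: `filter-words` and `sample-rows` are hypothetical names, and the assumption that only the four columns after the surface form are checked is ours. It reproduces the outputs shown above for the sample sentence.

```clojure
;; Hypothetical helper, NOT part of clojure-mecab. It operates on rows shaped
;; like the vectors returned by `parse`: [surface pos pos1 pos2 pos3 ...].
(defn filter-words
  [rows include exclude]
  (let [;; Assumption: only the four part-of-speech columns are compared.
        pos-columns (fn [row] (take 4 (rest row)))
        matches?    (fn [tags row] (boolean (some (set tags) (pos-columns row))))]
    (->> rows
         ;; An empty include list keeps every row.
         (filter #(or (empty? include) (matches? include %)))
         ;; Rows matching any exclude tag are dropped.
         (remove #(matches? exclude %))
         (mapv first))))

;; Rows copied from the `parse` output shown above.
(def sample-rows
  [["すもも" "名詞" "一般" "*" "*" "*" "*" "すもも" "スモモ" "スモモ"]
   ["も" "助詞" "係助詞" "*" "*" "*" "*" "も" "モ" "モ"]
   ["もも" "名詞" "一般" "*" "*" "*" "*" "もも" "モモ" "モモ"]
   ["も" "助詞" "係助詞" "*" "*" "*" "*" "も" "モ" "モ"]
   ["もも" "名詞" "一般" "*" "*" "*" "*" "もも" "モモ" "モモ"]
   ["の" "助詞" "連体化" "*" "*" "*" "*" "の" "ノ" "ノ"]
   ["うち" "名詞" "非自立" "副詞可能" "*" "*" "*" "うち" "ウチ" "ウチ"]])

(filter-words sample-rows ["名詞"] ["非自立"])
;=> ["すもも" "もも" "もも"]
```

Because the sketch works on plain vectors, it can be tried in a REPL without MeCab installed.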
This library calls the mecab command through clojure.java.shell/sh, so you do not need JNI bindings.
The mecab command itself must be installed on the machine that runs your code.
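To illustrate the shell-based approach, here is a rough sketch of how a string could be passed to the mecab command and its output turned into vectors. It is not the library's source: `run-mecab` and `parse-mecab-output` are hypothetical names, and the sketch assumes that `mecab` is on the PATH and uses its default output format, where the surface form and the comma-separated features are separated by a tab.

```clojure
(require '[clojure.java.shell :refer [sh]]
         '[clojure.string :as str])

;; Hypothetical helper: converts raw MeCab output into
;; [surface feature1 feature2 ...] vectors.
(defn parse-mecab-output
  [out]
  (->> (str/split-lines out)
       ;; Stop at the first end-of-sentence marker.
       (take-while #(not= "EOS" %))
       ;; Skip any line that lacks the surface/features tab separator.
       (filter #(str/includes? % "\t"))
       (mapv (fn [line]
               ;; Split on the first tab only, so a surface form that is
               ;; itself a comma does not get confused with the features.
               (let [[surface features] (str/split line #"\t" 2)]
                 (into [surface] (str/split features #",")))))))

;; Hypothetical helper: feeds `text` to mecab on stdin and parses stdout.
;; Assumes the `mecab` executable is installed and on the PATH.
(defn run-mecab
  [text]
  (parse-mecab-output (:out (sh "mecab" :in text))))
```

Each call to `run-mecab` starts a new mecab process, which is the trade-off for avoiding native bindings.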
If you want to deploy your application to a PaaS such as Heroku, you had better use kuromoji (a Java implementation) instead.
MeCab is much faster than kuromoji, however, so prefer MeCab when you process large volumes of data.
Copyright © 2018 Jah524
Distributed under the Eclipse Public License, either version 1.0 or (at your option) any later version.