A Clojure library for retrieving Wiktionary pages from Wiktionary dumps.

Requires (for now) a modified version of the Clojure contrib data.xml library that uses the Woodstox XML parsing library rather than the parser built into the JVM, in order to handle Wikimedia's large text sections.
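Download and decompress a Wiktionary dump, for example the Dutch one:

```
$ wget https://dumps.wikimedia.org/nlwiktionary/20200701/nlwiktionary-20200701-pages-articles.xml.bz2
$ bunzip2 nlwiktionary-20200701-pages-articles.xml.bz2
```

Then install the modified data.xml into your local Maven repository:

```
$ git clone git@github.com:ekoontz/data.xml.git
$ cd data.xml
$ git checkout upgraded-dependencies-with-woodstox
$ lein install
```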
The `(lookup)` calls in the following REPL session correspond to the Dutch Wiktionary pages for hond, kat, jongen, and meisje:
```
$ lein repl
OpenJDK 64-Bit Server VM warning: Options -Xverify:none and -noverify were deprecated in JDK 13 and will likely be removed in a future release.
nREPL server started on port 61773 on host 127.0.0.1 - nrepl://127.0.0.1:61773
REPL-y 0.4.4, nREPL 0.6.0
Clojure 1.10.0
OpenJDK 64-Bit Server VM 14.0.1+7
Docs: (doc function-name-here)
(find-doc "part-of-name-here")
Source: (source function-name-here)
Javadoc: (javadoc java-object-or-class-here)
Exit: Control+D or (exit) or (quit)
Results: Stored in vars *1, *2, *3, an exception in *e
user=> (load "core")
nil
user=> (in-ns 'wikiparse)
#object[clojure.lang.Namespace 0x694b8f32 "wikiparse"]
wikiparse=> (subs (lookup "hond") 0 30)
"[[Bestand:Rottweiler3.jpg|thum"
wikiparse=> (subs (lookup "kat") 0 30)
"{{=universeel=}}\n{{-etym-}}\n* "
wikiparse=> (subs (lookup "jongen") 0 30)
"[[Bestand:Albert Anker - Schul"
wikiparse=> (subs (lookup "meisje") 0 30)
"[[Bestand:Leon Fortunski Schle"
wikiparse=>
```
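
For readers curious how a lookup like the ones above might work, here is a minimal sketch, not this library's actual implementation. It assumes the usual MediaWiki dump layout (`page` elements containing a `title` and a `revision` with a `text` child) and uses only `clojure.data.xml/parse` and `clojure.java.io/reader`. The namespace `wikiparse.sketch`, the function `lookup-sketch`, and its helpers are hypothetical names introduced for illustration.

```clojure
(ns wikiparse.sketch
  (:require [clojure.data.xml :as xml]
            [clojure.java.io :as io]))

(defn- tag-name
  "Local name of an element's tag as a string, or nil for non-elements.
  Comparing by name sidesteps differences in how data.xml versions
  represent XML namespaces."
  [element]
  (some-> element :tag name))

(defn- child
  "First child of element whose tag name equals tag, or nil."
  [element tag]
  (first (filter #(= tag (tag-name %)) (:content element))))

(defn- text-of
  "Concatenated string content of element (empty string if none)."
  [element]
  (apply str (filter string? (:content element))))

(defn lookup-sketch
  "Hypothetical helper: scans the dump at dump-path and returns the wikitext
  of the first page whose title equals title, or nil if there is none."
  [dump-path title]
  (with-open [rdr (io/reader dump-path)]
    ;; `some` stops at the first match and returns a plain string,
    ;; so nothing lazy escapes after the reader is closed.
    (some (fn [el]
            (when (and (= "page" (tag-name el))
                       (= title (text-of (child el "title"))))
              (-> el (child "revision") (child "text") text-of)))
          (:content (xml/parse rdr)))))
```

Under those assumptions, `(lookup-sketch "nlwiktionary-20200701-pages-articles.xml" "hond")` would scan the decompressed dump from the start each time it is called. That linear scan keeps the sketch short, but it is slow for repeated lookups, and it does not by itself avoid the parser limitations on large text sections that motivate the Woodstox-based data.xml described above.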