Introduction

A Clojure library for retrieving Wiktionary pages from Wiktionary dumps. For now it requires a modified version of Clojure's data.xml library that uses the Woodstox XML parsing library instead of the parser built into the JVM, so that it can handle the large text sections in Wikimedia dumps.

Setup

Get a Wiktionary dump XML file

$ wget https://dumps.wikimedia.org/nlwiktionary/20200701/nlwiktionary-20200701-pages-articles.xml.bz2
$ bunzip2 nlwiktionary-20200701-pages-articles.xml.bz2

Install data.xml with Woodstox support

$ git clone git@github.com:ekoontz/data.xml.git
$ cd data.xml
$ git checkout upgraded-dependencies-with-woodstox
$ lein install

Demo

Each (lookup ...) call below returns the wikitext of the corresponding Wiktionary page; the demo uses subs to show only the first 30 characters of each result:

$ lein repl
OpenJDK 64-Bit Server VM warning: Options -Xverify:none and -noverify were deprecated in JDK 13 and will likely be removed in a future release.
nREPL server started on port 61773 on host 127.0.0.1 - nrepl://127.0.0.1:61773
REPL-y 0.4.4, nREPL 0.6.0
Clojure 1.10.0
OpenJDK 64-Bit Server VM 14.0.1+7
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
 Javadoc: (javadoc java-object-or-class-here)
    Exit: Control+D or (exit) or (quit)
 Results: Stored in vars *1, *2, *3, an exception in *e

user=> (load "core")
nil
user=> (in-ns 'wikiparse)
#object[clojure.lang.Namespace 0x694b8f32 "wikiparse"]
wikiparse=> (subs (lookup "hond") 0 30)
"[[Bestand:Rottweiler3.jpg|thum"
wikiparse=> (subs (lookup "kat") 0 30)
"{{=universeel=}}\n{{-etym-}}\n* "
wikiparse=> (subs (lookup "jongen") 0 30)
"[[Bestand:Albert Anker - Schul"
wikiparse=> (subs (lookup "meisje") 0 30)
"[[Bestand:Leon Fortunski Schle"
wikiparse=>
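
For readers curious how a lookup of this kind could work, here is a minimal sketch, not the library's actual implementation. It assumes the standard MediaWiki export layout (page > title and page > revision > text) and a data.xml version whose element tags are keywords. The namespace wikiparse-sketch and the functions find-page-text and lookup-in-file are hypothetical names introduced only for illustration.

```clojure
(ns wikiparse-sketch
  (:require [clojure.data.xml :as xml]
            [clojure.java.io :as io]))

(defn- local-tag
  "Local name of an element's tag, ignoring any XML namespace.
  Returns nil for non-element content such as whitespace strings."
  [el]
  (some-> el :tag name))

(defn- child
  "First child element of `el` whose local tag name equals `tag-name`."
  [el tag-name]
  (first (filter #(= tag-name (local-tag %)) (:content el))))

(defn- text-of
  "Concatenated string content directly inside `el` (\"\" if `el` is nil)."
  [el]
  (apply str (filter string? (:content el))))

(defn find-page-text
  "Wikitext of the first <page> under `root` whose <title> equals `title`,
  or nil if no such page exists. Hypothetical helper, not part of wikiparse."
  [root title]
  (some (fn [el]
          (when (and (= "page" (local-tag el))
                     (= title (text-of (child el "title"))))
            (some-> el (child "revision") (child "text") text-of)))
        (:content root)))

(defn lookup-in-file
  "Parse the dump at `path` and return the wikitext for `title`.
  Hypothetical helper; the result is realized before the reader closes."
  [path title]
  (with-open [rdr (io/reader path)]
    (find-page-text (xml/parse rdr) title)))
```

Under these assumptions, (lookup-in-file "nlwiktionary-20200701-pages-articles.xml" "hond") would scan the dump page by page until it reaches the matching title. Comparing local tag names keeps the sketch independent of whether the parser reports namespaced tags. Its memory behaviour on a full dump is untested.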
