clj-extjwnl

clj-extjwnl provides an API for querying WordNet using data patterns inspired by Datomic pull.

It is a Clojure wrapper for a subset of the Extended Java WordNet Library (extJWNL) that provides easy access to the underlying library.

If you'd like to use extJWNL through Java interop without a wrapper library, see the wiki.

Installation

deps.edn dependency:

{net.zakak/clj-extjwnl {:mvn/version "0.1.2-SNAPSHOT"}}

Leiningen dependency:

[net.zakak/clj-extjwnl "0.1.2-SNAPSHOT"]

Usage

The primary functions are:

  • default-dictionary - creates an instance of a WordNet dictionary
  • lookup - retrieves word data from the dictionary using a data pattern

lookup takes a dictionary, an edn data pattern describing the data to retrieve, and the word to look up.

An example:

(ns hello-extjwnl.core
  (:require [net.zakak.clj-extjwnl :as extjwnl]))

;; Load the default dictionary.
(def dict (extjwnl/default-dictionary))

;; Describe what we want to know about the word.
(def part-of-speech-pattern '[{:index-word/pos [:pos/label]}])

;; Look up data about 'dog' in the dictionary.
(extjwnl/lookup dict part-of-speech-pattern "dog")
;; => [#:index-word{:pos #:pos{:label "noun"}} #:index-word{:pos #:pos{:label "verb"}}]

;; Extend the pattern to also retrieve the gloss of each sense.
(def glossary-pattern '[{:index-word/pos [:pos/label]}
                        {:index-word/senses [:synset/gloss]}])

;; Look up 'dog' using the extended pattern.
(extjwnl/lookup dict glossary-pattern "dog")
;; => [#:index-word{:pos #:pos{:label "verb"}, :senses [#:synset{:gloss "go after with the intent to catch; ,,,"}]} ,,,]
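Because lookup returns plain Clojure data that mirrors the pattern, results can be post-processed with core functions. The sketch below assumes results shaped like the glossary-pattern output above (a vector of index-word maps, each with :index-word/pos and :index-word/senses). glosses-by-pos is a hypothetical helper, not part of clj-extjwnl, and the sample data is hand-written with placeholder glosses.

```clojure
;; Hand-written sample shaped like the glossary-pattern result.
;; The gloss strings are placeholders, not real WordNet data.
(def sample-result
  [#:index-word{:pos    #:pos{:label "noun"}
                :senses [#:synset{:gloss "noun gloss 1"}
                         #:synset{:gloss "noun gloss 2"}]}
   #:index-word{:pos    #:pos{:label "verb"}
                :senses [#:synset{:gloss "verb gloss 1"}]}])

(defn glosses-by-pos
  "Hypothetical helper: maps each part-of-speech label to the glosses of
  the senses returned for it."
  [result]
  (reduce (fn [acc {:index-word/keys [pos senses]}]
            ;; fnil starts each label with an empty vector before adding glosses.
            (update acc (:pos/label pos) (fnil into []) (map :synset/gloss senses)))
          {}
          result))

(glosses-by-pos sample-result)
;; => {"noun" ["noun gloss 1" "noun gloss 2"], "verb" ["verb gloss 1"]}
```

With a real dictionary, the same helper would be applied to the value of (extjwnl/lookup dict glossary-pattern "dog").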

Data Patterns

IndexWord

An IndexWord represents a line of a WordNet index.pos file (for example, index.noun).

  • :index-word/lemma
  • {:index-word/pos [:pos/label]}
  • {:index-word/senses [ Synset pattern ]}
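Since patterns are ordinary edn data, they can be assembled with functions instead of written out by hand. A minimal sketch, assuming the attribute forms listed above can be combined in a single vector as in the usage example. index-word-pattern is a hypothetical helper, not part of clj-extjwnl.

```clojure
(defn index-word-pattern
  "Hypothetical helper: builds an IndexWord pattern that requests the lemma,
  the part-of-speech label, and `synset-pattern` for every sense."
  [synset-pattern]
  [:index-word/lemma
   {:index-word/pos [:pos/label]}
   {:index-word/senses synset-pattern}])

;; Equal to '[:index-word/lemma
;;            {:index-word/pos [:pos/label]}
;;            {:index-word/senses [:synset/gloss]}]
(index-word-pattern [:synset/gloss])
```

The resulting vector would then be passed as the pattern argument, as in (extjwnl/lookup dict (index-word-pattern [:synset/gloss]) "dog").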

There can be many senses for an IndexWord. The following returns all of them:

If you'd like to return n senses:

Word

A Word represents the lexical information related to a specific sense of an IndexWord.

  • :word/lemma
  • {:word/pos [:pos/label]}
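A Word result is a map keyed by the attributes requested in the pattern. The sketch below assumes that shape, mirroring how the usage example returns #:pos{:label ...} maps. word->label is a hypothetical formatting helper and the sample word is hand-written.

```clojure
;; Pattern requesting both Word attributes listed above; it can be nested
;; wherever a Word pattern is accepted, such as under :synset/words.
(def word-pattern '[:word/lemma {:word/pos [:pos/label]}])

(defn word->label
  "Hypothetical helper: formats a Word result such as
  {:word/lemma \"dog\" :word/pos {:pos/label \"noun\"}} as \"dog (noun)\"."
  [{:word/keys [lemma pos]}]
  (str lemma " (" (:pos/label pos) ")"))

(word->label {:word/lemma "dog" :word/pos {:pos/label "noun"}})
;; => "dog (noun)"
```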

Synset

A Synset, or synonym set, represents a line of a WordNet data.pos file (for example, data.noun).

  • :synset/gloss
  • {:synset/pos [:pos/label]}
  • {:synset/pointers [ Pointer pattern ]}
  • {:synset/words [:word/lemma {:word/pos [:pos/label]}]}
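Nesting the Word pattern under :synset/words makes it straightforward to collect synonyms. A minimal sketch, assuming each sense is returned as a map with a :synset/words vector as the pattern describes. synonyms is a hypothetical helper, and the sample senses are hand-written, not real lookup output.

```clojure
;; Request each sense's gloss together with the lemmas of its words.
(def synonym-pattern
  '[{:index-word/senses [:synset/gloss {:synset/words [:word/lemma]}]}])

(defn synonyms
  "Hypothetical helper: returns the distinct lemmas of all words across
  the given synset maps, in first-seen order."
  [synsets]
  (->> synsets
       (mapcat :synset/words)
       (map :word/lemma)
       distinct
       vec))

;; Hand-written sample; glosses are placeholders.
(def sample-senses
  [{:synset/gloss "placeholder gloss 1"
    :synset/words [{:word/lemma "dog"} {:word/lemma "domestic dog"}]}
   {:synset/gloss "placeholder gloss 2"
    :synset/words [{:word/lemma "dog"}]}])

(synonyms sample-senses)
;; => ["dog" "domestic dog"]
```

Against a dictionary, the input would be the :index-word/senses of each map returned by (extjwnl/lookup dict synonym-pattern "dog"), for example via (mapcat :index-word/senses ...).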

You can limit how many pointers and words are returned:

Pointer

A Pointer encodes a lexical or semantic relationship between WordNet entities.

  • {:pointer/type [:pointer-type/label]}
  • {:pointer/synset [ Synset pattern ]}
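Pointers nest a full Synset pattern, so a single lookup can follow a relationship one step. The sketch below groups the glosses of pointer targets by pointer-type label. It assumes pointer results mirror the pattern; related-glosses is a hypothetical helper, and the pointer-type labels and glosses in the sample are placeholders rather than values confirmed against extJWNL.

```clojure
;; Synset pattern that follows each pointer to the target synset's gloss.
(def pointer-pattern
  '[{:synset/pointers [{:pointer/type [:pointer-type/label]}
                       {:pointer/synset [:synset/gloss]}]}])

(defn related-glosses
  "Hypothetical helper: maps each pointer-type label to the glosses of the
  synsets reached through pointers of that type."
  [source]
  (reduce (fn [acc {ptype :pointer/type target :pointer/synset}]
            (update acc (:pointer-type/label ptype) (fnil conj []) (:synset/gloss target)))
          {}
          (:synset/pointers source)))

;; Hand-written sample; labels and glosses are placeholders.
(def sample-synset
  {:synset/pointers [{:pointer/type   {:pointer-type/label "hypernym"}
                      :pointer/synset {:synset/gloss "gloss A"}}
                     {:pointer/type   {:pointer-type/label "hyponym"}
                      :pointer/synset {:synset/gloss "gloss B"}}
                     {:pointer/type   {:pointer-type/label "hyponym"}
                      :pointer/synset {:synset/gloss "gloss C"}}]})

(related-glosses sample-synset)
;; => {"hypernym" ["gloss A"], "hyponym" ["gloss B" "gloss C"]}
```

Because pointer-pattern is itself a Synset pattern, it could be supplied as the sub-pattern of {:index-word/senses ...}.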

Identity

Use :identity to return the underlying Java object. This is useful for calling extJWNL directly through Java interop for features not covered by data patterns.

(let [pos (-> (extjwnl/lookup dict
                              '[{:index-word/pos [:pos/label :identity]}]
                              "dog")
              first
              :index-word/pos)]
  {:label (:pos/label pos)
   :id    (.getId (:identity pos))})
;; => {:label "noun", :id 1}
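Results that include :identity hold live Java objects, which do not round-trip through edn. One option is to strip those entries before printing or storing a result. A minimal sketch using clojure.walk/postwalk; strip-identity is a hypothetical helper, and (Object.) stands in for an extJWNL object in the sample.

```clojure
(require '[clojure.walk :as walk])

(defn strip-identity
  "Hypothetical helper: removes every :identity entry from a nested
  lookup result, leaving plain edn data."
  [result]
  ;; postwalk rebuilds the structure bottom-up; each map drops its :identity key.
  (walk/postwalk (fn [x] (if (map? x) (dissoc x :identity) x))
                 result))

;; (Object.) stands in for the Java object that :identity would return.
(def sample-with-identity
  [{:index-word/pos {:pos/label "noun" :identity (Object.)}}])

;; Equal to [{:index-word/pos {:pos/label "noun"}}]
(strip-identity sample-with-identity)
```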

License

Copyright © 2020 Zak Kriner

Distributed under the Eclipse Public License version 2.0.
