zensols.nlparse.tok-re

Clojure only.

This namespace extends the NER system so that any regular expression can
easily be added using the [Stanford
TokensRegex](http://nlp.stanford.edu/software/tokensregex.html) API.

It takes a sequence of regular expressions and entity metadata as input and
produces a file in the format the TokensRegex API consumes to tag entities.

[This](https://github.com/plandes/clj-nlp-parse/blob/v0.0.11/test-resources/token-regex.txt)
is an example of the output.

item^clj

(item content label & opts)

Create an item used to create a pattern/line in the Stanford CoreNLP regular
expression definition file, with a regex generated from **content** and the NER
**label**.

The **opts** parameter accepts the following keys:

* **:lem-min-len** minimum item utterance length needed to turn on
    lemmatization for the last token (defaults to `-1`), for example:
     * 2: if the string is 2 characters or longer, lemmatize the last token
     * 0: always lemmatize
     * -1: never lemmatize
* **:case-min-tok** token-count threshold for turning on case sensitivity
    (defaults to `-1`), for example:
     * 2: if there are 1 or 2 tokens, make it case sensitive
     * 1: if there is only one token, make it case sensitive
     * 0: always case sensitive
     * -1: always case insensitive
* **:conj-regexp?** add an `and|&` regex to match both symbols, defaults to
    `true`
* **:first-det-chop?** chop off 'the' at the beginning of the item utterance,
    defaults to `true`
* **:is-regexp?** if `true`, write the regular expression verbatim instead of
    generating one from the utterance-like form
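
The **:lem-min-len** examples above can be read as a simple threshold check on the utterance length. The following is a minimal sketch of that reading, not the library's implementation; `lemmatize-last?` is a hypothetical helper name introduced only for illustration.

```clojure
;; Hypothetical helper (not part of zensols.nlparse.tok-re): mirrors the
;; documented :lem-min-len examples.
(defn lemmatize-last?
  "Return true when the documented :lem-min-len rule would turn on
  lemmatization of the last token for `utterance`."
  [utterance lem-min-len]
  (and (>= lem-min-len 0)                     ; -1 means never lemmatize
       (>= (count utterance) lem-min-len)))   ; 0 means always lemmatize

(lemmatize-last? "cats" 2)   ; 4 characters, at or above the threshold of 2
(lemmatize-last? "a" 2)      ; 1 character, below the threshold of 2
(lemmatize-last? "a" 0)      ; threshold 0 accepts any length
(lemmatize-last? "cats" -1)  ; negative threshold turns lemmatization off
```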

parse-features^clj

(parse-features feature-string)

write-regex-files^clj

(write-regex-files regex-output-file features-output-file items)

Write all **items** to the Stanford token regular expression file
**regex-output-file**, and write all possible features to
**features-output-file**.
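
As a rough usage sketch, items built with `item` might be collected and handed to `write-regex-files` as shown here. It assumes the clj-nlp-parse library is on the classpath, that the options are passed as keyword arguments after the label (suggested by the `& opts` signature, but not confirmed here), and that the output files may be given as path strings. The utterances, labels and file names are made up for illustration.

```clojure
;; Assumes the clj-nlp-parse library is available on the classpath.
(require '[zensols.nlparse.tok-re :as tr])

;; Assumption: opts are keyword arguments following the label.
;; The utterances and NER labels below are illustrative only.
(def items
  [(tr/item "the Rolling Stones" "BAND")
   (tr/item "Led Zeppelin" "BAND" :case-min-tok 1)
   (tr/item "R&B" "GENRE" :conj-regexp? false)])

;; Assumption: both output locations can be given as path strings.
;; The file names are hypothetical.
(tr/write-regex-files "token-regex.txt" "token-features.txt" items)
```

The first file is the one intended for the TokensRegex API; the second lists the features collected from the items.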
