zensols.nlparse.tok-re

This namespace extends the NER system so that any regular expression can be
added easily using the [Stanford
TokensRegex](http://nlp.stanford.edu/software/tokensregex.html) API.

It takes a sequence of regular expressions and entity metadata as input and
produces a file in the format that the TokensRegex API consumes to tag entities.

[This](https://github.com/plandes/clj-nlp-parse/blob/v0.0.11/test-resources/token-regex.txt)
is an example of the output.
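To make the output format more concrete, here is a minimal sketch of a single line such a file might contain. It assumes the tab-separated mapping layout used by Stanford's RegexNER-style annotators (a TokensRegex pattern in parentheses, a tab, then the NER label). `mapping-line` is a hypothetical helper, not part of `zensols.nlparse.tok-re`, so compare it against the linked example file for the exact layout this library writes.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (not part of zensols.nlparse.tok-re): build one line in
;; an assumed "<pattern> TAB <NER label>" mapping layout. The token patterns are
;; wrapped in "( ... )" so the line reads as a TokensRegex expression.
(defn mapping-line [token-patterns label]
  (str "( " (str/join " " token-patterns) " )" \tab label))

(mapping-line ["/Bank/" "/of/" "/America/"] "ORGANIZATION")
;; => "( /Bank/ /of/ /America/ )\tORGANIZATION"
```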

item

(item content label & opts)

Create an item used to build one pattern (one line) in the Stanford CoreNLP
regular expression definition file, with a regex created from **content** and
the NER **label**.
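At its simplest, building a pattern from **content** means splitting the utterance into tokens and turning each token into a TokensRegex word pattern (`/.../`). The sketch below illustrates that general idea under stated assumptions and is not the library's implementation: `utterance->token-patterns` is hypothetical, splits on whitespace only, and backslash-escapes regex metacharacters (plus the `/` delimiter) so that each token matches literally.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (not part of zensols.nlparse.tok-re): split an utterance
;; on whitespace and wrap each token in a TokensRegex /.../ word pattern.
(defn utterance->token-patterns [content]
  (->> (str/split (str/trim content) #"\s+")
       ;; backslash-escape regex metacharacters and the / delimiter
       (mapv #(str "/" (str/replace % #"([\\.*+?^${}()|\[\]/])" "\\\\$1") "/"))))

(utterance->token-patterns "Bank of America")
;; => ["/Bank/" "/of/" "/America/"]
```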

The **opts** parameter accepts the following keys:

* **:lem-min-len** minimum utterance length, in characters, at which
    lemmatization is turned on for the last token (default `-1`), for example:
    * `2`: lemmatize the last token if the string is 2 or more characters long
    * `0`: always lemmatize
    * `-1`: never lemmatize
* **:case-min-tok** token-count threshold that controls case sensitivity
    (default `-1`), for example:
    * `2`: make the pattern case sensitive if there are 1 or 2 tokens
    * `1`: make the pattern case sensitive only if there is a single token
    * `0`: always case sensitive
    * `-1`: always case insensitive
* **:conj-regexp?** add an `and|&` regex so that either conjunction symbol
    matches (default `true`)
* **:first-det-chop?** chop off a leading 'the' from the item utterance
    (default `true`)
* **:is-regexp?** if `true`, write **content** verbatim as the regular
    expression instead of generating one from its utterance-like form
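One possible reading of how these options interact is sketched below. The helper names (`case-sensitive?`, `lemmatize-last?`, `item->patterns`) are hypothetical, and the threshold rules are inferred only from the examples listed above, not from the library source. Case insensitivity is expressed with Java's inline `(?i)` regex flag; lemmatization is reduced to the on/off decision, because producing the lemma itself needs an NLP pipeline that this sketch leaves out. **:is-regexp?** is not modeled.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sketch: option rules inferred from the documented examples,
;; not taken from the zensols.nlparse.tok-re source.

(defn case-sensitive?
  "-1: never; 0: always; N > 0: only when the item has at most N tokens."
  [case-min-tok n-tokens]
  (cond (neg? case-min-tok) false
        (zero? case-min-tok) true
        :else (<= n-tokens case-min-tok)))

(defn lemmatize-last?
  "-1: never; otherwise only when the utterance has at least N characters."
  [lem-min-len content]
  (and (not (neg? lem-min-len))
       (>= (count content) lem-min-len)))

(defn- token->pattern [tok {:keys [conj-regexp? case-sens?]}]
  (let [body (if (and conj-regexp? (#{"and" "&"} (str/lower-case tok)))
               "and|&"                  ; match either conjunction symbol
               (str/replace tok #"([\\.*+?^${}()|\[\]/])" "\\\\$1"))]
    ;; (?i) is Java's inline case-insensitive flag
    (str "/" (when-not case-sens? "(?i)") body "/")))

(defn item->patterns
  [content {:keys [first-det-chop? conj-regexp? case-min-tok]
            :or {first-det-chop? true conj-regexp? true case-min-tok -1}}]
  (let [toks (str/split (str/trim content) #"\s+")
        ;; drop a leading "the" when at least one token follows it
        toks (if (and first-det-chop?
                      (> (count toks) 1)
                      (= "the" (str/lower-case (first toks))))
               (vec (rest toks))
               toks)
        opts {:conj-regexp? conj-regexp?
              :case-sens? (case-sensitive? case-min-tok (count toks))}]
    (mapv #(token->pattern % opts) toks)))

(item->patterns "The Bank of America" {})
;; => ["/(?i)Bank/" "/(?i)of/" "/(?i)America/"]
```

With the defaults, "The Bank of America" loses its leading determiner and every remaining token gets a case-insensitive pattern.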

parse-features

(parse-features feature-string)

write-regex-files

(write-regex-files regex-output-file features-output-file items)

Write all **items** to the Stanford token regular expression file
**regex-output-file**, and write all possible features to
**features-output-file**.
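The regex half of this can be sketched as writing one line per item. This is a hypothetical illustration, not the library's code: `write-pattern-file` is an invented name, it assumes each item has already been reduced to a `{:pattern ... :label ...}` map, and it assumes a `pattern<TAB>label` line layout. How the library derives the features file is not described here, so that part is left out.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical sketch (not the library's implementation): write one
;; "pattern<TAB>label" line per item, assuming items are already
;; {:pattern ... :label ...} maps. The features file is not modeled.
(defn write-pattern-file [out-file items]
  (with-open [w (io/writer out-file)]
    (doseq [{:keys [pattern label]} items]
      (.write w (str pattern \tab label "\n")))))

;; Usage: write to a temporary file and read it back.
(let [f (java.io.File/createTempFile "tok-re" ".txt")]
  (.deleteOnExit f)
  (write-pattern-file f [{:pattern "( /Bank/ /of/ /America/ )"
                          :label "ORGANIZATION"}])
  (str/split-lines (slurp f)))
;; => ["( /Bank/ /of/ /America/ )\tORGANIZATION"]
```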