zensols.nlparse.config

Configure the Stanford CoreNLP parser.

This provides a plugin architecture for natural language processing tasks in a
pipeline.  A parser takes either a human language utterance or previously
annotated data parsed from an utterance.


### Parser Libraries

Each parser provides a set of *components* that make up the pipeline.  Each
component (e.g. [[tokenize]]) is a function that returns a map containing the
keys:

* **:component** the name of the component to create.
* **:parser** the name of the parser library it belongs to.

For example, the Stanford CoreNLP word tokenizer has the following return map:

* **:component** :tokenize
* **:lang**  *lang-code* (e.g. `en`)
* **:parser** :stanford

The map also has additional key/value pairs that represent the remaining
configuration given to the parser library used to create its pipeline
components.  All parse library names (keys) are given in [[all-parsers]].
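
As a concrete illustration, a tokenizer component map might look roughly like
the sketch below; the string value for `:lang` and the omitted
library-specific entries are assumptions, not taken from the library source.

```clojure
;; Rough sketch of a component configuration map as described above.
{:component :tokenize
 :lang "en"          ; language code (string form assumed)
 :parser :stanford}
```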

Use [[register-library]] to add your library with the key name of your parser.


### Usage

You can create your own custom parser configuration
with [[create-parse-config]] and then create its respective context
with [[create-context]].  If you do this, each parse call needs to be inside
a [[with-context]] lexical context.  If you don't, a default context is created
and used for each parse invocation.

Once/if configured, use [[zensols.nlparse.parse/parse]] to invoke the parsing
pipeline.
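
A minimal sketch of that flow, assuming [[zensols.nlparse.parse/parse]] takes
the utterance string directly; the pipeline components chosen here are
illustrative only.

```clojure
;; Custom configuration: build a pipeline, create its context, and bind it
;; around the parse call.
(ns example.nlp
  (:require [zensols.nlparse.config :as conf :refer [with-context]]
            [zensols.nlparse.parse :as p]))

(let [config (conf/create-parse-config
              :pipeline [(conf/tokenize)
                         (conf/sentence)
                         (conf/part-of-speech)])
      context (conf/create-context config)]
  (with-context context
    (p/parse "I like to ride my bike.")))

;; Or skip the configuration entirely and let the default context be created
;; on the first parse invocation.
(p/parse "I like to ride my bike.")
```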

all-parsers clj

All parsers available in this package (jar).


component-documentation clj

(component-documentation)

Return component documentation as maps with keys `:name` and `:doc`.

component-from-config clj

(component-from-config config name)

Return a component by **name** from parse **config**.

component-from-context clj

(component-from-context context name)

Return a component by **name** from parse **context**.

components-as-string clj

(components-as-string)

Return all available components as a string.

context clj

(context lib-name)

Return the context created with [[create-context]].

See the [usage section](#usage).

coreference clj

(coreference)

Create an annotator that adds the coreference tree structure.

create-context clj

(create-context)
(create-context parse-config & {:keys [timeout-millis]})

Return a context used during parsing.  This calls the create functions of all
registered ([[register-library]]) parse libraries and returns an object to be
used with the parse function [[zensols.nlparse.parse/parse]].

The parameter **parse-config** is either a parse configuration created
with [[create-parse-config]] or a string.  If a string is given, the pipeline
is created from component names separated by commas.
See [[zensols.nlparse.config-parse]] for more information on this DSL.

Using the output of [[components-as-string]] would create all components.
However, the easier way to utilize all components is to call this function
with no parameters.

See the [usage section](#usage).

Keys
----
* **:timeout-millis** the number of milliseconds to allow the parser to
  complete before `java.util.concurrent.TimeoutException` is thrown, or `nil`
  for no timeout (the default)
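
For example, a context could be built from the comma-separated DSL with a
timeout; the component names in the string and the timeout value are
illustrative, and the assumption is that DSL names match the component create
functions in this namespace.

```clojure
;; Hedged sketch: a small pipeline by name, capped at 10 seconds per parse.
(def quick-context
  (create-context "tokenize,sentence" :timeout-millis 10000))

;; With no arguments, all registered components are configured.
(def full-context (create-context))
```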

create-parse-config clj

(create-parse-config &
                     {:keys [parsers only-tokenize? pipeline]
                      :or {parsers all-parsers}})

Create a parse configuration given as input to [[create-context]].

If no keys are given, all components are
configured (see [[components-as-string]]).

Keys
----
* **:only-tokenize?** create a parse configuration that only utilizes the
  tokenization of the Stanford CoreNLP library.
* **:pipeline** a list of components created with one of the many component
  create functions (e.g. [[tokenize]]) or from a roll-your-own add-on library;
  this renders the `:parsers` key unused
* **:parsers** a set of parser library names (keys) used to indicate which
  components to return (e.g. `:stanford`); see [[all-parsers]]
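
A few hedged examples of these keys; the component selection and the set
literal for `:parsers` are assumptions.

```clojure
;; Tokenization only:
(create-parse-config :only-tokenize? true)

;; An explicit pipeline of components (illustrative selection):
(create-parse-config :pipeline [(tokenize) (sentence) (morphology)])

;; Restrict to a single parser library by key:
(create-parse-config :parsers #{:stanford})
```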

dependency-parse-tree clj

(dependency-parse-tree)

Create an annotator to create a dependency parse tree.

See the [dependencies
manual](https://nlp.stanford.edu/software/dependencies_manual.pdf) for
definitions.

morphology clj

(morphology)

Create a morphology annotator, which adds the lemmatization of a word.  This
adds the `:lemma` keyword to each token.

named-entity-recognizer clj

(named-entity-recognizer)
(named-entity-recognizer paths)
(named-entity-recognizer paths lang)

Create an annotator to do named entity recognition.  All models in the
**paths** sequence are loaded.  The **lang** is the language parameter, which
can be either `ENGLISH` or `CHINESE` and defaults to `ENGLISH`.  See
the [NERClassifierCombiner
Javadoc](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/NERClassifierCombiner.html)
for more information.

By default, the [English CoNLL 4
class](http://www.cnts.ua.ac.be/conll2003/ner/) model is used.  See the
[Stanford NER](http://nlp.stanford.edu/software/CRF-NER.shtml) page for more
information.
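
For instance, loading an additional classifier model; the model path below is
hypothetical.

```clojure
;; Default English CoNLL 4 class model:
(named-entity-recognizer)

;; Hypothetical custom model on the classpath or filesystem:
(named-entity-recognizer ["models/my-custom-ner.ser.gz"])
```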

natural-logic clj

(natural-logic)

Create a natural logic annotator.

See the [Stanford CoreNLP
documentation](https://stanfordnlp.github.io/CoreNLP/natlog.html) for more
information.

parse-functions clj

(parse-functions)

Return all registered parse functions in the order they are to be called.

See the [usage section](#usage).

parse-timeout clj

(parse-timeout)

Return the number of milliseconds to timeout the parse or `nil` if none.

See [[create-context]].

parse-tree clj

(parse-tree)
(parse-tree {:keys [include-score? maxtime use-shift-reduce? language]
             :as conf})

Create an annotator to create head and parse trees.

Keys
----
* **:include-score?** `true` if computed per node accuracy scores are included
  in the parse tree
* **:maxtime** the maximum time in milliseconds to wait for the tree parser to
  complete (per sentence)
* **:use-shift-reduce?** if `true` use the faster and smaller shift reduce
  model, but the model must be present and model load time is slower (see
  the [shift reduce doc](https://nlp.stanford.edu/software/srparser.shtml))
* **:language** the parse language model (currently only used for shift
  reduce), defaults to `english`
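
For example, to opt into the shift reduce model with a per-sentence time cap;
the values are illustrative only.

```clojure
;; Hedged sketch of the configuration map described above.
(parse-tree {:use-shift-reduce? true   ; requires the shift reduce model
             :maxtime 5000})           ; give up after 5 seconds per sentence
```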

part-of-speech clj

(part-of-speech)
(part-of-speech pos-model-resource)

Create annotator to do part of speech tagging.  You can set the model with a
resource identified with the **pos-model-resource** string, which defaults to
the [English WSJ trained
corpus](http://www-nlp.stanford.edu/software/pos-tagger-faq.shtml).

print-component-documentation clj

(print-component-documentation)

Print the formatted component documentation; see [[component-documentation]].

register-library clj

(register-library lib-name lib-cfg & {:keys [force?]})

Register plugin library **lib-name** with **lib-cfg**, a map containing:

* **:create-fn** a function that takes a parse
  configuration (see [[create-parse-config]]) to create a context later
  returned with [[context]]
* **:reset-fn** a function that takes the parse context to `null` out any atoms
  or cached data structures; this is called by [[reset]]
* **:parse-fn** a function that takes a single human language utterance string
  or the output of another parse library
* **:component-fns** all component creating functions from this library

  *Implementation note*: registering forces re-creation of the default
  context (see the [usage section](#usage)) so that the next call
  to [[context]] invokes [[create-context]] and picks up newly registered
  libraries.
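
A sketch of what a roll-your-own registration might look like; the `:my-lib`
key and the `my-*` functions are hypothetical placeholders, and the shapes of
their return values are assumptions.

```clojure
(defn- my-create-context [parse-config]
  ;; build and return whatever state the library needs for parsing
  {:config parse-config})

(defn- my-reset-context [context]
  ;; drop any cached state held in the context
  nil)

(defn- my-parse [utterance-or-prior-parse]
  ;; return this library's annotations, possibly layered on a prior parse
  utterance-or-prior-parse)

(register-library :my-lib
                  {:create-fn my-create-context
                   :reset-fn my-reset-context
                   :parse-fn my-parse
                   :component-fns []})
```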

reset clj

(reset & {:keys [hard?] :or {hard? true}})

Reset the cached data structures and configuration in the default (or
currently bound [[with-context]]) context.  This is also called
by [[zensols.actioncli.dynamic/purge]].

semantic-role-labeler clj

(semantic-role-labeler)
(semantic-role-labeler lang-code)

Create a semantic role labeler annotator.  You can configure the language
with the **lang-code**, which is a two letter language code string and
defaults to English.

Keys
----
* **:lang** language used to create the SRL pipeline
* **:model-type** model type used to create the SRL pipeline
* **:first-label-token-threshold** the minimum token position that contains a
  label, used to help decide the best SRL labeled sentence to choose

sentence clj

(sentence)

Create annotator to group tokens into sentences per configured language.


sentiment clj

(sentiment)
(sentiment aggregate?)

Create annotator for sentiment analysis.  The **aggregate?** parameter tells
the parser to create a top (root) sentiment level score for the entire parsed
utterance.

stopword clj

(stopword)

Create annotator to annotate stop words (boolean).


token-regex clj

(token-regex)
(token-regex paths)

Create an annotator to apply token regular expressions.  You can configure an
array of strings identifying either resources or files using the **paths**
parameter, which defaults to `token-regex.txt`; that file is included in the
resources of this package as an example and used with the test cases.

The `:tok-re-resources` is either a sequence of string paths, which creates a
single annotator, or a sequence of sequences of string paths.  If more than
one annotator is created, the output of one annotator can be used in the
patterns of the next.
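
For example; the custom pattern file names are hypothetical, and the nested
form for multiple annotators is an assumption based on the description above.

```clojure
;; Bundled example pattern file (the documented default):
(token-regex ["token-regex.txt"])

;; Two annotators, where the second pass can match on output of the first
;; (hypothetical file names):
(token-regex [["first-pass.txt"] ["second-pass.txt"]])
```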

tokenize clj

(tokenize)
(tokenize lang-code)

Create annotator to split words per configured language.  The tokenization
language is set with the **lang-code** parameter, which is a two letter
language code string and defaults to `en` (English).

with-context clj macro

(with-context context & forms)

Use the parser with a context created with [[create-context]].
This context is optionally configured.  Without this macro the default
context is used as described in the [usage section](#usage).
