Configure the Stanford CoreNLP parser.

This provides a plugin architecture for natural language processing tasks in a pipeline. A parser takes either a human language utterance or previously annotated data parsed from an utterance.

### Parser Libraries

Each parser provides a set of *components* that make up the pipeline. Each component (e.g. [[tokenize]]) is a function that returns a map containing the keys:

* **component** a key that's the name of the component to create
* **parser** a key that is the name of the parser it belongs to

For example, the Stanford CoreNLP word tokenizer has the following return map:

* **:component** :tokenize
* **:lang** *lang-code* (e.g. `en`)
* **:parser** :stanford

The map also has additional key/value pairs that represent the remaining configuration given to the parser library used to create its pipeline components. All parse library names (keys) are given in [[all-parsers]]. Use [[register-library]] to add your library with the key name of your parser.

### Usage

You can either create your own custom parser configuration with [[create-parse-config]] and then create its respective context with [[create-context]], or let a default context be created and used for each parse invocation. If you create your own, then each parse call needs to be in a [[with-context]] lexical context.

Once configured, use [[zensols.nlparse.parse/parse]] to invoke the parsing pipeline.
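The two flows described above can be sketched as follows. This is a hedged example: the utterance text and the choice of pipeline components are illustrative, and the `:require` forms assume the namespace layout documented on this page.

```clojure
(ns example.core
  (:require [zensols.nlparse.config :as conf :refer [with-context]]
            [zensols.nlparse.parse :refer [parse]]))

;; default flow: no explicit configuration, so a default context is
;; created and used for this (and each subsequent) parse invocation
(parse "I am Paul Landes.")

;; custom flow: build a configuration, create its context, then bind
;; the context lexically around each parse call
(let [config (conf/create-parse-config
              :pipeline [(conf/tokenize)
                         (conf/sentence)
                         (conf/part-of-speech)])
      context (conf/create-context config)]
  (with-context context
    (parse "I am Paul Landes.")))
```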
All parsers available in this package (jar).
(component-documentation)
Return documentation maps with the keys `:name` and `:doc`.
(component-from-config config name)
Return a component by **name** from parse **config**.
(component-from-context context name)
Return a component by **name** from parse **context**.
(components-as-string)
Return all available components as a string.
(context lib-name)
Return the context created with [[create-context]]. See the [usage section](#usage).
(coreference)
Create an annotator to build the coreference tree structure.
(create-context)
(create-context parse-config & {:keys [timeout-millis]})
Return a context used during parsing. This calls the create functions of all registered ([[register-library]]) parse libraries and returns an object to be used with the parse function [[zensols.nlparse.parse/parse]].

The **parse-config** parameter is either a parse configuration created with [[create-parse-config]] or a string. If a string is used, the pipeline is created from component names separated by commas. See [[zensols.nlparse.config-parse]] for more information on this DSL.

Using the output of [[components-as-string]] would create all components. However, the easier way to utilize all components is to call this function with no parameters.

See the [usage section](#usage).

Keys
---

* **:timeout-millis** number of milliseconds to allow the parser to complete before `java.util.concurrent.TimeoutException` is thrown, or `nil` for no timeout; no timeout is the default
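A sketch of the string form of **parse-config** (the component names and timeout value are illustrative; `conf` is an assumed alias for this namespace):

```clojure
;; comma-separated component names are parsed by the
;; zensols.nlparse.config-parse DSL into pipeline components
(conf/create-context "tokenize,sentence,part-of-speech"
                     :timeout-millis 5000)

;; no arguments: configure all components with no parse timeout
(conf/create-context)
```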
(create-parse-config &
{:keys [parsers only-tokenize? pipeline]
:or {parsers all-parsers}})
Create a parse configuration given as input to [[create-context]]. If no keys are given all components are configured (see [[components-as-string]]).

Keys
----

* **:only-tokenize?** create a parse configuration that only utilizes the tokenization of the Stanford CoreNLP library
* **:pipeline** a list of components created with one of the many component create functions (e.g. [[tokenize]]) or from a roll-your-own add-on library; this renders the `:parsers` key unused
* **:parsers** a set of parser library names (keys) used to indicate which components to return (e.g. `:stanford`); see [[all-parsers]]
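For instance, a tokenize-only configuration might be built like this (a sketch; `conf` is an assumed alias for this namespace):

```clojure
;; only word/sentence segmentation from Stanford CoreNLP
(def tok-only-context
  (-> (conf/create-parse-config :only-tokenize? true)
      conf/create-context))
```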
(dependency-parse-tree)
Create an annotator to create a dependency parse tree. See the [dependencies manual](https://nlp.stanford.edu/software/dependencies_manual.pdf) for definitions.
(morphology)
Create a morphology annotator, which adds the lemmatization of a word. This adds the `:lemma` keyword to each token.
(named-entity-recognizer)
(named-entity-recognizer paths)
(named-entity-recognizer paths lang)
Create annotator to do named entity recognition. All models in the **paths** sequence are loaded. The **lang** is the language parameter, which can be either `ENGLISH` or `CHINESE` and defaults to `ENGLISH`. See the [NERClassifierCombiner Javadoc](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/NERClassifierCombiner.html) for more information.

By default, the [English CoNLL 4 class](http://www.cnts.ua.ac.be/conll2003/ner/) model is used. See the [Stanford NER](http://nlp.stanford.edu/software/CRF-NER.shtml) page for more information.
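A hedged sketch of loading an explicit model: the classpath below is the stock Stanford CoreNLP location of the CoNLL 4 class CRF model, shown for illustration only (`conf` is an assumed alias for this namespace).

```clojure
;; load a specific CRF model; several paths may be given and all are loaded
(conf/named-entity-recognizer
 ["edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz"])
```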
(natural-logic)
Create a natural logic annotator. See the [Stanford CoreNLP documentation](https://stanfordnlp.github.io/CoreNLP/natlog.html) for more information.
(parse-functions)
Return all registered parse functions in the order they are to be called. See the [usage section](#usage).
(parse-timeout)
Return the number of milliseconds to timeout the parse or `nil` if none. See [[create-context]].
(parse-tree)
(parse-tree {:keys [include-score? maxtime use-shift-reduce? language]
:as conf})
Create annotator to create head and parse trees.

Keys
----

* **:include-score?** `true` if computed per-node accuracy scores are included in the parse tree
* **:maxtime** the maximum time in milliseconds to wait for the tree parser to complete (per sentence)
* **:use-shift-reduce?** if `true` use the faster and smaller shift reduce model, but the model must be present and model load time is slower (see the [shift reduce doc](https://nlp.stanford.edu/software/srparser.shtml))
* **:language** the parse language model (currently only used for shift reduce), defaults to `english`
(part-of-speech)
(part-of-speech pos-model-resource)
Create annotator to do part of speech tagging. You can set the model with a resource identified with the **pos-model-resource** string, which defaults to the [English WSJ trained corpus](http://www-nlp.stanford.edu/software/pos-tagger-faq.shtml).
(print-component-documentation)
Print the formatted component documentation; see [[component-documentation]].
(register-library lib-name lib-cfg & {:keys [force?]})
Register plugin library **lib-name** with **lib-cfg**, a map containing:

* **:create-fn** a function that takes a parse configuration (see [[create-parse-config]]) to create a context later returned with [[context]]
* **:reset-fn** a function that takes the parse context to `null` out any atoms or cached data structures; this is called by [[reset]]
* **:parse-fn** a function that takes a single human language utterance string or the output of another parse library
* **:component-fns** all component creating functions from this library

*Implementation note*: this forces re-creation of the default context (see the [usage section](#usage)) to allow [[create-context]] to be invoked by the calling library at the next invocation of [[context]] for newly registered libraries.
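A skeletal registration matching the keys above. Everything here is a placeholder: the `:my-parser` library, its component function, and the no-op behaviors are hypothetical (`conf` is an assumed alias for this namespace).

```clojure
;; a component create function for the hypothetical :my-parser library
(defn my-component []
  {:component :my-component
   :parser :my-parser})

(conf/register-library
 :my-parser
 {:create-fn     (fn [parse-config] (atom parse-config))
  :reset-fn      (fn [context] (reset! context nil))
  :parse-fn      (fn [anon] anon)
  :component-fns [my-component]})
```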
(reset & {:keys [hard?] :or {hard? true}})
Reset the cached data structures and configuration in the default (or currently bound [[with-context]]) context. This is also called by [[zensols.actioncli.dynamic/purge]].
(semantic-role-labeler)
(semantic-role-labeler lang-code)
Create a semantic role labeler annotator. You can configure the language with the **lang-code**, which is a two letter language code and defaults to English.

Keys
----

* **:lang** language used to create the SRL pipeline
* **:model-type** model type used to create the SRL pipeline
* **:first-label-token-threshold** token minimum position that contains a label to help decide the best SRL labeled sentence to choose
(sentence)
Create annotator to group tokens into sentences per configured language.
(sentiment)
(sentiment aggregate?)
Create annotator for sentiment analysis. The **aggregate?** parameter tells the parser to create a top (root) sentiment level score for the entire parse utterance.
(stopword)
Create annotator to annotate stop words (boolean).
(token-regex)
(token-regex paths)
Create an annotator for token regular expressions. You can configure an array of strings identifying either resources or files using the **paths** parameter, which defaults to `token-regex.txt`; that file is included in the resources of this package as an example and is used with the test cases.

The `:tok-re-resources` is a sequence of string paths to create a single annotator, or a sequence of sequences of string paths. If more than one annotator is created the output of an annotator can be used in the patterns of the next.
(tokenize)
(tokenize lang-code)
Create annotator to split words per configured language. The tokenization language is set with the **lang-code** parameter, which is a two letter language code and defaults to `en` (English).
(with-context context & forms)
Use the parser with a context created with [[create-context]]. This context is optionally configured. Without this macro the default context is used as described in the [usage section](#usage).