Configure the Stanford CoreNLP parser.

This provides a plugin architecture for natural language processing tasks in a pipeline. A parser takes either a human language utterance or previously annotated data parsed from an utterance.

### Parser Libraries

Each parser provides a set of *components* that make up the pipeline. Each component (e.g. [[tokenize]]) is a function that returns a map containing the keys:

* **component** a key that's the name of the component to create.
* **parser** a key that is the name of the parser it belongs to.

For example, the Stanford CoreNLP word tokenizer has the following return map:

* **:component** :tokenize
* **:lang** *lang-code* (e.g. `en`)
* **:parser** :stanford

The map also has additional key/value pairs that represent the remaining configuration given to the parser library used to create its pipeline components. All parse library names (keys) are given in [[all-parsers]].

Use [[register-library]] to add your library with the key name of your parser.

### Usage

You can create your own custom parser configuration with [[create-parse-config]] and then create its respective context with [[create-context]]. If you do this, then each parse call needs to be in a [[with-context]] lexical context. If you don't, a default context is created and used for each parse invocation.

Once configured, use [[zensols.nlparse.parse/parse]] to invoke the parsing pipeline.
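The custom-configuration flow above can be sketched as follows. Only the function names ([[create-parse-config]], [[create-context]], [[with-context]], the component functions) come from the docs; the exact arities, and in particular the `:pipeline` keyword argument, are assumptions:

```clojure
;; minimal sketch of a custom parser configuration; the :pipeline
;; keyword argument and component arities are assumptions
(require '[zensols.nlparse.config :as conf
           :refer [create-parse-config create-context with-context]]
         '[zensols.nlparse.parse :refer [parse]])

;; build a context from only the components we need
(def context
  (-> (create-parse-config
        :pipeline [(conf/tokenize "en")
                   (conf/sentence)
                   (conf/part-of-speech "english.tagger")])
      create-context))

;; each parse call must be wrapped in the context's lexical scope
(with-context context
  (parse "I am Paul Landes."))
```

Skipping the custom configuration entirely and calling `parse` directly uses the default context instead.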
Parse a pipeline configuration.

This namespace supports a simple DSL for parsing a pipeline configuration (see [[zensols.nlparse.config]]). The *configuration string* is a set of comma-separated *forms*, each representing a component. For example the forms:

```
zensols.nlparse.config/tokenize("en"),zensols.nlparse.config/sentence,part-of-speech("english.tagger"),zensols.nlparse.config/morphology
```

creates a pipeline that tokenizes and adds POS tags and lemmas when called with [[parse]]. Note the double quotes in the `tokenize` and `part-of-speech` mnemonics. The [[parse]] function does this by calling in order:

* ([[zensols.nlparse.config/tokenize]] "en")
* ([[zensols.nlparse.config/sentence]])
* ([[zensols.nlparse.config/part-of-speech]] "english.tagger")
* ([[zensols.nlparse.config/morphology]])

Some configuration functions are parameterized by positions or maps. Positional arguments are shown in the above example; a map configuration follows:

```
parse-tree({:use-shift-reduce? true :maxtime 1000})
```

which creates a shift-reduce parser that times out after a second (per sentence).

Note that arguments (the parenthetical portion of the form) are optional, and so is the namespace, which defaults to `zensols.nlparse.config`. To use a separate namespace for custom plug-and-play components (see [[zensols.nlparse.config/register-library]]), you can specify your own namespace with a `/`, for example:

```
example.namespace/myfunc(arg1,arg2)
```
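Since the namespace prefix defaults to `zensols.nlparse.config`, the example pipeline above can be written with short mnemonics. A sketch of calling the DSL's [[parse]] function; the require form and the shape of its return value (assumed: a sequence of component configurations) are assumptions:

```clojure
;; sketch: parsing a DSL configuration string; the exact return
;; value of parse is an assumption
(require '[zensols.nlparse.config-parse :as cp])

;; short mnemonics resolve against zensols.nlparse.config by default
(cp/parse "tokenize(\"en\"),sentence,part-of-speech(\"english.tagger\"),morphology")
```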
Feature utility functions. In this library, all references to `panon` stand for *parsed annotation*, which is returned from [[zensols.nlparse.parse/parse]].
Feature utility functions. See [[zensols.nlparse.feature.lang]].
Parse an utterance using the Stanford CoreNLP and the ClearNLP SRL.

This is the main client entry point to the package. A default out-of-the-box parser comes with the components listed in [[zensols.nlparse.config/all-components]].

If you want to customize or add your own parser plug-in, see the [[zensols.nlparse.config]] namespace.
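The default entry point can be sketched as below; no context is configured, so a default one is created and reused per the config docs:

```clojure
;; sketch of default out-of-the-box usage; the shape of the returned
;; map is not shown here because it is not documented in this excerpt
(require '[zensols.nlparse.parse :refer [parse]])

;; `panon` is the "parsed annotation" referenced by the feature namespaces
(def panon (parse "I am Paul Landes."))
```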
Configure environment for the NLP pipeline.
Wrap the ClearNLP SRL.

Currently the PropBank-trained model is used. The main classification function is [[label]].
Wraps the Stanford CoreNLP parser.
This namespace provides ways of filtering *stop word* tokens.

To avoid the double negative in function names, *go words* are defined to be the complement of a vocabulary with a stop word list. Functions like [[go-word?]] tell whether or not a token is a stop word. Stop words are defined to be:

* stop words (predefined list)
* punctuation
* numbers
* non-alphabetic characters
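A sketch of filtering a parsed utterance down to its go words. The [[go-word?]] name comes from the docs, but the token traversal (the `:sents` and `:tokens` keys) is an assumption about the parsed-annotation shape:

```clojure
;; sketch: keep only go words (non-stop-words) from a parse;
;; the :sents/:tokens access path is an assumption
(require '[zensols.nlparse.stopword :refer [go-word?]]
         '[zensols.nlparse.parse :refer [parse]])

(->> (parse "The 3 dogs ran, quickly!")
     :sents
     (mapcat :tokens)
     (filter go-word?))
```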
This namespace extends the NER system to easily add any regular expression using the [Stanford TokensRegex](http://nlp.stanford.edu/software/tokensregex.html) API. This takes a sequence of regular expressions and entity metadata as input and produces a file format the TokensRegex API consumes to tag entities. [This](https://github.com/plandes/clj-nlp-parse/blob/v0.0.11/test-resources/token-regex.txt) is an example of the output.