dk.simongray.datalinguist

Functions for building a CoreNLP pipeline and extracting text annotations.

The functions are designed to be chained using the threading macro or through function composition. Note that any annotation can be accessed using the basic annotation function; you are not limited to the convenience functions otherwise provided in this namespace.

The functions here mirror the annotation system of Stanford CoreNLP: once the return value isn't an instance of TypesafeMap or a seq of TypesafeMap objects, the annotation functions cannot retrieve anything from it. One example is dependency-graph, which returns a SemanticGraph object.

As a general rule, functions with pluralised names return seqable output, e.g. sentences or tokens. This does not matter when chaining these functions, as all of the annotation functions implicitly map over seqs.
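
A minimal sketch of both chaining styles. It assumes the library and the CoreNLP models are on the classpath, and that the conf map takes a vector of annotator names under :annotators (the shape of the map is an assumption; it is not spelled out above).

```clojure
(require '[dk.simongray.datalinguist :refer [->pipeline tokens lemma]])

;; Assumed conf shape: a vector of CoreNLP annotator names under :annotators.
(def nlp (->pipeline {:annotators ["lemma"]}))

;; Threading macro: annotate the text, take its tokens, then their lemmas.
;; `lemma` receives a seq of tokens and is mapped over it implicitly.
(def threaded
  (->> (nlp "She has beaten him before.")
       tokens
       lemma))

;; Function composition: the same chain as a reusable function.
(def lemmas (comp lemma tokens nlp))

(def composed (lemmas "She has beaten him before."))
```

Both forms should produce the same seq of lemmas, one per token.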

->pipeline

(->pipeline conf)

Wrap a closure around the CoreNLP pipeline specified in the conf map.

The returned function will annotate input text with the specified annotators, plus any annotators they depend on that were left unspecified.
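
As a rough illustration, the sketch below requests only the lemma annotator and then reads sentences and tokens from the result, relying on the dependency annotators being added automatically. The :annotators key and its vector-of-strings value are assumptions about the conf map; the CoreNLP models are assumed to be on the classpath.

```clojure
(require '[dk.simongray.datalinguist :refer [->pipeline sentences tokens]])

;; Only "lemma" is requested; tokenisation and sentence splitting are
;; expected to be pulled in as dependency annotators.
(def nlp (->pipeline {:annotators ["lemma"]}))

;; The closure takes a string and returns the annotated document.
(def doc (nlp "This is a piece of text. This is another one."))

(def sentence-count (count (sentences doc)))
(def token-count (count (tokens doc)))
```

Building a pipeline loads models and is slow, so the closure is typically created once and reused for every input.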

annotation

(annotation c x)

Access the annotation of x as specified by class c.

If x doesn't contain the annotation, tries to find the annotation inside any tokens or sentences within x, in that order. Generally, annotations will be located at either the document level, sentence level, or token level, so this behaviour allows skipping some steps in the REPL.
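
A short sketch of the fallback behaviour, using two CoreNLP annotation classes directly. The conf map shape (:annotators with a vector of names) is an assumption, as is the availability of the CoreNLP models on the classpath.

```clojure
(require '[dk.simongray.datalinguist :refer [->pipeline annotation]])
(import '[edu.stanford.nlp.ling CoreAnnotations$LemmaAnnotation
                                CoreAnnotations$TextAnnotation])

(def nlp (->pipeline {:annotators ["lemma"]}))
(def doc (nlp "She has beaten him before."))

;; TextAnnotation exists at the document level, so it is read directly.
(def doc-text (annotation CoreAnnotations$TextAnnotation doc))

;; LemmaAnnotation only exists on tokens; per the docstring, the lookup
;; falls through to the tokens inside the document.
(def doc-lemmas (annotation CoreAnnotations$LemmaAnnotation doc))
```

Any other CoreNLP annotation class can be passed the same way, including ones that have no convenience function in this namespace.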

constituency-tree

(constituency-tree x)
(constituency-tree style x)

The constituency tree of x; style can be :kbest-trees, :binarized, or :standard (default).

dependency-graph

(dependency-graph x)
(dependency-graph style x)

The dependency graph of x; style can be :basic, :enhanced, or :enhanced++ (default).

index

(index x)
(index style x)

The index of x; style can be :quote, :sentence, or :token (default).

lemma

(lemma x)

The lemma of x.

mentions

(mentions x)

The named entity mentions of x.

named-entity

(named-entity x)
(named-entity style x)

The named entity tag of x; style can be :probs, :coarse, :fine, or :tag (default).

numeric

(numeric x)
(numeric style x)

The numeric value or type of x; style can be :normalized, :composite, :composite-type, :composite-value, :type, or :value (default).

offset

(offset x)
(offset style x)

The character offset of x; style can be :end or :begin (default).
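
The optional style argument comes first, which keeps the two-argument form usable at the end of a threaded chain. A minimal sketch, assuming an :annotators vector in the conf map and CoreNLP models on the classpath:

```clojure
(require '[dk.simongray.datalinguist :refer [->pipeline tokens offset]])

(def nlp (->pipeline {:annotators ["tokenize"]}))
(def doc (nlp "She has beaten him before."))

;; Default style (:begin): where each token starts in the input string.
(def begins (->> doc tokens offset))

;; Explicit style: where each token ends.
(def ends (->> doc tokens (offset :end)))
```

The same calling convention applies to the other functions in this namespace that take a style.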

pos

(pos x)

The part-of-speech of x.

quotations

(quotations x)
(quotations style x)

The quotations of x; style can be :unclosed or :closed (default).

recur-datafy

(recur-datafy x)

Return a recursively datafied representation of x. Call at the end of an annotation chain to get plain Clojure data structures.
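
A sketch of where the call sits in a chain. The conf map shape is an assumption, and the exact shape of the returned data depends on the library's datafy implementations, so no output is shown.

```clojure
(require '[dk.simongray.datalinguist
           :refer [->pipeline sentences tokens recur-datafy]])

(def nlp (->pipeline {:annotators ["lemma"]}))

;; Walk down to the tokens of the first sentence, then datafy the result
;; so it can be inspected as ordinary Clojure data.
(def data
  (->> (nlp "She has beaten him before.")
       sentences
       first
       tokens
       recur-datafy))
```

Because the result is no longer made of TypesafeMap objects, the annotation functions cannot be applied to it afterwards.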

sentences

(sentences x)

The sentences of x.

text

(text x)
(text style x)

The text of x; style can be :true-case or :plain (default).

token-find

(token-find m)
(token-find p tokens)

Return the next TokensRegex match, if any, of tokens to pattern, using TokenSequenceMatcher.find().

token-groups

(token-groups m)

Returns the groups from the most recent match/find. If there are no nested groups, returns tokens for the entire match. If there are nested groups, returns a vector of the groups, the first element being the entire match.

token-matcher

(token-matcher p tokens)

Create a TokenSequenceMatcher from p and tokens; use in token-find.
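
A sketch of stepping through matches with a stateful matcher. The pattern string uses TokensRegex syntax; the conf map shape and the example sentence are assumptions made for illustration, and the CoreNLP models are assumed to be on the classpath.

```clojure
(require '[dk.simongray.datalinguist
           :refer [->pipeline tokens token-pattern token-matcher token-find]])

;; "pos" is needed because the pattern below matches on POS tags.
(def nlp (->pipeline {:annotators ["pos"]}))
(def toks (tokens (nlp "The quick brown fox jumps over the lazy dog.")))

;; TokensRegex: one or more consecutive tokens whose tag starts with NN.
(def noun-run (token-pattern "[{tag:/NN.*/}]+"))

;; The matcher is stateful: each token-find call advances to the next match.
(def m (token-matcher noun-run toks))
(def first-match (token-find m))
(def second-match (token-find m))
```

Since the matcher is mutable, it should not be shared between threads or reused once exhausted.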

token-matches

(token-matches p g)

Returns the match, if any, of tokens to pattern, using edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher.matches(). Uses token-groups to return the groups.

token-pattern

(token-pattern s)

Return an instance of TokenSequencePattern for use in, e.g., token-matcher.

token-seq

(token-seq p tokens)

Return a lazy list of matches of TokenSequencePattern p in tokens.
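
A minimal sketch of collecting every match in one go. The pattern string is TokensRegex syntax; the conf map shape and the example sentence are assumptions, and the CoreNLP models are assumed to be on the classpath.

```clojure
(require '[dk.simongray.datalinguist
           :refer [->pipeline tokens token-pattern token-seq]])

;; "pos" is needed because the pattern below matches on POS tags.
(def nlp (->pipeline {:annotators ["pos"]}))
(def toks (tokens (nlp "The quick brown fox jumps over the lazy dog.")))

;; TokensRegex: one or more consecutive tokens whose tag starts with NN.
(def noun-run (token-pattern "[{tag:/NN.*/}]+"))

;; Lazy seq of matches; nothing is searched until it is consumed.
(def matches (token-seq noun-run toks))
```

This is the convenient form when all matches are wanted; a matcher with token-find gives finer control over when each match is computed.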

tokens

(tokens x)

The tokens of x.

triples

(triples x)
(triples style x)

The triples of x; style can be :kbp or :openie (default).

true-case

(true-case x)

The true case of x.

whitespace

(whitespace x)
(whitespace style x)

The whitespace around x; style can be :after or :before (default).
