dk.simongray.datalinguist — dk.simongray/datalinguist 0.2.171

Clojure only.

dk.simongray.datalinguist

Functions for building a CoreNLP pipeline and extracting text annotations.

The functions are designed to be chained using the threading macro or through
function composition. Note that *any* annotation can be accessed using the
basic `annotation` function; you are not limited to the convenience functions
otherwise provided in this namespace.

The functions here mirror the annotation system of Stanford CoreNLP: once the
return value isn't an instance of TypesafeMap or a seq of TypesafeMap objects,
the annotation functions cannot retrieve anything from it. One example is
`dependency-graph`, which returns a SemanticGraph object.

As a general rule, functions with pluralised names have a seqable output,
e.g. `sentences` or `tokens`. This does not matter when chaining these
functions, as all of the annotation functions implicitly map over seqs.
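
As a rough illustration of the two chaining styles, the sketch below runs the same lookup once with the threading macro and once with `comp`. The `{:annotators [...]}` shape of the conf map and the sample text are assumptions made for illustration, and the example presumes that the CoreNLP English models are on the classpath.

```clojure
(require '[dk.simongray.datalinguist :refer [->pipeline sentences tokens lemma]])

;; Assumed conf shape: a map with an :annotators vector of CoreNLP annotator names.
(def nlp (->pipeline {:annotators ["lemma"]}))

;; Threading macro: document -> sentences -> first sentence -> tokens -> lemmas.
(def threaded
  (->> (nlp "She sells seashells. He buys them.")
       sentences
       first
       tokens
       lemma))

;; Function composition: the same steps, read right to left.
(def first-sentence-lemmas
  (comp lemma tokens first sentences nlp))

(def composed
  (first-sentence-lemmas "She sells seashells. He buys them."))
```

Both forms should yield the same seq of lemmas; which one to use is mostly a matter of taste at the REPL.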

->pipeline^clj

(->pipeline conf)

Wrap a closure around the CoreNLP pipeline specified in the `conf` map.

The returned function will annotate input text with the annotators specified,
plus any annotators they depend on that were left unspecified.
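
A minimal sketch of building a pipeline once and reusing it, assuming the conf map takes an `:annotators` vector of CoreNLP annotator names and that the English models are available on the classpath. The name `nlp` and the sample text are illustrative only.

```clojure
(require '[dk.simongray.datalinguist :refer [->pipeline sentences text]])

;; Only "pos" is requested; per the docstring, the annotators it depends on
;; are expected to be added to the pipeline as well.
(def nlp (->pipeline {:annotators ["pos"]}))

;; The pipeline is an ordinary function from a string to an annotated document.
(def doc (nlp "This is a piece of text. This is another one."))

;; Chain annotation functions on the result.
(def second-sentence
  (->> doc sentences second text))
```

Constructing a CoreNLP pipeline tends to be expensive, so it is usually worth defining it once and calling the returned function for each text.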

annotation^clj

(annotation c x)

Access the annotation of `x` as specified by class `c`.

If `x` doesn't contain the annotation, `annotation` looks for it inside any
tokens or sentences within `x`, in that order. Annotations are generally
located at the document level, sentence level, or token level, so this
behaviour allows skipping some steps in the REPL.
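
The sketch below shows `annotation` called directly with a CoreNLP annotation class, next to the matching convenience function. It assumes that `lemma` corresponds to `CoreAnnotations$LemmaAnnotation` and that the conf map takes an `:annotators` vector; the sample text is illustrative.

```clojure
(require '[dk.simongray.datalinguist :refer [->pipeline annotation tokens lemma]])
(import '[edu.stanford.nlp.ling CoreAnnotations$LemmaAnnotation])

(def nlp (->pipeline {:annotators ["lemma"]}))
(def doc (nlp "She sells seashells."))

;; Explicit lookup by annotation class on each token.
(def by-class
  (annotation CoreAnnotations$LemmaAnnotation (tokens doc)))

;; The convenience function is assumed to perform the same lookup.
(def by-fn
  (lemma (tokens doc)))
```

Any other CoreNLP annotation class can be passed the same way, which is useful for annotations that have no convenience function in this namespace.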

constituency-tree^clj

(constituency-tree x)

(constituency-tree style x)

The constituency tree of `x`; `style` can be :kbest-trees, :binarized, or
:standard (default).

dependency-graph^clj

(dependency-graph x)

(dependency-graph style x)

The dependency graph of `x`; `style` can be :basic, :enhanced, or :enhanced++
(default).

index^clj

(index x)

(index style x)

The index of `x`; `style` can be :quote, :sentence, or :token (default).

lemma^clj

(lemma x)

The lemma of `x`.

mentions^clj

(mentions x)

The named entity mentions of `x`.

named-entity^clj

(named-entity x)

(named-entity style x)

The named entity tag of `x`; `style` can be :probs, :coarse, :fine, or
:tag (default).

numeric^clj

(numeric x)

(numeric style x)

The numeric value or type of `x`; `style` can be :normalized, :composite,
:composite-type, :composite-value, :type, or :value (default).

offset^clj

(offset x)

(offset style x)

The character offset of `x`; `style` can be :end or :begin (default).

pos^clj

(pos x)

The part-of-speech of `x`.

quotations^clj

(quotations x)

(quotations style x)

The quotations of `x`; `style` can be :unclosed or :closed (default).

recur-datafy^clj

(recur-datafy x)

Return a recursively datafied representation of `x`.
Call at the end of an annotation chain to get plain Clojure data structures.
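
A small sketch of ending a chain with `recur-datafy`, assuming the conf map takes an `:annotators` vector and that the English models are on the classpath. The exact shape of the returned data is not shown here, since it depends on the annotations present.

```clojure
(require '[dk.simongray.datalinguist :refer [->pipeline tokens recur-datafy]])

(def nlp (->pipeline {:annotators ["lemma"]}))

;; Without recur-datafy the chain returns CoreNLP Java objects;
;; with it, the result is meant to be plain Clojure data.
(def data
  (->> (nlp "She sells seashells.")
       tokens
       recur-datafy))
```

This is mainly useful for inspecting results at the REPL or for handing them to code that should not depend on CoreNLP classes.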

sentences^clj

(sentences x)

The sentences of `x`.

text^clj

(text x)

(text style x)

The text of `x`; `style` can be :true-case or :plain (default).

token-find^clj

(token-find m)

(token-find p tokens)

Return the next TokensRegex match, if any, of tokens to pattern, using
TokenSequenceMatcher.find().

token-groups^clj

(token-groups m)

Returns the groups from the most recent match/find. If there are no
nested groups, returns tokens for the entire match. If there are
nested groups, returns a vector of the groups, the first element
being the entire match.

token-matcher^clj

(token-matcher p tokens)

Create a TokenSequenceMatcher from `p` and `tokens`; use in `token-find`.

token-matches^clj

(token-matches p g)

Returns the match, if any, of tokens to pattern, using
edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher.matches().
Uses token-groups to return the groups.

token-pattern^clj

(token-pattern s)

Return an instance of TokenSequencePattern, for use in e.g. `token-matcher`.

token-seq^clj

(token-seq p tokens)

Return a lazy list of matches of TokenSequencePattern `p` in `tokens`.
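
The token functions can be combined roughly as below: compile a pattern with `token-pattern`, then collect its matches with `token-seq`. The pattern string uses CoreNLP TokensRegex syntax to match runs of noun-tagged tokens; the conf map shape, the sample text, and the names `noun-run` and `matches` are assumptions made for illustration.

```clojure
(require '[dk.simongray.datalinguist
           :refer [->pipeline tokens token-pattern token-seq]])

;; The pattern matches on POS tags, so the "pos" annotator is requested.
(def nlp (->pipeline {:annotators ["pos"]}))

;; One or more consecutive tokens whose tag starts with NN.
(def noun-run
  (token-pattern "[{tag:/NN.*/}]+"))

;; Lazy list of matches over the tokens of the document.
(def matches
  (token-seq noun-run (tokens (nlp "She sells seashells."))))
```

For stepwise matching, `token-matcher` and `token-find` are documented in this namespace as the lower-level alternative.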

tokens^clj

(tokens x)

The tokens of `x`.

triples^clj

(triples x)

(triples style x)

The triples of `x`; `style` can be :kbp or :openie (default).

true-case^clj

(true-case x)

The true case of `x`.

whitespace^clj

(whitespace x)

(whitespace style x)

The whitespace around `x`; `style` can be :after or :before (default).
