Functions for building a CoreNLP pipeline and extracting text annotations. The functions are designed to be chained using the threading macro or through function composition. Note that *any* annotation can be accessed using the basic `annotation` function; you are not limited to the convenience functions otherwise provided in this namespace.

The functions here mirror the annotation system of Stanford CoreNLP: once the return value isn't an instance of TypesafeMap or a seq of TypesafeMap objects, the annotation functions cannot retrieve anything from it. One example of this is `dependency-graph`, which returns a SemanticGraph object.

As a general rule, functions with pluralised names have a seqable output, e.g. `sentences` or `tokens`. This does not matter when chaining these functions, as all of the annotation functions will implicitly map over seqs.
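The chaining behaviour described above can be illustrated with a small model in plain Clojure. This is only a sketch under stated assumptions: plain maps stand in for TypesafeMap objects, keywords stand in for annotation classes, and `annotate*`, `sentences*`, `tokens*`, and `text*` are hypothetical names, not the library's own functions.

```clojure
;; Hypothetical model: maps play the role of TypesafeMap objects and
;; keywords play the role of annotation classes.
(defn annotate*
  "Look up k in x, or in every element of x when x is a seq."
  [k x]
  (if (map? x)
    (get x k)
    (map #(annotate* k %) x)))

(def sentences* (partial annotate* :sentences))
(def tokens*    (partial annotate* :tokens))
(def text*      (partial annotate* :text))

;; A hand-written stand-in for an annotated document.
(def doc
  {:text      "Hi there. Bye."
   :sentences [{:text   "Hi there."
                :tokens [{:text "Hi"} {:text "there"} {:text "."}]}
               {:text   "Bye."
                :tokens [{:text "Bye"} {:text "."}]}]})

;; Chaining with the threading macro: each step maps over the seq
;; produced by the previous one, so the nesting is preserved.
(def token-texts (-> doc sentences* tokens* text*))

;; The same chain written as function composition.
(def token-texts-comp ((comp text* tokens* sentences*) doc))
```

In this model both chains produce one seq of token texts per sentence, which is why the singular or plural naming of a function does not affect how the functions compose.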
(->pipeline conf)
Wrap a closure around the CoreNLP pipeline specified in the `conf` map. The returned function annotates input text with the specified annotators, along with any annotators they depend on that were not listed explicitly.
(annotation c x)
Access the annotation of `x` as specified by class `c`. If `x` doesn't contain the annotation, the function tries to find it inside any tokens or sentences within `x`, in that order. Annotations are generally located at the document, sentence, or token level, so this behaviour allows skipping some steps in the REPL.
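The fallback order described above can be modelled in plain Clojure. This is a simplified sketch, not the library's implementation: plain maps stand in for TypesafeMap objects, keywords stand in for annotation classes, and `annotation*` and the keys `:pos` and `:sentence-index` are hypothetical names chosen for illustration.

```clojure
;; Hypothetical model of the lookup order: the value itself first,
;; then its tokens, then its sentences.
(defn annotation*
  [k x]
  (cond
    (contains? x k)          (get x k)
    (contains? x :tokens)    (map #(annotation* k %) (:tokens x))
    (contains? x :sentences) (map #(annotation* k %) (:sentences x))))

;; A hand-written stand-in for an annotated document.
(def doc
  {:text      "Anna sleeps. Wow."
   :sentences [{:sentence-index 0
                :tokens [{:pos "NNP"} {:pos "VBZ"} {:pos "."}]}
               {:sentence-index 1
                :tokens [{:pos "UH"} {:pos "."}]}]})

;; Document-level annotation: found directly on the document.
(def doc-text (annotation* :text doc))

;; Sentence-level annotation: found by descending into the sentences.
(def sentence-indices (annotation* :sentence-index doc))

;; Token-level annotation: found by descending through sentences to tokens.
(def pos-tags (annotation* :pos doc))
```

Asking the document for a token-level key returns one seq per sentence, which corresponds to skipping the intermediate sentence and token steps at the REPL.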
(constituency-tree x)
(constituency-tree style x)
The constituency tree of `x`; `style` can be :kbest-trees, :binarized, or :standard (default).
(dependency-graph x)
(dependency-graph style x)
The dependency graph of `x`; `style` can be :basic, :enhanced, or :enhanced++ (default).
(index x)
(index style x)
The index of `x`; `style` can be :quote, :sentence, or :token (default).
(mentions x)
The named entity mentions of `x`.
(named-entity x)
(named-entity style x)
The named entity tag of `x`; `style` can be :probs, :coarse, :fine, or :tag (default).
(numeric x)
(numeric style x)
The numeric value or type of `x`; `style` can be :normalized, :composite, :composite-type, :composite-value, :type, or :value (default).
(offset x)
(offset style x)
The character offset of `x`; `style` can be :end or :begin (default).
(quotations x)
(quotations style x)
The quotations of `x`; `style` can be :unclosed or :closed (default).
(recur-datafy x)
Return a recursively datafied representation of `x`. Call at the end of an annotation chain to get plain Clojure data structures.
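As a rough illustration of what recursive datafication means, the sketch below uses Clojure's standard `clojure.datafy/datafy` on two stand-in types. It is an assumption-laden model rather than the library's implementation: `Token`, `Sentence`, and `recur-datafy*` are hypothetical names, and the real function operates on CoreNLP objects.

```clojure
(require '[clojure.core.protocols :as p]
         '[clojure.datafy :refer [datafy]])

;; Hypothetical stand-ins for opaque annotation objects.
(deftype Token [word pos]
  p/Datafiable
  (datafy [_] {:word word :pos pos}))

(deftype Sentence [text tokens]
  p/Datafiable
  (datafy [_] {:text text :tokens tokens}))

(defn recur-datafy*
  "Datafy x, then keep datafying nested values until only plain data remains."
  [x]
  (let [d (datafy x)]
    (cond
      (map? d)        (into {} (map (fn [[k v]] [k (recur-datafy* v)])) d)
      (sequential? d) (mapv recur-datafy* d)
      :else           d)))

(def sentence
  (Sentence. "Hi there" [(Token. "Hi" "UH") (Token. "there" "RB")]))

;; A single datafy call only converts the outer object; the recursive
;; version also converts the Token objects nested inside it.
(def shallow (datafy sentence))
(def plain   (recur-datafy* sentence))
```

Here `shallow` still holds `Token` instances under `:tokens`, while `plain` consists only of maps, vectors, and strings.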
(text x)
(text style x)
The text of `x`; `style` can be :true-case or :plain (default).
(token-find m)
(token-find p tokens)
Return the next TokensRegex match, if any, of tokens to pattern, using TokenSequenceMatcher.find().
(token-groups m)
Returns the groups from the most recent match/find. If there are no nested groups, returns tokens for the entire match. If there are nested groups, returns a vector of the groups, the first element being the entire match.
(token-matcher p tokens)
Create a TokenSequenceMatcher from `p` and `tokens`; use in token-find.
(token-matches p g)
Returns the match, if any, of tokens to pattern, using edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher.matches(). Uses token-groups to return the groups.
(token-pattern s)
Return an instance of TokenSequencePattern, for use, e.g. in token-matcher.
(token-seq p tokens)
Return a lazy list of matches of TokenSequencePattern `p` in `tokens`.
(triples x)
(triples style x)
The triples of `x`; `style` can be :kbp or :openie (default).
(whitespace x)
(whitespace style x)
The whitespace around `x`; `style` can be :after or :before (default).