dk — dk.simongray/datalinguist 0.2.171

dk.simongray.datalinguist

Functions for building a CoreNLP pipeline and extracting text annotations.

The functions are designed to be chained using the threading macro or through function composition. Note that *any* annotation can be accessed using the basic `annotation` function; you are not limited to the convenience functions otherwise provided in this namespace.

The functions here mirror the annotation system of Stanford CoreNLP: once the return value isn't an instance of TypesafeMap or a seq of TypesafeMap objects, the annotation functions cannot retrieve anything from it. One example is `dependency-graph`, which returns a SemanticGraph object.

As a general rule, functions with pluralised names have a seqable output, e.g. `sentences` or `tokens`. This does not matter when chaining these functions, as all of the annotation functions will implicitly map over seqs.
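
To illustrate how chaining can work regardless of whether a step returns a single item or a seq, here is a minimal sketch in plain Clojure. It is not the library's implementation: ordinary maps stand in for TypesafeMap objects, and `seq-aware`, `sentences*`, `tokens*` and `text*` are hypothetical names invented for this example.

```clojure
;; Plain Clojure maps stand in for CoreNLP's TypesafeMap objects (assumption).
(def document
  {:sentences [{:tokens [{:text "Hello"} {:text "world"}]}
               {:tokens [{:text "Bye"}]}]})

(defn seq-aware
  "Hypothetical helper: returns a function that looks up k in a map,
  or maps that same lookup over every element of a sequential collection."
  [k]
  (fn lookup [x]
    (if (sequential? x)
      (map lookup x)
      (get x k))))

;; Starred names are illustrative stand-ins, not the library's functions.
(def sentences* (seq-aware :sentences))
(def tokens* (seq-aware :tokens))
(def text* (seq-aware :text))

;; Chaining with the threading macro ...
(def token-texts
  (->> document sentences* tokens* text*))

;; ... or through function composition.
(def token-texts-composed
  ((comp text* tokens* sentences*) document))
```

Both forms yield one seq of token texts per sentence, since each step maps over whatever seq the previous step produced.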

dk.simongray.datalinguist.dependency

Functions dealing with dependency grammar graphs, AKA Semantic Graphs.

CoreNLP contains some duplicate field and method names, e.g. governor is the same as source. This namespace only retains a single name for these terms.

Some easily replicated convenience function cruft has also not been retained:

  - matchPatternToVertex
  - variations on basic graph functionality, e.g. getChildList
  - isNegatedVerb, isNegatedVertex, isInConditionalContext, etc.
  - getSubgraphVertices, yield seem equal in functionality to descendants

Nor have utility functions that are trivial to replicate been retained:

  - toRecoveredSentenceString and the like
  - empty, size
  - sorting methods; just use Clojure sort, e.g. (sort (vertices g))

The methods in SemanticGraphUtils are mostly meant for internal consumption, though a few are useful enough to warrant wrapping here, e.g. subgraph.

Functions dealing with semgrex in CoreNLP (dependency grammar patterns) have been wrapped so as to mimic the existing Clojure core regex functions. The `sem-result` function also mimics `re-groups` and serves a similar purpose, although rather than returning groups it returns named nodes/relations defined in the pattern.
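
For reference, these are the core regex functions being mimicked, shown here on ordinary strings. The sketch uses only `clojure.core`; the semgrex counterparts in this namespace are not shown, and the sample pattern and text are made up for illustration.

```clojure
;; A pattern with two capture groups: year and month (illustrative only).
(def year-month #"(\d{4})-(\d{2})")
(def sample "from 2020-01 to 2021-12")

;; re-find returns the first match; when the pattern has groups, the result
;; is a vector of the whole match followed by each group.
(def first-match (re-find year-month sample))

;; re-seq returns a lazy seq of every match in the string.
(def all-matches (re-seq year-month sample))

;; re-matches only succeeds if the entire string matches the pattern.
(def whole-match (re-matches year-month "2020-01"))
(def no-match (re-matches year-month sample))

;; A stateful matcher: each re-find call advances to the next match, and
;; re-groups returns the groups of the most recent successful match.
(def m (re-matcher year-month sample))
(re-find m)
(def groups-after-first (re-groups m))
(re-find m)
(def groups-after-second (re-groups m))
```

The matcher-based calls at the end are the ones `re-groups` depends on: it reads its result from the matcher's current state rather than from the string.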

Additionally, any mutating functions have deliberately not been wrapped!

dk.simongray.datalinguist.loom.io

This namespace contains an implementation of `loom.io/view` that uses `dk.simongray.datalinguist.dependency/formatted-string` instead of `loom.io/dot-str`.

The main reason for this is that `loom.io/render-to-bytes` explicitly requires graphs made of pure data, so the function won't work with e.g. SemanticGraph even though the class satisfies loom's Graph protocol.

See loom.io for documentation of the functions.

dk.simongray.datalinguist.tree

Everything to do with trees, chiefly of the constituency grammar kind.

Functions dealing with tregex in CoreNLP (constituency grammar patterns) have been wrapped so as to mimic the existing Clojure core regex functions. The `tregex-result` function also mimics `re-groups` and serves a similar purpose, although rather than returning groups it returns named nodes defined in the pattern.
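
A loose analogy for returning named nodes instead of positional groups is a regex with named capture groups, read back by name through Java interop. This is only a sketch on ordinary strings: `named-groups` is a hypothetical helper, it is not part of this namespace, and it says nothing about how `tregex-result` is implemented.

```clojure
(defn named-groups
  "Hypothetical helper: finds the first match of pattern in s and returns
  a map from each requested group name (as a keyword) to its matched text,
  or nil when there is no match."
  [pattern group-names s]
  (let [^java.util.regex.Matcher m (re-matcher pattern s)]
    (when (.find m)
      (into {}
            (map (fn [n] [(keyword n) (.group m ^String n)]))
            group-names))))

;; The pattern names its two groups, so results can be read by name
;; rather than by position.
(def result
  (named-groups #"(?<det>the|a) (?<noun>\w+)"
                ["det" "noun"]
                "a dog chased the cat"))
```

Here `result` holds the first determiner and noun pair found in the sample string, keyed by the names given in the pattern.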

dk.simongray.datalinguist.triple

Functions dealing with (subject; relation; object) triples.

dk.simongray.datalinguist.util

Various utility functions used by the other namespaces, along with collections of more or less static data.
