
lucene.custom.text-analysis


doc->graph

(doc->graph doc analyzer)

Each field is analyzed into a token graph. Params:

  • doc: flat associative data type (e.g. a map of field name to text)
  • analyzer: a Lucene Analyzer, but probably you want a PerFieldAnalyzerWrapper
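A minimal usage sketch, assuming Lucene and this namespace are on the classpath (the `analysis` alias and the sample doc are illustrative assumptions; `StandardAnalyzer` is a standard Lucene class):

```clojure
(require '[lucene.custom.text-analysis :as analysis])
(import '(org.apache.lucene.analysis.standard StandardAnalyzer))

;; A doc is just a flat map of field name to text.
(def doc {:title "Hello Lucene" :body "token graphs"})

;; Returns a map with the same keys, each value describing
;; that field's token graph.
(analysis/doc->graph doc (StandardAnalyzer.))
```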

doc->token-strings

(doc->token-strings doc analyzer)

Given a document, iterates through all its fields, applies an analyzer to each field, and returns a map with the same keys and the analyzed text. TIP: the analyzer should probably be a PerFieldAnalyzerWrapper.
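A sketch of the PerFieldAnalyzerWrapper tip, assuming Lucene on the classpath; the field-to-analyzer mapping shown here is an illustrative assumption:

```clojure
(require '[lucene.custom.text-analysis :as analysis])
(import '(org.apache.lucene.analysis.standard StandardAnalyzer)
        '(org.apache.lucene.analysis.core WhitespaceAnalyzer)
        '(org.apache.lucene.analysis.miscellaneous PerFieldAnalyzerWrapper))

;; Analyze the "title" field with a whitespace analyzer,
;; and every other field with the StandardAnalyzer default.
(def per-field
  (PerFieldAnalyzerWrapper.
    (StandardAnalyzer.)
    {"title" (WhitespaceAnalyzer.)}))

;; Returns a map with the same keys and vectors of token strings.
(analysis/doc->token-strings {:title "Foo Bar" :body "BAZ quux"} per-field)
```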

doc->tokens

(doc->tokens doc analyzer)

Each field is analyzed into tokens. Params:

  • doc: flat associative data type (e.g. a map of field name to text)
  • analyzer: a Lucene Analyzer, but probably you want a PerFieldAnalyzerWrapper
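Same calling convention as `doc->token-strings`, but yielding token data rather than plain strings; a hedged sketch assuming a StandardAnalyzer:

```clojure
(require '[lucene.custom.text-analysis :as analysis])
(import '(org.apache.lucene.analysis.standard StandardAnalyzer))

;; Returns a map with the same keys, each value a sequence of
;; token maps (term, type, offsets, position).
(analysis/doc->tokens {:title "Hello Lucene"} (StandardAnalyzer.))
```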

normalize

(normalize text)
(normalize text analyzer)
(normalize text analyzer field-name)

Given a text, invokes Analyzer::normalize on it. Returns a String representation of a BytesRef.
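A sketch, assuming Lucene on the classpath. Lucene's `Analyzer::normalize` applies only the analyzer's normalizing filters (e.g. lowercasing for StandardAnalyzer) without tokenizing the input:

```clojure
(require '[lucene.custom.text-analysis :as analysis])
(import '(org.apache.lucene.analysis.standard StandardAnalyzer))

;; Normalizes the whole text as a single term; with StandardAnalyzer
;; this should lowercase it without splitting on whitespace.
(analysis/normalize "HeLLo World" (StandardAnalyzer.))
```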

normalize-doc

(normalize-doc doc analyzer)

Normalizes each field with an analyzer. Params:

  • doc: flat associative data type (e.g. a map of field name to text)
  • analyzer: a Lucene Analyzer, but probably you want a PerFieldAnalyzerWrapper
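The per-document counterpart of `normalize`; a hedged sketch assuming a StandardAnalyzer for all fields:

```clojure
(require '[lucene.custom.text-analysis :as analysis])
(import '(org.apache.lucene.analysis.standard StandardAnalyzer))

;; Returns a map with the same keys and each field's normalized text.
(analysis/normalize-doc {:title "HeLLo" :body "WoRLD"} (StandardAnalyzer.))
```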

text->graph

(text->graph text)
(text->graph text analyzer)
(text->graph text analyzer field-name)

Given a text (and an optional analyzer) turns the text into a TokenStream that is rendered as a program in the `dot` graph language, e.g.:

digraph tokens {
  graph [ fontsize=30 labelloc="t" label="" splines=true overlap=false rankdir = "LR" ];
  // A2 paper size
  size = "34.4,16.5";
  edge [ fontname="Helvetica" fontcolor="red" color="#606060" ]
  node [ style="filled" fillcolor="#e8e8f0" shape="Mrecord" fontname="Helvetica" ]

  0 [label="0"]
  -1 [shape=point color=white]
  -1 -> 0 []
  0 -> 1 [ label="foobarbazs / fooBarBazs"]
  -2 [shape=point color=white]
  1 -> -2 []
}
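Since the return value is a `dot` program as a string, it can be written to a file and rendered with Graphviz; a sketch (file name is an assumption):

```clojure
(require '[lucene.custom.text-analysis :as analysis])

;; Write the token graph of a text to a .dot file,
;; using the namespace's default analyzer (single-arity call).
(spit "tokens.dot" (analysis/text->graph "fooBarBazs"))

;; Then render it with Graphviz from a shell:
;;   dot -Tpng tokens.dot -o tokens.png
```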

text->token-strings

(text->token-strings text)
(text->token-strings text analyzer)
(text->token-strings text analyzer field-name)

Given a text (and an optional analyzer) returns a vector of tokens as strings. Params:

  • text: String
  • analyzer: a Lucene Analyzer
  • field-name: either a String or a clojure.lang.Named (e.g. a keyword or symbol)
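A sketch of the three arities, assuming Lucene on the classpath. The `field-name` argument matters when the analyzer is a PerFieldAnalyzerWrapper, since Lucene analysis is field-aware:

```clojure
(require '[lucene.custom.text-analysis :as analysis])
(import '(org.apache.lucene.analysis.standard StandardAnalyzer))

;; Default analyzer.
(analysis/text->token-strings "Hello, World!")

;; Explicit analyzer.
(analysis/text->token-strings "Hello, World!" (StandardAnalyzer.))

;; Explicit analyzer and field name; a keyword works because
;; field-name may be a clojure.lang.Named.
(analysis/text->token-strings "Hello, World!" (StandardAnalyzer.) :title)
```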

text->tokens

(text->tokens text)
(text->tokens text analyzer)
(text->tokens text analyzer field-name)

Given a text (and an optional analyzer) returns a list of tokens as maps of shape:

{:token "pre",
 :type "<ALPHANUM>",
 :start_offset 0,
 :end_offset 3,
 :position 0,
 :positionLength 1}

Params:

  • text: String
  • analyzer: a Lucene Analyzer
  • field-name: either a String or a clojure.lang.Named (e.g. a keyword or symbol)
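A sketch showing how the token maps can be inspected, assuming Lucene on the classpath; the offsets in the docstring's example correspond to the token's character span in the input:

```clojure
(require '[lucene.custom.text-analysis :as analysis])
(import '(org.apache.lucene.analysis.standard StandardAnalyzer))

;; Pull out just the term and its character span from each token map.
(->> (analysis/text->tokens "pre post" (StandardAnalyzer.))
     (map (juxt :token :start_offset :end_offset)))
```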

cljdoc is a website building & hosting documentation for Clojure/Script libraries
