
datalevin.search-utils

Some useful utility functions that can be passed as options to the search
engine to customize search.
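
A typical use is to compose an analyzer from these utilities and hand it to
the search engine when it is created. A minimal sketch, assuming Datalevin's
kv-store and search API (`open-kv`, `new-search-engine`, `add-doc`, `search`)
and that the engine accepts the analyzer under an `:analyzer` option key:

```clojure
(require '[datalevin.core :as d]
         '[datalevin.search-utils :as su])

;; Open a key-value store and create a search engine whose analyzer
;; lower-cases whitespace-separated tokens.
(def lmdb (d/open-kv "/tmp/search-db"))
(def engine
  (d/new-search-engine
    lmdb
    {:analyzer (su/create-analyzer
                 {:tokenizer     (su/create-regexp-tokenizer #"\s+")
                  :token-filters [su/lower-case-token-filter]})}))

(d/add-doc engine 1 "The quick red Fox jumped over the lazy red dogs.")
(d/search engine "fox") ;; => e.g. [1]
```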

create-analyzer

(create-analyzer {:keys [tokenizer token-filters]})

Creates an analyzer fn ready for use in search.

`opts` has the following keys:

* `:tokenizer` is a tokenizing fn that takes a string and returns a seq of
[term, position, offset], where term is a word, position is the sequence
number of the term, and offset is the character offset of the term.
`create-regexp-tokenizer` produces such a fn.

* `:token-filters` is an ordered list of token filters. A token filter
receives a token [term, position, offset] and returns a transformed list of
tokens to replace it with.
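
A sketch of composing an analyzer and calling it directly. The exact
positions and offsets shown are illustrative; they depend on the tokenizer
and on how the filters rewrite tokens:

```clojure
(require '[datalevin.search-utils :as su])

(def analyzer
  (su/create-analyzer
    {:tokenizer     (su/create-regexp-tokenizer #"\s+")
     ;; Filters run in order: lower-case first, then drop stop words.
     :token-filters [su/lower-case-token-filter
                     su/en-stop-words-token-filter]}))

;; An analyzer maps a string to a seq of [term position offset] triples.
(analyzer "The Quick Brown Fox")
;; => e.g. (["quick" 1 4] ["brown" 2 10] ["fox" 3 16])
```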

create-max-length-token-filter

(create-max-length-token-filter max-length)

Filters out tokens that are strictly longer than `max-length`.
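
A sketch of applying the filter to single tokens, assuming it follows the
token-filter contract described under `create-analyzer` (a token in, a list
of tokens out):

```clojure
(def drop-long (su/create-max-length-token-filter 10))

;; Tokens longer than 10 characters are removed; others pass through.
(drop-long ["antidisestablishmentarianism" 0 0]) ;; => ()
(drop-long ["fox" 0 0])                          ;; => (["fox" 0 0])
```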

create-min-length-token-filter

(create-min-length-token-filter min-length)

Filters out tokens that are strictly shorter than `min-length`.
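
The mirror image of the max-length filter, under the same assumed
token-filter contract:

```clojure
(def drop-short (su/create-min-length-token-filter 3))

;; Tokens shorter than 3 characters are removed; others pass through.
(drop-short ["ox" 0 0])  ;; => ()
(drop-short ["fox" 0 0]) ;; => (["fox" 0 0])
```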

create-ngram-token-filter

(create-ngram-token-filter gram-size)
(create-ngram-token-filter min-gram-size max-gram-size)

Produces character ngrams between min and max size from the token and returns
everything as tokens. This is useful for producing efficient fuzzy search.
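
A sketch with a 2-to-3-gram filter. Which position and offset the generated
grams carry is an assumption here; only the set of grams follows from the
description above:

```clojure
(def fuzzy (su/create-ngram-token-filter 2 3))

;; Character ngrams of "fox" with sizes 2..3: "fo", "ox", "fox".
(fuzzy ["fox" 0 0])
;; => e.g. (["fo" 0 0] ["ox" 0 0] ["fox" 0 0])
```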

create-regexp-tokenizer

(create-regexp-tokenizer pat)

Creates a tokenizer that splits the given text on the supplied pattern and
returns valid tokens.
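
For example, splitting on whitespace. The position and offset values shown
assume 0-based counting:

```clojure
(def tokenize (su/create-regexp-tokenizer #"\s+"))

;; Terms are the text between matches of the pattern; offsets are
;; character offsets into the input string.
(tokenize "hello wonderful world")
;; => e.g. (["hello" 0 0] ["wonderful" 1 6] ["world" 2 16])
```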

en-stop-words-token-filter

This token filter removes "empty" tokens (English stop words).
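
A sketch, assuming the filter drops any token whose term is an English stop
word and passes other tokens through unchanged:

```clojure
;; "the" is an English stop word, so the token is dropped.
(su/en-stop-words-token-filter ["the" 0 0]) ;; => ()
(su/en-stop-words-token-filter ["fox" 1 4]) ;; => (["fox" 1 4])
```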

lower-case-token-filter

This token filter converts tokens to lower case.
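
A sketch of the filter on a single token. Since stop-word lists are
typically lower case, this filter usually comes before
`en-stop-words-token-filter` in `:token-filters`:

```clojure
(su/lower-case-token-filter ["Fox" 0 0]) ;; => (["fox" 0 0])
```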

prefix-token-filter

Produces every possible prefix of a token and replaces the token with them.

For example: vault -> v, va, vau, vaul, vault

This is useful for producing efficient autocomplete engines, provided this
filter is NOT applied at query time.
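
A sketch of the filter on a single token, plus an assumed split between
index-time and query-time analyzers (how a separate query-time analyzer is
wired into the engine is not covered here):

```clojure
(su/prefix-token-filter ["vault" 0 0])
;; => e.g. (["v" 0 0] ["va" 0 0] ["vau" 0 0] ["vaul" 0 0] ["vault" 0 0])

;; Index with prefixes, but query without them, so that the query
;; term "vau" matches the stored prefix "vau" of "vault".
(def index-analyzer
  (su/create-analyzer
    {:tokenizer     (su/create-regexp-tokenizer #"\s+")
     :token-filters [su/lower-case-token-filter
                     su/prefix-token-filter]}))

(def query-analyzer
  (su/create-analyzer
    {:tokenizer     (su/create-regexp-tokenizer #"\s+")
     :token-filters [su/lower-case-token-filter]}))
```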

unaccent-token-filter

This token filter removes accents and diacritics from tokens.
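
A sketch on accented tokens:

```clojure
(su/unaccent-token-filter ["café" 0 0])  ;; => (["cafe" 0 0])
(su/unaccent-token-filter ["naïve" 1 5]) ;; => (["naive" 1 5])
```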
