
datalevin.search-utils

Some useful utility functions that can be passed as options to the search
engine to customize search.
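
A typical use is to compose an analyzer from these utilities and hand it to
the search engine when it is created. A minimal sketch, assuming Datalevin's
kv-store and search API (`open-kv`, `new-search-engine`, `add-doc`, `search`)
and that the engine accepts the analyzer under an `:analyzer` option key:

```clojure
(require '[datalevin.core :as d]
         '[datalevin.search-utils :as su])

;; Open a key-value store and create a search engine whose analyzer
;; lower-cases whitespace-separated tokens.
(def lmdb (d/open-kv "/tmp/search-db"))
(def engine
  (d/new-search-engine
    lmdb
    {:analyzer (su/create-analyzer
                 {:tokenizer     (su/create-regexp-tokenizer #"\s+")
                  :token-filters [su/lower-case-token-filter]})}))

(d/add-doc engine 1 "The quick red Fox jumped over the lazy red dogs.")
(d/search engine "fox") ;; => e.g. [1]
```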

create-analyzer

(create-analyzer {:keys [tokenizer token-filters]})

Creates an analyzer fn ready for use in search.

`opts` has the following keys:

* `:tokenizer` is a tokenizing fn that takes a string and returns a seq of
[term, position, offset], where term is a word, position is the sequence
number of the term, and offset is the character offset of the term.
`create-regexp-tokenizer` produces such a fn.

* `:token-filters` is an ordered list of token filters. A token filter
receives a token [term, position, offset] and returns a transformed list of
tokens to replace it with.
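
A sketch of composing an analyzer and calling it directly. The exact
positions and offsets shown are illustrative; they depend on the tokenizer
and on how the filters rewrite tokens:

```clojure
(require '[datalevin.search-utils :as su])

(def analyzer
  (su/create-analyzer
    {:tokenizer     (su/create-regexp-tokenizer #"\s+")
     ;; Filters run in order: lower-case first, then drop stop words.
     :token-filters [su/lower-case-token-filter
                     su/en-stop-words-token-filter]}))

;; An analyzer maps a string to a seq of [term position offset] triples.
(analyzer "The Quick Brown Fox")
;; => e.g. (["quick" 1 4] ["brown" 2 10] ["fox" 3 16])
```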

create-max-length-token-filter

(create-max-length-token-filter max-length)

Filters out tokens that are strictly longer than `max-length`.
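
A sketch of applying the filter to single tokens, assuming it follows the
token-filter contract described under `create-analyzer` (a token in, a list
of tokens out):

```clojure
(def drop-long (su/create-max-length-token-filter 10))

;; Tokens longer than 10 characters are removed; others pass through.
(drop-long ["antidisestablishmentarianism" 0 0]) ;; => ()
(drop-long ["fox" 0 0])                          ;; => (["fox" 0 0])
```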

create-min-length-token-filter

(create-min-length-token-filter min-length)

Filters out tokens that are strictly shorter than `min-length`.
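
The mirror image of the max-length filter, under the same assumed
token-filter contract:

```clojure
(def drop-short (su/create-min-length-token-filter 3))

;; Tokens shorter than 3 characters are removed; others pass through.
(drop-short ["ox" 0 0])  ;; => ()
(drop-short ["fox" 0 0]) ;; => (["fox" 0 0])
```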

create-ngram-token-filter

(create-ngram-token-filter gram-size)
(create-ngram-token-filter min-gram-size max-gram-size)

Produces character ngrams between min and max size from the token and returns
everything as tokens. This is useful for producing efficient fuzzy search.
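
A sketch with a 2-to-3-gram filter. Which position and offset the generated
grams carry is an assumption here; only the set of grams follows from the
description above:

```clojure
(def fuzzy (su/create-ngram-token-filter 2 3))

;; Character ngrams of "fox" with sizes 2..3: "fo", "ox", "fox".
(fuzzy ["fox" 0 0])
;; => e.g. (["fo" 0 0] ["ox" 0 0] ["fox" 0 0])
```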

create-regexp-tokenizer

(create-regexp-tokenizer pat)

Creates a tokenizer that splits the given text on the supplied pattern and
returns valid tokens.
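
For example, splitting on whitespace. The position and offset values shown
assume 0-based counting:

```clojure
(def tokenize (su/create-regexp-tokenizer #"\s+"))

;; Terms are the text between matches of the pattern; offsets are
;; character offsets into the input string.
(tokenize "hello wonderful world")
;; => e.g. (["hello" 0 0] ["wonderful" 1 6] ["world" 2 16])
```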

en-stop-words-token-filter

This token filter removes "empty" tokens (English stop words).
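
A sketch, assuming the filter drops any token whose term is an English stop
word and passes other tokens through unchanged:

```clojure
;; "the" is an English stop word, so the token is dropped.
(su/en-stop-words-token-filter ["the" 0 0]) ;; => ()
(su/en-stop-words-token-filter ["fox" 1 4]) ;; => (["fox" 1 4])
```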

lower-case-token-filter

This token filter converts tokens to lower case.
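
A sketch of the filter on a single token. Since stop-word lists are
typically lower case, this filter usually comes before
`en-stop-words-token-filter` in `:token-filters`:

```clojure
(su/lower-case-token-filter ["Fox" 0 0]) ;; => (["fox" 0 0])
```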

prefix-token-filter

Produces every possible prefix of a token and replaces the token with them.

For example: vault -> v, va, vau, vaul, vault

This is useful for producing efficient autocomplete engines, provided this
filter is NOT applied at query time.
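
A sketch of the filter on a single token, plus an assumed split between
index-time and query-time analyzers (how a separate query-time analyzer is
wired into the engine is not covered here):

```clojure
(su/prefix-token-filter ["vault" 0 0])
;; => e.g. (["v" 0 0] ["va" 0 0] ["vau" 0 0] ["vaul" 0 0] ["vault" 0 0])

;; Index with prefixes, but query without them, so that the query
;; term "vau" matches the stored prefix "vau" of "vault".
(def index-analyzer
  (su/create-analyzer
    {:tokenizer     (su/create-regexp-tokenizer #"\s+")
     :token-filters [su/lower-case-token-filter
                     su/prefix-token-filter]}))

(def query-analyzer
  (su/create-analyzer
    {:tokenizer     (su/create-regexp-tokenizer #"\s+")
     :token-filters [su/lower-case-token-filter]}))
```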

unaccent-token-filter

This token filter removes accents and diacritics from tokens.
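
A sketch on accented tokens:

```clojure
(su/unaccent-token-filter ["café" 0 0])  ;; => (["cafe" 0 0])
(su/unaccent-token-filter ["naïve" 1 5]) ;; => (["naive" 1 5])
```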
