(Micro)Library to build Lucene analyzers in a data-driven fashion.
lucene-custom-analyzer
Dependencies:
lt.jocas/lucene-custom-analyzer {:mvn/version "1.0.34"}
Code:
(require '[lucene.custom.analyzer :as custom-analyzer])
(custom-analyzer/create
{:tokenizer {:standard {:maxTokenLength 4}}
:char-filters [{:patternReplace {:pattern "foo"
:replacement "foo"}}]
:token-filters [{:uppercase nil}
{:reverseString nil}]
:offset-gap 2
:position-increment-gap 3
:config-dir "."})
;; =>
;; #object[org.apache.lucene.analysis.custom.CustomAnalyzer
;; 0x4686f87d
;; "CustomAnalyzer(org.apache.lucene.analysis.pattern.PatternReplaceCharFilterFactory@2f1300,org.apache.lucene.analysis.standard.StandardTokenizerFactory@7e71a244,org.apache.lucene.analysis.core.UpperCaseFilterFactory@54e9f0d6,org.apache.lucene.analysis.reverse.ReverseStringFilterFactory@3e494ba7)"]
Short notation for analysis components:
(custom-analyzer/create
{:tokenizer :standard
:char-filters [:htmlStrip]
:token-filters [:uppercase]})
;; =>
;; #object[org.apache.lucene.analysis.custom.CustomAnalyzer
;; 0x16716eb1
;; "CustomAnalyzer(org.apache.lucene.analysis.charfilter.HTMLStripCharFilterFactory@4c7f61fa,org.apache.lucene.analysis.standard.StandardTokenizerFactory@6fc69052,org.apache.lucene.analysis.core.UpperCaseFilterFactory@3944ccba)"]
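The printed object does not show what the analyzer does to text. A minimal sketch of inspecting its output, using only the standard Lucene `TokenStream` API: the helper name `tokens` and the field name `"field"` are illustrative assumptions, not part of this library.

```clojure
(require '[lucene.custom.analyzer :as custom-analyzer])
(import '(org.apache.lucene.analysis Analyzer TokenStream)
        '(org.apache.lucene.analysis.tokenattributes CharTermAttribute))

(defn tokens
  "Hypothetical helper: runs `text` through `analyzer` and returns the
  emitted terms as a vector of strings."
  [^Analyzer analyzer ^String text]
  ;; "field" is an arbitrary field name; the analyzers built here do not
  ;; vary their behavior per field.
  (with-open [^TokenStream ts (.tokenStream analyzer "field" text)]
    (let [term (.addAttribute ts CharTermAttribute)]
      (.reset ts) ; required before the first incrementToken call
      (loop [acc []]
        (if (.incrementToken ts)
          (recur (conj acc (str term))) ; toString yields the term text
          (do (.end ts) acc))))))

(def analyzer
  (custom-analyzer/create
    {:tokenizer :standard
     :char-filters [:htmlStrip]
     :token-filters [:uppercase]}))

(tokens analyzer "<b>Hello</b> world")
;; => ["HELLO" "WORLD"]
```

The char filter strips the markup, the standard tokenizer splits on whitespace, and the uppercase filter transforms each term, so the order of effects matches the order of the component classes in the printed `CustomAnalyzer`.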
If no options are provided then an Analyzer with just the standard tokenizer is created:
(custom-analyzer/create {})
;; =>
;; #object[org.apache.lucene.analysis.custom.CustomAnalyzer
;; 0x456fe86
;; "CustomAnalyzer(org.apache.lucene.analysis.standard.StandardTokenizerFactory@5703f5b3)"]
To check which analysis components are available, run:
(lucene.custom.analyzer/char-filter-factories)
(lucene.custom.analyzer/tokenizer-factories)
(lucene.custom.analyzer/token-filter-factories)
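The return shape of these functions is not described here. As a cross-check, the component names can also be read directly from Lucene's own factory classes. This is a sketch assuming Lucene 9 or later, where the factories live in `org.apache.lucene.analysis`; the function name `available-component-names` is made up for illustration.

```clojure
(import '(org.apache.lucene.analysis CharFilterFactory
                                     TokenFilterFactory
                                     TokenizerFactory))

(defn available-component-names
  "Hypothetical helper: returns the names Lucene has discovered on the
  classpath, grouped by component kind and sorted alphabetically."
  []
  {:char-filters  (sort (CharFilterFactory/availableCharFilters))
   :tokenizers    (sort (TokenizerFactory/availableTokenizers))
   :token-filters (sort (TokenFilterFactory/availableTokenFilters))})

;; Print the tokenizer names, one per line.
(run! println (:tokenizers (available-component-names)))
```

The exact list depends on which Lucene analysis modules are on the classpath.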
Under the hood this library uses the factory classes TokenizerFactory, TokenFilterFactory, and CharFilterFactory.
The actual factories are loaded with java.util.ServiceLoader.
All the available classes are automatically discovered.
If you want to include additional factory classes, e.g. your own implementation of TokenFilterFactory,
you need to add two things to the classpath:
1. The class(es) implementing org.apache.lucene.analysis.TokenFilterFactory.
2. Under META-INF/services, add or change a file named org.apache.lucene.analysis.TokenFilterFactory
that lists the classes from step 1.
An example can be found here.
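A rough sketch of what the service file might look like and how to verify that a factory was picked up. The class name `com.example.MyTokenFilterFactory`, the `resources` directory, and the helper `token-filter-registered?` are hypothetical; only `TokenFilterFactory/availableTokenFilters` is an actual Lucene call (Lucene 9+ package assumed).

```clojure
(import '(org.apache.lucene.analysis TokenFilterFactory))

;; Hypothetical file on the classpath:
;;   resources/META-INF/services/org.apache.lucene.analysis.TokenFilterFactory
;; Its content is one fully qualified class name per line, for example:
;;   com.example.MyTokenFilterFactory

(defn token-filter-registered?
  "Hypothetical helper: true when Lucene has discovered a token filter
  factory under `filter-name` (compared case-insensitively)."
  [^String filter-name]
  (boolean (some #(.equalsIgnoreCase ^String % filter-name)
                 (TokenFilterFactory/availableTokenFilters))))

;; Check a built-in filter first, then the name of your own factory.
(token-filter-registered? "uppercase")
```

If the check returns false for your own factory, the usual suspects are a misspelled class name in the service file or the file not being on the classpath at runtime.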
Copyright © 2023 Dainius Jocas.
Distributed under The Apache License, Version 2.0.