(Micro)Library to build Lucene analyzers in a data-driven fashion.
lucene-custom-analyzer
Dependencies:
lt.jocas/lucene-custom-analyzer {:mvn/version "1.0.34"}
Code:
(require '[lucene.custom.analyzer :as custom-analyzer])
(custom-analyzer/create
  {:tokenizer {:standard {:maxTokenLength 4}}
   :char-filters [{:patternReplace {:pattern "foo"
                                    :replacement "foo"}}]
   :token-filters [{:uppercase nil}
                   {:reverseString nil}]
   :offset-gap 2
   :position-increment-gap 3
   :config-dir "."})
;; =>
;; #object[org.apache.lucene.analysis.custom.CustomAnalyzer
;; 0x4686f87d
;; "CustomAnalyzer(org.apache.lucene.analysis.pattern.PatternReplaceCharFilterFactory@2f1300,org.apache.lucene.analysis.standard.StandardTokenizerFactory@7e71a244,org.apache.lucene.analysis.core.UpperCaseFilterFactory@54e9f0d6,org.apache.lucene.analysis.reverse.ReverseStringFilterFactory@3e494ba7)"]
Short notation for analysis components:
(custom-analyzer/create
  {:tokenizer :standard
   :char-filters [:htmlStrip]
   :token-filters [:uppercase]})
;; =>
;; #object[org.apache.lucene.analysis.custom.CustomAnalyzer
;; 0x16716eb1
;; "CustomAnalyzer(org.apache.lucene.analysis.charfilter.HTMLStripCharFilterFactory@4c7f61fa,org.apache.lucene.analysis.standard.StandardTokenizerFactory@6fc69052,org.apache.lucene.analysis.core.UpperCaseFilterFactory@3944ccba)"]
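To see what a created analyzer actually does to text, it can help to run a string through it and collect the emitted terms. The following is a minimal sketch using plain Lucene interop; the `analyze` helper and the field name "field" are illustrative assumptions and are not part of this library.

```clojure
(require '[lucene.custom.analyzer :as custom-analyzer])
(import '(org.apache.lucene.analysis Analyzer TokenStream)
        '(org.apache.lucene.analysis.tokenattributes CharTermAttribute))

;; Hypothetical helper (not provided by the library): returns the terms
;; that `analyzer` emits for `text` as a vector of strings.
(defn analyze
  [^Analyzer analyzer ^String text]
  (with-open [^TokenStream ts (.tokenStream analyzer "field" text)]
    (let [^CharTermAttribute term (.addAttribute ts CharTermAttribute)]
      ;; Lucene's TokenStream contract: reset, incrementToken until false, end, close.
      (.reset ts)
      (let [terms (loop [acc []]
                    (if (.incrementToken ts)
                      (recur (conj acc (.toString term)))
                      acc))]
        (.end ts)
        terms))))

(def analyzer
  (custom-analyzer/create
    {:tokenizer :standard
     :char-filters [:htmlStrip]
     :token-filters [:uppercase]}))

(analyze analyzer "<b>Hello</b> world")
;; => ["HELLO" "WORLD"]
```

The char filter strips the markup before tokenization, and the token filter uppercases each term afterwards, which mirrors the order in which the components are listed in the analyzer's printed representation.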
If no options are provided then an Analyzer with just the standard tokenizer is created:
(custom-analyzer/create {})
;; =>
;; #object[org.apache.lucene.analysis.custom.CustomAnalyzer
;; 0x456fe86
;; "CustomAnalyzer(org.apache.lucene.analysis.standard.StandardTokenizerFactory@5703f5b3)"]
To check which analysis components are available, run:
(lucene.custom.analyzer/char-filter-factories)
(lucene.custom.analyzer/tokenizer-factories)
(lucene.custom.analyzer/token-filter-factories)
Under the hood this library uses the factory classes TokenizerFactory, TokenFilterFactory, and CharFilterFactory.
The actual factories are loaded with java.util.ServiceLoader.
All the available classes are automatically discovered.
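For comparison, Lucene's factory base classes expose the names they discovered through static methods. The sketch below assumes Lucene 9 or later, where these classes live in the `org.apache.lucene.analysis` package (older versions keep them under `org.apache.lucene.analysis.util`); it only illustrates the underlying discovery mechanism and is not a description of how this library calls it.

```clojure
(import '(org.apache.lucene.analysis CharFilterFactory
                                     TokenizerFactory
                                     TokenFilterFactory))

;; Each static method returns a java.util.Set of the SPI names
;; that Lucene found on the current classpath.
(sort (TokenizerFactory/availableTokenizers))
(sort (CharFilterFactory/availableCharFilters))
(sort (TokenFilterFactory/availableTokenFilters))
```

Which names appear depends on which Lucene analysis modules are on the classpath.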
If you want to include additional factory classes, e.g. your own implementation of TokenFilterFactory, you need to add two things to the classpath:
1. The implementation class of the factory.
2. Under META-INF/services, add or update a file named org.apache.lucene.analysis.TokenFilterFactory that lists the classes from step 1.
An example can be found here.
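As an illustration of the registration file described above, a provider-configuration file for java.util.ServiceLoader lists one fully qualified class name per line, and lines starting with `#` are comments. The class name `com.example.analysis.MyTokenFilterFactory` below is hypothetical; substitute your own implementation.

```
# File: META-INF/services/org.apache.lucene.analysis.TokenFilterFactory
# Hypothetical class name, shown for illustration only.
com.example.analysis.MyTokenFilterFactory
```

The file must be on the classpath together with the compiled factory class, for example under a resources directory that is packaged into the jar.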
Copyright © 2023 Dainius Jocas.
Distributed under The Apache License, Version 2.0.