A Clojure library designed to solve set matching (or multi-string matching) problem by applying the classic aho-corasick algorithm.
The following algorithm listing is from the original paper "Efficient String Mathcing: An Aid to Bibliographic Search (1975)" by Alfred V. Aho and Margaret J. Corasick.
###Algorithm 1. Pattern matching machine. Input. A text string x = a1a2...an where each ai is an input symbol and a pattern matching machine M with goto function g, failure function f, and output function output, as described above.
Output. Locations at which keywords occur in x.
Method.
begin
state := 0
for i := 1 until n do
begin
while g(state, ai) == fail do state := f(state)
state := g(state, ai)
if output(state) != empty then
begin
print i
print output(state)
end
end
end
Input. Set of keywords K = {y1, y2, ..., yk}.
Output. Goto function g and a partially computed output function output.
Method. We assume output(s) is empty when state s is first created, and g(s, a) == fail if a is undefined or if g(s, a) has not yet been defined. The procedure enter(y) inserts into the goto graph a path that spells out y.
begin
newstate := 0
for i := 1 until k do enter(yi)
for all a such that g(0, a) == fail do g(0, a) := 0
end
procedure enter(a1a2...am):
begin
state := 0; j := 1
while g(state, aj) != fail do
begin
state := g(state, aj)
j := j + 1
end
for p := j until m do
begin
newstate := newstate + 1
g(state, ap) := newstate
state := newstate
end
output(state) := {a1a2...am}
end
Input. Goto function g and output function output from Algorithm 2.
Output. Failure function f and output function output.
Method.
begin
queue := empty
for each a such that g(0, a) != 0 do
begin
s := g(0, a)
queue := queue U {s}
f(s) := 0
end
while queue != empty do
begin
let r be the next state in queue
queue := queue - {r}
for each a such that g(r, a) != fail do
begin
s := g(r, a)
queue := queue U {s}
state := f(r)
while g(state, a) == fail do state := f(state)
f(s) := g(state, a)
output(s) := output(s) U output(f(s))
end
end
end
FIXME
Copyright © 2016 FIXME
Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.
Can you improve this documentation?Edit on GitHub
cljdoc is a website building & hosting documentation for Clojure/Script libraries
× close