Liking cljdoc? Tell your friends :D

# aho-corasick

A Clojure library designed to solve set matching (or multi-string matching) problem by applying the classic aho-corasick algorithm.

## Original Algorithm listing

The following algorithm listing is from the original paper "Efficient String Mathcing: An Aid to Bibliographic Search (1975)" by Alfred V. Aho and Margaret J. Corasick.

###Algorithm 1. Pattern matching machine. Input. A text string x = a1a2...an where each ai is an input symbol and a pattern matching machine M with goto function g, failure function f, and output function output, as described above.

Output. Locations at which keywords occur in x.

Method.

``````begin
state := 0
for i := 1 until n do
begin
while g(state, ai) == fail do state := f(state)
state := g(state, ai)
if output(state) != empty then
begin
print i
print output(state)
end
end
end
``````

### Algorithm 2. Construction of the goto function.

Input. Set of keywords K = {y1, y2, ..., yk}.

Output. Goto function g and a partially computed output function output.

Method. We assume output(s) is empty when state s is first created, and g(s, a) == fail if a is undefined or if g(s, a) has not yet been defined. The procedure enter(y) inserts into the goto graph a path that spells out y.

``````begin
newstate := 0
for i := 1 until k do enter(yi)
for all a such that g(0, a) == fail do g(0, a) := 0
end

procedure enter(a1a2...am):
begin
state := 0; j := 1
while g(state, aj) != fail do
begin
state := g(state, aj)
j := j + 1
end
for p := j until m do
begin
newstate := newstate + 1
g(state, ap) := newstate
state := newstate
end
output(state) := {a1a2...am}
end
``````

### Algorithm 3. Construction of the failure function.

Input. Goto function g and output function output from Algorithm 2.

Output. Failure function f and output function output.

Method.

``````begin
queue := empty
for each a such that g(0, a) != 0 do
begin
s := g(0, a)
queue := queue U {s}
f(s) := 0
end
while queue != empty do
begin
let r be the next state in queue
queue := queue - {r}
for each a such that g(r, a) != fail do
begin
s := g(r, a)
queue := queue U {s}
state := f(r)
while g(state, a) == fail do state := f(state)
f(s) := g(state, a)
output(s) := output(s) U output(f(s))
end
end
end
``````

FIXME 