vector-search-clj

Embedded approximate-nearest-neighbor (ANN) vector search for Clojure: an in-process HNSW index with per-item metadata and save/load, built on hnswlib.

Stack

Clojure, hnswlib

Part of a local RAG substrate stack: pdfplumber-clj (extract) → chunk-clj (split) → tokenizers-clj (tokenize) → embeddings-clj (embed) → vector-search-clj (retrieve).

Installation

deps.edn:

net.clojars.savya/vector-search-clj {:mvn/version "0.1.0"}

Leiningen:

[net.clojars.savya/vector-search-clj "0.1.0"]

Pure JVM: no native dependencies and no server.

Usage

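(require '[vector-search.core :as vs])

;; vec-1, vec-2, vec-3 and query-vec stand for 384-dimensional embeddings
;; (float[] or any sequential of numbers), e.g. produced by embeddings-clj.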

(def idx (vs/index {:dim 384 :metric :cosine}))

(vs/add! idx "chunk-1" vec-1 {:source "report.pdf" :page 3})
(vs/add! idx "chunk-2" vec-2 {:source "report.pdf" :page 7})
(vs/add-batch! idx [{:id "chunk-3" :vector vec-3 :metadata {:source "notes.md"}}])

(vs/search idx query-vec 10)
;; => [{:id "chunk-2" :score 0.87 :metadata {:source "report.pdf" :page 7}} ...]

(vs/get-item idx "chunk-1")   ;; => {:id .. :vector float[] :metadata ..}
(vs/remove! idx "chunk-1")    ;; => true
(vs/size idx)                 ;; => 2

;; persistence: a directory with hnswlib's index.bin + an EDN sidecar
(vs/save idx "data/my-index")
(def idx2 (vs/load-index "data/my-index"))
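add-batch! takes a sequence of maps with :id, :vector and :metadata keys. A minimal sketch of building that input from parallel collections of ids, embeddings and metadata follows. The helper name ->batch-entries and the parallel-collections input shape are assumptions for illustration, not part of the library; the float[] coercion is an optional choice of this sketch, since sequentials are accepted too.

```clojure
;; Class object for Java's float[] (JVM descriptor "[F").
(def ^:private float-array-class (Class/forName "[F"))

(defn ->batch-entries
  "Hypothetical helper: zips parallel collections of ids, vectors and
  metadata maps into {:id :vector :metadata} maps for vs/add-batch!.
  float[] vectors pass through untouched (no copy); any other sequential
  of numbers is copied into a new float[]."
  [ids vectors metadatas]
  (mapv (fn [id v m]
          {:id       id
           :vector   (if (instance? float-array-class v) v (float-array v))
           :metadata m})
        ids vectors metadatas))

;; Usage with the library (assumes idx was created with a matching :dim):
;; (vs/add-batch! idx (->batch-entries ["a" "b"] [emb-a emb-b] [{:page 1} {:page 2}]))
```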

Options to vs/index (defaults shown):

| option | default | meaning |
|---|---|---|
| :dim | required | vector dimensionality |
| :metric | :cosine | :cosine, :dot, or :euclidean |
| :capacity | 10000 | initial maximum number of items; grows automatically when full |
| :m | 16 | HNSW graph degree |
| :ef-construction | 200 | build-time search breadth |
| :ef | 50 | query-time search breadth; higher gives better recall but slower queries |
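Since :ef trades query speed for recall, and :m and :ef-construction generally buy a better-connected graph at the cost of memory and build time, a recall-oriented configuration raises all three. The options map below is a sketch; the specific values are illustrative assumptions, not tuned recommendations, and the name high-recall-opts is hypothetical.

```clojure
;; Illustrative values only: benchmark recall and latency on your own data.
(def high-recall-opts
  {:dim             384      ; must match the embedding model's output size
   :metric          :cosine
   :capacity        100000   ; sized up front so the index rarely has to grow
   :m               32       ; denser graph: more memory, usually higher recall
   :ef-construction 400      ; slower build, better-connected graph
   :ef              200})    ; wider query-time search: slower, higher recall

;; (def idx (vs/index high-recall-opts))
```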

Semantics worth knowing:

  • Scores: for :cosine and :dot, :score is a similarity (higher is better; cosine of an exact match ≈ 1.0). For :euclidean it is the L2 distance (lower is better). Results are always ordered best-first.
  • Vectors: float[] (zero-copy, e.g. straight from embeddings-clj) or any sequential collection of numbers.
  • Ids: any EDN-round-trippable, Serializable value (strings, keywords, numbers, ...).
  • add! with an existing id replaces the stored vector and metadata.
  • HNSW is approximate: recall is tuned by :ef (the seeded test suite measures recall@10 ≈ 0.99 with the default options).
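To make the score semantics concrete, here is a brute-force, plain-Clojure reference for what each metric's score means and how "best-first" flips between similarities and distances. This is an illustration only, not how vector-search-clj or hnswlib compute scores internally; all function names are hypothetical.

```clojure
(defn dot [a b] (reduce + (map * a b)))

(defn norm [a] (Math/sqrt (dot a a)))

(defn cosine-similarity [a b]
  (/ (dot a b) (* (norm a) (norm b))))

(defn l2-distance [a b]
  (Math/sqrt (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b))))

(defn rank
  "Scores every [id vector] candidate against query and orders them
  best-first: descending for similarities (:cosine, :dot), ascending
  for distances (:euclidean)."
  [metric query candidates]
  (let [score-fn (case metric
                   :cosine    cosine-similarity
                   :dot       dot
                   :euclidean l2-distance)
        scored   (for [[id v] candidates]
                   {:id id :score (score-fn query v)})]
    (if (= metric :euclidean)
      (sort-by :score scored)      ; smaller distance is better
      (sort-by :score > scored)))) ; larger similarity is better
```

With query [1 0] and candidates "aligned" [1 0], "long" [10 10] and "opposite" [-1 0], :cosine ranks "aligned" first, :dot ranks "long" first because it rewards magnitude, and :euclidean ranks "aligned" first with distance 0. For unit-normalized embeddings, dot product and cosine similarity coincide.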

Errors are thrown as ex-info exceptions whose ex-data map carries a :vector-search/error key (:missing-dim, :unknown-metric, :dim-mismatch, :invalid-vector, :index-not-found).
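Because the error kind lives under :vector-search/error in the exception's ex-data, callers can dispatch on it with case. A minimal sketch; error-kind, explain-error and the message texts are hypothetical interpretations of the documented keywords, not messages produced by the library.

```clojure
(defn error-kind
  "Returns the :vector-search/error keyword from an exception's ex-data,
  or nil (ex-data returns nil for non-ex-info exceptions)."
  [e]
  (:vector-search/error (ex-data e)))

(defn explain-error
  "Hypothetical mapping from the documented error kinds to user-facing text."
  [e]
  (case (error-kind e)
    :missing-dim     "index options need a :dim"
    :unknown-metric  "use :cosine, :dot or :euclidean"
    :dim-mismatch    "vector length differs from the index :dim"
    :invalid-vector  "vector must be float[] or a sequential of numbers"
    :index-not-found "no saved index at that path"
    (str "unexpected error: " (ex-message e))))

;; Usage with the library (not run here):
;; (try (vs/load-index "data/missing")
;;      (catch clojure.lang.ExceptionInfo e (explain-error e)))
```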

Running tests

clojure -M:test

Everything is deterministic and self-contained (the recall smoke test uses a seeded RNG); there is nothing to download.
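The recall figure quoted above compares approximate results against exact nearest neighbors. The sketch below shows one self-contained way to compute recall@k with a seeded RNG and a brute-force ground truth. It is not the project's actual test code, and the function names are hypothetical.

```clojure
(import '(java.util Random))

(defn random-vectors
  "n vectors of dimension d with components in [-1, 1), drawn from a seeded
  RNG so every run produces the same data."
  [seed n d]
  (let [rng (Random. (long seed))]
    (vec (repeatedly n (fn [] (vec (repeatedly d #(- (* 2.0 (.nextDouble rng)) 1.0))))))))

(defn squared-l2 [a b]
  (reduce + (map (fn [x y] (let [t (- x y)] (* t t))) a b)))

(defn exact-knn
  "Ground truth: indices of the k vectors closest to query, by brute force."
  [vectors query k]
  (->> (range (count vectors))
       (sort-by #(squared-l2 query (nth vectors %)))
       (take k)
       vec))

(defn recall-at-k
  "Fraction of the exact top-k ids that also appear in the approximate top-k."
  [exact approx]
  (/ (count (filter (set approx) exact))
     (double (count exact))))
```

To spot-check a live index, one could compare (mapv :id (vs/search idx q 10)) with (exact-knn data q 10), assuming each vector was added under its position in data as its id and the index uses :euclidean (both assumptions of this sketch).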

License

Copyright © 2026 Savyasachi.

Distributed under the Eclipse Public License 2.0.
