Hands-on walkthrough of Sandbar's fulltext-search surface: BM25F multi-field weighted scoring over Datomic + Lucene. Covers when to use which verb, with worked examples for :where composition, :facet-by aggregation, and :include projection options. For the algebraic theory and lineage, see doc/concepts/fulltext-search.md.
| Question | Primitive |
|---|---|
| "Single-field search; return matches" | sandbar.search/search-attribute |
| "Multi-field weighted search across a class's slots" | sandbar.search/search-bm25f |
| "Facet counts over a search result set" | :include [:facets] on search-bm25f |
| "Snippets / highlights" | :include [:snippets] on search-bm25f |
| "Combine fulltext + structured filter" | :where opt on search-bm25f |
The substrate primitives live in sandbar.db.datatype:
- dt/search-fulltext: single Lucene-backed attribute query
- dt/bm25f-weights-of: read declared weights from the class entity
- dt/fulltext-indexed?: predicate; is this attribute :db/fulltext true?

:db/fulltext slots

A class's string slots that should be searchable need :db/fulltext true in the schema:
;; schema/mm.edn excerpt
{:db/ident :mm.memory/name
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db/fulltext true ; ← required for fulltext indexing
 :dt/required? true}

{:db/ident :mm.memory/body-raw
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db/fulltext true}
Datomic builds the Lucene index at transact time for :db/fulltext slots. Without this flag, (fulltext ...) queries return empty. Use dt/fulltext-indexed? to assert at runtime.
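To make the flag requirement concrete, here is a minimal sketch that scans schema data for slots that would be invisible to fulltext queries. It works on plain EDN maps shaped like the excerpt above, not on a live database. The names missing-fulltext and schema are hypothetical, and this is not a description of how dt/fulltext-indexed? is implemented.

```clojure
;; Hypothetical schema data, shaped like the schema/mm.edn excerpt.
(def schema
  [{:db/ident       :mm.memory/name
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/fulltext    true}
   {:db/ident       :mm.memory/description
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}]) ; no :db/fulltext flag

(defn missing-fulltext
  "Returns the idents in `slots` whose schema entry lacks :db/fulltext true."
  [schema slots]
  (let [indexed (into #{} (comp (filter :db/fulltext) (map :db/ident)) schema)]
    (remove indexed slots)))

(missing-fulltext schema [:mm.memory/name :mm.memory/description])
;; => (:mm.memory/description)
```

A check like this could run in a test suite so that a slot added to a class without the flag is caught before it silently returns no hits.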
Per-class slot weights live in the schema layer:
;; schema/mm.edn
{:db/ident :mm/Memory
 :dt/type :dt/Class
 :dt/subclass-of :dt/Resource
 :dt/slots [:mm.memory/name
            :mm.memory/description
            :mm.memory/body-raw
            :mm.memory/tags]
 :dt/bm25f-weights [[:mm.memory/name 12.0] ; ← high weight on titles
                    [:mm.memory/description 8.0]
                    [:mm.memory/body-raw 1.0] ; ← unit weight on body
                    [:mm.memory/tags 6.0]]}
At query time, the substrate reads weights from the class entity via dt/bm25f-weights-of. Callers may override via the :field-weights opt, but the schema declaration is the default.
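The relationship between declared weights and a caller override can be sketched in a few lines. This assumes that an override is merged slot by slot over the declared weights, so unmentioned slots keep their schema value. The guide does not say whether Sandbar merges or replaces, so treat that as an assumption; effective-weights is a hypothetical helper.

```clojure
;; Declared weights, in the pair-vector shape used by :dt/bm25f-weights.
(def declared-weights
  [[:mm.memory/name 12.0]
   [:mm.memory/description 8.0]
   [:mm.memory/body-raw 1.0]
   [:mm.memory/tags 6.0]])

(defn effective-weights
  "Assumption: caller overrides are merged over the declared weights,
   so slots absent from `overrides` keep their declared value."
  [declared overrides]
  (merge (into {} declared) overrides))

(effective-weights declared-weights
                   {:mm.memory/name 20.0 :mm.memory/body-raw 0.5})
;; => {:mm.memory/name 20.0, :mm.memory/description 8.0,
;;     :mm.memory/body-raw 0.5, :mm.memory/tags 6.0}
```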
The simplest form. One :db/fulltext attribute, ranked by Lucene's single-field BM25:
(require '[sandbar.search :as search])

(search/search-attribute
 {:attribute :mm.memory/name
  :query "datomic"
  :limit 20})
;; => {:hits [{:entity <entity-map> :score 5.42} ...]
;;     :total <int>}
Use when: the search is field-specific (titles only, body only) and per-field weighting isn't needed.
The general form. Walks all :dt/bm25f-weights slots, scores per field, combines via the Robertson-Zaragoza canonical formula:
(search/search-bm25f
 {:class :mm/Memory
  :query "datomic recursive rules"
  :limit 20})
;; => {:hits [{:entity <entity-map> :score 12.34} ...]
;;     :total <int>
;;     :timing {:tokenize-ms 1 :score-ms 12 :total-ms 14}}
Override declared weights:
(search/search-bm25f
 {:class :mm/Memory
  :query "datomic"
  :field-weights {:mm.memory/name 20.0 ; boost titles further
                  :mm.memory/body-raw 0.5} ; dampen body
  :limit 10})
Use when: ranking quality matters across multiple slots — titles + body + tags weighted distinctly.
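For readers who want to see why field weights matter, the following is a minimal sketch of the simple BM25F form described by Robertson and Zaragoza: per-field term frequencies are weighted and length-normalized, summed into one pseudo-frequency, and only then saturated. The function names, the k1 and b values, and the idf input are illustrative assumptions; Sandbar's actual parameters and idf variant are not stated in this guide.

```clojure
(defn length-norm
  "B_f = (1 - b) + b * (len / avg-len): penalizes fields longer than average."
  [b len avg-len]
  (+ (- 1.0 b) (* b (/ len avg-len))))

(defn pseudo-tf
  "Weighted, length-normalized frequency of one term, summed across fields.
   `weights` maps slot -> weight; `fields` maps slot -> {:tf :len :avg-len}."
  [weights b fields]
  (reduce-kv
   (fn [acc slot {:keys [tf len avg-len]}]
     (+ acc (/ (* (get weights slot 0.0) tf)
               (length-norm b len avg-len))))
   0.0
   fields))

(defn term-score
  "Saturates the combined frequency once, after the fields are merged:
   idf * ptf / (k1 + ptf). A document's score is the sum over query terms."
  [idf k1 ptf]
  (* idf (/ ptf (+ k1 ptf))))

;; One query term, found once in the name and three times in the body.
;; Each field's length equals its average, so length-norm is 1.0 for both.
(def ptf
  (pseudo-tf {:mm.memory/name 12.0 :mm.memory/body-raw 1.0}
             0.75
             {:mm.memory/name     {:tf 1 :len 4   :avg-len 4}
              :mm.memory/body-raw {:tf 3 :len 200 :avg-len 200}}))
;; ptf => 15.0 (12.0 from the single name hit, 3.0 from three body hits)

(term-score 2.0 1.2 ptf) ; idf 2.0 and k1 1.2 are illustrative values only
```

With these weights a single title hit contributes four times as much pseudo-frequency as three body hits, which is the effect the 12.0 versus 1.0 declaration is meant to produce.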
:where Datalog

Fulltext ∩ structured filter. The :where clauses constrain the candidate set BEFORE BM25F scoring:
;; Search only :decision-typed memories
(search/search-bm25f
 {:class :mm/Memory
  :query "datomic recursive rules"
  :where '[[?e :mm.memory/memory-type :decision]]
  :limit 20})

;; Search only memories tagged "architecture"
(search/search-bm25f
 {:class :mm/Memory
  :query "datomic"
  :where '[[?e :mm.memory/tags ?tag]
           [?tag :mm.tag/value "architecture"]]
  :limit 20})

;; Combine multiple filters
(search/search-bm25f
 {:class :mm/Memory
  :query "BM25F"
  :where '[[?e :mm.memory/memory-type :decision]
           [?e :mm.memory/scope :global]]
  :limit 20})
?e is the conventional variable name for the entity at the head of the BM25F walk; bind to it in your :where clauses to filter the candidate set.
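One way to picture how ?e joins the two halves is to assemble a Datalog query as data: Datomic's fulltext query function binds the entity and its score, and the caller's clauses join on the same ?e. The helper fulltext-query below is hypothetical and covers a single attribute only; Sandbar may build or order its query quite differently.

```clojure
(defn fulltext-query
  "Builds a Datalog query map: one `fulltext` clause on `attribute`,
   followed by caller-supplied clauses that join on ?e."
  [attribute extra-where]
  {:find  '[?e ?score]
   :in    '[$ ?q]
   :where (into [[(list 'fulltext '$ attribute '?q) '[[?e _ _ ?score]]]]
                extra-where)})

(fulltext-query :mm.memory/name
                '[[?e :mm.memory/memory-type :decision]])
;; => {:find  [?e ?score]
;;     :in    [$ ?q]
;;     :where [[(fulltext $ :mm.memory/name ?q) [[?e _ _ ?score]]]
;;             [?e :mm.memory/memory-type :decision]]}
```

Because the extra clauses share ?e with the fulltext clause, a clause that uses a different variable name would not restrict the candidates at all.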
:include [:snippets] emits per-slot snippet windows centered on the first query-term hit:
(search/search-bm25f
 {:class :mm/Memory
  :query "datomic recursive rules"
  :limit 10
  :include [:snippets]})
;; => {:hits [{:entity <entity-map>
;;             :score 12.34
;;             :snippets {:mm.memory/name "...**datomic** **recursive** **rules**..."
;;                        :mm.memory/body-raw "...the **datomic** layer handles **recursive** ..."}}
;;            ...]}
Highlighting uses Markdown bold syntax (**term**). The snippet window is roughly 240 characters centered on the first match, with an ellipsis prefix or suffix when the text continues beyond the window. Matching is approximate (regex-based); Lucene's native positional highlighter is a Phase-2 optimization.
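The windowing behaviour can be approximated with a short regex-based function. This is a sketch of the general technique under stated assumptions (case-insensitive whole-word matching, a non-empty term list, a three-dot ellipsis), not Sandbar's implementation; snippet is a hypothetical name.

```clojure
(require '[clojure.string :as str])

(defn snippet
  "Returns a window of about `width` chars centered on the first query-term
   hit, with every term inside the window wrapped in **...**.
   Returns nil when no term occurs in `text`. Assumes `terms` is non-empty."
  [text terms width]
  (let [quoted  (map (fn [^String t] (java.util.regex.Pattern/quote t)) terms)
        pattern (re-pattern (str "(?i)\\b(" (str/join "|" quoted) ")\\b"))
        ^java.util.regex.Matcher m (re-matcher pattern text)]
    (when (.find m)
      (let [half  (quot width 2)
            start (max 0 (- (.start m) half))
            end   (min (count text) (+ (.start m) half))
            body  (str/replace (subs text start end) pattern "**$1**")]
        ;; Ellipses mark the sides where the text continues past the window.
        (str (when (pos? start) "...")
             body
             (when (< end (count text)) "..."))))))

(snippet "the datomic layer handles recursive rules"
         ["datomic" "recursive"]
         240)
;; => "the **datomic** layer handles **recursive** rules"
```

A term that straddles the window edge is cut and left unhighlighted here, which is one of the ways a regex approach is only approximate.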
:include [:facets] + :facet-by emits per-slot value counts over the BM25F match set:
(search/search-bm25f
 {:class :mm/Memory
  :query "datomic"
  :limit 20
  :include [:facets]
  :facet-by [:mm.memory/memory-type :mm.memory/scope]})
;; => {:hits [...]
;;     :facets {:mm.memory/memory-type {:decision 12 :plan 7 :observation 4 :pattern 2}
;;              :mm.memory/scope {:global 18 :scoped 7}}}
Use when: the consumer needs both ranked results AND aggregated counts over the same query — saves a round-trip vs separate search + aggregate calls.
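The shape of the :facets map is a per-slot frequency count, which can be sketched over plain maps as below. The names facet-counts and matches are hypothetical, and the sketch handles single-valued slots only. Note that the example counts above total 25 per slot with :limit 20, which suggests the counts cover the whole match set rather than the returned page; that reading is an inference from the example, not a documented guarantee.

```clojure
(defn facet-counts
  "Maps each slot in `facet-by` to a {value count} map over `entities`.
   Entities lacking the slot are skipped."
  [entities facet-by]
  (into {}
        (map (fn [slot] [slot (frequencies (keep slot entities))]))
        facet-by))

;; Hypothetical match set, represented as plain maps.
(def matches
  [{:mm.memory/memory-type :decision :mm.memory/scope :global}
   {:mm.memory/memory-type :decision :mm.memory/scope :scoped}
   {:mm.memory/memory-type :plan     :mm.memory/scope :global}])

(facet-counts matches [:mm.memory/memory-type :mm.memory/scope])
;; => {:mm.memory/memory-type {:decision 2, :plan 1}
;;     :mm.memory/scope       {:global 2, :scoped 1}}
```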
:include [:scores] exposes the per-field score contributions for debugging or relevance tuning:
(search/search-bm25f
 {:class :mm/Memory
  :query "datomic"
  :limit 5
  :include [:scores]})
;; => {:hits [{:entity <entity-map>
;;             :score 12.34
;;             :field-scores {:mm.memory/name 8.2
;;                            :mm.memory/description 3.1
;;                            :mm.memory/body-raw 1.04
;;                            :mm.memory/tags 0.0}}
;;            ...]}
Use when: tuning weights or diagnosing why a result ranked unexpectedly.
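When tuning, it helps to see each field's share of a hit's score at a glance. The sketch below sorts a :field-scores map by contribution and attaches each field's fraction of the total. In the example output above the four contributions sum to the hit's :score, and the sketch relies on that; contributions is a hypothetical helper.

```clojure
(defn contributions
  "Sorts per-field scores from largest to smallest and attaches each
   field's share of their sum."
  [field-scores]
  (let [total (reduce + 0.0 (vals field-scores))]
    (->> field-scores
         (sort-by val >)
         (mapv (fn [[slot score]]
                 {:slot  slot
                  :score score
                  :share (if (pos? total) (/ score total) 0.0)})))))

(map :slot (contributions {:mm.memory/name        8.2
                           :mm.memory/description 3.1
                           :mm.memory/body-raw    1.04
                           :mm.memory/tags        0.0}))
;; => (:mm.memory/name :mm.memory/description :mm.memory/body-raw :mm.memory/tags)
```

A field with a high declared weight but a near-zero share across many queries is a candidate for a lower weight, or a sign that the slot is rarely populated.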
The sandbar.search.bm25f MCP verb will accept the same opts. It is forthcoming at Stage 27; search MCP verbs are not yet exposed, so use the in-process Clojure form for now:
# Pending: Stage 27 of the comprehensive arc wires search MCP verbs
curl -X POST http://localhost:8080/mcp \
  -H "Authorization: Bearer $SANDBAR_TOKEN" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{
        "name":"sandbar.search.bm25f",
        "arguments":{
          "class":":mm/Memory",
          "query":"datomic recursive rules",
          "limit":20,
          "include":["snippets","facets"]}}}'
Search composes with aggregation directly via :facet-by. A future Stage 29 will compose search with navigation via :from + :via (path-grammar restriction), so that a graph-walk neighborhood becomes the candidate set for BM25F ranking.
- doc/concepts/fulltext-search.md: lineage, theory, references
- doc/concepts/aggregation.md: the :facet-by composition pattern
- doc/guides/navigating-with-paths.md: composing search + path-grammar (Stage 29)
- doc/api/dt-star.md: dt/search-fulltext / dt/bm25f-weights-of / dt/fulltext-indexed? substrate primitives
- decisions/bm25f_canonical_robertson_zaragoza_form.md: ADR locking the canonical form