Sandbar

A metamodel and platform for LLM memory systems.

Sandbar is a graph data store built on RDFS-style types, classes, properties, and inheritance, exposed simultaneously through MCP (Model Context Protocol) and an HTTP REST API.

The graph store is equipped with a four-axis retrieval surface: fulltext search with BM25F, structural + temporal aggregation, typed-edge navigation with path-grammar, and orientation.

The type system is data, queryable + evolvable at runtime through the same API you use to query your application's entities. The wire-format layer is substrate, not application concern. Long-running operations have history, cancellation, and outcome classification baked in.

The value: a substrate where "find by content," "walk the typed-edge graph from this seed," "rank by structural prominence," and "describe yourself" are all one-line queries against a single coherent model — not four separate libraries glued together. And this forms the basis of our LLM memory store design.

This README is a 5-minute elevator pitch. For depth, follow the pointers into doc/concepts/ (theoretical reference, citation-rich) and doc/guides/ (hands-on how-to).

Documentation map

Documentation is layered. Every entry below is a link; pick the layer that matches your goal.

Reading order suggestions:

What makes Sandbar interesting

Sandbar's individual ingredients exist elsewhere. The unique value is in the synthesis — how these ingredients combine into one substrate with a consistent discipline.

Metacircular RDFS on Datomic

RDFS gave us a clean vocabulary for classes, properties, inheritance, and predicates. Datomic gave us schema-on-read, first-class time, and expressive query. Sandbar stores its own type system inside Datomic using its own type system — :dt/Class is itself an instance of :dt/Class. Adding a class is a transaction; introspecting the schema is a query. Application data and metadata flow through the same dt/* API.

doc/concepts/metamodel.md for theory + citations
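
To make the metacircular idea concrete, here is a minimal in-memory sketch. It is not Sandbar's dt/* API and it does not touch Datomic; the registry map and the helper names (add-class, superclasses, all-classes) are assumptions made up for illustration. It only shows the shape of the idea: the class of classes is stored as ordinary data, adding a class is a data update, and introspection is a plain lookup.

```clojure
;; Hypothetical stand-in for the metamodel: a plain map, not Sandbar's store.
(def registry
  {:dt/Class    {:dt/type :dt/Class}   ; the class of classes is its own instance
   :dt/Resource {:dt/type :dt/Class}
   :order/Order {:dt/type :dt/Class :dt/subclass-of :dt/Resource}})

(defn add-class
  "Adding a class is just adding data (a transaction, in the real system)."
  [reg ident superclass]
  (assoc reg ident {:dt/type :dt/Class :dt/subclass-of superclass}))

(defn superclasses
  "Walk :dt/subclass-of upward from ident, nearest ancestor first."
  [reg ident]
  (->> (iterate #(get-in reg [% :dt/subclass-of]) ident)
       (take-while some?)
       rest))

(defn all-classes
  "Introspection is a query over the same data that holds the classes."
  [reg]
  (set (for [[ident m] reg :when (= :dt/Class (:dt/type m))] ident)))
```

Because :dt/Class appears in its own result set, (all-classes registry) includes :dt/Class alongside the application classes.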

Layer-targeting discipline + multi-protocol surface

The same metamodel is exposed simultaneously through HTTP REST, the Model Context Protocol (JSON-RPC + SSE for AI clients), and (incrementally) RDF / TTL. Every protocol layer projects from the same dt/* API — there are no parallel schemas to keep in sync. Adding a new protocol means adding a translator, not duplicating the model.

doc/concepts/mcp-protocol.md · doc/guides/writing-an-mcp-client.md · doc/guides/writing-a-rest-client.md

Codec layer absorbs wire-format complexity

Consumers talk in their native representation. The memory-corpus consumer passes markdown; a future RDF consumer will pass Turtle; an MCP client passes JSON. Sandbar's codec layer absorbs the parse/emit and binds the result to the model — same architectural shape as dt/* absorbing Datomic. Per-class :dt/native-codec declares the default; the mediator resolves at call time.

doc/concepts/codec-layer.md · doc/guides/implementing-a-codec.md
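
The resolution rule described above (a per-class default, with the mediator choosing at call time) can be sketched in a few lines. Everything here is hypothetical: the codecs table, the native-codec map, and the decode/encode helpers are illustrative names, and the toy markdown codec only handles a single heading line. Sandbar's real codec protocol lives in doc/concepts/codec-layer.md.

```clojure
(require '[clojure.string :as str]
         '[clojure.edn :as edn])

;; Hypothetical codec table: format keyword -> parse/emit pair.
(def codecs
  {:markdown {:parse (fn [src] {:title (str/replace-first src #"^#\s*" "")})
              :emit  (fn [m] (str "# " (:title m)))}
   :edn      {:parse edn/read-string
              :emit  pr-str}})

;; Stands in for a per-class :dt/native-codec declaration.
(def native-codec {:mm/Memory :markdown})

(defn resolve-codec
  "An explicit format wins; otherwise fall back to the class default."
  [class fmt]
  (let [k (or fmt (native-codec class))]
    (or (codecs k)
        (throw (ex-info "No codec for format" {:class class :format k})))))

(defn decode [class src & [fmt]]
  ((:parse (resolve-codec class fmt)) src))

(defn encode [class m & [fmt]]
  ((:emit (resolve-codec class fmt)) m))
```

A caller that passes no format gets the class default, so (decode :mm/Memory "# Foo") goes through the markdown codec, while (decode :mm/Memory "{:title \"Foo\"}" :edn) overrides it.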

Fulltext search via Datomic + Lucene + BM25F

:db/fulltext slots are queryable through Datomic's native Lucene integration; Sandbar layers BM25F multi-field weighted scoring on top — same canonical Robertson-Zaragoza form as the corpus's reference implementation, with per-class :dt/bm25f-weights declared at the schema layer. The analyzer (Unicode-aware tokenizer + Porter stemmer) is metamodel-driven; no consumer hardcoding. Result projection composes with the rest of the retrieval surface — :where Datalog clauses, snippets, facets, structural composition — all opts on one verb.

doc/concepts/fulltext-search.md · doc/guides/searching-the-corpus.md
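
For readers who have not met BM25F, the following is a self-contained sketch of the scoring idea in plain Clojure. It is not Sandbar's implementation. The data shapes, the function names, the defaults k1 = 1.2 and b = 0.75, and the non-negative IDF variant log(1 + (N - df + 0.5) / (df + 0.5)) are all assumptions chosen for illustration; the slot weights reuse the example values from the FAQ. The point it demonstrates: term frequencies are length-normalized and weighted per field, summed across fields, and only then saturated.

```clojure
(defn weighted-tf
  "Sum of per-field term frequencies, each length-normalized and weighted.
   doc is {field {:tf {term count} :len n}}."
  [weights stats doc term b]
  (reduce +
          (for [[field w] weights
                :let [{:keys [tf len]} (get doc field)
                      f (get tf term 0)]
                :when (pos? f)]
            (/ (* w f)
               (+ (- 1 b) (* b (/ len (get-in stats [:avg-len field]))))))))

(defn bm25f-score
  "Score one document against a seq of query terms.
   stats is {:n-docs N :df {term doc-freq} :avg-len {field avg-length}}."
  [{:keys [k1 b] :or {k1 1.2 b 0.75}} weights stats doc terms]
  (reduce +
          (for [t terms
                :let [tf  (weighted-tf weights stats doc t b)
                      n   (:n-docs stats)
                      df  (get-in stats [:df t] 0)
                      idf (Math/log (+ 1.0 (/ (+ (- n df) 0.5) (+ df 0.5))))]]
            ;; Saturation is applied once, after fields are combined.
            (* idf (/ tf (+ k1 tf))))))

;; Made-up sample data: the same term appears once in the name of one
;; document and once in the body of another.
(def weights {:mm.memory/name 12.0 :mm.memory/body-raw 1.0})
(def stats   {:n-docs 10
              :df {"datomic" 2}
              :avg-len {:mm.memory/name 3 :mm.memory/body-raw 100}})
(def doc-a   {:mm.memory/name     {:tf {"datomic" 1} :len 3}
              :mm.memory/body-raw {:tf {} :len 100}})
(def doc-b   {:mm.memory/name     {:tf {} :len 3}
              :mm.memory/body-raw {:tf {"datomic" 1} :len 100}})
```

With these weights, a hit in the name slot outranks the same hit in the body slot, which is the behavior per-class :dt/bm25f-weights is meant to control.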

Aggregation primitives as first-class retrieval

count / group-by / structural-rank are substrate, not application-layer. degree, backlink-density, recency, and freshness are the four ranking axes — the substrate is class-agnostic (temporal slots are caller-supplied; no hardcoded knowledge of :mm.memory/last-touched etc.). sandbar.aggregate/{count-by,group-by,rank-by} opts-shaped API + MCP verbs + REST endpoints.

doc/concepts/aggregation.md
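
As a rough picture of what the structural axes compute, here is a sketch over an in-memory edge list. It is an assumption-laden toy: the edge triples are invented, raw backlink counts stand in for backlink-density (whose exact definition is in doc/concepts/aggregation.md), and these count-by and rank-by functions are local helpers, not the sandbar.aggregate API.

```clojure
;; Invented [from predicate to] triples.
(def edges
  [[:m/a :cites     :m/b]
   [:m/c :cites     :m/b]
   [:m/c :evidences :m/a]
   [:m/d :cites     :m/b]])

(defn count-by
  "Count items in coll grouped by (f item)."
  [f coll]
  (frequencies (map f coll)))

(defn backlinks
  "Number of incoming edges per node."
  [edges]
  (count-by (fn [[_ _ to]] to) edges))

(defn degree
  "Incoming plus outgoing edges per node."
  [edges]
  (merge-with + (count-by first edges) (backlinks edges)))

(defn rank-by
  "Top `limit` keys of a score map, highest score first, ties broken by key."
  [scores limit]
  (->> scores
       (sort-by (juxt (comp - val) key))
       (take limit)
       (mapv key)))
```

Ranking is kept separate from scoring, so (rank-by (backlinks edges) 2) and (rank-by (degree edges) 3) reuse the same final step over different axes.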

Path-grammar navigation (Wilbur lineage)

Sandbar speaks Kleene-algebra-over-binary-relations as a first-class navigation surface. EDN path expressions like [:SEQ [:REP* [:OR :cites :evidences]] [:RESTRICT [:type :decision]]] parse → canonicalize → compile to Datomic recursive rules. Twenty-one operators are committed (eighteen Wilbur-derived from Nokia's 1989-2009 lineage, plus three SPARQL 1.1 parity additions); thirteen of them (Canonical-8 plus Tier-2) are executable today. Paths are first-class values: length, prefix, subpath compose. The three-layer DSL/IR/Backend architecture means a future Asami or NFA backend is a translator, not a rewrite.

doc/concepts/path-grammar.md · doc/concepts/navigation.md · doc/guides/navigating-with-paths.md
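
The algebra is easier to see with a tiny interpreter. The sketch below evaluates three operators (:SEQ, :OR, :REP*) directly over an in-memory edge set. It is an illustration only: Sandbar compiles paths to Datomic recursive rules rather than interpreting them like this, the step and walk helpers and the sample edges are invented, and :RESTRICT is deliberately left out because its exact semantics are not spelled out in this README.

```clojure
(require '[clojure.set :as set])

;; Invented [subject predicate object] edges.
(def edges
  #{[:d1 :cites :d2] [:d2 :evidences :d3] [:d3 :cites :d1] [:d4 :cites :d5]})

(defn step
  "Follow one predicate from every node in `nodes`."
  [edges pred nodes]
  (set (for [[s p o] edges
             :when (and (= p pred) (contains? nodes s))]
         o)))

(defn walk
  "Evaluate path expression `expr` starting from the set `nodes`.
   A bare keyword is a single predicate step."
  [edges expr nodes]
  (if (keyword? expr)
    (step edges expr nodes)
    (let [[op & args] expr]
      (case op
        ;; Sequence: feed each sub-path the result of the previous one.
        :SEQ  (reduce (fn [acc e] (walk edges e acc)) nodes args)
        ;; Alternation: union of the branches.
        :OR   (into #{} (mapcat #(walk edges % nodes)) args)
        ;; Kleene star: zero or more repetitions, computed as a fixpoint.
        :REP* (loop [seen (set nodes) frontier (set nodes)]
                (let [fresh (set/difference (walk edges (first args) frontier) seen)]
                  (if (empty? fresh)
                    seen
                    (recur (into seen fresh) fresh))))))))
```

Tracking the seen set is what keeps :REP* terminating on cyclic graphs such as the :d1 → :d2 → :d3 → :d1 loop in the sample data.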

Bootstrap-by-discovery

Every non-abstract class is automatically discoverable through every protocol. MCP tools/list walks dt/all-classes; JSON Schema is reflected from dt/range-of. Add a class to the schema and it auto-surfaces as a tool, a resource, a REST endpoint — no hand-curated registries, no mapping tables, no server restart.

doc/concepts/mcp-protocol.md
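
One way to picture bootstrap-by-discovery is as a pure function from the class table to the tool list. The sketch below is hypothetical throughout: the classes map, its :abstract? and :slots keys, the json-type mapping, and the choice to name each tool after its class are stand-ins for what the real server derives from dt/all-classes and dt/range-of. Only the "name" and "inputSchema" keys follow the shape of an MCP tools/list entry.

```clojure
;; Hypothetical class table standing in for the live metamodel.
(def classes
  {:dt/Resource {:abstract? true  :slots {}}
   :mm/Memory   {:abstract? false
                 :slots {:mm.memory/name         :string
                         :mm.memory/last-touched :instant}}})

;; Assumed mapping from slot ranges to JSON Schema types.
(def json-type {:string "string" :instant "string" :long "integer"})

(defn input-schema
  "Reflect a JSON-Schema-shaped map from a class's slots."
  [slots]
  {"type" "object"
   "properties" (into {}
                      (for [[slot rng] slots]
                        [(subs (str slot) 1)
                         {"type" (json-type rng "string")}]))})

(defn tools-list
  "One tool descriptor per non-abstract class, derived rather than registered."
  [classes]
  (vec (for [[ident {:keys [abstract? slots]}] (sort-by key classes)
             :when (not abstract?)]
         {"name"        (subs (str ident) 1)
          "inputSchema" (input-schema slots)})))
```

Since the list is recomputed from the table, a class added with assoc shows up on the next call with no separate registration step.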

Workflows as first-class substrate

State machines are entities. Processes are running instances. MCP Tasks are workflow processes — task-id IS :db/id (no parallel registry). Terminal states carry an outcome classification (:success / :failure / :cancel) so consumers don't reinvent the "what kind of done is this" projection. Cancellation is workflow-substrate, not per-tool plumbing.

doc/concepts/workflow-substrate.md · doc/guides/designing-workflows.md
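
The outcome-classification idea can be sketched with a state machine held as data. This is a toy under stated assumptions: the state and event names, the task-machine map, and the start-process, fire, and outcome helpers are invented for illustration and are not Sandbar's workflow API. Only the three outcome keywords (:success, :failure, :cancel) come from the text above.

```clojure
;; Hypothetical machine definition: plain data, as an entity would be.
(def task-machine
  {:initial     :pending
   :transitions {:pending {:start :running :cancel :cancelled}
                 :running {:finish :done :fail :errored :cancel :cancelled}}
   ;; Terminal states carry an outcome classification.
   :outcomes    {:done :success :errored :failure :cancelled :cancel}})

(defn start-process
  "A process is a running instance of a machine."
  [machine]
  {:state (:initial machine) :history []})

(defn fire
  "Apply an event; reject transitions the machine does not define."
  [machine process event]
  (let [from (:state process)
        to   (get-in machine [:transitions from event])]
    (when-not to
      (throw (ex-info "Illegal transition" {:from from :event event})))
    (-> process
        (assoc :state to)
        (update :history conj [from event to]))))

(defn outcome
  "Outcome of a process, or nil while it is still in a non-terminal state."
  [machine process]
  (get-in machine [:outcomes (:state process)]))
```

Because :cancel is an ordinary transition in the machine, cancellation needs no tool-specific code: any process in :pending or :running can take it, and the history records that it did.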

Filesystem-canonical projection (Anderson lineage)

The filesystem format is the canonical ground-truth. Sandbar's sandbar.projection/project-graph + ingest-graph primitives are bidirectional — DB state ↔ filesystem hierarchy of native-format files. Any backend complies with the filesystem format. Document chunks are addressable entities with their own URIs and sibling-chain navigation (:next-sibling / :previous-sibling, RDFS-inspired). The pattern borrows from James Anderson's de.setf.rdf:project-graph (Datagraph/Dydra-era CL CLOS-metaclass framework) and applies it to filesystem hierarchies as the native projection target.

doc/concepts/projection.md

Hybrid filesystem/database topology (experimental)

The partition between what lives on disk and what lives in the runtime DB is an open architectural question we're actively exploring. Filtering primitives on project.export / project.import exist precisely to enable this experimentation. Today, both sides are first-class. Tomorrow's answer depends on what measurement reveals.

doc/concepts/multi-store-architecture.md

Three concrete examples

Example 1 — Clojure, in-process

Define a class hierarchy, create a validated instance, query the metamodel:

(require '[sandbar.db.datatype :as dt])

;; Classes describe themselves
(dt/make :dt/Class
  {:db/ident :order/Order
   :dt/subclass-of :dt/Resource
   :dt/slots [:order/customer :order/total :order/status]})

;; Create a validated instance
(dt/make :order/Order
  {:order/customer customer-entity
   :order/total    299.99M
   :order/status   :order/pending})
;; => entity; validation passed; transacted

;; Introspect at runtime
(dt/slots-of      :order/Order)        ; #{:order/customer :order/total ...}
(dt/instance-of?  :order/Order order)  ; true
(dt/all-instances-of :dt/Resource)     ; every entity, including order

Example 2 — AI client (Claude or other MCP consumer)

Discover the surface; create an entity from markdown source; read it back:

export SANDBAR_TOKEN="<your-service-account-token>"

# 1. Discover available tools (bootstrap-by-discovery)
curl -X POST http://localhost:8080/mcp \
  -H "Authorization: Bearer $SANDBAR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'

# 2. Create an mm/Memory entity by passing markdown source —
#    codec layer absorbs the parse + class-binding
curl -X POST http://localhost:8080/mcp \
  -H "Authorization: Bearer $SANDBAR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":
        {"name":"sandbar.entity.create",
         "arguments":{"class":"mm/Memory",
                       "format":"markdown",
                       "source":"---\nname: Foo\n---\n# Context\n..."}}}'

# 3. Read it back as markdown (full section tree reconstructed)
curl -X POST http://localhost:8080/mcp \
  -H "Authorization: Bearer $SANDBAR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":3,"method":"resources/read",
        "params":{"uri":"mcp://sandbar/mm/Memory/decisions/foo"}}'

doc/guides/writing-an-mcp-client.md for full client patterns → doc/guides/zorp-tutorial.md for a complete worked example

Example 3 — The four-axis retrieval surface

Search by content; group and rank the results; walk the typed-edge graph. Three of the four axes, each composable, in one short example:

(require '[sandbar.search    :as search]
         '[sandbar.aggregate :as agg]
         '[sandbar.navigate.path :as path])

;; Fulltext — BM25F across :mm/Memory's weighted slots
(search/search-bm25f
  {:class :mm/Memory
   :query "datomic recursive rules"
   :limit 10
   :include [:snippets :scores]})

;; Aggregation — group memories by type, rank-by backlink-density
(agg/group-by {:class :mm/Memory :group-by :mm.memory/memory-type})
(agg/rank-by  {:class :mm/Memory :rank-by :backlink-density :limit 10})

;; Path-grammar — walk the typed-edge graph with Kleene closure
(path/path-via
  {:from :decisions/some-anchor
   :via  [:SEQ [:REP* [:OR :cites :evidences]]
              [:RESTRICT [:dt/type :mm.memory/decision]]]})

Each axis is also a stable MCP verb (sandbar.search.bm25f, sandbar.aggregate.rank-by, sandbar.navigate.path-via) and a REST endpoint (GET /api/aggregate/rank-by, GET /api/navigate/path). Same model, three projections.

doc/concepts/path-grammar.md for the algebra · doc/guides/navigating-with-paths.md for worked patterns

Quick start

# Prerequisites: Java 11+, Leiningen, Datomic transactor running
git clone <repository-url> && cd sandbar
lein deps && lein repl

# In the REPL
(require '[sandbar.core :refer [go]])
(go)  ; HTTP on :8080; nREPL on :28888

# Sanity check (in another shell)
curl http://localhost:8080/api/status

doc/guides/quickstart.md for the 5-minute hands-on tour

Project layout

sandbar/
├── schema/             EDN class + property definitions
├── src/sandbar/
│   ├── codec.clj       Mediator + per-class :dt/native-codec resolution
│   ├── codec/          Codec protocol + markdown + JSON
│   ├── projection.clj  Anderson-style FS↔DB projection
│   ├── search.clj      BM25F + multi-field weighted scoring
│   ├── search/         search.analysis (Porter + Unicode) + search.bm25f
│   ├── aggregate.clj   count-by / group-by / rank-by (4 structural axes)
│   ├── navigate/       edges / walk / path (Wilbur path-grammar)
│   │   └── path/       ast / ir / datomic / value
│   ├── db/             dt/* model API + Datomic peer connection
│   ├── mcp/            MCP server (transport / protocol / tools / resources / prompts / tasks)
│   ├── api/            REST handlers (store / aggregate / navigate / workflow / event / job / auth)
│   ├── service/        Routing + validation-as-workflow
│   └── util/           Auth (Buddy-hashers) / events / workflow lifecycle
└── doc/                Layered documentation (Layer 2 + 3 + 4)

Running tests

lein test                                              # full suite
lein test :only sandbar.codec.markdown-test            # one namespace
lein test :only sandbar.datatype-test/make-test        # one deftest

FAQ

Q: Is this OWL/RDF? A: Inspired by RDFS, but simpler. Closed-world; no inference engine; no PhD required. The metamodel is closer to KL-ONE-shaped frames-with-inheritance than to OWL DL.

Q: Why both REST and MCP? A: Different consumers; same metamodel. Traditional HTTP clients want REST. AI clients want JSON-RPC with reflective tool discovery + push notifications. Both projections come from the same dt/* introspection — no parallel models to keep in sync.

Q: How does the codec layer relate to Datomic's serialization? A: It doesn't. Datomic handles in-store representation; codecs handle wire format at the protocol boundary. The codec layer absorbs format complexity from consumers, the same way dt/* absorbs Datomic query complexity.

Q: How does path-grammar compare to SPARQL property paths or Cypher relationship patterns? A: Sandbar's path-grammar shares the same Kleene-algebra-over-binary-relations spine. Wilbur (Lassila 1989, Nokia 2001-2009) is the source-of-truth lineage; SPARQL 1.1 (2013) formalized the same algebra independently; Cypher's variable-length paths converge on the same surface. Sandbar inherits the algebra, ships subset-first (Canonical-8 + Tier-2 = 13 operators executable today; Tier-3 vocabulary-registered but compilation deferred), and exposes paths as EDN-native first-class values (length, prefix, subpath). The three-layer DSL/IR/Backend architecture means a future Asami or NFA × graph-product backend is a translator, not a rewrite.

Q: Why BM25F instead of plain BM25 or Lucene's default Similarity? A: BM25F is the multi-field weighted form that Lucene's single-field BM25 doesn't natively express. Per-class :dt/bm25f-weights declare slot weights at the metamodel layer (e.g., :mm.memory/name 12.0 vs :mm.memory/body-raw 1.0); the analyzer (Unicode tokenizer + Porter stemmer) is metamodel-driven and matches the corpus's reference implementation byte-for-byte.

License

Copyright (C) Dan Lentz
