A theoretical reference for Sandbar's wire-format boundary abstraction. It explains why codecs live at the boundary (not inside the model, not inside the database, not inside each consumer), how per-class :dt/native-codec resolution works, and the round-trip discipline that keeps the abstraction load-bearing. For the mechanical Codec protocol, see doc/api/codec-protocol.md; for authoring a new codec, see doc/guides/implementing-a-codec.md.
The codec layer absorbs the parse/emit translation between a consumer's native representation and the metamodel's typed entity shape. It is a boundary layer in the Parnas (1972) / Anderson (de.setf.rdf) sense: a stable interface that one side may evolve without disturbing the other.
The motivation is not abstract. The memory-corpus consumer thinks in markdown with YAML frontmatter. A future RDF/TTL consumer will think in Turtle. A JSON-only client thinks in JSON objects. None of these consumers should — and none of them need to — know about :dt/Class, :dt/slots, or Datomic's storage idiom. The codec layer is the place where "the consumer's source-of-truth representation" is converted into "the metamodel's typed entity," and back again, with round-trip semantic equivalence.
This mirrors how dt/* absorbs Datomic. Consumers of dt/* never write datomic.api/q or d/transact directly; they call dt/make and dt/instance-of? and let the layer do the translation. Codecs apply the same discipline one boundary out.
Parnas (1972) gave the canonical statement: a module's interface should hide design decisions that are likely to change. Wire formats are exactly such a decision — markdown today, TTL tomorrow, IPLD-Codec the day after. The metamodel and the application are unlikely to change in lockstep with the wire format; isolating the wire format behind a codec interface means we can change the wire format without rewriting the model, and we can extend the model without rewriting every codec.
de.setf.rdf:project-graph
James Anderson's de.setf.rdf (a Datagraph/Dydra-era Common Lisp CLOS-metaclass framework) introduced the boundary-layer primitive idiom this codec design adopts wholesale. In Anderson's model, project-graph took the raw state of an RDF graph and projected it into a native-representation hierarchy (filesystem, rendered HTML, etc.), and ingest-graph did the inverse: it accepted a native-representation hierarchy and re-derived the graph state. The translation lived at the boundary, neither inside the model nor inside the consumer.
Sandbar adopts the same shape one layer up: project-graph / ingest-graph operate on collections of entities at the filesystem boundary (see project-graph.md); the codec layer operates on individual entities at the wire-format boundary. Both share the property that translation is a boundary concern, not a model concern.
RFC 793 (Postel 1981) — "be conservative in what you do, be liberal in what you accept from others" — is the operational discipline for codecs. A codec's parser must tolerate input variation (whitespace, trailing newlines, ordering of frontmatter keys, optional fields) while its emitter must produce a canonical form (single trailing newline; sorted frontmatter when ordering is semantically irrelevant; stable indentation). Without this discipline, round-trips drift and the codec stops being a boundary abstraction.
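As a rough illustration of the liberal-parser / conservative-emitter split, here is a minimal sketch. It assumes a toy frontmatter of flat `key: value` lines, not full YAML. The namespace and both function names are hypothetical and are not part of Sandbar.

```clojure
(ns example.robust-frontmatter
  (:require [clojure.string :as str]))

(defn parse-frontmatter
  "Liberal parser (hypothetical helper): tolerates blank lines, CRLF line
  endings, stray whitespace around keys and values, and any key order.
  Lines without a colon are skipped. Handles flat `key: value` only."
  [text]
  (->> (str/split-lines text)
       (map str/trim)
       (remove str/blank?)
       (map #(str/split % #"\s*:\s*" 2))
       (filter #(= 2 (count %)))
       (into {} (map (fn [[k v]] [(keyword k) v])))))

(defn emit-frontmatter
  "Conservative emitter (hypothetical helper): keys sorted, exactly one
  space after the colon, exactly one trailing newline."
  [m]
  (str (str/join "\n" (for [[k v] (sort-by key m)]
                        (str (name k) ": " v)))
       "\n"))

;; Messy input: CRLF, extra blank lines, uneven spacing, unsorted keys.
(def messy "title:  Tide tables \r\n\r\n  author :Ada\n\n")

(def canonical (emit-frontmatter (parse-frontmatter messy)))
```

Once input has passed through the emitter, a second parse-then-emit pass leaves it unchanged, which is the property that stops round-trips from drifting.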
A codec is a value satisfying the sandbar.codec.protocol/Codec protocol with two methods:
(parse codec source) ; native-representation string → typed entity map
(emit codec entity) ; typed entity map → native-representation string
The contract is:
[…] dt/make against the codec's bound class. The map carries :dt/type resolved.
Codecs are values, not singletons. A codec can be parameterized (e.g., a markdown codec with strict YAML mode versus relaxed YAML mode); a registry of codec values lives in sandbar.codec/registry. Per-class :dt/native-codec declares the default codec for a class; the mediator (sandbar.codec/resolve) walks the registry and the class's declaration to find the codec to use.
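The idea of codecs as parameterized values can be sketched as follows. This is not Sandbar's implementation: the protocol below is a local stand-in for sandbar.codec.protocol/Codec, and LineCodec, its `strict?` parameter, the `slot=value` line format, and the registry keys are all hypothetical.

```clojure
(ns example.codec-values
  (:require [clojure.string :as str]))

;; Local stand-in for the two-method Codec protocol described above.
(defprotocol Codec
  (parse [codec source] "native-representation string -> typed entity map")
  (emit  [codec entity] "typed entity map -> native-representation string"))

;; Toy codec: one `slot=value` pair per line. `strict?` is an illustrative
;; parameter, analogous to strict versus relaxed YAML mode.
(defrecord LineCodec [class-ident strict?]
  Codec
  (parse [_ source]
    (let [pairs (for [line (str/split-lines source)
                      :when (not (str/blank? line))]
                  (str/split line #"=" 2))]
      ;; Strict mode rejects malformed lines; relaxed mode skips them.
      (when (and strict? (some #(not= 2 (count %)) pairs))
        (throw (ex-info "malformed line" {:source source})))
      (into {:dt/type class-ident}
            (for [[k v] pairs :when v]
              [(keyword (str/trim k)) (str/trim v)]))))
  (emit [_ entity]
    ;; :dt/type is carried by the codec's bound class, so it is not emitted.
    (str (str/join "\n" (for [[k v] (sort-by key (dissoc entity :dt/type))]
                          (str (subs (str k) 1) "=" v)))
         "\n")))

;; Two differently parameterized values of the same codec type.
(def registry
  {:codec/lines-strict  (->LineCodec :ex/Note true)
   :codec/lines-relaxed (->LineCodec :ex/Note false)})
```

Because records compare by value, two codecs built with the same parameters are interchangeable, which is what lets a registry hold them as plain data.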
sandbar.codec/parse and sandbar.codec/emit are mediator functions. They take an explicit codec name, or fall back to the class's :dt/native-codec:
;; Explicit codec
(codec/parse :codec/markdown source)
;; Class-default codec — resolves via :dt/native-codec on :mm/Memory
(codec/parse-for-class :mm/Memory source)
This is the same architectural shape as dt/* absorbing Datomic. Consumers do not import individual codec implementations; they call the mediator and let class-level declarations route.
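A minimal sketch of the mediator shape, under stated assumptions: codecs are modeled as plain maps of functions, classes as plain maps, and the :ex/* classes and :codec/upper and :codec/plain codecs are hypothetical. Sandbar's real mediator may differ in signature and behavior.

```clojure
(ns example.codec-mediator
  (:require [clojure.string :as str]))

;; Hypothetical registry: codec name -> parse/emit functions.
(def registry
  {:codec/upper {:parse (fn [s] {:ex/text (str/lower-case s)})
                 :emit  (fn [e] (str/upper-case (:ex/text e)))}
   :codec/plain {:parse (fn [s] {:ex/text s})
                 :emit  (fn [e] (:ex/text e))}})

;; Hypothetical class declarations carrying a default codec.
(def classes
  {:ex/Shout {:dt/native-codec :codec/upper}
   :ex/Note  {:dt/native-codec :codec/plain}})

(defn resolve-codec
  "Looks up a codec by name, failing loudly on an unknown name."
  [codec-name]
  (or (get registry codec-name)
      (throw (ex-info "unknown codec" {:codec codec-name}))))

(defn parse
  "Mediator with an explicit codec name."
  [codec-name source]
  ((:parse (resolve-codec codec-name)) source))

(defn parse-for-class
  "Mediator that routes through the class's :dt/native-codec declaration."
  [class-ident source]
  (let [codec-name (get-in classes [class-ident :dt/native-codec])]
    (assoc (parse codec-name source) :dt/type class-ident)))
```

The caller names a class or a codec keyword and never requires a codec namespace directly; swapping the codec behind :ex/Shout is a one-line change to the class declaration.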
Sandbar ships two reference codecs. Both are in src/sandbar/codec/.
sandbar.codec.markdown/MarkdownCodec — Markdown body with YAML frontmatter, used by the memory-corpus consumer and any class whose canonical representation is hand-authored text.
[…] :mm/Section entities; bodies between headers become section bodies.
[…] :mm.section/previous-sibling and :mm.section/next-sibling references (RDFS-inspired pairwise links rather than rdf:List cons-cells). See the design discussion in mm-section-schema-path-derived-idents-sibling-chain-navigation (or equivalent in-tree ADR if migrated).
[…] "\n"); derived attributes (:db/ident, :mm.memory/rel-path, :mm.memory/first-section) are stripped during emit.
The markdown codec's complexity is real: it must handle the asymmetry between a freely-authored document and a strictly-typed entity, including ordering of frontmatter (preserved on parse, sorted on emit when no canonical order exists), inline vs block bodies, and section-tree round-trip. These compromises are what make the codec the right place for the complexity: pushing it into the model would couple the model to one wire format; pushing it into consumers would replicate the same logic per consumer.
sandbar.codec.json/JsonCodec — JSON object with typed slot values, used by MCP clients (JSON-RPC payloads) and any class whose canonical wire form is JSON.
[…] :order/total from "order/total" rather than "total").
:db.type/long → JSON number; :db.type/bigdec → JSON string (because JSON has no decimal type); :db.type/instant → ISO 8601 string.
A codec without round-trip discipline is a translator, not a boundary abstraction. If parse(emit(x)) ≠ x for typical x, then consumers downstream of the codec have to know about the asymmetry, and the boundary leaks.
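To make the parse(emit(x)) = x requirement concrete for the JSON type mapping listed above, here is a minimal sketch. The slot-type table and all function names are hypothetical, and serialization to JSON text is left to whichever JSON library is in use; the sketch only covers the value and key mapping.

```clojure
(ns example.json-slots
  (:import (java.time Instant)
           (java.util Date)))

;; Keys keep their namespace: :order/total <-> "order/total".
(defn key->json [k] (subs (str k) 1))
(defn json->key [s] (keyword s))

(defn encode-slot
  "Typed slot value -> JSON-safe value."
  [value-type v]
  (case value-type
    :db.type/bigdec  (str v)                        ; exact decimal as string
    :db.type/instant (str (.toInstant ^Date v))     ; ISO 8601 string
    v))                                             ; longs pass through

(defn decode-slot
  "JSON-safe value -> typed slot value."
  [value-type v]
  (case value-type
    :db.type/bigdec  (BigDecimal. ^String v)
    :db.type/instant (Date/from (Instant/parse v))
    v))

;; Hypothetical slot-type table; in Sandbar this information would come
;; from the class definition rather than a literal map.
(def slot-types
  {:order/total  :db.type/bigdec
   :order/count  :db.type/long
   :order/placed :db.type/instant})

(defn emit-slots [entity]
  (into {} (for [[k v] entity]
             [(key->json k) (encode-slot (slot-types k) v)])))

(defn parse-slots [obj]
  (into {} (for [[s v] obj :let [k (json->key s)]]
             [k (decode-slot (slot-types k) v)])))

(def order {:order/total  12.50M
            :order/count  3
            :order/placed #inst "2024-01-02T03:04:05Z"})
```

Encoding the decimal as a string is what preserves its scale across the boundary; routing it through a JSON number would risk a lossy float conversion and break the round trip.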
The discipline is enforceable mechanically. Each codec implementation in Sandbar carries a codec/<name>-test namespace with property-style round-trip tests:
(deftest markdown-round-trip
  (testing "parse-then-emit is identity on normalized input"
    (let [normalized (markdown/normalize-document source)]
      (is (= normalized
             (codec/emit codec (codec/parse codec normalized)))))))
Failures of round-trip discipline have been the source of every codec-layer bug we have caught in development (see the codec sub-arc memorials). The discipline is not aspirational — it is what makes the abstraction load-bearing.
When an MCP client calls sandbar.entity.create with {:class "mm/Memory", :format "markdown", :source "..."}:
[…] :mm/Memory.
[…] :dt/native-codec (:codec/markdown).
If :format matches the native codec, it uses that codec directly. Otherwise it walks the codec registry to find one bound to the requested format.
[…] codec/parse codec source to obtain the entity map.
[…] dt/make :mm/Memory parsed-map to transact.
The same path runs in reverse for resources/read: the resource handler queries the entity, looks up the codec, calls codec/emit codec entity, and returns the native-representation string.
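The create path above can be sketched roughly as follows. Everything here is an illustrative stand-in: the codec and class tables, the :mm.memory/body and :mm.memory/raw attributes, and `make` (which returns the map instead of transacting it, unlike dt/make).

```clojure
(ns example.entity-create)

;; Hypothetical codec table: each codec is bound to one format string.
(def codecs
  {:codec/markdown {:format "markdown" :parse (fn [s] {:mm.memory/body s})}
   :codec/json     {:format "json"     :parse (fn [s] {:mm.memory/raw s})}})

(def classes
  {:mm/Memory {:dt/native-codec :codec/markdown}})

(defn codec-for
  "Prefers the class's native codec when it matches the requested format;
  otherwise walks the codec table for one bound to that format."
  [class-ident fmt]
  (let [native (get-in classes [class-ident :dt/native-codec])]
    (if (= fmt (get-in codecs [native :format]))
      native
      (some (fn [[codec-name c]] (when (= fmt (:format c)) codec-name))
            codecs))))

(defn make
  "Stand-in for dt/make: tags the map with its class instead of transacting."
  [class-ident m]
  (assoc m :dt/type class-ident))

(defn create
  "Resolve class, pick codec, parse source, hand the result to `make`."
  [{cls :class fmt :format src :source}]
  (let [class-ident (keyword cls)
        codec-name  (codec-for class-ident fmt)
        parse-fn    (get-in codecs [codec-name :parse])]
    (when-not parse-fn
      (throw (ex-info "no codec for format" {:class cls :format fmt})))
    (make class-ident (parse-fn src))))
```

The caller supplies only a class name, a format name, and a source string; which codec runs is decided entirely inside `create`.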
No consumer of sandbar.entity.create or resources/read knows about the codec implementation. The codec is a routing decision made at the boundary, hidden from both sides.
Datomic has its own serialization concerns — fressian for storage, EDN for transactions, projection through datomic.api/pull. These are in-store concerns, not wire-format concerns. The codec layer does not interact with them. By the time a codec receives an entity from the database, the entity is already a Clojure map; by the time a codec produces an entity for the database, the codec hands the result to dt/make, which translates it into a Datomic transaction.
HTTP Content-Type negotiation selects which codec to apply at the protocol boundary. An HTTP handler accepts Content-Type: text/markdown and routes to the markdown codec; Content-Type: application/json routes to JSON. The codec layer does not own the HTTP-level negotiation — that lives in the protocol layer — but it provides the implementations the protocol layer dispatches to.
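A small sketch of the protocol-layer dispatch described here, assuming a plain lookup table. The table, the function name, and the handling of header parameters are illustrative assumptions, not Sandbar's protocol layer.

```clojure
(ns example.content-type
  (:require [clojure.string :as str]))

;; Hypothetical mapping owned by the protocol layer, not the codec layer.
(def content-type->codec
  {"text/markdown"    :codec/markdown
   "application/json" :codec/json})

(defn codec-for-content-type
  "Strips parameters such as `; charset=utf-8`, normalizes case, and looks
  up the codec name. Returns nil for a missing or unknown media type."
  [header]
  (let [media-type (-> (or header "")
                       (str/split #";" 2)
                       first
                       str/trim
                       str/lower-case)]
    (get content-type->codec media-type)))
```

Supporting another media type is then a one-entry change to the table plus a registered codec; the codec implementations themselves stay unaware of HTTP.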
tools/call arguments
MCP tools/call passes {:arguments {...}} as JSON-RPC, so the wire format at the protocol boundary is always JSON. But the value inside :source may be a markdown string; in that case the codec invoked is the markdown codec, even though the outer envelope was JSON. The two layers (protocol envelope and codec body) are orthogonal.
A future Turtle codec would let a Sandbar instance serve text/turtle from resources/read for any class. The shape is already prepared: declare :dt/native-codec :codec/turtle on the class, implement the protocol, register, done. No model change required.
ORMs (Hibernate, ActiveRecord, Datalevin's projection mode) sit at an analogous boundary, but on the database side of the model rather than the consumer side. An ORM hides "which SQL did the model emit?"; the codec layer hides "which wire format is the consumer presenting?" They solve different problems with the same shape.
Wire-format schema languages (Protobuf, Avro, Thrift IDL) generate code from a schema definition; the generated code performs parse/emit at the protocol boundary. This is the same idea Sandbar implements, but Sandbar's schema is the metamodel itself, and codec generation is on-demand at runtime via :dt/range reflection rather than build-time codegen. A consumer requesting tools/list receives JSON Schema reflected from the live class definitions; there is no compiled schema artifact to keep in sync.
GraphQL resolvers sit one level higher: they answer "how do I compute this field on this type?" Codecs sit one level lower: "how do I marshal this typed entity to/from this wire format?" A GraphQL projection of Sandbar would use codecs to handle the marshaling; resolvers would be unnecessary because the metamodel is already the type system.
Author a new codec when:
Do not author a new codec when:
[…] :db/id to ident form). Those concerns belong to dt/*.
Decomposition and boundary-layer thinking
Robustness principle
Anderson de.setf.rdf lineage
Wire-format schema languages (for contrast)
Markdown / YAML specifications
Turtle / RDF wire formats (for the planned TTL codec)
metamodel.md: the typed-entity shape codecs translate to/from
project-graph.md: the boundary-layer primitive at the filesystem level; same shape, different scale
mcp-protocol.md: how MCP tools/call routes through codecs
doc/api/codec-protocol.md: mechanical protocol surface
doc/guides/implementing-a-codec.md: hands-on how-to