Date: 2026-04-16
Accepted.
chachaml needs to persist arbitrary binary blobs (trained models, plots, datasets, files) attached to a run. Two natural options exist for the default SQLite backend:
BLOB column.Trade-offs:
| Concern | DB BLOB | Filesystem |
|---|---|---|
| Single-file backup | Yes | No — DB and artifact dir must travel together |
| Streaming large files | Awkward; SQLite has 1 GiB limit | Trivial; OS handles it |
| External tooling | None — bytes locked in DB | ls, cp, tar, aws s3 cp all work directly |
| Concurrent writes | SQLite serialises everything | Filesystem handles concurrent file writes natively |
| Hash/dedup later | Easy via SQL | Easy via separate dedup index |
| Atomicity with row | Same transaction | Two-step: write file, then insert row (potential leak on crash) |
The expected workload is ML models and plots — typically MB-scale, occasionally tens of MB. Inline BLOBs would push SQLite past its sweet spot, and the inability to reach artifacts with standard CLI tools rules out a lot of pragmatic workflows (sharing a model with a colleague, uploading to S3, inspecting a saved plot).
Artifact bytes live on the filesystem, in a directory paired with the database file. The database holds only metadata: id, run id, artifact name, relative path, content type, byte size, SHA-256 hex digest, and creation timestamp.
Default layout:
./chachaml.db # SQLite metadata
./chachaml-artifacts/ # paired artifact directory
└── <run-id>/
├── <artifact-name>
└── …
Custom DB paths derive a sibling artifact directory:
/path/to/foo.db ⇒ /path/to/foo-artifacts/ (the .db suffix is
stripped). Users may override with (open {:artifact-dir "…"}).
In-memory mode (:in-memory? true) creates a temporary directory for
the duration of the store, removed on close!.
The write order is deliberately: bytes first, then row insert. A crash between the two leaves orphan files but never a metadata row pointing at missing bytes. Orphan cleanup is not implemented in v0.1; it is a known follow-up if it becomes a real problem.
.db file will silently lose every
artifact. Users must back up the artifact directory too.
CONTRIBUTING.md will mention this when the surface area grows.ArtifactStore
separately from RunStore, so they can use object storage natively
without changing the public API.(run_id, name) index in the DB rejects duplicate
artifact names per run at insert time; the filesystem layer
doesn't need its own collision logic.Can you improve this documentation?Edit on GitHub
cljdoc builds & hosts documentation for Clojure/Script libraries
| Ctrl+k | Jump to recent docs |
| ← | Move to previous article |
| → | Move to next article |
| Ctrl+/ | Jump to the search field |