This library supports creating a clone of a Datomic database by reading the transaction log and
saving it to durable storage (incrementally). Restoring the database involves playing that
transaction log back into a new database. The one major complication is that the new database will
assign different IDs to the entities as they are transacted, so we must keep track of what these
new IDs are in the new database in order to remap them in later incremental steps.

Thus, there is a need for two kinds of durable storage (they have different characteristics, so they
are represented as two protocols): a `BackupStore` for the transaction-log segments, and an
`IDMapper` for the ID remappings.
This library comes with a sample `BackupStore` based on AWS S3, and an `IDMapper` that uses Redis
(via Carmine). If you want to use these, then you MUST add the following libraries to your
dependencies (they are not included as transitive dependencies, in case you want to use something
else):
- `com.datomic/dev-local` - if you want to play with it in dev mode
- `com.taoensso/carmine` - if you want to use the `RedisIDMapper`
- `com.amazonaws/aws-java-sdk-s3` - if you want to use the `S3BackupStore`
- `com.taoensso/nippy` - also required for the `S3BackupStore`
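For example, assuming you use tools.deps, the extra dependencies might be declared like this in `deps.edn` (the version strings are placeholders; substitute the current release of each library):

```clojure
;; deps.edn fragment -- "x.y.z" values are placeholders, not real versions
{:deps {com.datomic/dev-local         {:mvn/version "x.y.z"}
        com.taoensso/carmine          {:mvn/version "x.y.z"}
        com.amazonaws/aws-java-sdk-s3 {:mvn/version "x.y.z"}
        com.taoensso/nippy            {:mvn/version "x.y.z"}}}
```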
You will then need to add this library to the application you are deploying to the cloud,
and run an incremental backup on a periodic basis.

Here's a very naive implementation that would allow you to clone an entire database using
resources in the cloud:
```clojure
(ns com.fulcrologic.datomic-cloud-backup.cloning-test
  (:require
    [datomic.client.api :as d]
    [com.fulcrologic.datomic-cloud-backup.protocols :as dcbp]
    [com.fulcrologic.datomic-cloud-backup.cloning :as cloning]
    [com.fulcrologic.datomic-cloud-backup.ram-stores :refer [new-ram-store new-ram-mapper]]
    [com.fulcrologic.datomic-cloud-backup.s3-backup-store :refer [new-s3-store aws-credentials?]]
    [com.fulcrologic.datomic-cloud-backup.redis-id-mapper :refer [new-redis-mapper available? clear-mappings!]]
    [fulcro-spec.core :refer [specification behavior component assertions =>]])
  (:import (java.util UUID)))

(defonce client (d/client {:server-type :dev-local
                           :storage-dir :mem
                           :system      "test"}))

(defn backup! [dbname source-connection target-store]
  ;; Normally you would not run this as a tight loop. Instead, monitor the return
  ;; value: pause whenever it stops returning positive numbers, and keep looping
  ;; forever to stream the backup.
  (loop [n (cloning/backup-next-segment! dbname source-connection target-store 2)]
    (when (pos? n)
      (recur (cloning/backup-next-segment! dbname source-connection target-store 2)))))

(defn restore! [dbname target-conn db-store mapper]
  ;; Normally you'd have some hot standby continuously running this loop. The
  ;; `next-start` may not yet be available, so if it stays the same, delay for a
  ;; bit to see if a new segment arrives.
  (loop [start-t 0]
    (let [next-start (cloning/restore-segment! dbname target-conn db-store mapper start-t {})
          ;; NOTE: 7 happens to be the final `t` of this example database. In real
          ;; use you would instead stop (or sleep and retry) when `next-start`
          ;; stops advancing.
          last-t     7]
      (when (<= next-start last-t)
        (recur next-start)))))

(comment
  ;; Clone a db using RAM resources. Probably runs out of memory for anything
  ;; but small dbs.
  (let [c        (d/connect client {:db-name "some-database"})
        target-c (d/connect client {:db-name "restored"})
        store    (new-ram-store)
        mapper   (new-ram-mapper)]
    (backup! :some-database c store)
    (restore! :some-database target-c store mapper))

  ;; Same thing, but uses S3 resources and Redis. Should handle any db size as
  ;; long as Redis has space for all of the ID mappings.
  (let [c        (d/connect client {:db-name "some-database"})
        target-c (d/connect client {:db-name "restored"})
        store    (new-s3-store "my-s3-bucket")
        mapper   (new-redis-mapper {:spec {:host "localhost"}})]
    (backup! :some-database c store)
    (restore! :some-database target-c store mapper)))
```
The Redis connection options are as described in the Carmine docs.
- The call to `backup-next-segment!` figures out where to start automatically. You tell
  it how many transactions you want it to try to do (it stops if there are no more) and
  it returns how many were done. Thus, if it returns 0 (or fewer than you asked for), then
  it is at or very near the end. If you sleep for some amount of time and try again,
  then you'll be doing a hot streaming replication of your database.
- The call to `restore-segment!` returns the next segment start time that it needs
  in order to continue. If you call it with a `start-t` that does not yet exist, then
  it will return that same `start-t`. Thus, you can sleep for some time and try again.
  Doing so in an infinite loop results in a streaming restoration of your database.
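A continuous streaming backup based on the behavior described above might look like the following sketch. The `stream-backup!` name, the batch size of 25, and the 10-second pause are all arbitrary illustrative choices, not part of the library's API:

```clojure
;; Hypothetical streaming-backup loop: back up in batches, and pause whenever
;; backup-next-segment! reports that no new transactions were found.
(defn stream-backup!
  [dbname source-connection target-store]
  (loop []
    (let [n (cloning/backup-next-segment! dbname source-connection target-store 25)]
      (when (zero? n)
        ;; Caught up with the transaction log; wait for new transactions to arrive.
        (Thread/sleep 10000))
      (recur))))
```

A streaming restore would be the analogous loop around `restore-segment!`, sleeping whenever the returned `next-start` stops advancing.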
The Redis mapper is only needed, as you can see, on the restore side, but the store must be
visible to both sides. Thus, an S3 bucket with the proper permissions is an ideal way
to share the streaming data across regions/zones.