Liking cljdoc? Tell your friends :D

Datomic Cloud Backup and Restore

datomic cloud backup

This small library allows you to incrementally save the transaction log of a Datomic cloud database (may also work with on-prem, though there is an official story there). Datomic Cloud (at the time of this writing) only has a way to pull data into dev-local, but no way to actually store a proper incremental backup for business continuity reasons.

Status

This library is in production testing as of Sept 10, 2021. The backup creation seems solid, and the restore process has been tested with nearly 100M datoms. Be sure to read the notes section below.

Usage

This library supports creating a clone of a Datomic database by reading the transaction log and saving that to a durable storage (incrementally). Restoration of the database involves playing back this transaction log into a new database. The one major complication is that the new database will assign different IDs to the new entities as they are transacted, thus, we must keep track of what these new IDs are in the new database so we can remap them in later incremental steps.

The storage for the backups is represented with the protocol:

  • BackupStore - A place to store the actual transactions

This library comes with a sample BackupStore based on Filesystems and AWS S3.

Extra Dependencies May Be Required

If you want to use the supplied S3 file store then you MUST add the following to your dependencies (they are not included as transitive dependencies in case you want to use something else):

  • com.datomic/dev-local - if you want to play with it in dev mode

  • com.amazonaws/aws-java-sdk-s3 - if you want to use the S3BackupStore

You will then need to add the backup library to your application that you are deploying to the cloud, and run (on a periodic basis) an incremental backup.

Running a Backup

Basically create a BackupStore (db-store in the example) that your restoration process will be able to see, and then periodically run a function like this:

(defn backup-step!
  "Backs up a maximum of txns-per-backup-segment, starting where we last left off. Returns the actual number
  of transactions written."
  [dbname conn db-store txns-per-backup-segment]
  (let [{:keys [end-t]} (dcbp/last-segment-info db-store dbname)]
    (if (and end-t (pos? end-t))
      (let [next-start (inc end-t)
            max-t      (min (+ next-start txns-per-backup-segment) (:t (d/db conn)))]
        (if (pos? (- max-t next-start))
          (try
            (let [{:keys [start-t end-t]} (cloning/backup-segment! dbname conn db-store next-start max-t)]
              (inc (- end-t start-t)))
            (catch Exception e
              (log/error e "Incremental backup step failed!")
              0))
          (do
            (log/trace "There was nothing new to back up on" dbname)
            0)))
      (do
        (log/warn "Not doing incremental backup step for" dbname "because there is no base backup in the store.")
        0))))

Basically you just call backup-segment! on a periodic basis and it makes a new chunk.

You can run a fast initial backup with backup!, which is capable of creating a large number of chunks (in parallel) for your initial step. Then you can start running the above loop without it taking as long to catch up.

Running a Restore

On a different machine that uses a different Datomic deployment (WARNING: restore uses the db names you use when backing up), you will need to make a simple application that uses this library to read the backup segments and apply them to a new (empty) database. The restore restores timestamps of transactions, so you MUST NOT transact anything against the target databases, or restore will not be able to proceed.

We could relax this and lose the transaction times…​perhaps as an option. Not implemented.

The restore-segment! function automatically figures out the next segment to restore, and applies it. It returns :nothing-new-available if there is nothing else to do, and :restored-segment it restored a segment. The simplest restore process would be:

(while (= :restored-segment (cloning/restore-segment! dbname conn dbstore {})))

Of course you’ll want to do something more complex for streaming replication, since new segments will arrive after this loop finishes. You’ll probably want to log progress, sleep when there is nothing available, etc.

NOTES

  • Two schema items are added to the TARGET database ::cloning/last-source-transaction and ::cloning/original-id.

    • last-source-transaction is the last t that has been restored. It is an ident, and there is exactly ONE datom in the database (with no history) that has this as both that ID and attribute: [::cloning/last-source-transaction ::cloning/last-source-transaction 2]

    • original-id is stored on every restored entity. It is the ID that the entity has in the SOURCE database, and is used for remapping incoming IDs, and as audit/source information.

  • You cannot write transactions to the target database during a restore and expect the restore to be able to continue. The restore tries to include the reified transactions to maintain any auditing data you may have added to the database. As such it is also restoring the original tx time, which it cannot do if you transact something that has a time close to "now".

  • Using an elision predicate on the restore can cause transactions to become empty. Those transactions will be still be added to the database so that last-source-transaction is updated. This ensures that interrupted restores can be properly restarted, and that the basis of every transaction will match the original.

  • If you’ve been running Datomic Cloud for a while, you probably have upgraded storage and base schema at some point. The restore code should handle upgrades fine, but of course if you were to try to restore to an older version of a Datomic cloud template from a newer source it will almost certainly fail.

License

The MIT License (MIT) Copyright (c) 2017-2019, Fulcrologic, LLC

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Can you improve this documentation? These fine people already did:
Tony Kay & Jakub Holý
Edit on GitHub

cljdoc is a website building & hosting documentation for Clojure/Script libraries

× close