Storage

MonkeyCI needs to store various kinds of information:

  • Project information (repository URL, etc.)
  • Build history (pipelines and jobs executed)
  • Logs
  • Artifacts
  • Caches
  • Customer and billing information (including invoices)

Most of this information is fairly small and structured, except for logs, caches and artifacts, which can be large blobs. The structured information needs to be searchable to some extent, and of course it must be durable. I would like to keep an open view on which technology is best suited for this, so I don't want to fall back blindly on a relational database. Currently I'm thinking that keeping the information in edn (or JSON) files in object storage could work well. This could then be augmented with some kind of indexing system to allow for searching. The indices themselves could also be stored as edn and loaded into Redis or Elasticsearch. As long as there is no income, I will focus on the cheapest solution that gets the job done, without re-inventing the wheel. OCI also offers an autonomous JSON database, which could serve our needs as well.

Storing Information

The build process itself only needs to store information; apart from caching, there is no need to read any. Initially, we will store everything in object storage, as edn files. The advantage over JSON is that edn files can be appended to: a single file can hold multiple objects. This could be useful for adding log statements or updating build progress. The information is stored in a single bucket, organized as <customer>/<project>/<repository>/<build>. The build id is generated by MonkeyCI and could be as simple as a UUID. Each build "folder" contains the following information:

  • Build metadata (timestamp, trigger type, branch, commit id, result, etc.)
  • Per pipeline and step: the logs, artifacts, and results.
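To make the layout above concrete, here is a sketch of what a build metadata file could look like. The object key and all map keys are illustrative assumptions, not the actual MonkeyCI schema:

```clojure
;; Hypothetical object key within the bucket:
;;   <customer>/<project>/<repository>/<build>/build.edn
;; All keys below are assumptions for illustration, not the real schema.
{:build-id "5f2d9c1e-0a37-4b62-9d18-3c4f8a21e7b0" ; generated UUID
 :timestamp "2024-01-15T10:30:00Z"
 :trigger :webhook
 :branch "main"
 :commit-id "a1b2c3d"
 :result :success}

;; Because edn files can be appended to, progress updates could be
;; written as additional top-level objects in the same file:
{:event :pipeline/start :pipeline "build" :time "2024-01-15T10:30:05Z"}
{:event :step/end :step "compile" :status :success}
```

Reading such a file back is then a matter of consuming edn objects until end of input, rather than parsing one single JSON document.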

Depending on the configuration, this could also just be stored locally, which is what we will do initially, or in development mode.
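A minimal sketch of how such a configuration switch could look, assuming a `:storage` entry with hypothetical `:type` values (these key names are assumptions, not the actual configuration format):

```clojure
;; Development: keep everything on the local filesystem.
{:storage {:type :disk
           :dir "tmp/storage"}}

;; Production: write to an object storage bucket.
{:storage {:type :oci
           :bucket "monkeyci-builds"}}
```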

Artifacts

Artifacts are just blobs that are put into storage after each build step. Since storage is not free, we will have to put a limit on the amount of data, or on how long we store it. Artifacts are configured at step level and have a name and one or more paths that will be added to the artifact. We will probably use tar and gzip to bundle all files into one package.
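A step-level artifact configuration could then look something like this (key names are assumptions for illustration):

```clojure
;; Hypothetical step configuration with an artifact entry.
;; The :artifacts key, :name and :paths are assumed names.
{:name "package"
 :artifacts [{:name "uberjar"
              :paths ["target/app.jar"]}]}
```

After the step completes, the listed paths would be tarred, gzipped and uploaded under the build's storage location.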

Caching

Caches are similar to artifacts, but they are not made publicly available; instead they are reused between builds. Similar to CircleCI or GitLab, we could assign a key to each cache. This means that caches won't be stored along with the build, but higher up, most likely at repository level. Each build step can hold a cache configuration entry that has a key and a list of paths to be cached and restored. Before the step is executed, the cache is restored (if found), and after the step it is updated. Depending on the configuration, the update happens only if the step was successful, or regardless of status.
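In the same assumed configuration style as above, a step's cache entry could look like this (the `:caches`, `:key` and `:paths` names are illustrative assumptions):

```clojure
;; Hypothetical step configuration with a cache entry.
;; The cache would live at repository level, keyed by :key.
{:name "compile"
 :caches [{:key "maven-deps"
           :paths [".m2/repository"]}]}
```

Before running the step, the runner would look up `maven-deps` at repository level and restore `.m2/repository` if found; afterwards it would re-upload the paths, either always or only on success, depending on the configuration.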
