
Sizing & Scaling

Boundary’s architecture (FC/IS + hexagonal ports) is meant to let you scale vertically, horizontally, or a mix of both, mostly by configuration rather than rewrites. This page is an honest map of how far that holds today: which knobs exist, which components are already safe to run as multiple replicas, and which still hold in-process state that you must account for.

Three axes

| Axis | Meaning |
| --- | --- |
| Vertical | One process, more resources. Bigger heap, larger connection/thread pools, more CPU. Pure configuration in Boundary. |
| Horizontal | Many processes (replicas) behind a load balancer, sharing backing services (Postgres, Redis). Requires that no request-handling state lives in a single process. |
| Functional decomposition | Slice modules into separate deployables (microservices-style) that scale independently. A cross-module call that was in-process becomes a network call. Highest leverage, highest effort. |

Vertical scaling — by configuration today

All sizing knobs live in resources/conf/{dev,test,prod,acc}/config.edn plus environment variables. Change the value, restart the process. No code.

| Knob | Where | Notes |
| --- | --- | --- |
| DB connection pool | :boundary/postgresql :pool (minimum-idle, maximum-pool-size, connection-timeout-ms, idle-timeout-ms, max-lifetime-ms, keepalive-time-ms, leak-detection-threshold-ms) | HikariCP. Defaults: prod min 10 / max 50, dev 2 / 10, test (H2) 1 / 5. DB_POOL_SIZE env override. |
| HTTP server | :boundary/http (:port, :host, :port-range) | Jetty. HTTP_PORT / HTTP_HOST env. Jetty manages its own thread pool. |
| JVM heap / GC | JAVA_OPTS (see docker-compose.yml) | Default -Xmx512m -Xms128m -XX:+UseG1GC -XX:+UseContainerSupport. Raise -Xmx for vertical scale. |
| Cache (Redis) pool | :boundary/cache (:max-total, :max-idle, :min-idle, :timeout, :default-ttl) | Jedis pool. prod default max-total 50 / max-idle 20 / min-idle 5. |
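As a rough illustration of where the pool knobs from the table sit, here is a hedged config.edn fragment. The key names and prod defaults come from the table; the exact nesting is an assumption, so check it against your own resources/conf/prod/config.edn before copying.

```clojure
;; Sketch only: the nesting shown here is assumed, not taken from the
;; shipped config. Values are the prod defaults quoted in the table.
{:boundary/postgresql
 {:pool {:minimum-idle      10
         :maximum-pool-size 50}}

 :boundary/cache
 {:provider  :redis
  :max-total 50
  :max-idle  20
  :min-idle  5}}
```

Raising :maximum-pool-size or :max-total is the vertical move; both take effect on the next process restart.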

Watch the multiplication: with N replicas each holding a pool of maximum-pool-size, total Postgres connections = N × max. Keep N × max under the server’s max_connections.
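The arithmetic is simple enough to sanity-check in a REPL. The helpers below are hypothetical (they are not part of Boundary) and assume you want to reserve a few connections for migrations and admin sessions.

```clojure
(defn total-connections
  "Worst-case number of Postgres connections opened by `replicas`
  processes, each with a pool capped at `max-pool-size`."
  [replicas max-pool-size]
  (* replicas max-pool-size))

(defn max-replicas
  "Largest replica count that stays within `max-connections` after
  setting aside `reserved` connections (hypothetical headroom for
  migrations, psql sessions, monitoring)."
  [max-connections reserved max-pool-size]
  (quot (- max-connections reserved) max-pool-size))

(total-connections 4 50)   ;; => 200
(max-replicas 200 10 50)   ;; => 3
```

With the prod default of 50 connections per replica, two replicas already reach 100, which is the max_connections value a stock Postgres install typically ships with. Either raise the server limit or lower the per-replica pool before adding replicas.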

How the architecture enables horizontal scaling

Cross-module calls go through protocols defined in each module’s ports.clj (enforced by bb check:ports). Core logic depends on a protocol, never on a concrete adapter. That seam is the scaling lever: swap an in-process adapter for a distributed one — Redis, a queue, a remote service — without touching the functional core.

Two libraries already ship both adapters and pick between them in config:

```clojure
;; libs/cache: :in-memory (dev/test) | :redis (multi-instance)
;; libs/jobs:  :in-memory (dev/test) | :redis (shared queue across workers)
:boundary/cache {:provider :redis ...}
```

This is the template every other seam follows: the protocol is the contract, the distributed adapter is "just configuration" once it exists.
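To make the seam concrete, here is a minimal sketch of the pattern. Every name in it (KeyValueCache, the two records, make-cache, cached-lookup) is hypothetical and is not Boundary's actual cache port. An atom shared between adapter instances stands in for Redis so the example runs without any external service.

```clojure
;; Hypothetical port: the contract the core depends on.
(defprotocol KeyValueCache
  (get-value [this k])
  (put-value! [this k v]))

;; In-process adapter: state is private to this instance (one JVM).
(defrecord InMemoryCache [state]
  KeyValueCache
  (get-value [_ k] (get @state k))
  (put-value! [_ k v] (swap! state assoc k v) v))

;; Stand-in for a distributed adapter: every instance talks to the same
;; external store. Here the "store" is just a shared atom.
(defrecord SharedStoreCache [store prefix]
  KeyValueCache
  (get-value [_ k] (get @store (str prefix k)))
  (put-value! [_ k v] (swap! store assoc (str prefix k) v) v))

;; Adapter choice is driven by config, mirroring {:provider ...}.
(defn make-cache [{:keys [provider]} shared-store]
  (case provider
    :in-memory (->InMemoryCache (atom {}))
    :shared    (->SharedStoreCache shared-store "cache:")))

;; Caller in the functional core: sees only the protocol.
(defn cached-lookup [cache k compute]
  (if-some [hit (get-value cache k)]
    hit
    (put-value! cache k (compute k))))

;; Two "replicas" on the shared store see each other's writes.
(let [store (atom {})
      a     (make-cache {:provider :shared} store)
      b     (make-cache {:provider :shared} store)]
  (cached-lookup a "x" (constantly 42))
  (cached-lookup b "x" (constantly 0)))   ;; => 42
```

Switching the provider changes which record is built; cached-lookup is untouched. That is the property the real adapters rely on.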

Horizontal readiness matrix

| Component | N replicas | Detail |
| --- | --- | --- |
| Cache | | Redis adapter (libs/cache/…​/adapters/redis.clj): Nippy serialization, atomic ops. The in-memory adapter is dev/test only. |
| Jobs | | Redis queue (libs/jobs/…​/adapters/redis.clj): priority lists, worker heartbeat, retry/backoff, dead-letter. Scheduled-job promotion is an atomic claim (Redis ZREM / in-memory swap-vals!), so a due job is moved to an execution queue exactly once across concurrent workers. The handler registry is per-process, but a dequeued job with no local handler is re-enqueued for another worker and dead-lettered only after it has gone unhandled for :max-requeue-age-ms (default 5 min), so there is no silent DLQ drop. The re-enqueue is delayed (the job is parked in the scheduled set :requeue-delay-ms ahead) so a handlerless worker can’t reacquire it in a tight loop. Give-up is age-based, not attempt-based: wrong-worker misses under load (or more handlerless workers than any attempt budget would cover) can’t drop a job that a slow handler-owning worker simply hadn’t polled yet. A worker started with an empty registry warns loudly at startup (BOU-88). |
| Auth / sessions | | DB-backed, pure core (libs/user/…​/core/session.clj). No server-side sticky state: any replica can serve any request. |
| Multi-tenancy | | Schema-per-tenant (libs/tenant/…​/provisioning.clj). Instance-agnostic; routing is per-request. |
| Email / external | | Async via the jobs queue / stateless IO adapters. |
| Readiness checks | | /health/ready (readiness-handler, wired in wiring.clj) probes DB + cache and returns 503 when either is down, which is the behavior load balancers and k8s expect. |
| Rate limiting | | The config-driven http-rate-limit-protection interceptor is wired into the default route pipeline (wiring.clj injects the :rate-limit config and the :boundary/cache into the interceptor system map). Configure it under :boundary/http :rate-limit {:enabled? :limit :window-ms} (per-env; HTTP_RATE_LIMIT / HTTP_RATE_LIMIT_WINDOW_MS). Enforcement is opt-in: it is off by default, and the bundled prod/acc configs ship it disabled, so an upgrade cannot start 429-ing existing consumers or silently run a per-process limiter in production. Enable it together with an active Redis cache. With an active Redis cache the fixed-window limit is shared across replicas; with no cache it falls back to a per-process counter, which is single-node only (effective global limit = limit × N; the wiring logs a warning at startup in that case). The in-process fallback is heap-bounded by a hard cap: before a new client is recorded at the cap, stale clients are swept and, if the map is still full, the least-recently-active client is evicted. High-cardinality client ids therefore can’t leak memory on a long-running node, even when every client is in-window (BOU-87). |
| Graceful shutdown | | Integrant halt! runs on a JVM shutdown hook (src/boundary/main.clj), closing pools and stopping Jetty. Jetty is configured for graceful connection draining (configure-graceful-shutdown! in wiring.clj): on stop it stops accepting new connections, rejects new requests with 503, and lets in-flight requests finish within :boundary/http :drain-timeout-ms (default 30000 ms in prod/acc, env HTTP_DRAIN_TIMEOUT_MS; 0 disables). Set the window above the load balancer’s deregistration delay for zero-downtime rollouts (BOU-86). |
| Realtime / WebSocket | ✅ * | Replica-safe via :provider :redis (BOU-85, ADR-035). Live sockets remain node-local; routing envelopes fan out over a Redis pub/sub channel so a broadcast reaches clients on any replica. Topic subscriptions are stored in Redis sets and are cluster-wide. * The default :in-memory provider is single-node only; use :redis for multi-replica deployments. |
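The "atomic claim" behind scheduled-job promotion is worth seeing in isolation. The sketch below shows the in-memory flavor with swap-vals!; claim-due-job! and the set-of-ids representation are assumptions for illustration, not the code in libs/jobs. The Redis flavor rests on the same idea: ZREM returns the number of members it removed, so only the caller that gets 1 back owns the job.

```clojure
(defn claim-due-job!
  "Atomically removes `job-id` from the set held in atom `scheduled`.
  Returns true only for the single caller whose swap actually removed
  it. swap-vals! returns [old new] for the successful compare-and-set,
  so exactly one caller sees the id in `old` but not in `new`."
  [scheduled job-id]
  (let [[before after] (swap-vals! scheduled disj job-id)]
    (and (contains? before job-id)
         (not (contains? after job-id)))))

(def scheduled (atom #{:job-1 :job-2}))

(claim-due-job! scheduled :job-1)   ;; => true  (this worker promotes it)
(claim-due-job! scheduled :job-1)   ;; => false (already claimed)
@scheduled                          ;; => #{:job-2}
```

A plain "check, then remove" sequence would let two workers both pass the check. Folding the check into the result of a single atomic operation is what gives the exactly-once promotion described in the Jobs row.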

Topologies

| Shape | When |
| --- | --- |
| Single fat node | Vertical only. One process, large heap and pools. Everything works, including realtime and in-memory rate limiting. Simplest; capped by one machine. |
| N stateless web replicas | The main horizontal mode. N copies of the uberjar behind a load balancer, sharing Postgres + Redis. Cache, jobs, auth, tenancy all scale. Caveats: wire Redis rate limiting; WebSocket scales horizontally via :provider :redis (sticky sessions are only required when running the :in-memory provider). |
| Web / worker split (future) | Run dedicated job-worker processes separate from web. The jobs Redis queue already supports it, but there is currently no worker launch mode: boundary.main exposes only server and cli. Adding a worker entrypoint is the enabling step. |

Production checklist

  • Use the Redis cache and jobs adapters, never :in-memory, for more than one replica.

  • Register all job handlers on every instance. A dequeued job with no local handler is re-enqueued so another instance can run it, and is dead-lettered only once it has gone unhandled for :max-requeue-age-ms. A job type that no worker registers still sits in the queue for that whole window before failing, so keep handler sets consistent.

  • Enable rate limiting (:boundary/http :rate-limit :enabled? true) with an active Redis cache for a global limit across replicas; without a cache each instance counts independently (limit × N).

  • Keep replicas × maximum-pool-size under Postgres max_connections.

  • Confirm your load balancer points health probes at /health/ready (503-aware), not /health/live.

  • For WebSocket: use :provider :redis on :boundary/realtime to scale across replicas (ADR-035). Sticky sessions / single-node are only required with the default :in-memory provider.

  • Add Redis (and, for testing N replicas, a load balancer) to your deployment — docker-compose.yml currently defines a single app service only.
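The limit × N effect from the rate-limiting item is easy to reproduce. The following is a deliberately small fixed-window limiter, not Boundary's interceptor: allow? and admitted are hypothetical names, the shared atom stands in for Redis, and the stale-client sweep and eviction cap described earlier are left out.

```clojure
(defn allow?
  "Fixed-window check. `counters` is an atom of
  {[client-id window-index] count}. Returns true while the client is
  within `limit` for the window containing `now-ms`."
  [counters {:keys [limit window-ms]} client-id now-ms]
  (let [k      [client-id (quot now-ms window-ms)]
        counts (swap! counters update k (fnil inc 0))]
    (<= (get counts k) limit)))

(defn admitted
  "Counts how many of `n` requests pass when request i is handled by
  the counter returned from (route i)."
  [route cfg n]
  (count (filter true? (map #(allow? (route %) cfg "client-a" 500)
                            (range n)))))

(def cfg {:limit 3 :window-ms 1000})

;; One shared counter (stand-in for Redis): the limit is global.
(let [shared (atom {})]
  (admitted (constantly shared) cfg 10))         ;; => 3

;; Two replicas with private counters, round-robin: limit × N.
(let [replicas [(atom {}) (atom {})]]
  (admitted #(nth replicas (mod % 2)) cfg 10))   ;; => 6
```

Same config, same traffic, twice the admitted requests. That is why the checklist pairs :enabled? true with an active Redis cache.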

Functional decomposition (slicing services out)

The third axis: run a module (or a few) as its own process, scaled and deployed independently of the rest. This is where the ports.clj seam pays off most — and where the most net-new infrastructure is needed. It is not free "by config" today, but the architecture is positioned for it.

What already enables it

| Asset | How it helps |
| --- | --- |
| Per-module activation | Modules are gated by :enabled? / :active in config.edn. A process can boot a subset: the http-handler concats only the routes that are present ((or routes []) in wiring.clj), so a "user-only" process is a config, not a fork. |
| The protocol seam | Consumers depend on the protocol (e.g. IUserService), never the concrete record. Swapping an in-process record for a remote HTTP client implementing the same protocol leaves the caller untouched. |
| Wire format ready | Muuntaja (JSON / EDN / Transit) is already in the HTTP stack (reitit_router.clj); Malli schema.clj per module gives ready contracts. |
| Remote-adapter template | libs/external (SMTP, Twilio) is a gold-standard outbound adapter: record + extend-protocol + clj-http + error envelope + logging. Copy it for a service client. |
| Clean data boundaries | bb check:ports already forbids one module’s shell from touching another’s shell.persistence/shell.service, so the only cross-module path is the service port. No cross-module SQL joins to untangle. |
| Context plumbing | correlation-id, tenant, and auth already flow through the interceptor pipeline and can ride request headers across a network hop. |
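The per-module activation row boils down to "concatenate whatever routes the enabled modules contribute". A minimal sketch of that idea follows; the module descriptors and route vectors are made up for illustration, and only the (or routes []) guard mirrors what the table attributes to wiring.clj.

```clojure
;; Hypothetical module descriptors; real modules are Integrant components.
(def modules
  [{:name :user   :enabled? true  :routes [["/users" :user/list]]}
   {:name :admin  :enabled? false :routes [["/admin" :admin/home]]}
   {:name :search :enabled? true  :routes nil}])   ; enabled, no routes

(defn active-routes
  "Concatenates the routes of enabled modules, treating a module with
  no routes as an empty contribution."
  [modules]
  (into []
        (comp (filter :enabled?)
              (mapcat #(or (:routes %) [])))
        modules))

(active-routes modules)   ;; => [["/users" :user/list]]
```

Flipping :enabled? in config changes what the process serves; nothing about the handler assembly has to know which modules exist.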

What must be built

  • Generic remote-port adapter — a clj-http client that implements a module’s protocol, serializes via Malli, propagates correlation-id / tenant / auth, unwraps errors. None exists yet; all cross-module calls are in-process.

  • Network resilience — timeouts, retries, circuit breaker, service discovery (hardcoded URLs for MVP). External adapters use :throw-exceptions false but no retry/breaker.

  • Break the allowlisted dependency cycles: check_deps.clj allows admin↔user, platform↔{user,tenant,admin,workflow,search}. A cycle means two modules can’t be cleanly separated; these must be broken (e.g. extract the shared auth check behind a port) before slicing.

  • Async option: IEventBus is defined in libs/user/ports.clj but has no implementation. Event-driven decoupling needs a real adapter (Redis Streams / Kafka / RabbitMQ).

  • Data ownership decision — schema-per-tenant assumes co-located modules in one Postgres. Across services either share the DB (pragmatic) or give each service its own; there are no distributed transactions, so split writes become eventual-consistency.

  • Service launch mode: boundary.main exposes only server and cli; slicing needs an entrypoint that boots a named module subset as a service.
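To show the shape of the first two items above (remote-port adapter plus basic resilience), here is a hedged sketch. Nothing in it exists in Boundary today: IPaymentService, the header names, with-retry, and the URL layout are all hypothetical. The HTTP call is injected as http-fn so the example runs without dependencies; in practice it could wrap clj-http.client/request with :throw-exceptions false, as the existing external adapters do.

```clojure
;; Hypothetical port, for illustration only.
(defprotocol IPaymentService
  (fetch-payment [this ctx payment-id]))

(defn context-headers
  "Maps request context onto headers (header names are assumptions)."
  [{:keys [correlation-id tenant-id auth-token]}]
  (cond-> {}
    correlation-id (assoc "x-correlation-id" correlation-id)
    tenant-id      (assoc "x-tenant-id" tenant-id)
    auth-token     (assoc "authorization" (str "Bearer " auth-token))))

(defn with-retry
  "Calls (f) up to `attempts` times, retrying only on IOException.
  No backoff or circuit breaker; a real adapter would want both."
  [attempts f]
  (loop [n 1]
    (let [result (try {:ok (f)}
                      (catch java.io.IOException e
                        (if (< n attempts) ::retry (throw e))))]
      (if (= ::retry result)
        (recur (inc n))
        (:ok result)))))

;; Remote adapter: same protocol as an in-process record would implement.
(defrecord RemotePaymentService [base-url http-fn]
  IPaymentService
  (fetch-payment [_ ctx payment-id]
    (let [{:keys [status body]}
          (with-retry 3
            #(http-fn {:method  :get
                       :url     (str base-url "/payments/" payment-id)
                       :headers (context-headers ctx)}))]
      (if (<= 200 status 299)
        body
        (throw (ex-info "Remote payment call failed"
                        {:status status :body body}))))))

;; Usage with a stubbed transport:
(def svc
  (->RemotePaymentService "http://payments.internal"
                          (fn [_req] {:status 200 :body {:id "p-1"}})))

(fetch-payment svc {:correlation-id "c-1" :tenant-id "t-1"} "p-1")
;; => {:id "p-1"}
```

Callers keep calling fetch-payment through the protocol; whether the implementation is a local record or this remote one is decided where the system is wired.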

Sliceability by module

| Module | Effort | Why |
| --- | --- | --- |
| payments | Easy | Zero internal Boundary deps (only Maven). Already a self-contained provider. The natural pilot for the remote-adapter pattern. |
| core, observability | Easy | Leaf / infra; no sibling deps. (Usually shared libs, not standalone services.) |
| user, tenant, external | With work | Depend on platform + the in-process service assumption in middleware. Need the remote adapter + cycle-breaking (user↔admin, tenant↔platform). |
| admin, search, workflow | Entangled | search/workflow depend on admin’s schema provider; admin↔user is circular. Extract shared schema/auth behind ports first. |

Recommended path: build the generic remote-port adapter once, prove it by extracting payments as a standalone service, then tackle user. Don’t attempt admin/search/workflow until the cycles are broken.
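The ordering in the recommended path falls out of the dependency cycles. As a rough illustration, the sketch below encodes only the allowlisted cycles named on this page (plus payments with no internal deps) and finds the mutually dependent pairs. It is not check_deps.clj; the map is deliberately incomplete and the helper names are hypothetical.

```clojure
;; Edges limited to the cycles listed on this page; other deps omitted.
(def deps
  {:admin    #{:user :platform}
   :user     #{:admin :platform}
   :platform #{:user :tenant :admin :workflow :search}
   :tenant   #{:platform}
   :workflow #{:platform}
   :search   #{:platform}
   :payments #{}})

(defn mutual-deps
  "Returns the set of unordered pairs #{a b} where a depends on b and
  b depends on a. Detects direct (two-module) cycles only."
  [deps]
  (set (for [[a targets] deps
             b           targets
             :when       (contains? (get deps b #{}) a)]
         #{a b})))

(defn sliceable-now
  "Modules that take part in no direct cycle."
  [deps]
  (let [tangled (reduce into #{} (mutual-deps deps))]
    (set (remove tangled (keys deps)))))

(count (mutual-deps deps))   ;; => 6
(sliceable-now deps)         ;; => #{:payments}
```

Each pair returned by mutual-deps is a seam that needs a port extracted before either side can move to its own process, which is why payments comes first.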

Known gaps & roadmap

The architecture delivers the promise; these are the concrete pieces that make "scale by configuration" fully true. Tracked under the BOU-84 spike:

  1. Realtime Redis pub/sub adapter — shipped in BOU-85 (ADR-035). WebSocket is now replica-safe via :provider :redis on :boundary/realtime.

  2. Graceful connection draining — shipped in BOU-86. Configurable shutdown grace (:boundary/http :drain-timeout-ms) lets rollouts finish in-flight requests; Jetty GracefulHandler + setStopTimeout wired in wiring.clj.

  3. Default rate-limit wiring — shipped in BOU-87. Config-driven http-rate-limit-protection is in the default pipeline; enable via :boundary/http :rate-limit and it uses the Redis cache for a cross-replica limit (per-process fallback documented).

  4. Jobs hardening — shipped in BOU-88. Missing-handler jobs are re-enqueued (bounded) instead of silently dead-lettered, an empty-registry worker warns at startup, and scheduled-job promotion is an atomic claim (ZREM / swap-vals!) so a due job runs exactly once across workers.

  5. Deploy topology reference — compose + k8s example with N replicas, Redis, a load balancer, instance-id, and a web/worker split.
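The age-based give-up rule behind the jobs hardening item can be expressed as a small pure decision function. This is a sketch of the rule as described on this page, not the code in libs/jobs: handle-unowned-job and the :first-unhandled-at-ms field are hypothetical, and the 1000 ms delay is an illustrative value (only the 5 minute age default comes from the page).

```clojure
(defn handle-unowned-job
  "Decides what a worker does with a dequeued job it has no handler
  for. The job is dead-lettered only once it has been unhandled for at
  least :max-requeue-age-ms; otherwise it is parked for
  :requeue-delay-ms so this worker cannot grab it again immediately."
  [job now-ms {:keys [max-requeue-age-ms requeue-delay-ms]}]
  (let [first-seen (or (:first-unhandled-at-ms job) now-ms)
        age        (- now-ms first-seen)]
    (if (>= age max-requeue-age-ms)
      {:action :dead-letter :job job}
      {:action    :requeue
       :run-at-ms (+ now-ms requeue-delay-ms)
       :job       (assoc job :first-unhandled-at-ms first-seen)})))

(def job-cfg {:max-requeue-age-ms 300000   ; 5 min, the documented default
              :requeue-delay-ms   1000})   ; illustrative value

(:action (handle-unowned-job {:id 1} 0 job-cfg))
;; => :requeue
(:action (handle-unowned-job {:id 1 :first-unhandled-at-ms 0} 300000 job-cfg))
;; => :dead-letter
```

Because the decision depends on elapsed time rather than a miss counter, a burst of handlerless workers polling the same job cannot exhaust its chances before a handler-owning worker gets to it.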

For functional decomposition (the bigger bet):

  1. Generic remote-port adapter + RPC envelope: clj-http client implementing a module protocol, Malli (de)serialization, context propagation, error unwrap, retry/circuit-breaker. Pilot by extracting payments.

  2. Service launch mode: boundary.main entrypoint that boots a named module subset as an independent service.

  3. Break allowlisted dependency cycles: admin↔user, platform↔{user,tenant,admin,workflow,search}. This is a prerequisite for slicing those modules.

  4. IEventBus implementation — Redis Streams / Kafka adapter for async, event-driven inter-service decoupling (port already defined, unimplemented).
