A high-performance eBPF-based Layer 4 load balancer written in Clojure. Uses XDP (eXpress Data Path) for ingress DNAT and TC (Traffic Control) for egress SNAT, providing kernel-level packet processing for efficient traffic distribution. Can also be used as a simple reverse proxy.
- /metrics endpoint for Prometheus scraping

Requirements:
- bpftool (optional, for debugging)
- iproute2 (for TC qdisc management)

Install Java 25:
# Ubuntu 24.04+
sudo apt-get install openjdk-25-jdk
# Or download from https://jdk.java.net/25/
Install Clojure CLI:
curl -L -O https://github.com/clojure/brew-install/releases/latest/download/linux-install.sh
chmod +x linux-install.sh
sudo ./linux-install.sh
Clone the repository:
git clone https://github.com/pgdad/clj-ebpf-lb.git
cd clj-ebpf-lb
Create a configuration file (lb.edn):
{:proxies
[{:name "web"
:listen {:interfaces ["eth0"] :port 80}
:default-target {:ip "10.0.0.1" :port 8080}}]
:settings
{:stats-enabled false
:connection-timeout-sec 300}}
Run the load balancer:
sudo clojure -M:run -c lb.edn
Usage: clj-ebpf-lb [options]
Options:
-c, --config FILE Configuration file path (default: config.edn)
-i, --interface IFACE Network interface to attach to (can specify multiple)
-p, --port PORT Listen port (default: 80)
-t, --target TARGET Default target as ip:port (default: 127.0.0.1:8080)
-s, --stats Enable statistics collection
-v, --verbose Verbose output
-h, --help Show help
Examples:
clj-ebpf-lb -c lb.edn
clj-ebpf-lb -i eth0 -p 80 -t 10.0.0.1:8080
clj-ebpf-lb -i eth0 -i eth1 -p 443 -t 10.0.0.2:8443 --stats
The proxy is configured using an EDN (Extensible Data Notation) file:
{:proxies
[;; Each proxy configuration
{:name "proxy-name" ; Unique identifier
:listen
{:interfaces ["eth0" "eth1"] ; Network interfaces to listen on
:port 80} ; Port to listen on
:default-target
{:ip "10.0.0.1" ; Default backend IP
:port 8080} ; Default backend port
:source-routes ; Optional: source-based routing rules
[{:source "192.168.1.0/24" ; Source IP or CIDR
:target {:ip "10.0.0.2" :port 8080}}
{:source "10.10.0.0/16"
:target {:ip "10.0.0.3" :port 8080}}]}]
:settings
{:stats-enabled false ; Enable real-time statistics
:connection-timeout-sec 300 ; Connection idle timeout
:max-connections 100000}} ; Maximum tracked connections
Route traffic to different backends based on the client's source IP:
:source-routes
[;; Route internal network to internal backend
{:source "192.168.0.0/16"
:target {:ip "10.0.0.10" :port 8080}}
;; Route specific host to dedicated backend
{:source "192.168.1.100"
:target {:ip "10.0.0.20" :port 8080}}
;; Route cloud VPC to cloud backend
{:source "10.0.0.0/8"
:target {:ip "10.0.0.30" :port 8080}}]
The routing precedence is: the most specific match wins (longest-prefix matching). An exact host route beats a broader CIDR, and traffic that matches no source route falls through to the default target.
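For example, with the routes above, a client at 192.168.1.100 matches both 192.168.0.0/16 and the host route 192.168.1.100; the host route is more specific, so its traffic goes to 10.0.0.20, while a client at 192.168.2.5 matches only 192.168.0.0/16 and goes to 10.0.0.10.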
Route TLS traffic to different backends based on the Server Name Indication (SNI) hostname in the TLS ClientHello. This enables multi-tenant HTTPS load balancing without terminating TLS (layer 4 passthrough with layer 7 inspection).
{:proxies
[{:name "https-gateway"
:listen {:interfaces ["eth0"] :port 443}
:default-target {:ip "10.0.0.1" :port 8443}
:sni-routes
[{:sni-hostname "api.example.com"
:target {:ip "10.0.1.1" :port 8443}}
{:sni-hostname "web.example.com"
:target {:ip "10.0.2.1" :port 8443}}
{:sni-hostname "app.example.com"
:targets [{:ip "10.0.3.1" :port 8443 :weight 70}
{:ip "10.0.3.2" :port 8443 :weight 30}]}]}]}
How it works: the SNI hostname is extracted from the TLS ClientHello (layer 7 inspection) and matched against the configured routes; connections with no matching hostname go to the default target.

SNI Routing Features:
- Case-insensitive hostname matching (API.Example.COM matches api.example.com)

Use Cases:
- Multi-tenant HTTPS load balancing without terminating TLS

Limitations:
- Wildcard hostnames (*.example.com) are not supported

Distribute traffic across multiple backend servers with configurable weights. This is useful for canary deployments, A/B testing, capacity-based distribution, and blue/green deployments.
;; Weighted default target - distribute across 3 backends
:default-target
[{:ip "10.0.0.1" :port 8080 :weight 50} ; 50% of traffic
{:ip "10.0.0.2" :port 8080 :weight 30} ; 30% of traffic
{:ip "10.0.0.3" :port 8080 :weight 20}] ; 20% of traffic
;; Weighted source routes
:source-routes
[{:source "192.168.0.0/16"
:targets [{:ip "10.0.1.1" :port 8080 :weight 70}
{:ip "10.0.1.2" :port 8080 :weight 30}]}]
Weight Rules: each target receives traffic in proportion to its weight relative to the total. Weights need not sum to 100, though the examples use 100 for readability.
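As an illustration of proportional selection, here is a minimal userspace sketch (not the library's internal algorithm, which runs in the BPF programs) that picks a target by walking cumulative weight ranges:

;; Hypothetical helper: pick a target with probability weight/total.
(defn pick-target [targets]
  (let [total (reduce + (map :weight targets))
        r     (rand-int total)]
    (loop [[t & more] targets
           acc        0]
      (if (< r (+ acc (:weight t)))
        t
        (recur more (+ acc (:weight t)))))))

(pick-target [{:ip "10.0.0.1" :port 8080 :weight 50}
              {:ip "10.0.0.2" :port 8080 :weight 30}
              {:ip "10.0.0.3" :port 8080 :weight 20}])
;; => first target ~50% of the time, second ~30%, third ~20%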
Canary Deployment Example:
;; 95% stable, 5% canary
:default-target
[{:ip "10.0.0.1" :port 8080 :weight 95} ; Stable version
{:ip "10.0.0.2" :port 8080 :weight 5}] ; Canary version
Blue/Green Deployment Example:
;; Gradual traffic shift from blue to green
:default-target
[{:ip "10.0.0.1" :port 8080 :weight 20} ; Blue (old)
{:ip "10.0.0.2" :port 8080 :weight 80}] ; Green (new)
Automatically detect unhealthy backends and redistribute traffic to healthy ones. Health checking uses virtual threads for efficient concurrent monitoring.
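The snippet below is a minimal sketch of what a TCP probe on a JDK virtual thread looks like; tcp-check is a hypothetical helper, not part of the library's API:

(import '(java.net Socket InetSocketAddress))

;; Hypothetical TCP health probe: succeed if a connection can be
;; established within the timeout.
(defn tcp-check [ip port timeout-ms]
  (try
    (with-open [s (Socket.)]
      (.connect s (InetSocketAddress. ip (int port)) (int timeout-ms))
      {:success? true})
    (catch java.io.IOException e
      {:success? false :error (.getMessage e)})))

;; Run the probe on a virtual thread (Java 21+), so thousands of
;; concurrent checks stay cheap.
(.start (Thread/ofVirtual)
        (fn [] (println (tcp-check "10.0.0.1" 8080 2000))))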
Enable health checking in settings:
:settings
{:health-check-enabled true
:health-check-defaults
{:type :tcp
:interval-ms 10000 ; Check every 10 seconds
:timeout-ms 3000 ; 3 second timeout
:healthy-threshold 2 ; 2 successes = healthy
:unhealthy-threshold 3}} ; 3 failures = unhealthy
Per-target health check configuration:
:default-target
[{:ip "10.0.0.1" :port 8080 :weight 50
:health-check {:type :tcp
:interval-ms 5000
:timeout-ms 2000}}
{:ip "10.0.0.2" :port 8080 :weight 50
:health-check {:type :http
:path "/health"
:interval-ms 5000
:expected-codes [200 204]}}]
Health Check Types:
| Type | Description | Use Case |
|---|---|---|
| `:tcp` | TCP connection test | Fast, low overhead |
| `:http` | HTTP GET with status validation | Application-level health |
| `:https` | HTTPS GET with status validation | Secure endpoints |
| `:none` | Skip health checking | Always considered healthy |
Weight Redistribution:
When a backend becomes unhealthy, its traffic is redistributed proportionally to remaining healthy backends:
Original weights: [50, 30, 20]
If middle server fails: [71, 0, 29] (proportional redistribution)
If all servers fail: [50, 30, 20] (graceful degradation - keep original)
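The arithmetic above can be reproduced with a small sketch (a hypothetical helper, not the library's internal code): zero out unhealthy weights and renormalize the survivors so they sum to roughly 100, matching the example.

;; healthy? is a predicate on the target index.
(defn redistribute [weights healthy?]
  (let [live  (map-indexed (fn [i w] (if (healthy? i) w 0)) weights)
        total (reduce + live)]
    (if (zero? total)
      (vec weights)                                  ; graceful degradation
      (mapv #(Math/round (* 100.0 (/ % total))) live))))

(redistribute [50 30 20] #{0 2}) ; => [71 0 29]
(redistribute [50 30 20] #{})    ; => [50 30 20]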
Gradual Recovery:
When a backend recovers, traffic is gradually restored to prevent overwhelming it.
Health Check Parameters:
| Parameter | Default | Description |
|---|---|---|
| `type` | `:tcp` | Check type (`:tcp`, `:http`, `:https`, `:none`) |
| `path` | `"/health"` | HTTP(S) endpoint path |
| `interval-ms` | 10000 | Time between checks, ms (1000-300000) |
| `timeout-ms` | 3000 | Check timeout, ms (100-60000) |
| `healthy-threshold` | 2 | Consecutive successes to mark healthy |
| `unhealthy-threshold` | 3 | Consecutive failures to mark unhealthy |
| `expected-codes` | [200 201 202 204] | Valid HTTP response codes |
Use DNS hostnames instead of static IPs for backend targets. Ideal for dynamic environments like Kubernetes, cloud deployments, or services with frequently changing IPs.
Basic DNS backend:
:default-target
{:host "backend.service.local" ; DNS hostname instead of :ip
:port 8080
:dns-refresh-seconds 30} ; Re-resolve every 30 seconds (default)
DNS with health checking:
:default-target
{:host "api.backend.local"
:port 8080
:dns-refresh-seconds 15
:health-check {:type :http
:path "/health"
:interval-ms 5000}}
Mixed static and DNS targets:
:default-target
[{:ip "10.0.0.1" :port 8080 :weight 50} ; Static IP
{:host "dynamic.backend.local" ; DNS hostname
:port 8080
:weight 50
:dns-refresh-seconds 10}]
Kubernetes headless service pattern:
;; Headless services (clusterIP: None) return pod IPs as A records
:default-target
{:host "myapp.default.svc.cluster.local"
:port 8080
:dns-refresh-seconds 5} ; Quick refresh for pod scaling
Multiple A Record Handling:
When a hostname resolves to multiple A records, the weight is distributed equally:
{:host "backend.local" :port 8080 :weight 60}Failure Handling:
| Scenario | Startup | Runtime |
|---|---|---|
| DNS timeout | Fatal error | Use last-known-good IPs |
| Unknown host | Fatal error | Use last-known-good IPs |
| Empty A records | Fatal error | Use last-known-good IPs |
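The equal split can be illustrated with standard JDK resolution; expand-dns-target and the hostname below are hypothetical, not part of the library's API:

(import 'java.net.InetAddress)

;; Resolve a hostname and split the configured weight equally across
;; the returned A records.
(defn expand-dns-target [{:keys [host port weight]}]
  (let [ips (map #(.getHostAddress %) (InetAddress/getAllByName host))
        per (quot weight (count ips))]
    (mapv (fn [ip] {:ip ip :port port :weight per}) ips)))

;; (expand-dns-target {:host "backend.local" :port 8080 :weight 60})
;; => [{:ip "10.0.0.1" :port 8080 :weight 20} ...] for 3 A records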
DNS API:
;; Get DNS resolution status
(lb/get-dns-status "proxy-name")
;; => {:proxy-name "proxy-name"
;; :targets {"backend.local"
;; {:hostname "backend.local"
;; :port 8080
;; :last-ips ["10.0.0.1" "10.0.0.2"]
;; :consecutive-failures 0}}}
(lb/get-all-dns-status) ; All proxies
(lb/force-dns-resolve! "proxy" "hostname") ; Force refresh
;; Subscribe to DNS events
(require '[lb.dns :as dns])
(dns/subscribe! (fn [event]
(println (:type event) (:hostname event))))
;; Events: :dns-resolved, :dns-failed
Gracefully remove backends from the load balancer by stopping new connections while allowing existing ones to complete. This is essential for zero-downtime deployments, maintenance windows, and rolling updates.
Basic draining:
;; Start draining - no new connections, existing ones continue
(lb/drain-backend! "web" "10.0.0.1:8080")
;; Check drain status
(lb/get-drain-status "10.0.0.1:8080")
;; => {:target-id "10.0.0.1:8080"
;; :status :draining
;; :elapsed-ms 5000
;; :current-connections 3
;; :initial-connections 10}
;; Cancel drain and restore traffic
(lb/undrain-backend! "web" "10.0.0.1:8080")
Draining with timeout and callback:
;; Drain with 60 second timeout and completion callback
(lb/drain-backend! "web" "10.0.0.1:8080"
:timeout-ms 60000
:on-complete (fn [status]
(case status
:completed (println "Drain complete, safe to remove")
:timeout (println "Drain timed out, forcing removal")
:cancelled (println "Drain was cancelled"))))
Synchronous draining (blocks until complete):
;; Block until drain completes or times out
(let [status (lb/wait-for-drain! "10.0.0.1:8080")]
(when (= status :completed)
(lb/remove-backend! "web" "10.0.0.1:8080")))
Rolling update example:
(defn rolling-update [proxy-name targets]
(doseq [target targets]
;; Drain the old instance
(lb/drain-backend! proxy-name target :timeout-ms 30000)
(lb/wait-for-drain! target)
;; Deploy and add new instance
(deploy-new-version! target)
(lb/undrain-backend! proxy-name target)))
Drain Status Values:
| Status | Description |
|---|---|
:draining | Drain in progress, waiting for connections to close |
:completed | All connections closed, drain finished successfully |
:timeout | Timeout expired, some connections may still exist |
:cancelled | Drain cancelled via undrain-backend! |
Configuration:
:settings
{:default-drain-timeout-ms 30000 ; Default timeout (30 seconds)
:drain-check-interval-ms 1000} ; How often to check connection counts
How it works:
- drain-backend! sets the target's weight to 0 in the BPF maps, so no new connections are routed to it while existing ones continue

Full IPv4/IPv6 dual-stack support enables the same proxy to handle both address families simultaneously. IPv6 addresses can be used for backends, source routes, and SNI routes.
IPv6-only backends:
:default-target
[{:ip "::1" :port 8080 :weight 50}
{:ip "2001:db8::1" :port 8080 :weight 50}]
Mixed IPv4/IPv6 backends (dual-stack):
:default-target
[{:ip "10.0.0.1" :port 8080 :weight 25} ; IPv4
{:ip "10.0.0.2" :port 8080 :weight 25} ; IPv4
{:ip "2001:db8::1" :port 8080 :weight 25} ; IPv6
{:ip "2001:db8::2" :port 8080 :weight 25}] ; IPv6
IPv6 source-based routing:
:source-routes
[;; Route IPv6 documentation prefix to dedicated backend
{:source "2001:db8::/32"
:target {:ip "2001:db8::10" :port 8080}}
;; Route unique local addresses
{:source "fd00::/8"
:target {:ip "fd00::1" :port 8080}}
;; IPv4 routes work alongside IPv6
{:source "192.168.0.0/16"
:target {:ip "10.0.0.1" :port 8080}}]
IPv6 SNI routing:
:sni-routes
[{:sni-hostname "api.example.com"
:target {:ip "2001:db8::10" :port 8443}}
{:sni-hostname "www.example.com"
:target [{:ip "2001:db8::20" :port 8443 :weight 50}
{:ip "2001:db8::21" :port 8443 :weight 50}]}]
Supported IPv6 address formats:
- Full form: 2001:0db8:0000:0000:0000:0000:0000:0001
- Compressed: 2001:db8::1, ::1, fe80::1
- CIDR prefixes: 2001:db8::/32, fe80::/10

How it works: both address families are handled through a unified 16-byte address representation (see the utilities below).
IPv6 address utilities:
(require '[lb.util :as util])
;; Detect address family
(util/ipv6? "2001:db8::1") ; => true
(util/ipv4? "192.168.1.1") ; => true
(util/address-family "::1") ; => :ipv6
;; Parse addresses
(util/ipv6-string->bytes "::1")
(util/ip-string->bytes16 "192.168.1.1") ; Unified 16-byte format
;; Parse CIDR
(util/parse-cidr-unified "2001:db8::/32")
; => {:ip <16-bytes> :prefix-len 32 :af :ipv6}
Limitations:
- IPv4-mapped IPv6 addresses (::ffff:192.168.1.1) not supported - use native formats

See examples/ipv6_dual_stack.clj for comprehensive examples.
Export metrics in Prometheus format for monitoring and alerting. The metrics endpoint is compatible with Prometheus, Grafana, and other monitoring tools.
Enable metrics in settings:
:settings
{:metrics {:enabled true
:port 9090 ; Metrics server port (default 9090)
:path "/metrics"}} ; Endpoint path (default "/metrics")
Available metrics:
| Metric | Type | Labels | Description |
|---|---|---|---|
| `lb_up` | gauge | - | Whether the load balancer is running (1=up) |
| `lb_info` | gauge | version | Load balancer version information |
| `lb_connections_active` | gauge | target_ip, target_port | Current active connections per backend |
| `lb_bytes_total` | counter | target_ip, target_port, direction | Bytes transferred (forward/reverse) |
| `lb_packets_total` | counter | target_ip, target_port, direction | Packets transferred (forward/reverse) |
| `lb_backend_health` | gauge | proxy_name, target_ip, target_port | Backend health status (1=healthy, 0=unhealthy) |
| `lb_health_check_latency_seconds` | histogram | proxy_name, target_id | Health check latency distribution |
| `lb_dns_resolution_status` | gauge | proxy_name, hostname | DNS resolution status (1=resolved, 0=failed) |
Example Prometheus output:
# HELP lb_up Whether the load balancer is running (1=up, 0=down)
# TYPE lb_up gauge
lb_up 1
# HELP lb_backend_health Backend health status (1=healthy, 0=unhealthy)
# TYPE lb_backend_health gauge
lb_backend_health{proxy_name="web",target_ip="10.0.0.1",target_port="8080"} 1
lb_backend_health{proxy_name="web",target_ip="10.0.0.2",target_port="8080"} 0
# HELP lb_health_check_latency_seconds Health check latency in seconds
# TYPE lb_health_check_latency_seconds histogram
lb_health_check_latency_seconds_bucket{proxy_name="web",target_id="10.0.0.1:8080",le="0.005"} 45
lb_health_check_latency_seconds_bucket{proxy_name="web",target_id="10.0.0.1:8080",le="0.01"} 120
lb_health_check_latency_seconds_bucket{proxy_name="web",target_id="10.0.0.1:8080",le="+Inf"} 200
lb_health_check_latency_seconds_sum{proxy_name="web",target_id="10.0.0.1:8080"} 1.234
lb_health_check_latency_seconds_count{proxy_name="web",target_id="10.0.0.1:8080"} 200
Prometheus scrape configuration:
scrape_configs:
- job_name: 'clj-ebpf-lb'
static_configs:
- targets: ['localhost:9090']
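To spot-check the endpoint from a REPL (assuming the metrics server above is running on localhost:9090):

(require '[clojure.string :as str])

;; Fetch the raw Prometheus exposition text and print the first lines.
(let [body (slurp "http://localhost:9090/metrics")]
  (doseq [line (take 5 (str/split-lines body))]
    (println line)))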
Metrics API:
(require '[lb.metrics :as metrics])
;; Start/stop metrics server
(metrics/start! {:port 9090 :path "/metrics"})
(metrics/stop!)
(metrics/running?) ; => true/false
;; Get server status
(metrics/get-status)
;; => {:running true :port 9090 :path "/metrics" :url "http://localhost:9090/metrics"}
;; Collect metrics programmatically (returns Prometheus text format)
(metrics/collect-metrics)
Run multiple load balancer instances with synchronized state for high availability. Cluster mode uses a peer-to-peer gossip protocol (SWIM-style) to share health status, circuit breaker states, drain coordination, and connection tracking across nodes.
Enable cluster mode in settings:
:settings
{:cluster
{:enabled true
:node-id "auto" ; or explicit "lb-1", "lb-2", etc.
:bind-address "0.0.0.0" ; Address for gossip communication
:bind-port 7946 ; Port for gossip (UDP/TCP)
:seeds ["192.168.1.10:7946" ; Seed nodes for initial discovery
"192.168.1.11:7946"]
;; Gossip tuning (defaults optimized for small clusters)
:gossip-interval-ms 200 ; How often to gossip (default 200)
:gossip-fanout 2 ; Peers to contact per interval
:push-pull-interval-ms 10000 ; Full state sync interval
;; Failure detection
:ping-interval-ms 1000 ; Probe interval
:ping-timeout-ms 500 ; Probe timeout
:ping-req-count 3 ; Indirect probes before suspicion
:suspicion-mult 3 ; Suspicion timeout multiplier
;; State synchronization toggles
:sync-health true ; Sync health check results
:sync-circuit-breaker true ; Sync circuit breaker states
:sync-drain true ; Sync drain coordination
:sync-conntrack true}} ; Sync connection tracking
Minimal cluster configuration (2 nodes):
;; Node 1 (192.168.1.10)
:settings
{:cluster
{:enabled true
:node-id "lb-1"
:bind-port 7946
:seeds ["192.168.1.11:7946"]}}
;; Node 2 (192.168.1.11)
:settings
{:cluster
{:enabled true
:node-id "lb-2"
:bind-port 7946
:seeds ["192.168.1.10:7946"]}}
What Gets Synchronized:
| State Type | Description | Conflict Resolution |
|---|---|---|
| Health Status | Target health check results | Latest timestamp wins |
| Circuit Breaker | Open/closed/half-open states | OPEN always wins (conservative) |
| Drain Status | Which targets are draining | Draining wins over active |
| Connection Tracking | Active connections for failover | Owner node wins, shadow on others |
Circuit Breaker Synchronization:
When a circuit breaker opens on any node, it propagates immediately to all nodes to prevent thundering herd:
;; Configure circuit breaker (per-proxy)
{:name "web"
:circuit-breaker
{:enabled true
:error-threshold 5 ; Errors before opening
:error-rate-threshold 0.5 ; 50% error rate threshold
:half-open-requests 3 ; Test requests in half-open
:reset-timeout-ms 30000}} ; Time before half-open
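The lifecycle implied by these parameters follows the standard circuit breaker pattern; the sketch below is a hedged illustration of the transitions, not the library's implementation:

;; States: :closed (normal), :open (rejecting, propagated to peers),
;; :half-open (allowing a few test requests).
(defn transition [state event]
  (case [state event]
    [:closed    :error-threshold-reached] :open      ; errors or error rate exceeded
    [:open      :reset-timeout-elapsed]   :half-open ; after reset-timeout-ms
    [:half-open :probes-succeeded]        :closed    ; half-open-requests successes
    [:half-open :probe-failed]            :open
    state))

(transition :closed :error-threshold-reached) ; => :open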
Drain Coordination:
When draining a backend, all cluster nodes stop sending new connections:
;; Drain on any node - propagates to all
(lb/drain-backend! "web" "10.0.0.1:8080")
;; All nodes will:
;; 1. Stop routing new connections to this target
;; 2. Wait for existing connections to complete
;; 3. Report connection counts to originating node
Connection Tracking Sync (Failover):
Connections are synchronized across nodes for seamless failover:
Node 1 (Owner) Node 2 (Shadow)
┌─────────────┐ ┌─────────────┐
│ Connection │ gossip │ Shadow │
│ Created │──────────→│ Entry │
│ (BPF map) │ │ (userspace) │
└─────────────┘ └─────────────┘
│ │
│ Node 1 fails │
▼ ▼
┌─────────────┐
│ Promote │
│ to BPF │
│ (active) │
└─────────────┘
Cluster API:
(require '[lb.cluster :as cluster])
;; Start/stop cluster (usually done via config)
(cluster/start! {:enabled true :bind-port 7946 :seeds [...]})
(cluster/stop!)
(cluster/running?) ; => true/false
;; Get cluster information
(cluster/node-id) ; => "lb-1"
(cluster/cluster-size) ; => 3
(cluster/alive-nodes) ; => #{"lb-1" "lb-2" "lb-3"}
(cluster/all-nodes) ; => {"lb-1" {...} "lb-2" {...}}
;; Check node status
(cluster/node-alive? "lb-2") ; => true
(cluster/node-suspected? "lb-2") ; => false
(cluster/node-dead? "lb-2") ; => false
;; Get cluster statistics
(cluster/stats)
;; => {:status :running
;; :membership {:node-id "lb-1"
;; :alive-count 3
;; :suspected-count 0
;; :dead-count 0}
;; :gossip {:running? true
;; :provider-count 4}}
;; Subscribe to cluster events
(def unsubscribe
(cluster/subscribe!
(fn [event]
(case (:event-type event)
:node-join (println "Node joined:" (:node-id event))
:node-leave (println "Node left:" (:node-id event))
:node-suspect (println "Node suspected:" (:node-id event))
:node-dead (println "Node confirmed dead:" (:node-id event))
nil))))
;; Unsubscribe
(unsubscribe)
;; Manual state broadcast (usually automatic)
(cluster/broadcast! :health "target-1" {:status :healthy})
Admin API Endpoints:
When cluster mode is enabled, additional REST endpoints are available:
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/cluster/status` | GET | Cluster status and statistics |
| `/api/v1/cluster/nodes` | GET | All known nodes and their states |
| `/api/v1/cluster/sync` | GET | Sync status and lag metrics |
| `/api/v1/cluster/sync` | POST | Force immediate full sync |
# Get cluster status
curl http://localhost:8080/api/v1/cluster/status
# {"status":"running","node-id":"lb-1","cluster-size":3,...}
# Get all nodes
curl http://localhost:8080/api/v1/cluster/nodes
# {"lb-1":{"state":"alive",...},"lb-2":{"state":"alive",...}}
# Force sync
curl -X POST http://localhost:8080/api/v1/cluster/sync
# {"success":true,"message":"Full sync initiated"}
Cluster Metrics:
Additional Prometheus metrics when cluster mode is enabled:
| Metric | Type | Labels | Description |
|---|---|---|---|
| `lb_cluster_up` | gauge | - | Cluster mode running (1=yes) |
| `lb_cluster_nodes_total` | gauge | state | Nodes by state (alive/suspected/dead) |
| `lb_cluster_conntrack_sync` | gauge | - | Synced connection count |
Network Requirements: all nodes must be able to reach each other on the gossip port (default 7946) over both UDP and TCP.
Failure Handling:
| Scenario | Behavior |
|---|---|
| Node unreachable | Marked suspected after ping-timeout × suspicion-mult |
| Node confirmed dead | Connections promoted on remaining nodes |
| Network partition | Each partition operates independently |
| All peers dead | Single-node mode, full local functionality |
Configuration Parameters:
| Parameter | Default | Description |
|---|---|---|
| `enabled` | false | Enable cluster mode |
| `node-id` | "auto" | Node identifier (auto-generates UUID) |
| `bind-address` | "0.0.0.0" | Address for gossip communication |
| `bind-port` | 7946 | Port for gossip (UDP and TCP) |
| `seeds` | [] | Seed node addresses for discovery |
| `gossip-interval-ms` | 200 | Gossip tick interval |
| `gossip-fanout` | 2 | Peers to gossip to per tick |
| `push-pull-interval-ms` | 10000 | Full state sync interval |
| `ping-interval-ms` | 1000 | Failure detection probe interval |
| `ping-timeout-ms` | 500 | Probe timeout |
| `ping-req-count` | 3 | Indirect probes before suspicion |
| `suspicion-mult` | 3 | Suspicion timeout multiplier |
| `sync-health` | true | Sync health check results |
| `sync-circuit-breaker` | true | Sync circuit breaker states |
| `sync-drain` | true | Sync drain coordination |
| `sync-conntrack` | true | Sync connection tracking |
Use the load balancer as a library in your Clojure application:
(require '[lb.core :as lb]
'[lb.config :as config])
;; Create configuration
(def cfg (config/make-simple-config
{:interface "eth0"
:port 80
:target-ip "10.0.0.1"
:target-port 8080
:stats-enabled true}))
;; Initialize the load balancer
(lb/init! cfg)
;; Check status
(lb/get-status)
;; => {:running true, :attached-interfaces ["eth0"], ...}
;; Add a source route at runtime (single target)
(lb/add-source-route! "web" "192.168.1.0/24"
{:ip "10.0.0.2" :port 8080})
;; Add a weighted source route at runtime
(lb/add-source-route! "web" "10.10.0.0/16"
[{:ip "10.0.0.3" :port 8080 :weight 70}
{:ip "10.0.0.4" :port 8080 :weight 30}])
;; Get active connections
(lb/get-connections)
;; Print connection statistics
(lb/print-connections)
;; Shutdown
(lb/shutdown!)
;; Add a new proxy at runtime
(lb/add-proxy!
{:name "api"
:listen {:interfaces ["eth0"] :port 8080}
:default-target {:ip "10.0.1.1" :port 3000}})
;; Remove a proxy
(lb/remove-proxy! "api")
;; Add/remove source routes
(lb/add-source-route! "web" "10.20.0.0/16" {:ip "10.0.0.5" :port 8080})
(lb/remove-source-route! "web" "10.20.0.0/16")
;; Add/remove SNI routes (for TLS traffic)
(lb/add-sni-route! "https-gateway" "api.example.com"
{:ip "10.0.1.1" :port 8443})
;; Add weighted SNI route
(lb/add-sni-route! "https-gateway" "web.example.com"
[{:ip "10.0.2.1" :port 8443 :weight 70}
{:ip "10.0.2.2" :port 8443 :weight 30}])
;; Remove SNI route (case-insensitive)
(lb/remove-sni-route! "https-gateway" "api.example.com")
;; List SNI routes
(lb/list-sni-routes "https-gateway")
;; => [{:hostname "web.example.com"
;; :targets [{:ip "10.0.2.1" :port 8443 :weight 70}
;; {:ip "10.0.2.2" :port 8443 :weight 30}]}]
;; List all SNI routes across all proxies
(lb/list-all-sni-routes)
;; Attach/detach interfaces
(lb/attach-interfaces! ["eth1" "eth2"])
(lb/detach-interfaces! ["eth2"])
;; Enable/disable stats
(lb/enable-stats!)
(lb/disable-stats!)
;; Get connection statistics
(lb/get-connection-count)
(lb/get-connection-stats)
;; => {:total-connections 42
;; :total-packets-forward 12345
;; :total-bytes-forward 987654
;; :total-packets-reverse 11234
;; :total-bytes-reverse 876543}
;; Start streaming statistics
(lb/start-stats-stream!)
(require '[clojure.core.async :as async])
(let [ch (lb/subscribe-to-stats)]
  ;; Read an event from the channel (blocking take outside a go block)
  (async/<!! ch))
(lb/stop-stats-stream!)
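For continuous consumption, a go-loop works too (a sketch using the same core.async alias as above; the event shape depends on the library):

(let [ch (lb/subscribe-to-stats)]
  (async/go-loop []
    (when-some [event (async/<! ch)]
      (println "stats event:" event)
      (recur))))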
(require '[lb.health :as health])
;; Start/stop health checking system
(health/start!)
(health/stop!)
(health/running?) ; => true/false
;; Get health status
(health/get-status "web")
;; => {:proxy-name "web"
;; :targets [{:target-id "10.0.0.1:8080"
;; :status :healthy
;; :last-latency-ms 2.5
;; :consecutive-successes 5}
;; {:target-id "10.0.0.2:8080"
;; :status :unhealthy
;; :last-error :connection-refused}]
;; :original-weights [50 50]
;; :effective-weights [100 0]}
(health/get-all-status) ; All proxies
(health/healthy? "web" "10.0.0.1:8080")
(health/all-healthy? "web")
(health/unhealthy-targets "web")
;; Subscribe to health events
(def unsubscribe
(health/subscribe!
(fn [event]
(println "Health event:" (:type event) (:target-id event)))))
;; Events: :target-healthy, :target-unhealthy, :weights-updated
;; Unsubscribe
(unsubscribe)
;; Manual control (for maintenance)
(health/set-target-status! "web" "10.0.0.1:8080" :unhealthy)
(health/force-check! "web" "10.0.0.1:8080")
;; Direct health checks (for testing)
(health/check-tcp "10.0.0.1" 8080 2000)
;; => {:success? true :latency-ms 1.5}
(health/check-http "10.0.0.1" 8080 "/health" 3000 [200])
;; => {:success? true :latency-ms 15.2 :message "HTTP 200"}
;; Format status for display
(health/print-status "web")
(health/print-all-status)
;; Start draining a backend (stops new connections)
(lb/drain-backend! "web" "10.0.0.1:8080")
(lb/drain-backend! "web" "10.0.0.1:8080"
:timeout-ms 60000
:on-complete (fn [status] (println "Drain finished:" status)))
;; Cancel drain and restore traffic
(lb/undrain-backend! "web" "10.0.0.1:8080")
;; Check if target is draining
(lb/draining? "10.0.0.1:8080") ; => true/false
;; Get drain status for a target
(lb/get-drain-status "10.0.0.1:8080")
;; => {:target-id "10.0.0.1:8080"
;; :proxy-name "web"
;; :status :draining
;; :elapsed-ms 5000
;; :timeout-ms 30000
;; :current-connections 3
;; :initial-connections 10}
;; Get all currently draining backends
(lb/get-all-draining)
;; => [{:target-id "10.0.0.1:8080" :status :draining ...}
;; {:target-id "10.0.0.2:8080" :status :draining ...}]
;; Block until drain completes (returns :completed, :timeout, or :cancelled)
(lb/wait-for-drain! "10.0.0.1:8080")
;; Print drain status
(lb/print-drain-status)
+------------------+
| User Space |
| (Clojure App) |
+--------+---------+
|
BPF Maps (shared) |
+------------------------+------------------------+
| | |
v v v
+--------+ +------------+ +----------+
| Listen | | Conntrack | | Settings |
| Map | | Map | | Map |
+--------+ +------------+ +----------+
| | |
+------------------------+------------------------+
|
+------------------------+------------------------+
| | |
v v v
+--------+ +------------+ +----------+
| XDP | Ingress | Kernel | Egress | TC |
| (DNAT) +----------->+ Stack +---------->+ (SNAT) |
+--------+ +------------+ +----------+
^ |
| v
+---+-----------------------------------------------+---+
| Network Interface |
+-------------------------------------------------------+
Unmatched packets are returned with XDP_PASS to continue normal kernel processing.

# Run all tests (requires root)
sudo clojure -M:test
The project includes QEMU-based ARM64 testing infrastructure:
# One-time setup (downloads Ubuntu 24.04 ARM64 image)
./qemu-arm64/setup-vm.sh
# Start the ARM64 VM
./qemu-arm64/start-vm.sh --daemon
# Run tests in ARM VM
./qemu-arm64/run-tests-in-vm.sh --sync
# Stop the VM
./qemu-arm64/stop-vm.sh
See qemu-arm64/README.md for detailed ARM64 testing documentation.
# Build the uberjar
clojure -X:uberjar
# Run the uberjar
sudo java -jar target/clj-ebpf-lb.jar -c lb.edn
"Cannot attach XDP program"
- Run as root (sudo)
- Verify the interface exists: ip link show
- Check the kernel version: uname -r (need 5.15+)

"BPF program verification failed"
- Check kernel logs: dmesg | tail -50
- Verify BPF support is enabled: zgrep CONFIG_BPF /proc/config.gz

"Connection not being NAT'd"
- Confirm the interfaces are attached: (lb/list-attached-interfaces)
- Inspect tracked connections: (lb/get-connections)

# View attached XDP programs
sudo bpftool prog list
# View BPF maps
sudo bpftool map list
# Monitor traffic
sudo tcpdump -i eth0 -n port 80
Tune max-connections based on expected concurrent connections.

MIT License. See LICENSE for details.
Note: This project was previously named clj-ebpf-reverse-proxy. The core functionality remains the same, but the name was changed to better reflect the weighted load balancing capabilities.