A high-performance eBPF-based Layer 4 load balancer written in Clojure. Uses XDP (eXpress Data Path) for ingress DNAT and TC (Traffic Control) for egress SNAT, providing kernel-level packet processing for efficient traffic distribution. Can also be used as a simple reverse proxy.
Features include a /metrics endpoint for Prometheus scraping.

Requirements: bpftool (optional, for debugging) and iproute2 (for TC qdisc management).

Install Java 25:
# Ubuntu 24.04+
sudo apt-get install openjdk-25-jdk
# Or download from https://jdk.java.net/25/
Install Clojure CLI:
curl -L -O https://github.com/clojure/brew-install/releases/latest/download/linux-install.sh
chmod +x linux-install.sh
sudo ./linux-install.sh
Clone the repository:
git clone https://github.com/pgdad/clj-ebpf-lb.git
cd clj-ebpf-lb
Create a configuration file (lb.edn):
{:proxies
[{:name "web"
:listen {:interfaces ["eth0"] :port 80}
:default-target {:ip "10.0.0.1" :port 8080}}]
:settings
{:stats-enabled false
:connection-timeout-sec 300}}
Run the load balancer:
sudo clojure -M:run -c lb.edn
Usage: clj-ebpf-lb [options]
Options:
-c, --config FILE Configuration file path (default: config.edn)
-i, --interface IFACE Network interface to attach to (can specify multiple)
-p, --port PORT Listen port (default: 80)
-t, --target TARGET Default target as ip:port (default: 127.0.0.1:8080)
-s, --stats Enable statistics collection
-v, --verbose Verbose output
-h, --help Show help
Examples:
clj-ebpf-lb -c lb.edn
clj-ebpf-lb -i eth0 -p 80 -t 10.0.0.1:8080
clj-ebpf-lb -i eth0 -i eth1 -p 443 -t 10.0.0.2:8443 --stats
The proxy is configured using an EDN (Extensible Data Notation) file:
{:proxies
[;; Each proxy configuration
{:name "proxy-name" ; Unique identifier
:listen
{:interfaces ["eth0" "eth1"] ; Network interfaces to listen on
:port 80} ; Port to listen on
:default-target
{:ip "10.0.0.1" ; Default backend IP
:port 8080} ; Default backend port
:source-routes ; Optional: source-based routing rules
[{:source "192.168.1.0/24" ; Source IP or CIDR
:target {:ip "10.0.0.2" :port 8080}}
{:source "10.10.0.0/16"
:target {:ip "10.0.0.3" :port 8080}}]}]
:settings
{:stats-enabled false ; Enable real-time statistics
:connection-timeout-sec 300 ; Connection idle timeout
:max-connections 100000}} ; Maximum tracked connections
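Since the configuration is plain EDN, the standard clojure.edn reader can load it for a quick sanity check before starting (a sketch; the library's own loader may do additional validation):

(require '[clojure.edn :as edn])

;; Parse the config file and inspect it from the REPL
(def cfg (edn/read-string (slurp "lb.edn")))
(get-in cfg [:proxies 0 :listen :port])   ; => 80
(get-in cfg [:settings :max-connections]) ; => 100000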
Route traffic to different backends based on the client's source IP:
:source-routes
[;; Route internal network to internal backend
{:source "192.168.0.0/16"
:target {:ip "10.0.0.10" :port 8080}}
;; Route specific host to dedicated backend
{:source "192.168.1.100"
:target {:ip "10.0.0.20" :port 8080}}
;; Route cloud VPC to cloud backend
{:source "10.0.0.0/8"
:target {:ip "10.0.0.30" :port 8080}}]
The routing precedence is: an exact source IP match wins over CIDR matches, a longer (more specific) CIDR prefix wins over a shorter one, and the default target is used when no source route matches.
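For illustration, assuming the longest-prefix-match semantics just described:

:source-routes
[{:source "192.168.1.100"              ; exact host
  :target {:ip "10.0.0.20" :port 8080}}
 {:source "192.168.1.0/24"             ; more specific prefix
  :target {:ip "10.0.0.10" :port 8080}}
 {:source "192.168.0.0/16"
  :target {:ip "10.0.0.30" :port 8080}}]
;; 192.168.1.100 -> 10.0.0.20 (exact match beats both CIDRs)
;; 192.168.1.50  -> 10.0.0.10 (/24 beats /16)
;; 192.168.2.1   -> 10.0.0.30 (/16)
;; anything else -> :default-target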
Route TLS traffic to different backends based on the Server Name Indication (SNI) hostname in the TLS ClientHello. This enables multi-tenant HTTPS load balancing without terminating TLS (layer 4 passthrough with layer 7 inspection).
{:proxies
[{:name "https-gateway"
:listen {:interfaces ["eth0"] :port 443}
:default-target {:ip "10.0.0.1" :port 8443}
:sni-routes
[{:sni-hostname "api.example.com"
:target {:ip "10.0.1.1" :port 8443}}
{:sni-hostname "web.example.com"
:target {:ip "10.0.2.1" :port 8443}}
{:sni-hostname "app.example.com"
:targets [{:ip "10.0.3.1" :port 8443 :weight 70}
{:ip "10.0.3.2" :port 8443 :weight 30}]}]}]}
How it works: the eBPF program inspects the TLS ClientHello of a new connection, extracts the SNI hostname, and selects the backend before forwarding. The TLS session itself is never terminated.

SNI Routing Features:
- Case-insensitive matching: API.Example.COM matches api.example.com

Use Cases: multi-tenant HTTPS load balancing where many hostnames share a single listener.

Limitations:
- Wildcard hostnames (e.g. *.example.com) are not supported

Distribute traffic across multiple backend servers with configurable weights. This is useful for canary deployments, A/B testing, capacity-based distribution, and blue/green deployments.
;; Weighted default target - distribute across 3 backends
:default-target
[{:ip "10.0.0.1" :port 8080 :weight 50} ; 50% of traffic
{:ip "10.0.0.2" :port 8080 :weight 30} ; 30% of traffic
{:ip "10.0.0.3" :port 8080 :weight 20}] ; 20% of traffic
;; Weighted source routes
:source-routes
[{:source "192.168.0.0/16"
:targets [{:ip "10.0.1.1" :port 8080 :weight 70}
{:ip "10.0.1.2" :port 8080 :weight 30}]}]
Weight Rules: weights are positive integers and are relative, so each target receives approximately weight / (sum of weights) of the traffic. In the examples here the weights sum to 100, making them read as percentages.
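A minimal sketch of weight-proportional selection, for intuition only (the BPF program's actual mechanism may differ, e.g. per-connection hashing):

(defn pick-target
  "Pick a target with probability proportional to its :weight."
  [targets]
  (let [total (reduce + (map :weight targets))
        r     (rand-int total)]
    (loop [[t & more] targets
           acc        0]
      (let [acc (+ acc (:weight t))]
        (if (< r acc) t (recur more acc))))))

(pick-target [{:ip "10.0.0.1" :port 8080 :weight 50}
              {:ip "10.0.0.2" :port 8080 :weight 30}
              {:ip "10.0.0.3" :port 8080 :weight 20}])
;; over many calls: ~50% / ~30% / ~20%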
Canary Deployment Example:
;; 95% stable, 5% canary
:default-target
[{:ip "10.0.0.1" :port 8080 :weight 95} ; Stable version
{:ip "10.0.0.2" :port 8080 :weight 5}] ; Canary version
Blue/Green Deployment Example:
;; Gradual traffic shift from blue to green
:default-target
[{:ip "10.0.0.1" :port 8080 :weight 20} ; Blue (old)
{:ip "10.0.0.2" :port 8080 :weight 80}] ; Green (new)
Automatically detect unhealthy backends and redistribute traffic to healthy ones. Health checking uses virtual threads for efficient concurrent monitoring.
Enable health checking in settings:
:settings
{:health-check-enabled true
:health-check-defaults
{:type :tcp
:interval-ms 10000 ; Check every 10 seconds
:timeout-ms 3000 ; 3 second timeout
:healthy-threshold 2 ; 2 successes = healthy
:unhealthy-threshold 3}} ; 3 failures = unhealthy
Per-target health check configuration:
:default-target
[{:ip "10.0.0.1" :port 8080 :weight 50
:health-check {:type :tcp
:interval-ms 5000
:timeout-ms 2000}}
{:ip "10.0.0.2" :port 8080 :weight 50
:health-check {:type :http
:path "/health"
:interval-ms 5000
:expected-codes [200 204]}}]
Health Check Types:
| Type | Description | Use Case |
|---|---|---|
| :tcp | TCP connection test | Fast, low overhead |
| :http | HTTP GET with status validation | Application-level health |
| :https | HTTPS GET with status validation | Secure endpoints |
| :none | Skip health checking | Always considered healthy |
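For intuition, a :tcp probe boils down to a timed connect, run on a virtual thread; tcp-check below is a hypothetical helper built on the standard JDK, not the library's internal code:

(import '(java.net InetSocketAddress Socket)
        '(java.util.concurrent Callable Executors))

(defn tcp-check
  "Attempt a TCP connect within timeout-ms.
  Returns {:success? bool :latency-ms double}."
  [ip port timeout-ms]
  (let [start  (System/nanoTime)
        result #(hash-map :success? %
                          :latency-ms (/ (- (System/nanoTime) start) 1e6))]
    (try
      (with-open [sock (Socket.)]
        (.connect sock (InetSocketAddress. ^String ip (int port)) (int timeout-ms))
        (result true))
      (catch Exception _ (result false)))))

;; One virtual thread per probe scales to many concurrent checks
(with-open [exec (Executors/newVirtualThreadPerTaskExecutor)]
  @(.submit exec ^Callable (fn [] (tcp-check "10.0.0.1" 8080 2000))))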
Weight Redistribution:
When a backend becomes unhealthy, its traffic is redistributed proportionally to remaining healthy backends:
Original weights: [50, 30, 20]
If middle server fails: [71, 0, 29] (proportional redistribution)
If all servers fail: [50, 30, 20] (graceful degradation - keep original)
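The arithmetic behind those numbers, as a hypothetical helper (not the library's implementation):

(defn redistribute-weights
  "Move the weight of unhealthy targets (a set of indexes) onto healthy
  ones in proportion to their original weights; keep the originals if
  nothing is healthy (graceful degradation)."
  [weights unhealthy]
  (let [healthy-sum (reduce + (keep-indexed #(when-not (unhealthy %1) %2) weights))
        lost        (reduce + (keep-indexed #(when (unhealthy %1) %2) weights))]
    (if (zero? healthy-sum)
      weights
      (vec (map-indexed
             (fn [i w]
               (if (unhealthy i)
                 0
                 (Math/round (+ w (* (double lost) (/ w healthy-sum))))))
             weights)))))

(redistribute-weights [50 30 20] #{1})     ; => [71 0 29]
(redistribute-weights [50 30 20] #{0 1 2}) ; => [50 30 20]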
Gradual Recovery:
When a backend recovers, traffic is gradually restored to prevent overwhelming it.
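As an illustration only (the exact ramp schedule is an assumption, not documented behavior), a stepped restore of a weight of 50 might look like:

;; Restore in quarter steps rather than all at once
(map #(Math/round (* % 50)) [0.25 0.5 0.75 1.0]) ; => (13 25 38 50)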
Health Check Parameters:
| Parameter | Default | Description |
|---|---|---|
| type | :tcp | Check type (:tcp, :http, :https, :none) |
| path | "/health" | HTTP(S) endpoint path |
| interval-ms | 10000 | Time between checks (1000-300000) |
| timeout-ms | 3000 | Check timeout (100-60000) |
| healthy-threshold | 2 | Consecutive successes to mark healthy |
| unhealthy-threshold | 3 | Consecutive failures to mark unhealthy |
| expected-codes | [200 201 202 204] | Valid HTTP response codes |
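Spelled out as a single map with every value at its default from the table above:

:health-check {:type                :tcp
               :path                "/health"
               :interval-ms         10000
               :timeout-ms          3000
               :healthy-threshold   2
               :unhealthy-threshold 3
               :expected-codes      [200 201 202 204]}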
Use DNS hostnames instead of static IPs for backend targets. Ideal for dynamic environments like Kubernetes, cloud deployments, or services with frequently changing IPs.
Basic DNS backend:
:default-target
{:host "backend.service.local" ; DNS hostname instead of :ip
:port 8080
:dns-refresh-seconds 30} ; Re-resolve every 30 seconds (default)
DNS with health checking:
:default-target
{:host "api.backend.local"
:port 8080
:dns-refresh-seconds 15
:health-check {:type :http
:path "/health"
:interval-ms 5000}}
Mixed static and DNS targets:
:default-target
[{:ip "10.0.0.1" :port 8080 :weight 50} ; Static IP
{:host "dynamic.backend.local" ; DNS hostname
:port 8080
:weight 50
:dns-refresh-seconds 10}]
Kubernetes headless service pattern:
;; Headless services (clusterIP: None) return pod IPs as A records
:default-target
{:host "myapp.default.svc.cluster.local"
:port 8080
:dns-refresh-seconds 5} ; Quick refresh for pod scaling
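Under the hood, refresh amounts to periodically re-resolving the hostname's records; the standard JDK resolver shows what the proxy will see (a sketch; the library's resolver may differ):

(import '(java.net InetAddress))

(defn resolve-records [hostname]
  (mapv #(.getHostAddress ^InetAddress %)
        (InetAddress/getAllByName hostname)))

(resolve-records "myapp.default.svc.cluster.local")
;; => one IP string per A (or AAAA) record, e.g. one per pod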
Multiple A Record Handling:
When a hostname resolves to multiple A records, the weight is distributed equally:
{:host "backend.local" :port 8080 :weight 60}Failure Handling:
| Scenario | Startup | Runtime |
|---|---|---|
| DNS timeout | Fatal error | Use last-known-good IPs |
| Unknown host | Fatal error | Use last-known-good IPs |
| Empty A records | Fatal error | Use last-known-good IPs |
DNS API:
;; Get DNS resolution status
(lb/get-dns-status "proxy-name")
;; => {:proxy-name "proxy-name"
;; :targets {"backend.local"
;; {:hostname "backend.local"
;; :port 8080
;; :last-ips ["10.0.0.1" "10.0.0.2"]
;; :consecutive-failures 0}}}
(lb/get-all-dns-status) ; All proxies
(lb/force-dns-resolve! "proxy" "hostname") ; Force refresh
;; Subscribe to DNS events
(require '[lb.dns :as dns])
(dns/subscribe! (fn [event]
(println (:type event) (:hostname event))))
;; Events: :dns-resolved, :dns-failed
Gracefully remove backends from the load balancer by stopping new connections while allowing existing ones to complete. This is essential for zero-downtime deployments, maintenance windows, and rolling updates.
Basic draining:
;; Start draining - no new connections, existing ones continue
(lb/drain-backend! "web" "10.0.0.1:8080")
;; Check drain status
(lb/get-drain-status "10.0.0.1:8080")
;; => {:target-id "10.0.0.1:8080"
;; :status :draining
;; :elapsed-ms 5000
;; :current-connections 3
;; :initial-connections 10}
;; Cancel drain and restore traffic
(lb/undrain-backend! "web" "10.0.0.1:8080")
Draining with timeout and callback:
;; Drain with 60 second timeout and completion callback
(lb/drain-backend! "web" "10.0.0.1:8080"
:timeout-ms 60000
:on-complete (fn [status]
(case status
:completed (println "Drain complete, safe to remove")
:timeout (println "Drain timed out, forcing removal")
:cancelled (println "Drain was cancelled"))))
Synchronous draining (blocks until complete):
;; Block until drain completes or times out
(let [status (lb/wait-for-drain! "10.0.0.1:8080")]
(when (= status :completed)
(lb/remove-backend! "web" "10.0.0.1:8080")))
Rolling update example:
(defn rolling-update [proxy-name targets]
(doseq [target targets]
;; Drain the old instance
(lb/drain-backend! proxy-name target :timeout-ms 30000)
(lb/wait-for-drain! target)
;; Deploy and add new instance
(deploy-new-version! target)
(lb/undrain-backend! proxy-name target)))
Drain Status Values:
| Status | Description |
|---|---|
| :draining | Drain in progress, waiting for connections to close |
| :completed | All connections closed, drain finished successfully |
| :timeout | Timeout expired, some connections may still exist |
| :cancelled | Drain cancelled via undrain-backend! |
Configuration:
:settings
{:default-drain-timeout-ms 30000 ; Default timeout (30 seconds)
:drain-check-interval-ms 1000} ; How often to check connection counts
How it works:
- drain-backend! sets the target's weight to 0 in the BPF maps, so no new connections are routed to it while established connections continue until they close or the timeout expires

Full IPv4/IPv6 dual-stack support enables the same proxy to handle both address families simultaneously. IPv6 addresses can be used for backends, source routes, and SNI routes.
IPv6-only backends:
:default-target
[{:ip "::1" :port 8080 :weight 50}
{:ip "2001:db8::1" :port 8080 :weight 50}]
Mixed IPv4/IPv6 backends (dual-stack):
:default-target
[{:ip "10.0.0.1" :port 8080 :weight 25} ; IPv4
{:ip "10.0.0.2" :port 8080 :weight 25} ; IPv4
{:ip "2001:db8::1" :port 8080 :weight 25} ; IPv6
{:ip "2001:db8::2" :port 8080 :weight 25}] ; IPv6
IPv6 source-based routing:
:source-routes
[;; Route IPv6 documentation prefix to dedicated backend
{:source "2001:db8::/32"
:target {:ip "2001:db8::10" :port 8080}}
;; Route unique local addresses
{:source "fd00::/8"
:target {:ip "fd00::1" :port 8080}}
;; IPv4 routes work alongside IPv6
{:source "192.168.0.0/16"
:target {:ip "10.0.0.1" :port 8080}}]
IPv6 SNI routing:
:sni-routes
[{:sni-hostname "api.example.com"
:target {:ip "2001:db8::10" :port 8443}}
{:sni-hostname "www.example.com"
:target [{:ip "2001:db8::20" :port 8443 :weight 50}
{:ip "2001:db8::21" :port 8443 :weight 50}]}]
Supported IPv6 address formats:
- Full form: 2001:0db8:0000:0000:0000:0000:0000:0001
- Compressed form: 2001:db8::1, ::1, fe80::1
- CIDR notation: 2001:db8::/32, fe80::/10

How it works: all addresses are stored internally in a unified 16-byte format, so IPv4 and IPv6 entries share the same map structures and routing logic.
IPv6 address utilities:
(require '[lb.util :as util])
;; Detect address family
(util/ipv6? "2001:db8::1") ; => true
(util/ipv4? "192.168.1.1") ; => true
(util/address-family "::1") ; => :ipv6
;; Parse addresses
(util/ipv6-string->bytes "::1")
(util/ip-string->bytes16 "192.168.1.1") ; Unified 16-byte format
;; Parse CIDR
(util/parse-cidr-unified "2001:db8::/32")
; => {:ip <16-bytes> :prefix-len 32 :af :ipv6}
Limitations:
- IPv4-mapped IPv6 addresses (::ffff:192.168.1.1) are not supported - use native formats

See examples/ipv6_dual_stack.clj for comprehensive examples.
Export metrics in Prometheus format for monitoring and alerting. The metrics endpoint is compatible with Prometheus, Grafana, and other monitoring tools.
Enable metrics in settings:
:settings
{:metrics {:enabled true
:port 9090 ; Metrics server port (default 9090)
:path "/metrics"}} ; Endpoint path (default "/metrics")
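Once enabled, a quick way to confirm the endpoint responds (using the defaults above):

;; Fetch and print the raw Prometheus exposition text
(println (slurp "http://localhost:9090/metrics"))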
Available metrics:
| Metric | Type | Labels | Description |
|---|---|---|---|
| lb_up | gauge | - | Whether the load balancer is running (1=up) |
| lb_info | gauge | version | Load balancer version information |
| lb_connections_active | gauge | target_ip, target_port | Current active connections per backend |
| lb_bytes_total | counter | target_ip, target_port, direction | Bytes transferred (forward/reverse) |
| lb_packets_total | counter | target_ip, target_port, direction | Packets transferred (forward/reverse) |
| lb_backend_health | gauge | proxy_name, target_ip, target_port | Backend health status (1=healthy, 0=unhealthy) |
| lb_health_check_latency_seconds | histogram | proxy_name, target_id | Health check latency distribution |
| lb_dns_resolution_status | gauge | proxy_name, hostname | DNS resolution status (1=resolved, 0=failed) |
Example Prometheus output:
# HELP lb_up Whether the load balancer is running (1=up, 0=down)
# TYPE lb_up gauge
lb_up 1
# HELP lb_backend_health Backend health status (1=healthy, 0=unhealthy)
# TYPE lb_backend_health gauge
lb_backend_health{proxy_name="web",target_ip="10.0.0.1",target_port="8080"} 1
lb_backend_health{proxy_name="web",target_ip="10.0.0.2",target_port="8080"} 0
# HELP lb_health_check_latency_seconds Health check latency in seconds
# TYPE lb_health_check_latency_seconds histogram
lb_health_check_latency_seconds_bucket{proxy_name="web",target_id="10.0.0.1:8080",le="0.005"} 45
lb_health_check_latency_seconds_bucket{proxy_name="web",target_id="10.0.0.1:8080",le="0.01"} 120
lb_health_check_latency_seconds_bucket{proxy_name="web",target_id="10.0.0.1:8080",le="+Inf"} 200
lb_health_check_latency_seconds_sum{proxy_name="web",target_id="10.0.0.1:8080"} 1.234
lb_health_check_latency_seconds_count{proxy_name="web",target_id="10.0.0.1:8080"} 200
Prometheus scrape configuration:
scrape_configs:
- job_name: 'clj-ebpf-lb'
static_configs:
- targets: ['localhost:9090']
Metrics API:
(require '[lb.metrics :as metrics])
;; Start/stop metrics server
(metrics/start! {:port 9090 :path "/metrics"})
(metrics/stop!)
(metrics/running?) ; => true/false
;; Get server status
(metrics/get-status)
;; => {:running true :port 9090 :path "/metrics" :url "http://localhost:9090/metrics"}
;; Collect metrics programmatically (returns Prometheus text format)
(metrics/collect-metrics)
Use the load balancer as a library in your Clojure application:
(require '[lb.core :as lb]
'[lb.config :as config])
;; Create configuration
(def cfg (config/make-simple-config
{:interface "eth0"
:port 80
:target-ip "10.0.0.1"
:target-port 8080
:stats-enabled true}))
;; Initialize the load balancer
(lb/init! cfg)
;; Check status
(lb/get-status)
;; => {:running true, :attached-interfaces ["eth0"], ...}
;; Add a source route at runtime (single target)
(lb/add-source-route! "web" "192.168.1.0/24"
{:ip "10.0.0.2" :port 8080})
;; Add a weighted source route at runtime
(lb/add-source-route! "web" "10.10.0.0/16"
[{:ip "10.0.0.3" :port 8080 :weight 70}
{:ip "10.0.0.4" :port 8080 :weight 30}])
;; Get active connections
(lb/get-connections)
;; Print connection statistics
(lb/print-connections)
;; Shutdown
(lb/shutdown!)
;; Add a new proxy at runtime
(lb/add-proxy!
{:name "api"
:listen {:interfaces ["eth0"] :port 8080}
:default-target {:ip "10.0.1.1" :port 3000}})
;; Remove a proxy
(lb/remove-proxy! "api")
;; Add/remove source routes
(lb/add-source-route! "web" "10.20.0.0/16" {:ip "10.0.0.5" :port 8080})
(lb/remove-source-route! "web" "10.20.0.0/16")
;; Add/remove SNI routes (for TLS traffic)
(lb/add-sni-route! "https-gateway" "api.example.com"
{:ip "10.0.1.1" :port 8443})
;; Add weighted SNI route
(lb/add-sni-route! "https-gateway" "web.example.com"
[{:ip "10.0.2.1" :port 8443 :weight 70}
{:ip "10.0.2.2" :port 8443 :weight 30}])
;; Remove SNI route (case-insensitive)
(lb/remove-sni-route! "https-gateway" "api.example.com")
;; List SNI routes
(lb/list-sni-routes "https-gateway")
;; => [{:hostname "web.example.com"
;; :targets [{:ip "10.0.2.1" :port 8443 :weight 70}
;; {:ip "10.0.2.2" :port 8443 :weight 30}]}]
;; List all SNI routes across all proxies
(lb/list-all-sni-routes)
;; Attach/detach interfaces
(lb/attach-interfaces! ["eth1" "eth2"])
(lb/detach-interfaces! ["eth2"])
;; Enable/disable stats
(lb/enable-stats!)
(lb/disable-stats!)
;; Get connection statistics
(lb/get-connection-count)
(lb/get-connection-stats)
;; => {:total-connections 42
;; :total-packets-forward 12345
;; :total-bytes-forward 987654
;; :total-packets-reverse 11234
;; :total-bytes-reverse 876543}
;; Start streaming statistics
(require '[clojure.core.async :as async])
(lb/start-stats-stream!)
(let [ch (lb/subscribe-to-stats)]
  ;; Blocking take outside a go block needs <!!, not <!
  (async/<!! ch))
(lb/stop-stats-stream!)
(require '[lb.health :as health])
;; Start/stop health checking system
(health/start!)
(health/stop!)
(health/running?) ; => true/false
;; Get health status
(health/get-status "web")
;; => {:proxy-name "web"
;; :targets [{:target-id "10.0.0.1:8080"
;; :status :healthy
;; :last-latency-ms 2.5
;; :consecutive-successes 5}
;; {:target-id "10.0.0.2:8080"
;; :status :unhealthy
;; :last-error :connection-refused}]
;; :original-weights [50 50]
;; :effective-weights [100 0]}
(health/get-all-status) ; All proxies
(health/healthy? "web" "10.0.0.1:8080")
(health/all-healthy? "web")
(health/unhealthy-targets "web")
;; Subscribe to health events
(def unsubscribe
(health/subscribe!
(fn [event]
(println "Health event:" (:type event) (:target-id event)))))
;; Events: :target-healthy, :target-unhealthy, :weights-updated
;; Unsubscribe
(unsubscribe)
;; Manual control (for maintenance)
(health/set-target-status! "web" "10.0.0.1:8080" :unhealthy)
(health/force-check! "web" "10.0.0.1:8080")
;; Direct health checks (for testing)
(health/check-tcp "10.0.0.1" 8080 2000)
;; => {:success? true :latency-ms 1.5}
(health/check-http "10.0.0.1" 8080 "/health" 3000 [200])
;; => {:success? true :latency-ms 15.2 :message "HTTP 200"}
;; Format status for display
(health/print-status "web")
(health/print-all-status)
;; Start draining a backend (stops new connections)
(lb/drain-backend! "web" "10.0.0.1:8080")
(lb/drain-backend! "web" "10.0.0.1:8080"
:timeout-ms 60000
:on-complete (fn [status] (println "Drain finished:" status)))
;; Cancel drain and restore traffic
(lb/undrain-backend! "web" "10.0.0.1:8080")
;; Check if target is draining
(lb/draining? "10.0.0.1:8080") ; => true/false
;; Get drain status for a target
(lb/get-drain-status "10.0.0.1:8080")
;; => {:target-id "10.0.0.1:8080"
;; :proxy-name "web"
;; :status :draining
;; :elapsed-ms 5000
;; :timeout-ms 30000
;; :current-connections 3
;; :initial-connections 10}
;; Get all currently draining backends
(lb/get-all-draining)
;; => [{:target-id "10.0.0.1:8080" :status :draining ...}
;; {:target-id "10.0.0.2:8080" :status :draining ...}]
;; Block until drain completes (returns :completed, :timeout, or :cancelled)
(lb/wait-for-drain! "10.0.0.1:8080")
;; Print drain status
(lb/print-drain-status)
+------------------+
| User Space |
| (Clojure App) |
+--------+---------+
|
BPF Maps (shared) |
+------------------------+------------------------+
| | |
v v v
+--------+ +------------+ +----------+
| Listen | | Conntrack | | Settings |
| Map | | Map | | Map |
+--------+ +------------+ +----------+
| | |
+------------------------+------------------------+
|
+------------------------+------------------------+
| | |
v v v
+--------+ +------------+ +----------+
| XDP | Ingress | Kernel | Egress | TC |
| (DNAT) +----------->+ Stack +---------->+ (SNAT) |
+--------+ +------------+ +----------+
^ |
| v
+---+-----------------------------------------------+---+
| Network Interface |
+-------------------------------------------------------+
Packets that match no configured listener are returned with XDP_PASS to continue normal kernel processing.

# Run all tests (requires root)
sudo clojure -M:test
The project includes QEMU-based ARM64 testing infrastructure:
# One-time setup (downloads Ubuntu 24.04 ARM64 image)
./qemu-arm64/setup-vm.sh
# Start the ARM64 VM
./qemu-arm64/start-vm.sh --daemon
# Run tests in ARM VM
./qemu-arm64/run-tests-in-vm.sh --sync
# Stop the VM
./qemu-arm64/stop-vm.sh
See qemu-arm64/README.md for detailed ARM64 testing documentation.
# Build a standalone uberjar
clojure -X:uberjar
# Run the uberjar
sudo java -jar target/clj-ebpf-lb.jar -c lb.edn
"Cannot attach XDP program"
sudo)ip link showuname -r (need 5.15+)"BPF program verification failed"
dmesg | tail -50zgrep CONFIG_BPF /proc/config.gz"Connection not being NAT'd"
(lb/list-attached-interfaces)(lb/get-connections)# View attached XDP programs
sudo bpftool prog list
# View BPF maps
sudo bpftool map list
# Monitor traffic
sudo tcpdump -i eth0 -n port 80
Size max-connections based on expected concurrent connections.

MIT License. See LICENSE for details.
Note: This project was previously named clj-ebpf-reverse-proxy. The core functionality remains the same, but the name was changed to better reflect the weighted load balancing capabilities.