
Benchmarks

Performance comparison of Stratum vs DuckDB across standard analytical benchmarks. Primary results on an 8-core Intel Lunar Lake laptop (10M rows), with partial 100M-row results from a 20-core AMD EPYC server.

Methodology

  • Warmup: 5 iterations (10M) / 2 iterations (100M) before measurement
  • Measurement: 10 iterations (10M) / 5 iterations (100M), median reported
  • GC: System.gc() + 200ms sleep between benchmarks to prevent GC pause inflation
  • Threading: 1T = single-threaded, NT = all cores
  • DuckDB: v1.4.4, JDBC in-process, SET threads TO 1 for single-threaded comparisons
  • Data: Real CSV inputs (TPC-H lineitem, ClickBench hits, NYC taxi) and synthetic data (H2O db-benchmark, TPC-DS via DuckDB)
  • Input modes: Arrays (raw long[]/double[]) and Indices (PersistentColumnIndex). Tables show best of both modes.
  • Validation: All queries cross-checked against DuckDB results
  • JIT warmup: All Java hot paths pre-exercised with tiny data (1000 rows) before benchmarks to ensure stable JIT compilations

In ratio columns, "2.2x" means Stratum is 2.2x faster than DuckDB; rows where DuckDB wins are written from DuckDB's side (e.g. "DuckDB 1.5x").

Results (10M rows)

Hardware: Intel Core Ultra 7 258V (8 cores, Lunar Lake), Linux, JVM 25.0.1, DuckDB v1.4.4.

Tier 1: TPC-H / SSB

Standard decision-support queries on TPC-H lineitem data (6M rows from CSV).

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
B1 | TPC-H Q6: filter + SUM(price*discount) | 12.9ms | 7.3ms | 27.9ms | 5.4ms | 2.2x
B2 | TPC-H Q1: GROUP BY + 7 aggregates | 74.6ms | 23.4ms | 92.5ms | 16.8ms | 1.2x
B3 | SSB Q1.1: filter + SUM(price*discount) | 12.9ms | 4.8ms | 28.3ms | 5.7ms | 2.2x
B4 | COUNT(*) no filter | 0.1ms | - | 0.4ms | 0.3ms | 4.0x
B5 | Filtered COUNT (NEQ predicate) | 3.1ms | 1.7ms | 12.2ms | 2.9ms | 4.0x
B6 | Low-cardinality GROUP BY + COUNT | 16.9ms | 7.3ms | 24.0ms | 4.6ms | 1.4x
SSB-Q1.2 | Tighter filter + SUM(price*discount) | 12.5ms | 4.8ms | 23.3ms | 4.5ms | 1.9x

Stratum's fused filter+aggregate execution evaluates predicates and accumulates results in a single SIMD pass, avoiding intermediate arrays.
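As an illustration, the fused pattern for a TPC-H Q6-style query can be sketched as a single scalar loop. This is a hypothetical simplification, not Stratum's actual kernel, which is SIMD-vectorized via the Vector API and multi-threaded:

```java
// Sketch of fused filter + aggregate: predicate and accumulation share one
// loop body, so no intermediate selection vector or filtered copy is built.
public class FusedScan {
    // SUM(price * discount) WHERE lo <= discount <= hi AND qty < maxQty
    public static double q6(double[] price, double[] discount, long[] qty,
                            double lo, double hi, long maxQty) {
        double sum = 0.0;
        for (int i = 0; i < price.length; i++) {
            // Rows that fail the predicate are never materialized anywhere.
            if (discount[i] >= lo && discount[i] <= hi && qty[i] < maxQty) {
                sum += price[i] * discount[i];
            }
        }
        return sum;
    }
}
```

A two-pass engine would first build an index or bitmap of matching rows and then aggregate over it; fusing the passes halves memory traffic on this query shape.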

Tier 2: H2O.ai db-benchmark

Group-by queries from the H2O.ai db-benchmark, testing various group cardinalities and aggregation types.

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
Q1 | GROUP BY id1 (string, 100 groups), SUM | 23.0ms | 9.9ms | 45.3ms | 13.1ms | 2.0x
Q2 | GROUP BY id1,id2 (string, 10K groups), SUM | 33.0ms | 13.1ms | 123.0ms | 47.9ms | 3.7x
Q3 | GROUP BY id3 (string, 100K groups), SUM+AVG | 71.1ms | 70.0ms | 361.5ms | 141.1ms | 5.1x
Q4 | GROUP BY id4 (int, 100 groups), 3xAVG | 43.1ms | 11.5ms | 44.8ms | 8.5ms | 1.0x
Q5 | GROUP BY id6 (int, 100K groups), 3xSUM | 82.3ms | 81.9ms | 142.7ms | 92.7ms | 1.7x
Q6 | GROUP BY id4,id5 (10K groups), STDDEV | 29.8ms | 13.2ms | 81.0ms | 36.2ms | 2.7x
Q7 | GROUP BY id3 (string, 100K groups), MAX-MIN | 84.5ms | 84.5ms | 359.1ms | 143.4ms | 4.2x
Q8 | Top-2 per group (ROW_NUMBER window) | 1645.1ms | 1047.6ms | 1207.7ms | 306.6ms | DuckDB 1.4x
Q9 | GROUP BY id2,id4 (10K groups), CORR | 61.3ms | 22.1ms | 133.6ms | 50.1ms | 2.2x
Q10 | GROUP BY 6 columns (10M unique groups) | 832.2ms | 623.0ms | 7055.5ms | 5888.1ms | 8.5x
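Integer keys with small, known cardinality (such as Q4's 100 groups) allow group-by without a hash table: the key indexes an accumulator array directly. A minimal single-threaded sketch (hypothetical code, not the engine's actual kernel):

```java
// Sketch of dense group-by: when keys are small non-negative integers with
// known cardinality, each key addresses an accumulator slot directly --
// no hash function, no probing, no collision handling.
public class DenseGroupBy {
    public static double[] sumByGroup(int[] keys, double[] values, int cardinality) {
        double[] sums = new double[cardinality]; // one accumulator per group
        for (int i = 0; i < keys.length; i++) {
            sums[keys[i]] += values[i];          // direct array index
        }
        return sums;
    }
}
```

This stays fast while the accumulator array fits in cache, which is also why the high-cardinality variants (Q3, Q5, Q7) scale differently.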

H2O Join Queries:

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
J1 | INNER JOIN small (100 rows), 2xSUM | 40.8ms | 5.8ms | 26.7ms | 4.7ms | DuckDB 1.5x
J2 | INNER JOIN medium (10K), 2-column key, SUM | 64.2ms | 29.7ms | 71.6ms | 49.5ms | 1.1x
J3 | LEFT JOIN medium (10K), 2-column key, SUM | 73.9ms | 30.8ms | 79.4ms | 50.9ms | 1.1x

Tier 3: ClickBench

Queries from the ClickBench web analytics benchmark (6M rows from CSV). Organized by query type.

Metadata / Stats-only queries (answered from pre-computed per-chunk statistics without scanning data):

Query | Description | Stratum 1T | DuckDB 1T | 1T Ratio
----- | ----------- | ---------- | --------- | --------
Q2 | SUM + COUNT + AVG (3 aggregates) | 0.2ms | 8.9ms | ~45x
Q3 | AVG(UserID) | 0.1ms | 7.1ms | ~71x
Q6 | MIN + MAX(EventTime) | 0.2ms | 6.6ms | ~33x

Filter + aggregate:

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
Q1 | COUNT WHERE AdvEngineID != 0 | 22.2ms | 11.2ms | 4.5ms | 1.0ms | DuckDB 4.9x
Q7 | COUNT WHERE 2 predicates | 36.8ms | 16.7ms | 8.7ms | 1.9ms | DuckDB 4.2x

Group-by:

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
GRP-SE | GROUP BY SearchEngineID (65 groups) | 12.6ms | 5.5ms | 14.2ms | 2.7ms | 1.1x
GRP-REG | GROUP BY RegionID (3K groups), SUM+COUNT | 26.6ms | 26.4ms | 30.9ms | 20.4ms | 1.2x
Q15 | GROUP BY UserID (1.1M groups), COUNT | 114.8ms | 53.6ms | 74.0ms | 32.8ms | DuckDB 1.6x
Q19+ | GROUP BY EXTRACT(minute), COUNT | 10.1ms | 2.8ms | 1057.6ms | 188.8ms | 105x
Q43 | GROUP BY DATE_TRUNC(minute), COUNT | 74.4ms | 45.9ms | 46.2ms | 26.6ms | DuckDB 1.6x
Q12 | GROUP BY SearchPhrase (string), COUNT | 274.1ms | 273.5ms | 104.9ms | 24.8ms | DuckDB 2.6x
Q33 | GROUP BY URL (1.9M string groups), COUNT | 124.1ms | 68.2ms | 376.6ms | 100.8ms | 3.0x

COUNT DISTINCT:

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
Q5 | COUNT(DISTINCT UserID) -- 1.1M distinct | 108.6ms | 107.2ms | 61.2ms | 26.0ms | DuckDB 1.8x
CD-GRP | COUNT(DISTINCT AdvEngineID) GROUP BY RegionID | 30.8ms | 15.7ms | 27.3ms | 20.5ms | DuckDB 1.1x
Q8 | GROUP BY RegionID, COUNT(DISTINCT UserID), TOP 10 | 71.3ms | 83.3ms | 79.3ms | 31.3ms | 1.1x
Q9 | GROUP BY RegionID, SUM+COUNT+AVG+COUNT(DISTINCT) | 114.1ms | 120.2ms | 117.0ms | 52.3ms | 1.0x

LIKE pattern matching:

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
LIKE1 | COUNT WHERE URL LIKE '%example.com/page%' | 23.9ms | 22.5ms | 273.3ms | 46.4ms | 11.4x
LIKE2 | COUNT WHERE URL LIKE '%search%' | 46.8ms | 27.1ms | 239.6ms | 40.8ms | 5.1x
LIKE3 | GROUP BY SearchEngineID WHERE URL LIKE '%shop%' | 26.4ms | 23.3ms | 253.8ms | 44.0ms | 9.6x
Q20 | COUNT WHERE URL LIKE '%google%' | 25.4ms | 28.3ms | 179.1ms | 31.5ms | 7.1x
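The dictionary fast-path behind these numbers tests the pattern once per distinct string, then counts rows by dictionary code. A minimal sketch of the idea (hypothetical code, not Stratum's actual implementation):

```java
// Sketch of dictionary-accelerated LIKE '%pat%': the expensive substring
// test runs once per *distinct* value; per-row work is a boolean lookup.
public class DictLike {
    public static long countContains(String[] dict, int[] codes, String pat) {
        boolean[] match = new boolean[dict.length];
        for (int d = 0; d < dict.length; d++) {
            match[d] = dict[d].contains(pat);   // O(|dict|) pattern tests
        }
        long count = 0;
        for (int code : codes) {                // O(N) array lookups
            if (match[code]) count++;
        }
        return count;
    }
}
```

When the dictionary is much smaller than the row count, this turns N substring scans into |dict| scans plus N cheap lookups.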

String functions:

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
Q27 | GROUP BY CounterID, AVG(LENGTH(URL)) HAVING COUNT > 100K | 42.2ms | 25.8ms | 192.1ms | 33.6ms | 4.6x
Q28 | AVG(LENGTH(URL)) | 38.0ms | 23.2ms | 170.1ms | 29.5ms | 4.5x

+CB-Q19: DuckDB v1.4.4 regression -- EXTRACT uses full scan instead of direct aggregation. Fixed in DuckDB v1.5.0.

Tier 4: NYC Taxi

Real-world trip data (~5.8M rows from CSV).

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
Q1 | AVG(fare) GROUP BY payment_type | 15.4ms | 2.8ms | 21.1ms | 4.1ms | 1.4x
Q2 | AVG(tip) GROUP BY passenger_count | 45.0ms | 29.8ms | 23.1ms | 4.4ms | DuckDB 2.0x
Q3 | COUNT GROUP BY hour, day-of-week | 16.6ms | 10.7ms | 17.4ms | 3.2ms | 1.0x
Q4 | SUM(total) WHERE fare > 10 GROUP BY month | 42.3ms | 13.4ms | 30.6ms | 6.1ms | DuckDB 1.4x

Tier 5: Hash Join

Fact table (10M rows) joined to dimension table (1K rows), followed by GROUP BY + SUM.

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
JOIN-Q1 | Fact JOIN Dim, GROUP BY category, SUM | 18.9ms | 5.8ms | 39.1ms | 7.6ms | 2.1x
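This query shape builds a hash table on the small dimension side, then probes from the fact side and accumulates group sums in the same pass, never materializing the joined rows. A minimal single-threaded sketch (hypothetical code; the real implementation is vectorized):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of hash join with fused aggregation: probe + accumulate in one
// pass, with no intermediate joined table.
public class JoinAgg {
    public static Map<String, Double> factJoinDimSum(
            long[] factKey, double[] factVal,        // fact columns
            long[] dimKey, String[] dimCategory) {   // dimension columns
        Map<Long, String> dim = new HashMap<>();     // build side: key -> category
        for (int i = 0; i < dimKey.length; i++) dim.put(dimKey[i], dimCategory[i]);

        Map<String, Double> sums = new HashMap<>();
        for (int i = 0; i < factKey.length; i++) {
            String cat = dim.get(factKey[i]);        // probe
            if (cat != null) sums.merge(cat, factVal[i], Double::sum); // accumulate
        }
        return sums;
    }
}
```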

Tier 6: Statistical Aggregates

Exact median/percentile (QuickSelect) and approximate quantiles (t-digest) on TPC-H price column (6M rows).

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
STAT-Q1 | MEDIAN(price) | 67.8ms | 67.8ms | 157.6ms | 134.0ms | 2.3x
STAT-Q2 | GROUP BY returnflag, MEDIAN(price) | 96.1ms | 96.8ms | 157.6ms | 131.8ms | 1.6x
STAT-Q3 | PERCENTILE_CONT(0.95, price) | 52.7ms | 37.4ms | 128.2ms | 114.2ms | 2.4x
STAT-Q4 | APPROX_QUANTILE(price, 0.95) | 243.3ms | 41.4ms | 278.1ms | 43.9ms | 1.1x
STAT-Q5 | P25, P50, P75 of price | 190.9ms | 191.1ms | 428.5ms | 408.8ms | 2.2x

Tier 7: Window Functions

Window operations on TPC-H lineitem (6M rows).

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
WIN-Q1 | ROW_NUMBER() OVER (PARTITION BY orderkey) | 316.3ms | 191.1ms | 425.6ms | 118.4ms | 1.3x
WIN-Q2 | LAG(price, 1) OVER (PARTITION BY orderkey) | 352.2ms | 215.1ms | 501.7ms | 137.0ms | 1.4x
WIN-Q3 | Running SUM(price) OVER (PARTITION BY orderkey) | 387.2ms | 263.8ms | 823.4ms | 252.5ms | 2.1x

Window functions benefit from multi-threading when partition sizes exceed 8 rows. Single-threaded performance is 1.3-2.1x faster than DuckDB. DuckDB's parallel window implementation achieves stronger NT scaling (3-4x vs Stratum's 1.5-1.7x).

Tier 8: TPC-DS (sf=1, ~2.9M rows)

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
DS-Q1 | GROUP BY store, SUM + COUNT | 22.7ms | 11.1ms | 12.6ms | 2.9ms | DuckDB 1.8x
DS-Q98 | ROW_NUMBER OVER (PARTITION BY store) | 299.2ms | 98.8ms | 376.1ms | 89.2ms | 1.3x

10M Summary

Stratum wins 35 of 46 queries, DuckDB wins 11 (single-threaded comparison, queries > 0.1ms).

Isolation Forest

Operation | 100K rows | 1M rows
--------- | --------- | -------
Train (100 trees x 256) | 27ms | 6ms
Score (parallel) | 155ms | 419ms
Score (1-thread) | 152ms | 1555ms

Results (100M rows)

Hardware: AMD EPYC 7313 16-Core Processor (20 cores), Linux, JVM 25.0.1 (OpenJDK), DuckDB v1.4.4. Index mode only.

Tier 1: TPC-H / SSB (100M rows)

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
B1 | TPC-H Q6: filter + SUM(price*discount) | 362.3ms | 65.5ms | 681.8ms | 57.6ms | 1.9x
B2 | TPC-H Q1: GROUP BY + 7 aggregates | 2525.7ms | 218.7ms | 2547.0ms | 197.9ms | 1.0x
B3 | SSB Q1.1: filter + SUM(price*discount) | 349.7ms | 57.1ms | 771.4ms | 64.6ms | 2.2x
B4 | COUNT(*) no filter | 0.3ms | - | 8.5ms | 1.4ms | 27x
B5 | Filtered COUNT (NEQ predicate) | 49.5ms | 15.6ms | 423.9ms | 37.3ms | 8.6x
B6 | Low-cardinality GROUP BY + COUNT | 957.6ms | 92.4ms | 781.0ms | 59.2ms | DuckDB 1.2x
SSB-Q1.2 | Tighter filter + SUM(price*discount) | 366.1ms | 50.6ms | 501.2ms | 43.7ms | 1.4x

Tier 2: H2O.ai Group-By (100M rows)

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
Q1 | GROUP BY id1 (string, 100 groups) | 398.8ms | 44.4ms | 1636.1ms | 146.1ms | 4.1x
Q2 | GROUP BY id1,id2 (string, 10K groups) | 482.3ms | 64.3ms | 3286.9ms | 296.7ms | 6.8x
Q3 | GROUP BY id3 (string, 1M groups) | 7356.0ms | 1997.7ms | 7821.1ms | 1300.1ms | 1.1x
Q4 | GROUP BY id4 (int, 100 groups), 3xAVG | 1092.0ms | 79.4ms | 705.8ms | 62.1ms | DuckDB 1.5x
Q5 | GROUP BY id6 (int, 1M groups), 3xSUM | 9102.3ms | 2324.4ms | 3623.5ms | 758.1ms | DuckDB 2.5x
Q7 | GROUP BY id3 (string, 1M groups), MAX-MIN | 9194.8ms | 2278.0ms | 7991.2ms | 1218.4ms | DuckDB 1.2x
Q10 | GROUP BY 6 columns (100M unique groups) | 137.9s | 135.3s | 175.4s | 144.1s | 1.3x

H2O Join Queries (100M rows):

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
J1 | INNER JOIN small (100 rows), 2xSUM | 527.6ms | 52.7ms | 414.0ms | 34.2ms | DuckDB 1.3x
J2 | INNER JOIN medium (10K), 2-column key, SUM | 5979.7ms | 668.3ms | 6645.9ms | 2243.4ms | 1.1x
J3 | LEFT JOIN medium (10K), 2-column key, SUM | 6361.8ms | 664.1ms | 6893.5ms | 2253.5ms | 1.1x

At 100M rows, multi-column joins (J2, J3) show Stratum's NT advantage growing to 3.4x over DuckDB NT.

Tier 3: ClickBench (100M rows)

~100M rows from the full ClickBench hits dataset.

Metadata / Stats-only queries (answered from pre-computed per-chunk statistics without scanning data):

Query | Description | Stratum 1T | DuckDB 1T | 1T Ratio
----- | ----------- | ---------- | --------- | --------
Q2 | SUM + COUNT + AVG (3 aggregates) | 1.9ms | 193.6ms | 102x
Q3 | AVG(UserID) | 0.4ms | 159.8ms | 400x
Q6 | MIN + MAX(EventTime) | 0.7ms | 200.5ms | 286x

Filter + aggregate:

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
Q1 | COUNT WHERE AdvEngineID != 0 | 142.5ms | 59.8ms | 66.8ms | 8.1ms | DuckDB 2.1x
Q7 | COUNT WHERE 2 predicates | 879.8ms | 341.8ms | 160.6ms | 22.9ms | DuckDB 5.5x

Group-by:

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
GRP-SE | GROUP BY SearchEngineID (96 groups) | 382.6ms | 33.9ms | 417.9ms | 28.8ms | 1.1x
GRP-REG | GROUP BY RegionID (9K groups), SUM+COUNT | 669.8ms | 191.2ms | 748.9ms | 99.0ms | 1.1x
Q15 | GROUP BY UserID (17.6M groups), COUNT | 4191.6ms | 1163.8ms | 3275.6ms | 327.4ms | DuckDB 1.3x
Q19+ | GROUP BY EXTRACT(minute), COUNT | 301.1ms | 138.0ms | 25144.8ms | 1804.6ms | regression+
Q43 | GROUP BY DATE_TRUNC(minute), COUNT | 3511.0ms | 944.9ms | 1112.1ms | 153.2ms | DuckDB 3.2x
Q12 | GROUP BY SearchPhrase (string), COUNT | 7762.1ms | 7497.4ms | 3947.2ms | 241.1ms | DuckDB 2.0x
Q33 | GROUP BY URL (18.3M string groups), COUNT | 6551.8ms | 2460.0ms | 11608.9ms | 907.5ms | 1.8x

COUNT DISTINCT:

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
Q5 | COUNT(DISTINCT UserID) -- 17.6M distinct | 3663.9ms | 4059.9ms | 2961.3ms | 198.5ms | DuckDB 1.2x
CD-GRP | COUNT(DISTINCT AdvEngineID) GROUP BY RegionID | 1204.8ms | 382.4ms | 766.3ms | 105.5ms | DuckDB 1.6x
Q8 | GROUP BY RegionID, COUNT(DISTINCT UserID), TOP 10 | 6963.6ms | 1974.7ms | 5969.9ms | 314.4ms | DuckDB 1.2x
Q9 | GROUP BY RegionID, SUM+COUNT+AVG+COUNT(DISTINCT) | 9114.1ms | 3811.1ms | 6962.1ms | 435.1ms | DuckDB 1.3x

String functions:

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
Q28 | AVG(LENGTH(URL)) | 1151.2ms | 580.0ms | 4435.5ms | 325.3ms | 3.9x

+CB-Q19: DuckDB v1.4.4 regression -- EXTRACT uses full scan instead of direct aggregation. Fixed in DuckDB v1.5.0.

Tier 4: NYC Taxi (100M run, ~39.7M rows)

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
Q1 | AVG(fare) GROUP BY payment_type | 149.7ms | 17.5ms | 248.2ms | 17.5ms | 1.7x
Q2 | AVG(tip) GROUP BY passenger_count | 784.1ms | 548.1ms | 257.9ms | 19.7ms | DuckDB 3.0x
Q3 | COUNT GROUP BY hour, day-of-week | 183.9ms | 22.7ms | 183.8ms | 15.3ms | 1.0x
Q4 | SUM(total) WHERE fare > 10 GROUP BY month | 537.9ms | 49.2ms | 400.1ms | 32.1ms | DuckDB 1.3x

Tier 5: Hash Join (100M rows)

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
JOIN-Q1 | Fact JOIN Dim, GROUP BY category, SUM | 833.6ms | 54.8ms | 576.0ms | 48.5ms | DuckDB 1.4x

Tier 6: Statistical Aggregates (100M rows)

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
STAT-Q1 | MEDIAN(price) | 1486.0ms | 1650.4ms | 4315.5ms | 4198.5ms | 2.9x
STAT-Q2 | GROUP BY returnflag, MEDIAN(price) | 1844.1ms | 1830.5ms | 5091.3ms | 3617.1ms | 2.8x
STAT-Q3 | PERCENTILE_CONT(0.95, price) | 1300.6ms | 1474.6ms | 3874.9ms | 2684.0ms | 3.0x
STAT-Q4 | APPROX_QUANTILE(price, 0.95) | 6139.9ms | 657.9ms | 6182.4ms | 378.9ms | 1.0x
STAT-Q5 | P25, P50, P75 of price | 4189.6ms | 4149.4ms | 12291.1ms | 11955.1ms | 2.9x

Exact median and percentile operations scale well -- Stratum maintains a ~3x advantage at 100M rows using O(N) QuickSelect.
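A minimal sketch of the QuickSelect idea: find the exact k-th smallest value in O(N) average time by repeatedly partitioning, instead of fully sorting in O(N log N). MEDIAN is the k = N/2 case. The partitioning runs destructively on a scratch copy of the column; this is hypothetical code, and Stratum's actual implementation may differ in detail:

```java
// QuickSelect with Hoare partitioning: each iteration narrows [lo, hi]
// to the half that contains index k, discarding the other half.
public class QuickSelect {
    public static double select(double[] a, int k) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            double pivot = a[(lo + hi) >>> 1];
            int i = lo, j = hi;
            while (i <= j) {                   // partition around the pivot
                while (a[i] < pivot) i++;
                while (a[j] > pivot) j--;
                if (i <= j) { double t = a[i]; a[i] = a[j]; a[j] = t; i++; j--; }
            }
            if (k <= j) hi = j;                // k is in the left part
            else if (k >= i) lo = i;           // k is in the right part
            else return a[k];                  // k landed between the parts
        }
        return a[k];
    }
}
```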

Tier 7: TPC-DS (sf=1, ~2.9M rows, 100M run)

Query | Description | Stratum 1T | Stratum NT | DuckDB 1T | DuckDB NT | 1T Ratio
----- | ----------- | ---------- | ---------- | --------- | --------- | --------
DS-Q1 | GROUP BY store, SUM + COUNT | 36.5ms | 14.2ms | 19.5ms | 4.0ms | DuckDB 1.9x

100M Summary

Stratum wins 15 of 34 queries, DuckDB wins 19 (single-threaded comparison, queries > 0.1ms).

Window functions, LIKE pattern matching, VARIANCE/CORR group-by, and string function group-by benchmarks are omitted from the 100M results -- these code paths were optimized after this hardware was last available and will be added when re-run.

DuckDB's advantage at 100M is concentrated in high-cardinality hash group-by (1M-100M unique groups) where hash tables become DRAM-bound. Stratum wins on filter+aggregate, statistical aggregates, and moderate-cardinality group-by.


Key Performance Characteristics

  • Fused execution: Predicate evaluation and aggregation run in a single SIMD pass, avoiding intermediate array allocation
  • SIMD vectorization: Java Vector API (DoubleVector/LongVector) processes multiple lanes per instruction (e.g. four doubles per 256-bit vector) for all filter, aggregate, and group-by operations
  • Dense group-by: Direct array-indexed accumulation for low/moderate-cardinality groups -- no hash function overhead
  • Zone map pruning: Per-chunk min/max statistics allow skipping entire chunks that cannot match predicates
  • Stats-only aggregation: SUM/COUNT/MIN/MAX/AVG answered directly from chunk statistics without touching data (CB-Q2/Q6: 0.2ms, CB-Q3: 0.1ms at 10M)
  • LIKE fast-path: Dictionary pre-filtering with contains/startsWith/endsWith (5-11x vs DuckDB on string pattern matching)
  • QuickSelect for median/percentile: O(N) average-case exact quantile computation (2-3x vs DuckDB)
  • Hash join with fused aggregation: Single-pass probe + accumulate eliminates intermediate materialization
  • JIT warmup on startup: Pre-exercises all Java code paths to produce stable polymorphic JIT compilations before real queries
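Zone-map pruning and stats-only aggregation both fall out of keeping per-chunk min/max statistics. A minimal sketch of pruning (hypothetical code; the stats are computed up front here, where the engine would maintain them at ingest time):

```java
// Sketch of zone-map pruning: a filtered scan consults per-chunk max
// statistics and skips chunks that cannot contain a matching row.
public class ZoneMap {
    static final int CHUNK = 4; // tiny chunk size for illustration

    // Per-chunk max; in the real engine this is pre-computed at ingest.
    static double[] chunkMax(double[] v) {
        int n = (v.length + CHUNK - 1) / CHUNK;
        double[] max = new double[n];
        java.util.Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (int i = 0; i < v.length; i++)
            max[i / CHUNK] = Math.max(max[i / CHUNK], v[i]);
        return max;
    }

    // COUNT WHERE v > threshold, scanning only chunks the zone map admits.
    public static long countGreater(double[] v, double threshold) {
        double[] max = chunkMax(v);
        long count = 0;
        for (int c = 0; c < max.length; c++) {
            if (max[c] <= threshold) continue;       // pruned: no row can match
            int end = Math.min((c + 1) * CHUNK, v.length);
            for (int i = c * CHUNK; i < end; i++)
                if (v[i] > threshold) count++;
        }
        return count;
    }
}
```

Stats-only aggregation is the limiting case: a query like MIN/MAX or COUNT with no filter never enters the inner loop at all, which is why the metadata queries above run in fractions of a millisecond.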

Where DuckDB Wins

  • High-cardinality group-by at scale: At 100M rows with 1M+ unique groups, DuckDB's parallel hash aggregate scales better
  • COUNT(DISTINCT) on high cardinality: DuckDB's HyperLogLog is faster for 1M+ distinct values
  • Sparse-selectivity filters (CB-Q1, Q7): When few rows match, DuckDB's column compression and zone maps can skip more data
  • Multi-threaded scaling: DuckDB's NT advantage is generally larger -- better thread utilization at high parallelism, particularly for window functions and high-cardinality group-by

Known Multi-Threading Limitations

In some scenarios, Stratum's multi-threaded execution does not improve over single-threaded:

  • High-cardinality group-by (H2O-Q3/Q5/Q7, 60K-100K groups): NT ~ 1T -- no speedup. Accumulator memory exceeds L2 cache per thread, causing DRAM bandwidth saturation. L3-adaptive thread capping mitigates the worst cases but doesn't eliminate the fundamental issue.
  • Mixed aggregation with COUNT DISTINCT (CB-Q8/Q9): NT is slightly slower than 1T due to per-group hash table overhead scaling with thread count.
  • Statistical aggregates (STAT-Q1/Q2/Q5): MEDIAN and multi-percentile use sequential QuickSelect, preventing parallelism.

Root causes and planned improvements:

  1. High-cardinality group-by needs pre-aggregation or partition-aware scheduling to keep working sets L2-resident
  2. COUNT DISTINCT could benefit from radix-partitioned per-group hash tables to reduce sequential merge cost
  3. String-heavy paths need better parallelization of dictionary encoding and hash table probing

Reproducing

# All tiers, 10M rows (default)
clj -M:olap

# Custom scale
clj -M:olap 1000000
clj -M:olap 100000000

# Specific tiers
clj -M:olap t1          # TPC-H/SSB only
clj -M:olap h2o         # H2O.ai only
clj -M:olap cb          # ClickBench only
clj -M:olap taxi        # NYC Taxi only
clj -M:olap join        # Hash Join only
clj -M:olap stat        # Statistical only

# Isolation forest
clj -M:iforest

# With indices
clj -M:olap idx

Insert Benchmark

Throughput for appending to and maintaining a sorted PersistentColumnIndex. Run with clj -M:insert-bench.

Append-only (1M elements)

Append to the end of the index in batches. This is the common case for time-series data.

Batch size | Time | Throughput
---------- | ---- | ----------
1 | 8308ms | 120K ops/sec
10 | 1622ms | 617K ops/sec
100 | 967ms | 1.03M ops/sec
1000 | 900ms | 1.11M ops/sec

Sorted insert (100K elements)

Insert into a sorted position. Binary search via idx-get-long to find the position, then copy-on-write of the modified chunk.

Batch size | Time | Throughput
---------- | ---- | ----------
1 | 1066ms | 94K ops/sec
10 | 996ms | 100K ops/sec
100 | 1065ms | 94K ops/sec
1000 | 1060ms | 94K ops/sec

Mixed 80/20 (100K elements)

80% append, 20% sorted insert -- similar to a write-ahead index with mostly new entries and occasional backdated facts.

Batch size | Time | Throughput
---------- | ---- | ----------
1 | 1151ms | 87K ops/sec
10 | 353ms | 283K ops/sec
100 | 312ms | 321K ops/sec
1000 | 318ms | 315K ops/sec

