Liking cljdoc? Tell your friends :D

Performance of set operations

Is it possible to improve upon the performance of Clojure's built-in set operations?

Yes, for particular scenarios. droitfintech's fset library and new utilities introduced in this library offer measurable performance improvements across a wide range of conditions.

Clojure's set namespace prioritizes straightforward, idiomatic implementations with variadic arguments.

We'll use the Criterium benchmarking facility to objectively measure the evaluation times of a range of scenarios. Our test sets will contain one to one-million elements. While spanning that range, we'll investigate the effects of overlap (i.e., the sizes of the unions and intersections) between the two input sets.

  • All means that both sets contain the exact same elements; the union and intersection are identical.
  • Half means that half of each set is shared with the other; the intersection is one-third of the union.
  • None indicates that neither set has an element in common with the other; the intersection is empty.
Finally, we'll explore the performance implications of the two sets being substantially different sizes.

Overall, fset and this library's third variant, designated alt 3 demonstrate 10-40% speedup over Clojure's implementations. For the latter, the cost is more complex implementation and missing variadics.

Comparing set operations

This upper trio of results involve operating on two sets containing the same number of elements, from one to one-million. The first chart shows full overlap, the middle shows half overlap, and the lower shows no overlap between the elements of the two sets. Each data point represents computing the union, the intersection, the left-difference, and the right-difference.

fset (gold circles) and alt 3 (purple squares) demonstrate a modest, but consistent, 20-40% speedup over the core implementation (red triangles).

(fn [n] ((tactic) (get-in test-sets [n :left]) (get-in test-sets [n :right-all])))

Benchmark measurements for expression `(fn [n] ((tactic) (get-in test-sets [n :left]) (get-in test-sets [n :right-all])))`, time versus 'n' arguments, comparing different versions. Show details
times in seconds, mean±std
arg, n
version 1 10 100 1000 10000 100000 1000000
alt 1 1.3e-04±1.5e-06 1.4e-04±2.3e-06 2.0e-04±8.4e-07 9.8e-04±1.1e-05 9.1e-03±1.1e-04 9.9e-02±2.9e-03 1.3e+00±7.6e-02
alt 2 1.3e-04±2.2e-06 1.4e-04±1.3e-06 2.1e-04±1.3e-06 9.7e-04±5.5e-06 8.6e-03±1.2e-04 9.4e-02±2.3e-03 1.2e+00±5.3e-02
alt 3 1.4e-04±5.4e-07 1.4e-04±2.2e-06 2.0e-04±4.1e-06 8.5e-04±1.9e-05 6.7e-03±1.8e-05 7.2e-02±4.3e-04 9.6e-01±6.9e-02
alt 4 1.3e-04±1.5e-06 1.4e-04±9.5e-07 2.0e-04±1.6e-06 9.5e-04±3.9e-05 7.4e-03±1.5e-04 7.9e-02±7.9e-04 1.0e+00±1.4e-01
core 1.4e-04±2.9e-06 1.5e-04±3.5e-06 1.9e-04±1.6e-06 8.3e-04±1.4e-05 7.3e-03±9.3e-05 9.2e-02±5.5e-04 1.2e+00±5.1e-03
fset 1.4e-04±7.8e-07 1.4e-04±8.1e-07 1.8e-04±3.2e-06 6.6e-04±2.6e-06 4.9e-03±1.2e-04 6.7e-02±1.0e-03 9.4e-01±7.1e-02

(fn [n] ((tactic) (get-in test-sets [n :left]) (get-in test-sets [n :right-half])))

Benchmark measurements for expression `(fn [n] ((tactic) (get-in test-sets [n :left]) (get-in test-sets [n :right-half])))`, time versus 'n' arguments, comparing different versions. Show details
times in seconds, mean±std
arg, n
version 1 10 100 1000 10000 100000 1000000
alt 1 1.3e-04±1.1e-06 1.4e-04±2.7e-06 2.1e-04±1.6e-06 9.7e-04±2.1e-05 9.1e-03±1.4e-04 1.0e-01±8.6e-03 1.3e+00±7.7e-02
alt 2 1.3e-04±1.1e-06 1.4e-04±1.9e-06 2.1e-04±2.7e-06 8.8e-04±8.1e-06 8.4e-03±1.8e-04 9.2e-02±6.1e-03 1.1e+00±7.1e-02
alt 3 1.4e-04±4.1e-07 1.4e-04±6.6e-07 1.9e-04±3.1e-06 7.3e-04±7.7e-06 6.0e-03±2.3e-05 6.6e-02±6.7e-04 8.3e-01±1.1e-01
alt 4 1.4e-04±3.6e-06 1.4e-04±2.5e-06 1.9e-04±9.6e-07 7.7e-04±2.5e-06 6.6e-03±1.0e-04 7.1e-02±8.6e-04 8.6e-01±1.3e-01
core 1.4e-04±7.4e-07 1.4e-04±6.8e-07 2.0e-04±2.4e-06 8.7e-04±1.7e-05 8.2e-03±3.0e-04 1.0e-01±1.1e-02 1.3e+00±2.8e-02
fset 1.4e-04±8.2e-07 1.4e-04±1.2e-06 1.8e-04±3.6e-06 6.8e-04±2.6e-06 5.6e-03±1.6e-04 7.2e-02±7.5e-04 1.0e+00±1.4e-01

(fn [n] ((tactic) (get-in test-sets [n :left]) (get-in test-sets [n :right-none])))

Benchmark measurements for expression `(fn [n] ((tactic) (get-in test-sets [n :left]) (get-in test-sets [n :right-none])))`, time versus 'n' arguments, comparing different versions. Show details
times in seconds, mean±std
arg, n
version 1 10 100 1000 10000 100000 1000000
alt 1 1.4e-04±1.0e-06 1.4e-04±9.0e-07 2.0e-04±8.6e-07 9.7e-04±2.8e-05 9.3e-03±1.7e-04 9.8e-02±6.5e-03 1.3e+00±4.7e-02
alt 2 1.4e-04±9.3e-07 1.4e-04±5.9e-07 2.0e-04±1.5e-06 8.6e-04±1.8e-05 8.0e-03±1.3e-04 8.3e-02±6.1e-03 1.1e+00±9.1e-02
alt 3 1.4e-04±1.4e-06 1.4e-04±1.1e-06 1.8e-04±7.9e-07 6.5e-04±4.5e-06 5.2e-03±2.9e-05 5.4e-02±5.7e-04 7.7e-01±1.1e-01
alt 4 1.3e-04±9.1e-07 1.3e-04±7.2e-07 1.8e-04±2.8e-06 7.0e-04±1.5e-05 5.8e-03±1.1e-04 5.8e-02±3.9e-04 8.2e-01±9.8e-02
core 1.4e-04±1.5e-06 1.4e-04±7.3e-07 2.0e-04±9.1e-07 8.5e-04±1.6e-05 8.1e-03±2.1e-04 9.9e-02±3.0e-03 1.3e+00±6.3e-02
fset 1.4e-04±2.9e-06 1.5e-04±3.2e-06 1.8e-04±2.4e-06 6.6e-04±6.3e-06 5.3e-03±2.4e-05 7.1e-02±2.2e-03 1.0e+00±7.6e-02

Operations on un-balanced sets

In contrast to the upper trio of charts where both sets were the same size, these lower two charts explore what happens when the two sets are substantially different sizes.

The first of the two lower charts shows timings when the left-hand set size is fixed at one-million elements while the right-hand set ranges from one to one-million elements. Again, the two alternative implementations (gold circles and purple squares) provide modest speedups compared to the core implementation (red triangles).

The second of the two lower charts shows timings when the left-hand set size is always one-thousand times larger than the right-hand set. Again, the alternative implementations show modest 15-25% speedups.

(fn [n] ((tactic) (get-in test-sets [1000000 :left]) (get-in test-sets [n :right-half])))

Benchmark measurements for expression `(fn [n] ((tactic) (get-in test-sets [1000000 :left]) (get-in test-sets [n :right-half])))`, time versus 'n' arguments, comparing different versions. Show details
times in seconds, mean±std
arg, n
version 1 10 100 1000 10000 100000 1000000
alt 1 1.4e-04±6.4e-06 1.5e-04±5.0e-06 2.3e-04±2.0e-06 1.2e-03±2.4e-05 1.3e-02±4.2e-04 1.2e-01±2.2e-02 1.3e+00±8.2e-02
alt 2 1.3e-04±3.0e-06 1.4e-04±7.7e-07 2.2e-04±1.5e-06 1.2e-03±4.2e-05 1.2e-02±5.1e-04 1.1e-01±1.7e-02 1.1e+00±6.9e-02
alt 3 1.4e-04±9.6e-06 1.4e-04±7.4e-07 2.2e-04±1.4e-06 1.1e-03±1.2e-05 1.1e-02±1.6e-04 8.5e-02±1.1e-03 8.4e-01±1.5e-01
alt 4 1.3e-04±1.8e-05 1.4e-04±3.7e-06 2.2e-04±3.6e-06 1.6e-03±2.2e-04 1.1e-02±8.1e-05 9.1e-02±1.0e-03 8.5e-01±1.3e-01
core 1.5e-04±1.8e-05 1.5e-04±3.8e-06 2.3e-04±3.4e-06 1.3e-03±2.0e-05 1.9e-02±1.3e-03 1.5e-01±1.6e-02 1.3e+00±8.1e-02
fset 1.4e-04±2.9e-06 1.4e-04±2.5e-06 2.1e-04±2.4e-06 1.1e-03±2.8e-05 1.3e-02±2.4e-04 1.1e-01±2.4e-03 1.0e+00±1.1e-01

(fn [n] ((tactic) (get-in test-sets [n :left]) (get-in test-sets [(/ n 1000) :right-half])))

Benchmark measurements for expression `(fn [n] ((tactic) (get-in test-sets [n :left]) (get-in test-sets [(/ n 1000) :right-half])))`, time versus 'n' arguments, comparing different versions. Show details

Can you improve this documentation?Edit on GitHub

cljdoc builds & hosts documentation for Clojure/Script libraries

Keyboard shortcuts
Ctrl+kJump to recent docs
Move to previous article
Move to next article
Ctrl+/Jump to the search field
× close