Liking cljdoc? Tell your friends :D

Performance of set operations

Comparing set operations
Operations on un-balanced sets

Is it possible to improve upon the performance of Clojure's built-in set operations?

Yes, for particular scenarios. droitfintech's fset library and new utilities introduced in this library offer measurable performance improvements across a wide range of conditions.

Clojure's set namespace prioritizes straightforward, idiomatic implementations with variadic arguments.

We'll use the Criterium benchmarking facility to objectively measure the evaluation times of a range of scenarios. Our test sets will contain one to one-million elements. While spanning that range, we'll investigate the effects of overlap (i.e., the sizes of the unions and intersections) between the two input sets.

All means that both sets contain the exact same elements; the union and intersection are identical.
Half means that half of each set is shared with the other; the intersection is one-third of the union.
None indicates that neither set has an element in common with the other; the intersection is empty.

Finally, we'll explore the performance implications of the two sets being substantially different sizes.

Overall, fset and this library's third variant, designated alt 3 demonstrate 10-40% speedup over Clojure's implementations. For the latter, the cost is more complex implementation and missing variadics.

Comparing set operations

This upper trio of results involve operating on two sets containing the same number of elements, from one to one-million. The first chart shows full overlap, the middle shows half overlap, and the lower shows no overlap between the elements of the two sets. Each data point represents computing the union, the intersection, the left-difference, and the right-difference.

fset (gold circles) and alt 3 (purple squares) demonstrate a modest, but consistent, 20-40% speedup over the core implementation (red triangles).

(fn [n] ((tactic) (get-in test-sets [n :left]) (get-in test-sets [n :right-all])))

Show details

times in seconds, *mean±std*
version	1	10	100	1000	10000	100000	1000000
	arg, n
alt 1	1.3e-04±1.5e-06	1.4e-04±2.3e-06	2.0e-04±8.4e-07	9.8e-04±1.1e-05	9.1e-03±1.1e-04	9.9e-02±2.9e-03	1.3e+00±7.6e-02
alt 2	1.3e-04±2.2e-06	1.4e-04±1.3e-06	2.1e-04±1.3e-06	9.7e-04±5.5e-06	8.6e-03±1.2e-04	9.4e-02±2.3e-03	1.2e+00±5.3e-02
alt 3	1.4e-04±5.4e-07	1.4e-04±2.2e-06	2.0e-04±4.1e-06	8.5e-04±1.9e-05	6.7e-03±1.8e-05	7.2e-02±4.3e-04	9.6e-01±6.9e-02
alt 4	1.3e-04±1.5e-06	1.4e-04±9.5e-07	2.0e-04±1.6e-06	9.5e-04±3.9e-05	7.4e-03±1.5e-04	7.9e-02±7.9e-04	1.0e+00±1.4e-01
core	1.4e-04±2.9e-06	1.5e-04±3.5e-06	1.9e-04±1.6e-06	8.3e-04±1.4e-05	7.3e-03±9.3e-05	9.2e-02±5.5e-04	1.2e+00±5.1e-03
fset	1.4e-04±7.8e-07	1.4e-04±8.1e-07	1.8e-04±3.2e-06	6.6e-04±2.6e-06	4.9e-03±1.2e-04	6.7e-02±1.0e-03	9.4e-01±7.1e-02

(fn [n] ((tactic) (get-in test-sets [n :left]) (get-in test-sets [n :right-half])))

Show details

times in seconds, *mean±std*
version	1	10	100	1000	10000	100000	1000000
	arg, n
alt 1	1.3e-04±1.1e-06	1.4e-04±2.7e-06	2.1e-04±1.6e-06	9.7e-04±2.1e-05	9.1e-03±1.4e-04	1.0e-01±8.6e-03	1.3e+00±7.7e-02
alt 2	1.3e-04±1.1e-06	1.4e-04±1.9e-06	2.1e-04±2.7e-06	8.8e-04±8.1e-06	8.4e-03±1.8e-04	9.2e-02±6.1e-03	1.1e+00±7.1e-02
alt 3	1.4e-04±4.1e-07	1.4e-04±6.6e-07	1.9e-04±3.1e-06	7.3e-04±7.7e-06	6.0e-03±2.3e-05	6.6e-02±6.7e-04	8.3e-01±1.1e-01
alt 4	1.4e-04±3.6e-06	1.4e-04±2.5e-06	1.9e-04±9.6e-07	7.7e-04±2.5e-06	6.6e-03±1.0e-04	7.1e-02±8.6e-04	8.6e-01±1.3e-01
core	1.4e-04±7.4e-07	1.4e-04±6.8e-07	2.0e-04±2.4e-06	8.7e-04±1.7e-05	8.2e-03±3.0e-04	1.0e-01±1.1e-02	1.3e+00±2.8e-02
fset	1.4e-04±8.2e-07	1.4e-04±1.2e-06	1.8e-04±3.6e-06	6.8e-04±2.6e-06	5.6e-03±1.6e-04	7.2e-02±7.5e-04	1.0e+00±1.4e-01

(fn [n] ((tactic) (get-in test-sets [n :left]) (get-in test-sets [n :right-none])))

Show details

times in seconds, *mean±std*
version	1	10	100	1000	10000	100000	1000000
	arg, n
alt 1	1.4e-04±1.0e-06	1.4e-04±9.0e-07	2.0e-04±8.6e-07	9.7e-04±2.8e-05	9.3e-03±1.7e-04	9.8e-02±6.5e-03	1.3e+00±4.7e-02
alt 2	1.4e-04±9.3e-07	1.4e-04±5.9e-07	2.0e-04±1.5e-06	8.6e-04±1.8e-05	8.0e-03±1.3e-04	8.3e-02±6.1e-03	1.1e+00±9.1e-02
alt 3	1.4e-04±1.4e-06	1.4e-04±1.1e-06	1.8e-04±7.9e-07	6.5e-04±4.5e-06	5.2e-03±2.9e-05	5.4e-02±5.7e-04	7.7e-01±1.1e-01
alt 4	1.3e-04±9.1e-07	1.3e-04±7.2e-07	1.8e-04±2.8e-06	7.0e-04±1.5e-05	5.8e-03±1.1e-04	5.8e-02±3.9e-04	8.2e-01±9.8e-02
core	1.4e-04±1.5e-06	1.4e-04±7.3e-07	2.0e-04±9.1e-07	8.5e-04±1.6e-05	8.1e-03±2.1e-04	9.9e-02±3.0e-03	1.3e+00±6.3e-02
fset	1.4e-04±2.9e-06	1.5e-04±3.2e-06	1.8e-04±2.4e-06	6.6e-04±6.3e-06	5.3e-03±2.4e-05	7.1e-02±2.2e-03	1.0e+00±7.6e-02

Operations on un-balanced sets

In contrast to the upper trio of charts where both sets were the same size, these lower two charts explore what happens when the two sets are substantially different sizes.

The first of the two lower charts shows timings when the left-hand set size is fixed at one-million elements while the right-hand set ranges from one to one-million elements. Again, the two alternative implementations (gold circles and purple squares) provide modest speedups compared to the core implementation (red triangles).

The second of the two lower charts shows timings when the left-hand set size is always one-thousand times larger than the right-hand set. Again, the alternative implementations show modest 15-25% speedups.

(fn [n] ((tactic) (get-in test-sets [1000000 :left]) (get-in test-sets [n :right-half])))

Show details

times in seconds, *mean±std*
version	1	10	100	1000	10000	100000	1000000
	arg, n
alt 1	1.4e-04±6.4e-06	1.5e-04±5.0e-06	2.3e-04±2.0e-06	1.2e-03±2.4e-05	1.3e-02±4.2e-04	1.2e-01±2.2e-02	1.3e+00±8.2e-02
alt 2	1.3e-04±3.0e-06	1.4e-04±7.7e-07	2.2e-04±1.5e-06	1.2e-03±4.2e-05	1.2e-02±5.1e-04	1.1e-01±1.7e-02	1.1e+00±6.9e-02
alt 3	1.4e-04±9.6e-06	1.4e-04±7.4e-07	2.2e-04±1.4e-06	1.1e-03±1.2e-05	1.1e-02±1.6e-04	8.5e-02±1.1e-03	8.4e-01±1.5e-01
alt 4	1.3e-04±1.8e-05	1.4e-04±3.7e-06	2.2e-04±3.6e-06	1.6e-03±2.2e-04	1.1e-02±8.1e-05	9.1e-02±1.0e-03	8.5e-01±1.3e-01
core	1.5e-04±1.8e-05	1.5e-04±3.8e-06	2.3e-04±3.4e-06	1.3e-03±2.0e-05	1.9e-02±1.3e-03	1.5e-01±1.6e-02	1.3e+00±8.1e-02
fset	1.4e-04±2.9e-06	1.4e-04±2.5e-06	2.1e-04±2.4e-06	1.1e-03±2.8e-05	1.3e-02±2.4e-04	1.1e-01±2.4e-03	1.0e+00±1.1e-01

(fn [n] ((tactic) (get-in test-sets [n :left]) (get-in test-sets [(/ n 1000) :right-half])))

Show details

times in seconds, *mean±std*
version	1000	10000	100000	1000000
	arg, n
alt 1	1.4e-04±2.0e-06	1.5e-04±2.0e-06	2.3e-04±5.5e-06	1.3e-03±7.4e-05
alt 2	1.4e-04±3.9e-06	1.4e-04±2.6e-06	2.2e-04±3.2e-06	1.3e-03±8.1e-05
alt 3	1.4e-04±1.5e-06	1.5e-04±2.2e-06	2.2e-04±2.3e-06	1.0e-03±1.0e-05
alt 4	1.3e-04±2.5e-06	1.4e-04±6.8e-07	2.2e-04±3.3e-06	1.4e-03±1.1e-04
core	1.4e-04±1.2e-06	1.4e-04±2.2e-06	2.2e-04±1.6e-06	1.3e-03±3.4e-05
fset	1.4e-04±2.1e-06	1.5e-04±1.9e-06	2.0e-04±1.5e-06	1.1e-03±1.0e-05

❮merge_performance situational_performance❯

Can you improve this documentation?Edit on GitHub

cljdoc builds & hosts documentation for Clojure/Script libraries

Keyboard shortcuts

`Ctrl`+`k`	Jump to recent docs
`←`	Move to previous article
`→`	Move to next article
`Ctrl`+`/`	Jump to the search field

Raise an issue Browse cljdoc source Chat on Slack

× close