Looking to release an optimized Clojure data structure library for ordered collections. Would appreciate feedback. Built on weight-balanced trees using Hirai-Yamamoto parameters (δ=3, γ=2), Adams’ divide-and-conquer algorithms, and Blelloch et al. 2016 parallel join-based operations. Performance vs sorted-set and data.avl (N=500K): ┌───────────────────┬─────────────┬────────────┬──────────┬──────────────────────────────────────┐ │ Operation │ ordered-set │ sorted-set │ data.avl │ Notes │ ├───────────────────┼─────────────┼────────────┼──────────┼──────────────────────────────────────┤ │ Construction │ 1.2s │ 1.5s │ 2.5s │ 25% faster than sorted-set │ ├───────────────────┼─────────────┼────────────┼──────────┼──────────────────────────────────────┤ │ Sequential insert │ 2.5s │ 1.6s │ 2.5s │ 1.56x slower† │ ├───────────────────┼─────────────┼────────────┼──────────┼──────────────────────────────────────┤ │ Lookup │ 15.2ms │ 14.2ms │ 17.7ms │ 7% slower†, 14% faster than data.avl │ ├───────────────────┼─────────────┼────────────┼──────────┼──────────────────────────────────────┤ │ union │ 190ms │ 1.1s* │ — │ 5.8x faster │ ├───────────────────┼─────────────┼────────────┼──────────┼──────────────────────────────────────┤ │ intersection │ 164ms │ 870ms* │ — │ 5.3x faster │ ├───────────────────┼─────────────┼────────────┼──────────┼──────────────────────────────────────┤ │ difference │ 114ms │ 977ms* │ — │ 8.6x faster │ ├───────────────────┼─────────────┼────────────┼──────────┼──────────────────────────────────────┤ │ Parallel fold │ 42ms │ 98ms │ 170ms │ 2.3x faster │ ├───────────────────┼─────────────┼────────────┼──────────┼──────────────────────────────────────┤ │ Split │ 2.5ms │ — │ 11.2ms │ 4.5x faster │ ├───────────────────┼─────────────┼────────────┼──────────┼──────────────────────────────────────┤ │ First/last │ 2.4ms │ 17s │ 32s │ ~7000x faster │ └───────────────────┴─────────────┴────────────┴──────────┴──────────────────────────────────────┘ *clojure.set operations via filter/remove + into †The insert/lookup overhead comes from heterogeneous key support—ordered-set allows mixed types by default, paying for clojure.core/compare dispatch on every comparison. For homogeneous keys, long-ordered-set uses primitive Long/compare and is 20% faster than sorted-set for both operations. Collection types: - ordered-set / ordered-map — data.avl-compatible API (split-key, split-at, subrange, nearest) - long-ordered-set / long-ordered-map — primitive Long keys, 20% faster - double-ordered-set / double-ordered-map — primitive Double keys - string-ordered-set / string-ordered-map — direct String.compareTo - interval-set / interval-map — interval overlap queries, point stabbing - fuzzy-set / fuzzy-map — approximate lookup with pluggable distance functions - ranked-set — O(log n) rank, nth, median, percentile - ordered-multiset — sorted bag allowing duplicates - priority-queue — O(log n) push/peek/pop with parallel fold - range-map — non-overlapping ranges to values - segment-tree / sum-tree / min-tree / max-tree — O(log n) range aggregate queries Key concepts: - Weight-balanced trees with subtree size annotations → O(log n) positional access - Join-based algorithms enable parallelism via ForkJoinPool - CollFold support for parallel r/fold Branch: https://github.com/dco-dev/ordered-collections/tree/020-mutable-transient (Benchmarks: 2018 i9 MacBook Pro, OpenJDK 25, Clojure 1.12.4)
Interesting. Are these general purpose? Or should mostly be used for sorted data?
well this is for when having an ordered collection is beneficial. its not a quick answer. but you can see a bunch of concepts/usecases in the PR
branch whatev
Nice
I'm about to drop a persistent data structure lib. I'll be looking into this
there is more — you havent even met Zorp yet