https://github.com/replikativ/persistent-sorted-set 0.4.119 released
Major release of our B-tree based persistent sorted set (org.replikativ/persistent-sorted-set) (Thanks to @tonsky for the original work!). This is the backing data structure for Datahike's indices.
Subtree Counts
- count is now O(1) (was O(n)) — each node tracks its subtree size
- New count-slice for O(log n) range counting: how many elements between A and B without scanning
- Works across both CLJ and CLJS
get-nth & Rank Statistics
- O(log n) positional access into the sorted set via get-nth
- Weighted navigation: nodes can carry custom weights (e.g. element count per leaf entry) for rank-based lookups
- Enables efficient percentile/median queries over indexed data
Measure Support
- Pluggable IMeasure interface for user-defined aggregate metadata on tree nodes
- Measures propagate up the tree and are maintained incrementally on conj/disj/replace
- Supports lazy computation (try-compute-measure) and eager (force-compute-measure) modes
- Use case: min/max/sum/count statistics per subtree for zone-map pruning
ILeafProcessor Hook
- New ILeafProcessor interface lets you post-process leaf entries after add/remove operations
- Supports both compaction (merging small entries) and expansion (splitting large ones)
- Generalized Branch.add to handle N children from processor output
- modifiedIndex parameter for targeted O(1) processing instead of full leaf scans
- Enabled on both persistent and transient paths
Other Improvements
- Namespace migrated from me.tonsky to org.replikativ
- B-tree structural invariant tests and diagnostics API (tree-stats, validate-full, fill factor histograms)
- compact operation for storage-backed sets
- Transient corruption fix (shared _keys array between persistent and transient views)
- Full CLJ + CLJS parity for all new features