https://github.com/replikativ/persistent-sorted-set 0.4.119 released Major release of our B-tree based persistent sorted set (org.replikativ/persistent-sorted-set). This is the backing data structure for Datahike's indices. Subtree Counts - count is now O(1) (was O(n)) — each node tracks its subtree size - New count-slice for O(log n) range counting: how many elements between A and B without scanning - Works across both CLJ and CLJS get-nth & Rank Statistics - O(log n) positional access into the sorted set via get-nth - Weighted navigation: nodes can carry custom weights (e.g. element count per leaf entry) for rank-based lookups - Enables efficient percentile/median queries over indexed data Measure Support - Pluggable IMeasure interface for user-defined aggregate metadata on tree nodes - Measures propagate up the tree and are maintained incrementally on conj/disj/replace - Supports lazy computation (try-compute-measure) and eager (force-compute-measure) modes - Use case: min/max/sum/count statistics per subtree for zone-map pruning ILeafProcessor Hook - New ILeafProcessor interface lets you post-process leaf entries after add/remove operations - Supports both compaction (merging small entries) and expansion (splitting large ones) - Generalized Branch.add to handle N children from processor output - modifiedIndex parameter for targeted O(1) processing instead of full leaf scans - Enabled on both persistent and transient paths Other Improvements - Namespace migrated from me.tonsky to org.replikativ - B-tree structural invariant tests and diagnostics API (tree-stats, validate-full, fill factor histograms) - compact operation for storage-backed sets - Transient corruption fix (shared _keys array between persistent and transient views) - Full CLJ + CLJS parity for all new features
Hi, I see that this is forked off Tonsky's persistent sorted set which I have been using in one of my projects. what is different from Tonsky's version?
Mostly what is in mentioned above, and additionally also the ability to mark freed addresses on the go which enables online gc such as https://github.com/replikativ/datahike/blob/main/doc/gc.md#online-garbage-collection-incremental-gc.
nice, thank you!
I will bump the Datahike dependency to this soon, it would be cool if people could do son their existing test databases and report to me whether there were any problems @sasha_bogdanov_dev @alekcz360 @hoertlehner or anybody who is able to help a bit with this :)
Perfect thanks!