2026-02-04 datahike | Clojure Slack Archive

datahike

whilo 2026-02-04T11:18:48.437379Z

Datahike Release Announcement - Critical Bug Fix

Critical Bug Fix: Datom Replace Operations

Timeline
- Bug Introduced: January 12, 2026
- Affected Versions: 0.7.1615 - 0.7.1642
- Fixed In: 0.7.1643

What Happened

In commit b61dd61 (ClojureScript port #748), we introduced a performance optimization that changed datom upserts from "remove + insert" to a single-traversal ".replace" operation (2x faster). However, this optimization used the WRONG comparator. The bug:

;; WRONG: Uses cmp-quick which compares ALL fields (e, a, v, tx)
(.replace pset old-datom new-datom (index-type->cmp-quick index-type))

When updating a value:
- old: [e=1, a=:age, v=25, tx=100]
- new: [e=1, a=:age, v=26, tx=101]
- cmp-quick(old, new) returns non-zero (different values!)
- .replace() asserts comparator must return 0 → AssertionError

Required fix:

;; CORRECT: Use cmp-replace which compares only KEY fields
(.replace pset old-datom new-datom (index-type->cmp-replace index-type))
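To make the precondition concrete, here is a minimal sketch in plain Clojure. It models datoms as [e a v tx] vectors; `cmp-all-fields` and `cmp-eavt-key` are hypothetical stand-ins for cmp-quick and the EAVT cmp-replace, not Datahike's actual comparators.

```clojure
;; Hypothetical stand-ins, not Datahike's real comparators.
;; Datoms are modelled as plain [e a v tx] vectors.

(defn cmp-all-fields
  "Compares every field (e, a, v, tx), as cmp-quick is described above."
  [d1 d2]
  (compare d1 d2))

(defn cmp-eavt-key
  "Compares only the logical key (e, a), as cmp-replace for EAVT is described."
  [[e1 a1] [e2 a2]]
  (compare [e1 a1] [e2 a2]))

(def old-datom [1 :age 25 100])
(def new-datom [1 :age 26 101])

;; .replace requires the comparator to treat old and new as equal (0).
(zero? (cmp-all-fields old-datom new-datom)) ; false: v and tx differ
(zero? (cmp-eavt-key old-datom new-datom))   ; true: same e and a
```

Under this model the all-fields comparator can never satisfy the "must return 0" assertion when a value changes, while the key-only comparator does.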

For each index, cmp-replace compares only the logical keys:
- EAVT: compares (e, a) only - allows v and tx to change
- AEVT: compares (a, e) only - allows v and tx to change
- AVET: compares (a, e) only - allows v and tx to change

Why Tests Didn't Catch It

Java/CLJ:
- Java assertions disabled by default (requires -ea flag)
- Bug present but assertion never fired in tests
- Would cause silent data corruption in production (see below)

ClojureScript:
- persistent-sorted-set had :elide-asserts true until v0.3.118
- Assertions were compiled out, bug never detected
- After PSS 0.3.118 enabled assertions → CLJS tests started failing

Severity Assessment

With Assertions Enabled (`-ea` in CLJ, or PSS 0.3.118+ in CLJS):
- Tests fail/hang immediately with AssertionError
- No data corruption (caught before damage)
- Users would notice immediately via test failures

Without Assertions (Production CLJ, or old PSS in CLJS):
- CRITICAL: Silent data corruption possible
- Only affects indexed attributes (AVET index)
- When multiple entities have same attribute value, binary search with wrong comparator may find wrong entity → update wrong data
- Or may not find datom at all → update silently lost

Impact

Affects ALL datom updates where values change:
- [:db/add entity attribute new-value]
- Schema updates
- Any value modifications on indexed attributes

Corruption risk highest when:
1. Multiple entities share same attribute value
2. Attribute is indexed (appears in AVET)
3. Assertions disabled (typical production)

Fix:
- Added cmp-replace comparators for all three indices
- Used cmp-replace in upsert function
- Now all three comparators correct
- All tests pass with assertions enabled

Additional hardening:
- Enabled -ea in all test configurations (commit 077ef54)
- Added comprehensive test coverage (commit deac7ec)
- Improved error handling to catch Errors not just Exceptions

Data Recovery

Good news: EAVT and AEVT indices are unaffected by the bug.

If you suspect corruption in an affected database:

1. Export all data using EAVT iterator:

(d/datoms db :eavt)  ; ✅ Safe, contains correct data

2. Delete corrupted database
3. Upgrade to fixed version
4. Re-import data

The EAVT index is the source of truth and remains correct.

Mitigation for Future
1. Assertions now enabled in all tests - will catch these bugs immediately
2. Comprehensive index-level test coverage - tests all three indices
3. Updated writer error handling - catches Throwable (including AssertionError)

Summary
- Action Required: Upgrade to this release if using 0.7.1615 or later
- Risk: Silent data corruption possible in production without assertions
- Recovery: Export via EAVT, reimport if needed
- Prevention: Comprehensive tests + assertions enabled going forward

Migration Guide for Affected Databases

If you're running Datahike 0.7.1615 through 0.7.1642 and have been updating values on indexed attributes, your AVET index may be corrupted. Follow these steps to safely migrate:

1. Export Your Data

Use the built-in export function which safely reads from the EAVT index (unaffected by the bug):

(require '[datahike.api :as d])
(require '[datahike.migrate :as migrate])

;; Connect to your database
(def conn (d/connect your-config))

;; Export all data to a file
(migrate/export-db conn "/path/to/export.cbor")

;; Release connection
(d/release conn)

The export-db function uses (api/datoms db :eavt) which reads from the EAVT index - this index is completely unaffected by the bug and contains your correct data.

2. Delete the Corrupted Database

(d/delete-database your-config)

3. Upgrade to This Release

Update your deps.edn:

{:deps {io.replikativ/datahike {:mvn/version "0.7.XXXX"}}}  ; New version

4. Recreate and Import

;; Create fresh database with fixed code
(d/create-database your-config)

;; Connect to new database  
(def conn (d/connect your-config))

;; Import the exported data
(migrate/import-db conn "/path/to/export.cbor")

;; Verify data
(d/q '[:find (count ?e) :where [?e _ _]] @conn)

(d/release conn)

When to Migrate

High Priority (Migrate Immediately):
- You're using indexed attributes (`:db/index true`)
- Multiple entities share the same attribute values
- You've been updating values on those attributes
- Running in production without -ea assertions

Lower Priority (Verify Then Decide):
- Only using non-indexed attributes
- No value updates, only inserts
- Running with assertions enabled (would have seen test failures)
- CLJS with persistent-sorted-set < 0.3.118 (assertions were disabled)

Skip Migration If:
- Running 0.7.1614 or earlier (not affected)
- Fresh database created after this release
- No indexed attributes in schema

Verification

After migration, verify your data:

;; Check total datom count matches
(count (d/datoms @conn :eavt))

;; Verify critical entities
(d/pull @conn '[*] entity-id)

;; Run important queries
(d/q your-query @conn)

The export preserves transaction order and all datom information, so your reimported database will be functionally identical to the original (minus any AVET corruption).
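The count check above only detects a changed total. As a rough sketch of a stricter comparison, the helper below diffs two collections of [e a v] tuples in plain Clojure. `datom-diff` is a hypothetical name and not part of Datahike.

```clojure
(require '[clojure.set :as set])

;; Hypothetical helper, not part of Datahike: compares two collections of
;; [e a v] tuples and reports what disappeared and what appeared.
(defn datom-diff
  [before after]
  (let [b (set before)
        a (set after)]
    {:missing    (set/difference b a)
     :unexpected (set/difference a b)}))

(datom-diff [[1 :name "Ann"] [1 :age 26] [2 :age 26]]
            [[1 :name "Ann"] [1 :age 26]])
;; => {:missing #{[2 :age 26]}, :unexpected #{}}
```

With a real database, the tuples could presumably be collected with something like `(map (juxt :e :a :v) (d/datoms @conn :eavt))` before deleting the old database and again after the import. That assumes datoms support keyword lookup and that the import preserves entity ids, both of which should be checked against your Datahike version.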

whilo 2026-02-04T11:21:58.051529Z

Apologies that this happened, I was not aware that assertions are not active by default in the tests and missed this. I hope nobody is seriously affected in production since this was the large 0.7.* release from just 2 weeks ago.

timo 2026-02-04T13:50:38.725879Z

it's {org.replikativ/datahike... for all new releases right?

whilo 2026-02-04T20:07:03.732039Z

Yes!

Jonas Östlund 2026-02-18T21:08:23.327879Z

Thanks for the information! However, we noticed that recent versions of Datahike produce incorrect results when we initialize the database using load-entities of datoms. My investigation suggests that the AVET index is also corrupt in recent versions that include the above fix, and that the corruption comes from the upsert logic of the AVET index. For the above logic of calling .replace on a persistent set to work, I would expect that whenever an index is ordered by cmp-datoms-avet-quick, it must also be ordered by cmp-datoms-avet-replace. But that does not always seem to be the case, as demonstrated by an example that I distilled and pushed to a forked repository as a branch: https://github.com/replikativ/persistent-sorted-set/compare/main...jonasseglare:persistent-sorted-set:avet-bug-repro?expand=1 . This branch contains a unit test written such that the test passes if the bug is present. The test provides an example showing that the above implication does not hold and that, as a consequence, the AVET index gets corrupted. Before I had understood the problem completely, I started working on a fix of the PersistentSortedSet.replace method that would work if the cmp-datoms-avet-quick comparator is provided. That might be one possible fix; another would be to go back to first removing the old datom and then adding the new datom.

🤨 1

whilo 2026-02-18T22:45:05.126109Z

@uppfinnarjonas It is https://github.com/jonasseglare/persistent-sorted-set/commit/1afb451139c6aa9a924bffe4b5208a3a4389515c, right?

whilo 2026-02-18T22:56:59.418359Z

I guess we might have to revert to the old insertion logic and skip replace for avet (the other two are fine).

whilo 2026-02-18T23:10:53.071939Z

https://github.com/replikativ/datahike/pull/781

whilo 2026-02-18T23:12:02.338809Z

Lmk whether this is consistent with your understanding and whether the fix makes sense to you. For the AVET index we still need to do separate disj and conj, because we cannot pattern match the value with any comparator, I think.
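A minimal sketch of the separate disj and conj approach, using Clojure's built-in sorted-set-by rather than persistent-sorted-set. Datoms are modelled as [e a v] vectors, and `cmp-avet` and `upsert-avet` are hypothetical names, not the code from the PR.

```clojure
;; Hypothetical model: datoms are [e a v] vectors, AVET order is (a, v, e).
(defn cmp-avet [[e1 a1 v1] [e2 a2 v2]]
  (compare [a1 v1 e1] [a2 v2 e2]))

(defn upsert-avet
  "Removes the old datom and adds the new one, so the new datom is
   placed wherever its value sorts rather than at the old position."
  [index old-datom new-datom]
  (-> index
      (disj old-datom)
      (conj new-datom)))

(def avet (sorted-set-by cmp-avet [50 :age 10] [20 :age 11]))

;; Entity 50 changes its value from 10 to 12 and moves behind entity 20.
(vec (upsert-avet avet [50 :age 10] [50 :age 12]))
;; => [[20 :age 11] [50 :age 12]]
```

Because the value is part of the AVET sort key, a changed value can move the datom to a different position, which an in-place replacement at the old position cannot express.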

whilo 2026-02-18T23:12:29.558889Z

@uppfinnarjonas I want to merge this asap.

whilo 2026-02-19T05:05:28.636039Z

I decided to merge to make sure people starting to use 0.7 are not affected, but we can discuss and also apply a different fix.

Jonas Östlund 2026-02-19T07:04:27.723089Z

Thank you @whilo for looking into this! I finished work just after writing my previous message so I haven't had time to look at it yet but will see if it solves the bigger issue with real data. But I think the heart of the problem is that for the previous logic to work, a set being sorted by cmp-datoms-avet-quick must imply that it is also sorted by cmp-datoms-avet-replace, but this is not necessarily the case. Here is a minimal example:

(let [datoms [[50 20 10]
              [20 20 11]]]
  (into []
        (map #(Leaf/isSorted datoms %))
        [cmp-datoms-avet-quick
         cmp-datoms-avet-replace]))
;; => [true false]

So I don't think we can make the comparisons faster in this case by having a special case where we omit comparing the values of the datom, because that breaks the logic. But it might still be useful for the persistent set to have a .replace method but write that method in such a way that it uses the cmp-datoms-avet-quick comparator in case that improves performance. If I have time today, I will finish what I was working on, regarding that.
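The same check can be reproduced without persistent-sorted-set. This is a sketch in plain Clojure that assumes the vectors in the example are [e a v]; `cmp-avet-quick`, `cmp-avet-replace` and `sorted-by?` are hypothetical stand-ins for the real comparators and for Leaf/isSorted.

```clojure
;; Hypothetical stand-ins for cmp-datoms-avet-quick and cmp-datoms-avet-replace.
;; Datoms are assumed to be [e a v] vectors.
(defn cmp-avet-quick [[e1 a1 v1] [e2 a2 v2]]
  (compare [a1 v1 e1] [a2 v2 e2]))

(defn cmp-avet-replace [[e1 a1] [e2 a2]]
  (compare [a1 e1] [a2 e2]))

(defn sorted-by?
  "True if every adjacent pair is in non-descending order under cmp."
  [cmp xs]
  (every? (fn [[x y]] (<= (cmp x y) 0))
          (partition 2 1 xs)))

(let [datoms [[50 20 10]
              [20 20 11]]]
  (mapv #(sorted-by? % datoms) [cmp-avet-quick cmp-avet-replace]))
;; => [true false]
```

Both datoms share attribute 20. Ordered by value (10 before 11) the pair is sorted, but ordered by entity (50 before 20) it is not, so a search with the key-only comparator can look in the wrong place.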

Jonas Östlund 2026-02-19T11:22:24.731389Z

Here is a PR that adds an additional unit test to check that the issue has been solved: https://github.com/replikativ/datahike/pull/782

Jonas Östlund 2026-02-19T12:12:28.190809Z

I added an additional PR that exhaustively checks that we do not have these issues for the EAVT and AEVT index types: https://github.com/replikativ/datahike/pull/783/changes

Jonas Östlund 2026-02-19T13:31:25.135799Z

Anyway, the PR #781 that does the upsert using disj and conj seems to fix the original issue on the full database. Thanks a lot for quickly addressing this!

whilo 2026-02-19T18:36:15.312599Z

Thank you! I will take a look. I know you are super busy, but it would be good if you could also test against the new pss version (https://github.com/replikativ/persistent-sorted-set). I have added subtree counting (and optionally stats and leaf processing/compaction), which will allow us to do adaptive query planning by quickly getting cardinalities for many types of clauses. I bumped the dependency here https://github.com/replikativ/datahike/tree/bump-persistent-sorted-set-0.4.

whilo 2026-02-19T18:37:39.391079Z

Everybody using 0.7 please update to 0.7.1649 to also fix this edge case with AVET.

👍 1

whilo 2026-02-04T11:27:45.615729Z

Datahike: Online Garbage Collection (Experimental) 🧹

With 0.7.1643 we are introducing Online GC for incremental cleanup of freed storage addresses.

What is it?

Continuously cleans up freed tree nodes during normal operation, preventing unbounded storage growth without downtime.

vs Offline GC:
• Online: Incremental, continuous, no downtime, single-branch only
• Offline (`d/gc-storage`): Full reachability analysis, multi-branch safe, requires downtime

Quick Start

;; Enable in config
{:online-gc {:enabled? true
             :grace-period-ms 60000}}

;; Optional: Background thread
(require '[datahike.online-gc :as gc])
(gc/start-background-gc! (:store @conn) {...})

Key Features
• Address Recycling: Freelist reuse instead of deletion (O(n) → O(1))
• Grace Period: Protects long-lived readers (default: 1 min)
• Background Thread: Continuous cleanup for long-running services

⚠️ Single-Branch Only

Online GC is completely disabled for multi-branch databases (returns 0). Branch A's freed nodes might still be referenced by Branch B through structural sharing.
→ Use offline GC (d/gc-storage) for multi-branch databases

When to Use
✅ Long-running apps, high-write workloads, single-branch
❌ Multi-branch databases, low-write workloads, critical production (yet)

Documentation
https://github.com/replikativ/datahike/blob/main/doc/gc.md#online-garbage-collection-incremental-gc

Status: ⚠️ Experimental
Test in staging first • Monitor freed addresses • Single-branch only (enforced)

Feedback: https://github.com/replikativ/datahike/issues
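To illustrate how a grace period can protect long-lived readers, here is a small model of a freelist in plain Clojure. It is only a sketch of the idea: the names, the data layout and the timestamps are hypothetical and are not taken from datahike.online-gc.

```clojure
;; Hypothetical model of a freelist with a grace period; names and data
;; layout are illustrative and not taken from datahike.online-gc.
(def grace-period-ms 60000)

(defn free-address
  "Records an address as freed at time now-ms."
  [freelist address now-ms]
  (conj freelist {:address address :freed-at now-ms}))

(defn reusable-addresses
  "Addresses whose grace period has elapsed and may be recycled."
  [freelist now-ms]
  (->> freelist
       (filter #(>= (- now-ms (:freed-at %)) grace-period-ms))
       (mapv :address)))

(def freelist
  (-> []
      (free-address :node-a 0)
      (free-address :node-b 45000)))

;; At t = 70000 ms only :node-a has been free for at least 60000 ms.
(reusable-addresses freelist 70000)
;; => [:node-a]
```

In this model a reader that started before :node-b was freed still has 35 seconds before the address can be handed out again.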

whilo 2026-02-04T11:35:13.068339Z

@alekcz360 lmk whether this helps you.

whilo 2026-02-04T11:51:29.433919Z

This PR is the first one that helped with the DBPedia import, keeping the memory footprint reasonably small during batch insertions.
