2026-04-06 datahike | Clojure Slack Archive

datahike

whilo 2026-04-06T09:02:30.770189Z

Datahike 0.8.1664: Query Planner, Secondary Indices & Versioning API

We just shipped a big feature release for Datahike, with three major systems landing together.

1. Cost-Based Query Planner

Datahike now includes an opt-in query planner that converts Datalog queries into fused B-tree scan+merge plans, eliminating the intermediate-tuple overhead of the legacy engine.

What it does:
- DP-based join ordering with cardinality estimation via index sampling
- Predicate pushdown into index slice bounds
- Semi-naive recursive rule evaluation with magic sets
- Probe-driven seeks for selective cross-entity joins
- Query result cache with attribute-selective invalidation

Performance against Datalevin 0.10.7 and Datomic (20k entities):

Benchmark     Description                        Datahike  Datalevin  Datomic
-----------------------------------------------------------------------------
q1            Simple lookup [?e :name "Ivan"]    0.05ms    0.26ms     2.5ms
q5            Value join (age shared)            3.9ms     120ms      100ms
q-not         NOT negation                       1.5ms     49ms       24ms
q-or-join     OR-join                            4.5ms     66ms       38ms
q-join-chain  3-hop ref chain                    12ms      35ms       47ms
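
Predicate pushdown into index slice bounds is easiest to see on a sorted index. The following is a minimal sketch, not Datahike's planner: it assumes an AVET-style index modeled as a Clojure sorted set of [attribute value entity] vectors, and the names avet, scan-then-filter and scan-with-pushdown are hypothetical. A predicate such as [(> ?age 30)] becomes a start bound for subseq, so the scan seeks directly to the matching slice instead of filtering every datom.

```clojure
;; Hypothetical AVET-style index: a sorted set of [attribute value entity]
;; tuples. Equal-length vectors compare element by element, so tuples are
;; grouped by attribute and ordered by value inside each attribute.
(def avet
  (into (sorted-set)
        (concat (for [e (range 1 21)] [:age (+ 20 e) e]) ; ages 21..40
                [[:name "Ivan" 1] [:name "Petr" 2]])))

(defn scan-then-filter
  "Baseline: walk every tuple in the index, then apply the predicate."
  [index attr lower]
  (filter (fn [[a v _]] (and (= a attr) (> v lower))) index))

(defn scan-with-pushdown
  "Pushdown: turn (> v lower) into a start bound for the slice.
  [attr lower Long/MAX_VALUE] sorts after every [attr lower e], so subseq
  starts at the first tuple with v > lower; take-while stops as soon as
  the slice leaves attr."
  [index attr lower]
  (->> (subseq index > [attr lower Long/MAX_VALUE])
       (take-while (fn [[a _ _]] (= a attr)))))

;; Both return the ten :age tuples for entities 11..20, but the pushdown
;; version seeks straight to age 31 instead of walking the lower :age
;; tuples and the :name tuples one by one.
(scan-with-pushdown avet :age 30)
```

The same idea extends to upper bounds (e.g. [(< ?age 35)]) by tightening the take-while condition, which is what "slice bounds" refers to: the predicate shrinks the range that is read rather than being checked per tuple afterwards.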

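Semi-naive evaluation is what keeps recursive rules (for example, a reachability or ancestor rule) from recomputing the same facts on every round. This sketch uses plain Clojure sets rather than Datahike's indices, and all function names are hypothetical; magic-set rewriting, which restricts evaluation to facts relevant to the query's bound arguments, is not shown. Each round joins only the previous round's new facts (the delta) against the edge relation.

```clojure
(require '[clojure.set :as set])

;; Recursive rule, written Datalog-style:
;;   reach(x, y) :- edge(x, y).
;;   reach(x, z) :- reach(x, y), edge(y, z).

(defn index-edges
  "Map each source node to the set of its direct successors."
  [edges]
  (reduce (fn [m [a b]] (update m a (fnil conj #{}) b)) {} edges))

(defn extend-paths
  "Join step: extend every [x y] in facts by one edge [y z]."
  [facts by-src]
  (set (for [[x y] facts
             z (get by-src y)]
         [x z])))

(defn naive-closure
  "Naive evaluation: re-join ALL known facts on every round."
  [edges]
  (let [by-src (index-edges edges)]
    (loop [total (set edges)]
      (let [next-total (set/union total (extend-paths total by-src))]
        (if (= next-total total)
          total
          (recur next-total))))))

(defn semi-naive-closure
  "Semi-naive evaluation: only join the facts that were new (the delta)
  in the previous round; stop when a round derives nothing new."
  [edges]
  (let [by-src (index-edges edges)]
    (loop [total (set edges)
           delta (set edges)]
      (if (empty? delta)
        total
        (let [new-facts (set/difference (extend-paths delta by-src) total)]
          (recur (set/union total new-facts) new-facts))))))

(semi-naive-closure [[1 2] [2 3] [3 4]])
```

The call above yields the six reachable pairs of the chain 1 → 2 → 3 → 4. naive-closure computes the same set, but every round re-joins facts it has already extended; the delta is what lets the semi-naive version skip that work, and it also terminates on cyclic graphs once a round adds nothing.
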
All temporal query shapes (as-of, since, history) are supported. Datahike wins every temporal benchmark against Datomic (1.7x to 45x faster).

Enable with DATAHIKE_QUERY_PLANNER=true. The planner falls back to the legacy engine for any unsupported query shape, so there are no errors and no breakage.

New: d/explain prints a human-readable query plan, and d/query-stats returns execution statistics.

2. Pluggable Secondary Indices

Datahike now has a protocol-based infrastructure for indices that run alongside the primary B-tree. Three integrations ship today:
- Scriptum (Lucene): full-text search
- Proximum (HNSW): vector similarity / KNN
- Stratum (SIMD columnar): fast aggregates

Secondary indices are declared via schema transactions:

```clojure
;; Declare a Scriptum (Lucene) full-text index over :article/body
(d/transact conn [{:db/ident           :idx/search
                   :db.secondary/type  :scriptum
                   :db.secondary/attrs [:article/body]}])
```
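
Secondary-index results compose as RoaringBitmap entity sets. As a rough illustration of why that model is convenient: if every index answers with a set of entity ids, AND, OR and AND NOT between an index lookup and an ordinary Datalog clause reduce to set intersection, union and difference. This sketch uses plain Clojure sets in place of RoaringBitmaps, and text-hits, articles and published are hypothetical stand-ins, not the ISecondaryIndex protocol.

```clojure
(require '[clojure.set :as set]
         '[clojure.string :as str])

;; Hypothetical article bodies keyed by entity id.
(def articles
  {1 "datahike query planner"
   2 "vector search with hnsw"
   3 "query planner for vectors"
   4 "columnar aggregates"})

;; Hypothetical result of an ordinary Datalog clause (e.g. published articles).
(def published #{2 3 4})

(defn text-hits
  "Hypothetical stand-in for a full-text index lookup: the set of entity
  ids whose body contains term."
  [docs term]
  (set (keep (fn [[e body]] (when (str/includes? body term) e)) docs)))

;; AND: published articles mentioning "planner"
(set/intersection (text-hits articles "planner") published)
;; OR: articles mentioning "planner" or "vector"
(set/union (text-hits articles "planner") (text-hits articles "vector"))
;; AND NOT: "planner" articles that do not mention "vector"
(set/difference (text-hits articles "planner") (text-hits articles "vector"))
```

RoaringBitmaps keep these operations fast and compact for large entity-id ranges; plain sets are used here only to keep the sketch dependency-free.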

Secondary indices support dynamic creation with automatic async backfill, copy-on-write branching (indices fork with the database), GC integration, and a composition model via RoaringBitmap entity sets. The protocol is open: implement ISecondaryIndex to plug in your own index type.

3. Versioning API

Git-like branching and merging, promoted from experimental to the public API:

```clojure
(d/branch! conn :staging :main)         ;; instant copy-on-write fork
(d/transact conn-staging [...])         ;; work on staging (conn-staging: a connection to the :staging branch)
(d/merge-db conn :main :staging)        ;; merge back
(d/delete-branch! conn :staging)        ;; clean up
```
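
Branch creation can be near-instant regardless of database size because, with persistent (immutable) data, a fork only needs a new pointer to the current head commit. The toy model below is a sketch under that assumption, using Clojure persistent maps; toy-branch, toy-transact, toy-merge and toy-delete-branch are hypothetical names that mirror the workflow above, not Datahike's implementation, and conflict handling is left to a caller-supplied merge function.

```clojure
;; Hypothetical commit graph: every commit holds an immutable db value
;; and the set of its parent commit ids; branches point at head commits.
(defn init-store [db]
  {:next-id  1
   :commits  {0 {:db db :parents #{}}}
   :branches {:main 0}})

(defn head-db [store branch]
  (get-in store [:commits (get-in store [:branches branch]) :db]))

(defn toy-branch
  "Fork: point new-branch at from-branch's head commit. No data is copied."
  [store from new-branch]
  (assoc-in store [:branches new-branch] (get-in store [:branches from])))

(defn- toy-commit
  "Record a new commit on branch with the given db value and parents."
  [store branch db parents]
  (let [id (:next-id store)]
    (-> store
        (update :next-id inc)
        (assoc-in [:commits id] {:db db :parents parents})
        (assoc-in [:branches branch] id))))

(defn toy-transact
  "Apply f to the branch head's db and commit the result on that branch."
  [store branch f]
  (let [parent (get-in store [:branches branch])]
    (toy-commit store branch (f (head-db store branch)) #{parent})))

(defn toy-merge
  "Merge source into target: a new commit on target whose parents are both
  heads; the merged db value is computed by merge-fn."
  [store target source merge-fn]
  (let [a (get-in store [:branches target])
        b (get-in store [:branches source])]
    (toy-commit store target
                (merge-fn (head-db store target) (head-db store source))
                #{a b})))

(defn toy-delete-branch [store branch]
  (update store :branches dissoc branch))

;; Usage: fork a 10k-entry db, change staging, merge back, clean up.
(def big-db (into {} (map (fn [i] [i {:n i}]) (range 10000))))
(def s0 (init-store big-db))
(def s1 (toy-branch s0 :main :staging))
(def s2 (toy-transact s1 :staging #(assoc % :new {:n -1})))
(def s3 (toy-merge s2 :main :staging merge))
(def s4 (toy-delete-branch s3 :staging))
```

Right after the fork, both branches resolve to the very same db object, so the fork costs one map entry no matter how large big-db is; later changes on staging share unchanged structure with main through Clojure's persistent maps.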

New functions: d/branches, d/branch!, d/delete-branch!, d/force-branch!, d/merge-db, d/commit-id, d/parent-commit-ids, d/commit-as-db, d/branch-as-db.

Branch creation is near-instant regardless of database size thanks to structural sharing, and secondary indices fork with the database. All operations are routed through the writer to prevent race conditions.

Getting Started

{org.replikativ/datahike {:mvn/version "0.8.1665"}}

Documentation:
- doc/query-engine.md: planner architecture, enabling, result cache config
- doc/secondary-indices.md: setup, custom index guide, distributed deployment
- doc/versioning.md: branching semantics, use cases, examples

The query planner is opt-in for now. We're running it against all existing test suites (458 tests, 2275 assertions) and plan to make it the default in a future release. Feedback welcome.

whilo 2026-04-06T09:05:00.254279Z

@alekcz360 @uppfinnarjonas @hoertlehner and anybody else currently running Datahike: please run your tests with the new query planner turned on and report any issues. I want to make it the default in one of the next releases, but decided to give everybody the chance to test it first. The same goes for secondary indices: both features will become official, but for now I would appreciate any feedback you can give.