Datahike 0.8.1664: Query Planner, Secondary Indices & Versioning API
We just shipped a big feature release for Datahike, with three major systems landing together.
1. Cost-Based Query Planner
Datahike now includes an opt-in query planner that converts Datalog queries into fused B-tree scan+merge plans, eliminating the intermediate tuple overhead of the legacy engine.
What it does:
- DP-based join ordering with cardinality estimation via index sampling
- Predicate pushdown into index slice bounds
- Semi-naive recursive rule evaluation with magic sets
- Probe-driven seeks for selective cross-entity joins
- Query result cache with attribute-selective invalidation
Performance against Datalevin 0.10.7 and Datomic (20k entities):
Benchmark     Description                      Datahike  Datalevin  Datomic
------------  -------------------------------  --------  ---------  -------
q1            Simple lookup [?e :name "Ivan"]  0.05ms    0.26ms     2.5ms
q5            Value join (age shared)          3.9ms     120ms      100ms
q-not         NOT negation                     1.5ms     49ms       24ms
q-or-join     OR-join                          4.5ms     66ms       38ms
q-join-chain  3-hop ref chain                  12ms      35ms       47ms
All temporal query shapes (as-of, since, history) are supported. Datahike wins every temporal benchmark against Datomic (1.7x-45x faster).
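For readers unfamiliar with the term "semi-naive" in the feature list, here is a minimal sketch in plain Clojure. It is not Datahike's implementation and leaves out magic sets entirely; the function name `reachable` and the edge data are made up for illustration. It computes the transitive closure of a relation, joining only the tuples derived in the previous round against the base edges rather than re-joining everything found so far.

```clojure
(require '[clojure.set :as set])

;; Illustration only, not Datahike internals.
;; Semi-naive evaluation of the recursive rule
;;   reach(a, d) :- reach(a, b), edge(b, d)
;; `delta` holds only the tuples that were new in the previous round.
(defn reachable [edges]
  (loop [total (set edges)
         delta (set edges)]
    (if (empty? delta)
      total                                   ; fixpoint: nothing new was derived
      (let [derived (set (for [[a b] delta
                               [c d] edges
                               :when (= b c)]
                           [a d]))
            fresh   (set/difference derived total)]
        (recur (set/union total fresh) fresh)))))

;; Three base edges 1->2->3->4 yield three additional derived pairs.
(= (reachable [[1 2] [2 3] [3 4]])
   #{[1 2] [2 3] [3 4] [1 3] [2 4] [1 4]})
;; => true
```

The loop terminates on cyclic data as well, because a round that derives no unseen tuple leaves `delta` empty.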
Enable with DATAHIKE_QUERY_PLANNER=true. The planner falls back to the legacy engine for any unsupported query shape — no errors, no breakage.
New: d/explain prints a human-readable query plan. d/query-stats returns execution statistics.
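To make "predicate pushdown into index slice bounds" from the feature list concrete, here is a small sketch in plain Clojure. It is a conceptual stand-in, not Datahike code: a `sorted-set` of `[attribute value entity]` vectors plays the role of a value-ordered index, and the names `avet`, `scan-then-filter` and `pushed-down` are hypothetical. The point is only that a range predicate can become the starting bound of the index scan instead of a filter applied afterwards.

```clojure
;; Illustration only: a sorted set of [attr value entity] tuples stands in
;; for a value-ordered index. Entity e has :age (* 5 e).
(def avet
  (into (sorted-set)
        (for [e (range 1 11)]
          [:age (* 5 e) e])))

;; Without pushdown: visit every tuple, then apply the predicate.
(defn scan-then-filter [index min-age]
  (->> index
       (filter (fn [[a v _]] (and (= a :age) (> v min-age))))
       (mapv peek)))

;; With pushdown: the predicate becomes the lower bound of the slice,
;; so the scan starts at the first tuple that can match.
(defn pushed-down [index min-age]
  (->> (subseq index > [:age min-age Long/MAX_VALUE])
       (take-while (fn [[a _ _]] (= a :age)))
       (mapv peek)))

(scan-then-filter avet 30) ;; => [7 8 9 10]
(pushed-down avet 30)      ;; => [7 8 9 10]
```

Both functions return the same entities; the second one never touches the tuples below the bound, which is where the saving comes from on a large index.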
2. Pluggable Secondary Indices
A new protocol-based infrastructure for indices that run alongside the primary B-tree. Three integrations ship today:
- Scriptum (Lucene) — full-text search
- Proximum (HNSW) — vector similarity / KNN
- Stratum (SIMD columnar) — fast aggregates
Secondary indices are declared via schema transactions:
```clojure
(d/transact conn [{:db/ident :idx/search
                   :db.secondary/type :scriptum
                   :db.secondary/attrs [:article/body]}])
```
They support dynamic creation with automatic async backfill, copy-on-write branching (indices fork with the database), GC integration, and a composition model via RoaringBitmap entity sets.
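The idea behind composing indices through entity sets can be sketched with ordinary Clojure sets. This is an assumption-laden illustration, not the Datahike API: plain sets stand in for RoaringBitmaps, and the entity ids and var names are invented. Each index answers with the set of entity ids it matched, and combining indices is then set algebra on those ids.

```clojure
(require '[clojure.set :as set])

;; Hypothetical results; plain Clojure sets stand in for RoaringBitmaps.
(def fulltext-hits #{101 102 105 109})  ; e.g. entities matched by a full-text index
(def vector-hits   #{102 103 105 110})  ; e.g. entities returned by a KNN index

;; "Matches both indices" is an intersection of entity ids.
(def both (set/intersection fulltext-hits vector-hits))

;; "Matches either index" is a union of entity ids.
(def either (set/union fulltext-hits vector-hits))

(= both #{102 105}) ;; => true
(count either)      ;; => 6
```

A compressed bitmap offers the same operations on integer ids while typically using far less memory than a hash set.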
The protocol is open — implement ISecondaryIndex to plug in your own index type.
3. Versioning API
Git-like branching and merging, promoted from experimental to the public API:
```clojure
(d/branch! conn :staging :main)   ;; instant CoW fork
(d/transact conn-staging [...])   ;; work on staging
(d/merge-db conn :main :staging)  ;; merge back
(d/delete-branch! conn :staging)  ;; clean up
```
New functions: d/branches, d/branch!, d/delete-branch!, d/force-branch!, d/merge-db, d/commit-id, d/parent-commit-ids, d/commit-as-db, d/branch-as-db.
Branch creation is near-instant regardless of database size (structural sharing). Secondary indices fork with the database. All operations are writer-routed to prevent race conditions.
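Why structural sharing makes a fork cheap can be shown with Clojure's own persistent collections. This is an analogy under stated assumptions, not Datahike code: a persistent map stands in for the database's index tree, and all names here are hypothetical.

```clojure
;; Illustration only: a persistent map stands in for the index tree.
(def main-db {:users  (vec (range 100000))
              :orders (vec (range 100000))})

;; "Branching" takes another reference to the same immutable value;
;; nothing is copied, whatever the size.
(def staging-db main-db)

;; A write on the branch builds a new root; untouched parts stay shared.
(def staging-after-write (update staging-db :orders conj 100000))

(identical? (:users main-db) (:users staging-after-write)) ;; => true
(count (:orders main-db))                                  ;; => 100000
(count (:orders staging-after-write))                      ;; => 100001
```

The original value is never modified, so the source branch is unaffected by writes on the fork.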
Getting Started
{org.replikativ/datahike {:mvn/version "0.8.1665"}}
Documentation:
- doc/query-engine.md — planner architecture, enabling, result cache config
- doc/secondary-indices.md — setup, custom index guide, distributed deployment
- doc/versioning.md — branching semantics, use cases, examples
The query planner is opt-in for now. We're running it against all existing test suites (458 tests, 2275 assertions) and plan to make it the default in a future release. Feedback welcome.
@alekcz360 @uppfinnarjonas @hoertlehner and anybody else who currently runs Datahike: please run your tests with the new query planner turned on and report any issues. I want to make it the default in one of the next releases, but decided to give everybody the chance to test it first. The same goes for secondary indices. Both features will become official, but for now I would appreciate feedback if possible.
Wow! Really good news. I will enable the query planner and see how it goes.
New blog post is out, please upvote https://news.ycombinator.com/item?id=47665627
this link doesn't work fwiw
I don't see the link either.
Weird, I still don't understand how Hacker News does this. It linked to https://datahike.io/notes/versioned-analytics-regulated-industries/
If slack is sending headers it might be tripping antibrigade, might be better to force url to plaintext next time
interesting