2026-01-25 datahike | Clojure Slack Archive

datahike

whilo 2026-01-25T01:29:48.549919Z

Announcing Yggdrasil - Unified Copy-on-Write Protocols for Heterogeneous Storage

We're excited to release Yggdrasil, a protocol stack that brings Git-like branching semantics to any storage system - filesystems, databases, containers, and more.

In Norse mythology, Yggdrasil is the World Tree connecting nine realms. This library connects nine storage backends under one unified API.

Key Features:
• Unified protocols - Snapshotable, Branchable, Graphable, Mergeable, Watchable
• 9 adapters - Git, ZFS, Btrfs, OverlayFS, Podman, Datahike, LakeFS, Dolt, Scriptum
• PSI consistency - Parallel Snapshot Isolation guarantees across systems
• Zero-copy branching - COW semantics on supported filesystems
• Compliance test suite - Verify any adapter implementation

(require '[yggdrasil.adapters.git :as git])
(require '[yggdrasil.protocols :as p])

(def sys (git/init! "/tmp/repo" {:system-name "my-project"}))

;; Git-like operations via protocols
(p/branch! sys :experiment)
(p/checkout sys :experiment)
;; ... make changes ...
(p/merge! sys :experiment)

;; Time-travel
(p/history sys {:limit 10})
(p/as-of sys "abc123")

Perfect for agent sandboxing, reproducible pipelines, or composing versioned storage across heterogeneous backends.

Install:

org.replikativ/yggdrasil {:mvn/version "0.1.1"}

Links:
• GitHub: https://github.com/replikativ/yggdrasil
• Docs: See README for adapter-specific setup (ZFS, Btrfs, etc.)

Works with https://github.com/replikativ/proximum and https://github.com/replikativ/scriptum for versioned vector search and append-only logs.

Feedback welcome! 🌱

🚀 5

2026-02-05T00:29:05.124279Z

Actually, it is also not clear to me how to branch a db with yggdrasil in my case. So... I have a state sys (datahike connection), then do p/branch!, p/checkout, and then... what? In my case multiple branches are meant to be in use at the same time. What about thread safety in yggdrasil? The workflow I see in the examples does not look like it is intended to work in parallel =(

whilo 2026-02-05T00:59:15.840359Z

right now you could create multiple instances of the same system and checkout different branches, but i get what you are saying; in datahike you use multiple connections right?

whilo 2026-02-05T01:07:03.818859Z

The goal is not to provide a restricting wrapper API around each system, but to standardize the COW features and make them compositional. You can still use the underlying system's features as usual.

whilo 2026-02-05T01:07:40.547179Z

So it should be fine to use datahike directly and you should still see the history reflected in the yggdrasil API as well.

👍 1

2026-02-05T01:15:35.179209Z

Yes, in datahike I use multiple branched connections

2026-02-05T01:16:36.609679Z

And that intermittent bug in Datahike with forking/merging/forking, where not all the data intended to be merged appears in the last forked database... it annoys my autotests and agents so much (( My integration tests are growing rapidly and now almost every third run fails

whilo 2026-02-05T01:17:07.682419Z

Which version are you using?

2026-02-05T01:18:02.717849Z

The latest one. Just bumped it a few hours ago

whilo 2026-02-05T01:18:36.353069Z

Good; Do you have code to reproduce by any chance?

2026-02-05T01:19:22.720719Z

That would not be easy... let me try

whilo 2026-02-05T01:19:33.392799Z

Thanks!

2026-02-05T01:28:42.735739Z

I will create a synthetic repo tomorrow. Time to sleep now)

👍 1

2026-01-28T10:35:15.049059Z

I will take a look soon

2026-02-06T16:34:33.003079Z

Yes, I will 👌

whilo 2026-01-28T00:12:18.280809Z

@sasha_bogdanov_dev maybe this is of interest to you as well, it also has optional support for datahike; i am still thinking of how to finalize the versioning api in datahike itself, either i will just do it through yggdrasil, or probably i would reexport through the external datahike API binding; ideally i could use the same binding mechanism in yggdrasil and just compose the external bindings, but this is not fully clear to me yet

🔥 1

whilo 2026-01-25T01:30:23.375999Z

This will make it possible to combine the Datahike memory semantics with other copy-on-write systems under a unified abstraction.

2026-02-05T23:37:40.425639Z

Unfortunately, still no success reproducing the failure in a synthetic repo =(

whilo 2026-02-06T00:38:39.613839Z

can you share it nonetheless? it will give me a better idea of what you are doing

whilo 2026-01-25T02:33:44.825339Z

Announcing Scriptum - Copy-on-Write Branching for Apache Lucene

We're releasing Scriptum, a library that brings Git-like branching semantics to Lucene indices. Fork a 100GB search index in 3-5ms by sharing immutable segment files.

Scriptum (Latin: "written") - because your search index deserves version control.

Key Features:
• Zero-cost forking - Branch any index in 3-5ms regardless of size (copies metadata, not data)
• Structural sharing - Branches share immutable Lucene segments via COW overlay directories
• Time travel - Open readers at any historical commit point
• Full Lucene 10.x - Text search, KNN vectors, facets - all branch-aware
• Yggdrasil integration - Implements Snapshotable, Branchable, Graphable, Mergeable protocols

(require '[scriptum.core :as sc])

(def writer (sc/create-index "/tmp/search-idx"))
(sc/add-doc writer {:title {:type :text :value "Hello World"}})
(sc/commit! writer "Initial commit")

;; Fork in milliseconds
(def experiment (sc/fork writer "experiment"))
(sc/add-doc experiment {:title {:type :text :value "Branch only"}})
(sc/commit! experiment "Experimental change")

;; Main unchanged, branch has new doc
(sc/search writer {:match-all {}} 10)      ;; => 1 result
(sc/search experiment {:match-all {}} 10)  ;; => 2 results

;; Merge back when ready
(sc/merge-from! writer experiment)

How it works: Scriptum extends Lucene with four components: BranchedDirectory (COW overlay), BranchDeletionPolicy (retains all commits), BranchAwareMergePolicy (protects shared segments), and BranchIndexWriter (main API). See docs/LUCENE_EXTENSION.md for the technical deep-dive.

Install:

org.replikativ/scriptum {:mvn/version "0.1.5"}

Requirements: Java 21+, Lucene 10.3.2

Links:
• GitHub: https://github.com/replikativ/scriptum
• Technical docs: docs/LUCENE_EXTENSION.md

Feedback welcome!

whilo 2026-01-25T02:36:07.302799Z

Scriptum will work well as a secondary fulltext index for Datahike, similarly to how Proximum does this for vector search.
