Announcing Yggdrasil - Unified Copy-on-Write Protocols for Heterogeneous Storage
We're excited to release Yggdrasil, a protocol stack that brings Git-like branching semantics to any storage system - filesystems, databases, containers, and more.
In Norse mythology, Yggdrasil is the World Tree connecting nine realms. This library connects nine storage backends under one unified API.
Key Features:
• Unified protocols - Snapshotable, Branchable, Graphable, Mergeable, Watchable
• 9 adapters - Git, ZFS, Btrfs, OverlayFS, Podman, Datahike, LakeFS, Dolt, Scriptum
• PSI consistency - Parallel Snapshot Isolation guarantees across systems
• Zero-copy branching - COW semantics on supported filesystems
• Compliance test suite - Verify any adapter implementation
(require '[yggdrasil.adapters.git :as git])
(require '[yggdrasil.protocols :as p])
(def sys (git/init! "/tmp/repo" {:system-name "my-project"}))
;; Git-like operations via protocols
(p/branch! sys :experiment)
(p/checkout sys :experiment)
;; ... make changes ...
(p/merge! sys :experiment)
;; Time-travel
(p/history sys {:limit 10})
(p/as-of sys "abc123")
Perfect for agent sandboxing, reproducible pipelines, or composing versioned storage across heterogeneous backends.
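To make the idea of one protocol fronting different backends more concrete, here is a minimal sketch in plain Clojure. It is only an illustration: ToyBranchable, MemorySystem and the toy-* functions are made-up names and are not part of yggdrasil.protocols, and the real adapters may behave differently.

```clojure
;; Toy illustration only: ToyBranchable and MemorySystem are hypothetical names,
;; not part of Yggdrasil's API.
(defprotocol ToyBranchable
  (toy-branch [sys branch] "Returns sys with a new branch pointing at the current head.")
  (toy-checkout [sys branch] "Returns sys with branch selected as the current branch.")
  (toy-commit [sys data] "Returns sys with data stored as the head of the current branch."))

;; An in-memory "backend": branches is a map of branch name -> head value.
(defrecord MemorySystem [branches current]
  ToyBranchable
  (toy-branch [sys branch]
    (assoc-in sys [:branches branch] (get branches current)))
  (toy-checkout [sys branch]
    (assoc sys :current branch))
  (toy-commit [sys data]
    (assoc-in sys [:branches current] data)))

(def sys0 (->MemorySystem {:main {:docs 1}} :main))

(def sys1 (-> sys0
              (toy-branch :experiment)
              (toy-checkout :experiment)
              (toy-commit {:docs 2})))

(:branches sys1)  ;; :main keeps its head, :experiment has the new one
(:branches sys0)  ;; the original value is untouched
```

Any other backend could implement the same toy protocol, and callers would only ever see the toy-* functions.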
Install:
org.replikativ/yggdrasil {:mvn/version "0.1.1"}
Links:
• GitHub: https://github.com/replikativ/yggdrasil
• Docs: See README for adapter-specific setup (ZFS, Btrfs, etc.)
Works with https://github.com/replikativ/proximum and https://github.com/replikativ/scriptum for versioned vector search and append-only logs.
Feedback welcome! 🌱
Actually, it is also not clear to me how to branch a db with yggdrasil in my case. So... I have a state sys (a datahike connection), then I do p/branch!, p/checkout, and then... what? In my case multiple branches are meant to be in use at the same time. What about thread safety in yggdrasil? The workflow I see in the examples doesn't look like it is intended to work in parallel =(
right now you could create multiple instances of the same system and checkout different branches, but i get what you are saying; in datahike you use multiple connections right?
The goal is not to provide a restrictive wrapper API around each system, but to standardize the COW features and make them composable. You can still use the underlying system's features as usual.
So it should be fine to use datahike directly and you should still see the history reflected in the yggdrasil API as well.
Yes, in datahike I use multiple branched connections
And that intermittent bug in Datahike with forking/merging/forking, where not all of the data intended for the merge appears in the last forked database... it annoys my autotests and agents so much (( My integration tests are growing rapidly and now almost every third run fails
Which version are you using?
The latest one. Just bumped it a few hours ago
Good. Do you have code to reproduce it, by any chance?
That would not be easy... let me try
Thanks!
I will create a synthetic repo tomorrow. Now, sleep)
I will take a look soon
Yes, I will 👌
@sasha_bogdanov_dev maybe this is of interest to you as well, it also has optional support for datahike; i am still thinking of how to finalize the versioning api in datahike itself, either i will just do it through yggdrasil, or probably i would reexport through the external datahike API binding; ideally i could use the same binding mechanism in yggdrasil and just compose the external bindings, but this is not fully clear to me yet
This will make it possible to combine the Datahike memory semantics with other copy-on-write systems under a unified abstraction.
Unfortunately, still no success reproducing the failure synthetically =(
can you share it nonetheless? it will give me a better idea of what you are doing
Announcing Scriptum - Copy-on-Write Branching for Apache Lucene
We're releasing Scriptum, a library that brings Git-like branching semantics to Lucene indices. Fork a 100GB search index in 3-5ms by sharing immutable segment files.
Scriptum (Latin: "written") - because your search index deserves version control.
Key Features:
• Zero-cost forking - Branch any index in 3-5ms regardless of size (copies metadata, not data)
• Structural sharing - Branches share immutable Lucene segments via COW overlay directories
• Time travel - Open readers at any historical commit point
• Full Lucene 10.x - Text search, KNN vectors, facets - all branch-aware
• Yggdrasil integration - Implements Snapshotable, Branchable, Graphable, Mergeable protocols
(require '[scriptum.core :as sc])
(def writer (sc/create-index "/tmp/search-idx"))
(sc/add-doc writer {:title {:type :text :value "Hello World"}})
(sc/commit! writer "Initial commit")
;; Fork in milliseconds
(def experiment (sc/fork writer "experiment"))
(sc/add-doc experiment {:title {:type :text :value "Branch only"}})
(sc/commit! experiment "Experimental change")
;; Main unchanged, branch has new doc
(sc/search writer {:match-all {}} 10) ;; => 1 result
(sc/search experiment {:match-all {}} 10) ;; => 2 results
;; Merge back when ready
(sc/merge-from! writer experiment)
How it works:
Scriptum extends Lucene with four components: BranchedDirectory (COW overlay), BranchDeletionPolicy (retains all commits), BranchAwareMergePolicy (protects shared segments), and BranchIndexWriter (main API). See docs/LUCENE_EXTENSION.md for the technical deep-dive.
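The "copies metadata, not data" idea behind a COW overlay can be sketched with a toy model in plain Clojure. This is not Scriptum's implementation and uses no Lucene classes; new-index, add-segment, fork and all-docs are hypothetical helpers, and a "segment" here is just an immutable vector of docs.

```clojure
;; Toy model of a copy-on-write overlay; NOT Scriptum's actual implementation.
;; :base holds segments shared with the parent, :overlay holds branch-local ones.

(defn new-index
  "Creates an index with no segments."
  []
  {:base {} :overlay {}})

(defn add-segment
  "Writes a new segment into the branch-local overlay; the base is never modified."
  [index seg-name docs]
  (assoc-in index [:overlay seg-name] (vec docs)))

(defn fork
  "Creates a branch whose base is everything visible in the parent.
  Only the small map of segment references is copied; the segments are shared."
  [index]
  {:base (merge (:base index) (:overlay index))
   :overlay {}})

(defn all-docs
  "Returns every doc visible in this index: shared base plus branch-local overlay."
  [index]
  (into [] cat (vals (merge (:base index) (:overlay index)))))

(def main (add-segment (new-index) "seg-0" [{:title "Hello World"}]))
(def experiment (add-segment (fork main) "seg-1" [{:title "Branch only"}]))

(count (all-docs main))        ;; main still sees only its own doc
(count (all-docs experiment))  ;; the branch sees the shared doc plus its own

;; The shared segment is the very same object in both branches, not a copy:
(identical? (get-in main [:overlay "seg-0"])
            (get-in experiment [:base "seg-0"]))
```

In this toy model the cost of fork depends on the number of segment references, not on how many docs the segments hold, which is the property the fork timings above rely on.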
Install:
org.replikativ/scriptum {:mvn/version "0.1.5"}
Requirements: Java 21+, Lucene 10.3.2
Links:
• GitHub: https://github.com/replikativ/scriptum
• Technical docs: docs/LUCENE_EXTENSION.md
Feedback welcome!
Scriptum will work well as a secondary fulltext index for Datahike, similarly to how Proximum does this for vector search.