2026-01-13 datahike | Clojure Slack Archive

datahike

whilo 2026-01-13T06:19:58.208009Z

I have finally merged the cljs support and straightened out a bunch of stuff with the new release 0.7.1615. As mentioned above, there are breaking changes for store configurations: every store config now needs to have an :id, and while I was at it I also renamed the :mem backend to :memory to use more consistent terminology. Most importantly (maybe), the release also contains streaming support through https://github.com/replikativ/konserve-sync. Both the cljs support and the streaming support should be considered beta. It is likely that I also broke some things for downstream users that I was not aware of. Having said that, the code base actually did not get much bigger; I could remove quite a bit of bloated bindings and make things leaner at the same time. All documentation is updated and much improved. You can also now install datahike through npm install datahike@next in node environments (which is also a preview). I am not going to announce this publicly yet, because I hope that some of you can test the release and give me feedback first, including on rough edges. Datahike should also be more memory efficient in terms of memory locality and write amplification, and faster during inserts/upserts. Benchmarks will follow later.
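
As a rough illustration of the store-config change described above, a migration might look like the sketch below. Only the newly required :id key and the :mem to :memory rename come from the message; the UUID value type and the literal shown are assumptions, so the updated documentation is the place to check the accepted form.

```clojure
;; Before: backend named :mem, and an :id was not required for every store.
{:store {:backend :mem}}

;; After: the backend is called :memory and :id is required.
;; ASSUMPTION: :id is written here as a UUID literal; the value is made up
;; purely for illustration.
{:store {:backend :memory
         :id #uuid "00000000-0000-0000-0000-000000000001"}}
```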

whilo 2026-01-13T06:20:44.343089Z

The native-image build is currently broken for some reason. I will look into this; @timok, maybe you have some capacity, if not then it is fine. I see now why parts of it fail: I made the hitchhiker-tree optional as a dependency, and it now seems to be missing in the native pipeline:

Compiling datahike.http.writer
Compiling datahike.impl.entity
Compiling datahike.index
Compiling datahike.index.hitchhiker-tree
Execution error (FileNotFoundException) at datahike.index.hitchhiker-tree.upsert/loading (upsert.cljc:1).
Could not locate hitchhiker/tree__init.class, hitchhiker/tree.clj or hitchhiker/tree.cljc on classpath.

Full report at:
/var/folders/g9/7rkt8rt1241bwwhd3_s8ndp40000gn/T/clojure-4946539122686929424.edn
Error while executing task: ni-cli

alekcz 2026-01-13T06:21:48.514969Z

@whilo I'm free this weekend. I'll test all the changes and give feedback on Monday.

whilo 2026-01-13T06:22:28.837739Z

Awesome. I would also like to release https://github.com/replikativ/konserve-jdbc/ to get this backend out as well.

whilo 2026-01-13T06:43:43.405949Z

I can also take a look at this next: https://github.com/replikativ/datahike/pull/755. I will also need the auto-gc for some of the larger scale experiments I run.

timo 2026-01-13T09:39:52.525259Z

I can check the CI... have to build it up on my fork first to not spam the main branch

whilo 2026-01-13T06:21:26.086619Z

There is also a bigger announcement coming in the next few days. I have been cooking over the last month, and this release is preparation for that work.

👀 1

🔥 6

🤖 1

whilo 2026-01-13T06:27:13.650899Z

There are fairly powerful architectures possible now with the memory model; my plan is to use this as a foundation for distributed simulation based inference and training of AI models. So while the browser support etc. is very practical and useful, a lot of creative/crazy use cases are possible. For instance, it should not be too hard to add DHT support to kabel-pubsub and do p2p syncing of databases completely inside the communication layer of the system. https://github.com/replikativ/kabel-auth can provide the expected authentication for such setups (WIP). I am very happy to discuss/brainstorm in any direction. I also like boring apps and simple experiments. Don't be afraid to ask "stupid" questions :)

fmjrey 2026-01-14T17:06:58.570649Z

Thanks for the clarification. Your work is fascinating. EAV/datalog with P2P sounds like a dream come true. And it does look like your simulation approach would work for the purposes of the aforementioned talk.

👍 1

fmjrey 2026-01-13T12:58:30.218409Z

> distributed simulation based inference

Do you mean inference based on the output of a distributed simulation? What would P2P/DHT bring to the table compared to the use of a distributed memory grid? Are you thinking of simulation to make systems more resilient, a bit like what this interesting talk suggests in order to make better architectures: https://youtu.be/D8qQUHrksrE

fmjrey 2026-01-13T18:43:30.001059Z

Just to clarify my previous comment: when viewing the talk I mentioned, it became clear to me that the kind of simulation it suggests is not just monkey testing the infra, it's simulating the environment outside the IS or even the business. It's closer to E2E testing but with scenarios that are a bit more far-fetched.

whilo 2026-01-14T00:06:44.477049Z

I mean probabilistic inference, which is a generalization of many other forms of inference (including logical ones). I wrote my PhD thesis on Bayesian inference and it covers this a little bit: https://open.library.ubc.ca/soa/cIRcle/collections/ubctheses/24/items/1.0449997. No need to understand this technically though (the introduction tries to explain it intuitively). Coding models are in effect probabilistic inference models that are trained on massive data sets/tasks. The most challenging problems we face (climate change, reorganizing societies, etc.) can be addressed through simulations and trying to infer ahead of time what is even worth trying (i.e. using the simulation for planning). Fuzz testing of code is a special case of doing sampling based inference on correctness predicates (tests).

whilo 2026-01-14T00:07:55.294569Z

I brought up p2p/dht because kabel could bootstrap connectivity inside out even under adversarial conditions. It is something I always thought about doing for replikativ, but didn't have the time for. Maybe I could give a coding assistant a spin and add routing tables and information, which would then allow peers to explore and rewire a network.

whilo 2026-01-14T00:08:16.041469Z

It is not necessary for the simulation angle, but you generally want a resilient and flexible network substrate, I think.

whilo 2026-01-14T00:09:05.841429Z

I have only skimmed the talk, but yes, testing business models/ideas is in general the strategy and my goal.

whilo 2026-01-14T00:09:09.648049Z

Not just code.

whilo 2026-01-14T00:09:29.586899Z

It changes the perspective on what code is, probabilistic programming turns it into a model.

whilo 2026-01-13T06:31:16.904629Z

Next I plan to also promote the versioning API, https://github.com/replikativ/datahike/blob/main/doc/versioning.md. I am happy to hear suggestions about how you would like to use it, etc. I am keeping this somewhat close to git/code versioning tools, but it is more flexible, i.e. merges can be programmed with datalog queries themselves.

🔥 1

2026-01-13T11:06:19.800579Z

Very cool! So, it really would be nice to get rid of manual boilerplate when merging like

{:adds-in-branch
   (sort (d/q '[:find ?db-add ?e ?a ?v ?t
                :in $ $2 ?db-add
                :where
                [$ ?e ?a ?v ?t]
                [(not= :db/txInstant ?a)]
                (not [$2 ?e ?a ?v ?t])]
              branch-db base-db :db/add))

   :retracts-in-branch
   (sort (d/q '[:find ?db-add ?e ?a ?v ?t
                :in $ $2 ?db-add
                :where
                [$ ?e ?a ?v ?t]
                [(not= :db/txInstant ?a)]
                (not [$2 ?e ?a ?v ?t])]
              base-db branch-db :db/retract))

   :adds-in-main
   (sort (d/q '[:find ?db-add ?e ?a ?v ?t
                :in $ $2 ?db-add
                :where
                [$ ?e ?a ?v ?t]
                [(not= :db/txInstant ?a)]
                (not [$2 ?e ?a ?v ?t])]
              main-db base-db :db/add))
}

I am pretty sure that it is possible to pass this part to Datahike itself and optimize a lot
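
Until something like this is built into Datahike, the repetition in the snippet above could be reduced on the user side. The following is only a sketch: it reuses the exact query from the message, and the names `only-in-first-query`, `only-in` and `branch-diff` are hypothetical helpers introduced here, not part of Datahike. It assumes `datahike.api` is on the classpath and that `base-db`, `branch-db` and `main-db` are database values as in the original.

```clojure
(require '[datahike.api :as d])

;; The query shared by all three entries in the original map:
;; datoms present in $ but absent from $2, each tagged with an operation.
(def only-in-first-query
  '[:find ?op ?e ?a ?v ?t
    :in $ $2 ?op
    :where
    [$ ?e ?a ?v ?t]
    [(not= :db/txInstant ?a)]
    (not [$2 ?e ?a ?v ?t])])

(defn only-in
  "Sorted tuples for datoms found in `db` but not in `other`, tagged with `op`."
  [db other op]
  (sort (d/q only-in-first-query db other op)))

(defn branch-diff
  "Builds the same three-entry map as the hand-written version."
  [base-db branch-db main-db]
  {:adds-in-branch     (only-in branch-db base-db :db/add)
   :retracts-in-branch (only-in base-db branch-db :db/retract)
   :adds-in-main       (only-in main-db base-db :db/add)})
```

Calling `(branch-diff base-db branch-db main-db)` should then yield the same map as the original snippet, with the query written only once. This only removes textual duplication; it still runs three separate queries, so it does not provide the optimization hoped for above.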
