2026-01-16 datahike | Clojure Slack Archive

datahike

telekid 2026-01-16T02:39:29.558969Z

I'm playing around with konserve, and have more experience with missionary than with core.async. I'm wondering if you can help me understand what :sync? false is doing. Do these differ from a functionality or performance perspective, for example?

(go (<! (k/update-in store [:user :age] (fnil inc 0))))

(m/sp
 (m/?
  (m/via m/blk
         (k/update-in store [:user :age] (fnil inc 0) {:sync? true}))))

Maybe a more direct question is just "what is the best way to use konserve with missionary?"

pat 2026-01-16T06:51:42.758559Z

:sync? false will return a channel, see https://github.com/replikativ/konserve/blob/main/doc/api-walkthrough.md

pat 2026-01-16T06:52:45.322519Z

you can use core.async/take! to get the value asynchronously and do what you like with it, like handing it to missionary
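A minimal sketch of that hand-off, assuming only that a missionary task is a function of a success and a failure callback that returns a canceller thunk. `chan->task` is a hypothetical helper, not part of konserve or missionary, and treating Throwable values on the channel as failures is an assumption about how errors are delivered.

```clojure
(require '[clojure.core.async :as a])

(defn chan->task
  "Hypothetical helper: wraps a core.async channel as a missionary-style task,
  i.e. a function of success and failure callbacks that returns a canceller."
  [ch]
  (fn [success failure]
    ;; take! registers a callback, so no thread is parked while waiting.
    (a/take! ch
             (fn [v]
               ;; Assumption: errors arrive as Throwable values on the channel.
               (if (instance? Throwable v)
                 (failure v)
                 (success v))))
    ;; A pending take! cannot be withdrawn, so cancellation is a no-op here.
    (fn [] nil)))
```

Inside `m/sp` this could then be awaited with something like `(m/? (chan->task (k/update-in store [:user :age] (fnil inc 0))))`, assuming the konserve call returns a channel as in the first snippet of the question.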

whilo 2026-01-16T12:02:11.308649Z

the difference is whether it will steal a thread from you just to block and wait; async doesn't, sync does; sync can be faster if your store is fast, because context switches of core.async also take time (haven't checked with the new virtual thread integration though)

👍 1

telekid 2026-01-16T03:06:48.316749Z

Separately, can a data structure stored in konserve be queried efficiently at arbitrary depths? In this example:

a       b             c
{:users {... 17401484 {:name "whilo"} ...}}

If b had 10m records, would lookup of 17401484 be fast? (I'm thinking in relation to Rama's subindexing feature.)

whilo 2026-01-16T12:02:58.464449Z

no, konserve stores blobs and if you operate on substructures you still need to deserialize and reserialize the whole blob

whilo 2026-01-16T12:03:54.184669Z

i recommend implementing a persistent data structure that will be efficient for your operations on top, such as https://github.com/tonsky/persistent-sorted-set, or our fork https://github.com/replikativ/persistent-sorted-set; this is my strategy to get fast IO

👍 1

whilo 2026-01-16T12:04:05.498359Z

datahike is this on steroids; datascript as well

telekid 2026-01-16T03:12:41.426319Z

Are range queries against clojure's sorted maps stored in konserve fast?

telekid 2026-01-16T03:18:53.010249Z

Finally (sorry for all the questions): Is it efficient to make small changes deep within a store?

;; With this schema:
{user-id {location-id name}}

;; Assume 10m user ids, each with 100 location ids.

;; Is this efficient?
(k/assoc-in store [10481 14] "Telluride")

pat 2026-01-16T06:53:31.808539Z

depends on the size of the blob, for small blobs it's fine

pat 2026-01-16T06:54:40.425669Z

konserve is a KV primitive, you can build normalized data structures on top of it if you set up a key-ref pattern

👍 2
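One way to picture the key-ref pattern, as a sketch only: the store below is a plain atom standing in for a KV store, not konserve itself, and the key shapes `[:user id]` and `[:location user-id location-id]` as well as all function names are hypothetical. Each location lives in its own small blob, and the user blob only holds references (keys) to them, so a deep change rewrites one small value instead of one large nested map.

```clojure
;; Stand-in for a KV store: an atom mapping key -> blob (not konserve itself).
(def kv (atom {}))

(defn put-user!
  "Stores the user's own fields in one blob, plus a vector of location keys (refs)."
  [user-id user-name location-ids]
  (swap! kv assoc [:user user-id]
         {:name user-name
          :locations (mapv #(vector :location user-id %) location-ids)}))

(defn put-location!
  "Writes one small blob; the user blob and the other locations are untouched."
  [user-id location-id location-name]
  (swap! kv assoc [:location user-id location-id] {:name location-name}))

(defn user-locations
  "Follows the refs held in the user blob: one read for the user, one per ref."
  [user-id]
  (let [user (get @kv [:user user-id])]
    (mapv #(get @kv %) (:locations user))))
```

The trade-off is extra reads when following refs, in exchange for small writes.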

pat 2026-01-16T06:56:08.288259Z

> Are range queries against clojure's sorted maps stored in konserve fast? For expensive things, you can have an index layer and store that as an independent blob, followed by a second IO call
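A rough sketch of the index-layer idea, again with a plain atom standing in for the store. The `:user-index` key, the `[:user id]` key shape, and `users-in-range` are assumptions made up for illustration. The index is read first, a range scan picks the matching ids, and a second round of reads fetches only those blobs. Whether a given serializer keeps a sorted set sorted on disk is not covered here.

```clojure
;; Stand-in for a KV store (not konserve itself): key -> blob.
(def indexed-kv
  (atom {:user-index (sorted-set 3 10 42 99)   ; index blob: sorted user ids
         [:user 3]   {:name "a"}
         [:user 10]  {:name "b"}
         [:user 42]  {:name "c"}
         [:user 99]  {:name "d"}}))

(defn users-in-range
  "Returns the user blobs whose ids fall in [lo, hi], both ends inclusive."
  [lo hi]
  (let [index (get @indexed-kv :user-index)   ; first read: the index blob
        ids   (subseq index >= lo <= hi)]     ; range scan on the sorted set
    (mapv #(get @indexed-kv [:user %]) ids))) ; second round: fetch each hit
```

For example, `(users-in-range 10 50)` reads the index once and then only the blobs for ids 10 and 42.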

telekid 2026-01-16T03:19:07.276619Z

Thanks in advance for your insights!

João Loula 2026-01-16T21:14:53.157109Z

hey @whilo --- just learned about datahike and wanted to join the channel, super cool to see the new clojurescript support and the edge compute possibilities that it unlocks 🙂 saw that you did your phd with frank wood, that's awesome: curious how you see the world of probabilistic programming interacting with datalog in general and datahike in particular!

whilo 2026-01-16T21:18:34.038659Z

hey @jloulacampos! do you know frank? i think clojure's memory model is particularly well suited for simulation based inference/prob prog; coding assistants being a very prominent example; i am churning atm. to get datahike into shape for that, this is also what i am building as an integrated app under #simmis (naming still in flux); datahike fortunately works fairly well with coding assistants and i am working on making the whole stack persistent; almost ready to release a persistent vector db index, also have a persistent fulltext search prototype; what are you working on?

whilo 2026-01-16T21:19:08.733649Z

working on a new website, too https://datahike.io/ for the release

whilo 2026-01-16T21:25:59.519569Z

oh you worked with josh, nice!

whilo 2026-01-16T21:26:59.497459Z

i want to do more prob prog again; i have ported anglican to my new stack, but i am basically stabilizing and releasing it bottom up atm.

whilo 2026-01-16T21:27:52.382189Z

SMC with LLMs makes sense; it is a fairly general perspective

whilo 2026-01-16T21:29:28.079959Z

should probably read this https://openreview.net/forum?id=xoXn62FzD0

whilo 2026-01-16T21:33:42.476779Z

do you also have a talk about it that i could watch?

João Loula 2026-01-16T21:35:27.046669Z

@whilo yes! here: https://iclr.cc/virtual/2025/oral/31732

João Loula 2026-01-16T21:36:21.933649Z

I know Frank's papers and have seen him at conferences but never talked to him 🙂

João Loula 2026-01-16T21:36:50.821989Z

just took a look at #simmis, still getting situated but looks really cool!

João Loula 2026-01-16T21:38:08.456689Z

> datahike fortunately works fairly well with coding assistants I'm curious what your experience is with using LMs to write datahike queries --- any particular approaches you found helpful there?

João Loula 2026-01-16T21:39:39.491619Z

> what are you working on? recently finished my phd and joined a startup with vikash and some friends from the paper above! doing some probabilistic programming + LM stuff

João Loula 2026-01-16T21:41:09.440649Z

have you thought about probabilistic programming variants of datalog? I know about problog etc. but curious if someone's tried to build something like a probabilistic-by-construction version of datomic

whilo 2026-01-16T21:42:51.553779Z

datomic datalog is well supported by many coding assistants, and there is enough datahike code on github that claude for instance is able to use it very well

whilo 2026-01-16T21:43:22.851189Z

which also changes the "SQL is more pragmatic" calculation quite a bit, since datalog queries can be much more compact and expressive

whilo 2026-01-16T21:44:05.998279Z

what startup is it? sounds fun; i tried to find people to do this with, but i was the last prob prog person in the lab, it also got renamed

whilo 2026-01-16T21:44:32.544749Z

from programming languages for ai, to pacific laboratory for ai 😢

whilo 2026-01-16T21:53:32.373109Z

yes, i have thought about problog; david poole here worked with it and i hope i can integrate it; happy to collaborate on that; if you need something specific lmk

whilo 2026-01-16T21:54:41.877209Z

how do you view the programming languages situation right now? i also really like julia, but i stuck to clojure because of the persistent memory semantics; obviously industry common sense right now is python/typescript

João Loula 2026-01-16T21:59:13.433859Z

@whilo startup is still in stealth so can't say much, switching to DMs if that's ok!

whilo 2026-01-16T21:59:25.498489Z

sure

fmjrey 2026-01-17T13:42:49.237959Z

Just a pedantic remark on terminology: in computing the term "memory model" is generally understood as how data in memory should be accessed and modified by concurrent execution threads. I would probably use instead the term "persistent data structure". Wikipedia, which I don't necessarily consider as an absolute reference, seems to agree with that distinction: https://en.wikipedia.org/wiki/Persistent_data_structure https://en.wikipedia.org/wiki/Memory_model_(programming)
