I'm playing around with konserve, and have more experience with missionary than with core.async. I'm wondering if you can help me understand what :sync? false is doing. Do these differ from a functionality or performance perspective, for example?
(go (<! (k/update-in store [:user :age] (fnil inc 0))))
(m/sp
  (m/?
    (m/via m/blk
      (k/update-in store [:user :age] (fnil inc 0) {:sync? true}))))
Maybe a more direct question is just "what is the best way to use konserve with missionary?"
:sync? false will return a channel, see https://github.com/replikativ/konserve/blob/main/doc/api-walkthrough.md
you can use core.async/take! to get the value asynchronously and do what you like with it, like handing it to missionary
the difference is whether it will steal a thread from you just to block and wait; async doesn't, sync does; sync can be faster if your store is fast, because context switches of core.async also take time (haven't checked with the new virtual thread integration though)
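A minimal sketch of the take! hand-off described above, assuming the usual missionary task contract (a function of a success and a failure callback that returns a canceller) and assuming konserve delivers errors as Throwable values on the channel. `chan->task` is a hypothetical helper name, not part of konserve or missionary, and the canceller here is a no-op.

```clojure
(ns example.konserve-missionary
  (:require [clojure.core.async :as a]
            [missionary.core :as m]))

(defn chan->task
  "Hypothetical helper: wraps a core.async channel as a missionary task.
  Takes one value from `ch` without blocking a thread."
  [ch]
  (fn [success failure]
    (a/take! ch
             (fn [v]
               ;; Assumption: errors arrive on the channel as Throwables.
               (if (instance? Throwable v)
                 (failure v)
                 (success v))))
    ;; Canceller: this sketch does not support cancellation.
    (fn [] nil)))

;; Usage with the async konserve call from the question
;; (`k` is konserve.core, `store` an already-opened store):
;;
;; (m/sp
;;   (m/? (chan->task (k/update-in store [:user :age] (fnil inc 0)))))
```

Compared with the `m/via m/blk` variant, this does not park a blocking-pool thread while the store works, at the cost of the core.async context switches mentioned above.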
Separately, can a data structure stored in konserve be queried efficiently at arbitrary depths? In this example:
a b c
{:users {... 17401484 {:name "whilo"} ...}}
If b had 10m records, would lookup of 17401484 be fast? (I'm thinking in relation to Rama's subindexing feature.)
no, konserve stores blobs and if you operate on substructures you still need to deserialize and reserialize the whole blob
i recommend implementing a persistent data structure that will be efficient for your operations on top, such as https://github.com/tonsky/persistent-sorted-set, or our fork https://github.com/replikativ/persistent-sorted-set; this is my strategy to get fast IO
datahike is this on steroids; datascript as well
Are range queries against clojure's sorted maps stored in konserve fast?
Finally (sorry for all the questions): Is it efficient to make small changes deep within a store?
;; With this schema:
{user-id {location-id name}}
;; Assume 10m user ids, each with 100 location ids.
;; Is this efficient?
(k/assoc-in store [10481 14] "Telluride")
depends on the size of the blob, for small blobs it's fine
konserve is a kv primitive; you can build normalized data structures on top of it if you set up a key-ref pattern
> Are range queries against clojure's sorted maps stored in konserve fast?
For expensive things, you can have an index layer and store that as an independent blob, followed by a second IO call
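A rough sketch of the key-ref and index-layer idea for the {user-id {location-id name}} schema, assuming the synchronous konserve.core calls shown in the API walkthrough. The key layout ([:user id] per user, :user-index for the index blob) and both function names are hypothetical choices for illustration, and `store` is assumed to be an already-opened store.

```clojure
(ns example.konserve-key-ref
  (:require [konserve.core :as k]))

(defn put-location!
  "Hypothetical: writes one location name into the small per-user blob
  and records the user id in a separate index blob."
  [store user-id location-id location-name]
  ;; Only the blob stored under [:user user-id] is rewritten here.
  (k/assoc-in store [[:user user-id] location-id] location-name {:sync? true})
  ;; Second IO call: keep a set of known user ids as its own blob.
  (k/update-in store [:user-index]
               (fn [ids] (conj (or ids #{}) user-id))
               {:sync? true}))

(defn users-in-range
  "Hypothetical: reads the index blob first, then one blob per matching user."
  [store lo hi]
  ;; Rebuild a sorted set on read, since a serializer may not preserve sortedness.
  (let [index (into (sorted-set) (k/get store :user-index #{} {:sync? true}))]
    (mapv (fn [id] [id (k/get store [:user id] nil {:sync? true})])
          (subseq index >= lo <= hi))))
```

With 10m user ids the single index blob would itself become large, which is where a persistent sorted set stored node by node comes in.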
Thanks in advance for your insights!
hey @whilo, just learned about datahike and wanted to join the channel, super cool to see the new clojurescript support and the edge compute possibilities that it unlocks. saw that you did your phd with frank wood, that's awesome: curious how you see the world of probabilistic programming interacting with datalog in general and datahike in particular!
hey @jloulacampos! do you know frank? i think clojure's memory model is particularly well suited for simulation based inference/prob prog; coding assistants being a very prominent example; i am churning atm. to get datahike into shape for that, this is also what i am building as an integrated app under #simmis (naming still in flux); datahike fortunately works fairly well with coding assistants and i am working on making the whole stack persistent; almost ready to release a persistent vector db index, also have a persistent fulltext search prototype; what are you working on?
working on a new website, too https://datahike.io/ for the release
oh you worked with josh, nice!
i want to do more prob prog again; i have ported anglican on my new stack, but i am basically stabilizing and releasing it bottom up atm.
SMC with LLMs makes sense; it is a fairly general perspective
should probably read this https://openreview.net/forum?id=xoXn62FzD0
do you also have a talk about it that i could watch?
@whilo yes! here: https://iclr.cc/virtual/2025/oral/31732
I know Frank's papers and have seen him at conferences but never talked to him
just took a look at #simmis, still getting situated but looks really cool!
> datahike fortunately works fairly well with coding assistants
I'm curious what your experience is with using LMs to write datahike queries; any particular approaches you found helpful there?
> what are you working on?
recently finished my phd and joined a startup with vikash and some friends from the paper above! doing some probabilistic programming + LM stuff
have you thought about probabilistic programming variants of datalog? I know about problog etc. but curious if someone's tried to build something like a probabilistic-by-construction version of datomic
datomic datalog is well supported by many coding assistants, and there is enough datahike code on github that claude for instance is able to use it very well
which also changes the "SQL is more pragmatic" calculation quite a bit, since datalog queries can be much more compact and expressive
what startup is it? sounds fun; i tried to find people to do this with, but i was the last prob prog person in the lab, it also got renamed
from programming languages for ai, to pacific laboratory for ai
yes, i have thought about problog; david poole here worked with it and i hope i can integrate it; happy to collaborate on that; if you need something specific lmk
how do you view the programming languages situation right now? i also really like julia, but i stuck to clojure because of the persistent memory semantics; obviously industry common sense right now is python/typescript
@whilo startup is still in stealth so can't say much, switching to DMs if that's ok!
sure
Just a pedantic remark on terminology: in computing the term "memory model" is generally understood as how data in memory should be accessed and modified by concurrent execution threads. I would probably use instead the term "persistent data structure". Wikipedia, which I don't necessarily consider as an absolute reference, seems to agree with that distinction: https://en.wikipedia.org/wiki/Persistent_data_structure https://en.wikipedia.org/wiki/Memory_model_(programming)