datahike

phronmophobic 2026-05-28T05:52:09.929169Z

Hi, I'm trying out datahike and running into an exception when using a :db/valueType of :db.type/bytes

(def cfg {:store {:backend :file
                  :id #uuid "550e8400-e29b-41d4-a716-446655440000"
                  :path (str
                         (fs/file scripts-dir "datahike" "automerge"))}
          :initial-tx
          [{:db/ident :doc/id
            :db/unique :db.unique/identity
            :db/valueType :db.type/string
            :db/cardinality :db.cardinality/one }
           {:db/ident :doc/description
            :db/valueType :db.type/string
            :db/cardinality :db.cardinality/one}
           {:db/ident :doc/save
            :db/valueType :db.type/bytes
            :db/cardinality :db.cardinality/one}]})

(d/create-database cfg)

(def conn
  (delay
    (d/connect cfg)))

(d/transact @conn
              [{:doc/id "mydoc-id"
                :doc/description "description1"
                :doc/save (byte-array [1 2 3])
                }])

(d/transact @conn
              [{:doc/id "mydoc-id"
                :doc/description "description2"
                :doc/save (byte-array [1 2 3])
                }])
This code throws clojure.lang.ExceptionInfo: byte[] cannot be cast to java.lang.Comparable. More info in 🧵
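
(Editorial aside: a minimal sketch of why a cast like this can fail, assuming the comparison goes through Clojure's generic compare. It does not reproduce Datahike's internals, and compare-bytes is a hypothetical helper, not part of any library.)

```clojure
;; Illustration only, not Datahike's code: Java byte arrays do not implement
;; java.lang.Comparable, so Clojure's generic `compare` cannot order them.
(def a (byte-array [1 2 3]))
(def b (byte-array [1 2 3]))

(def compare-result
  (try
    (compare a b)
    (catch ClassCastException _
      :class-cast-exception)))

;; Hypothetical helper: compare array contents lexicographically instead,
;; via java.util.Arrays/compare (requires Java 9 or newer).
(defn compare-bytes [^bytes x ^bytes y]
  (java.util.Arrays/compare x y))
```

Here compare-result ends up as :class-cast-exception, while (compare-bytes a b) returns 0 because both arrays hold the same contents.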

phronmophobic 2026-05-28T05:52:38.405679Z

For context, I'm running datahike in a native image application.

phronmophobic 2026-05-28T05:54:18.969869Z

If I omit the :doc/save attributes in my transaction, everything seems to work fine:

(d/transact @conn
              [{:doc/id "mydoc-id"
                :doc/description "description1"
                ;; :doc/save (byte-array [1 2 3])
                }])

(d/transact @conn
              [{:doc/id "mydoc-id"
                :doc/description "description2"
                ;; :doc/save (byte-array [1 2 3])
                }])

whilo 2026-05-28T17:04:26.678199Z

Hey @smith.adriane! Thanks for reporting. This might be an edge case bug in the native image compilation. I will take a look.

๐Ÿ™ 1
whilo 2026-05-28T17:05:06.668469Z

How does native image help you? I don't think many people use it yet, but I can see the value.

phronmophobic 2026-05-28T17:05:28.898529Z

Thanks! I was surprised at how easy it was to get started and have it running under native image on my iPhone!

🔥 1
whilo 2026-05-28T17:08:24.562489Z

Wow!

whilo 2026-05-28T17:08:42.232469Z

I think no one has tried that before.

whilo 2026-05-28T17:09:36.706859Z

What deployment context do you use on the iPhone? Is the JVM sufficiently native compilable that you can build iOS apps easily like this these days?

phronmophobic 2026-05-28T17:10:04.984879Z

https://github.com/phronmophobic/grease

phronmophobic 2026-05-28T17:12:22.852699Z

It's using native image and static jdk. dtype-next is used for exposing bindings. I wrote https://github.com/phronmophobic/objcjure for calling iOS APIs. It's still very experimental.

phronmophobic 2026-05-28T17:12:39.586289Z

But I've been using this setup for my personal podcast app for 2-3 years.

phronmophobic 2026-05-28T17:14:37.316739Z

I also embed sci and an nrepl so you can live code on the device from your laptop.

phronmophobic 2026-05-28T17:15:54.816749Z

I would like to create an app like https://www.omz-software.com/pythonista/, but free and powered by sci. It's still unclear if Apple will have some issue with the app.

whilo 2026-05-28T17:16:54.549149Z

Grease is very cool!

whilo 2026-05-28T17:18:18.585529Z

That is also my sense of Apple, but worth a shot, I guess. And it is always educational to bring an interpreter onto a new stack.

whilo 2026-05-28T18:50:43.670479Z

@smith.adriane Should be fixed with the latest release through https://github.com/replikativ/datahike/pull/833

๐Ÿ™ 1
phronmophobic 2026-05-28T18:52:46.845119Z

Amazing!

phronmophobic 2026-05-28T18:52:51.909889Z

I'll give it a try

whilo 2026-05-28T18:53:03.962389Z

Note that you don't want to use the bytes type for blobs. In that case it is better to put them in a separate raw store, e.g. konserve, and join against it in your query clauses by accessing the store there. You can also just use a filesystem. If you want persistent semantics you need to ensure this yourself, though, e.g. by using content-based hashing and a simple mark-and-sweep GC that can run in the background (pulling all reachable documents from Datahike and deleting the rest). But if you have small byte arrays, or you want them to be range scannable/indexed, then the bytes attribute is good.
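
(Editorial aside: a minimal sketch of the content-hashing plus mark-and-sweep idea, using a plain directory as the blob store. The function names and the :doc/save-hash attribute are hypothetical, and it uses only JDK and clojure.java.io calls rather than the konserve API.)

```clojure
(require '[clojure.java.io :as io])
(import '(java.security MessageDigest)
        '(java.nio.file Files))

(defn sha256-hex
  "Hex-encoded SHA-256 digest of a byte array."
  [^bytes bs]
  (let [digest (.digest (MessageDigest/getInstance "SHA-256") bs)]
    (apply str (map #(format "%02x" (bit-and % 0xff)) digest))))

(defn put-blob!
  "Writes bs to <dir>/<hash> unless it already exists and returns the hash.
  The hash, not the bytes, is what would be stored in the database."
  [dir ^bytes bs]
  (let [h (sha256-hex bs)
        f (io/file dir h)]
    (io/make-parents f)
    (when-not (.exists f)
      (io/copy bs f))
    h))

(defn get-blob
  "Reads the blob stored under hash h."
  ^bytes [dir h]
  (Files/readAllBytes (.toPath (io/file dir h))))

(defn sweep!
  "Deletes every blob in dir whose hash is not in the set `reachable`.
  Returns the deleted hashes."
  [dir reachable]
  (vec (for [^java.io.File f (.listFiles (io/file dir))
             :let [h (.getName f)]
             :when (and (.isFile f) (not (contains? reachable h)))]
         (do (.delete f) h))))

;; The reachable set would come from the database, for example
;; (set (d/q '[:find [?h ...] :where [_ :doc/save-hash ?h]] @conn))
;; where :doc/save-hash is a hypothetical :db.type/string attribute.
```

Because the file name is the hash of the contents, writing the same bytes twice is a no-op, and a blob only becomes garbage once no entity references its hash any more.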

phronmophobic 2026-05-28T18:54:38.875489Z

ah ok

phronmophobic 2026-05-28T18:54:57.270199Z

I'm not actually sure how big I expect these documents to get

whilo 2026-05-28T18:56:50.565079Z

Right, we could try to provide a batteries included Datahike, but so far I opted more for a compositional stack where replikativ abstractions make sense on their own. (This is more functional and general than the database "unbundled" notion.)

whilo 2026-05-28T18:58:52.662989Z

I think the byte support is a typical cliff where people start to use it for cases that are not ideal in an indexed database. Datahike also does not yet bound string sizes, which Datomic does, and would be necessary to guarantee latency on index behaviour (otherwise you can get arbitrarily large B-tree nodes from the payloads). Nonetheless in Datomic it was very annoying for many users that they did limit it with hard settings, which were not needed for their use case.

whilo 2026-05-28T19:17:15.946049Z

I would actually be interested to know whether the distributed settings work for you, i.e. whether you can use the datahike-http server or kabel-writer with the native image. So far we kept them out of the native image and they also don't compile cleanly in our CI/CD pipeline when added. Together with the recently added optimistic updates this setting gives me the ability to write nice browser apps with Datahike, e.g. a snappy Roam/Logseq-like note editor including multi-player.

maxweber 2026-05-28T19:26:09.343759Z

In the past we also put some larger EDN documents into Datomic, since we didn't know that we should avoid this 😅 We then needed to be careful not to use the corresponding attribute in queries, since this triggers a pull of the index segments that are mostly filled with these large documents. That evicts a lot of entries from the in-memory Datomic object cache, making the whole app slow for everyone else (we are using Datomic for a multi-tenant SaaS). Since Datomic compresses the segments and Fressian also takes a bit of CPU, we also observed large CPU spikes in production.

phronmophobic 2026-05-28T19:39:13.667669Z

I'm new to datahike so I haven't really tried anything except include datahike.api. If it's not working for you in your CI/CD, then I'm skeptical it will work any better for me.

whilo 2026-05-28T19:40:41.938899Z

@smith.adriane I am happy to help, I just wanted to float this with you, since you are more deeply in native compilation land atm. and I have not engaged with it deeply lately. So I think this might be a valuable thing to try out, but it only makes sense if it does for you ofc.

phronmophobic 2026-05-28T19:40:56.497689Z

What are you using for multiplayer? I'm currently trying out automerge via their C API

whilo 2026-05-28T19:41:00.603149Z

@maxweber Sounds fun 😛

phronmophobic 2026-05-28T19:41:46.018769Z

currently, my build setup is already more complicated than I'd like. Maybe I'll try more ambitious setups in the future.

whilo 2026-05-28T19:41:58.332379Z

No conflict resolution yet, obviously we have replikativ, and spindel's programming model will compose well with them eventually.

maxweber 2026-05-28T19:42:21.998649Z

@whilo perfect task for the weekend or the night 😅

phronmophobic 2026-05-28T19:43:11.363909Z

I noticed konserve uses core.async. I'm curious about the rationale for spindel if you're already familiar with core.async

whilo 2026-05-28T19:49:14.642849Z

Core.async was one of the first decisions I made in 2013 when starting to work on a cross-platform stack for replikativ, and I used it to make IO portable (network [kabel] and storage [konserve]). It was sometimes a bumpy ride, but much better than anything else. Nonetheless core.async should be called core.csp, because async programming only requires a very minimal inversion of control (which I now cover with https://github.com/simm-is/partial-cps, which is the basis of spindel, too). My understanding is that Zig also abstracted it (in a different way), to be a minimal choice over control flow management. My current take is that spindel is obviously not as mature, so I will not start to put it in the database layer, and also that core.async is still fine because it is more available in the Clojure ecosystem, and as long as I use it to program effects and not logic (which I did in replikativ), it is totally fine (e.g. in konserve), because they don't have value semantics anyway. But for the pub-sub system of replikativ/kabel I will move to spindel and also make the CRDTs more programmable in there, i.e. an FRP abstraction.

whilo 2026-05-28T19:50:07.669189Z

@maxweber I can imagine... I think this can happen with Postgres, too, though. It is always a tradeoff, but this is why they have all the size bounded VARCHAR types etc.

๐Ÿ‘ 1
whilo 2026-05-28T17:30:17.505429Z

@hoertlehner As requested I readded the CHANGELOG https://github.com/replikativ/datahike/blob/main/CHANGELOG.md. In general the git history should also be sufficient these days to see what changed, but this is summarized.