datahike

cch1 2024-12-18T02:40:40.183969Z

This channel is where I ended up from the link on https://github.com/replikativ/hasch. I'm guessing readers here might be interested in https://clojurians.slack.com/archives/C03S1KBA2/p1734446525391189, which is about hashing.

whilo 2024-12-18T13:20:17.952859Z

hash is used by the internals of Java for equality and for placing objects in hash-based collections, e.g. HashMap. It chooses to allow collisions for higher performance, and I guess it uses Integers because historically architectures were 32 bit. You can use https://github.com/replikativ/hasch/blob/main/src/hasch/core.cljc#L12 and extract 8 bytes from it (e.g. the first eight) to get the optimal trade-off in terms of collisions for a 64 bit hash derived from SHA-512.
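
[Editor's sketch] The truncation idea above can be illustrated roughly as follows. This is a minimal sketch in Python, not hasch itself: it hashes raw bytes with the standard library's SHA-512 rather than calling the hasch function linked above, and the names `hash64` and `to_signed64` are hypothetical helpers introduced only for illustration.

```python
import hashlib


def hash64(data: bytes) -> int:
    """Return the first 8 bytes of SHA-512(data) as an unsigned 64-bit integer.

    Assumption: the input is already a byte string. hasch derives its digest
    from Clojure values instead; that step is not reproduced here.
    """
    digest = hashlib.sha512(data).digest()  # full digest is 64 bytes
    return int.from_bytes(digest[:8], byteorder="big")


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's complement (like a Java long)."""
    return value - (1 << 64) if value >= (1 << 63) else value


if __name__ == "__main__":
    h = hash64(b"hello")
    print(h, to_signed64(h))
```

Any eight bytes of a cryptographic digest should be equally well distributed, so taking the first eight is a convention rather than a requirement. The signed conversion only matters if the value has to fit a JVM `long`.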

cch1 2024-12-18T19:46:38.119909Z

Thanks for your helpful response, @whilo. I will venture another hasch question here: why do the hasch documentation and code make so many references to EDN? There appears to be no correlation to EDN: nothing is read with the edn reader and AFAICT nothing is even serialized to EDN (or any other string format) with pr-str. If I define an EDN reader tag for my own type, it does nothing to add support in hasch for my type. Finally, even the print-method support for Clojure unordered collections as used by pr-str (and read by EDN) would be a poor choice for input to hashing, since the serialized string is not deterministic for equal-valued sets. It is really confusing to talk about "crypto-hash EDN data structures" when EDN has nothing to do with hasch. Maybe a design goal of hasch is to support the same Clojure types that the (Clojure) EDN reader can read? Am I missing something?

whilo 2024-12-18T19:53:58.017939Z

You are right, edn denotes a syntax for the core data types of Clojure. I agree that this could be framed better; I wrote this library 10 years ago as one of the first to tackle distributed databases. Basically the goal was to make everything hashable that would be edn serializable in Clojure. For exactly the reasons you point out, it is not a good idea to use serialized edn (or any other serializer) for hashing. If Clojure had explicit data types and a name for this core set of structures as represented, then it would be better to think of hasch operating on that, not on a serialized format.
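
[Editor's sketch] The point about hashing values rather than their printed form can be illustrated with a small Python example. This is not hasch's algorithm, and the `digest` function and its type tags are assumptions made up for this sketch. It shows one way to make the hash of an unordered collection independent of iteration order: hash each element, sort the element digests, then hash the result.

```python
import hashlib


def digest(value) -> bytes:
    """Hash a value by its structure instead of its printed representation.

    Hypothetical scheme for illustration only: each supported type gets a
    tag prefix so that, for example, the integer 1 and the string "1" differ.
    """
    if isinstance(value, (set, frozenset)):
        # Sorting the element digests removes any dependence on iteration order.
        parts = sorted(digest(v) for v in value)
        return hashlib.sha512(b"set:" + b"".join(parts)).digest()
    if isinstance(value, str):
        return hashlib.sha512(b"str:" + value.encode("utf-8")).digest()
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so it is checked before int.
        return hashlib.sha512(b"bool:" + (b"1" if value else b"0")).digest()
    if isinstance(value, int):
        return hashlib.sha512(b"int:" + str(value).encode("ascii")).digest()
    raise TypeError(f"unsupported type: {type(value).__name__}")


if __name__ == "__main__":
    # Equal sets built in different orders produce the same digest.
    print(digest({"a", "b", "c"}) == digest({"c", "b", "a"}))
```

A printed string, by contrast, reflects whatever order the collection happens to iterate in, which is why it is an unreliable hash input for sets and maps.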

cch1 2024-12-18T19:55:10.824389Z

Gotcha. That makes sense.

cch1 2024-12-18T19:55:21.337589Z

Are you a big Datomic user as well?

cch1 2024-12-18T19:55:33.945719Z

(I read/inferred that from something in the repo...)

cch1 2024-12-18T19:55:47.052509Z

Either way, thanks for your contribution to the community.

whilo 2024-12-18T19:58:06.123289Z

Not that much lately. I had the intention to build a strong fully global distributed memory model with replikativ, and Datahike is lifting this from Datomic to the fully openly replicated setting. I am trying to follow innovations in the space, but I haven't used Datomic in a long time.

👏 1
whilo 2024-12-18T19:58:51.145509Z

Lately I have been exploring Electric and think it would be nice to get Datahike cljs working to combine it as distributed memory with Datahike in the backend.

cch1 2024-12-18T19:58:57.789089Z

That sounds ambitious. Are you aware of the Convex project?

whilo 2024-12-18T19:59:22.469669Z

I have seen it and they wanted to hire me at some point, I haven't caught up on the exact CRDT stuff they did yet.

whilo 2024-12-18T19:59:57.099299Z

But I think you don't need a crypto project, Datomic with open joining and crypto-hashed snapshots would already go a long way.

cch1 2024-12-18T20:00:15.664149Z

Agreed. For my needs at least.

whilo 2024-12-18T20:00:16.168869Z

We had a tendermint/cosmos prototype with Datahike 5 years ago.

cch1 2024-12-18T20:07:58.006999Z

That sounds pretty cool. One day I hope to have a good business reason to integrate with Convex.

whilo 2024-12-19T02:37:27.084759Z

It already works and Datahike is now tested at scale in production https://github.com/replikativ/datahike/blob/main/doc/distributed.md . I need to update the docs.