#xtdb
2020-05-02
jeroenvandijk 19:05:24

Maybe someone here wants to give feedback on the choice between Datomic, Datahike and Crux for an open-source version of Roam Research: https://github.com/athensresearch/athens/issues/9

refset 19:05:22

Thanks for mentioning this. I've swapped a few messages with Jeff already, but I'll add some new thoughts there now 🙂

richiardiandrea 20:05:57

Interesting, thanks for sharing! I will watch the issue as well.

refset 21:05:15

Cool, I just wrote a bunch of things. If anyone else has other ideas please feel free to chime in!

jeroenvandijk 21:05:35

Thank you 🙂

kosengan 11:05:31

+1, I would like to know the pros and cons of Crux and Datahike. I'm exploring them and considering them for side projects.

refset 12:05:30

Hey @UJMU98QDC I really don't know enough about Datahike to provide a fair comparison, but I think it's accurate to say that Datahike took an in-memory Datalog engine (DataScript) and swapped the indexes to be persistent, whereas Crux was designed for persistence from the beginning, so the query engine is optimised for laziness, modest memory usage and pushing the "hard work" down into RocksDB/LMDB. This also means Crux's Datalog is arguably more declarative, as it doesn't require the user to figure out the optimal clause ordering. I would expect Crux to provide better overall performance when handling large volumes of queries in parallel over large data sets (> memory). We've not run any comparison benchmarking ourselves and it's not a priority for us to do so, although it would be interesting to see the results. In any case, we think it is good to have a healthy range of alternative Datalog technologies succeed with different strengths in various niches 🙂

kosengan 12:05:04

"We're using Datomic (on-prem) in production and I played a bit with Crux, so a short summary as they're often compared. For a typical web apps document model rather than graph model + the absence of explicit schema (in Datomic schema is our primary source of truth / data model) makes it a bit less expressive / convenient (e.g. it's not trivial to make an upsert in Crux). And the absence of pull syntax, which is what we're using 95% of time, is a huge limitation. That said, Crux is good for what it is advertised for: aggregating documents from external sources. Also, it should scale better for use cases with high churn attributes (e.g. logging user events). Still, I don't see why it is often compared to Datomic: they're rather complementary beasts and the true power would be to use them together."

Source: https://www.reddit.com/r/Clojure/comments/elu8kw/crux_development_diary/ ☝️ Any comments?

refset 12:05:54

> the absence of explicit schema makes it a bit less expressive / convenient

Well, your app schema has to live somewhere. I've heard several people say they are already pretty happy using spec as the source-of-truth schema for their app's data model, and they derive their Datomic schema from that, so migrating to Crux actually means removing a layer of complexity: the schema translation step. Perhaps the biggest gap vs. something with a richer data model (such as Datomic), regardless of which schema tech you choose (spec, malli, JSON Schema etc.), is consistently enforcing invariants between documents. This is solvable if you funnel your writes through a single node, or if you switch on the experimental transaction functions feature (there is still some design work left before we can roll this out properly as a stable feature). Pull syntax is an easier fix and can be tackled in user space; there has been a lot of discussion about it in this channel lately. Officially supporting EQL feels like a good long-term bet to me, and we've already got a working prototype, but we can't make promises just yet!
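To illustrate what "tackled in user space" could mean, here is a minimal Python sketch of a simplified pull. It is not Crux's API and not the EQL prototype mentioned above. It assumes documents can be looked up by id through a hypothetical `fetch` function. It also assumes references between documents are stored as plain ids or lists of ids. A pattern is a list of attribute names, plus `{attr: sub_pattern}` maps for following references.

```python
from typing import Any, Callable, Dict, List, Union

Doc = Dict[str, Any]
# Simplified pull pattern: plain attribute names, or {attr: sub_pattern}
# maps that follow a reference (id or list of ids) to other documents.
Pattern = List[Union[str, Dict[str, "Pattern"]]]


def pull(fetch: Callable[[Any], Doc], eid: Any, pattern: Pattern) -> Doc:
    """Resolve a simplified pull pattern against documents returned by `fetch`.

    `fetch` is a hypothetical lookup (id -> document dict); in a real app it
    would wrap whatever entity lookup your database client provides.
    """
    doc = fetch(eid)
    result: Doc = {}
    for item in pattern:
        if isinstance(item, str):
            # Plain attribute: copy it only if the document has it.
            if item in doc:
                result[item] = doc[item]
        else:
            # Join: the attribute holds an id or a list of ids to recurse into.
            for attr, sub_pattern in item.items():
                if attr not in doc:
                    continue
                ref = doc[attr]
                if isinstance(ref, list):
                    result[attr] = [pull(fetch, r, sub_pattern) for r in ref]
                else:
                    result[attr] = pull(fetch, ref, sub_pattern)
    return result


# Hypothetical in-memory documents standing in for a database.
docs = {
    "page-1": {"id": "page-1", "title": "Home", "blocks": ["b-1", "b-2"]},
    "b-1": {"id": "b-1", "text": "Hello"},
    "b-2": {"id": "b-2", "text": "World"},
}

print(pull(docs.__getitem__, "page-1", ["title", {"blocks": ["text"]}]))
# {'title': 'Home', 'blocks': [{'text': 'Hello'}, {'text': 'World'}]}
```

Each join costs one lookup per referenced document, so this naive version trades efficiency for simplicity. Batching the lookups per level would be the obvious next step.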

refset 16:05:49

@UJMU98QDC Hi again. I noticed you mentioned in the #datahike channel that you thought Datahike would be better for scalability than Crux. Did you manage to run some of your own performance testing? It would be great to know exactly which angle(s) of scalability you're looking at, e.g. max db size, concurrent queries per node, latency, or ingestion throughput.