This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-01-02
Any advice on choosing an appropriate datalog-ish store? I'm indexing external data, so I had no control over defining the initial data model. It can get quite nested at times, and there are different kinds of entities in the system which can be related and have some hierarchical structure.
Entities can be IDed based on unique identifiers, or on an additional index plus the unique identifier of their parent.
Leaning towards Asami at the moment: I don't need to fight with the data to flatten it (which would require familiarity with all possible attributes), and I won't have to fight with the schema when things become polymorphic.
@ben.sless `clj` or `cljs`?
The main problem with `asami` is that work on it will not be continuing anytime soon, but that would have to be asked of @quoll directly. But as it is in Clojure, it will probably work until the end of the world without any problems.
If `clj`, I can't recommend `datalevin` enough; it is being developed all the time at an amazing pace, and @huahaiy is doing an amazing job ❤️
`clj`. I mostly need to deal with nested data I don't own; the best tool for the job will be derived from there.
Basically all Datalog DBs normalize the data, so you probably need to choose a different selection criterion, because that doesn't help much 🙃
I'm still working on Asami, it's just not my job. That's OK… it hasn't been my job for most of its life anyway. That said, I haven't been looking at a computer for the past 2 weeks. I plan to this week though (I still have 1 more week of vacation)
@ben.sless `datascript` automatically unfolds nested maps, and so do all the things based on it, like `datalevin` and `datahike`.
This reminds me… I need to expose this feature in Asami. The (intentional) lack of schemas can be awkward sometimes!
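To make the nested-map point concrete, here is a minimal Datascript sketch. Note one caveat: Datascript only expands a nested map when the containing attribute is declared as a ref in the schema.

```clojure
(require '[datascript.core :as d])

;; :friend must be a ref attribute for nested maps to be unfolded
(def schema {:friend {:db/valueType :db.type/ref}})
(def conn (d/create-conn schema))

;; The nested map becomes its own entity, linked by a :friend datom
(d/transact! conn [{:name "Alice"
                    :friend {:name "Bob"}}])

(d/q '[:find ?n ?fn
       :where [?e :name ?n]
              [?e :friend ?f]
              [?f :name ?fn]]
     @conn)
;; => #{["Alice" "Bob"]}
```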
One possible criterion is whether you need the db to store arrays. I've found Datomic-likes constraining when modelling certain kinds of data, but not Asami.
Storage for it works fine, but arrays and maps are usually interpreted into triples, which is why this feature is not exposed
It would certainly be useful, but I wouldn't want to take time away from a faster `entity`/`pull` implementation (which I know you're working towards) 🙂
I was caught up trying to write the re:clojure talk, and I was procrastinating over that so I wrote cljs-math, and then I went on a Christmas break… my Asami coding is falling behind. That's why I was hoping to spend this week coding. Although, I have 2 other (hopefully short) projects I want to work on this week as well
There are some subtleties, such as whether the array will get stored as a single value or converted into Asami's linked list.
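A rough sketch of why this distinction matters, shown in Datascript (behavior may differ across the forks and in Asami): with a cardinality-many attribute, each element becomes a separate datom, while a cardinality-one attribute keeps the whole vector as one opaque value.

```clojure
(require '[datascript.core :as d])

(def conn (d/create-conn {:tags {:db/cardinality :db.cardinality/many}}))

;; cardinality-many: each element becomes its own datom,
;; so duplicates collapse and the original order is not preserved
(d/transact! conn [{:db/id 1 :tags ["b" "a" "b"]}])
(:tags (d/pull @conn [:tags] 1))      ; e.g. ["a" "b"]

;; a plain (cardinality-one) attribute stores the whole vector
;; as a single value, keeping order and duplicates intact
(d/transact! conn [{:db/id 2 :ordered-tags ["b" "a" "b"]}])
(:ordered-tags (d/pull @conn [:ordered-tags] 2))   ; ["b" "a" "b"]
```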
There was a discussion on this https://clojurians.slack.com/archives/C018H97E02D/p1636986148121300.
Datalevin has no problem storing an array as a single value. In fact, if you have single values of huge size, regardless of type, Datalevin is probably your best option among the alternatives, because LMDB is faster than the file system when dealing with large blobs: it doesn't incur the context-switch cost of system calls. That's why machine learning people use it for storing images for computer vision training.
Less important, but other differences that come to mind:
• Asami and Datahike keep histories, Datascript and Datalevin do not.
• `pull` for extracting entities/subtrees – Asami has `entity` but it is not as powerful, and – as of DataScript 1.3.0 – not as fast as `pull`. (But @quoll is working on it!)
• Asami doesn't support namespaced ids – there are only `:db/id` and `:id` (you can certainly include an attribute like `:person/id`, but it won't be treated as an identifier).
• Datalevin includes (or should soon) full-text search.
• If I'm not mistaken, Asami's query engine is the fastest. And it supports some graph features.
• There are also some considerations if you need durable storage.
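As a concrete illustration of the `pull` point above, a minimal Datascript sketch: one declarative pattern extracts an entity together with its subtree.

```clojure
(require '[datascript.core :as d])

(def conn (d/create-conn {:children {:db/valueType :db.type/ref
                                     :db/cardinality :db.cardinality/many}}))

(def report
  (d/transact! conn [{:db/id -1
                      :name "root"
                      :children [{:name "left"} {:name "right"}]}]))

;; resolve the tempid, then pull the whole subtree in one call
(d/pull @conn '[:name {:children [:name]}]
        (get (:tempids report) -1))
;; => {:name "root", :children [{:name "left"} {:name "right"}]}
```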
You can also look at the comparison table at https://clojurelog.github.io/. (Not sure how up-to-date it is. See comment below.)
I’ve not been a huge fan of that table. It focuses on the features of XTDB (hence why XTDB is green on most features) and doesn't consider features that are only supported by other DBs. Then again, I would say that, given that Asami doesn't intersect greatly with XTDB's features.
https://github.com/lambdaisland/datalog-benchmarks/blob/main/src/datalog_benchmarks/scratch.clj
I don't know how, but doing a `doall` on the Asami results makes it no longer the fastest.
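A generic illustration of why `doall` changes benchmark numbers (not Asami-specific; `lazy-result` is just a stand-in for a query that returns a lazy sequence): without forcing realization, the timed region only measures constructing the seq, not producing the rows.

```clojure
;; stand-in for a lazy query result
(defn lazy-result []
  (map inc (range 1000000)))

(time (lazy-result))              ; near-zero: nothing realized yet
(time (doall (lazy-result)))      ; pays the full cost of every row
```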
You can fork this bench https://github.com/joinr/datalevinbench and add Asami.
| | q1 | q2 | q3 | q4 | qpred1 | qpred2 |
|----------------------+------+------+------+-------+--------+--------|
| latest-datascript | 1.30 | 3.60 | 5.10 | 7.80 | 5.50 | 11.70 |
| latest-datalevin | 0.57 | 2.40 | 2.90 | 4.70 | 5.30 | 6.60 |
| latest-asami | 2.20 | 9.00 | 9.50 | 12.80 | 34.20 | 46.40 |
| latest-datahike-mem | 0.74 | 3.00 | 4.20 | 7.60 | 18.40 | 18.40 |
| latest-datahike-file | 0.80 | 3.10 | 4.30 | 7.20 | 18.20 | 18.20 |
If I remember correctly, datahike was previously the slowest, but we can see the guys have made a lot of progress.
Which is quite interesting anyway; it is surprising that `datalevin` is so much faster, since both solutions have `datascript` underneath.
That might be due to different caching implementations, etc. But supposedly, in the Datahike entries, everything else is equal except the type of storage?
In general, I don't know how `datahike` works; it probably keeps some data in memory, and the benchmark generates so little data that everything fits in memory.
Quite surprised to see Asami last. I remember conversations discussing its superior algorithm, for example, in @huahaiy's plans for https://github.com/juji-io/datalevin/issues/11.
Writing in Clojure, you can very quickly get bogged down using functions that are horribly slow and kill performance.
Probably if @ben.sless sits down and does some PRs, `asami` will be 10x faster than all the rest 🙃
The implementation of Asami is mostly idiomatic Clojure, so there is large room for improvement. In general, all the existing Datalog offerings in the Clojure world have large room for performance improvement. I plan to finish Datalevin's query engine rewrite this year, hopefully addressing some of the performance issues so it performs similarly to a row store (i.e., any of the SQL DBs) while retaining the flexibility of an EAV store. Stay tuned.
Datalevin does implement some simple optimizations within the current framework of Datascript, which does make a difference.
Datahike is working on query performance optimization as well. We hope to get to that soon. Help is always appreciated. Chime in on the discussion on GitHub if you are interested in Datahike's features: https://github.com/replikativ/datahike/discussions/categories/ideas
@U4GEXTNGZ I am impressed with the progress `datahike` has made, congratulations to you guys.
Less than a year ago `datahike` was on average 3x slower than `datascript`; now it is faster, if only marginally – but faster.
Thanks @U0BBFDED7. This kind of comment makes it worth it. More and more people and companies are relying on Datahike, and that encourages going the extra mile. Another great sign of extended interest is the minor and major contributions coming in.
@U4GEXTNGZ, any thoughts on this? > How are results for in-memory and on-disk Datahike so close?