This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-03-16
I've heard that there can be performance issues with Datascript in single page apps with "large" databases ? Is that still true or an old myth ? What are your experiences with performance ? Are there any types of queries / use cases to avoid ? Looking from the perspective of re-frame, possibly official support etc.
We use datascript in production, or rather what's left of it. We had to cut out most of the functionality due to tragic performance with more products in the db. Basically, all that's left is the pull syntax.
Datascript, contrary to what we can read on github, has very poor performance. For example datalevin, despite the fact that it uses data stored on disk, is much faster.
Almost everything is faster than datascript, even queries written in meander, operating on flat db in fulcro style. Simplest macro that creates a transducer on the fly beats datascript.
As for re-posh, sometimes it loses changes, especially if you evict data from db. It also doesn't allow you to use all the possibilities of datascript, and with a bigger number of arguments in :in it loses order, so you have to wrap arguments in a vector.
Thanks for the reply @U0BBFDED7 🙂 Roughly how big is your datascript db ?
@U0BBFDED7 nice - if damning - review. It’s good to hear from people who have used it in anger (literally, it would seem).
there was once a pretty interesting conversation on this topic here, but the archive doesn't work
lilactown created autonormal https://github.com/lilactown/autonormal
I'm a compulsive github browser, and overall what I've noticed recently is that everyone seems to have run into the same problems and everyone is trying to solve them somehow.
one is state management, preferably using as flat a normalized db as possible or using a datalog.
the other is to get rid of unnecessary multiple recalculations using graphs.
third is closer integration with react. rumext, helix or uix are only first examples.
been trying to get an overview by compiling a list of the graph stuff: https://github.com/simongray/clojure-graph-resources
Thanks! That is all very helpful. I'll look into all that. We want to improve the state management and subs (graph) story in re-frame, but I doubt we'll ever diverge from reagent as the backwards compat is probably too much of an issue.
@U4P4NREBY you can also add the mentioned autonormal
@U0BBFDED7 yup, I have a few things I need to add, just been a bit busy (just got a kid)
Oh awesome, that is a good milestone. Those initial weeks can be full-on, after that it's all a bit more reasonable.
@U0G75S29H reagent is great. so great that one of the main inspirations for the hooks was reagent itself. However, I understand people who are just now facing the choice of react wrapper and I also understand that they may want to use the leanest one possible. Today reagent doesn't offer as much more than pure react as it did a few years ago.
@U4P4NREBY congratulations. Even though I don't have children myself, I am glad that others do;)
@U0BBFDED7 That meander datascript example is cool; basically a macro-based conversion of datalog to meander at compile time.
@U0BBFDED7 thank you!
is not practical at all and probably has a lot of bugs, but it's just an example that maybe it's stupid but possible ; )
Another anecdotal experience report: I don't have much experience with re-posh, but last week I pitched in to investigate some performance issues with the athensresearch project. The codebase uses re-posh and re-frame and does a lot of recursive pulls, which seems to cause havoc on the posh pull-analyzer. Here's an example: https://github.com/athensresearch/athens/pull/665#issuecomment-790088361
Are you involved with Athens, @U05476190? I’m amazed at how that project came out of a single tweet and just started snowballing.
I came across it by accident when I was reading about all these new org-like tools that are popping up and ended up submitting a couple of PRs. They definitely seem to have a lot of momentum right now, their discord channel has a lot of activity (and IIUC, they have some funding sources), but the competition is fierce. Also, they're definitely going to have to fight through some scaling pains - I mentioned the re-posh stuff and also the way it's now handling durable storage.
I found it interesting when comparing Athens to what the LogSeq project (also CLJS project) is doing; where LogSeq is e.g. using git for their sync layer and OCaml for their Markdown parser (and modeling data at the page, not block level).
@U05476190 If you had to pick some stack for modelling data in the frontend, what would you go with?
@U4P4NREBY not sure what you mean; if you're talking about frameworks/libraries, my goto stack is fulcro+shadow-cljs (vs say re-frame+figwheel); but you know... it depends. 🙂 If you're talking about more specifically datastores, Fulcro's generic 3-layer DB approach is fast enough for most cases (since it's just maps and lookup-refs); you can always add reactive mutations if you'd like; and if you put Pathom behind it you're free to swap out and add a more complicated datastore (Datomic / DataScript / SQL / etc). I'm definitely keeping an eye on Asami for its speed and durability promises and I hope to use it in anger sometime.
I think that was kind of rambling; so I usually need a reason not to hide everything behind a Pathom EQL API (irrespective of what ends up resolving the query).
but with that approach, Fulcro's DB map with lookup-refs works nicely for fetching data locally to components
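To make the "maps and lookup-refs" idea concrete, here is a minimal sketch in plain Clojure. The table name `:person/id`, the keys and the helper names are made up for illustration; this is not Fulcro's API, only the shape of a normalized map where a reference is a path into the db.

```clojure
;; Hypothetical data: table names and keys are illustrative, not Fulcro API.
(def app-db
  {:person/id {1 {:person/id 1 :person/name "Ivan" :person/friends [[:person/id 2]]}
               2 {:person/id 2 :person/name "Petr" :person/friends []}}
   :root/current-user [:person/id 1]})

(defn lookup
  "Resolve a reference such as [:person/id 1] with a plain nested-map lookup."
  [db ident]
  (get-in db ident))

(defn friends-of
  "Follow the references stored under :person/friends and return the entities."
  [db ident]
  (mapv #(lookup db %) (:person/friends (lookup db ident))))

(def current-user (lookup app-db (:root/current-user app-db)))

(def current-friend-names
  (mapv :person/name (friends-of app-db (:root/current-user app-db))))
```

Every read here is a couple of hash-map lookups, which is why this kind of local fetch stays cheap regardless of how the data was originally resolved.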
I am pretty new to Datascript but I noticed the not-so-good performance as well. On less than 2000 entities, on a very recent iPhone (React Native), queries can take as much as 100ms+ (!!!) to fetch around 100 entities.
As far as I can tell, the query performance is proportional to the size of the result.
One solution I’ve devised is to use DataScript very carefully and avoid using it as the primary source of reads. Basically I came up with a solution that puts an additional atom as a sort of a cache in front of Datascript. I use that cache atom to do most reads (instead of going directly to datascript via (query… ) etc which is quite slow)
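A rough sketch of what such a cache atom could look like, assuming reads are keyed by some query key. `slow-query` is a hypothetical stand-in for the expensive DataScript read, and clearing the whole cache on every transaction is the simplest possible invalidation, not necessarily what the poster does.

```clojure
;; `slow-query` is a hypothetical stand-in for an expensive DataScript read.
(def query-calls (atom 0))

(defn slow-query [db query-key]
  (swap! query-calls inc)
  (get db query-key))

(def read-cache (atom {}))

(defn cached-read
  "Return the cached result for query-key, running slow-query only on a miss."
  [db query-key]
  (let [cached (get @read-cache query-key ::miss)]
    (if (= ::miss cached)
      (let [result (slow-query db query-key)]
        (swap! read-cache assoc query-key result)
        result)
      cached)))

(defn invalidate!
  "Drop cached results after a transaction; here the whole cache is cleared."
  []
  (reset! read-cache {}))
```

The trade-off is the usual one for caches: reads become map lookups, but every write path has to remember to call `invalidate!` (or something finer-grained).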
But I like the expressive power of the Datalog queries… so it’s definitely a trade-off.
I wish I could use it directly as a primary source of reads but it’s simply too slow for the needs of my mobile application, where sometimes I need to read values from the app state dozens of times a second.
Even though datalog is infinitely powerful, it's usually not used to its full potential on the frontend, and thanks to clojure's expressiveness you can achieve the same effect with not much more code. For the more ambitious there is still meander.
@U0BBFDED7 how many datoms approx. did you have in your database when you noticed the slow down? Were you using indices in DataScript?
As much as I was a big proponent of datascript, I currently advise against it for everyone. Datalog on the BE side ❤️. On the frontend side, state is best managed in fulcro or in an identical way to fulcro.
@U05476190 I was just wondering what libs you used for handling state and how you handle those transitions between frontend and backend, basically. Thank you for answering.
@U050KSS8M We have several thousand entities in production.
@U0BBFDED7 Did you have the requirement to run on mobile? That’s where I noticed the bulk of the slow down. I tested on non-mobile and the perf. was quite a bit better.
@U4P4NREBY I was SUPER surprised as well.
There’s an explanation by tonsky here: https://github.com/tonsky/datascript/issues/130
Perhaps the query planner? (I have no knowledge of asami): • Query planner: Queries are analyzed to find an efficient execution plan. This can be turned off.
Yeah… I felt that explanation by tonsky gives a lot of clarity: “DataScript is in different category, so expect different tradeoffs: query speed depends on the size of result set, you need to sort clauses to have smaller joins, accessing entity properties is not free given its id, etc. As a benefit, you gain ability to query dataset for different projections, forward and reverse reference lookups, joins between different sets, etc. And direct index lookup (`datascript.core/datoms`) is still fast and comparable to lookup in a map (at least comparable, think binary lookup vs hashtable lookup, logarithm vs constant). Queries do much more than that.”
Can’t expect to fetch a giant result set in constant time… It feels more like linear time.
[(datascript-q1) (asdb-q1) (mdb-q1)]
;; => [3.99 0.15 17.51]
[(datascript-q4) (asami-q4) (mdb-q4)]
;; => [169.43 3.58 167.25]
@U0BBFDED7 are those times in ms?
I was talking with noprompt from cisco while discussing meander, and they are using asami in production along with re-frame
It feels like DataScript is a pretty simple implementation and leaves a lot on the table for improvement.
Databases are tricky things (in memory or not), you need to resort to clever tricks to squeeze performance.
@U0BBFDED7 alright 🙂 I haven’t explored the internals, so I can’t speak; only speculate.
it is simple to use filter, and it is not noticeably slower, despite the lack of indexing
> Any downsides of asami you’ve noticed? It's not a port of Datascript - it was started independently around the same time - and it doesn't try to be 1:1 feature compatible with Datomic API. So you might be surprised by how certain things are incompatible with your existing queries (e.g. no pull syntax at the moment, db/idents work different than DS/Datomic, etc.)
Hmmm… giving me a lot of food for thought here; I like the organization of data that datalog provides
@U05476190 just joined, thanks! 🙂
There was also a #datalog channel that was created some time ago, meant for this kind of cross-library discussion, but it has been quiet recently
(defn transduce-q4 []
  (e/qb 1e1
    (into []
          (comp
           (filter (fn [[_ m]] (= "Ivan" (m :name))))
           (filter (fn [[_ m]] (= :male (m :sex))))
           (map (fn [[_ m]] (select-keys m [:db/id :last-name :age]))))
          mdb100k)))
[(datascript-q4) (asami-q4) (mdb-q4) (transduce-q4)]
;; => [158.78 4.39 151.03 46.03]
@U0BBFDED7 have you tried q4 with specter?
Yes… Well… One “trick” I’ve resorted to on React Native is runAfterInteractions https://reactnative.dev/docs/interactionmanager (not sure if there’s comparable browser API/trick)
Basically it delays the execution of a given fn after all user interactions have ended
Aka, you can only communicate with your in-memory db asynchronously… but to be fair… with runAfterInteractions… it’s already happening! Lol
@U0BBFDED7 I have a suspicion the destructs [_ m] are killing your perf in transduce-q4
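One way to test that suspicion would be a variant that walks the values directly, so there is no per-element destructuring of map entries. This is only a sketch: it assumes mdb100k is a map of id -> entity map (which the `[_ m]` destructuring suggests), and `sample-db` is made-up data standing in for it. Whether it is actually faster would have to be measured.

```clojure
;; Hypothetical sample data standing in for mdb100k (assumed shape: id -> entity map).
(def sample-db
  {1 {:db/id 1 :name "Ivan" :last-name "Ivanov" :sex :male :age 30}
   2 {:db/id 2 :name "Ivan" :last-name "Petrova" :sex :female :age 25}
   3 {:db/id 3 :name "Oleg" :last-name "Sidorov" :sex :male :age 41}})

(defn q4-no-destructuring
  "Same filter/select pipeline as transduce-q4, but over (vals db) so no
  map-entry destructuring happens per element."
  [db]
  (into []
        (comp
         (filter (fn [m] (and (= "Ivan" (:name m)) (= :male (:sex m)))))
         (map (fn [m] (select-keys m [:db/id :last-name :age]))))
        (vals db)))
```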
I am really curious where the major slow down in DataScript is compared to other options.
The similarity to Datomic is still very compelling for me, and the power of Datalog + pull syntax is definitely useful.
@U0BBFDED7 have you explored putting :db/index on certain schema elements in DataScript?
I find Fulcro's approach pragmatic - seldom do you need the full power of Datalog when you're re-rendering a component; I think of it as a UI data cache for my EQL-backed data (which can still be a proxy for a DataScript instance running in the browser; just not something that needs to run every animation frame).
[(datascript-q4) (asami-q4) (mdb-q4) (transduce-q4) (specter-q4)]
;; => [168.09 4.36 153.06 48.14 49.84]
Specter is quite an amazing tool IMO… Esp. when it comes down to data transformation (less so for just data reading)
@U0BBFDED7 if you're yak-shaving you may be interested in updating that transduce with some macros from https://github.com/bsless/clj-fast (and bsless also has this library I never played with - https://github.com/bsless/impedance#performance-differences)
but I think it's going to be hard to beat asami, since it looks like the query-planner short-circuits a lot of work in your benchmark 😄
I only joined this channel this morning, so I didn’t see any of the questions here until now. If anyone is interested I can explain how Asami works? It’s quite different to the structures in DataScript.
Truth be told, if someone had told me about DataScript 5 years ago then I wouldn’t have started Asami
heya, I'm the sync person from Roam Research
I can't talk about query performance very much, as I mostly operate on transaction semantics and database persistence
I can say that for large databases (50+ mb of datascript transit) transact starts getting slow
proportionally to the database size
transit deserialization is also overall slow, but that's unsurprising
this is all from browser CLJS
@U051N6TTC hey there! kinda curious about the asami query planner, is it still efficient when the data keeps changing?
e.g. query query query vs query transact query transact query
uhm... at Roam we have a mostly generic persistence layer for Datascript
It relies on the “count” of resolution of individual patterns (these get cached too, so it’s not hitting the DB too much for this)
it syncs datascript transactions as a totally ordered list locally first and then remotely
right now we use it to sync first to indexeddb, then to firebase
but it's based on an abstract driver system
so it was easy to make variants for indexeddb+datomic
the in-memory db is still datascript
but the only things that matter as far as syncing is concerned are the transaction fn and error handling
so that can be abstracted to use asami or anything else (e.g. datahike) as long as it's an in-memory database
it's important to do in-memory because there can be a lot of rollbacks as optimistic transactions are turned into confirmed txs
e.g. two clients doing txs at the same time will have different optimistic orders than the final confirmed order, the sync system "rebases" the optimistic txs on top of the confirmed as these come
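A toy sketch of that "rebase" step, under heavy assumptions: a transaction is modelled as a hypothetical map `{:id ... :f ...}` where `:f` is a pure function of the db value, and the db is just a vector. The real sync system is not public, so this only illustrates the ordering idea.

```clojure
;; A transaction here is a hypothetical map {:id ... :f ...}; :f is a pure fn of the db value.
(defn apply-txs [db txs]
  (reduce (fn [acc tx] ((:f tx) acc)) db txs))

(defn rebase
  "Rebuild the local view: confirmed txs in server order, then any optimistic
  txs that have not been confirmed yet, in local order."
  [base-db confirmed optimistic]
  (let [confirmed-ids (set (map :id confirmed))
        pending (remove #(contains? confirmed-ids (:id %)) optimistic)]
    (apply-txs (apply-txs base-db confirmed) pending)))

(def tx-a {:id :a :f #(conj % :a)})
(def tx-b {:id :b :f #(conj % :b)})
(def tx-c {:id :c :f #(conj % :c)})

;; Locally we optimistically applied :b then :c; the server confirmed :a then :b.
(def rebased (rebase [] [tx-a tx-b] [tx-b tx-c]))
```

Because every rebase starts again from a known base value, cheap rollbacks matter, which fits the point above about keeping the working db in memory.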
we were thinking of open sourcing this
can asami run as an in-memory db? if so maybe we could work together to make the sync system generic
then you could use arbitrary persistence layers via these drivers
oh cool then that'd definitely work
our sync thing (we call it Link) could persist it to indexedb and other places
are you interested in some collaboration if we can provide an open source persistence layer for in-memory asami dbs?
coolio, going to see what I can do WRT making our stuff open source
will keep you posted
I’m doing persistence right now though. Everything is based on a block abstraction that can be stored in anything (the first implementation of this is memory mapped files in the JVM, but the second one is going to be indexedb… partly implemented now)
the sync persistence we have is just based on the transaction log (and optionally snapshots)
saving immutable serialized transactions instead of mutable data structures
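As a sketch of log-plus-snapshot persistence: the in-memory db is rebuilt by folding the stored transactions over the latest snapshot. The log entry format (`:t`, `:tx-data` with `:assoc`/`:dissoc` ops) is invented for the example and is not Roam's or DataScript's format.

```clojure
;; Hypothetical log format: each entry is {:t n :tx-data [[:assoc k v] or [:dissoc k]]}.
(defn apply-op [db [op k v]]
  (case op
    :assoc (assoc db k v)
    :dissoc (dissoc db k)))

(defn apply-tx [db {:keys [tx-data]}]
  (reduce apply-op db tx-data))

(defn restore
  "Rebuild the in-memory db from an optional snapshot plus the log entries
  recorded after it."
  [snapshot tx-log]
  (let [{:keys [t db] :or {t 0 db {}}} snapshot]
    (reduce apply-tx db (filter #(> (:t %) t) tx-log))))

(def tx-log
  [{:t 1 :tx-data [[:assoc :title "Draft"]]}
   {:t 2 :tx-data [[:assoc :author "ana"]]}
   {:t 3 :tx-data [[:assoc :title "Final"] [:dissoc :author]]}])

(def snapshot {:t 2 :db {:title "Draft" :author "ana"}})
```

Replaying from the snapshot or from the start of the log gives the same value; the snapshot only shortens the replay.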
https://clojurians.slack.com/archives/C07V8N22C/p1615898168016200?thread_ts=1615872613.002600&cid=C07V8N22C
@U4P4NREBY from a high level, one of the main differences I’d seen is that DataScript (and Datomic) store datoms, and then index them. Asami doesn’t do that. Instead, it has indexes for the valid statements, without pointing at instances of statements. It’s all just nested maps. The main consequence of this is that searching for when statements get created or deleted isn’t so straight forward. But so far we haven’t needed that.
If you’re looking for a :where clause with a single pattern in it, then that might be [entity :my-property '?value]. In this case, both the entity and the attribute have been set. So you can just go to the EAV index, and say: (get-in eav [entity :my-property]) and you have your values. So simple queries that just do a single pattern are literally just a lookup in a map, followed by a lookup in the nested map.
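A tiny illustration of that nested-map lookup in plain Clojure. The index layout (entity -> attribute -> set of values) and the `index-triple` helper are assumptions made for the example; this is not Asami's actual code.

```clojure
;; Illustrative only: not Asami's real implementation or API.
(defn index-triple
  "Add one [e a v] statement to a nested-map index: e -> a -> #{v}."
  [eav [e a v]]
  (update-in eav [e a] (fnil conj #{}) v))

(def eav
  (reduce index-triple {}
          [[:node-123 :name "Betty"]
           [:node-123 :age 20]
           [:node-456 :name "Betty"]
           [:node-456 :age 35]]))

;; Resolving the single pattern [:node-123 :name ?value] is two map lookups.
(def betty-names (get-in eav [:node-123 :name]))
```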
Joins cost a bit more. For instance, [?person :name "Betty"] [?person :age 20]. First of all, the optimizer figures out the pattern with a smaller result, and uses the above to get a result. If this first one is people named “Betty”, then it will go to the AVE index, to get the set of all person entities. It then iterates over that, and uses it to modify the second pattern, which it then looks up. So the first person named “Betty” may be an entity identified by :node-123, which means that the second pattern gets updated to [:node-123 :age 20]. This is resolved with (get-in eav [:node-123 :age 20]), and if it is true, then that value for ?person gets returned. The same goes for every other person who was resolved as well.
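The join described above can be sketched with two nested-map indexes. Everything here (the `triples` data, `build-index`, `join-attr-values`) is hypothetical and only mirrors the description: pick the pattern with fewer matches via AVE, then confirm each candidate with a get-in on EAV.

```clojure
;; Illustrative only: hypothetical index layout, not Asami's real code.
(def triples
  [[:node-123 :name "Betty"] [:node-123 :age 20]
   [:node-456 :name "Betty"] [:node-456 :age 35]
   [:node-789 :name "Alice"] [:node-789 :age 20]])

(defn build-index
  "Build a three-level nested map keyed in the order given by `order`,
  a fn that permutes [e a v]."
  [order]
  (reduce (fn [idx t] (assoc-in idx (order t) true)) {} triples))

(def eav (build-index identity))
(def ave (build-index (fn [[e a v]] [a v e])))

(defn join-attr-values
  "Resolve [?e a1 v1] [?e a2 v2]: start from the pattern with fewer matches,
  then check each candidate entity against the other pattern in EAV."
  [[a1 v1] [a2 v2]]
  (let [n1 (count (get-in ave [a1 v1]))
        n2 (count (get-in ave [a2 v2]))
        [[fa fv] [sa sv]] (if (<= n1 n2) [[a1 v1] [a2 v2]] [[a2 v2] [a1 v1]])]
    (set (filter (fn [e] (get-in eav [e sa sv]))
                 (keys (get-in ave [fa fv]))))))

(def betty-aged-20 (join-attr-values [:name "Betty"] [:age 20]))
```

The counts used to pick the starting pattern are what a planner would cache; here they are recomputed on every call to keep the sketch short.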
How does this compare to joins in DataScript? I don’t actually know! I never looked 🙂
@UJVKWJTGE That makes sense. And it’s easy to replay. It’s not what Asami is using though. I’m saving immutable data structures.
the difference in this model (for asami) is that the in-memory version would be fed the relevant transactions on load, and those transactions would be persisted to disk or network separately from asami