#datascript
2021-03-16
superstructor05:03:13

I've heard that there can be performance issues with Datascript in single page apps with "large" databases. Is that still true or an old myth? What are your experiences with performance? Are there any types of queries / use cases to avoid? Looking from the perspective of re-frame, possibly official support etc.

ribelo06:03:56

We use datascript in production, or rather what's left of it. We had to cut out most of the functionality due to tragic performance as the number of products in the db grew. Basically, all that's left is the pull syntax. Datascript, contrary to what we can read on github, has very poor performance. For example, datalevin is much faster despite the fact that it uses data stored on disk. Almost everything is faster than datascript, even queries written in meander operating on a flat db in fulcro style. Even the simplest macro that creates a transducer on the fly beats datascript. As for re-posh, sometimes it loses changes, especially if you evict data from the db. It also doesn't allow you to use all the possibilities of datascript, and with a larger number of arguments in :in it loses their order, so you have to wrap the arguments in a vector.

ribelo06:03:21

much better db, but still having many rough edges, is Asami

superstructor07:03:30

Thanks for the reply @ribelo 🙂 Roughly how big is your datascript db?

ribelo07:03:48

several thousand entities

simongray07:03:28

@ribelo nice - if damning - review. It’s good to hear from people who have used it in anger (literally, it would seem).

ribelo07:03:03

there was once a pretty interesting conversation on this topic here, but the archive doesn't work

ribelo07:03:53

roman01la also mentioned problems with re-posh and datascript in general

ribelo07:03:33

joinr commented on this same topic on Reddit

simongray08:03:18

Hah, he even made that comment as a reply to me 😛

❤️ 3
ribelo08:03:26

I'm a compulsive github browser, and overall what I've noticed recently is that everyone seems to have run into the same problems and everyone is trying to solve them somehow.

ribelo08:03:34

One is state management, preferably using as flat a normalized db as possible or using datalog. The second is getting rid of unnecessary repeated recalculations using graphs. The third is closer integration with react; rumext, helix and uix are only the first examples.

simongray08:03:43

yup, noticed the same thing

simongray08:03:24

been trying to get an overview by compiling a list of the graph stuff: https://github.com/simongray/clojure-graph-resources

superstructor08:03:23

Thanks! That is all very helpful. I'll look into all that. We want to improve the state management and subs (graph) story in re-frame, but I doubt we'll ever diverge from reagent as the backwards compat is probably too much of an issue.

ribelo08:03:29

@simongray you can also add the mentioned autonormal

ribelo08:03:13

there is also no problem to use meander to search graphs, it is even in examples

simongray08:03:18

@ribelo yup, I have a few things I need to add, just been a bit busy (just got a kid)

👍 3
🎉 3
superstructor08:03:43

@simongray congrats! How old? I have a 3 month old boy.

👍 3
simongray08:03:55

6 weeks on thursday 🙂

bananadance 3
simongray08:03:12

and thank you - you too!

superstructor08:03:37

Oh awesome, that is a good milestone. Those initial weeks can be full-on, after that it's all a bit more reasonable.

ribelo08:03:02

@superstructor reagent is great. so great that one of the main inspirations for the hooks was reagent itself. However, I understand people who are just now facing the choice of react wrapper and I also understand that they may want to use the leanest one possible. Today reagent doesn't offer as much more than pure react as it did a few years ago.

ribelo08:03:47

@simongray congratulations. Even though I don't have children myself, I am glad that others do;)

superstructor08:03:31

@ribelo That meander datascript example is cool; basically a macro-based conversion of datalog to meander at compile time.

simongray08:03:33

very cool, though not sure how practical it is

ribelo08:03:48

I wrote it for fun and to get to know meander better

ribelo08:03:46

it's not practical at all and probably has a lot of bugs, but it's just an example showing that it may be stupid, but it's possible ; )

pithyless09:03:15

Another anecdotal experience report: I don't have much experience with re-posh, but last week I pitched in to investigate some performance issues with the athensresearch project. The codebase uses re-posh and re-frame and does a lot of recursive pulls, which seems to cause havoc on the posh pull-analyzer. Here's an example: https://github.com/athensresearch/athens/pull/665#issuecomment-790088361

👍 6
simongray09:03:59

Are you involved with Athens, @pithyless? I’m amazed at how that project came out of a single tweet and just started snowballing.

pithyless09:03:04

I came across it by accident when I was reading about all these new org-like tools that are popping up and ended up submitting a couple of PRs. They definitely seem to have a lot of momentum right now, their discord channel has a lot of activity, (and IIUC, they have some funding sources); but the competition is fierce. Also, they're definitely going to have to fight through some scaling pains - I mentioned the re-posh stuff and also the way it's now handling durable storage.

👍 3
pithyless09:03:13

I found it interesting when comparing Athens to what the LogSeq project (also CLJS project) is doing; where LogSeq is e.g. using git for their sync layer and OCaml for their Markdown parser (and modeling data at the page, not block level).

simongray11:03:00

@pithyless If you had to pick some stack for modelling data in the frontend, what would you go with?

pithyless12:03:13

@simongray not sure what you mean; if you're talking about frameworks/libraries, my go-to stack is fulcro+shadow-cljs (vs say re-frame+figwheel); but you know... it depends. 🙂 If you're talking about more specifically datastores, Fulcro's generic 3-layer DB approach is fast enough for most cases (since it's just maps and lookup-refs); you can always add reactive mutations if you'd like; and if you put Pathom behind it you're free to swap out and add a more complicated datastore (Datomic / DataScript / SQL / etc). I'm definitely keeping an eye on Asami for its speed and durability promises and I hope to use it in anger sometime.

👍 3
pithyless12:03:59

I think that was kind of rambling; so I usually need a reason not to hide everything behind a Pathom EQL API (irrespective of what ends up resolving the query).

pithyless12:03:49

but with that approach, Fulcro's DB map with lookup-refs works nicely for fetching data locally to components
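
A minimal sketch of the "just maps and lookup-refs" idea mentioned above, written in plain Clojure. It is not Fulcro's actual API: the table layout, the `ident?` and `resolve-ref` helpers, and the sample data are all assumptions made for illustration.

```clojure
;; Sketch only: a hand-rolled normalized db, not Fulcro's actual API.
;; Tables are keyed by an id attribute; an ident is a [table id] lookup ref.
(def app-db
  {:person/id {1 {:person/id 1 :person/name "Ivan" :person/friend [:person/id 2]}
               2 {:person/id 2 :person/name "Petr"}}
   :root/current-user [:person/id 1]})

(defn ident?
  "True for a two-element vector whose first element names a table in db."
  [db x]
  (and (vector? x) (= 2 (count x)) (contains? db (first x))))

(defn resolve-ref
  "Follows x if it is an ident, otherwise returns x unchanged."
  [db x]
  (if (ident? db x) (get-in db x) x))

;; Reading a component's data is a couple of map lookups, no query engine.
(def current-user (resolve-ref app-db (:root/current-user app-db)))

(def friend-name
  (:person/name (resolve-ref app-db (:person/friend current-user))))
```

Because every read is a `get-in` on a persistent map, the cost does not depend on how many other entities the db holds, which is the property being contrasted with Datalog queries in this thread.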

raspasov12:03:26

I am pretty new to Datascript but I noticed the not-so-good performance as well. On less than 2000 entities, on a very recent iPhone (React Native), queries can take as much as 100ms+ (!!!) to fetch around 100 entities.

raspasov12:03:56

As far as I can tell, the query performance is proportional to the size of the result.

raspasov12:03:21

It seems to be about ~1ms per entity (that’s a very rough ballpark estimate)

raspasov12:03:40

One solution I’ve devised is to use DataScript very carefully and avoid using it as the primary source of reads. Basically I came up with a solution that puts an additional atom as a sort of a cache in front of Datascript. I use that cache atom to do most reads (instead of going directly to datascript via (query… ) etc which is quite slow)
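
A minimal sketch of the cache-atom approach described above, in plain Clojure. `slow-query` is a hypothetical stand-in for a DataScript read such as a `d/q` call, and `cached-read` and `invalidate!` are names invented here; the actual application code was not shown in the conversation.

```clojure
;; Sketch only: `slow-query` is a hypothetical stand-in for a DataScript read
;; such as (d/q ...); it counts calls so the caching effect is visible.
(def query-calls (atom 0))

(defn slow-query [db user-id]
  (swap! query-calls inc)
  (get-in db [:users user-id]))

;; The cache atom sits in front of the database and serves repeated reads.
(def cache (atom {}))

(defn cached-read
  "Returns the cached value for user-id, running slow-query only on a miss."
  [db user-id]
  (let [hit (get @cache user-id ::miss)]
    (if (not= ::miss hit)
      hit
      (let [v (slow-query db user-id)]
        (swap! cache assoc user-id v)
        v))))

(defn invalidate!
  "Drops cached entries; call after every transaction that touches them."
  [user-ids]
  (swap! cache #(apply dissoc % user-ids)))
```

The hard part of this design is invalidation: the cache is only correct if `invalidate!` (or an equivalent update) runs whenever a transaction changes the cached entities, for example from a `d/listen!` callback on the connection.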

raspasov12:03:00

But I like the expressive power of the Datalog queries… so it’s definitely a trade-off.

raspasov12:03:57

I wish I could use it directly as a primary source of reads but it’s simply too slow for the needs of my mobile application, where sometimes I need to read values from the app state dozens of times a second.

ribelo12:03:54

Even though datalog is infinitely powerful, it's usually not used to its full potential on the frontend, and thanks to clojure's expressiveness you can achieve the same effect with not much more code. For the more ambitious there is still meander.

raspasov12:03:57

@ribelo how many datoms approx. do you have in your database when you noticed the slow down? Were you using indices in DataScript?

ribelo12:03:28

As much as I was a big proponent of datascript, I currently advise against it for everyone. Datalog on the BE side ❤️. On the frontend side, state is best managed in fulcro or in an identical way to fulcro.

simongray12:03:16

@pithyless I was just wondering what libs you used for handling state and how you handle those transitions between frontend and backend, basically. Thank you for answering.

ribelo12:03:37

@raspasov We have several thousand entities in production.

simongray12:03:04

It doesn't make much sense to me that an in-memory db can be that slow.

raspasov12:03:13

@ribelo Did you have the requirement to run on mobile? That’s where I noticed the bulk of the slow down. I tested on non-mobile and the perf. was quite a bit better.

raspasov12:03:21

@simongray I was SUPER surprised as well.

ribelo12:03:58

if someone really wants a datalog, I recommend asami

ribelo12:03:09

is 100-200x faster

simongray12:03:08

So what does Asami do differently? AFAIK it started as a fork of Datascript.

simongray12:03:26

Just like Datahike and Datalevin

raspasov12:03:01

Perhaps the query planner? (I have no knowledge of asami): • Query planner: Queries are analyzed to find an efficient execution plan. This can be turned off.

ribelo12:03:21

Asami has a planner that is additionally cached.

raspasov12:03:32

Yeah… I felt that explanation by tonsky gives a lot of clarity: “DataScript is in different category, so expect different tradeoffs: query speed depends on the size of result set, you need to sort clauses to have smaller joins, accessing entity properties is not free given its id, etc. As a benefit, you gain ability to query dataset for different projections, forward and reverse reference lookups, joins between different sets, etc. And direct index lookup (`datascript.core/datoms`) is still fast and comparable to lookup in a map (at least comparable, think binary lookup vs hashtable lookup, logarithm vs constant). Queries do much more than that.”

raspasov12:03:50

This was key for me: “query speed depends on the size of result set”

raspasov12:03:05

Can’t expect to fetch a giant result set in constant time… It feels more like linear time.

ribelo12:03:25

[(datascript-q1) (asdb-q1) (mdb-q1)]
;; => [3.99 0.15 17.51]

[(datascript-q4) (asami-q4) (mdb-q4)]
;; => [169.43 3.58 167.25]

raspasov12:03:00

@ribelo are those times in ms?

raspasov12:03:34

asdb: asami?

ribelo12:03:36

mdb is a simple replica of the datalog in meander, which I posted here

raspasov12:03:20

Any downsides of asami you’ve noticed?

ribelo12:03:47

apart from testing, I have not had the opportunity to use it

raspasov12:03:35

It seems around 50x faster

raspasov12:03:44

Based on those two queries

ribelo12:03:12

I was talking with noprompt from cisco while discussing meander, and they are using asami in production along with re-frame

ribelo12:03:24

so it's battle tested

raspasov12:03:32

It feels like DataScript is a pretty simple implementation and leaves a lot on the table for improvement.

ribelo12:03:31

actually, considering the speed it offers, I'd say it's rather complicated

raspasov12:03:34

Databases are tricky things (in memory or not), you need to resort to clever tricks to squeeze performance.

raspasov12:03:04

@ribelo alright 🙂 I haven’t explored the internals, so I can’t speak; only speculate.

ribelo12:03:05

what's simple is using filter, and that isn't particularly slower, despite the lack of indexing

raspasov12:03:57

Rrrright 🙂

pithyless12:03:15

> Any downsides of asami you’ve noticed? It's not a port of Datascript - it was started independently around the same time - and it doesn't try to be 1:1 feature compatible with Datomic API. So you might be surprised by how certain things are incompatible with your existing queries (e.g. no pull syntax at the moment, db/idents work different than DS/Datomic, etc.)

raspasov12:03:31

Hmmm… giving me a lot of food for thought here; I like the organization of data that datalog provides

pithyless12:03:42

And FYI - there is an active #asami channel on Slack ;]

raspasov12:03:58

@pithyless just joined, thanks! 🙂

pithyless12:03:39

There was also a #datalog channel that was created some time ago, meant for this kind of cross-library discussion, but it has been quiet recently

ribelo12:03:34

(defn transduce-q4 []
  (e/qb 1e1
    (into []
          (comp
           (filter (fn [[_ m]] (= "Ivan" (m :name))))
           (filter (fn [[_ m]] (= :male (m :sex))))
           (map (fn [[_ m]] (select-keys m [:db/id :last-name :age]))))
          mdb100k)))
[(datascript-q4) (asami-q4) (mdb-q4) (transduce-q4)]
;; => [158.78 4.39 151.03 46.03]

raspasov12:03:42

transduce-q4 is just regular Clojure transduce code, yes?

ribelo12:03:27

you have the code above

pithyless12:03:54

@ribelo have you tried q4 with specter?

raspasov12:03:06

Yes… Well… One “trick” I’ve resorted to on React Native is runAfterInteractions https://reactnative.dev/docs/interactionmanager (not sure if there’s comparable browser API/trick)

ribelo12:03:21

I even have the code, just let me find it

raspasov12:03:23

Basically it delays the execution of a given fn after all user interactions have ended

raspasov12:03:45

Another option I’ve explored is run DataScript in its own worker…

raspasov12:03:58

(That would definitely help, but it comes with its own set of challenges)

raspasov12:03:33

Aka, you can only communicate with your in-memory db asynchronously… but to be fair… with runAfterInteractions… it’s already happening! Lol

pithyless12:03:42

@ribelo I have a suspicion the destructs [_ m] are killing your perf in transduce-q4

ribelo12:03:35

Yes, but it's just a quick write-up

ribelo12:03:14

db is of the form

{?id {:db/id ?id ?k ?v ...} ...}

ribelo12:03:07

this is slower to query, but pull syntax/eql is lightning fast

raspasov13:03:52

I am really curious where the major slow down in DataScript is compared to other options.

raspasov13:03:07

The similarity to Datomic is still very compelling for me, and the power of Datalog + pull syntax is definitely useful.

raspasov13:03:39

@ribelo have you explored putting :db/index on certain schema elements in DataScript?

pithyless13:03:07

I find Fulcro's approach pragmatic - seldom do you need the full power of Datalog when you're re-rendering a component; I think of it as a UI data cache for my EQL-backed data (which can still be a proxy for a DataScript instance running in the browser; just not something that needs to run every animation frame).

ribelo13:03:54

[(datascript-q4) (asami-q4) (mdb-q4) (transduce-q4) (specter-q4)]
;; => [168.09 4.36 153.06 48.14 49.84]

ribelo13:03:35

I had to rewrite because I lost the q4 with the specter

ribelo13:03:16

please note that I am far from proficient with specter

raspasov13:03:57

Specter is quite an amazing tool IMO… Esp. when it comes down to data transformation (less so for just data reading)

pithyless13:03:01

@ribelo if you're yak-shaving you may be interested in updating that transduce with some macros from https://github.com/bsless/clj-fast (and bsless also has this library I never played with - https://github.com/bsless/impedance#performance-differences)

pithyless13:03:31

but I think it's going to be hard to beat asami, since it looks like the query-planner short-circuits a lot of work in your benchmark 😄

quoll15:03:41

I only joined this channel this morning, so I didn’t see any of the questions here until now. If anyone is interested I can explain how Asami works? It’s quite different to the structures in DataScript.

quoll15:03:12

Truth be told, if someone had told me about DataScript 5 years ago then I wouldn’t have started Asami

quoll15:03:36

(Asami was originally part of Naga, and that project started in 2016)

Filipe Silva16:03:17

heya, I'm the sync person from Roam Research

Filipe Silva16:03:56

I can't talk about query performance very much, as I mostly operate on transaction semantics and database persistence

Filipe Silva16:03:44

I can say that for large databases (50+ mb of datascript transit) transact starts getting slow

Filipe Silva16:03:51

proportionally to the database size

Filipe Silva16:03:28

transit deserialization is also overall slow, but that's unsurprising

Filipe Silva16:03:36

this is all from browser CLJS

Filipe Silva16:03:26

@quoll hey there! kinda curious about the asami query planner, is it still efficient when the data keeps changing?

quoll16:03:40

We’re still working on durable storage for CLJS, so that’ll be a while, sorry

Filipe Silva16:03:53

e.g. query query query vs query transact query transact query

quoll16:03:06

Yes, the whole point of the planner is to base the plan on the data

Filipe Silva16:03:45

uhm... at Roam we have a mostly generic persistence layer for Datascript

quoll16:03:46

It relies on the “count” of resolution of individual patterns (these get cached too, so it’s not hitting the DB too much for this)

quoll16:03:12

Can you explain what you mean by that please?

Filipe Silva16:03:29

it syncs datascript transactions as a totally ordered list locally first and then remotely

Filipe Silva16:03:40

right now we use it to sync first to indexeddb, then to firebase

Filipe Silva16:03:52

but it's based on an abstract driver system

Filipe Silva16:03:07

so it was easy to make variants for indexeddb+datomic

Filipe Silva16:03:18

the in-memory db is still datascript

Filipe Silva16:03:41

but the only things that matter as far as syncing is concerned are the transaction fn and error handling

Filipe Silva16:03:09

so that can be abstracted to use asami or anything else (e.g. datahike) as long as it's an in-memory database

Filipe Silva16:03:53

it's important to do in-memory because there can be a lot of rollbacks as optimistic transactions are turned into confirmed txs

Filipe Silva16:03:38

e.g. two clients doing txs at the same time will have different optimistic orders than the final confirmed order, the sync system "rebases" the optimistic txs on top of the confirmed as these come
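
A minimal sketch of the "rebase optimistic txs on top of confirmed ones" idea described above, in plain Clojure. The tx shape (`{:id ... :data {...}}`), the merge-based `apply-tx`, and every function name are assumptions for illustration; the real sync system was not shown in the conversation.

```clojure
;; Sketch only: a tx is {:id ... :data {...}} and applying it merges :data.
;; These shapes are assumptions for illustration, not Roam's actual format.
(defn apply-tx [db tx]
  (merge db (:data tx)))

(def empty-state {:confirmed {} :optimistic []})

(defn current-db
  "The db the UI sees: confirmed state with pending local txs replayed on top."
  [{:keys [confirmed optimistic]}]
  (reduce apply-tx confirmed optimistic))

(defn transact-local
  "Records a local tx optimistically, before the server has ordered it."
  [state tx]
  (update state :optimistic conj tx))

(defn confirm
  "Applies a tx in its final server order and drops it from the pending list
  if it was one of ours. Remaining pending txs are replayed by current-db."
  [state tx]
  (-> state
      (update :confirmed apply-tx tx)
      (update :optimistic (fn [txs] (vec (remove #(= (:id tx) (:id %)) txs))))))
```

Keeping the confirmed db as an immutable value makes the rollback free: nothing is undone, the pending txs are simply replayed on whatever the latest confirmed value is. That is cheap only when the db lives in memory, which matches the reasoning given above.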

Filipe Silva16:03:53

we were thinking of open sourcing this

Filipe Silva16:03:00

can asami run as an in-memory db? if so maybe we could work together to make the sync system generic

Filipe Silva16:03:39

then you could use arbitrary persistence layers via these drivers

quoll16:03:02

Asami on CLJS is currently only in-memory

Filipe Silva16:03:13

oh cool then that'd definitely work

Filipe Silva16:03:37

our sync thing (we call it Link) could persist it to indexeddb and other places

quoll16:03:41

I don’t have a pull API for it yet. It hasn’t been a priority

Filipe Silva16:03:10

are you interested in some collaboration if we can provide an open source persistence layer for in-memory asami dbs?

quoll16:03:43

Sure. I keep it open source for a reason 🙂

Filipe Silva16:03:01

coolio, going to see what I can do WRT making our stuff open source

Filipe Silva16:03:06

will keep you posted

quoll16:03:11

I’m doing persistence right now though. Everything is based on a block abstraction that can be stored in anything (the first implementation of this is memory mapped files in the JVM, but the second one is going to be indexeddb… partly implemented now)

quoll16:03:27

Probably better to ask in #asami 🙂

Filipe Silva16:03:41

the sync persistence we have is just based on the transaction log (and optionally snapshots)

Filipe Silva16:03:20

saving immutable serialized transactions instead of mutable data structures
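
A minimal sketch of restoring an in-memory db from a transaction log with an optional snapshot, as described above. The `{:t n :data {...}}` tx shape, the `{:t n :db {...}}` snapshot shape, and the `restore` function are assumptions for illustration only.

```clojure
;; Sketch only: txs are immutable maps of the form {:t n :data {...}}.
(defn apply-tx [db tx]
  (merge db (:data tx)))

(defn restore
  "Rebuilds the in-memory db from an optional snapshot plus the tx log.
  Only log entries newer than the snapshot are replayed."
  [snapshot tx-log]
  (let [{:keys [t db] :or {t 0 db {}}} snapshot]
    (reduce apply-tx db (filter #(> (:t %) t) tx-log))))
```

With no snapshot (`nil`) the whole log is replayed from an empty db; with a snapshot taken at `t`, only the tail of the log is applied, so both paths should produce the same value.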

quoll16:03:47

https://clojurians.slack.com/archives/C07V8N22C/p1615898168016200?thread_ts=1615872613.002600&cid=C07V8N22C

@simongray from a high level, one of the main differences I’d seen is that DataScript (and Datomic) store datoms, and then index them. Asami doesn’t do that. Instead, it has indexes for the valid statements, without pointing at instances of statements. It’s all just nested maps. The main consequence of this is that searching for when statements get created or deleted isn’t so straightforward. But so far we haven’t needed that.

If you’re looking for a :where clause with a single pattern in it, then that might be [entity :my-property '?value]. In this case, both the entity and the attribute have been set. So you can just go to the EAV index, and say: (get-in eav [entity :my-property]) and you have your values. So simple queries that just do a single pattern are literally just a lookup in a map, followed by a lookup in the nested map.

Joins cost a bit more. For instance, [?person :name "Betty"] [?person :age 20]. First of all, the optimizer figures out the pattern with a smaller result, and uses the above to get a result. If this first one is people named “Betty”, then it will go to the AVE index, to get the set of all person entities. It then iterates over that, and uses it to modify the second pattern, which it then looks up. So the first person named “Betty” may be an entity identified by :node-123, which means that the second pattern gets updated to [:node-123 :age 20]. This is resolved with (get-in eav [:node-123 :age 20]), and if it is true, then that value for ?person gets returned. The same goes for every other person who was resolved as well.

How does this compare to joins in DataScript? I don’t actually know! I never looked 🙂

👍 3
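
A minimal sketch of the nested-map indexes and the two-pattern join described in the message above, in plain Clojure. This is not Asami's code: the `:eav`/`:ave` map layout, the function names (`add-statement`, `values-of`, `entities-with`, `join-2`) and the sample data are assumptions, and a real planner would use cached counts rather than resolving both patterns up front.

```clojure
;; Sketch only: plain nested maps standing in for the indexes described
;; above. The names (add-statement, values-of, join-2) are hypothetical.
(def empty-graph {:eav {} :ave {}})

(defn add-statement
  "Indexes one [e a v] statement twice: entity-first and attribute-first."
  [graph [e a v]]
  (-> graph
      (update-in [:eav e a] (fnil conj #{}) v)
      (update-in [:ave a v] (fnil conj #{}) e)))

(def graph
  (reduce add-statement empty-graph
          [[:node-123 :name "Betty"] [:node-123 :age 20]
           [:node-456 :name "Betty"] [:node-456 :age 35]
           [:node-789 :name "Fred"]  [:node-789 :age 20]]))

(defn values-of
  "Resolves [entity attr ?value]: two nested map lookups in the EAV index."
  [graph e a]
  (get-in graph [:eav e a] #{}))

(defn entities-with
  "Resolves [?entity attr value] through the AVE index."
  [graph a v]
  (get-in graph [:ave a v] #{}))

(defn join-2
  "Joins [?e a1 v1] [?e a2 v2]: resolves the pattern with fewer matches
  first, then checks each binding against the other pattern in EAV."
  [graph [a1 v1] [a2 v2]]
  (let [r1 (entities-with graph a1 v1)
        r2 (entities-with graph a2 v2)
        [small [a v]] (if (<= (count r1) (count r2))
                        [r1 [a2 v2]]
                        [r2 [a1 v1]])]
    (set (filter #(contains? (values-of graph % a) v) small))))
```

For example, `(join-2 graph [:name "Betty"] [:age 20])` binds `?e` to each entity named "Betty" and keeps only those whose `:age` set contains 20, so the work is proportional to the smaller pattern's result rather than to the size of the whole graph.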
quoll16:03:47

@Filipe Silva That makes sense. And it’s easy to replay. It’s not what Asami is using though. I’m saving immutable data structures.

Filipe Silva18:03:51

the difference in this model (for asami) is that the in memory version would be fed the relevant transactions on load, and those transactions would be persisted to disk or network separately from asami

quoll19:03:12

I see. Well, Asami doesn’t store the relevant transactions. That said, they do get returned from a call to transact (like datomic does), meaning that they’re easy to accumulate