#datascript
2019-05-20
anovick12:05:02

I've been trying to figure out a performant way to persist incremental updates initiated by a client running a DataScript instance to a remote server. In my case every user account has a different DataScript db that is completely separate from other users' dbs. My best idea so far has been to:
1. When the connection is first initiated with a user, spin up that user's DataScript instance on the server.
2. At this point, the server may hold any number of DataScript instances at the same time, depending on how many users are connected.
3. Maintain a long-lasting connection via WebSockets in order to determine when the client will not be sending updates anymore, i.e. when the DataScript instance can be persisted to disk and then discarded from memory.
Any thoughts or improvements on this are welcome 👀
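A minimal sketch of that flow, assuming a per-user registry of conns; load-user-db and persist! are hypothetical storage helpers, not part of DataScript:

```clojure
(ns server.sessions
  (:require [datascript.core :as d]))

;; user-id -> DataScript conn, one per currently connected client
(defonce sessions (atom {}))

(defn on-connect
  "WebSocket opened: spin up (or reuse) this user's DataScript instance.
   load-user-db is a hypothetical fn returning a previously persisted db or nil."
  [user-id schema load-user-db]
  (or (get @sessions user-id)
      (let [conn (d/conn-from-db (or (load-user-db user-id)
                                     (d/empty-db schema)))]
        (swap! sessions assoc user-id conn)
        conn)))

(defn on-disconnect
  "WebSocket closed: persist the db value and drop the instance from memory.
   persist! is a hypothetical fn writing the db to disk (EDN, transit, etc.)."
  [user-id persist!]
  (when-let [conn (get @sessions user-id)]
    (persist! user-id @conn)
    (swap! sessions dissoc user-id)))
```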

Daniel Hines13:05:58

DatSync is focused on Datomic for backend persistence, but I think the idea is that you can swap in different implementations (such as Datahike, or a custom, persisted DataScript) without too much difficulty.

anovick13:05:02

@d4hines thanks for addressing this question 🙂 I'm not sure I understand the DatSync approach. I have watched the Clojure/West 2016 talk by Christopher Small about it, but I'm still not sure I got the small details right.
1. In the DatSync model, clients send transactions to a "master" Datomic db on the server before committing the changes to the local DataScript db.
2. The server's Datomic executes the transactions coming from clients, and sends the transactions it executed back to clients via pub/sub (in transaction order, which Datomic can do with the log it maintains).
3. Clients receive those transactions and execute them locally.
I assume all queries are run on the client's local db without needing to involve the "main" server db.
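A rough client-side sketch of that round trip (this is not DatSync's actual API; send-to-server! stands in for whatever WebSocket send function is used, and it assumes the broadcast tx-data arrives as datom records with :e/:a/:v/:added):

```clojure
(ns client.sync
  (:require [datascript.core :as d]))

(defonce conn (d/create-conn {}))

(defn submit-tx!
  "Step 1: ship the transaction to the server's master db instead of
   transacting it locally first."
  [send-to-server! tx]
  (send-to-server! {:type :tx :tx tx}))

(defn on-server-tx-data
  "Steps 2-3: the server broadcasts the datoms it actually transacted,
   in transaction order; replay them into the local DataScript db."
  [tx-data]
  (d/transact! conn (for [datom tx-data]
                      [(if (:added datom) :db/add :db/retract)
                       (:e datom) (:a datom) (:v datom)])))
```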

anovick13:05:04

That said, queries might have to be delayed when pending transactions that have been sent to the server haven't been transacted locally yet.

Daniel Hines13:05:45

I would think you could employ optimistic updates at that point. “Eventual consistency” and all that. But I’m not sure. Are optimistic updates not a hard enough guarantee for you?

Daniel Hines13:05:23

Also, wouldn’t you encounter this issue in any remote setup?

anovick13:05:14

@d4hines not sure what you're saying.
1. Optimistic updates can be implemented to eliminate the need to delay queries over data that hasn't been transacted yet, but that would require maintaining a transaction log in the browser, which might be too costly for my application.
2. Some remote setups are easier than others depending on the stack: a synchronization solution for document stores can be easy when choosing the CouchDB/PouchDB stack or the Meteor stack, but not necessarily easy without that stack.

anovick13:05:55

Thanks for the help. I'm still not sure what DatSync is capable of.

anovick13:05:36

Also, just to take note of something I just read on the DatSync wiki:
> Data scoping mechanisms: Currently, assumption is that we sync the whole db
This approach is a tradeoff:
- Instead of having to spin up DataScript instances in order to execute transactions coming from the client, you just replace the entire db every time, meaning that you remove the memory-space limitations on the server, allowing for cheaper scaling as the number of users grows.
- At the expense of requests from the client to the server (and back afterwards) carrying a big payload, i.e. the entire db. This implies more resources are needed on the clients to send and read incoming responses, which is expensive especially for battery life (mobile clients, laptops).

Daniel Hines14:05:35

Points taken. Worth noting, though, that this is seen as a large limitation and an active research topic. A number of folks have high hopes for Differential Dataflow solving this problem.

anovick14:05:20

@d4hines thank you for the great discussion 🙂 Interesting reference to Differential Dataflow, by the way. It's not the first time I hear about it, but knowing that it's an active research topic gives me some context.

metasoarous18:05:55

Hi @icyiceling184. A few things:
• Datsync server sends just the datoms produced by the transaction to the clients, not the original tx itself. This lets you use transaction functions and whatnot.
• You could do optimistic updates without a full transaction log: transact the data locally, but keep a reference to the previous db state, and only drop that reference once the tx comes back from the server. You have to get more clever, though, if you want transactions submitted in the interim to be handled properly.
• I've been thinking about query scopes and subscriptions again lately, and we may be putting out some API for this soon.
• Queries don't have to be delayed in the current model if you're using "reactive" queries (Posh or differential datalog or whatever), as the queries will just update once the new data comes in.
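A minimal sketch of the second bullet, i.e. a single previous-db reference (send-to-server! and the shape of the server's reply are assumptions):

```clojure
(ns client.optimistic
  (:require [datascript.core :as d]))

(defonce conn   (d/create-conn {}))
(defonce backup (atom nil))   ;; db value to roll back to; nil when in sync

(defn optimistic-transact!
  "Transact locally right away, remember the pre-tx db, and ship the tx off."
  [send-to-server! tx]
  (reset! backup @conn)
  (d/transact! conn tx)
  (send-to-server! tx))

(defn on-server-reply
  "Drop the backup once the server confirms; restore it if the tx was rejected."
  [{:keys [ok?]}]
  (if ok?
    (reset! backup nil)
    (do (d/reset-conn! conn @backup)
        (reset! backup nil))))
```

This only handles one in-flight transaction at a time, which is where the "get more clever" part comes in.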

anovick19:05:15

@metasoarous It's inspiring to see your dedication to this project, still sticking around after having been involved with it for so long at this point. Really appreciate you taking the time to respond at such length. Having people like you around to collaborate with is one of the biggest reasons to participate in the Clojure ecosystem, in my point of view. I really want to use DataScript for more than just reading data, which is all an application of mine is doing at this point. Trying to grasp your points to get a complete picture:
> Datsync server sends just the datoms produced by the transaction to the clients, not the original tx itself. This lets you use transaction functions and whatnot.
I'm confused about the last part (transaction functions), but as for the first part, do you mean that only the :tx-data part is sent from the server to the client, and not the entire #datascript.db.TxReport map? (which includes :db-before, :db-after, :tx-data and some other stuff)
> You could do optimistic updates without a full transaction log: transact the data locally, but keep a reference to the previous db state, and only drop that reference once the tx comes back from the server.
Hmm, interesting. So what you describe is a solution for having a temporary mode after sending a transaction, before it is approved by the server. In order to keep things simple, one could have this mode block additional transactions until the first one is approved. Not bad, but overall not ideal. It makes it possible to query the new state of the db locally immediately after the transaction.
> I've been thinking about query scopes and subscriptions again lately, and we may be putting out some API for this soon.
Cool!
> Queries don't have to be delayed in the current model if you're using "reactive" queries (Posh or differential datalog or whatever), as the queries will just update once the new data comes in.
Yeah, well, I'm not sure about differential datalog yet. I assume you refer to the work done by Nikolas Gobel, particularly clj-3df, which I find fascinating. Cool stuff! I hope to be able to look more into it sometime, but I think it's still evolving fast and is really cutting-edge technology at this point, so I would rather give it time to stabilize. Regarding Posh, I think it's OK for some simple query use cases. Unfortunately for my case I need to use recursive queries (using rules), which it's not suited for.

👍 4
metasoarous20:05:34

You're very welcome @icyiceling184, and many thanks for the kind words. Always really nice to get acknowledgement for open source work 🙂

metasoarous20:05:01

Regarding your questions, yes, it's only the tx-data that gets sent; no need to resend the whole db (before & after).

👍 4
metasoarous20:05:36

Regarding the optimistic updates, you block additional transactions as a start, yes. The smarter thing to do is build up a queue of transactions that have been asserted in the tentative db, and if a remote transaction fails, walk back everything that occurred after, and (possibly anyway), rerun later transactions in the queue on the backup db value.
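A sketch of that queue-based variant (the tx ids and :ok? reply shape are made up, and it assumes acks arrive in submission order):

```clojure
(ns client.optimistic-queue
  (:require [datascript.core :as d]))

(defonce conn      (d/create-conn {}))
(defonce backup-db (atom nil))   ;; db value before the first tentative tx
(defonce tentative (atom []))    ;; [{:tx-id .. :tx ..} ...] awaiting server ack

(defn submit!
  "Apply the tx to the tentative db and queue it until the server confirms it."
  [send-to-server! tx]
  (when (nil? @backup-db) (reset! backup-db @conn))
  (let [tx-id (random-uuid)]
    (swap! tentative conj {:tx-id tx-id :tx tx})
    (d/transact! conn tx)
    (send-to-server! {:tx-id tx-id :tx tx})))

(defn on-server-ack
  "On success, fold the tx into the backup value; on failure, walk back to the
   backup and rerun the surviving tentative txs on top of it."
  [{:keys [tx-id ok?]}]
  (let [{acked true survivors false} (group-by #(= tx-id (:tx-id %)) @tentative)]
    (if ok?
      (when-let [{:keys [tx]} (first acked)]
        (swap! backup-db d/db-with tx))
      (do (d/reset-conn! conn @backup-db)
          (doseq [{:keys [tx]} survivors] (d/transact! conn tx))))
    (reset! tentative (vec survivors))
    (when (empty? survivors) (reset! backup-db nil))))
```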

metasoarous20:05:29

Obviously, not the most straightforward thing to do, and it may not always be obvious when you'd want to retry those later txs or just scrap them (if they depend on the original remote tx having gone through).

metasoarous20:05:56

Cleaner solution here is something like a CRDT

metasoarous20:05:34

Though you're going to be constrained around certain things like identity, and not able to express "all of datomic/datascript" that way.

metasoarous20:05:45

3df is super cool, yes; And I did get a chance to meet Nikolas at the Conj, and had a good conversation about future directions.

metasoarous20:05:05

It's definitely still early days and could use some settling and hardening, but I'm optimistic.

metasoarous20:05:57

You're right that Posh doesn't do recursive queries/rules, but honestly, it might not be that hard to add in if you're really motivated.

anovick20:05:02

@metasoarous If you don't mind, I'd like to pick your brain about this particular problem I'm facing, for which I've been deliberating over a solution quite unsuccessfully so far. I'm developing a system where browser clients have their own DataScript instances to operate on, each hosting data entirely separate from the others, but with the same underlying schema (think a personal Wikipedia per user). The average use case would be a few thousand nodes per graph, so this shouldn't be too hard for browsers in terms of memory requirements. However, I need to persist that data somehow, somewhere. I'm thinking that a remote server is the best bet, but it doesn't have to be Datomic, since the client can already run its own queries/transactions locally; I just need to replicate them. I guess this isn't the typical DatSync use case, since I don't have a "central big database" serving all these clients. So the two problems are:
1. Finding a performant synchronization solution from client to server (doesn't have to be multi-client, just a single client would suffice).
2. Finding a format to persist the data to.
Any thoughts on this? Would be glad to hear 😅

anovick21:05:45

One idea that I'm thinking about: the client only sends :tx-data to the server; the server adds that :tx-data as a message to a queue, labelled with the user the transaction was enacted by. When available, the message at the head of the queue is picked. An already existing DataScript instance is d/reset-conn!ed with the data belonging to that message's labelled user, the message's :tx-data is d/transact!ed into it, and the result is then serialized to EDN using prn-str.
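Roughly, with a single reusable instance and hypothetical load-user-db / save-user-edn! storage helpers (and assuming the :tx-data arrives in transactable form, i.e. [:db/add ...] / [:db/retract ...] vectors rather than raw datom records):

```clojure
(ns server.tx-queue
  (:require [datascript.core :as d]))

(defonce worker-conn (d/create-conn {}))  ;; one DataScript instance, reused
(defonce queue       (atom []))           ;; [{:user-id .. :tx-data ..} ...]

(defn enqueue-tx!
  "Label the incoming :tx-data with the user it came from and queue it."
  [user-id tx-data]
  (swap! queue conj {:user-id user-id :tx-data tx-data}))

(defn process-next!
  "Pick the message at the head of the queue, reset the shared conn to that
   user's db, apply the tx, and persist the result as an EDN string."
  [load-user-db save-user-edn!]
  (when-let [{:keys [user-id tx-data]} (first @queue)]
    (swap! queue (comp vec rest))
    (d/reset-conn! worker-conn (load-user-db user-id))
    (d/transact! worker-conn tx-data)
    ;; reading this back later needs DataScript's EDN readers (#datascript/DB)
    (save-user-edn! user-id (prn-str @worker-conn))))
```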

devurandom12:05:25

@d4hines You were right. I produced a minimal test case that shows that [?e2] does not add ?e2 into the result set.

devurandom13:05:54

Also, [?a ?b] does not set ?a to be the same element(s) as ?b (i.e. a rename). I thought I saw that somewhere and thought it was handy. Sadly it does not work. 😞

Daniel Hines13:05:56

Thanks for reporting back 😄 Regarding [?a ?b], in what scenarios do you think that kind of one-to-one renaming would be useful? I think there’s a function that can do that (or else, I think you could easily pass one in).

devurandom13:05:46

@d4hines In scenarios like the above, where I want ?e to be the union of ?x and ?y. Or in the (subproject-or-self ?sub ?top) scenario, where I want ?sub to also contain ?top.
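For what it's worth, the usual workaround for that union is a rule with an explicit "self" branch; a small sketch (the :project/parent and :project/name attributes are made up, and [(identity ?top) ?sub] is, as far as I know, the idiom DataScript supports for aliasing a bound variable):

```clojure
(require '[datascript.core :as d])

(def rules
  '[[(subproject-or-self ?sub ?top)
     [(identity ?top) ?sub]]                ;; "self" branch: ?sub is ?top itself
    [(subproject-or-self ?sub ?top)
     [?sub :project/parent ?parent]
     (subproject-or-self ?parent ?top)]])   ;; recursive "descendant" branch

(def db
  (d/db-with (d/empty-db {:project/parent {:db/valueType :db.type/ref}})
             [{:db/id -1 :project/name "top"}
              {:db/id -2 :project/name "sub" :project/parent -1}]))

(d/q '[:find ?name
       :in $ % ?top-name
       :where
       [?top :project/name ?top-name]
       (subproject-or-self ?sub ?top)
       [?sub :project/name ?name]]
     db rules "top")
;; => #{["top"] ["sub"]}
```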

Daniel Hines13:05:00

Hmm. There are definitely ways of working with sets/vectors in Datascript and Datomic, but that’s beyond my understanding.

devurandom13:05:51

Is not every ?var a set?

devurandom13:05:05

i.e. the set of :db/id that it was bound to?

devurandom13:05:42

Interesting... I am able to make DataScript select a nil entity...

Daniel Hines13:05:17

I’ve definitely seen some stuff that drops down to Clojure code and manipulates vectors directly.

anovick18:05:10

Is there a way to serialize DataScript to EDN using the JavaScript API? If not, shouldn't it be possible to do so by exporting a function that invokes ClojureScript's prn-str for that?

metasoarous18:05:52

@icyiceling184 You may want to look at datascript-transit, which lets you easily serialize the datascript database. You can also just serialize all the datoms, but then you have to rehydrate the database from that yourself using d/init-db, which takes in datoms without running them through all the normal transaction machinery/overhead.
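A small sketch of both options (datascript.transit is the namespace from the datascript-transit library; the :greeting attribute is made up):

```clojure
(ns client.persist
  (:require [datascript.core :as d]
            [datascript.transit :as dt]))  ;; from datascript-transit

;; Option 1: round-trip the whole db as a transit string.
(def db  (d/db-with (d/empty-db) [{:db/id -1 :greeting "hello"}]))
(def s   (dt/write-transit-str db))
(def db' (dt/read-transit-str s))

;; Option 2: keep just the datoms and rebuild with d/init-db, which skips the
;; normal transaction machinery (you'd still need transit/EDN handlers for the
;; Datom records themselves to get them to and from a string).
(def datoms (vec (d/datoms db :eavt)))
(def db''   (d/init-db datoms))
```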

metasoarous18:05:18

I just implemented this for a project, and hope to put it into more automated subscription-based bootstraps for datsync in the near future.

anovick19:05:07

@metasoarous Is this reply aimed at the JavaScript API? If so, it seems that when using transit-js I would still need to have the readers available: datascript.db/db-from-reader and datascript.db/datom-from-reader; however, they're not currently exported. Regarding serializing the raw datoms and then ingesting them using d/init-db, I still need to play with that for a bit, as I'm still unfamiliar with how that works. Glad to hear you've been exploring this space recently 🙂