
Hi all! I'd like to ask for some advice. I have a pretty big data project, and I have a transact that takes about 17 seconds. I've looked into the source to see if there's some alternate path that's faster, and I saw init-db, where I pass a trusted list of datoms, but this initializes a db rather than inserting data into an existing db. Would appreciate any advice on this!


@levitanong What kind of transaction is taking that long to process?


I'm not sure what you mean, but this transaction is a large one involving at least 100000 entities (likely more, I haven't fully flattened them out) with a lot of references to each other. Is this the kind of info you're looking for?


Yeah... that's a lot


You might be able to more efficiently compute the resulting set of tuples for the next db value and use init-db, but unless they're very simple assertions, you're unlikely to beat datascript's transaction mechanism


For example, if you create a bunch of entities that refer to each other, you're going to need to assign entity ids to them, and that's the kind of stuff you'd rather leave to the datascript tx functions if you can
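To make that concrete, here's a minimal sketch of the tempid bookkeeping the tx functions do for you (`:struct/name` is a hypothetical attribute; `:struct/parent` is the ref attribute mentioned later in this thread):

```clojure
(require '[datascript.core :as d])

;; Negative :db/id values act as tempids, so new entities can
;; reference each other before real entity ids exist. transact!
;; resolves the tempids and assigns the actual entity ids.
(def conn
  (d/create-conn {:struct/parent {:db/valueType :db.type/ref}}))

(d/transact! conn
  [{:db/id -1 :struct/name "root"}
   {:db/id -2 :struct/name "child"      :struct/parent -1}
   {:db/id -3 :struct/name "grandchild" :struct/parent -2}])
```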


But if it's just a bunch of simple assertions with low relational structure, you might be able to come up with something faster
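For that simple case, a sketch of the init-db route (attribute name hypothetical; note that, as said above, init-db builds a fresh db rather than merging into an existing one):

```clojure
(require '[datascript.core :as d])

;; When assertions are flat (no refs between them), you can
;; assign entity ids yourself and build the db in one shot,
;; skipping the transaction machinery entirely.
(def datoms
  (map-indexed (fn [i v] (d/datom (inc i) :item/value v))
               ["a" "b" "c"]))

(def db (d/init-db datoms))
```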


Alas, it's unfortunate that I likely can't make it faster, but it's nice to have confirmation that this is a limit. I guess the only way forward is to either have a loading indicator, or to chunk the transaction into smaller bits and have a requestAnimationFrame transact a bit at a time to unblock the UI.


Yeah, I was going to suggest pipelining the transactions into batches (Stu did a talk where he showed off a pipeline for doing this with Datomic; might be instructive). Of course, if there are relations between entities you might have to do a bit of preprocessing work to make sure data comes in the right order for relationships to be found.


Oh, is it this one? I've done the batching thing before, but wasn't entirely happy with the result because the goal would be less than 16ms per batch, and the chunk sizes are going to vary depending on how fast the CPU is. Though it's still better than 17 seconds of unresponsiveness, so I guess I'll go with that 😅
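Something like this sketch, in case it helps (chunk size is a knob to tune against your frame budget; `entities` stands in for the flattened tx data):

```clojure
(require '[datascript.core :as d])

;; Transact one fixed-size chunk per animation frame so the
;; browser can repaint between chunks (ClojureScript only).
;; If chunks reference each other, order the data so referenced
;; entities land in earlier chunks, as noted above.
(defn transact-in-chunks! [conn tx-data chunk-size]
  (when (seq tx-data)
    (let [[chunk more] (split-at chunk-size tx-data)]
      (d/transact! conn chunk)
      (js/requestAnimationFrame
        #(transact-in-chunks! conn more chunk-size)))))

;; e.g. (transact-in-chunks! conn entities 1000)
```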


Does anyone know a good way to unpack a recursive data structure in datascript? I've had some success with (d/q '[:find [(pull ?e [*]) ...] :in $ ?root-id :where (or [?e :struct/parent ?root-id] [(identity ?root-id) ?e])] db root-id), but it seems unreliable.
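An alternative worth trying, which sidesteps the `or` clause entirely: I believe DataScript's pull supports recursion on reverse refs, walking children to any depth (a sketch, assuming `:struct/parent` is declared `:db.type/ref` in the schema):

```clojure
(require '[datascript.core :as d])

;; `...` in a pull pattern means recurse with no depth limit;
;; :struct/_parent is the reverse ref, i.e. "entities whose
;; :struct/parent points at this entity". The result is the
;; root entity with its children nested under :struct/_parent.
(d/pull @conn '[* {:struct/_parent ...}] root-id)
```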