Shuky Badeer 06:04:51

Hi guys! A performance-related question here. This query takes 7-9 seconds to finish executing. It's doing a simple SQL-like join on a dataset of about 11,000 rows. We're pulling the relevant attributes of each entity (plus the remaining attributes, which were cut off in the screenshot). Is 7-9 seconds normal for a query like that?


Hard to give an answer without knowing anything about the infrastructure. What is Datomic running on? 11,000 rows should easily fit in memory.


Did you add an index to both the id and the belongs_to attributes? I guess another thing to do is to build the query back up from the simplest possible version to see which part is taking all the time.
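A minimal sketch of both suggestions, assuming the Datomic peer API (`datomic.api`) and a throwaway in-memory database. The value types (`:db.type/long` for the id, `:db.type/ref` for belongs_to) are guesses since the schema wasn't posted, and the `steps` map is a hypothetical way of timing each partial query on its own.

```clojure
(require '[datomic.api :as d])

;; Assumed schema: value types are guesses. :db/index true puts each
;; attribute into the AVET index.
(def schema
  [{:db/ident       :strive_form_data/id
    :db/valueType   :db.type/long
    :db/cardinality :db.cardinality/one
    :db/index       true}
   {:db/ident       :strive_form_data_additional_answers/belongs_to
    :db/valueType   :db.type/ref
    :db/cardinality :db.cardinality/one
    :db/index       true}])

(def uri "datomic:mem://strive-index-demo")
(d/create-database uri)
(def conn (d/connect uri))
@(d/transact conn schema)
(def db (d/db conn))

;; Hypothetical build-up: each clause alone, then the join.
(def steps
  {:ids     '[:find ?sfd
              :where [?sfd :strive_form_data/id]]
   :answers '[:find ?sfdaa
              :where [?sfdaa :strive_form_data_additional_answers/belongs_to]]
   :join    '[:find ?sfd ?sfdaa
              :where [?sfd :strive_form_data/id]
                     [?sfdaa :strive_form_data_additional_answers/belongs_to ?sfd]]})

;; Prints the step name, then "Elapsed time: ..." for that query.
(doseq [[step query] steps]
  (println step)
  (time (count (d/q query db))))
```

Run the loop against the real database rather than this empty one; the step whose time jumps is the one to look at. For attributes that already exist, `:db/index` can be turned on through schema alteration (check the Datomic docs for the exact form in your version).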


Also, do you actually need the first join? You don't seem to be using ?data_id anywhere. If every ?sfd has a :strive_form_data/id attribute, you can simplify the :where to just:

[?sfdaa :strive_form_data_additional_answers/belongs_to ?sfd]


^ And if the above is true, you could simplify it to just fetch directly via the index:

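(->> (d/datoms db :avet :strive_form_data_additional_answers/belongs_to)
     (map :e) ; each datom is [?sfdaa belongs_to ?sfd]; :e is the ?sfdaa entity id
     (d/pull-many db '[* {:strive_form_data_additional_answers/belongs_to [*]}]))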
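The simplifications above can be checked against each other on toy data. This is a sketch assuming the peer API and an in-memory database; the schema types and the sample entities are made up, and the two-clause query is only a guess at the original's first clause (the real query was in a screenshot). It compares the join, the single clause, and reading the AVET index directly.

```clojure
(require '[datomic.api :as d])

(def uri "datomic:mem://strive-simplify-demo")
(d/create-database uri)
(def conn (d/connect uri))

;; Assumed schema: types are guesses; belongs_to is indexed so it appears in AVET.
@(d/transact conn
  [{:db/ident       :strive_form_data/id
    :db/valueType   :db.type/long
    :db/cardinality :db.cardinality/one}
   {:db/ident       :strive_form_data_additional_answers/belongs_to
    :db/valueType   :db.type/ref
    :db/cardinality :db.cardinality/one
    :db/index       true}])

;; Made-up sample data: every form-data entity has an id.
@(d/transact conn
  [{:db/id "sfd1" :strive_form_data/id 1}
   {:db/id "sfd2" :strive_form_data/id 2}
   {:db/id "a1" :strive_form_data_additional_answers/belongs_to "sfd1"}
   {:db/id "a2" :strive_form_data_additional_answers/belongs_to "sfd1"}
   {:db/id "a3" :strive_form_data_additional_answers/belongs_to "sfd2"}])

(def db (d/db conn))

;; Hypothetical reconstruction of the original join: ?data_id is bound but never used.
(def two-clause-rows
  (set (map vec (d/q '[:find ?sfd ?sfdaa
                       :where [?sfd :strive_form_data/id ?data_id]
                              [?sfdaa :strive_form_data_additional_answers/belongs_to ?sfd]]
                     db))))

;; One clause is enough when every ?sfd is known to have an id.
(def one-clause-rows
  (set (map vec (d/q '[:find ?sfd ?sfdaa
                       :where [?sfdaa :strive_form_data_additional_answers/belongs_to ?sfd]]
                     db))))

;; The same pairs straight from the index, as [?sfd ?sfdaa].
(def index-rows
  (set (map (juxt :v :e)
            (d/datoms db :avet :strive_form_data_additional_answers/belongs_to))))
```

Tuples are normalized with `vec` so the three sets compare by value regardless of the collection type `d/q` returns. If you're on the client API instead, `datoms` takes an argument map rather than positional index arguments.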


Also, consider whether you need two pulls at all, since one is just nested data of the other.
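A sketch of folding the two pulls into one, assuming the peer API and an in-memory database with made-up sample data. `:strive_form_data_additional_answers/answer` is a hypothetical stand-in for whatever answer attributes the real schema has. The reverse reference `:strive_form_data_additional_answers/_belongs_to` nests every answer that points at a form-data entity inside that entity's pull.

```clojure
(require '[datomic.api :as d])

(def uri "datomic:mem://strive-nested-pull-demo")
(d/create-database uri)
(def conn (d/connect uri))

;; Assumed schema; :answer is a hypothetical stand-in attribute.
@(d/transact conn
  [{:db/ident       :strive_form_data/id
    :db/valueType   :db.type/long
    :db/cardinality :db.cardinality/one}
   {:db/ident       :strive_form_data_additional_answers/belongs_to
    :db/valueType   :db.type/ref
    :db/cardinality :db.cardinality/one}
   {:db/ident       :strive_form_data_additional_answers/answer
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}])

;; Made-up sample data: one form-data entity with two answers.
@(d/transact conn
  [{:db/id "sfd1" :strive_form_data/id 1}
   {:db/id "a1"
    :strive_form_data_additional_answers/belongs_to "sfd1"
    :strive_form_data_additional_answers/answer "yes"}
   {:db/id "a2"
    :strive_form_data_additional_answers/belongs_to "sfd1"
    :strive_form_data_additional_answers/answer "no"}])

(def db (d/db conn))

;; One pull per form-data entity; the reverse ref (_belongs_to)
;; nests all of its answers as a collection of maps.
(def rows
  (d/q '[:find (pull ?sfd [:strive_form_data/id
                           {:strive_form_data_additional_answers/_belongs_to
                            [:strive_form_data_additional_answers/answer]}])
         :where [?sfd :strive_form_data/id]]
       db))
```

One difference from the join: a form-data entity with no answers is still returned here, just without the `_belongs_to` key. Adding `[_ :strive_form_data_additional_answers/belongs_to ?sfd]` to the :where should exclude those if that matters.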


Remember that Datomic does not have a query optimizer, so in general, when queries run slowly, make sure the correct indices are in place AND that the order of the :where clauses keeps the number of matches in the working set small (e.g., are there fewer :strive_form_data/id datoms or fewer :strive_form_data_additional_answers/belongs_to datoms? If so, putting the clause with fewer matches first might help).
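One way to answer the "which attribute has fewer datoms?" question is to count them from the AEVT index, which covers every attribute. This is a sketch assuming the peer API, an in-memory database, and made-up sample data; `datom-count` and `chosen-query` are hypothetical helpers, and "fewest datoms first" is a rule of thumb rather than a guarantee.

```clojure
(require '[datomic.api :as d])

(def uri "datomic:mem://strive-clause-order-demo")
(d/create-database uri)
(def conn (d/connect uri))

;; Assumed schema (value types are guesses).
@(d/transact conn
  [{:db/ident       :strive_form_data/id
    :db/valueType   :db.type/long
    :db/cardinality :db.cardinality/one}
   {:db/ident       :strive_form_data_additional_answers/belongs_to
    :db/valueType   :db.type/ref
    :db/cardinality :db.cardinality/one}])

;; Made-up sample data: three form-data entities, only one with an answer.
@(d/transact conn
  [{:db/id "sfd1" :strive_form_data/id 1}
   {:db/id "sfd2" :strive_form_data/id 2}
   {:db/id "sfd3" :strive_form_data/id 3}
   {:db/id "a1" :strive_form_data_additional_answers/belongs_to "sfd1"}])

(def db (d/db conn))

(defn datom-count
  "Counts the current datoms for attr by walking the AEVT index."
  [db attr]
  (count (seq (d/datoms db :aevt attr))))

(def ids-first
  '[:find ?sfd ?sfdaa
    :where [?sfd :strive_form_data/id]
           [?sfdaa :strive_form_data_additional_answers/belongs_to ?sfd]])

(def answers-first
  '[:find ?sfd ?sfdaa
    :where [?sfdaa :strive_form_data_additional_answers/belongs_to ?sfd]
           [?sfd :strive_form_data/id]])

;; Rule of thumb: lead with the clause whose attribute has fewer datoms,
;; so the first clause produces the smallest working set.
(def chosen-query
  (if (< (datom-count db :strive_form_data_additional_answers/belongs_to)
         (datom-count db :strive_form_data/id))
    answers-first
    ids-first))
```

Both orderings return the same pairs; only the amount of intermediate work differs, which is why timing each order on the real data is the final check.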


Speaking of which, I'm suspicious of that double pull: won't it create a separate result tuple for each [?sfd ?sfdaa] pair? So if sfd1 has 3 sfdaa's, you will see: [(pull sfd1) (pull sfdaa1)] [(pull sfd1) (pull sfdaa2)] [(pull sfd1) (pull sfdaa3)]?
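The suspicion is easy to check on a toy database. This sketch assumes the peer API, an in-memory database, made-up sample data, and a hypothetical `:strive_form_data_additional_answers/answer` attribute. It runs a double pull over one form-data entity with three answers and counts the result tuples.

```clojure
(require '[datomic.api :as d])

(def uri "datomic:mem://strive-double-pull-demo")
(d/create-database uri)
(def conn (d/connect uri))

;; Assumed schema; :answer is a hypothetical stand-in attribute.
@(d/transact conn
  [{:db/ident       :strive_form_data/id
    :db/valueType   :db.type/long
    :db/cardinality :db.cardinality/one}
   {:db/ident       :strive_form_data_additional_answers/belongs_to
    :db/valueType   :db.type/ref
    :db/cardinality :db.cardinality/one}
   {:db/ident       :strive_form_data_additional_answers/answer
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}])

;; Made-up sample data: sfd1 with three answers.
@(d/transact conn
  [{:db/id "sfd1" :strive_form_data/id 1}
   {:db/id "a1" :strive_form_data_additional_answers/belongs_to "sfd1"
    :strive_form_data_additional_answers/answer "a"}
   {:db/id "a2" :strive_form_data_additional_answers/belongs_to "sfd1"
    :strive_form_data_additional_answers/answer "b"}
   {:db/id "a3" :strive_form_data_additional_answers/belongs_to "sfd1"
    :strive_form_data_additional_answers/answer "c"}])

(def db (d/db conn))

;; Two pulls in :find yield one tuple per [?sfd ?sfdaa] pair.
;; :db/id is pulled so answers with equal values don't collapse in the result set.
(def rows
  (d/q '[:find (pull ?sfd [:db/id :strive_form_data/id])
               (pull ?sfdaa [:db/id :strive_form_data_additional_answers/answer])
         :where [?sfdaa :strive_form_data_additional_answers/belongs_to ?sfd]]
       db))
```

Here `rows` holds three tuples whose first element is the same sfd1 map, so sfd1 is pulled once per answer. Pulling ?sfd once with a reverse reference to its answers avoids that repetition.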