
the thing I'm stuck on right now is the specifics of how to chain patterns


take something like:

[:find ?id
 :where
 [?e :foo/id ?id]
 [?e ?a "bar"]]
I think one way to handle it is:
1. resolve [?e :foo/id ?id], filtering the db to find all matching entries: [[1 :foo/id "123"] [4 :foo/id "456"]]
2. expand [?e ?a "bar"] to use all ?e values we currently have:
[1 ?a "bar"]
[4 ?a "bar"]
3. for each new pattern, filter the db to find all matching entries: [[1 :foo/name "bar"]]
4. since we're joining across the entity ID, these are the final results
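The filter-and-substitute approach above can be sketched roughly like this (a minimal sketch, not datascript's implementation; it assumes the db is just a flat sequence of [e a v] triples, and `lvar?`, `match-triple`, `resolve-pattern`, and `q*` are all hypothetical names):

```clojure
(defn lvar? [x]
  ;; A logic variable is a symbol whose name starts with ?
  (and (symbol? x) (.startsWith (name x) "?")))

(defn match-triple [pattern triple]
  ;; Returns a map of bindings when the triple matches the pattern, else nil.
  (reduce (fn [bindings [p v]]
            (cond
              (= p '_)  bindings
              (lvar? p) (if-some [bound (get bindings p)]
                          (if (= bound v) bindings (reduced nil))
                          (assoc bindings p v))
              (= p v)   bindings
              :else     (reduced nil)))
          {}
          (map vector pattern triple)))

(defn resolve-pattern [db bindings pattern]
  ;; Substitute already-known bindings into the pattern, then filter the db.
  (let [pattern' (mapv #(get bindings % %) pattern)]
    (keep #(some->> (match-triple pattern' %) (merge bindings)) db)))

(defn q* [db patterns]
  ;; Chain the patterns: each step expands the current set of binding rows.
  (reduce (fn [rows pattern]
            (mapcat #(resolve-pattern db % pattern) rows))
          [{}]
          patterns))
```

Running the two patterns from the example over a toy db yields the single joined row, with ?a bound to :foo/name.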


Filtering is A way to do it

Ben Sless 13:08:18

I've been thinking of a way to model/phrase it without tying it to a specific implementation


But the more efficient way is usually to:
1. Resolve [?e :foo/id ?id] to find the matching entries
2. Iterate over the bindings from 1 (bindings of ?e and ?id) and use these to rewrite the second constraint. This means updating ?e. Do a lookup for each rewrite.
3. Concatenate the results of 2
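A rough sketch of steps 2 and 3, assuming a hypothetical `eav` index (a map of entity id -> attribute -> value) so each rewritten pattern becomes a lookup rather than a full scan — the index shape and `lookup-av` are illustrative, not from any of the implementations discussed:

```clojure
;; Hypothetical EAV index: entity id -> attribute map.
(def eav {1 {:foo/id "123", :foo/name "bar"}
          4 {:foo/id "456"}})

(defn lookup-av [eav e v]
  ;; With ?e already bound, [?e ?a "bar"] rewrites to a lookup in one
  ;; entity's attribute map instead of a scan over the whole db.
  (for [[a v'] (get eav e) :when (= v' v)]
    [e a v']))

;; Iterate over the bindings from step 1 and concatenate the results:
(mapcat (fn [{:syms [?e]}] (lookup-av eav ?e "bar"))
        '[{?e 1, ?id "123"} {?e 4, ?id "456"}])
;; => ([1 :foo/name "bar"])
```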

Ben Sless 13:08:27

btw, how do you handle binding? do you just maintain a replacements map?


Because you’re iterating over the LHS but not the RHS, you want to keep the pattern with the smallest resolution to the left


I don’t. I have a vector that describes the variables [?e ?a] and then each binding is a vector of values that aligns with those column positions


SPARQL’s definition actually describes each line as a map. So, while I represent the above results as:
^{:cols [?e ?a]} [[1 :foo/name]]
SPARQL will say the same thing as:
[{?e 1, ?a :foo/name}]
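The two representations are interchangeable; one way the conversion could look (hypothetical helpers, not from either implementation):

```clojure
(defn rows->maps [rows]
  ;; Expand positional rows into one SPARQL-style binding map per row,
  ;; using the column vector carried in the rows' metadata.
  (let [cols (:cols (meta rows))]
    (mapv #(zipmap cols %) rows)))

(defn maps->rows [cols maps]
  ;; Collapse binding maps back into positional rows, restoring the
  ;; column metadata.
  (with-meta (mapv #(mapv % cols) maps) {:cols cols}))

(rows->maps '^{:cols [?e ?a]} [[1 :foo/name]])
;; => [{?e 1, ?a :foo/name}]
```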


perhaps my terminology was confusing, but I think that's very similar to what I wrote above, assuming "lookup" is basically what I meant by "filtering", right?


I'm doing these queries on nested EDN, rather than indexes, so a "lookup" of a tuple ends up doing a scan of a bunch of data but I'm not too worried about that rn


Yes, a resolution can be done via filtering


The API will look identical, but it’s just slower


Jena started life doing exactly this (which is why it sucked at the time :rolling_on_the_floor_laughing: )


> Because you’re iterating over the LHS but not the RHS, you want to keep the pattern with the smallest resolution to the left

I think this is the part I've been getting hung up on. this is essentially handled by keeping an updated value of ?e that you pass to the next pattern, right?


how do you handle something like:

[:find ?movie ?person
 :where
 [_ :movie/name ?movie]
 [_ :person/name ?person]]
where you have a "disjoint union"?


That’s an outer product


So, [_ :movie/name ?movie] resolves to:
[["Gone with the Wind"] ["Romeo and Juliet"]]
and [_ :person/name ?person] resolves to:
[["Robert Downey Jr"] ["Cate Blanchett"]]
So the result is:
columns: [?movie ?person]
[["Gone with the Wind" "Robert Downey Jr"]
 ["Gone with the Wind" "Cate Blanchett"]
 ["Romeo and Juliet" "Robert Downey Jr"]
 ["Romeo and Juliet" "Cate Blanchett"]]
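The outer product is just a nested loop pairing every left row with every right row; a minimal sketch (`outer-product` is an illustrative name, not an API from any of the engines mentioned):

```clojure
(defn outer-product [left right]
  ;; Pair every row on the left with every row on the right,
  ;; concatenating their column values.
  (for [l left, r right]
    (into l r)))

(outer-product [["Gone with the Wind"] ["Romeo and Juliet"]]
               [["Robert Downey Jr"] ["Cate Blanchett"]])
;; => (["Gone with the Wind" "Robert Downey Jr"]
;;     ["Gone with the Wind" "Cate Blanchett"]
;;     ["Romeo and Juliet" "Robert Downey Jr"]
;;     ["Romeo and Juliet" "Cate Blanchett"])
```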


This works, because if you were to bring in a constraint that bound ?movie and ?person then it would filter those down to the matches


This is why query planning matters. It:
a) reduces the number of bindings to iterate over
b) reduces memory consumption for intermediate results


I’ve had complex queries over large datasets go from 90 seconds down to <100ms with an appropriate plan


nice. who knows, maybe if datascript had a planner I wouldn't be trying to replace it rn? 😂

Ben Sless 14:08:36

does datahike have a planner? :thinking_face:


(I also need fast EQL queries which I would have to implement myself on datascript anyhow)


sorry… EQL?


it's very similar to datomic pull


ah, I see


Planning isn’t too hard if you have an index (meaning that you can get fast counts of resolutions), and if your queries are simple (meaning, joins and filters)
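A toy version of that kind of planner: order the patterns so the smallest resolution comes first. Here the hypothetical `count-matches` scans the db, which defeats the point; with an index the same count would be cheap:

```clojure
(defn count-matches [db pattern]
  ;; How many triples resolve against this pattern? Wildcards and
  ;; ?-variables match anything; constants must be equal.
  (count (filter (fn [triple]
                   (every? (fn [[p v]]
                             (or (= p '_)
                                 (and (symbol? p) (.startsWith (name p) "?"))
                                 (= p v)))
                           (map vector pattern triple)))
                 db)))

(defn plan [db patterns]
  ;; Smallest resolution first, so later patterns iterate over fewer rows.
  (sort-by #(count-matches db %) patterns))
```

With the [e a v] db from the earlier example, [?e ?a "bar"] (1 match) gets ordered ahead of [?e :foo/id ?id] (2 matches).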


It gets harder when you support bindings, or/and/not operators, and so on


For reference, Crux's lazy tuple-at-a-time (multi-way join) execution engine means the performance of the plan comes down to selecting a good variable ordering, which is a complicated thing to do ~well.


Although, to be fair, when doing this in Clojure (b) isn’t as big an issue because your results are usually lazy