This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-08-07
Channels
- # announcements (10)
- # babashka (11)
- # beginners (69)
- # calva (1)
- # cider (2)
- # clj-kondo (35)
- # cljdoc (48)
- # cljs-dev (3)
- # clojure (60)
- # clojurescript (10)
- # community-development (6)
- # cursive (4)
- # datahike (1)
- # datalog (33)
- # deps-new (2)
- # depstar (8)
- # docker (24)
- # fulcro (1)
- # graphql (4)
- # honeysql (5)
- # java (2)
- # leiningen (2)
- # missionary (3)
- # off-topic (104)
- # pedestal (8)
- # polylith (18)
- # portkey (3)
- # reagent (7)
- # reveal (1)
- # rewrite-clj (4)
- # shadow-cljs (19)
- # specter (3)
- # tools-deps (2)
https://gist.github.com/bsless/632b4040a2b2ad7469369f52cd610c06 Something like this as a starting point for a parser
take something like:
[:find ?id
:where
[?e :foo/id ?id]
[?e ?a "bar"]]
I think one way to handle it is:
1. resolve [?e :foo/id ?id]
, filtering the db to find all matching entries: [[1 :foo/id "123"] [4 :foo/id "456]]
2. expand [?e ?a "bar]
to use all ?e
values we currently have:
[1 ?a "bar"]
[4 ?a "bar"]
3. for each new pattern, filter the db to find all matching entries: [[1 :foo/name "bar"]]
4. since we're joining across the entity ID, this is the final resultsI've been thinking of a way to model/phrase it without tying it to a specific implementation
But the more efficient way is usually to:
1. Resolve [?e :foo/id ?id]
to find the matching entries
2. Iterate over the bindings from 1 (bindings of ?e
and ?id
) and use these to rewrite the second constraint. This means updating ?e
. Do a lookup for each rewrite.
3. Concatenate the results of 2
Because you’re iterating over the LHS but not the RHS, you want to keep the pattern with the smallest resolution to the left
I don’t. I have a vector that describes the variables [?e ?a]
and then each binding is a vector of values that aligns with those column positions
SPARQL’s definition actually describes each line as a map.
So, while I represent the above results as:
^{:cols [?e ?a]} [[1 :foo/name]]
SPARQL will say the same thing as:
[{?e 1, ?a :foo/name}]
perhaps my terminology was confusing, but I think that's very similar to what I wrote above, assuming "lookup" is basically what I meant by "filtering", right?
I'm doing these queries on nested EDN, rather than indexes, so a "lookup" of a tuple ends up doing a scan of a bunch of data but I'm not too worried about that rn
Jena started life doing exactly this (which is why it sucked at the time :rolling_on_the_floor_laughing: )
> Because you’re iterating over the LHS but not the RHS, you want to keep the pattern with the smallest resolution to the left
I think this is the part I've been getting hung up on. this is essentially handled by keeping an updated value of ?e
that you pass to the next pattern, right?
how do you handle something like:
[:find ?movie ?person
:where
[_ :move/name ?movie]
[_ :person/name ?person]]
where you have a "disjoint union"?So,
[_ :move/name ?movie]
resolves to: [["Gone with the Wind"] ["Romeo and Juliet"]]
and
[_ :person/name ?person]
resolves to: [["Robert Downey Jr"] ["Cate Blanchett"]]
So the result is:
columns: [?movie ?person]
[["Gone with the Wind" "Robert Downey Jr"]
["Gone with the Wind" "Cate Blanchett"]
["Romeo and Juliet" "Robert Downey Jr"]
["Romeo and Juliet" "Cate Blanchett"]]
This works, because if you were to bring in a constraint that bound ?movie
and ?person
then it would filter those down to the matches
This is why query planning matters. It: a) reduces the number of bindings to iterate over b) reduces memory consumption for intermediate results
I’ve had complex queries over large datasets go from 90 seconds down to <100ms with an appropriate plan
nice. who knows, maybe if datascript had a planner I wouldn't be trying to replace it rn? 😂
(I also need fast EQL queries which I would have to implement myself on datascript anyhow)
Planning isn’t too hard if you have an index (meaning that you can get fast counts of resolutions), and if your queries are simple (meaning, joins and filters)
For reference, Crux's lazy tuple-at-a-time (multi-way join) execution engine means the performance of the plan comes down to selecting a good variable ordering, which is a complicated thing to do ~well: https://github.com/juxt/crux/blob/3e9f48b5ea9fa6815a5ef216ec65059db9219359/crux-core/src/crux/query.clj#L586-L703