Hi guys, I'm using XTDB v1. Is the following intentional? My understanding was that if I use a single variable for :find, the result is (effectively) a set of elements.
If I use :order-by, I get duplicates in the result:
(biff/q db '{:find ?p
:order-by [[?p :desc]]
:in [?user]
:where [[?tx :tx/payee ?p]
[?tx :tx/created-by ?user]]}
"nikhil")
;; => ("Zomato"
;; "Zomato"
;; "Lucky Chan"
;; "Fruit Vendor"
;; "Devanshee"
;; "Cinepolis"
;; "Cinepolis"
;; "Anand")
But if I don't, then, as expected, I get a de-duplicated list:
(biff/q db '{:find ?p
:in [?user]
:where [[?tx :tx/payee ?p]
[?tx :tx/created-by ?user]]}
"nikhil")
;; => ("Fruit Vendor" "Zomato" "Devanshee" "Lucky Chan" "Cinepolis" "Anand")
Yeah, I think I remember having a discussion about :order-by turning off de-duplication and that it's an intended/known behavior. Don't remember the motivation though. Also note that biff/q is adding a (map first ...); if you call xt/q directly (and wrap the :find value in a vector) then the first example will give you an actual set.
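To make the (map first ...) point concrete, here is a small sketch in plain Clojure. It does not call XTDB or Biff; `tuples` is a hard-coded, hypothetical stand-in for a set of 1-tuples, which is the shape assumed here for a query whose :find value is wrapped in a vector.

```clojure
;; Hypothetical stand-in for a set of 1-tuples (hard-coded, not fetched from XTDB).
(def tuples #{["Anand"] ["Zomato"] ["Cinepolis"]})

;; Unwrapping each tuple with (map first ...) yields a lazy seq, so the
;; result is no longer a set even though the input was one.
(def unwrapped (map first tuples))

(set? tuples)    ; true
(set? unwrapped) ; false

;; If a set of plain values is wanted, pour the unwrapped values back into one.
(def as-set (into #{} (map first) tuples))
```

That is why the unwrapped result prints as a list in the REPL rather than as a set literal.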
Hey @nikwarke, it's definitely the intended behaviour - per this PR: https://github.com/xtdb/xtdb/pull/975/files#diff-6c6576892a723f0244c94ef6f4ce3132aa1d8e6280f07a59e2d90c7580eef6a4L2998-R2999
I believe the main reasoning is that deduping over large result sets isn't totally "free". There may be more to it though. I had a quick dig through the internal Slack archives but it seems the decision was discussed live on a video call and all I have is this summary:
> conclusion about bag semantics:
> - q without order-by, limit, offset returns a set (no dupes)
> - q with order-by, limit, or offset returns an ordered vector (dupes)
> - open-q returns a lazy seq (dupes)
James might remember more 🙂 (seeing as he also implemented the prior PR which temporarily https://github.com/xtdb/xtdb/pull/662 here)
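For anyone who wants ordered results without the duplicates, one option is to de-duplicate on the client side after the query returns. A minimal sketch in plain Clojure, with the data hard-coded from the :order-by output earlier in the thread rather than fetched from XTDB (`ordered-payees` is a hypothetical name):

```clojure
;; Stand-in for the result of the :order-by query (hard-coded, not fetched from XTDB).
(def ordered-payees
  ["Zomato" "Zomato" "Lucky Chan" "Fruit Vendor"
   "Devanshee" "Cinepolis" "Cinepolis" "Anand"])

;; The input is sorted, so duplicates are adjacent and `dedupe` is enough.
(def deduped (dedupe ordered-payees))
;; => ("Zomato" "Lucky Chan" "Fruit Vendor" "Devanshee" "Cinepolis" "Anand")

;; `distinct` also works and does not rely on the input being sorted,
;; at the cost of tracking every value it has already seen.
(def deduped-any-order (distinct ordered-payees))
```

Both keep the order of first occurrence, so the sort order from the query survives.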
> I remember having a discussion about :order-by turning off de-duplication
Yes, your discussion (on Zulip) is what kicked off the change in the PR above 😅
I recall the decision not to introduce a breaking change here, but not much more than that, I'm afraid. In those days we also tried to keep compatibility with Datomic, so that likely influenced the call. With open-q (being lazy), on the other hand, we couldn't dedupe the results without maintaining the whole set in memory.
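The memory point can be illustrated with a hand-rolled lazy de-duplication in plain Clojure. This is only a sketch, similar in spirit to clojure.core/distinct and not how XTDB itself is implemented; `lazy-distinct` is a hypothetical helper name.

```clojure
;; Lazily removes duplicates. The `seen` set gains an entry for every distinct
;; value produced so far, so memory grows with the number of distinct results
;; even though the output itself is lazy.
(defn lazy-distinct
  ([coll] (lazy-distinct coll #{}))
  ([coll seen]
   (lazy-seq
     (when-let [[x & more] (seq coll)]
       (if (contains? seen x)
         (lazy-distinct more seen)                        ; skip a repeat
         (cons x (lazy-distinct more (conj seen x))))))))  ; emit and remember

(lazy-distinct [1 1 2 3 2])
;; => (1 2 3)
```

Consuming the whole sequence ends up holding every distinct value in `seen`, which is the cost described above.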
Thanks everyone! I hope this helps someone who is wondering about the same question in future also 😄