#xtdb
2020-06-23
Jacob O'Bryant 08:06:57

I discovered two unexpected things today:
• open-q returns duplicates
• q applies :limit before removing duplicates (makes sense given the first point)
It's not a huge deal now that I know about it, at least since I'm not actually using :limit in my application (I've only been using it during repl exploration). But would it be difficult, or harmful to performance, if duplicates were automatically removed from open-q's results and :limit was applied after removing duplicates? That would be much less surprising IMO. Otherwise, I'd suggest mentioning those things in the docs.
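A minimal sketch of the two behaviours being described, assuming a started Crux node bound to node, crux.api required as crux, and a made-up :person/name attribute:

    (require '[crux.api :as crux])

    ;; q removes duplicates, but :limit is applied first, so a query whose
    ;; underlying results contain duplicates can return fewer than 10 distinct tuples:
    (crux/q (crux/db node)
            '{:find  [?name]
              :where [[?e :person/name ?name]]
              :limit 10})

    ;; open-q streams the raw results; the caller closes the cursor and may see
    ;; duplicate tuples while iterating:
    (with-open [cursor (crux/open-q (crux/db node)
                                    '{:find  [?name]
                                      :where [[?e :person/name ?name]]})]
      (doall (take 10 (iterator-seq cursor))))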

jarohen 09:06:16

Morning 🙂 I think this is for historical reasons - I'll check the context and get back to you

dominicm 09:06:26

Doesn't it make sense given that it's not returning a set?

Jacob O'Bryant 09:06:22

Maybe, but it wasn't obvious to me that's what would happen, despite knowing it didn't return a set. Before I was aware of the behaviour, I assumed that iterating over the return values of q and open-q would yield the same results (with different performance characteristics, obviously), which I think is a reasonable assumption. It does make sense that open-q would include duplicates, since I'm not aware of how you'd remove duplicates without retaining the result set in memory, defeating the purpose of streaming the results (right?). A note in the docs would still be helpful though.
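A sketch of how a caller could deduplicate the streamed results themselves (same assumptions as the earlier sketch): wrapping the cursor's seq in distinct works, but distinct retains a set of every tuple it has already seen, so memory grows with the number of distinct results rather than staying constant — which is exactly the trade-off described above.

    (let [query '{:find  [?name]
                  :where [[?e :person/name ?name]]}]
      (with-open [cursor (crux/open-q (crux/db node) query)]
        (->> (iterator-seq cursor)
             distinct          ;; retains a set of seen tuples
             (take 10)         ;; "limit" applied after deduplication
             doall)))          ;; realise inside with-open, before the cursor closes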

dominicm 10:06:12

I'm not sure if dedupe uses hashes or values to dedupe with. But it could potentially be done.

Jacob O'Bryant 10:06:29

distinct would be the relevant function. It uses values.

jarohen 11:06:07

dedupe, too, if your underlying sequence can guarantee that all equal values are next to each other in the sequence
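For comparison, the two clojure.core options mentioned here:

    (distinct [1 2 1 3 2 3]) ;; => (1 2 3)   - keeps a set of all seen values
    (dedupe   [1 1 2 2 3 1]) ;; => (1 2 3 1) - constant memory, but only removes
                             ;;   adjacent duplicates, so it fully deduplicates
                             ;;   only a sorted (or otherwise grouped) stream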

dominicm 11:06:21

Oh, yeah, distinct :)