Jacob O'Bryant08:06:57

I discovered two unexpected things today:
• open-q returns duplicates
• q applies :limit before removing duplicates (makes sense given the first point)

It's not a huge deal now that I know about it, at least since I'm not actually using :limit in my application (I've only been using it during REPL exploration). But would it be difficult/harmful to performance if duplicates were automatically removed from open-q's results + if :limit was applied after removing duplicates? That would be much less surprising IMO. Otherwise, I'd suggest mentioning those things in the docs.
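To make the ordering issue concrete, here is a minimal sketch using plain Clojure sequences. It does not call q or open-q; the `results` vector is a hypothetical stand-in for a result stream that contains duplicates, and `take` stands in for :limit.

```clojure
;; Hypothetical stand-in for a result stream with duplicates.
;; This is plain data, not actual output from q or open-q.
(def results [[:a] [:a] [:b] [:c] [:c] [:d]])

;; Limit first, then remove duplicates: only 2 rows survive a limit of 3.
(def limit-then-distinct (distinct (take 3 results)))
;; => ([:a] [:b])

;; Remove duplicates first, then limit: 3 distinct rows, as one might expect.
(def distinct-then-limit (take 3 (distinct results)))
;; => ([:a] [:b] [:c])
```

With the first ordering, a limit of 3 can yield fewer than 3 rows even though more distinct rows exist, which is the surprising behaviour described above.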


Morning 🙂 I think this is for historical reasons - I'll check the context and get back to you


Doesn't it make sense given that it's not returning a set?

Jacob O'Bryant09:06:22

Maybe, but it wasn't obvious to me that's what would happen, despite knowing it didn't return a set. Before I was aware of the behaviour, I assumed that iterating over the return values of q and open-q would yield the same results (with different performance characteristics, obviously), which I think is a reasonable assumption. It does make sense that open-q would include duplicates, since I'm not aware of how you'd remove duplicates without retaining the result set in memory, defeating the purpose of streaming the results (right?). A note in the docs would still be helpful though.


I'm not sure whether dedupe compares hashes or values. But it could potentially be done.

Jacob O'Bryant10:06:29

distinct would be the relevant function. It uses values.


dedupe, too, if your underlying sequence can guarantee that all equal values are next to each other in the sequence
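A small sketch of the difference between the two clojure.core functions, using made-up data rather than query results. Both compare by value; `distinct` has to remember every value it has seen, while `dedupe` only compares each item with the previous one, so it is only a full de-duplication when equal values are adjacent. Sorting is used here purely to create that adjacency for illustration.

```clojure
;; Hypothetical data in which equal values are not all adjacent.
(def unsorted [1 2 1 3 3 2])

;; distinct tracks every value seen so far, so memory grows with
;; the number of distinct values in the sequence.
(def by-distinct (distinct unsorted))
;; => (1 2 3)

;; dedupe only drops consecutive duplicates and keeps no history,
;; so non-adjacent repeats survive.
(def by-dedupe (dedupe unsorted))
;; => (1 2 1 3 2)

;; When equal values are guaranteed to be adjacent (here via sort),
;; dedupe gives the same result as distinct.
(def sorted-dedupe (dedupe (sort unsorted)))
;; => (1 2 3)
```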


Oh, yeah, distinct :)