#sql
2020-12-10
nikp 13:12:47

Ah thanks Sean! I did indeed misunderstand. Is there any way of realizing the result set / running the builder-fn in parallel? It may be that there's no benefit to that, but just curious at this point

seancorfield 17:12:29

@nikp There would be no benefit, since you can't build rows until after you've read them in, and you can't start concatenating chunks of the result set until you've read at least two chunks. Because of resource management, the processing of result sets has to be eager (in Clojure) even if the actual reading of rows is "lazy" against the database. If you're using plan for "lazy" reading (i.e., result set streaming) because you can't fit it all in memory, then your reducing function must eagerly produce smaller data structures (or at least send those chunks elsewhere for processing). If streaming results doesn't matter (i.e., the full result set will fit in memory), plan is still faster when you can avoid realizing rows as Clojure data structures.
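
A minimal sketch of that pattern, assuming a hypothetical `ds` datasource and an `orders` table with an `amount` column (the exact streaming options needed vary by JDBC driver):

```
(require '[next.jdbc :as jdbc])

;; `plan` returns a reducible: the reducing function runs eagerly while rows
;; stream from the database, and only the running total is kept in memory.
(defn total-amount [ds]
  (reduce
    (fn [total row]              ; `row` is a thin wrapper over the current ResultSet row
      (+ total (:amount row)))   ; column access does not build a Clojure hash map
    0
    (jdbc/plan ds ["select amount from orders"]
               {:fetch-size 1000
                :concurrency :read-only
                :result-type :forward-only})))
```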

nikp 17:12:16

I see, so result set processing has to be eager. That clarifies things, thanks

nikp 17:12:14

plan makes a lot of sense for row-by-row processing / large data that can't fit in memory, or for parallel processing

seancorfield 17:12:04

plan makes sense in any situation where you don't actually need the rows as realized Clojure hash maps, i.e., where you can process the columns without needing to create an actual "row".
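
A small illustration of that, assuming a hypothetical `users` table: inside plan, columns are read straight off the ResultSet by keyword, so no hash map is built unless you explicitly ask for one:

```
(require '[next.jdbc :as jdbc]
         '[next.jdbc.result-set :as rs])

;; read just two columns per row; no realized "row" maps are created
(into []
      (map (juxt :name :email))
      (jdbc/plan ds ["select name, email from users"]))

;; if a fully realized row is genuinely needed inside the reduction:
;; (rs/datafiable-row row ds opts)
```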

seancorfield 17:12:49

execute! and execute-one! are eager as well. There's no way to produce a lazy sequence of data in Clojure from any SQL query in next.jdbc (because of connection management).
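
For illustration, with a hypothetical `users` table -- both calls run the query and fully realize their result before returning, so the connection can be released immediately afterwards:

```
(require '[next.jdbc :as jdbc])

(jdbc/execute! ds ["select * from users where active = ?" true])
;;=> [{:users/id 1, :users/name "Ann"} {:users/id 2, :users/name "Bo"} ...]

(jdbc/execute-one! ds ["select * from users where id = ?" 1])
;;=> {:users/id 1, :users/name "Ann"}
```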

seancorfield 17:12:33

We use plan a lot at work because we tend to set up a lot of our data processing in pipelines via transduce.
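
A rough sketch of that transduce style, again with a hypothetical `orders` table; the whole transducer stack runs inside the reduction over plan, so rows are consumed as they stream in:

```
(require '[next.jdbc :as jdbc])

(transduce
  (comp (filter #(= "EUR" (:currency %)))   ; plain column reads, no realized rows
        (map :amount))
  +
  0
  (jdbc/plan ds ["select currency, amount from orders"]))
```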

nikp 17:12:20

Definitely can see a case for that. We have a flow right now where we load the entire dataset into memory and use it in multiple pipelines, instead of having multiple queries for the dataset. I was trying to see if there was a way to speed up the query time to load the dataset into memory

seancorfield 18:12:10

That would depend on what the bottleneck is. Is it really the actual Java ResultSet processing in JDBC, or is it in the Clojure code that builds data structures from it? If you use execute! you pay the cost of realizing the entire ResultSet into memory as a vector of hash maps in Clojure. If you use plan, you can usually avoid all of that overhead -- and you have the possibility of overlapping at least some of the data processing. But it usually involves turning your data processing "inside out" so it all runs inside a reducing pipeline / transducing pipeline, rather than the (Clojure) lazy way of reading data and then piping it through multiple (lazy) processing steps.
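
A rough sketch of that "inside out" rewrite, with a hypothetical `events` table. The lazy style realizes everything up front and then pipes it through sequence steps; the plan style runs the same steps inside the reduction, so the vector of hash maps is never built:

```
(require '[next.jdbc :as jdbc])

;; eager realization + lazy seq processing
(->> (jdbc/execute! ds ["select * from events"])
     (filter #(= "click" (:events/type %)))
     (map :events/user_id)
     (into #{}))

;; same result via plan + a transducing pipeline
(into #{}
      (comp (filter #(= "click" (:type %)))
            (map :user_id))
      (jdbc/plan ds ["select * from events"]))
```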

nikp 18:12:08

It's the Clojure code that creates the hash maps, from the look of things. I'll double-check again later with a profiler

seancorfield 18:12:13

Yeah, it can be surprising just how much of an overhead that can be for large result sets...