sql

2023-10-30T12:38:03.998969Z

I have a query that returns events in a particular order allowing me to (partition-by :match-id) to get the events in nice workable batches. Previously, this query was running against athena and the results were generated by an iteration and I was able to generate a lazy sequence and give that to the partition-by call. Now I’m trying to do the same thing with jdbc.next. plan gives you an IReducible right? Is there a way I can take that reducible, and return a lazy sequence of the rows in batches, partitioned by some column in each row (assuming the rows are ordered by that column in the query).

2023-10-30T13:21:43.081979Z

Hm, I thought this might work…

(transduce
   (partition-by :match_id)
   (fn
     ([] nil)
     ([acc] acc)
     ([acc m]
      (prn :processing-match (:match_id (first m))
           :first-row (first m))))
   (jdbc/plan ds ["select match_id, match_level_id
                     from read_parquet('/tmp/arqam-01.parquet')
                    where match_id in (1119060, 1119640)
                    order by match_id, match_level_id"]))
…but it prints this…
:processing-match 1119640 :first-row {:match_id 1119640, :match_level_id 1}
:processing-match nil :first-row {row} from `plan` -- missing `map` or `reduce`?

2023-10-30T13:50:25.764879Z

Hm, this actually works as expected if I pass a literal vector as the reducible.

(transduce
   (partition-by :match_id)
   (fn
     ([] nil)
     ([acc] acc)
     ([acc m]
      (let [events (map #(select-keys % [:match_id :match_level_id]) m)]
        (prn :processing-match (:match_id (first events))
             :first-row (first events)))))
   [{:match_id 1 :match_level_id 1}
    {:match_id 1 :match_level_id 2}
    {:match_id 2 :match_level_id 1}
    {:match_id 2 :match_level_id 2}])

:processing-match 1 :first-row {:match_id 1, :match_level_id 1}
:processing-match 2 :first-row {:match_id 2, :match_level_id 1}
:processing-match 1 :first-row {:match_id 1, :match_level_id 1}
:processing-match 2 :first-row {:match_id 2, :match_level_id 1}

2023-10-30T15:03:08.753299Z

This if you add a map transducer which does the select keys before the partition it will work

2023-10-30T15:06:09.095299Z

sweet! that worked. cheers

2023-10-30T15:07:15.881599Z

There is a Note at the end of this section about it https://github.com/seancorfield/next-jdbc/blob/develop/doc/getting-started.md#plan--reducing-result-sets