Fork me on GitHub
#sql
<
2020-05-22
>
geraldodev19:05:17

@seancorfield jdbc/plan ... {:builder-fn rs/as-arrays} does not return the name of the columns as the 1st item like (execute! ..as-arrarys). Is that intended ?

geraldodev19:05:32

I've noticed this because I was generating excel files with execute! when I've made the code more resilient to low memory with plan I've noticed the lack of names.

seancorfield19:05:44

@geraldodev jdbc/plan only uses the RowBuilder half of a result set builder -- it doesn't know anything about the result set as a whole.

seancorfield19:05:40

The docs should probably point that out (but I hadn't even thought about someone using plan with array builders to be honest). It's "intentional" that you don't get the column names but I can see how it would be convenient to be able to get that information... Hmm, once as-arrays has actually been called you can get the column names from the object it returns... give me a second...

geraldodev20:05:52

I'm a recent convert to as-arrays to build spreadshets, before I used juxt. It's unnecessary if the qry reflects what you want to output if you use as-arrays. The argument to inclusion I'd make is that this information is already included when the result is a map and that would match the behaviour of the as-arrays flag on other places .

seancorfield21:05:08

OK, I think I can see how to expose the column names in the reduce over an array builder with plan... I'll have to write some tests and verify how it all works now. I'll most likely add a new protocol to make this cleaner. TL;DR: a result set builder (record) has fields for the result set, the result set metadata, and the column names but there's currently no easy way to access that part of the builder via the "mapified" result set.

parrot 4
seancorfield21:05:52

@geraldodev Please add comments to https://github.com/seancorfield/next-jdbc/issues/110 describing what you're doing with arrays and spreadsheets and plan so I have more context on how and where you would use the vector of column names (labels) if you had access to them.

seancorfield22:05:39

Thanks for the additional context @geraldodev -- that's an interesting use case. One of the problems here is that you're actually "losing" the whole builder context by realizing the full row (that first (map #(into [] %)) so by the second step of your pipeline, you're dealing with pure data and you no longer have access to the result set or metadata so whatever solution I come up with here is going to need a different solution shape.

seancorfield22:05:15

One thing I can do is to have the mapified-result-set implement clojure.lang.Indexed because columns can be accessed via index (essentially nth) but I'm not sure how much it helps... What is your idx-v-fn ?

geraldodev22:05:45

The #(into [] %) was necessary to get rid of not available statement. I could go to plan with map, but even with that, I'd have to compute the order of the fields at the start of the pipeline, and store in a atom.

seancorfield22:05:58

Right, you're missing the point.

geraldodev22:05:34

(defn idx-v-fn [idx v] [idx v])

seancorfield22:05:04

The row passed into the reduction (`transduce` in your case) is an abstraction. By calling seq on it -- implied by calling into -- you've blown away that abstraction and that's put you on the slow path.

seancorfield22:05:36

Ah, so idx-v-fn is just vector

seancorfield22:05:52

(map-indexed vector) would do the same thing.

👍 4
geraldodev22:05:09

Yes, I need indexes to craft excel spreadsheet

seancorfield22:05:00

Yeah, you're really working against the whole idea behind plan at this point.

geraldodev22:05:47

I need plan because I dont know the size of the table in advance. I need it to get the data in chunks.

seancorfield22:05:55

Right, I understand that you need streaming and you need to reduce on that so you can have chunking.

seancorfield23:05:18

I'll have to think about this but it would be fairly easy for me to extend the "API" that mapified-result-set exposes (which is currently just a hash map that is also a DatafiableRow) to make it possible to get the row index (the underlying result set already knows that) as well as making each row Indexed (again, because the result set can be accessed by label or by index) and maybe a few other things that will make this a lot easier for you.

seancorfield23:05:31

The vector of column names is also available "under the hood" but it's just not exposed anywhere.

seancorfield23:05:19

In order to take advantage of that, you'd need a way to build chunked sequences of vectors that stayed inside the abstraction which is going to be the hard problem to solve here.

geraldodev23:05:30

Remember that Oracle is not Datafiable friendly 🙂 But as you said the column names are already available.

seancorfield23:05:38

I'm not sure what you mean about "Oracle is not Datafiable friendly"?

seancorfield23:05:21

Oracle has nothing to do with the abstraction I'm talking about.

seancorfield23:05:31

If you're using an array builder, datafiable-row will produce a vector of column values -- no column names involved (and no table names either -- I know Oracle doesn't give you the table names but that's not relevant here).

geraldodev23:05:41

Yes they lack getTableName , not related to issue , please disregard the comment

seancorfield23:05:53

(I just checked using REBL and the whole datafy / nav machinery works just fine on result sets built as arrays so that basic stuff is fine -- I had not tested that before)

seancorfield23:05:34

Hmm, working through this locally, partition-all is the real problem here because it requires that the rows be realized (because partition-all is a stateful transducer and the reduction is already happening over a stateful object -- the ResultSet). So what you need is a way to produce rows (vectors of column values) efficiently that would also give you the row number and the column names -- I'm thinking as metadata -- and then that should "survive" the trip through partition-all.

geraldodev23:05:54

That would be very nice to have

seancorfield23:05:35

I have part of it working locally. I need to shift gears and move that p.o.c. into a proper test context so I can play with it in more depth.

seancorfield23:05:51

The TL;DR is that I'll switch MapifiedResultSet from an interface to a protocol and have it support row-number and column-names and implement those via metadata on the result of datafiable-row.

seancorfield23:05:44

Your code would become:

(transduce
  (comp
    (map #(rs/datafiable-row % conn {}))
    (partition-all 100))
  (completing
    (fn [_ lote]
      (doseq [v lote]
        (row-number v) ; returns 1-based result set row number
        (column-names v) ; returns vector of column names in the same order as the data values in v
        ...)))
  (jdbc/plan conn [...] {:builder-fn rs/as-arrays}))

seancorfield23:05:22

You wouldn't need map-indexed since you could get the row number from the (realized) row anyway.

geraldodev23:05:42

Very good abstraction.

seancorfield23:05:34

There are some interesting DB-specific restrictions in play with this, I've just discovered. Apache Derby only allows .getRow when scroll cursors are used. And introducing the dereferencing of the builder in datafiable-row seems to throw exceptions if you use plan incorrectly (I have tests for several types of plan misuse and they now blow up instead of giving you something more predictable, but that's easy to fix 🙂 )