Fork me on GitHub
#sql
<
2019-04-22
>
ikitommi01:04:44

some news from my porsas spike: 1) Found a (few times) faster way to do ResultSet->Map conversion 2) new create-query, which just takes the mapping options and return a normal query function, which creates the actual ResultSet->Map converter on the ~first invocation and stores it in a local cache of sql->ResultSet->Map. 3) new query which looks ~like next.jdbc query, but behind the scenes uses the create-query way

ikitommi01:04:08

so, there is no more separate compile-phase, it’s done behind the scenes.

ikitommi01:04:26

using normally:

(p/query connection "SELECT * FROM fruit")

ikitommi01:04:30

using the caching version:

;; initialize once per application
(def query (p/create-query))

(query connection "SELECT * FROM fruit") ;; "slow"
(query connection "SELECT * FROM fruit") ;; "fast"
(query connection "SELECT * FROM fruit") ;; ...

ikitommi02:04:33

quick perf test claims that the non-caching version is just bit faster that next.jdbc (which is many times faster than java.jdbc), but the cached versions are much faster with all settings.

ikitommi02:04:47

(defn perf-test []

  ;; 630ns
  (title "java")
  (bench! (java-query connection "SELECT * FROM fruit"))

  ;; 630ns
  (let [query (p/create-query {:row (p/rs->map)})]
    (title "porsas: compiled & cached query")
    (bench! (query connection "SELECT * FROM fruit")))

  ;; 1400ns
  (let [query (p/create-query)]
    (title "porsas: cached query")
    (bench! (query connection "SELECT * FROM fruit")))

  ;; 2100ns
  (title "porsas: dynamic query")
  (bench! (p/query connection "SELECT * FROM fruit"))

  ;; 3500ns
  (title "next.jdbc")
  (bench! (jdbc/execute! connection ["SELECT * FROM fruit"]))

  ;; 6600ns
  (title "java.jdbc")
  (bench! (j/query {:connection connection} ["SELECT * FROM fruit"])))

ikitommi02:04:06

@seancorfield would you be interested in supporting the two-phase appoach of porsas in next.jdbc?

ikitommi02:04:03

just pushed all codes in. also, comments on the approach welcome.

ikitommi02:04:29

if there would be any support for cache of sql->ResultSet->Map, it should be bounded and inspectable.

ikitommi02:04:09

e.g. PersistentArrayMap has a constructor, which takes the initial values as an array. Creating a map with 5 fields drops from 300ns to 130ns on my macbook.

ikitommi02:04:15

with the reduce+assoc, the impl needs to compare keys to see if it’s a new value or not. there should not be duplicates in ResultSets I guess.

seancorfield02:04:25

Unless you join across multiple tables and don't use aliases in SQL to make them unique.

seancorfield02:04:44

clojure.java.jdbc has renaming code to make column names unique. In next.jdbc I've avoided that as a deliberate tradeoff -- but it shouldn't be needed when qualified keywords are used.

ikitommi02:04:34

tested what happens if there are duplicates: all get into the map. get returns the first, but with merge the last stays. kinda dangerous, if there could be duplicates.

ikitommi02:04:56

(keys result)
; (:ID :NAME :APPEARANCE :GRADE :GRADE)

seancorfield02:04:14

Your rs->map is a bit of a straw man since next.jdbc uses a transient hash map (if you're building rows). But your weird array map approach might well still be faster -- and it's extensible (in next.jdbc so you could implement it yourself anyway).

ikitommi02:04:42

oh, didn’t notice, there is an option for it?

ikitommi02:04:20

the two-phase is fast because the resultsetmeta & column reading needs to do only once. But it also assumes that the tables don’t change while the application is running. Which is kinda bad assumption (but this is how the staticly typed jdbc libs all need to work, and are fast because of that)

seancorfield02:04:10

That's an interesting optimization. Pretty sure you could build that with the open builder machinery. Have a read of the docs https://cljdoc.org/d/seancorfield/next.jdbc/1.0.0-alpha8/doc/readme and try it out.

👍 4
seancorfield02:04:28

And, yes, there's an official release now.

seancorfield02:04:21

I haven't announced it yet. Will do that later tonight. There are still caveats about the final resting place and therefore the group/artifact ID.

ikitommi02:04:10

btw, protocol methods can have docs too:

(defprotocol RowBuilder
  "Protocol for building rows in various representations"
  (->row [_] "called once per row to create the basis of each row")
  ...)

seancorfield02:04:17

It links to the tests that have an example.

seancorfield02:04:32

Based on your record builder 😁

ikitommi02:04:09

found the link, looks… nice 😉

seancorfield02:04:28

BTW, I'm out at dinner and on my phone so detailed responses are hard.

seancorfield02:04:16

And, yes, I know about docs on protocol methods -- but they're documented thoroughly at the namespace level.

seancorfield02:04:09

I plan to keep expanding the docs including adding a full guide on datafy / nav in this context.

ikitommi02:04:10

have a nice dinner. need to go too, but the quick guess is that the porsas-style cache can’t be added to the current impl of the rowbuilders as the cache is sql-string -> ResultSet -> Value, would need the String to get the cache working. Your current impl starts with ResultSet. Not sure if there is anything from get the cache id…

ikitommi02:04:23

pass the sql-string as extra arg to :gen-fn?

ikitommi02:04:46

(or the sql-vec)

seancorfield02:04:44

You could handle it through extra options. In the general case -- via an arbitrary PreparedStatement you can't get the SQL string anyway I think?

ikitommi09:04:48

I think you can't have get that out. I would add the sqlvec as extra argument to the :row-fn, so the custom compilers could use it. Extra ags is perf-wise basically zero, but would allow one to build a caching (or logging!) rowcompiler.

ikitommi09:04:59

And congrats on the release!