#sql
2020-03-29
J15:03:16

Hello 🙂 I’m playing with next.jdbc (really a big fan of the library). I was wondering, when I use (nav row :relation value) on a vector of rows, each nav calls the database, right? So in the case of a large amount of data, isn’t it better to fetch all the relations by id in one request?

seancorfield17:03:25

@jean.boudet11 Can you provide a bit more context around your question? I'm not sure what you're really asking.

J18:03:25

I have the relation A.B_id = B.id. When I execute select * from A, I want to inject B into each row. I have two options:
• Get the B ids with (map :B_id rows) and make one request to fetch all the B rows (then inject each B into its row)
• Or, for each row: (map #(nav % :B_id (:B_id %)) rows)
What's the best choice for a large number of rows? Maybe the first one, if nav hits the database on each call.
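
A minimal sketch of that first option, assuming a datasource ds, hypothetical tables a and b, and next.jdbc's default table-qualified column keys:

(require '[next.jdbc :as jdbc]
         '[clojure.string :as str])

;; fetch all the A rows, then all the related B rows in a single query
;; (assumes there is at least one b_id)
(let [as    (jdbc/execute! ds ["select * from a"])
      b-ids (distinct (map :a/b_id as))
      bs    (jdbc/execute! ds (into [(str "select * from b where id in ("
                                          (str/join "," (repeat (count b-ids) "?"))
                                          ")")]
                                    b-ids))
      by-id (into {} (map (juxt :b/id identity)) bs)]
  ;; inject each B into its A row
  (mapv #(assoc % :b (by-id (:a/b_id %))) as))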

seancorfield18:03:08

`datafy`/`nav` are really intended for use by tools, not by "users". Besides, in order to use `nav` you should first call `datafy` on a thing.
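
A quick sketch of that flow, assuming a hypothetical table a with a b_id column (next.jdbc's default schema convention treats a *_id column as a foreign key into that table's id):

(require '[next.jdbc :as jdbc]
         '[clojure.datafy :refer [datafy nav]])

(let [row (first (jdbc/execute! ds ["select * from a"]))
      d   (datafy row)]            ; datafy the row first
  ;; nav issues a query to fetch the related b row
  (nav d :a/b_id (:a/b_id d)))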

seancorfield18:03:22

Why not just use a SQL JOIN?
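
A sketch of the JOIN version for the same hypothetical tables; with the default builder the columns come back qualified by table, so a's and b's columns stay distinguishable:

(jdbc/execute! ds ["select a.*, b.* from a join b on a.b_id = b.id"])
;; => rows containing both :a/... and :b/... keys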

johnj18:03:40

Why would (time (jdbc/execute! ds ["select * from products"])) take 20ms in the REPL but print 300ms from the CLI?

seancorfield18:03:27

@lockdown- What is ds? Perhaps in the REPL you've already opened a connection to the DB prior to timing this, but in the CLI your timing includes opening the connection (which is likely the expensive part).

johnj19:03:22

ds is (def ds (jdbc/get-datasource db))

johnj19:03:10

In both cases I'm just timing the execute! expression, as pasted before

seancorfield19:03:01

Hard to say, without a lot more information/context. Something has to be different about the code you're running, or the way you're timing it, or something in the environment.

johnj19:03:06

this is literally all the code

johnj19:03:43

(require '[next.jdbc :as jdbc])

(def db {:dbtype "postgres"
         :dbname "xx"
         :user "xx"
         :password "xx"})

(def ds (jdbc/get-datasource db))

(time (jdbc/execute! ds ["select * from products"]))

seancorfield19:03:58

Duplicate that last line so it runs twice. See if the timings are different on repeated execution. Run it via the CLI again.
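
That is, time the same call twice in the script:

(time (jdbc/execute! ds ["select * from products"]))
(time (jdbc/execute! ds ["select * from products"]))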

johnj19:03:44

@seancorfield yep, the second line is now 30ms, what is getting cached there?

johnj19:03:27

the JVM warming up the Clojure code?

seancorfield19:03:16

More likely to be the initial JDBC driver setup of the connection to the DB.

johnj19:03:41

anyway, ~30-50ms still seems very high; in psql it takes <1ms. Do you know if this overhead is normal? the driver or somewhere else?

seancorfield19:03:40

Well, you're still standing up a connection on each call to execute!.

seancorfield19:03:46

The initial code path through JDBC involves the driver manager loading and initializing the specific JDBC classes needed, and setting up everything needed for connections to the database -- that's the big 300ms vs 30ms chunk of time.

seancorfield19:03:08

But you're not using a connection pool so each execute! is causing a new connection to be created and closed.

seancorfield19:03:29

In real code, you'd want to use HikariCP or c3p0 (that's all described in the next.jdbc docs) and then, once the first few connections are set up, subsequent operations become a lot faster because they can reuse previously created connections, just taking them out of the pool and putting them back in.
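
A minimal pooling sketch along the lines of the next.jdbc docs, assuming HikariCP is on the classpath (note that HikariCP expects :username where plain next.jdbc uses :user):

(require '[next.jdbc :as jdbc]
         '[next.jdbc.connection :as connection])
(import 'com.zaxxer.hikari.HikariDataSource)

(def db {:dbtype "postgres" :dbname "xx"
         :username "xx" ; HikariCP wants :username, not :user
         :password "xx"})

;; one pooled datasource for the life of the app
(def ^HikariDataSource pool (connection/->pool HikariDataSource db))

(time (jdbc/execute! pool ["select * from products"]))
;; (.close pool) on shutdown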

johnj19:03:54

gonna give it a try

seancorfield19:03:37

The other thing is that execute! realizes the whole ResultSet and builds Clojure data structures for it. That can be quite an overhead for large result sets. That's why plan exists, so you can sidestep all that realization and data construction.
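
A small sketch of plan, assuming a hypothetical numeric price column; the rows are consumed during the reduce without ever being built into Clojure hash maps:

(require '[next.jdbc :as jdbc])

(reduce (fn [total row] (+ total (:price row))) ; column access without realizing the row
        0
        (jdbc/plan ds ["select price from products"]))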

johnj19:03:50

that may well be it. In the code above, is the object that (jdbc/get-datasource db) creates made anew every time execute! is called?

johnj19:03:34

can it be cached?

seancorfield19:03:10

You are "caching" the data source object -- it is computed once and stored in the ds Var.

seancorfield19:03:45

But execute! will call .getConnection on that datasource object and that is what sets up a new connection to the database (and then it is closed at the end of the operation).

johnj19:03:37

I see, thanks!

seancorfield20:03:14

If you don't want to go all the way to connection pooling, but you want to run multiple statements without the overhead each time, look at the section of the docs on reusing connections.
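
That section boils down to something like this: take one connection and reuse it for several statements.

(require '[next.jdbc :as jdbc])

;; one physical connection, closed automatically at the end of the block
(with-open [conn (jdbc/get-connection ds)]
  (jdbc/execute! conn ["select * from products"])
  (jdbc/execute! conn ["select * from products"]))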

seancorfield20:03:05

(Connection pooling is described just a bit further on in that Getting Started page)

johnj20:03:28

tried both Hikari and reusing the connection with next.jdbc/get-connection, and now I'm getting ~1ms when timing

seancorfield20:03:26

That sounds about right.