#sql
2024-04-22
sheluchin 16:04:03

I have an EDN file with ~20k maps. Some of the maps are very large while others are small. I want to insert these as JSON into a Postgres table. I have some code that works, but it takes a very long time to process. I'm using batch transactions to write, controlling the batch size with partition-all in my transducer. Can anyone suggest how I might optimize this process? Partitioning the records helps, but it's still slow, and since some maps are huge, picking a good partition size up front isn't ideal.

igrishaev 17:04:04

Did you try COPY IN from stdin with CSV and CopyManager? I believe it will be much faster than INSERT.

igrishaev 17:04:45

Encode your EDN to CSV with a single JSON column, then copyIn using the CopyManager from the PostgreSQL JDBC driver.
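
A minimal sketch of that COPY approach, assuming a hypothetical docs table with a single jsonb payload column, and cheshire plus clojure.data.csv for the encoding:

(ns example.copy-in
  (:require [cheshire.core :as json]
            [clojure.data.csv :as csv]
            [next.jdbc :as jdbc])
  (:import (java.io StringReader)
           (org.postgresql PGConnection)))

;; Sketch only: the table/column names (docs, payload) are assumptions.
(defn copy-in-json!
  "Write each map as one CSV row holding a single JSON document, then
  stream the whole thing to Postgres via COPY."
  [datasource maps]
  (with-open [conn (jdbc/get-connection datasource)]
    (let [csv-str (with-out-str
                    (csv/write-csv *out*
                                   (map (fn [m] [(json/generate-string m)]) maps)))
          mgr     (.getCopyAPI ^PGConnection (.unwrap conn PGConnection))]
      ;; copyIn returns the number of rows written
      (.copyIn mgr
               "COPY docs (payload) FROM STDIN WITH (FORMAT csv)"
               (StringReader. csv-str)))))

This builds the whole CSV string in memory; for very large inputs you could stream it through a java.io.PipedReader instead.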

seancorfield 19:04:28

Unless you specify various settings, you may not get batch inserts -- the next.jdbc docs talk about this. Can you show the code of how you are doing the inserts and the settings/options you are passing into the connection setup etc?

seancorfield 19:04:32

From the docs: "Even when using next.jdbc/execute-batch!, PostgreSQL will still send multiple statements to the database unless you specify :reWriteBatchedInserts true as part of the db-spec hash map or JDBC URL when the datasource is created." https://cljdoc.org/d/com.github.seancorfield/next.jdbc/1.3.925/doc/getting-started/tips-tricks#batch-statements-2

seancorfield 19:04:14

See also https://cljdoc.org/d/com.github.seancorfield/next.jdbc/1.3.925/api/next.jdbc.sql#insert-multi! (from its docs): "If called with :batch true will call execute-batch! - see its documentation for situations in which the generated keys may or may not be returned as well as additional options that can be passed. Note: without :batch this expands to a single SQL statement with placeholders for every value being inserted -- for large sets of rows, this may exceed the limits on SQL string size and/or number of parameters for your JDBC driver or your database!"

sheluchin 22:04:18

@U04V70XH6

(defstate datasource
  :start
  (connection/->pool HikariDataSource
                     (-> config/config
                         :com.example.databases/postgresql
                         :db-spec
                         (rename-keys {:user :username})
                         (assoc :reWriteBatchedInserts true)))
  :stop (.close datasource))

(jdbc.sql/insert-multi! ds table hash-maps (merge {:batch true} opts))
Does that add up?

sheluchin 22:04:41

@U1WAUKQ3E thank you for the tip. I'm going to try it out.

seancorfield 00:04:52

Yes, adding that rewrite option should help, and the :batch true in the insert.

sheluchin 00:04:53

I think my data is just too big and it's about as fast as it gets. Even encoding it to CSV with one JSON str column as suggested is taking a long time. Thanks anyhow. I think it's not really related to my DB interactions.

seancorfield 00:04:20

OK. We tried! 🙂

❤️ 1
theequalizer73 20:04:55

Hi friends, I need to write an audit record for every update and insert using next.jdbc, and it must be performed within a transaction. I’m wrapping my connection with jdbc/with-logging, but I’ve noticed it won’t work for transactions.

(extend-protocol p/Transactable
  SQLLogging
  (-transact [this body-fn opts]
    (p/-transact (:connectable this) body-fn
                 (merge (:options this) opts))))
Does anyone have a solution for this?

seancorfield 20:04:30

Per the docs, you need to rewrap the Connection inside with-transaction.
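
A minimal sketch of that rewrap, where my-sql-logger and my-result-logger stand in for whatever logging fns were passed to with-logging (hypothetical names):

;; with-transaction hands you a plain java.sql.Connection (tx), so rewrap
;; it before running SQL if the logging should apply inside the transaction.
(jdbc/with-transaction [tx ds]
  (let [tx (jdbc/with-logging tx my-sql-logger my-result-logger)]
    (sql/insert! tx :some-table {:col "val"})))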

theequalizer73 20:04:14

I created wrappers for sql/insert! and sql/update! with the same parameters; the wrappers use with-transaction to write the audits. My question is: will with-logging work with with-transaction?

seancorfield 21:04:49

Hard to know what you're talking about without seeing your code. I don't know why you have wrappers for insert/update; that seems to have nothing to do with what I'm talking about.

theequalizer73 22:04:13

You are absolutely right. By tomorrow I'll write some simplified examples of what I have now, to better explain what I need. Thank you.

theequalizer73 15:04:17

GM. Basically I have a mount component for the db which looks like this:

(mount/defstate db
  :start
  (jdbc/with-options
    (with-logging
      (connection/->pool HikariDataSource {:jdbcUrl  jdbcUrl
                                           :username user
                                           :password password})
      custom-logging-options)
    custom-options))
And I created a few functions to add audits to the regular next.jdbc sql/update! and sql/insert!:
(defn insert-with-audits!
  [db table key-map opts]
  (let [opts  (merge (:options db) opts)
        audit (build-audit-fn key-map)]
    (if (valid-audit? audit)
      (with-open [conn (jdbc/get-connection (:connectable db))]
        (jdbc/with-transaction [tx conn]
          (sql/insert! tx :audits-table audit opts)
          (sql/insert! tx table key-map opts)))
      (throw (Exception. "my custom error message")))))
I’m not able to log my inserts and updates anymore, only my selects. What are the limitations for logging transactions? Is there a solution for this that I can apply to my mount component or extend a protocol somewhere?

seancorfield 15:04:03

The documentation talks about this. When you use with-transaction, you get a plain Java Connection object back -- tx above -- and you need to wrap that with logging/options.
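
Applied to insert-with-audits! above, that might look like this sketch (keeping the names from the snippet; custom-logging-options is the same placeholder logger arg as in the mount component):

(defn insert-with-audits!
  [db table key-map opts]
  (let [opts  (merge (:options db) opts)
        audit (build-audit-fn key-map)]
    (if (valid-audit? audit)
      (with-open [conn (jdbc/get-connection (:connectable db))]
        (jdbc/with-transaction [tx conn]
          ;; tx is a plain java.sql.Connection; rewrap it so the audit and
          ;; data inserts inside the transaction are logged too
          (let [tx (jdbc/with-logging tx custom-logging-options)]
            (sql/insert! tx :audits-table audit opts)
            (sql/insert! tx table key-map opts))))
      (throw (Exception. "my custom error message")))))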

theequalizer73 20:04:56

Sorry for bothering you with documented stuff; it's there in the with-logging docstring. I know what I need to do now and it's working. Thank you!