#sql
2021-05-09
Dave Suico14:05:33

Hi guys, I have 2.7k rows of CSV data and a function that reads the rows and inserts them into the database. The reading is fast, but jdbc/insert-multi! takes very long. The Postgres database is in Ireland and I'm in the Philippines, and my client expects those rows to be inserted in less than a minute. Is there a better way to implement this so the inserts are fast?

dharrigan14:05:25

I'm not familiar with the older jdbc library, but have you done the usual stuff, like turn off indexing, disable triggers (and assuming that this is done within a single transaction, remove foreign keys, insert, then re-add foreign keys)?

👍 3
Dave Suico14:05:42

oh yeah I just solved it now by using pgcopy, I replaced the insert-multi! with pgcopy/copy-into! and it was really fast!

👍 3
seancorfield16:05:40

There are quite a few DB-specific caveats around insert-multi!. If you’re inserting a sequence of hash maps, it will do multiple inserts. Also different DBs need different connection string options to enable true multi-record inserts. clojure.java.jdbc doesn’t document that stuff very well. next.jdbc does a better job in that respect. But, yeah, if you use a DB-specific library, you can leverage DB-specific optimizations and that’s going to be the best way to wring performance out of something like that.
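
As a rough illustration of the connection option mentioned above, here is a minimal next.jdbc sketch. It assumes the PostgreSQL JDBC driver's `reWriteBatchedInserts` option and next.jdbc's `execute-batch!`; the table `csv_rows`, its columns, the connection details, and the helper names are all hypothetical. Rows are converted from hash maps to plain parameter vectors first, so they can be sent as one batched statement rather than one insert per map.

```clojure
(ns example.batch-insert
  (:require [next.jdbc :as jdbc]))

;; Hypothetical connection details; only :reWriteBatchedInserts matters here.
;; next.jdbc passes extra db-spec keys through to the JDBC driver.
(def db-spec
  {:dbtype   "postgresql"
   :host     "db.example.com"
   :dbname   "mydb"
   :user     "app"
   :password "secret"
   :reWriteBatchedInserts true})

(defn maps->param-groups
  "Turn a seq of row maps into vectors of values in column order."
  [cols rows]
  (mapv (apply juxt cols) rows))

(defn insert-rows!
  "Insert all rows in a single transaction, batching 500 rows at a time.
   The table and column names are assumptions for this sketch."
  [ds rows]
  (jdbc/with-transaction [tx ds]
    (jdbc/execute-batch! tx
                         "INSERT INTO csv_rows (name, amount) VALUES (?, ?)"
                         (maps->param-groups [:name :amount] rows)
                         {:batch-size 500})))
```

Usage would look like `(insert-rows! (jdbc/get-datasource db-spec) rows)`. With a high-latency link, the main gain comes from cutting the number of round trips, which is also why a COPY-based library can be faster still.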

❤️ 3
snorremd17:05:42

Hello there. I'm working with next.jdbc and am wondering if anyone here has any good tips regarding Postgres' range types. Specifically I want to work with TSTZRANGE. As far as I'm aware, there are no built-in types for this in the Postgres JDBC driver library. I think there may be some Hibernate extensions that accomplish this, but I would like to avoid dragging in additional libraries. I'm thinking I might have to implement a subclass of PGobject and implement the methods to encode and decode the values myself? Edit: Or it might be possible to simply use PGobject with the SettableParameter protocol, though I'd still need an intermediary type to dispatch on in the SettableParameter protocol extension. Another approach might be to avoid treating the range as a range outside Postgres altogether, but my limited understanding of the query syntax surrounding ranges in Postgres makes it seem difficult to do this for inserts.

seancorfield17:05:53

@snorremd I just did some searching b/c I hadn’t heard about that type and you made me curious: I see open issues in both Hibernate and JOOQ projects about it and comments indicating there’s no current JDBC support for that type (and that it is hard to implement correctly due to handling of timezone rules). One of the comments in one of those threads was basically “PostgreSQL is too powerful for the JDBC API” 😐

seancorfield17:05:44

PostgreSQL’s many custom data types and the difficulty of supporting them in JDBC are why I tend to think of PostgreSQL as the “Oracle” of the open source world: so much unique (non-portable) stuff that causes endless headaches for library maintainers 🙂

snorremd17:05:21

Oh yeah, I'm picking up the same sentiment while googling, i.e. that there is a lot of Postgres-specific functionality that can be hard to add to the standard JDBC API. I'm leaning towards just keeping two separate TIMESTAMPTZ columns like I do currently, and then constructing TSTZRANGE values inside Postgres queries/functions whenever needed. That simplifies things on the Clojure side while introducing a bit of overhead in writing the schemas and queries. I'll have to read more about it. But thanks for the input. 👍
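
A minimal sketch of that two-column approach, assuming a hypothetical `bookings` table with `starts_at` and `ends_at` TIMESTAMPTZ columns. The functions only build next.jdbc-style SQL vectors; the range is constructed on the Postgres side with the `tstzrange(lower, upper, bounds)` constructor function and compared with the `&&` (overlaps) operator, so no range type ever crosses JDBC.

```clojure
;; Table and column names are assumptions made for this sketch.

(defn insert-booking-sqlvec
  "SQL vector that stores the two endpoints as plain TIMESTAMPTZ values."
  [room-id starts-at ends-at]
  ["INSERT INTO bookings (room_id, starts_at, ends_at) VALUES (?, ?, ?)"
   room-id starts-at ends-at])

(defn overlapping-bookings-sqlvec
  "SQL vector that finds bookings overlapping the half-open interval
   [from, to). Both ranges are built inside Postgres."
  [from to]
  [(str "SELECT room_id, starts_at, ends_at FROM bookings "
        "WHERE tstzrange(starts_at, ends_at, '[)') && tstzrange(?, ?, '[)')")
   from to])
```

The vectors can be passed to `next.jdbc/execute!` as they are, with `java.time.OffsetDateTime` values as the timestamp parameters. If the table did have a TSTZRANGE column, the same constructor could in principle be used in the insert itself, e.g. `VALUES (?, tstzrange(?, ?, '[)'))`.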

seancorfield18:05:16

It’s part of why I continue to like MySQL/Percona so much: much less surface area for weirdness and a simpler interaction via JDBC 🙂

dcj18:05:59

@snorremd Here is what I did: https://github.com/dcj/coerce/blob/develop/src/coerce/jdbc/pg.clj
Note that not all aspects of that code are equally awesome 🙂
For this code, I wanted to coerce between org.threeten.extra.Interval and TSTZRANGE.
On the input/query side, I did this:

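;; Aliases and imports used below: prepare = next.jdbc.prepare, Instant = java.time.Instant,
;; PreparedStatement = java.sql.PreparedStatement, PGobject = org.postgresql.util.PGobject;
;; `time` is presumably clojure.java-time.
(extend-protocol prepare/SettableParameter
  org.threeten.extra.Interval
  (set-parameter [^org.threeten.extra.Interval v ^PreparedStatement ps ^long i]
    (let [meta      (.getParameterMetaData ps)
          type-name (.getParameterTypeName meta i)
          start     (time/start v)
          end       (time/end v)
          ;; An empty bound is how a Postgres range literal expresses "unbounded"
          start-pg  (if (= start Instant/MIN) "" start)
          end-pg    (if (= end Instant/MAX) "" end)
          value-pg  (str "[" start-pg "," end-pg ")")]
      (.setObject ps i (doto (PGobject.)
                         (.setType type-name)
                         (.setValue value-pg))))))

And on the output side: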
(defn ^:private parse-range
  [s]
  (let [len         (count s)
        len-1       (dec len)
        start-delim (subs s 0 1)
        end-delim   (subs s len-1 len)
        ranges      (subs s 1 len-1)
        [start end] (-> ranges
                        (string/replace #"\"" "")
                        (string/split #","))]
    [start-delim start end end-delim]))

(defn ^:private pgobject->interval
  [type s]
  (let [[_ start-str end-str _] (parse-range s)
        start                   (string/replace start-str #" " "T")
        end                     (string/replace end-str #" " "T")
        time-fn                 (case type
                                  :tstzrange time/zoned-date-time
                                  :tsrange   (comp time/instant #(str % "Z")))]
    (time/interval (time-fn start)
                   (time-fn end))))

(defmulti pgobject->clj
  "Convert returned PGobject to Clojure value."
  #(keyword (when % (.getType ^org.postgresql.util.PGobject %))))

;; PostgreSQL comes with the following built-in range types:
;;   int4range — Range of integer
;;   int8range — Range of bigint
;;   numrange — Range of numeric
;;   tsrange — Range of timestamp without time zone
;;   tstzrange — Range of timestamp with time zone
;;   daterange — Range of date

(defmethod pgobject->clj :tstzrange
  [^org.postgresql.util.PGobject x]
  (when-let [val (.getValue x)]
    (pgobject->interval :tstzrange val)))

(defmethod pgobject->clj :tsrange
  [^org.postgresql.util.PGobject x]
  (when-let [val (.getValue x)]
    (pgobject->interval :tsrange val)))
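
One edge case worth checking: the write side above emits an empty bound for `Instant/MIN` and `Instant/MAX`, and Postgres prints unbounded ranges the same way, e.g. `["2021-05-09 14:00:00+00",)`. As far as I can tell, `string/split` without a limit drops trailing empty strings, so `parse-range` would hand back `nil` for the missing bound and the later `string/replace` would fail on it. The following is only a sketch of one way to keep both bounds; `split-range-bounds` is a hypothetical helper, not part of the linked code, and it does not handle the special `empty` range literal.

```clojure
(require '[clojure.string :as string])

(defn split-range-bounds
  "Split a Postgres range literal into [start end] strings.
   A missing (unbounded) bound is returned as nil."
  [s]
  (let [inner       (subs s 1 (dec (count s)))
        ;; A limit of -1 keeps trailing empty strings, so a literal
        ;; with an open upper bound still yields two elements.
        [start end] (-> inner
                        (string/replace #"\"" "")
                        (string/split #"," -1))]
    [(not-empty start) (not-empty end)]))
```

A caller could then map `nil` back to `Instant/MIN` or `Instant/MAX` before building the interval, mirroring what the write side does.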

👍 3
❤️ 3
snorremd20:05:56

Thank you! This is super helpful. I'm going to sit down tomorrow and take a closer look, but this looks like a great starting point for me to implement something similar.

dcj18:05:55

And I extended next.jdbc's ReadableColumn protocol so that PGobject columns go through that multimethod:

;; result-set = next.jdbc.result-set
(extend-protocol result-set/ReadableColumn
  ;; PGobjects have their own multimethod
  org.postgresql.util.PGobject
  (read-column-by-label ^org.postgresql.util.PGobject [^org.postgresql.util.PGobject v _]
    (pgobject->clj v))
  (read-column-by-index ^org.postgresql.util.PGobject [^org.postgresql.util.PGobject v _2 _3]
    (pgobject->clj v)))