Fork me on GitHub
#sql
<
2021-02-16
>
genekim01:02:13

Thanks for all the great work on next-jdbc, @seancorfield — and I’ve been very much enjoying all the REPL talks you’ve been doing. I can’t wait to try “add-lib” someday! FWIW, I struggled with storing Unicode characters and emojis in MySQL, as well as getting upsert functionality going — elated that I finally got it working. Posting gist here, just in case it saves someone else some hassle! https://gist.github.com/realgenekim/3fb9d2dccde9234743ec392b600055be

👍 1
seancorfield01:02:07

@genekim MySQL's UTF8 is broken. You need to use UTF8MB4.

seancorfield01:02:41

UTF8 is only three bytes in MySQL 😞 And instead of fixing it, they just introduced a new encoding type to support four byte Unicode 👀

👀 1
genekim05:02:32

Thanks @seancorfield — I think I got this covered? Here’s what I did on the MySQL side — does that look reasonable to you? 🙏

; 

ALTER DATABASE twitter CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE tweets CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE tweets MODIFY COLUMN tweet TEXT CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci;

seancorfield05:02:10

Yup, that's what you need!

seancorfield05:02:43

I think you need the MODIFY COLUMN between the CONVERT though otherwise you might lose data?

seancorfield05:02:27

I have a feeling you don't need to CONVERT the table at all -- it can't contain any four byte Unicode characters and anything it already contains is correct as-is.

seancorfield05:02:46

If you have other character columns in that table, you might need to consider what encodings they have -- not all char, varchar, text columns need to be the same charset: it depends on what they are going to contain. We have tables where the charset default is latin-1 but a few specific columns are utf8mb4, for example.

genekim17:02:57

Thanks, @seancorfield — I have so little experience in doing non-ASCII correctly. I can’t tell you how many times I’ve been bitten by not handling Unicode correctly. Is this now the norm of how you set up tables and columns? Or is it just for “user generated data” where people would use emojis, etc. (I have a vague memory of being shocked when a Python program I wrote blew up when I found out that people were putting emojis in Amazon book reviews. 🙂

seancorfield17:02:07

@genekim I was doing email first before Slack today, so I ended up answering that as a comment on your Gist 🙂 TL;DR: we use latin1 for columns that we know only need single-byte characters and utf8mb4 for columns that will store user-supplied input and can therefore have Unicode emojis in it.

seancorfield17:02:49

(we have a couple of columns still on utf8 in our biggest tables because running the ALTER TABLE will lock the table long enough to cause disruption to our users -- at some point we'll probably tackle that during a scheduled maintenance window... but we very rarely have scheduled maintenance any more now that our processes are nearly all Clojure apps running in a (small) cluster!)

genekim17:02:24

That is superb, @seancorfield — what every dev (and DBA) should know about modern realities of strings and their storage in databases. Thank you! (PS: next.jdbc is also superb! Was surprised at how little effort it was to pickup and use. 🎉🎉🎉)

seancorfield18:02:33

Thanks! We use it heavily at work (and our older code uses clojure.java.jdbc very heavily) -- and that's why I can devote so much time to these libraries.

fabrao21:02:42

Hello all, how do I initiate connection pool for next.jdbc to use with mount state management?

fabrao21:02:26

I see that it has a configuration for Component like (component/start (connection/component HikariDataSource db-spec))

fabrao21:02:38

Can I use like this? (connection/->pool HikariDataSource db-spec)

seancorfield21:02:41

@fabrao Yes, that will create a connection pool using HikariCP.

seancorfield21:02:56

Pay attention to the documentation caveats about :user vs :username.

fabrao21:02:00

@seancorfield my concern is about closing it, just apply .close to it?

dharrigan22:02:04

Whilst not quite the same as yours, here's mine...

dharrigan22:02:06

(defn connection-pool-start
  [config]
  (connection/->pool HikariDataSource config))

(defn connection-pool-stop
  [datasource]
  (.close datasource))