clojure 2016-02-13 | Slack Archive

i mean that in the sense that the implementation of that feature varies across db platforms, and the semantics are poorly defined, particularly with DML

kul15:02:39

Just trying to avoid problems when a db becomes highly congested

kul15:02:52

Causing querying threads to stall

kul15:02:36

Rejecting queries is more acceptable than an unresponsive application

jonahbenton15:02:29

gotcha. so if queries are expensive, one approach is tuning at the db level, another would be to utilize a results cache. another approach could be to keep tabs on query response time and if it starts to climb, then simply don't send the query

kul15:02:29

Isn't jdbc spec standard?

kul15:02:51

I mean the behaviour can vary but at least the api is there

jonahbenton15:02:59

it's a standard for communicating from the client side

jonahbenton15:02:11

but different databases attend to that particular request differently

jonahbenton15:02:43

the implication of .setQueryTimeout is that the database stops executing the query

jonahbenton15:02:50

but some don't do that

kul15:02:03

I just want postgres

kul15:02:29

Not sure if i can install pgbouncer on rds

kul15:02:14

All the approaches are costlier to do correctly than a simple timeout

jonahbenton15:02:43

it looks like functionality around this is still in flux on the pg jdbc driver. Report of a bug fix in December to thread safety issues: https://jdbc.postgresql.org/documentation/changelog.html

kul15:02:15

Changelog talks about making the method thread-safe

jonahbenton15:02:30

the other issue is that sending work and then asking the database to stop in the middle is still subjecting the database to the load.

kul15:02:31

I am assuming some support is already there

jonahbenton15:02:05

yeah- it may be that for the common case it works fine, i haven't used pg since much earlier in the 9 series

jonahbenton15:02:19

if the semantics of your queries can be captured in key-value maps, clojure core.cache is pretty easy to use.

kul15:02:40

Problem is with update queries

jonahbenton15:02:50

kul15:02:52

Db reads are pretty robust

jonahbenton15:02:54

your updates are timing out?

Yep

Not timing out

😆

sorry- what's happening with updates?

jonahbenton15:02:44

the thing about cancelling a transactional update is that it then causes a rollback

jonahbenton15:02:48

which can be very expensive

kul15:02:18

Makes sense

jonahbenton15:02:22

because it impacts other transactions also in flight, both reads and writes

jonahbenton15:02:02

if you absolutely have to get those writes to the transactional store, then aside from tuning, you may need more IO throughput

kul15:02:36

You mean better hardware for db?

jonahbenton15:02:37

but another option if the semantics are more flexible is to maintain state in another location, like a redis store

jonahbenton15:02:44

and then propagate data from redis to pg

niwinz15:02:47

maybe some postgresql tuning: changing the postgresql defaults a little can speed up inserts by more than 3 or 4x

kul15:02:10

@niwinz i am listening

jonahbenton15:02:50

yes, if you are on rds, you can use provisioned iops

kul15:02:52

@jonahbenton that seems costly to implement

kul15:02:14

I mean the redis thing

jonahbenton15:02:34

do you have multiple applications writing to pg or just one?

kul15:02:47

Just one

niwinz15:02:56

put synchronous_commit=off and wal_writer_delay=200ms

niwinz15:02:07

put a high value for checkpoint_segments (32-128)

niwinz15:02:36

and increase shared_buffers

kul15:02:01

Thanks bookmarked! Will try these out

niwinz15:02:04

the default in 9.4 is 128MB, you increase depending on the ram available

niwinz15:02:16

512MB can also speed up writes and reads

kul15:02:50

Great but i am still concerned about those threads waiting on SQL statements in my app

kul15:02:47

I still think timeout at global level on java.jdbc could be useful

jonahbenton15:02:02

yeah- the issue is that the semantics aren't clear

jonahbenton15:02:24

if you want async behavior where your app can respond to users while not waiting for background work to be done

jonahbenton15:02:32

can do that with agents or core.async

kul15:02:53

It's a crud app, no async

kul15:02:53

Strong consistency is required, i would not choose rdbms otherwise

kul15:02:21

Hurmm

kul15:02:39

A core.async pipeline for db ops

jonahbenton15:02:57

what is the nature of the updates? why are they so expensive?

kul15:02:45

It updates multiple tables

kul15:02:13

Ah that reminds me to add more indexes

jonahbenton15:02:32

yeah- though indexes increase cost at write time

jonahbenton15:02:46

is there row-level write contention?

kul15:02:48

But back pressure still needs to be handled gracefully

kul15:02:04

Not really

kul15:02:35

Every user has its own rows to update

jonahbenton15:02:45

so users are able to wait for their operations to complete?

kul15:02:55

Yes they must

kul15:02:11

It's a simple crud app

kul15:02:19

Where data is in sql

kul15:02:29

Nothing fancy

kul15:02:31

What do you think about db operations handled via a core.async thread?

jonahbenton15:02:09

well, core async doesn't like blocking io

jonahbenton15:02:54

the thread pool is limited

kul15:02:26

A couple of (thread ..) for db ops and (go ..) and wait for result completion?

jonahbenton15:02:38

well- how many concurrent users do you have?

kul15:02:32

I was testing with 500 when many threads stalled, making it unresponsive

jethroksy15:02:29

promise with a timeout?
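[Editor's note] The "promise with a timeout" idea can be sketched with a plain future and the three-argument form of deref. This is a minimal sketch, not something from the conversation: `call-with-timeout` is a hypothetical helper name, and the work passed in stands in for a blocking database call. Note that the timeout only frees the caller; whether the underlying query actually stops depends on the driver and database, as discussed above.

```clojure
(defn call-with-timeout
  "Runs f on another thread. Returns f's result, or :timeout if it
  has not finished within timeout-ms."
  [timeout-ms f]
  (let [fut    (future (f))
        result (deref fut timeout-ms :timeout)]
    (when (= result :timeout)
      ;; Interrupts the worker thread. A blocking JDBC call may or may
      ;; not react to the interrupt (assumption: driver dependent).
      (future-cancel fut))
    result))

(call-with-timeout 1000 #(+ 1 2))            ;; => 3
(call-with-timeout 50 #(Thread/sleep 5000))  ;; => :timeout
```

The caller gets control back after the timeout either way, so a web request thread is not held indefinitely, but the database may still be doing the work.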

jonahbenton15:02:33

and the bottleneck was database writes?

kul15:02:43

I guess

kul15:02:17

Jetty can take 100000 req per sec for a simple no db endpoint

jonahbenton15:02:49

yeah. easy work is easy

kul15:02:52

25-40k at least for a small vm at aws

kul15:02:58

They suck

kul15:02:28

The vm i mean

kul15:02:02

Ok so java.jdbc having no timeout is a problem? Do we agree on this?

jonahbenton15:02:33

you mean for writes?

jonahbenton16:02:33

when you load tested, did you do it in step fashion to figure out at what stage the system stopped responding?

jonahbenton16:02:28

if the bottleneck there is db, then application architecture isn't going to matter. but if db is doing fine, then can look at changing app architecture to ensure responsiveness. the fundamental issue when talking to a database with a live thread is that most drivers use blocking io. so you are going to need a thread per concurrent action

jonahbenton16:02:47

using timeouts, if actually implemented in the underlying driver, can free up a thread, but a) the work requested of the db doesn't get done b) more load is put on the database

jonahbenton16:02:44

if you figure out that at your budget and provisioning you can support 20 concurrent actions, then you can just maintain a 20-connection db pool. when a user submits work, they try- with a timeout- to retrieve a connection. if they can't get one, you tell them they have to try again later

kul16:02:57

I am already using c3p0

kul16:02:29

15 max default connections

kul16:02:49

Try with a timeout ? You mean a future ?

jonahbenton16:02:31

http://www.mchange.com/projects/c3p0/#checkoutTimeout

jonahbenton16:02:19

set that to 500ms or something
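[Editor's note] The "fixed budget of concurrent actions, try with a timeout, otherwise reject" idea can be illustrated without a connection pool. The sketch below uses a `java.util.concurrent.Semaphore` as a stand-in for the pool; it is not c3p0's implementation, and `make-limiter` and `with-permit` are hypothetical names. With c3p0 itself, the linked `checkoutTimeout` property plays the role of the timeout here.

```clojure
(import '(java.util.concurrent Semaphore TimeUnit))

(defn make-limiter
  "A fair semaphore allowing max-concurrent simultaneous operations."
  [max-concurrent]
  (Semaphore. (int max-concurrent) true))

(defn with-permit
  "Runs f if a permit can be obtained within timeout-ms,
  otherwise returns :rejected without running f."
  [^Semaphore sem timeout-ms f]
  (if (.tryAcquire sem (long timeout-ms) TimeUnit/MILLISECONDS)
    (try
      (f)
      (finally
        ;; Always hand the permit back, even if f throws.
        (.release sem)))
    :rejected))

;; 20 concurrent actions, wait at most 500 ms for a slot:
(def limiter (make-limiter 20))
(with-permit limiter 500 #(str "did some work"))  ;; => "did some work"
```

A `:rejected` result is the point where the application would tell the user to try again later instead of letting the request thread stall.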

kul16:02:26

Ok that property makes a lot of sense but it still does not avoid stalled threads

jonahbenton16:02:47

did you take a thread dump?

kul16:02:53

Yes

jonahbenton16:02:05

ah, so what were they blocked on?

kul16:02:11

Most were waiting in java.jdbc calls, per the stack traces

kul16:02:24

Some on queries, most on updates

jonahbenton16:02:41

very good

jonahbenton16:02:03

so you need better write throughput

kul16:02:15

🙂

jonahbenton16:02:45

the above mentioned settings, plus http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html#CHAP_BestPractices.PostgreSQL will reduce write activity

seancorfield16:02:47

FWIW, at World Singles, most of our connection pools have min/max pool size of 20/90 and 40 helpers (threads)… and we have this in c3p0:

;; expire excess connections after 10 minutes of inactivity:
(.setMaxIdleTimeExcessConnections (* 10 60))
;; expire connections after 61 minutes of inactivity:
(.setMaxIdleTime (* 61 60))
;; idle connection testing
(.setAutomaticTestTable "c3p0connection")
(.setTestConnectionOnCheckin true)
(.setIdleConnectionTestPeriod (* 11 60))
;; unreturned (app leakage) get zapped after an hour:
(.setUnreturnedConnectionTimeout (* 59 60))

jonahbenton16:02:13

but you may just need more iops

jonahbenton16:02:05

@seancorfield: what db are you using?

seancorfield16:02:23

MySQL… well, Percona… SSD-backed servers.

seancorfield16:02:37

What sort of request load are you looking to support?

jonahbenton16:02:46

the magic of ssd

kul16:02:46

I totally agree but it's going to break at some point

kul16:02:23

Btw i have enough inputs to work with

kul16:02:42

Thanks all

seancorfield16:02:58

We have about 450 connections on the master, 50 on the slave; about 3.5k reads/sec on the master; 250 writes/sec on the master; 4k reads/sec on the slave… and the DBs are not even breaking a sweat...

seancorfield16:02:10

(but, yeah, fairly low write loads for us)

jonahbenton16:02:13

nice when you have lots of headroom. easy to sleep at night

kul16:02:24

@seancorfield do not want to be pushy, just want to ask if exposing prepared stmt options to common calls like query etc makes sense to you?

seancorfield16:02:56

You can already create a prepared-statement and pass it as the first element of the sql-and-params vector to query...

seancorfield16:02:17

Per the query docstring: "The second argument is a vector containing a SQL string or PreparedStatement, followed by any parameters it needs. See db-query-with-resultset for details."

seancorfield16:02:10

It’s possible that prepare-statement could expose more options directly? http://clojure.github.io/java.jdbc/#clojure.java.jdbc/prepare-statement

kul16:02:55

It becomes a bit difficult to adapt frameworks like yesql etc for this

kul16:02:59

I guess

kul16:02:15

I guess dedicated helper threads for doing SQL ops make sense

kul16:02:58

It would work even when the driver or db does not implement termination

kul16:02:07

Any suggestion for an easy to work with thread pool in Clojure?

jonahbenton16:02:05

@kul agents with a custom executor pool may work for you. you can assign users to individual agents with a hash function. they preserve order of operation semantics, you can wait for results, and recover on errors
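[Editor's note] A rough sketch of the agents suggestion, under stated assumptions: `db-pool`, `user-agents`, `agent-for` and `submit-op!` are hypothetical names, the pool size of 4 is arbitrary, and `op` stands in for a blocking SQL call. Each user id is hashed to one agent, so operations for the same user run in the order they were sent, while `send-via` runs them on a dedicated executor rather than the default agent pools.

```clojure
(import '(java.util.concurrent Executors))

(def n-agents 4)

;; Dedicated threads for db work (assumption: sized to what the db can take).
(defonce db-pool (Executors/newFixedThreadPool n-agents))

(defonce user-agents (vec (repeatedly n-agents #(agent nil))))

(defn agent-for
  "Picks the agent responsible for user-id via its hash."
  [user-id]
  (nth user-agents (mod (hash user-id) n-agents)))

(defn submit-op!
  "Queues op (a no-arg fn) on the user's agent and returns a promise
  that receives op's result, or the exception it threw."
  [user-id op]
  (let [p (promise)]
    (send-via db-pool (agent-for user-id)
              (fn [state]
                ;; Catch here so a failing op never puts the agent
                ;; into a failed state.
                (deliver p (try (op) (catch Exception e e)))
                state))
    p))

;; Wait up to a second for the result:
(deref (submit-op! "alice" #(+ 1 2)) 1000 :timeout)  ;; => 3
```

The pool threads are not daemon threads, so an application using this would call `(.shutdown db-pool)` on exit.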

amacdougall20:02:46

As long as we're talking about SQL, I just ran into a fun one... for non-alphanumeric characters, Clojure's compare and Postgres's ORDER BY do not appear to give the same results. Normally I'd just trust Postgres, but I want my unit tests to make sure I'm applying the right sorting. Has anyone else run into this, and have any suggestions on how to deal with it?

jonahbenton20:02:50

hey @amacdougall that's a character set issue.

jonahbenton20:02:16

pg is probably using sql-ascii?

jonahbenton20:02:25

java may be utf-8

amacdougall20:02:57

Oh, interesting. It seems like I'd want to tell Clojure to adopt PG's sorting mechanism, though honestly it doesn't matter as long as alphanumeric strings are sorted the same way in both, which is likely.

amacdougall20:02:51

I'm generating test data using Prismatic Schema, which seems to prefer generating usernames/etc like "//+;.&". I appreciate the fuzz-testing approach, so I don't want to try to limit the test data to alphanumeric.

amacdougall20:02:53

The Clojure docs on compare say: > Strings, Symbols, Keywords: lexicographic order (aka dictionary order) by their representation as sequences of UTF-16 code units. This is alphabetical order (case-sensitive) for strings restricted to the ASCII subset.

amacdougall20:02:12

That sounds like they'd work the same as long as they're all ASCII, which they do seem to be...

amacdougall20:02:55

Of course "alphabetical order" is undefined for punctuation. I'd assume they'd all sort by ASCII code, which should be the same in UTF-8... but that mention of UTF-16 really muddies the waters. I honestly have no idea what "sequences of UTF-16 code units" means. I would assume that it boils down to UTF-8 for characters which don't need double-byte representation, which should be the same as ASCII for ASCII characters...

amacdougall20:02:19

But I'm making several assumptions to get there, and clearly things are being sorted differently.
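[Editor's note] For ASCII-only strings, each character is a single UTF-16 code unit whose numeric value is its ASCII code, so Clojure's `compare` amounts to a plain ASCII-code ordering there. A quick REPL check:

```clojure
;; Code unit values of a few ASCII characters:
(map int "aB#1")          ;; => (97 66 35 49)

;; Punctuation and digits sort before uppercase, which sorts before lowercase:
(sort ["b" "B" "#" "1"])  ;; => ("#" "1" "B" "b")

;; Case-sensitive: "a" (97) sorts after "B" (66), so the result is positive.
(pos? (compare "a" "B"))  ;; => true
```

So the Clojure side behaves as expected for this data; any difference has to come from how the database orders the strings.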

jonahbenton21:02:06

do you have an example of a group of chars being sorted differently?

amacdougall21:02:26

Sure. Here's a list of strings as PG sorted them:

(" " ";" "/+" "\"" ")" "]" "1797gR}F#j" "1ORfdr" "<1+UE'
1'VA" "20wDClW" "2H01a" "49RF2`]" ".4K5o;N{" "*4q" "4X yCjCk+(" "'>+7" "\"`7s
m:{" "7WOW?/gq[c" "8k-L}3" "9qm.aeM6O1" "a!S!@!" "b2" "Bq`\"^G" "C:sLAo" "d" 
"dF" "-\"Do" "DPo" "#E" "&eb")

amacdougall21:02:02

And here are the compare results from Clojure:

(-27 12 13 -7 -52 44 -24 -11 10 -24 -2 6 4 -10 13 5 -21 -1 -1 -40 -1 32 -1 -33 -1 55 -23 33 -3)

amacdougall21:02:30

As you can see, they're all over the map. That's the result of running (fn [[a b]] (compare a b)) on a (partition 2 1) of the list of strings.

amacdougall21:02:37

i.e. a sliding 2-item window.
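[Editor's note] The sliding-window check described above can be wrapped in a small predicate. `sorted-by?` is a hypothetical helper name; the sample strings are taken from the list pasted earlier.

```clojure
(defn sorted-by?
  "True if every adjacent pair in xs is in non-descending order under cmp."
  [cmp xs]
  (every? (fn [[a b]] (<= (cmp a b) 0))
          (partition 2 1 xs)))

(sorted-by? compare ["1ORfdr" "2H01a" "b2" "d"])  ;; => true
(sorted-by? compare ["d" "dF" "-\"Do" "DPo"])     ;; => false
```

The second call fails because `compare` puts "-" (ASCII 45) before "d" (ASCII 100), while the database placed "dF" first.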

amacdougall21:02:51

Ah, I think I see what's up...

amacdougall21:02:03

PG is ignoring leading non-alphanumeric characters. That honestly doesn't seem like the way I'd do it.

amacdougall21:02:45

But if I am willing to just accept that, then I can at least test that my sorting was applied by doing the same thing in my Clojure unit test.

amacdougall21:02:22

...and this in turn has to do with the database's "collation" setting—a vague term that probably has a very precise meaning in PG-land. It's language/locale specific. There's a way of disabling it and doing an ASCII sort if I really want.

amacdougall21:02:40

At least I know what's up. Now I can choose the least bad way of dealing with it. ¯\(ツ)/¯

amacdougall21:02:39

Most likely I will use a custom comparator which applies the same prejudices as Postgres.
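[Editor's note] One possible shape for such a comparator, as a rough approximation only: it compares strings by their alphanumeric characters, case-insensitively, and falls back to `compare` on ties. Real locale collations use multi-level weights, so this will not match Postgres in every case (the tie-break in particular is an arbitrary choice). `alnum-key` and `pg-ish-compare` are hypothetical names.

```clojure
(require '[clojure.string :as str])

(defn- alnum-key
  "Lower-cased copy of s with all non-alphanumeric ASCII characters removed."
  [s]
  (str/lower-case (str/replace s #"[^A-Za-z0-9]" "")))

(defn pg-ish-compare
  "Compares by alphanumeric content first; ties fall back to compare."
  [a b]
  (let [c (compare (alnum-key a) (alnum-key b))]
    (if (zero? c)
      (compare a b)
      c)))

(sort pg-ish-compare ["DPo" "#E" "b2" "&eb" "-\"Do" "d"])
;; => ("b2" "d" "-\"Do" "DPo" "#E" "&eb")
```

That result matches the relative order of those six strings in the list Postgres returned above.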

jonahbenton21:02:02

yes: https://docs.oracle.com/javase/tutorial/i18n/text/locale.html

amacdougall21:02:25

Thanks for the help!

amacdougall21:02:43

Sometimes I just need a little nudge out of my mental track.

jonahbenton21:02:03

you bet

richiardiandrea21:02:10

How do you guys handle integration tests during dev to prod (aot) transitions? Is someone testing at a prod REPL for instance?

jonahbenton21:02:01

hey @richiardiandrea what kind of application?

richiardiandrea21:02:05

Websocket based backend

richiardiandrea21:02:48

@jonahbenton integration meaning testing the effect of a websocket call from the client, queued events and all that

jonahbenton21:02:48

right- but in a test environment, or production environment? IOW, you're talking about deploy-time validation?

jonahbenton21:02:55

in production?

richiardiandrea21:02:03

Yep

jonahbenton21:02:46

most of my apps have involved manual validation- at a prod repl or from the production client- but there were plans to encode system-level tests in a custom client, which would run from privileged infrastructure

richiardiandrea21:02:28

Yes that was my idea as well, so it was not that random :) thanks @jonahbenton
