2023-07-17 honeysql | Clojure Slack Archive

honeysql 2023-07-17

whiterose 2023-07-17T19:06:21.726709Z

We have a system at work which uses honeysql. It is time critical and even few ms increase is a huge burden. The version of honeysql we use is quite old(v0.9.4) and I see the caching was introduced in later versions according to the https://github.com/seancorfield/honeysql/pull/378 PR. But still I see that the first query creation takes lot more time(almost 10 - 15x) than the following ones and it just flattens. Was there any sort of mechanism to cache queries earlier as well? Our queries are usually identical so I believe this spike in query creation would only be when the deployment is done and honeysql freshly starts caching the queries. Because of the throughput this would be less than few minutes before all pods catch up. Would love to learn more about this. Thanks.

slipset 2023-07-18T12:04:58.692249Z

@seancorfield I do understand that complex queries will take time to format, but I’d like to see a query where the query formatting time was significant in relation to the query time.

seancorfield 2023-07-18T16:15:32.783599Z

Oh, I doubt there are many queries where the formatting time is really the bottle neck. I just took the OP's question at face value.

whiterose 2023-07-20T20:32:46.183979Z

What would be the best way to measure this? I believe the queries would not be bottleneck as it is simple to select a few columns with a couple of where in clauses. Just want to know what would be the best way to benchmark this.

seancorfield 2023-07-17T19:15:34.487029Z

Sounds like you should run format for those queries at startup so they are precalculated and in the cache by the time your "time critical" code runs?

whiterose 2023-07-17T19:21:08.620879Z

That would be right yes. We can just do a format once with random values and it would be able to cache or is there any other approach? But even then I would be fine with those spikes at the start since we are initialising a lot of clients. Few requests get timed out anyways. Just wanted to understand if this caching was already there even before that caching formatter PR.

seancorfield 2023-07-17T19:48:40.262049Z

Ah, I misunderstood your question. No, there was no caching at all prior to that change. And after that change, if you provided a cache object to format, the first parse would populate the cache so subsequent parses could be skipped.

seancorfield 2023-07-17T19:51:03.159369Z

(and you have to use :params so that the query DSL is identical on each format call -- which also means you can't use :in and a few other things that break caching)

seancorfield 2023-07-17T19:52:47.479989Z

If your queries are identical (and time critical), perhaps using plain SQL strings instead of HoneySQL might be a better choice anyway? Or perhaps explicitly calling format with dummy values at startup and explicitly caching the SQL string yourself?

slipset 2023-07-17T20:28:08.026759Z

not knowing the domain at all, but it would seem strange to me that generating the sql string from honey would be significant?

slipset 2023-07-17T20:29:28.254089Z

If it were, couldn’t you then also make some macrology that turned honey into sql strings (at compile time) which you could then feed to a prepared statement?

seancorfield 2023-07-17T22:40:59.350059Z

@slipset It would depend on how complex the DSL for the query was. Although we don't use the caching at work, several people have complained about the DSL parsing overhead for performance critical code so...

whiterose 2023-07-18T05:33:45.610829Z

> (and you have to use :params so that the query DSL is identical on each format call -- which also means you can't use :in and a few other things that break caching) > We are using an old version and so we don't pass this :params for caching and all the values are inside the data structures itself. This query uses :in operators at multiple places. Now I'm wondering how it is able to memorize/cache without any explicit mechanism in place 😅

seancorfield 2023-07-18T05:35:49.780859Z

It isn't. You're likely just seeing other effects of the JVM optimizing frequently executed code paths and the JIT system...

whiterose 2023-07-18T06:13:39.527879Z

Nice. I think it's good to replace the query with just plain SQL as well for this one critical query. This critical flow only has one db call.

Clojurians Log v2

honeysql 2023-07-17