We have a system at work which uses honeysql. It is time critical and even few ms increase is a huge burden. The version of honeysql we use is quite old(v0.9.4) and I see the caching was introduced in later versions according to the https://github.com/seancorfield/honeysql/pull/378 PR. But still I see that the first query creation takes lot more time(almost 10 - 15x) than the following ones and it just flattens. Was there any sort of mechanism to cache queries earlier as well? Our queries are usually identical so I believe this spike in query creation would only be when the deployment is done and honeysql freshly starts caching the queries. Because of the throughput this would be less than few minutes before all pods catch up. Would love to learn more about this. Thanks.
@seancorfield I do understand that complex queries will take time to format, but I’d like to see a query where the query formatting time was significant in relation to the query time.
Oh, I doubt there are many queries where the formatting time is really the bottle neck. I just took the OP's question at face value.
What would be the best way to measure this? I believe the queries would not be bottleneck as it is simple to select a few columns with a couple of where in clauses. Just want to know what would be the best way to benchmark this.
Sounds like you should run format for those queries at startup so they are precalculated and in the cache by the time your "time critical" code runs?
That would be right yes. We can just do a format once with random values and it would be able to cache or is there any other approach? But even then I would be fine with those spikes at the start since we are initialising a lot of clients. Few requests get timed out anyways. Just wanted to understand if this caching was already there even before that caching formatter PR.
Ah, I misunderstood your question. No, there was no caching at all prior to that change. And after that change, if you provided a cache object to format, the first parse would populate the cache so subsequent parses could be skipped.
(and you have to use :params so that the query DSL is identical on each format call -- which also means you can't use :in and a few other things that break caching)
If your queries are identical (and time critical), perhaps using plain SQL strings instead of HoneySQL might be a better choice anyway? Or perhaps explicitly calling format with dummy values at startup and explicitly caching the SQL string yourself?
not knowing the domain at all, but it would seem strange to me that generating the sql string from honey would be significant?
If it were, couldn’t you then also make some macrology that turned honey into sql strings (at compile time) which you could then feed to a prepared statement?
@slipset It would depend on how complex the DSL for the query was. Although we don't use the caching at work, several people have complained about the DSL parsing overhead for performance critical code so...
> (and you have to use :params so that the query DSL is identical on each format call -- which also means you can't use :in and a few other things that break caching) > We are using an old version and so we don't pass this :params for caching and all the values are inside the data structures itself. This query uses :in operators at multiple places. Now I'm wondering how it is able to memorize/cache without any explicit mechanism in place 😅
It isn't. You're likely just seeing other effects of the JVM optimizing frequently executed code paths and the JIT system...
Nice. I think it's good to replace the query with just plain SQL as well for this one critical query. This critical flow only has one db call.