2020-06-22
Channels
- # babashka (36)
- # beginners (42)
- # calva (6)
- # chlorine-clover (25)
- # cider (31)
- # clara (5)
- # clj-kondo (55)
- # cljdoc (3)
- # cljs-dev (7)
- # cljsrn (3)
- # clojure (73)
- # clojure-brasil (6)
- # clojure-europe (8)
- # clojure-italy (2)
- # clojure-nl (3)
- # clojure-norway (1)
- # clojure-spec (3)
- # clojure-sweden (4)
- # clojure-switzerland (2)
- # clojure-uk (29)
- # clojurescript (93)
- # conjure (21)
- # data-science (14)
- # datomic (19)
- # emacs (4)
- # exercism (3)
- # figwheel-main (38)
- # fulcro (38)
- # graalvm (42)
- # graphql (5)
- # jackdaw (3)
- # jobs (1)
- # joker (2)
- # lambdaisland (1)
- # leiningen (31)
- # malli (8)
- # meander (5)
- # off-topic (27)
- # pathom (2)
- # pedestal (28)
- # re-frame (25)
- # reagent (2)
- # reitit (11)
- # releases (3)
- # remote-jobs (1)
- # rum (1)
- # shadow-cljs (63)
- # spacemacs (17)
- # sql (1)
Q: I’d like to hear people’s tips and tricks for tuning queries from prod metrics. I know I’m going to need this, so I’ll start with what I’m planning and what I wish existed…
To start, I’m going to keep a metric for every distinct query run: time to execute, number of entities returned, etc. This should give me a good signal for poorly tuned queries.
Since bad queries are often due to poorly ordered where clauses, I wonder if there is a way to include the total number of entities scanned? Comparing that to the number returned would be a really strong signal.
I’ve also been pondering an auto-tune capability: if you took every query, ran it with one :where clause at a time, and sorted by the result counts, that should give the best :where ordering for the prod data distribution. The only problem is that these queries would consume massive CPU, so they would need a separate query group.
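A minimal sketch of that first metric might look like the following. The names `instrumented-q` and `record-metric!` are hypothetical, and it assumes the on-prem peer API (`datomic.api/q`); the client API takes an arg-map instead.

```clojure
;; Sketch only, not a specific library: wrap d/q so every call records
;; execution time and result cardinality, keyed by the query form.
(require '[datomic.api :as d])

(defn record-metric!
  "Placeholder sink -- replace with CloudWatch/Honeycomb/log shipping."
  [m]
  (println m))

(defn instrumented-q
  "Like d/q, but records elapsed time and number of results per distinct query."
  [query & inputs]
  (let [start   (System/nanoTime)
        result  (apply d/q query inputs)
        elapsed (/ (- (System/nanoTime) start) 1e6)]
    (record-metric! {:query        (pr-str query)
                     :elapsed-ms   elapsed
                     ;; find specs like '.' return scalars, so guard the count
                     :result-count (when (coll? result) (count result))})
    result))
```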
@steveb8n
1. Make a wrapper namespace for the query namespace and add the instrumentation (timing, cardinality, etc.) there. I've seen projects which ship up to Honeycomb using a homegrown lib, but the concept is generic. I HIGHLY RECOMMEND creating some notion of trace-id chaining, either via a hierarchy or some other means (e.g. a request creates trace-id 123, which is the parent of trace-id 234 created by a query, and then a sibling trace-id 345 is made to do some side-effect, all with their own instrumentation numbers that can also be rolled up). It's extremely valuable to see the whole lifecycle of a request, including all the queries and external calls it performs (Datomic or otherwise). A rough sketch of this kind of chaining appears below, after this list.
2. I think I remember you being on cloud, so another thing to think about is client-cloud vs ions. They each have different tradeoffs but with ions you get the locality advantage.
3. I don't know of any way to include the number of entities scanned other than re-running the query a few times, building up the clauses and magically knowing how to replace the :find clause args with a call to count. That being said, if your queries are fairly static (vs dynamically computed from a client request) you could probably build a tool to accomplish this. (d/db-stats db) is your friend here. Also, there is this tool which may be sufficient, or, at least, a great starting point for your "auto-tuner".
4. Try to avoid using massive pull patterns in your :find clauses. Pulls do joins like :where clauses do, but can have subtle and confusing performance semantics, especially when the data cardinalities change out from under you (like in high-traffic production environments).
5. Look at some of the new query tools in the latest release, such as qseq and :xform in pull.
Those are the first 5 off the top of my head, LMK if you want to go deeper on any of them.
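To make point 1 a bit more concrete, here is a rough sketch of trace-id chaining around a query wrapper. Every name in it (`with-span`, `emit-span!`, `traced-q`) is hypothetical, and the span sink is a placeholder rather than any particular library's API.

```clojure
;; Sketch only: a dynamic var carries the current parent trace-id down the
;; call stack; each span records its parent so spans can be rolled up per request.
(require '[datomic.api :as d])

(def ^:dynamic *parent-trace-id* nil)

(defn new-trace-id [] (str (java.util.UUID/randomUUID)))

(defn emit-span!
  "Placeholder sink -- ship to Honeycomb/X-Ray/logs in a real system."
  [span]
  (println span))

(defmacro with-span
  "Runs body as a child span of the current one, recording timing and
   (when the result is counted) its cardinality."
  [span-name & body]
  `(let [parent#   *parent-trace-id*
         trace-id# (new-trace-id)
         start#    (System/nanoTime)]
     (binding [*parent-trace-id* trace-id#]
       (let [result# (do ~@body)]
         (emit-span! {:name       ~span-name
                      :trace-id   trace-id#
                      :parent-id  parent#
                      :elapsed-ms (/ (- (System/nanoTime) start#) 1e6)
                      :count      (when (counted? result#) (count result#))})
         result#))))

;; A request handler opens the root span; the query wrapper opens a child,
;; so the whole lifecycle of a request rolls up under one trace.
(defn traced-q [query & inputs]
  (with-span "datomic/q"
    (apply d/q query inputs)))
```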
Thanks @U0CJ19XAM, this is good stuff. I already have a middleware layer in front of all db calls; it’s a successor to https://github.com/stevebuik/ns-clone
I also use X-Ray, so I have some of the tools you mention in place. Reading that article has given me some ideas though.
Ultimately, it’s exceptional queries I want to see, not all of them, so my “signal” vs noise is what I’m currently focused on.
I didn’t know about the large pull behaviour. I am doing this, so I’ll dig deeper there. Thanks.
Does anyone know if MariaDB is a supported persistence solution?
> If you want to use a different SQL server, you'll need to mimic the table and schema from one of the included databases.
much appreciated 🙇
And I suppose that the sql-url and sql-driver-class attributes in the config will inform the transactor of the correct jar to load, and that I should do the same for the peer?
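For context, a hedged sketch of the peer side, assuming MariaDB is being used as a generic SQL storage (which, per the quote above, is not one of the included databases): the transactor reads sql-url and sql-driver-class from its properties file, while the peer embeds the full JDBC URL in its connection URI and needs the driver jar on its own classpath.

```clojure
;; Sketch only -- the MariaDB URL and driver details are assumptions, since
;; only the bundled SQL databases are officially supported. The peer does not
;; read the transactor's properties file; it needs the JDBC driver on its own
;; classpath and the JDBC URL embedded in the connection URI.
(require '[datomic.api :as d])

(def uri
  ;; datomic:sql://<db-name>?<jdbc-url>
  "datomic:sql://my-db?jdbc:mariadb://localhost:3306/datomic?user=datomic&password=datomic")

(def conn (d/connect uri))
```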