#datomic
2020-06-22
steveb8n06:06:43

Q: I’d like to hear people’s tips/tricks for tuning queries from prod metrics. I know I’m going to need this, so I’ll start with what I’m planning and what I wish existed…

steveb8n06:06:48

To start, I’m gonna keep a metric for every distinct query run: time to execute, number of entities returned, etc. This should give me a good signal for poorly tuned queries.
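A minimal sketch of such a wrapper, assuming the Datomic client API; `record-metric!` is a hypothetical sink (CloudWatch, Honeycomb, whatever), not a real library call:

```clojure
;; Sketch of per-query instrumentation (Datomic client API assumed).
(require '[datomic.client.api :as d])

(defn record-metric! [m]
  ;; placeholder sink: swap in your real metrics backend
  (println (pr-str m)))

(defn instrumented-q
  "Like d/q, but records elapsed time and result cardinality."
  [{:keys [query] :as arg-map}]
  (let [start  (System/nanoTime)
        result (d/q arg-map)]
    (record-metric! {:query      (pr-str query)
                     :elapsed-ms (/ (- (System/nanoTime) start) 1e6)
                     :row-count  (count result)})
    result))
```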

steveb8n06:06:57

Since bad queries are often due to poorly ordered :where clauses, I wonder if there is a way to include the total number of entities scanned? Comparing this to the number returned would be a really strong signal.

👍 3
steveb8n06:06:04

any other tricks?

steveb8n08:06:47

I’ve also been pondering an auto-tune capability. If you took every query and ran it with one :where clause at a time and sorted by the result counts, that should give the best :where ordering for the prod data distribution. The only problem is these queries would consume massive CPU, so they would need a separate query group.
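A rough sketch of that per-clause experiment, assuming the Datomic client API and simple data-pattern clauses (predicate and rule clauses can’t run in isolation and would need special handling):

```clojure
;; Sketch: count the rows each :where clause matches on its own,
;; then order clauses most-selective-first. CPU-heavy; run against
;; a separate query group, not production.
(require '[datomic.client.api :as d]
         '[clojure.string :as str])

(defn clause-vars [clause]
  (vec (distinct (filter #(and (symbol? %)
                               (str/starts-with? (name %) "?"))
                         (flatten [clause])))))

(defn clause-count
  "Rows matched by a single data-pattern clause in isolation."
  [db clause]
  (count (d/q {:query {:find  (clause-vars clause)
                       :where [clause]}
               :args  [db]})))

(defn order-by-selectivity
  "Re-orders :where clauses, fewest matching rows first."
  [db where-clauses]
  (sort-by (partial clause-count db) where-clauses))
```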

Joe Lane11:06:48

@steveb8n
1. Make a wrapper namespace for the query namespace and add the instrumentation (timing, cardinality, etc.) there. I've seen projects which ship up to Honeycomb using a homegrown lib, but the concept is generic. I HIGHLY RECOMMEND creating some notion of trace-id chaining, either via a hierarchy or some other means (e.g. a request creates trace-id 123, which is the parent of trace-id 234 created by a query, and then a sibling trace-id 345 is made for some side effect, all with their own instrumentation numbers that can also be rolled up). It's extremely valuable to see the whole lifecycle of a request, including all the queries and external calls it performs (Datomic or otherwise). See the sketch after this message.
2. I think I remember you being on Cloud, so another thing to think about is client-cloud vs ions. They each have different tradeoffs, but with ions you get the locality advantage.
3. I don't know of any way to include the number of entities scanned other than re-running the query a few times, building up the clauses and magically knowing how to replace the :find clause args with a call to count. That being said, if your queries are fairly static (vs dynamically computed from a client request) you could probably build a tool to accomplish this. (d/db-stats db) is your friend here. Also, there is this tool which may be sufficient, or at least a great starting point for your "auto-tuner".
4. Try to avoid using massive pull patterns in your :find clauses. Pulls do joins like :where clauses do, but can have subtle and confusing performance semantics, especially when the data cardinalities change out from under you (like in high-traffic production environments).
5. Look at some of the new query tools in the latest release, such as qseq and :xform in pull.
Those are the first 5 off the top of my head. LMK if you want to go deeper on any of them.

💯 3
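A rough sketch of the trace-id chaining idea from point 1; `emit-trace!`, `with-span`, and `*trace-id*` are invented names for illustration, not part of any library:

```clojure
;; Sketch of trace-id chaining: each span records its parent's id,
;; so a request, its queries, and its side effects roll up together.
(def ^:dynamic *trace-id* nil)

(defn emit-trace! [span]
  ;; placeholder sink: swap in Honeycomb, X-Ray, etc.
  (println (pr-str span)))

(defmacro with-span
  "Runs body in a child span whose :parent is the enclosing span."
  [span-name & body]
  `(let [parent# *trace-id*
         id#     (str (java.util.UUID/randomUUID))
         start#  (System/nanoTime)]
     (binding [*trace-id* id#]
       (let [ret# (do ~@body)]
         (emit-trace! {:span       ~span-name
                       :trace-id   id#
                       :parent     parent#
                       :elapsed-ms (/ (- (System/nanoTime) start#) 1e6)})
         ret#))))

;; e.g. (with-span :request
;;        (with-span :query (instrumented-q {...}))
;;        (with-span :side-effect (do-side-effect!)))
```
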
steveb8n22:06:25

Thanks @U0CJ19XAM, this is good stuff. I already have a middleware layer in front of all db calls. It’s a successor to https://github.com/stevebuik/ns-clone

steveb8n22:06:12

I also use x-ray, so I have some of the tools you mention in place. Reading that article has given me some ideas though.

steveb8n22:06:03

Ultimately, it’s the exceptional queries I want to see, not all of them, so signal vs noise is what I’m currently focused on.
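A tiny sketch of that filtering, reusing the hypothetical `record-metric!` from the earlier wrapper sketch; the thresholds are illustrative only, not recommendations:

```clojure
;; Only ship metrics for exceptional queries, keeping the signal.
(defn exceptional? [{:keys [elapsed-ms row-count]}]
  (or (> elapsed-ms 500)      ; illustrative latency threshold
      (> row-count 10000)))   ; illustrative cardinality threshold

(defn maybe-record! [metric]
  (when (exceptional? metric)
    (record-metric! metric)))
```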

steveb8n22:06:39

I didn’t know about the large pull behaviour. I am doing this so I’ll dig deeper there. Thanks.

Lone Ranger16:06:27

Does anyone know if MariaDB is a supported persistence solution?

ghadi17:06:28

> If you want to use a different SQL server, you'll need to mimic the table and schema from one of the included databases.

ghadi17:06:40

The mysql one should work for maria, I think

Lone Ranger17:06:57

much appreciated 🙇

Lone Ranger17:06:58

and I suppose that the sql-url and sql-driver-class attributes in the config will inform the transactor of the correct jar to load, and that I should do the same for the peer?

ghadi17:06:56

not sure, but I think if you have the correct url & the jar on the classpath, it will auto discover the correct class

👍 3
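For what it’s worth, a sketch of a transactor properties file adapted from the bundled MySQL template; the JDBC URL and driver class here are the standard MariaDB Connector/J values, not something the Datomic docs list explicitly, and host/port/credentials are placeholders:

```properties
# Sketch: MySQL transactor template adapted for MariaDB (untested).
protocol=sql
host=localhost
port=4334

sql-url=jdbc:mariadb://localhost:3306/datomic
sql-user=datomic
sql-password=datomic
# Standard MariaDB Connector/J class; its jar must be on the
# transactor's classpath (and the peer's, as discussed above).
sql-driver-class=org.mariadb.jdbc.Driver
```

On the peer side, the JDBC URL is embedded in the connection URI, e.g. `datomic:sql://<db-name>?jdbc:mariadb://localhost:3306/datomic?user=datomic&password=datomic` — again assuming the MySQL scheme carries over to MariaDB unchanged.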