#datomic
2021-05-07
danm09:05:23

Are there docs anywhere about the expected CPU use of queries vs transactions? Our current setup doesn't yet have query groups, and we're performing a lot more writes (i.e. transacts) than we are queries. I'm seeing CPU hitting 98+% on the transactors, and then everything falls over. I'm curious if creating a query group to offload the queries could/would drop CPU on the transactors a lot more than the ratio of queries/transacts would suggest, because maybe queries are a lot more CPU intensive?

danm10:05:00

Also, is there documentation anywhere on all the standard graphs on the Datomic Cloud dashboard? Like, TxBytes. Is that a per second average or an aggregate of all the data transmitted since the last datapoint? I'm assuming the latter, as changing the dashboard period, and therefore the interval between datapoints, alters the value significantly.

danieroux13:05:27

A wish question (I wish-and-hope-this-exists): Does anyone have something that allows me to edit a Datomic Cloud database as a spreadsheet? Or as a simple CRUD app? We have a bunch of static information that we display to the internal users on Metabase - and they want to change the values they see.

mafcocinco14:05:34

In Datomic, what is the best-practice way to model this relationship: object A contains references to many instances of object B, and we want a field in object B to be unique within the context of object A. From the documentation, it does not seem like :db/unique (with either :db.unique/identity or :db.unique/value), by itself, is appropriate. Wondering how to correctly model this constraint within the Datomic schema.

Joe Lane14:05:39

@U6SN41SJC Look into using :db.unique/identity tuples for this, either heterogeneous or composite. Also, depending on how many "many instances" is, maybe B should point to A ?
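
A minimal sketch of the composite-tuple approach Joe describes, assuming B points back to A and using hypothetical attribute names (:b/parent, :b/name):

[{:db/ident       :b/parent
  :db/valueType   :db.type/ref
  :db/cardinality :db.cardinality/one}
 {:db/ident       :b/name
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one}
 ;; composite tuple derived from the two attributes above; making it
 ;; :db.unique/identity means :b/name must be unique per :b/parent
 {:db/ident       :b/parent+name
  :db/valueType   :db.type/tuple
  :db/tupleAttrs  [:b/parent :b/name]
  :db/cardinality :db.cardinality/one
  :db/unique      :db.unique/identity}]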

mafcocinco14:05:00

True. It doesn't matter which direction the index points, and that would probably be easier.

Joe Lane14:05:26

How many is "many instances"? The answer to which direction it should go depends on the required selectivity of the access patterns. Again, all predicated on "many instances" 🙂

mafcocinco18:05:28

as a guess. A is an environment for our testing platform and B is the metadata for each service that will be tested in that environment. Our platform currently consists of ~8 services and I don’t see that number going up significantly.

Joe Lane19:05:43

Then performance doesn't matter here and you should do whatever is most convenient for you. That entire dataset will fit in memory, yay!

kenny17:05:39

Is there a way for me to know which Datomic Cloud query group node a client api request went to?

ghadi17:05:38

xy problem

ghadi17:05:08

groans "what are you actually trying to solve?"

💥 3
kenny17:05:51

Actually lol'ed 🙂 Knew this was coming.

Joe Lane17:05:03

I'm sensing a new precursor to "Everybody drink"

😆 6
kenny18:05:55

We are receiving ~20 Datomic client timeouts, all on the exact same d/pull call, within a 3-minute window, which is surprising because that call doesn't actually pull that much data. I was curious whether the node those client API requests went to was overwhelmed.

Joe Lane18:05:24

Check your dashboard, do you have any throttle events?

kenny18:05:47

Not at that time. The query is set to a 15s timeout and it's hitting that on every one of those calls.

Joe Lane18:05:58

I thought it was a pull?

kenny18:05:51

It's a query with a pull. e.g.,

;; pull every entity referenced by :customer/prop-group1s, expanding its
;; ::props-v1/filter-set refs
(d/q {:query   '[:find (pull ?p [* {::props-v1/filter-set [*]}])
                 :where
                 [_ :customer/prop-group1s ?p]]
      :args    [db]
      :timeout 15000})

Joe Lane18:05:29

Were these against the same database?

kenny18:05:09

All but 2.

ghadi18:05:22

does that same exact pull call happen at other times of the day?

kenny18:05:57

That query will always return a seq of 3 maps with < 20 total datoms.

ghadi18:05:23

how long does it ordinarily take outside the problem window?

ghadi18:05:53

cool cool...

kenny18:05:53

avg maybe 50ms.

ghadi18:05:39

can you launch that pull concurrently (futures / threads) and reproduce the issue?
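
A rough sketch of that reproduction idea, reusing the query map kenny posted above; the fan-out of 50 concurrent calls is arbitrary:

(let [run-query (fn []
                  (d/q {:query   '[:find (pull ?p [* {::props-v1/filter-set [*]}])
                                   :where
                                   [_ :customer/prop-group1s ?p]]
                        :args    [db]
                        :timeout 15000}))
      ;; fire the same query from many threads at once
      futs      (doall (repeatedly 50 #(future (run-query))))]
  ;; deref blocks until each finishes; a timeout surfaces as an exception here
  (mapv deref futs))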

Joe Lane18:05:27

Try ^^ against a different QG of size 1 and look at its dashboard.

ghadi18:05:04

one of the lovable perks of infinite read scaling

ghadi18:05:37

that will at least tell you if the synchronicity is significant

Joe Lane18:05:40

Maybe your on-demand DDB table wasn’t provisioned for that demand?

kenny18:05:27

From looking at the query group dashboard, I can see that the group was overwhelmed at the time: min CPU of 99 & max of 100. There were only 2 nodes in the group. I also observe that at least one other query resulted in a 50.4k count. The overwhelmed system simply manifests itself in those frequent, but small, queries. Thinking the fix is to scale the system up at the time of the 50.4k query. Separately, does the Query Result Counts graph show the number of datoms a query returns or something else?

Joe Lane18:05:19

That graph shows the number of results, not datoms. A result can be many datoms.

kenny18:05:37

So if that query is pull'ing in the :find, it could actually be some scalar * the reported number?

Joe Lane18:05:10

Assuming all the results are uniform, yes, that many datoms would be returned. Datoms isn't really the right measurement here though.

kenny18:05:42

"that many" is scalar * reported number, assuming uniform?

Joe Lane18:05:05

If I know each pull returns exactly 3 datoms, then the number of returned datoms is: reported number * 3 = "that many datoms"

✔️ 2
kenny19:05:26

So I can reproduce the query result count by calling count on the result of d/q?

kenny19:05:49

d/pull is not included then?

Joe Lane18:05:56

Instead of scaling the qg up, can you make a separate QG for that other query so they don't affect each other?

kenny18:05:19

Yes, that is an option. I'd like a bit more data on which queries are causing that huge result set. I have a couple ideas but need more data to know how to split. Why would you tend to prefer splitting over scaling?

Joe Lane18:05:06

Yep, but beyond that, these sound like different kinds of workloads.

kenny18:05:39

Yeah, they kind of are.

Joe Lane19:05:14

Is one of them a scheduled batch job? You can always spin the QG up just for that job 🙂

kenny19:05:02

Another option I've been considering is "filling out" my query group with spot instances. It's likely that would solve this problem as well, at a fraction of the cost.

Joe Lane19:05:59

"this problem" <- you know what I'm going to ask.

kenny19:05:49

Getting timeouts due to hitting peak capacity.

kenny19:05:19

e.g., CPU spikes to near 100, some small number of queries time out, then the event is over.

Joe Lane19:05:33

> Getting timeouts due to hitting peak capacity
^^ That is a symptom, and we still don't know why it occurred, do we? FWIW, a shorter timeout on your pulls with a retry wrapped around it would also alleviate the above symptom, because the request would (eventually, but how unlucky can you be?) be routed to a different node.
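
A hedged sketch of that shorter-timeout-plus-retry idea; the 2-second timeout, the retry limit, and the broad catch are illustrative, not recommendations:

(defn q-with-retry
  "Run query-map with a short :timeout, retrying up to max-retries times
   so a retried request has a chance of landing on a less-busy node."
  [query-map max-retries]
  (loop [attempt 1]
    (let [result (try
                   (d/q (assoc query-map :timeout 2000))
                   (catch Exception e
                     ;; a fuller version would inspect the thrown anomaly
                     ;; and retry only on timeouts
                     (if (< attempt max-retries)
                       ::retry
                       (throw e))))]
      (if (= ::retry result)
        (recur (inc attempt))
        result))))

Called like (q-with-retry query-map 3), this keeps the overall wait bounded while still giving the client a chance to be routed elsewhere.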

kenny19:05:39

Fair. My hypothesis is those 50.4k queries. I'm betting there are multiple of them.

kenny19:05:20

& there are only 2 nodes in the group at the event time. So if both nodes are processing 1+ 50.4k queries, perhaps pretty unlucky.

Joe Lane19:05:16

So there are only 2 nodes in the QG and there are 2 queries returning 50.4k results being issued at the same time?

kenny19:05:25

I don't know for certain since I don't have that data instrumented right now, but yes, it is likely. There are up to 5 queries of that size that could all run in the same 10s window.

kenny23:05:27

Datomic Cloud currently uses the older launch configuration setup when creating ASGs, so a mixed group of Spot & On-Demand instances is not possible 😢 I created a feature request here: https://ask.datomic.com/index.php/607/use-launch-template-instead-of-launch-configuration.