#datomic
2023-02-03
jdkealy12:02:35

I'm doing load testing on my app with JMeter, and Datomic seems to be the bottleneck. I can do 1000 concurrent users no problem on an endpoint that doesn't use Datomic, but on a single instance, any query that hits Datomic peaks out around 30 concurrent. By comparison, I tested another Python app that uses Postgres, and it also handles 1000 concurrent. What should I be looking at to improve this?

jdkealy12:02:36

I'm monitoring RAM usage, and it never gets close to 100%. When I give it 5 CPUs, it gets pretty close. My team doesn't want to allow me 5 CPUs due to budget constraints, and even with 5 CPUs it can't hit 1000 concurrent. One thing I've done quite a bit is re-establish DB connections instead of reusing the same connection. It would take quite a bit of time to ensure I'm using the same DB connection on every request; I know that's best practice, but right now I only want to take on changes that will actually affect load. Would that help? Anything else I should monitor?
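For reference, a minimal sketch of reusing one peer connection across requests, assuming a peer-style app and a hypothetical DynamoDB-backed URI (the peer library also caches connections per URI, so repeated d/connect calls for the same URI are cheap):

(ns myapp.db                                  ; hypothetical namespace
  (:require [datomic.api :as d]))

;; Hypothetical URI; substitute the real storage URI.
(def db-uri "datomic:ddb://us-east-1/my-table/my-db")

;; Hold a single peer connection for the whole process instead of
;; re-establishing one per request.
(defonce conn (delay (d/connect db-uri)))

(defn handler [request]
  ;; Getting a db value from the shared connection is cheap.
  (let [db (d/db @conn)]
    ;; ... run queries against db ...
    ))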

favila13:02:17

Just broad advice: you should provision a Datomic peer similarly to how you would provision a read-only DB replica (vs. a web application), because in a sense that's what it is.

favila13:02:49

Important metrics to look at are the ObjectCache and Memcached hit rates.

favila13:02:40

If the hit rates are low (especially ObjectCache), you are paying significant IO and CPU cost.

jdkealy13:02:57

How do I get those metrics?

jdkealy13:02:02

And what counts as low?

favila13:02:14

If you have extra JVM headroom, consider increasing the bias towards the object cache. (The default is half of Xmx.)
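For reference, the peer's object cache ceiling is controlled by the documented datomic.objectCacheMax system property; the values below are purely illustrative, not a sizing recommendation:

-Xmx4g -Ddatomic.objectCacheMax=3g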

jdkealy13:02:10

I'm giving it 4GB of RAM, and my DynamoDB table is only 1.5GB.

jdkealy13:02:15

So wouldn't half more than cover it?

favila13:02:58

The object cache holds decoded objects, so they are much, much larger.

favila13:02:22

The stuff in dynamo and memcached is fressian-encoded and gzipped

favila13:02:06

If your object cache hit rate is <99%, then you know that it is a bottleneck

favila13:02:38

Because it has to go to memcached or storage at least some of the time

jdkealy13:02:30

So if I have 4GB of RAM, I might consider giving it 3GB?

jdkealy13:02:48

to objectcache

favila13:02:18

Depends on what your application and query code needs. You can probably give it at least half the headroom you see

favila13:02:43

But really, you should look at the metrics to make sure OC hit rates are even a problem.

favila13:02:05

If it’s already 100%, obviously more won’t help

jdkealy13:02:34

Is there a way to get an overview of the OC hit rate, or do you need to do it query by query, like this?

(d/q {:query '[:find (count ?a)
               :in $ ?aname
               :where [?a :artist/name ?aname]]
      :args [db "The Beatles"]
      :io-context :artist/by-name})

favila13:02:27

That io-stats stuff is brand new; the older mechanism is described there.

favila13:02:44

and it is the behavior of the peer as a whole

👍 2
favila13:02:03

The metrics are also logged by the datomic.process-monitor logger, IIRC.

favila13:02:37

but you will want to use a callback to get it in your monitoring stack in production eventually

jdkealy13:02:07

OK, so I could just (println request) in the callback handler

jdkealy13:02:20

and tail the log and watch

jdkealy13:02:26

very cool thanks

favila13:02:45

No, I mean the Java logging of Datomic.

favila13:02:57

I’ll look up an example

favila13:02:38

Here’s one

2023-02-03 13:01:58.094 76941657 [Datomic Metrics Reporter] INFO  datomic.process-monitor - {:tid 29, :CacheRepair {:lo 1, :hi 1, :sum 51, :count 51}, :MemcachedGetTimeoutMsec {:lo 16.55, :hi 16.55, :sum 16.55, :count 1}, :ObjectCacheCount 12945, :PeerAcceptNewMsec {:lo 0, :hi 0, :sum 0, :count 448}, :MemcachedGetFailedMsec {:lo 0.43, :hi 65.85, :sum 152.07, :count 51}, :MemcachedPutSucceededMsec {:lo 0.52, :hi 10.43, :sum 83.29, :count 51}, :AvailableMB 850.0, :Memcache {:lo 0, :hi 1, :sum 1258, :count 1309}, :MemcachedGetSucceededMsec {:lo 0.38, :hi 30.44, :sum 1327.46, :count 1258}, :StorageGetMsec {:lo 4, :hi 15, :sum 370, :count 51}, :ReaderCachePut {:lo 1, :hi 1, :sum 51, :count 51}, :pid 4920, :event :metrics, :ObjectCache {:lo 0, :hi 1, :sum 9004, :count 10389}, :MemcachedGetMissedMsec {:lo 0.43, :hi 3.21, :sum 36.24, :count 48}, :MetricsReport {:lo 1, :hi 1, :sum 1, :count 1}, :PeerFulltextBatch {:lo 1, :hi 2, :sum 448, :count 445}, :StorageGetBytes {:lo 7448, :hi 79049, :sum 1736271, :count 51}, :DirLoads {:lo 1, :hi 1, :sum 24, :count 24}}

jdkealy13:02:40

Oh, I thought I could do

-Ddatomic.metricsCallback=db/stats-handler

(defn stats-handler [request]
  (println request))

jdkealy13:02:07

That looks great, that's what I need.

favila13:02:13

oh, I didn’t understand what you meant by “request handler”

favila13:02:16

yeah, you can do that

favila13:02:48

the reporting interval is like once a minute

Joe Lane13:02:17

Are you using mbrainz in your load test?

jdkealy13:02:49

That’s like the fake dataset?

Joe Lane13:02:58

Is the server side work of the load test pure query?

Joe Lane13:02:30

The query you posted above looks like mbrainz

jdkealy13:02:46

Oh, that was copy-pasted from the docs.

Joe Lane14:02:39

What are the io-stats, query-stats, and process-monitor metrics for your queries during the JMeter load test?

jdkealy14:02:23

I haven't set up the io-stats yet; doing that now.

jdkealy16:02:06

{:PeerAcceptNewMsec {:lo 0, :hi 0, :sum 0, :count 7}, :MetricsReport {:lo 1, :hi 1, :sum 1, :count 1}, :PeerFulltextBatch {:lo 1, :hi 1, :sum 7, :count 7}, :AvailableMB 2970.0, :ObjectCacheCount 523}

jdkealy16:02:40

My report has far less data than yours @U09R86PA4

favila16:02:13

It doesn’t report 0

favila16:02:18

so this looks like an idle machine

jdkealy16:02:25

It's def not idle... I can see web traffic coming in.

Joe Lane16:02:04

Do you have any io-stats for your query?

Joe Lane16:02:20

(What is your query?)

jdkealy16:02:23

no, this was just the result of

-Ddatomic.metricsCallback=lms.db/stats-handler

jdkealy16:02:45

(defn stats-handler [request]
    (println request))
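For what it's worth, a hedged sketch of a callback that derives hit rates from that periodic metrics map, assuming (as the sample log above suggests) that 0/1-sampled metrics like :ObjectCache and :Memcache carry hits in :sum and total lookups in :count:

(defn hit-rate
  "Hit rate for a 0/1-sampled metric such as :ObjectCache or :Memcache."
  [{:keys [sum count]}]
  (when (and sum count (pos? count))
    (double (/ sum count))))

(defn stats-handler [metrics]
  (println {:object-cache-hit-rate (hit-rate (:ObjectCache metrics))
            :memcache-hit-rate     (hit-rate (:Memcache metrics))}))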

fmnoise17:02:05

I have an attribute in my schema:

{:db/ident       :company/slug
 :db/valueType   :db.type/string
 :db/unique      :db.unique/identity
 :db/cardinality :db.cardinality/many}
but then I found this in the docs:
Only (:db.cardinality/one) attributes can be unique
It seems to work fine, though, so maybe the documentation is outdated, or could there be issues with indexing? Datomic on-prem 0.9.6045.
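For comparison, a hedged sketch of the attribute as the documentation describes it (unique constrained to cardinality one), in peer/on-prem transact style with an assumed conn; altering an already-installed attribute would also involve the usual schema-alteration steps:

@(d/transact conn
   [{:db/ident       :company/slug
     :db/valueType   :db.type/string
     :db/cardinality :db.cardinality/one
     :db/unique      :db.unique/identity}])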