2022-12-15
Bah. Startup time to a fully synced node has increased 50% in a couple of weeks, as data amounts have increased. I guess I have to start looking at checkpointing
What kind of startup times are you seeing with checkpoints, and with which checkpoint store and number of documents?
in AWS in the long run you need to configure the S3 client because the checkpoint is so huge it times out downloading it 🙈
I don’t have access to prod, but it should be just the time it takes to download the latest checkpoint and replay anything after that… so it depends on your checkpointing frequency
I wonder if the local, in-process xtdb node is tenable after all, or do we have to switch to three tiers: golden stores / xtdb node / application code, where only the last is updated often. Have to say that I've been surprised how slow the startup is, considering the somewhat low number of documents. In production we should have thousands, if not tens of thousands of times more data than what we are testing with now.
you should get quite far with checkpoints, but I have been thinking that ephemeral nodes are not the best fit
something like a permanent ec2 machine where you upgrade by just deploying a new uberjar for the application code would likely be better and incur no startup cost
you can still have checkpoints on top of that to support scaling up without starting from empty
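For reference, a minimal sketch of what enabling index-store checkpoints looks like in an XTDB 1.x node config, assuming a RocksDB index store and the built-in filesystem checkpoint store (an S3-backed store from the xtdb-s3 module can be swapped in the same place, as discussed above); the paths and frequency below are placeholders, and the tx-log / document-store parts of the config are omitted:

```clojure
;; Sketch only: index-store fragment to merge into the full node config.
{:xtdb/index-store
 {:kv-store
  {:xtdb/module 'xtdb.rocksdb/->kv-store
   :db-dir "/var/lib/xtdb/indexes"
   ;; Checkpoint the index KV store so a fresh node can restore the
   ;; latest checkpoint and only replay transactions made after it.
   :checkpointer
   {:xtdb/module 'xtdb.checkpoint/->checkpointer
    :store {:xtdb/module 'xtdb.checkpoint/->filesystem-checkpoint-store
            :path "/var/lib/xtdb/checkpoints"}
    :approx-frequency (java.time.Duration/ofHours 6)}}}}
```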
Yeah, sure. Just wouldn't want to maintain even the VM OS, if possible. With the more common two-tier SQL / application code setup it would be pretty simple
I started a discussion on https://discuss.xtdb.com/t/what-kind-of-startup-times-you-are-seeing-with-out-checkpoints/127 too
I guess we should test the amount of data in bytes too, but given how slow the transmission from the SQL store to the node is (100 kBps peaks), I'm guessing that it's more about the number of documents rather than how big they are
Though perhaps if the data is split into lots and lots of small key-values, all indexed, it might not matter
Hey, @U8ZQ1J1RR in relation to "given how slow the transmission from the SQL to the node is" ...are you using 1.22.1? Or something older? If you can create and send a flamegraph (e.g. using YourKit or https://github.com/clojure-goes-fast/clj-async-profiler) of what the node is doing during that replaying, it would help analyse what the real bottleneck is
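For anyone following along, a minimal sketch of capturing such a flamegraph with clj-async-profiler from a REPL while the node replays; the output directory is the library's default:

```clojure
(require '[clj-async-profiler.core :as prof])

;; Start sampling just before the node begins replaying the tx-log...
(prof/start)

;; ...start the node / wait for it to sync here...

;; ...then stop; this writes a flamegraph under /tmp/clj-async-profiler/results/
;; (HTML in recent versions, SVG in older ones).
(prof/stop)
```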
.0. I remember reading that .1 is some 40% faster in some bulk ops, but I thought that it doesn't change the scaling. That is, a thousand times more documents would still take a thousand times as long, so even doubling the speed won't help here. Upgrading to .1 is one thing I listed to try, in any case
.0 should be the ~current performance (ignoring the RC we're about to put out based on the new master), .1 was just a bug-fix release really so I wouldn't expect any performance difference
Ah, true, we are still at 1.20.0, not 1.21.1. Shouldn't have tried to be clever and save 4 chars
> cache-size (int): Size of the cache in bytes - default size is 8MB, although it is recommended (https://github.com/facebook/rocksdb/wiki/Setup-Options-and-Basic-Tuning#block-cache-size) that this is set to a higher amount.
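A sketch of where that cache-size goes in an XTDB 1.x RocksDB index-store config, assuming the LRU block cache module from the docs quoted above; the 128MB figure is purely illustrative:

```clojure
;; Sketch only: index-store fragment, size is an example value.
{:xtdb/index-store
 {:kv-store
  {:xtdb/module 'xtdb.rocksdb/->kv-store
   :db-dir "/var/lib/xtdb/indexes"
   ;; Shared RocksDB block cache; the default (8MB) is small, so bump it.
   :block-cache {:xtdb/module 'xtdb.rocksdb/->lru-block-cache
                 :cache-size (* 128 1024 1024)}}}}
```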
Aha, gotta try that
Locally, raising it from the default to 1GB didn't do anything to the startup time. The local node is further away from Postgres, but still only about 25% slower than running in the same data center. Have you seen changes with the cache size? Or any other setting?
Hm, looking at the config options again, I think I didn't write them properly. I think it won't complain about the config being out of spec, it just ignores it
@U899JBRPF That flamegraph btw is really tall. Perhaps best to pass you the whole html file instead of just a screenshot?
Also a significant part is just under the jvm.so. I suppose debug symbols might help with that. Never had to go that deep with flamegraphs myself
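If it helps, these are the JVM options commonly recommended for more accurate async-profiler stacks (and for letting the profiler attach to its own JVM), shown here as a hypothetical deps.edn alias; the alias name is made up, the flags themselves are standard:

```clojure
;; Hypothetical :profiling alias in deps.edn.
{:aliases
 {:profiling
  {:jvm-opts ["-Djdk.attach.allowAttachSelf"   ; let clj-async-profiler attach in-process
              "-XX:+UnlockDiagnosticVMOptions"
              "-XX:+DebugNonSafepoints"]}}}    ; more precise stack frames in flamegraphs
```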
Well, took the liberty of sending you that file already. Debug symbols indeed helped. On a high level, a third of the time is spent in index_tx_events, a significant part in InFlightTx.commit, for some reason a small but still quite visible part in the nrepl main loop, and what looks to my untrained eye like quite a lot of time in JIT compilation and garbage collection. Especially the JIT part seems perplexing when we are talking about a runtime of four minutes; it shouldn't be tens of percent. Perhaps I'm just interpreting it wrong (CompileBroker::compiler_thread_loop())
we have 512mb of cache-size, and I’m thinking I should increase it still, the default is way too low
@U8ZQ1J1RR thanks for sending. I can already see that the upgrade to 1.22.1 or the https://repo1.maven.org/maven2/com/xtdb/xtdb-core/1.22.2-rc1/ will have some improvement, based on where some of the time is spent in your profile. I agree the JIT part is interesting, I'll review internally and get back to you
> we have 512mb of cache-size, and I’m thinking I should increase it still, the default is way too low
Agreed the default is conservative for many realistic workloads - we are looking to improve memory & cache handling in the future so that the defaults can be safely raised/lifted...but it's a complex problem due to the number of caches and the way memory is allocated both initially and incrementally. @U0GE2S1NH has been looking into what can be done here fairly recently
@U8ZQ1J1RR I would be interested to see another graph after a bump to 1.22.2-rc1, would that be possible?
Sure! Not sure if I can manage to make a reasonable diff graph, but at least a standalone graph
Sorry, had a meeting. The startup time fell to about 40% of what it was. Much better, but still almost two minutes. The flamegraph is very different: Clojure protocols, into, reduces, transducers etc. are not really visible any more, and rocksdb itself becomes visible
I also sent @U0GE2S1NH the html with the flamegraph for interactive viewing
Heh, commits from yesterday seem to say pretty much what I just noticed (avoid protocol dispatch, avoid boxing etc)
So, after all the tuning tips, the startup time went from four minutes to one. The biggest change by far was trying 1.22.2-rc1. After that, increasing the block cache size to 64MB helped too, but nothing bigger had an impact. Using the new :enable-filters? to turn on rocksdb bloom filters shaved off another 14%. And increasing the initial heap size helped, though I still need to test that more thoroughly (I just gave Java loads more max and initial heap, and stack memory)
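To make that end state concrete, a hedged sketch of the combination described above: the heap/stack sizes are placeholders, and the placement of :enable-filters? directly on the RocksDB kv-store options is an assumption based on this thread rather than on the docs:

```clojure
;; Index-store fragment (sizes and the :enable-filters? placement are assumptions)
{:xtdb/index-store
 {:kv-store
  {:xtdb/module 'xtdb.rocksdb/->kv-store
   :db-dir "/var/lib/xtdb/indexes"
   :enable-filters? true                        ; RocksDB bloom filters, new in 1.22.2
   :block-cache {:xtdb/module 'xtdb.rocksdb/->lru-block-cache
                 :cache-size (* 64 1024 1024)}}}}

;; deps.edn alias with the JVM memory settings mentioned above (illustrative sizes)
{:aliases
 {:prod {:jvm-opts ["-Xms2g" "-Xmx2g" "-Xss4m"]}}}
```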
Still, it is a minute. So while a 4× speedup is great, it won't help when the data is a hundred times larger. So checkpoints are still needed, or some other way of getting the rocksdb indices onto the starting nodes.