#xtdb
2021-11-11
Jacob O'Bryant 05:11:41

I've been running a single node in production for a while, using jdbc for tx log + doc store and rocksdb for index. I'm trying to add a second node now, but I can't get it to finish indexing. start-node returns successfully, but after that, it (slowly) indexes only a handful of transactions and then stops. e.g. immediately after (start-node ...), I ran (latest-completed-tx node) in a loop every 5 seconds and it gave this:

#:xtdb.api{:tx-time #inst "2021-07-15T17:28:45.420-00:00", :tx-id 188852}
#:xtdb.api{:tx-time #inst "2021-07-15T17:28:45.420-00:00", :tx-id 188852}
#:xtdb.api{:tx-time #inst "2021-07-15T17:28:45.420-00:00", :tx-id 188852}
#:xtdb.api{:tx-time #inst "2021-07-15T17:28:45.420-00:00", :tx-id 188852}
#:xtdb.api{:tx-time #inst "2021-07-15T17:28:45.420-00:00", :tx-id 188852}
#:xtdb.api{:tx-time #inst "2021-07-15T17:28:45.420-00:00", :tx-id 188852}
#:xtdb.api{:tx-time #inst "2021-07-15T17:28:48.148-00:00", :tx-id 189353}
#:xtdb.api{:tx-time #inst "2021-07-15T17:28:55.262-00:00", :tx-id 190856}
#:xtdb.api{:tx-time #inst "2021-07-15T17:28:55.262-00:00", :tx-id 190856}
#:xtdb.api{:tx-time #inst "2021-07-15T17:28:55.262-00:00", :tx-id 190856}
#:xtdb.api{:tx-time #inst "2021-07-15T17:28:55.262-00:00", :tx-id 190856}
#:xtdb.api{:tx-time #inst "2021-07-15T17:28:55.262-00:00", :tx-id 190856}
#:xtdb.api{:tx-time #inst "2021-07-15T17:28:58.229-00:00", :tx-id 191357}
#:xtdb.api{:tx-time #inst "2021-07-15T17:28:58.229-00:00", :tx-id 191357}
#:xtdb.api{:tx-time #inst "2021-07-15T17:28:58.229-00:00", :tx-id 191357}
... [no changes after this even after an hour]
No exceptions. I did get an OOM error one time, but I increased the instance size and haven't gotten it again. The old node is on 1.17.1, the new one is on 1.19.0 (I tried with 1.17.1 also just in case, but same results). Any advice?
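
For reference, a minimal sketch of such a polling loop, assuming the xtdb.api namespace aliased as xt and an already-started node (the function name is illustrative):

(require '[xtdb.api :as xt])

;; Print the latest indexed transaction every 5 seconds to watch progress.
(defn poll-indexing-progress [node]
  (loop []
    (println (xt/latest-completed-tx node))
    (Thread/sleep 5000)
    (recur)))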

Jacob O'Bryant 05:11:47

For now I've set it up to start indexing on startup and then restart the JVM if it goes too long without a change in latest-completed-tx. It actually seems not to be getting stuck now, at least, though it still seems pretty slow. We're up to 2021-07-26 now; maybe it'll be done by tomorrow...
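
A hypothetical version of that watchdog (the function name, timeout, and exit-based restart are assumptions; an external supervisor such as systemd is assumed to restart the process):

(require '[xtdb.api :as xt])

;; Exit the JVM if latest-completed-tx stops advancing within timeout-ms,
;; relying on the process supervisor to restart it.
(defn restart-when-stuck [node timeout-ms]
  (loop [last-tx (xt/latest-completed-tx node)]
    (Thread/sleep timeout-ms)
    (let [tx (xt/latest-completed-tx node)]
      (if (= tx last-tx)
        (System/exit 1)
        (recur tx)))))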

Jacob O'Bryant 06:11:03

eh--it seems to be going reasonably fast now, I suppose; looks like it'll be done within an hour. I'm still curious to understand why it stalled out on all my previous attempts, though. The beginning of the tx log consists of a bunch of large transactions (1000 docs each) because I was migrating from a previous system. I wonder if xt was just choking on that? Maybe it's working now because, after several iterations of start-node => restart JVM, it managed to get past the migration transactions. In the future, maybe a smaller batch size would help?
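
One hedged sketch of what a smaller migration batch size could look like: partition the docs and submit (and await) each chunk as its own transaction. The function and batch size here are illustrative, not from the original setup:

(require '[xtdb.api :as xt])

;; Submit docs in smaller transactions; awaiting each one applies
;; backpressure so the indexer isn't flooded during the migration.
(defn migrate-docs! [node docs batch-size]
  (doseq [batch (partition-all batch-size docs)]
    (->> (mapv (fn [doc] [::xt/put doc]) batch)
         (xt/submit-tx node)
         (xt/await-tx node))))

;; e.g. (migrate-docs! node all-docs 250)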

tatut 11:11:05

sorry if I'm hijacking, as I don't know how to help... just wanted to ask: I assume you are using persistent disks for the nodes and no checkpointing?

tatut 11:11:46

with ephemeral server instances, tx-log replay quickly becomes too slow at startup, and checkpointing has been essential
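
For context, per the XTDB 1.x docs, checkpointing is enabled on the index store's KV module; a sketch of a node config with a filesystem checkpoint store (paths and frequency are placeholders):

{:xtdb/index-store
 {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
             :db-dir "/var/lib/xtdb/indexes"
             :checkpointer {:xtdb/module 'xtdb.checkpoint/->checkpointer
                            :store {:xtdb/module 'xtdb.checkpoint/->filesystem-checkpoint-store
                                    :path "/var/lib/xtdb/checkpoints"}
                            :approx-frequency (java.time.Duration/ofHours 6)}}}}

A fresh node that finds a recent checkpoint restores it instead of replaying the whole tx log.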

Jacob O'Bryant 15:11:01

yes, persistent disks and no checkpointing

refset 23:11:23

hey, sorry for the delay - these are shots in the dark, but worth ruling in/out: 1) are you using Lucene? 2) have you ever added transactions with large numbers of evict ops?

Jacob O'Bryant 01:11:43

No worries. Nope--not using Lucene, and haven't done any evictions.

👍 1
refset 07:11:04

How much RAM is there and how much have you allocated for the JVM?

refset 15:11:38

> The beginning of the tx log consists of a bunch of large transactions (1000 docs each) because I was migrating from a previous system. I wonder if xt was just choking on that?

1000 ops per tx is actually a pretty typical batch size we use when moving data around, and really shouldn't be problematic, with the exception of eviction ops, where I'm aware there is something not quite right happening during bulk ops: https://github.com/xtdb/xtdb/issues/1509

👍 1
Jacob O'Bryant 17:11:44

At first it was 2GB, then I resized to 4GB. Default settings for JVM (whatever those are). Is there a recommended minimum for RAM?

refset 17:11:15

assuming you're using Rocks/LMDB, leave a good chunk of memory aside outside the JVM for native allocations. 2GB should be plenty for everything though (from a tx-ingest perspective, anyway), as long as Rocks/LMDB has a few hundred MBs to play with
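
For illustration only (the flag values are assumptions, not a recommendation from the thread): on a 4GB instance that might mean capping the heap and direct memory with standard HotSpot flags, e.g.

java -Xmx2g -XX:MaxDirectMemorySize=512m -jar app.jar

so that a good share of the box stays free for RocksDB's native allocations and the OS page cache.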

👍 1