
I have a problem starting XTDB in ECS from a checkpoint... I see logs that it is restoring from the checkpoint, but then it hangs before returning from xt/start-node ... I don't know if it is just slow. I have the grace period set to 300 seconds and it doesn't start in that time


{ "context": "default", "level": "INFO", "logger": "xtdb.tx", "message": "Started tx-ingester", "thread": "main", "timestamp": "2021-11-10T06:52:16.430Z" }
{ "context": "default", "level": "DEBUG", "logger": "xtdb.hash", "message": "Using libgcrypt for ID hashing.", "thread": "main", "timestamp": "2021-11-10T06:52:17.578Z" }
{ "context": "default", "level": "DEBUG", "logger": "xtdb.lucene", "message": "Committing Lucene IndexWriter...", "thread": "xtdb-lucene-fsync-1", "timestamp": "2021-11-10T06:54:18.635Z" }
{ "context": "default", "level": "DEBUG", "logger": "xtdb.lucene", "message": "Committed Lucene IndexWriter.", "thread": "xtdb-lucene-fsync-1", "timestamp": "2021-11-10T06:54:18.635Z" }


those are the last logs, and 4 minutes later ECS just kills it and starts another, perhaps I'll just increase the startup time yet again


I tried doubling grace period to 10 minutes, still doesn't start in that time


I can verify that this happens locally as well, I configured my local env to use the same S3 checkpoint bucket and tx-log/doc store and starting up the node hangs after it has downloaded the checkpoint... no CPU usage to speak of so it isn't doing anything intensive at least


the LMDB index directory seems to be stable and the correct size, but the lucene folder seems to be fluctuating weirdly and doesn't seem to reach the size of the snapshot


A profiler would probably help here (YourKit is pretty great if you haven't tried it), but just to rule this out could you try increasing the Lucene refresh-frequency as per


looks like it grows a few megabytes and then goes back to a much smaller size


it doesn't look like lucene uses the checkpoint, the folder should be over 40M but it is only a few and growing slowly


can confirm that the lucene module must be the culprit here... I just tried simply commenting out the lucene module completely and startup continues immediately after the download is complete


the refresh-frequency didn't fix the lucene checkpoint stuff


the cp/try-restore is never called from lucene module


I'll see if adding that works


> the `cp/try-restore` is never called from lucene module

oh, oops! I agree this looks to be missing, when compared to


added that locally, and startup is fast


should I do a PR?


increasing the refresh-frequency almost certainly will help with the replay speed though
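
For anyone following along, here is a minimal sketch of where that setting would live in the node config. The `:db-dir` value and the `PT5M` interval are placeholders, and the option names are assumed from the xtdb-lucene module docs rather than taken from this thread, so check them against the version you run:

```clojure
;; Node config fragment (assumed shape, not copied from this thread).
{:xtdb.lucene/lucene-store
 {:db-dir "data/lucene"        ; hypothetical path
  ;; ISO-8601 duration: refresh the Lucene reader on a 5-minute interval
  ;; rather than after every transaction.
  :refresh-frequency "PT5M"}}
```

The likely trade-off is that text-search results can lag behind the latest transactions by up to that interval.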


sure, please, that will put it firmly on the radar 🙂


thanks! I won't merge the PR myself now, but I will make sure it gets some brain cycles soon


Would you be okay to sign the CLA pdf and email it to us? Instructions here


probably fine, I'll look it over

👍 1

signed and emailed

🙏 1

there don't seem to be deps.edn files for the modules... it would be much easier to use forked fix versions as git deps without waiting for an official release


noted, feel free to open an issue (not PR 😅) for that also - I'm not really sure what it would entail, but we do make rather extensive use of lein features currently. We can also publish a snapshot release very easily once the fix is merged in the meantime, if it helps at all


yeah, I added that in the PR, but I'll revert it... it wasn't as convenient tbh


afaict you can't use git deps that are in some subfolder of a repo

Jacob O'Bryant 01:11:07

You can by adding a :deps/root key


good to know, I didn't see that in the deps docs
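
A rough sketch of what that could look like in the consuming project's `deps.edn`. The fork URL, the sha placeholder and the `modules/lucene` path are assumptions for illustration only:

```clojure
;; deps.edn of the consuming project (illustrative values only)
{:deps
 {com.xtdb/xtdb-lucene
  {:git/url   "https://github.com/your-fork/xtdb"  ; hypothetical fork
   :git/sha   "<full commit sha of the fix>"       ; placeholder, fill in yourself
   :deps/root "modules/lucene"}}}                  ; assumed subfolder of the module
```

Note that tools.deps still needs a manifest it understands (a `deps.edn` or `pom.xml`) inside that subfolder, which is the part the modules are currently missing.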

Brandon Olivier 20:11:23

Has anyone seen issues with xtdb, rocksdb, and last year’s M1 Mac? I keep getting java errors, and can’t find much info on google about it.

Brandon Olivier 20:11:53

When I try to follow the in-memory tutorial I get this error

Caused by java.lang.UnsatisfiedLinkError
   'long org.rocksdb.LRUCache.newLRUCache(long, int, boolean, double)'

Brandon Olivier 20:11:00

Maybe I’m too dumb to perceive it, but is there a workaround mentioned here?

R.A. Porter 20:11:56

It's not obvious. I just clicked through to the linked RocksDB issue, and it looks like the suggestion there (which is echoed on the above, though I didn't realize it at first) is to run an x86_64 JDK under Rosetta. Someone with more knowledge will hopefully chime in, as I have no first-hand experience.

Steven Deobald 22:11:01

@UJL94RYSW I commented on the issue thread for future users, but switching to an x86 JDK is the current work-around, as other folks have suggested. The Java bindings for Rocks apparently need very little work to run on ARM... they just haven't been released yet.
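
To check which kind of JDK a REPL is actually running on, a small sketch like the following may help. The `jvm-arch` helper is a made-up name; on an M1 Mac I would expect a native ARM JDK to report something like `aarch64` and an Intel JDK under Rosetta to report `x86_64`, but treat those values as an expectation rather than a guarantee:

```clojure
;; Report the CPU architecture the running JVM was built for.
;; `jvm-arch` is a hypothetical helper name, not part of XTDB.
(defn jvm-arch []
  (System/getProperty "os.arch"))

(println "JVM architecture:" (jvm-arch))
```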

👍 2
Brandon Olivier 22:11:24

Thanks y’all 🙂