#datomic
2023-11-15
Dimitar Uzunov 09:11:23

Hello guys, our on-prem transactor EC2 instance seems to be configured to terminate itself whenever the systemd service for the Datomic transactor stops, for whatever reason. The employee who set this up left years ago, so my questions are:
• Is there a particular reason, perhaps some best practice, for configuring it that way?
• The instances are in an ASG, so a terminated instance should get replaced anyway. Would relying on the health check be better?

jaret 13:11:48

@ULE3UT8Q5 With HA configured (https://docs.datomic.com/pro/operation/ha.html), it is good practice to kill a transactor machine after the Datomic process has stopped. Using the health check is also a good option. We have fixed many of the causes, but in the past we have seen behavior we call "zombie transactors": after an OOM, the transactor gets stuck while potentially still reporting healthy. For that reason we also generally advise customers to handle killing the transactor process themselves or to set the JVM flag `-XX:+ExitOnOutOfMemoryError`.
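A minimal sketch of the health-check option, assuming the transactor's HTTP health endpoint has been enabled (in Datomic Pro this is done through the `ping-host`/`ping-port` transactor properties; the port `9999` and the `/health` path here are assumptions to verify against your config). The probe treats a timeout, a refused connection, or any non-200 answer as unhealthy. As jaret notes, a zombie transactor may still report healthy, so a probe like this complements the OOM exit flag rather than replacing it.

```python
import urllib.error
import urllib.request

# Assumed endpoint: enable it via ping-host/ping-port in the transactor
# properties and check the host, port and path for your deployment.
HEALTH_URL = "http://localhost:9999/health"


def transactor_healthy(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Return True only if the endpoint answers HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status
        # (urllib.error.HTTPError is a subclass of URLError).
        return False


def main() -> int:
    """Exit code for a cron job, systemd timer, or custom health-check hook."""
    return 0 if transactor_healthy() else 1
```

Run it as `sys.exit(main())` from whatever marks the instance unhealthy. How that result reaches the ASG (for example, a script calling the Auto Scaling `SetInstanceHealth` API) is deployment-specific and not shown here.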

Dimitar Uzunov 15:11:00

So is using the health check to let an ASG replace an instance an equally good practice? I'm asking because when a configuration error or a transactor crash kills the machine, it is hard to troubleshoot afterwards.

Dimitar Uzunov 15:11:06

I call that instance self-termination "transactor seppuku", and I'm wondering what the tradeoffs are.
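One way to keep the self-termination behavior while easing the troubleshooting concern is to copy the transactor logs to storage that outlives the instance before the machine goes away. This is a sketch under assumptions: the log directory and archive location (for example an EFS mount) are hypothetical, and `terminate` is a placeholder for whatever currently kills the instance, such as `shutdown -h now` on an instance whose shutdown behavior is set to terminate.

```python
import shutil
import time
from pathlib import Path
from typing import Callable


def archive_then_terminate(
    log_dir: Path,
    archive_root: Path,
    terminate: Callable[[], None],
) -> Path:
    """Copy the (hypothetical) transactor log directory to durable storage,
    then call the termination hook."""
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime())
    dest = archive_root / f"transactor-logs-{stamp}"
    try:
        # Preserve evidence first; dirs_exist_ok avoids failing on a re-run
        # within the same second.
        shutil.copytree(log_dir, dest, dirs_exist_ok=True)
    finally:
        # Terminate even if the copy fails, so a broken archive step cannot
        # leave a zombie instance running.
        terminate()
    return dest
```

The `try`/`finally` reflects the tradeoff. Diagnostics are best-effort, but the instance is always replaced. Relying on the ASG health check alone would also leave the machine up long enough to inspect, at the cost of it running unhealthy until the check fails.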