datomic

robert-stuttaford 2025-12-16T14:06:27.763469Z

when reasoning about options for dealing with slow Azure VM provisioning times, we're pondering what it might look like to have multiple transactor standby instances, to cover a pathological case of two failovers inside 10 minutes (which is about how fast Azure can provide a new standby). the HA docs don't specifically mention more than one standby. is this a reasonable approach? is there a safe mechanism whereby multiple standbys will allow a single one to promote to primary?

Linus Ericsson 2025-12-17T05:19:06.159099Z

The HA mechanism can have more than one waiting transactor. Failover works by watching the heartbeats written by the live transactor: if those stop for a specified time, a standby attempts to step in, but with some transactional guarantee.
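
To make the heartbeat idea concrete, here is a minimal sketch of heartbeat-timeout takeover guarded by a compare-and-set. It is an illustration of the general technique only, not Datomic's actual implementation: `Heartbeat`, `Storage`, `compare_and_set`, `try_take_over` and the timeout value are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heartbeat:
    owner: str         # hypothetical: name of the transactor that wrote it
    written_at: float  # hypothetical: timestamp in seconds


class Storage:
    """Toy in-memory stand-in for a storage that offers compare-and-set."""

    def __init__(self, heartbeat):
        self.heartbeat = heartbeat

    def compare_and_set(self, expected, new):
        # Only write if nobody else changed the record since we read it.
        if self.heartbeat == expected:
            self.heartbeat = new
            return True
        return False


def is_stale(heartbeat, now, timeout_s):
    """True when the last heartbeat is older than the allowed timeout."""
    return now - heartbeat.written_at > timeout_s


def try_take_over(storage, me, now, timeout_s):
    """Attempt to become active; succeeds for at most one caller per stale record."""
    seen = storage.heartbeat
    if not is_stale(seen, now, timeout_s):
        return False  # the active transactor still looks alive
    return storage.compare_and_set(seen, Heartbeat(me, now))


storage = Storage(Heartbeat("active", 0.0))
early = try_take_over(storage, "standby-1", now=3.0, timeout_s=5.0)
late = try_take_over(storage, "standby-1", now=10.0, timeout_s=5.0)
second = try_take_over(storage, "standby-2", now=10.5, timeout_s=5.0)
```

In this toy model the first attempt is refused because the heartbeat is still fresh, the second succeeds, and a second standby arriving just afterwards is refused because it now sees a fresh heartbeat from the new owner.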

robert-stuttaford 2025-12-17T07:11:19.310149Z

thanks Linus! is it the case that the first standby to establish a lock becomes the primary?

Linus Ericsson 2025-12-17T07:21:55.390819Z

The documentation does not specify in every detail exactly how the failover mechanism works. It says "The first transactor to communicate with storage will become active, and the second transactor will operate in standby mode until/unless the active transactor fails." which is obvious in the case of a cold start. It likely implies that after the first (current) transactor fails to write its heartbeat, the next-first (or "oldest") transactor that is still writing heartbeats according to specification becomes the new leader, though I can't say for sure. It could work in some other way as well, but always "electing" the longest-running transactor actually seems like the least surprising and likely most robust algorithm (the longest-running hot standby probably has the best situation in terms of cached data etc.)… https://docs.datomic.com/operation/ha.html

robert-stuttaford 2025-12-17T07:33:18.312099Z

aye, i have read this documentation, and now i am hoping for someone on the Datomic team to confirm 🙂

jaret 2025-12-17T13:11:39.344929Z

The short answer is that we don't support more than two transactors running. The paired transactors aren't doing anything as sophisticated as leader election. As those docs indicate (or maybe could be improved to emphasize), the high availability config is built around having a single active transactor acting on storage and a single standby transactor watching storage. In short, having more than 2 copies (3+) would introduce a possible race condition in which multiple standby transactors would compete to become the new active transactor in the event that the active transactor fails or disconnects. In practice, I have had many customers not follow this guidance and launch N transactors to no ill effect, but the race condition is possible with 3+. I will bring up your concern with the team.
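
For readers wondering what a race of this general shape looks like, here is a small sketch. It assumes, purely for illustration, an unguarded read-then-write promotion step. This is not a description of how Datomic transactors actually behave, and `run_race` and the dictionary fields are hypothetical.

```python
def run_race(standbys, timeout_s=5.0, now=10.0):
    """Simulate standbys that all read a stale heartbeat before any of them writes."""
    # Hypothetical heartbeat record left behind by a failed active transactor.
    store = {"owner": "active", "written_at": 0.0}

    # Step 1: every standby reads the same stale heartbeat.
    observations = {name: dict(store) for name in standbys}

    # Step 2: each standby acts on its own, by now outdated, observation.
    believes_active = []
    for name in standbys:
        seen = observations[name]
        if now - seen["written_at"] > timeout_s:
            store["owner"] = name        # unguarded write: last writer wins
            store["written_at"] = now
            believes_active.append(name)

    return believes_active, store["owner"]


one_standby = run_race(["standby-1"])
two_standbys = run_race(["standby-1", "standby-2"])
```

With a single standby there is nobody to compete with, so exactly one transactor promotes itself. With two standbys in this toy model, both conclude they are the new active transactor while storage records only the last writer.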

👍 1
robert-stuttaford 2025-12-17T13:53:43.026059Z

thank you for that clarity, @jaret, appreciated!