#datalevin
2022-06-27
baibhavbista11:06:53

I had a question about datalevin consistency/correctness in the face of server crashes that I would love any insight/pointers on. Say I am running a datalevin database (or many different datalevin databases) on a server and the server crashes (let’s assume that the crash is not related to datalevin). In that case, RAM would be cleared, but the storage is persisted (say it uses something like AWS Elastic Block Store, which persists beyond the lifecycle of the machine). Would there be any guarantees about the data in the database? Is there a possibility of any issues when loading the database again? Would it be guaranteed to have correct data? (except for the transactions in progress, which would be lost, I presume).

baibhavbista11:06:18

More info: The transactions in progress being lost would not be an issue. There would be a separate transaction log elsewhere so the transactions could be replayed after the database is loaded again. The only thing that matters is that the database is correct at a point in time and is not corrupted somehow

baibhavbista11:06:48

Since LMDB is an ACID transactional database, my first guess is that datalevin should be okay in the scenario of server crashes as mentioned above? (however, I don’t have extensive experience with databases so my intuition may be wrong here)

Huahai17:06:01

By default, Datalevin does not use the most durable environment setting of LMDB, in order to have better write speed. The default env setting of Datalevin is [:nordahead :mapasync :writemap], which means we flush to disk asynchronously, i.e. using the msync(MS_ASYNC) system call (a no-op on newer Linux kernels, see https://man7.org/linux/man-pages/man2/msync.2.html), and the OS kernel is responsible for flushing the dirty pages, so there’s a risk of data corruption if the system crashes at the wrong time, although the chances are small, as we should expect the OS kernel to do the right thing. If you worry about data corruption, you can remove :mapasync, which will reduce write speed by a few orders of magnitude, for we will then flush to disk synchronously using msync(MS_SYNC) for every write, but you will then not have any data corruption problem. The choice is yours. The write speed difference is significant particularly for small writes, e.g. it is 200X slower when writing 1 datom at a time than writing 100k datoms in one single transaction.
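To make the cost of synchronous flushing concrete, here is a minimal Python sketch. It is not Datalevin or LMDB code, only an analogy: it writes fixed-size records through a memory map and either flushes after every record or once for the whole batch. It assumes CPython on a Unix-like system, where `mmap.flush()` is backed by a synchronous `msync`. The names `write_records` and `RECORD_SIZE` are made up for illustration.

```python
import mmap

RECORD_SIZE = 16  # hypothetical fixed record width, for illustration only


def write_records(path, records, flush_each):
    """Write records through a memory map and return how many flushes were issued.

    `records` must be a non-empty list of byte strings no longer than RECORD_SIZE.
    """
    size = RECORD_SIZE * len(records)
    with open(path, "wb") as f:
        f.truncate(size)  # the file must already have its final size to be mapped
    flushes = 0
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), size) as m:
        for i, rec in enumerate(records):
            start = i * RECORD_SIZE
            m[start:start + RECORD_SIZE] = rec.ljust(RECORD_SIZE, b"\0")
            if flush_each:
                m.flush()  # one synchronous write-back per record
                flushes += 1
        if not flush_each:
            m.flush()  # a single write-back for the whole batch
            flushes += 1
    return flushes
```

Both modes leave the same bytes in the file; the difference is how many times the writer waits for the disk. That is the intuition behind many tiny synchronous transactions being so much slower than one large one, though the actual ratio depends on the hardware and the workload.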

Huahai17:06:10

At some point, we will probably introduce WAL. At that point, we will turn off asynchronous write as the default. By then, the only loss would be the last few transactions, but data corruption would never happen. For now, you will have to turn it off manually yourself.

Huahai17:06:54

On second thought, WAL will probably only be enabled for server mode; the embedded mode will probably stay the same.

baibhavbista09:06:49

Thank you for the reply @U0A74MRCJ 🙏 It was exactly what I wanted to know. Will take this information into account

baibhavbista09:06:58

Another question: In the case the system crashes at the wrong time and the OS kernel is not able to flush the dirty pages/has flushed only a subset of them, after recovering the volume to a new instance, is it possible to know that that happened? (via integrity checks on the database or logs maybe?) The advantage of knowing that would be: If we knew the data could be/is corrupt, we could reconstruct the database from the (separate) transaction log. However, we might not want to do that for all databases in a volume when there is an instance crash

Huahai14:06:24

So far I have not received any reports of data corruption, so I don't know. My guess is that you won't be able to initialize the DB if it happens, as you essentially have a corrupted file. How often do you have a file corruption on a modern OS? Rarely.
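If that guess holds, one way to avoid rebuilding every database on the volume after a crash is to try opening each one and rebuild only those that fail. Below is a rough Python sketch of that triage step. `triage_databases` and the `open_db` callable are hypothetical placeholders, not part of any Datalevin API. Note that a successful open is only a weak signal and does not prove the data is intact.

```python
def triage_databases(paths, open_db):
    """Split database paths into (healthy, suspect) by trying to open each one.

    `open_db` is a hypothetical callable that takes a path and either returns
    a handle with a close() method or raises if the database cannot be opened.
    """
    healthy, suspect = [], []
    for path in paths:
        try:
            db = open_db(path)
        except Exception:
            # Could not initialize: candidate for rebuilding from the
            # separate transaction log.
            suspect.append(path)
        else:
            db.close()
            healthy.append(path)
    return healthy, suspect
```

The suspect list would then be fed to whatever process replays the separate transaction log.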

baibhavbista02:06:47

Okay thank you @U0A74MRCJ 👍