@huahaiy I'm trying to understand WAL. From the docs, it seems like it's supposed to increase write throughput by amortizing the cost of fsync across multiple transactions. The docs say
> Concurrent writers receive tangible throughput benefits in WAL mode. The scaling is sub-linear, e.g. 2 concurrent write threads may produce close to 2X throughput increase compared with a single write thread, while 4 threads produce a little over 3X increases
But I was working on this benchmark and I can't find a case where WAL mode is faster for writes.
I will change the default policy as the following: embedded Datalog DB will return to :wal? false , so wa1? true is an explicit opt-in. However, once opted in with WAL, user gets :wal-durability-profile :relaxed as the default, so this meets users' expectation of higher throughput. This is the same policy strategy of Sqlite.
For server/HA profile, the default will be :wal? true and :wal-durability-profile :strict, as this is required for correctness.
I think rewording the docs would be better, so users don't have an expectation of higher throughput to begin with. WAL helps Sqlite with write performance, but I assume Datalevin gets a performance hit from WAL mode because it has different guarantees/indexing than Sqlite.
The equivalent to :relaxed would be sqlite's PRAGMA synchronous=OFF, no? Where code execution resumes even if the transaction wasn't synced to disk yet? That's a default that can lead to issues. I don't think that should be default in the name of performance, I think the docs could be transparent that WAL mode is, most of the time, slower.
This is assuming my benchmarks aren't flawed. I am new to this
No. :relaxed is similar to NORMAL in Sqlite, meaning you may lose your last transactions at untimely crash.
As in you may lose transactions before they run through fsync? In that case, I misunderstood what NORMAL does.
That would make more sense as a default then.
:relaxed does group transaction on the WAL, so it is possible to lose a group when the system crashed before fsync is called. This is similar to Sqlite NORMAL.
Thanks for clearing that up.
There are some difference in term of implementation. Sqlite in WAL NORMAL does not call fsync at all, until checkpoint time. The auto checkpoint threshold is 1000 pages, meaning you can lose up to 1000 pages at untimely crash. Datalevin :relaxed does group commit, which is bounded by time and size: 10ms or 128 WAL commits, whichever comes first. So the worst can happen is to lose 128 WAL commits. Otherwise, they are similar.
Got it, so the window is much shorter in Datalevin by default.
I wouldn't say that. That depends on how much you include in a txn. If most txns are small, yes, but if you do huge txns, well, 128 commits can be more than 1000 pages.
However, I think the 10ms windows is the dominate one. If you have huge commits, 10ms may not be enough.
In any case, our consideration of both time and size is safer, that is true.
Got it. And it sounds like the settings are per connection, so you could use :relaxed most of the time and :strict for transactions that need it.
yes, it is by connection.
which version is this?
0.10.7
Inserting 5000 random users per scenario
==========================================================================================
transact! WAL on (default) 5000 writes 12894.1 ms 387.8 writes/sec
transact! WAL off 5000 writes 2210.4 ms 2262.0 writes/sec
--- now with app-level batching (100 users per transact! call) ---
transact! x100 WAL on 5000 writes 2068.2 ms 2417.6 writes/sec
transact! x100 WAL off 5000 writes 550.1 ms 9088.8 writes/sec
--- transact-async (one user per call, futures deref'd at end) ---
transact-async WAL on 5000 writes 2106.8 ms 2373.2 writes/sec
transact-async WAL off 5000 writes 391.7 ms 12765.4 writes/sec
--- concurrent writers (4 threads, transact!) — where WAL should shine ---
4 threads WAL on 5000 writes 6494.8 ms 769.8 writes/sec
4 threads WAL off 5000 writes 2073.0 ms 2412.0 writes/secon Mac?
Windows
Sorry I clipped off the concurrent part
isn't your result what readme says?
with concurrent writes, your got better throughput with WAL, without, concurrency barely move the needle?
Yes I apologize I'm tired. So WAL has a pretty significant performance hit if you're not allowing concurrent writes, is that expected? Is the only benefit of WAL to amortize the cost of those fsync calls?
did it says anywhere WAL will be faster? no
if you are doing more work, it won't be faster, WAL is doing more work
also, these things are very platform specific, I don't have windows and have never tried on windows
Okay. I think I'm mainly confused because the more I tweaked the thread count, I never saw WAL off beat WAL on, and I thought it was for increasing write throughput. I could try it on a mac.
I find it quite suspicious that your WAL off number is so high, I am pretty sure it's not fully durable on Windows.
Maybe
the same deal with macOS, these vendors cheat to get better bench numbers. On macOS, we know that fsync doesn't really fsync, maybe it's same with Windows
WAL on is using Java's file sync, which is honest. Whereas the default LMDB use msync, which is platform specific. I know it's cheating on macOS, now looks like Windows is the same.
Basically, in WAL, we write a WAL file, which is using Java's file API, which is honest and ensure data is really written. Without WAL, we use msync, which depending on the OS, it may or may not be durable.
In any case, for a DB, you really want to use Linux as the standard, otherwise, you will be mislead by vendor shenanigans like these.
Got it. On my m1 mac I did manage to make WAL mode faster. I will try it on linux later
On Mac,I did the work to force WAL file write to use os fsync, looks like the same needs to be done for windows
It might be worth clarifying in the docs, because I did think WAL mode was meant to be faster for writes. From what I understand, that's the case for SQLite, so I thought the whole point of WAL mode was for better performance. There's two spots that made me think that. • wal.md - WAL improves write throughput while keeping crash recovery explicit and operationally manageable • transact.md - WAL mode enables Datalevin to escape the single writer limitation of LMDB. It uses a sync queue to amortize the cost of expensive disk sync operations • main page - With built-in support for WAL and asynchronous transaction, Datalevin can also handle write intensive workload (which sort of implies WAL is the thing to reach for if you want better write throughput) It seems like it will increase write throughput only under very specific circumstances, but otherwise is slower. Maybe that's obvious to people that know database internals, but I learned what WAL was today.
These were my benchmarks on Linux. WAL on was definitely faster with more threads, but it's worth flagging in the docs that you could get a 5x-10x slowdown with WAL.
transact! WAL on (default) 2500 writes 12034.5 ms 207.7 writes/sec
transact! WAL off 2500 writes 4887.5 ms 511.5 writes/sec
--- now with app-level batching (100 users per transact! call) ---
transact! x100 WAL on 2500 writes 3012.0 ms 830.0 writes/sec
transact! x100 WAL off 2500 writes 502.2 ms 4978.3 writes/sec
--- transact-async (one user per call, futures deref'd at end) ---
transact-async WAL on 2500 writes 2874.0 ms 869.9 writes/sec
transact-async WAL off 2500 writes 363.5 ms 6877.2 writes/sec
--- concurrent writers (4 threads, transact!) — where WAL should shine ---
4 threads WAL on 2500 writes 6526.1 ms 383.1 writes/sec
4 threads WAL off 2500 writes 4553.9 ms 549.0 writes/sec