Just learned that SQLite's default on macOS is not actually durable, because fsync on macOS is not durable. The JDK fixed this by calling fcntl(F_FULLFSYNC) on macOS rather than just calling fsync. To get the same behavior from SQLite, you need to set PRAGMA synchronous = EXTRA, which nobody does. I guess this explains why SQLite on macOS seems so fast. Now we have a decision to make in Datalevin: should we follow the JDK default or SQLite's default? Right now we are doing the former, and the speed difference is one order of magnitude: 200 vs 2000 writes per second. I'm inclined to offer an option similar to SQLite's, i.e. only calling fsync, and set that as the default for WAL mode (non-WAL mode will still use the JDK default), since SQLite is widely used and nobody seems to be complaining about the loss of durability. What do you think?
So… currently sqlite on all defaults uses rollback journaling and FULL, which is not durable; if you use WAL it still uses synchronous = FULL and becomes durable? And datalevin uses WAL by default with datalog but not with key-value stores, and when it is using WAL, is not durable? IOW durability doesn't match sqlite and also is not consistent between platforms and modes?
So I've revisited this. In SQLite with WAL mode you only need FULL, not EXTRA, as FULL calls fcntl(fd, F_FULLFSYNC) (as long as your build or pragma has fullfsync set to true). EXTRA does not enable fullfsync in and of itself; it determines when fsync is called.
All this mess is because of the shortcut Apple took. Guess their users are consumers who don't care about such things as durability.
Oh, I found that PostgreSQL on macOS does the same thing as SQLite, only calling fsync. OK, I will do the same.
It's interesting that Java people are much more durability conscious than database people.
It's a good thing that most database servers are running on Linux.
OK, now we have three durability profiles in WAL mode (the default for Datalog): :extra, :strict (default), and :relaxed, corresponding to SQLite's EXTRA, FULL and NORMAL.
Is macOS being used for production applications? Maybe for macOS desktop apps? I would think macOS is mostly just used during development (my case) and then goes to Linux in production. Durability is less of a problem during development 🤔
Surprising though, you'd think you'd want to mirror prod a little closer if it's a full order of magnitude off :^)
Sqlite in WAL only needs synchronous full for durability not extra.
Generally you set all these settings when opening a connection, even if you're using the system-installed version (though you normally bundle it with your app, as it's 1MB and you often want to enable extensions that are disabled in the system version).
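As a minimal sketch of setting these pragmas at connection time, here is what it might look like with Python's built-in sqlite3 module. The helper names and the kv table are made up for illustration. Combining synchronous=FULL with fullfsync follows the suggestion made earlier in this thread; it is not a verified durability recipe, and fullfsync only has an effect on platforms that support F_FULLFSYNC.

```python
import os
import sqlite3
import tempfile


def open_with_durability(path):
    """Open a SQLite database and set durability pragmas up front."""
    conn = sqlite3.connect(path)
    # Switch to WAL journaling; the pragma returns the mode now in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    # FULL asks SQLite to sync the WAL on every commit.
    conn.execute("PRAGMA synchronous=FULL")
    # Ask SQLite to use F_FULLFSYNC for those syncs where the platform
    # supports it (macOS); elsewhere the setting is expected to be a no-op.
    conn.execute("PRAGMA fullfsync=ON")
    return conn, mode


def demo():
    # Hypothetical scratch database, used only for this illustration.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn, mode = open_with_durability(path)
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO kv VALUES (?, ?)", ("a", "1"))
    # PRAGMA synchronous reports an integer: 0 OFF, 1 NORMAL, 2 FULL, 3 EXTRA.
    sync_level = conn.execute("PRAGMA synchronous").fetchone()[0]
    conn.close()
    return mode, sync_level
```

Calling demo() returns the journal mode and the numeric synchronous level, which is a quick way to check that the settings actually took effect on a given build.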
On macOS, synchronous=full is not sufficient, because fsync on macOS is not fully durable. To have full durability on macOS, synchronous=extra is needed. I have aligned Datalevin WAL to have the same behavior, as that is what SQLite and PostgreSQL are both doing. I have also mentioned these in the doc.
So our Datalog store default in WAL mode, :strict, will not be fully durable on macOS, just like other databases, since this is a decision the OS has made. On Linux and Windows, there's not much difference between :extra and :strict, i.e. :extra is effectively a macOS configuration.
When people enjoy the speed of Apple computers, they should be aware of the caveats. There's no free lunch in engineering.
What's commendable in this whole thing is the OpenJDK people's insistence on correctness.
Since Java 9, FileChannel.force(true|false) calls fcntl(F_FULLFSYNC) first, and only when that's not available does it fall back to calling fsync. That fcntl(F_FULLFSYNC) call is what the :extra setting does in Datalevin and SQLite. The slowdown is one or more orders of magnitude. So Java file writes are more durable than databases on macOS, something people should be aware of.
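The "try the full sync first, fall back to fsync" pattern described above can be sketched in Python roughly as follows. This is an illustration of the idea only, not the JDK's or Datalevin's actual code; full_sync and write_durably are hypothetical helper names. Python's fcntl module exposes F_FULLFSYNC only on macOS, and the module itself is not available on Windows.

```python
import fcntl
import os
import tempfile


def full_sync(fd):
    """Flush fd to storage, preferring F_FULLFSYNC when the platform has it."""
    full = getattr(fcntl, "F_FULLFSYNC", None)  # only defined on macOS
    if full is not None:
        try:
            fcntl.fcntl(fd, full)
            return "F_FULLFSYNC"
        except OSError:
            # Some file systems reject F_FULLFSYNC; fall through to fsync.
            pass
    os.fsync(fd)
    return "fsync"


def write_durably(data):
    """Write data to a scratch file and report which sync call was used."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, data)
        return full_sync(fd)
    finally:
        os.close(fd)
        os.unlink(path)
```

On Linux this always takes the fsync branch, which matches the point made earlier in the thread that the stricter setting is effectively a macOS concern.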
I'm curious what can happen on macOS without the durability settings (the previous default). Would it fail silently or would you see certain errors? I was experimenting with Datalevin as Mulog publisher and I saw some errors. I assumed it was because I didn't properly close connections on restarts, but now I wonder if it maybe has a different cause.
As far as I can tell, most DBs just call fsync for file durability. SQLite, Postgres, and other native DBs are all the same. Before it had WAL mode, Datalevin called msync for durability. On macOS, msync has a similar problem to fsync, i.e. it is not fully durable. Now that we have WAL, we no longer use msync; instead, we use Java's file persistence for WAL durability. Counterintuitively, it was much slower, and that's when I discovered that Java does not simply call fsync but calls fcntl(fd, F_FULLFSYNC) first, hence it is so much slower, but durable.
So I decided to align Datalevin's behavior with other databases, namely SQLite and PostgreSQL: just call fsync by default (this requires custom code in Java, which I wrote). On macOS, that means losing full durability, yes, but that's what everyone else is doing. This is the choice for Datalog DB.
For KV DB, the default is not WAL mode. The reason is that most KV usage is for fast caching purposes, so there is no need to introduce the complexity of WAL mode by default. So WAL mode is opt-in for KV. The default KV behavior is to use msync, which is not durable on macOS. That is to say, on macOS, the default write is consistently non-durable for both KV and Datalog DB.
In summary, the default write on macOS is consistently non-durable, no matter which DB you choose. Datalevin is aligned with SQLite and PostgreSQL in this behavior. If you want full durability on macOS in WAL mode, both SQLite and Datalevin give you the choice of :extra durability. PostgreSQL doesn't even give you this option, i.e. there's no full durability on macOS for PostgreSQL.
Datalevin consistently follows LMDB's philosophy, which is to do as little as possible and rely on the OS as much as possible. Therefore, we made the choice to follow OS policy in default file durability, which happens to be the same choice SQLite and PostgreSQL made in this regard.
The above is the official full story. Anything else is hearsay or noise that you should safely disregard.
You might want to benchmark this F_FULLFSYNC thoroughly (on real Apple hardware), as there might be some traps. Hard to say if this is still real.. but if it is, then... ouch 😅 Btw. the post seems to be a reaction to a (now dead) Twitter post by Hector Martin (marcan), the lead developer of the Asahi Linux project (a port of native Linux to Apple silicon devices). I guess he might know a thing or two about these things. https://www.phoronix.com/forums/forum/software/bsd-mac-os-x-hurd-others/1309928-turns-out-macos-has-been-benchmark-cheating-in-writes-by-not-doing-proper-fsync
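A rough way to run such a benchmark is sketched below in Python. It is a toy micro-benchmark, not a rigorous methodology: the function names, payload size, and iteration count are arbitrary choices for illustration. The directory argument matters, because the default temp directory may sit on a RAM-backed file system where sync calls are nearly free.

```python
import fcntl
import os
import tempfile
import time


def writes_per_second(sync, n=200, payload=b"x" * 128, directory=None):
    """Time n small appends, each followed by the given sync call."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(n):
            os.write(fd, payload)
            sync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return n / elapsed if elapsed > 0 else float("inf")


def sync_candidates():
    """Sync strategies available on this platform."""
    modes = {"fsync": os.fsync}
    if hasattr(fcntl, "F_FULLFSYNC"):  # macOS only
        modes["F_FULLFSYNC"] = lambda fd: fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
    return modes


def run_benchmark(n=200, directory=None):
    """Return a dict of strategy name -> measured writes per second."""
    return {
        name: writes_per_second(fn, n=n, directory=directory)
        for name, fn in sync_candidates().items()
    }
```

Running run_benchmark(directory="/path/on/a/real/disk") on actual Apple hardware should show whether the gap between the two calls is anywhere near the order of magnitude reported earlier in this thread.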
F_BARRIERFSYNC might be yet another option if F_FULLFSYNC proves too slow. It seems to offer a relatively acceptable compromise.. unless a power-loss scenario is a concern.
> F_BARRIERFSYNC is roughly on par with Linux fsync for crash/panic safety, but it is weaker than Linux fsync for power-loss durability. For a WAL-mode database (like SQLite or Datalevin), crash safety is the dominant concern — power cuts to a laptop are rare and the OS typically gets to flush before shutdown anyway. For a desktop server (Mac Mini, Mac Studio) running a database that needs strict power-loss guarantees, F_BARRIERFSYNC is genuinely weaker, and you'd want F_FULLSYNC — with the painful 46 IOPS consequence.
>
> What F_BARRIERFSYNC actually is
> It's conceptually different from both fsync and F_FULLSYNC. Rather than flushing the drive's volatile cache to persistent storage, it issues a write barrier — it guarantees that all writes issued before the barrier will be physically written to storage before any writes issued after the barrier. It's the storage equivalent of a memory barrier/fence in CPU architecture.
> This means:
> • Crash safety (OS panic, kernel crash): ✅ Safe. The write ordering is guaranteed, so a WAL-based database will always find its commit record in a consistent position relative to data writes.
> • Power loss: ❌ Not guaranteed. Data sitting in the drive's volatile write cache may still be lost if power is cut suddenly.
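For anyone who wants to experiment with the barrier variant described in the quote, a minimal Python sketch follows. Two things here are assumptions: Python's fcntl module may not expose F_BARRIERFSYNC, so the value 85 is taken from memory of Apple's <sys/fcntl.h> and should be checked against the SDK headers; and barrier_sync and append_record are hypothetical helper names. On non-macOS platforms the sketch simply uses fsync.

```python
import fcntl
import os
import sys
import tempfile

# ASSUMPTION: 85 is believed to be F_BARRIERFSYNC in Apple's <sys/fcntl.h>.
# Verify against your SDK headers before relying on it.
F_BARRIERFSYNC = getattr(fcntl, "F_BARRIERFSYNC", 85)


def barrier_sync(fd):
    """Issue a write barrier on macOS; use plain fsync everywhere else."""
    if sys.platform == "darwin":
        try:
            fcntl.fcntl(fd, F_BARRIERFSYNC)
            return "F_BARRIERFSYNC"
        except OSError:
            # The file system may not support barriers; fall back to fsync.
            pass
    os.fsync(fd)
    return "fsync"


def append_record(record):
    """Append one record to a scratch log file, then order it with a barrier."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, record)
        return barrier_sync(fd)
    finally:
        os.close(fd)
        os.unlink(path)
```

This only orders writes relative to each other; as the quote says, it does not claim the data has left the drive's volatile cache.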
Dark waters, indeed. As you said: It's a good thing that most database servers are running on Linux. 💯
I did not know osx doesn't honour fsync. That is wild.
0.10.7 is released https://github.com/datalevin/datalevin/blob/master/CHANGELOG.md#0107-2026-03-03