2023-05-23
So my machine crashed whilst doing a transaction, and now I’m getting an MDB_PAGE_NOTFOUND: Requested page not found error. I’ve done some googling but couldn’t find an obvious way to recover. The database is quite large (I didn’t have space for a backup on my current machine), so I don’t want to have to rebuild it from scratch (thankfully it’s not production data). Is there a way to recover from this, even if it means losing the data that was in the transaction? Thanks in advance.
Can you still open the DB at all? Would you be able to copy the DB to another directory with dtlv?
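A minimal sketch of that copy attempt from a Clojure REPL (the paths are placeholders, and I’m assuming the `datalevin.core/copy` arity that takes a compaction flag; check your Datalevin version’s API):

```clojure
(require '[datalevin.core :as d])

;; Open the possibly-corrupted store as a plain key-value DB.
;; This may already throw MDB_PAGE_NOTFOUND if the bad page is hit.
(def db (d/open-kv "/path/to/broken-db"))

;; Copy into a fresh, empty directory. Passing true asks for a
;; compacting copy, which rewrites only live pages and can
;; occasionally route around damage that a raw file copy would keep.
(d/copy db "/path/to/db-copy" true)

(d/close-kv db)
```

This is roughly what `dtlv copy` does from the command line.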
The default setting of Datalevin is not the most durable one; we write asynchronously by default. To avoid such situations, open the DB with an option like this: {:kv-opts {:flags []}}. It will be fully durable and crash-resistant, but will suffer in write speed. There’s no free lunch.
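Concretely, opening a connection with the fully-synchronous setting looks something like this (a sketch; the path and empty schema are placeholders):

```clojure
(require '[datalevin.core :as d])

;; An empty :flags vector drops the default async-write LMDB flags,
;; so every transaction is synced to disk before it returns.
;; Durable and crash-resistant, at a real cost in write throughput.
(def conn
  (d/get-conn "/path/to/db"
              {}                        ; schema
              {:kv-opts {:flags []}}))
```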
It is debatable whether the default should be the safest setting, which is LMDB’s default. Maybe it should. What do you guys think? If you agree, we can do that in the next major release.
dtlv copy fails with the same error. I do wonder if it’s just a matter of fixing/removing the corrupt page (I have zero understanding of the internals of LMDB). Anyway, I’ve decided to rebuild the data from scratch, as it’s mostly ingesting a ton of data from a third party.

On defaults, I’m in two minds. Right now I’m mainly doing data analysis, with a bunch of initial transform/normalisation transactions (inserts/updates) over a lot of data; crash resistance is important there, because the data is big, so backups are big and slow, and corruption can waste a lot of time. If I were doing something more real-time, with a steady stream of inserts, daily backups, and little downside to losing the last X hours of data, I’d probably take the hit on crash resilience for insert speed. I’m not sure there’s a correct default; as you said, there’s no free lunch.

If I recall correctly, SQLite defaults to correctness/crash resilience, and you turn off the various safeties when you want the performance. I do feel opting in to the risk/performance tradeoff is probably the better default, particularly if corruption means losing all your data, not just what happened around the crash.
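Under that SQLite-style default, opting back into speed would be an explicit choice, something like the sketch below (the flag keywords are meant to be Datalevin’s LMDB flag names, and the exact default set varies by version, so treat them as illustrative):

```clojure
(require '[datalevin.core :as d])

;; Safe by default: everything synced on commit.
(def safe-conn
  (d/get-conn "/path/to/db" {} {:kv-opts {:flags []}}))

;; Explicit opt-in to the current async behaviour: faster bulk
;; inserts, but a crash mid-write can lose or corrupt recent data.
(def fast-conn
  (d/get-conn "/path/to/db" {} {:kv-opts {:flags [:writemap :mapasync]}}))
```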