#datalevin
2024-02-13
andersmurphy18:02:57

I’m getting a Fail to transact to LMDB: key/data pair already exists (-30799) error when performing multiple concurrent transactions with large edn blobs. Is this expected? Do I need to make sure transactions are not run concurrently? Also does db/id need to be specified in a transaction?

Huahai18:02:34

This should be a bug. We do test for concurrent writes, but maybe not as extensively with large data.

Huahai18:02:25

Obviously, we are testing with regular threads, with futures, pmaps, etc. We do not support virtual threads at the moment; Clojure itself does not support virtual threads either.

Huahai20:02:19

db/id doesn't need to be specified.

m s15:02:45

hi @U0A74MRCJ, I'm running into this as well. Here's a minimal repro:

(let [tx [{:a (apply str (repeat 494 \a))}]]
  (future (datalevin.core/transact! conn tx))
  @(future (datalevin.core/transact! conn tx)))

m s15:02:05

Interestingly, 493 seems to be the limit, at least on my laptop. So far it never failed with 493, and always fails with 494.

m s15:02:48

If I do a dump-datalog after the failure, I can see that there is a duplicate entry:

[1 :a "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"]
[1 :a "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"]

andersmurphy16:02:29

Ahh that’s interesting, my data did have large strings (large hashes), similar size to your test data. So that might be the underlying issue.

Huahai01:02:33

Large data are stored separately, so there might be a bug in how that is handled in a concurrent situation.

Huahai01:02:04

As to your test case, it is a pathological case that I did not consider a realistic use case. Basically, a hidden assumption we had is that users will not have two pieces of data that share the same prefix up to 493 bytes; if you do, we currently do not support that case.

Huahai01:02:08

I will have to come up with a way to support this without compromising performance too much.

Huahai01:02:23

The reason is that LMDB has a 511-byte key size limit. From there, we have to use some bytes for our purposes, which leaves you 493 bytes for values. In the next version, the number is going to be 497 instead, but the limit is still there.
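The byte budget above can be sketched in a few lines of plain Clojure. This is purely illustrative, not Datalevin's actual code; `header-bytes` is an assumed overhead chosen so the arithmetic matches the 493 figure in the discussion.

```clojure
;; Hypothetical sketch of the key-size budget described above.
;; LMDB caps keys at 511 bytes; some bytes are reserved for internal
;; headers (18 here is an assumption), leaving 493 bytes of a value
;; that can live inside an index key.
(def lmdb-max-key 511)
(def header-bytes 18)
(def value-prefix-budget (- lmdb-max-key header-bytes))

(defn giant? [^String v]
  ;; values whose byte length exceeds the budget must be stored separately
  (> (count (.getBytes v "UTF-8")) value-prefix-budget))
```

This matches the limit observed in the repro: 493 `\a` characters fit, 494 do not.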

Huahai01:02:43

How do we store large values? We store them as values instead of keys; however, the first 493 bytes of the values are assumed to be different, otherwise we have a problem.

Huahai01:02:24

Right now, I am not handling such cases. We can handle it by adding some checks and giving each such value an order number. It will slow things down a bit, but I guess it is ok. Our development is driven by use; if you guys run into this realistically, we can address it.
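The scheme described here can be modeled as a toy in plain Clojure (this is not Datalevin's actual internals; `index`, `giants`, and the 493-byte cutoff are illustrative): the index keeps only the truncated prefix plus a per-value "giant id", while the full value lives in a separate map standing in for the values DBI.

```clojure
;; Toy model of giant-value storage: index entries are [prefix id],
;; the full value is stored separately keyed by its giant id.
(def next-giant-id (atom 0))
(def index  (atom #{}))   ; entries: [prefix giant-id]
(def giants (atom {}))    ; giant-id -> full value

(defn add-giant! [v]
  (let [id     (swap! next-giant-id inc)
        prefix (subs v 0 (min 493 (count v)))]
    ;; same prefix, different id => still a distinct index entry
    (swap! index conj [prefix id])
    (swap! giants assoc id v)
    id))
```

Two values sharing a 493-byte prefix still produce distinct index entries, because the giant ids differ; this is why the serial case works.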

m s07:02:53

> a hidden assumption we had is that user will not have two pieces of data that have the same prefix up to 493 bytes
that's quite the assumption.
> if you do, we currently do not support the case
what do you mean by that? it seems to be working quite well, the only issue is with concurrent writes. also, there's this in the README:
> LMDB also performs well in writing large values (> 2KB). Therefore, it is fine to store documents in Datalevin.

andersmurphy09:02:37

Yeah this only happens with concurrent writes for me. I’ll have to check but I don’t think the prefix was the same in my case.

m s10:02:00

I can repro with different values as well

(let [tx1 [{:a (apply str (repeat 494 \a))}]
      tx2 [{:a (apply str (repeat 494 \b))}]]
  (future (datalevin.core/transact! conn tx1))
  @(future (datalevin.core/transact! conn tx2)))

🙏 1
Huahai16:02:20

Our hidden assumption works fine in serial write cases, because two pieces of data that have the same 493-byte prefix still get different what we call "giant ids", so they will be stored as different bytes in the index due to the difference in giant ids. However, the full data are stored in a different DBI as values, so an extra step is needed to store them. In a concurrent situation, these two steps probably went out of sync somewhere. Ideally, this shouldn't happen because we lock the db for each transaction. Apparently, we are missing some locks somewhere. Thanks for these test cases, I will investigate. Hopefully it can be fixed by adding a lock somewhere.
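The kind of fix being described here can be sketched as follows (a minimal model, not Datalevin's code; `store-giant!` and the names are illustrative): the id advance and the value store are two steps, and holding one lock across both prevents two writers from interleaving and reusing the same giant id.

```clojure
;; Both steps happen under one monitor; without it, two threads could
;; read/advance the id counter concurrently and collide.
(def write-monitor (Object.))
(def giant-id (volatile! 0))
(def giant-store (atom {}))

(defn store-giant! [v]
  (locking write-monitor
    (let [id (vswap! giant-id inc)]   ; step 1: advance giant id
      (swap! giant-store assoc id v)  ; step 2: store the full value
      id)))
```

Under concurrent calls every writer gets a distinct id, which is exactly the invariant the duplicate-entry repro violated.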

Huahai17:02:38

I am happy to report that this bug is already fixed in the branch that is to be released as 0.9.0. Basically, the problem is due to a mix-up in the advancing of giant id. The fix is to lock at the right place. If you guys want the fix in the 0.8.x branch, I can cut a release. Please let me know.

Huahai17:02:38

ok, I will cut a release, since there are a couple of other fixes.

💪 1
🙏 2
Huahai02:02:39

0.8.28 fixes this. Thanks for reporting.

andersmurphy18:02:37

For context this is in embedded mode. Wonder if it’s to do with serialisation speed. I can hammer datalevin with 1000+ smaller concurrent transactions from virtual threads (even with edn blobs). But the minute edn blobs get to a certain size it starts to happen. Also happens much more consistently (also happens with smaller blobs) when I remove com.cognitect/transit-clj {:mvn/version "1.0.329"} (10+ concurrent transactions).

Huahai18:02:53

Don't think virtual threads are supported at the moment. We use locking, i.e. a synchronized monitor, to ensure a single write at a time, and synchronized is known to be incompatible with virtual threads. They are working on a fix in the JVM, but it probably won't be available anytime soon. We can switch to ReentrantLock, but right now I don't have the bandwidth. I am busy getting the query engine rewrite done.

Huahai18:02:23

I hope to finish this work and release 0.9.0 in a few weeks.

Huahai18:02:30

Shouldn't be hard to switch to ReentrantLock, just change the with-transaction macro, replacing locking with such a lock.
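The suggested swap could look something like this sketch (the macro and lock names are illustrative, not Datalevin's actual `with-transaction`): a `ReentrantLock` acquired in `lock`/`unlock` style instead of a `locking` monitor, which avoids the virtual-thread pinning that `synchronized` causes on pre-21 JVMs.

```clojure
;; Replace `locking` with an explicit ReentrantLock acquired around
;; the transaction body.
(import 'java.util.concurrent.locks.ReentrantLock)

(defonce ^ReentrantLock write-lock (ReentrantLock.))

(defmacro with-write-lock [& body]
  `(do (.lock write-lock)
       (try ~@body
            (finally (.unlock write-lock)))))
```

The `try`/`finally` ensures the lock is released even if the body throws, matching the semantics `locking` provides for a monitor.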

Huahai18:02:06

Probably want to add a lock for conn as well. It seems to be needed for the 0.8.* branch, but my 0.9.0 branch only needs a lock for conn; I forgot what I have done.

Huahai18:02:57

Please note that LMDB is single writer, so concurrent writes won't improve throughput.

Huahai18:02:31

We are serializing the writes with a lock. So it is really just a convenience feature.

Huahai18:02:39

Notice also that Clojure itself has some synchronized blocks, so even if Datalevin switched to ReentrantLock, if you are using large EDN data, you are still going to run into problems with virtual threads.

Huahai19:02:46

On a side note, modern OSes support thousands of regular threads just fine. I regularly use a thread pool with thousands of threads in it in production. It is not a problem. The cost of regular threads is not as big as people imagine.

Huahai19:02:57

The use cases of virtual threads are at the scale of millions of threads, which most people won't need.

andersmurphy22:02:06

Yeah, virtual threads are not crucial and I wasn't expecting support for them. I think Clojure has moved those synchronized blocks to locks (https://clojure.org/news/2023/10/20/clojure-1-12-alpha5). But like you said, regular threads are perfectly fine for most use cases. The reason I was using virtual threads was to get more load/higher concurrency to test with smaller edn blob inserts. I can still reproduce the issue with regular threads (the edn blobs just need to be bigger). Good to know that the writes are already being serialised and there's no extra work on my side 🙏 (I've used single-writer setups before where we've done the serialising in the app code and it can be a pain). I can create an example/repro repo with the large data blobs (and no virtual threads), if that helps? This is a bit of an atypical use case, where I want to store large responses in edn blobs and later pull out the keys I need for queries, so it might fall outside the main use case. The issue doesn't seem to come up when using large edn blobs with no concurrency, so that's also an option for me. PS: Looking forward to the new query engine

Huahai22:02:27

yes, it certainly helps. Thanks.

👍 1
Huahai22:02:03

I am making great progress on the query engine. In general, the more complex the query, the greater the speed improvement. There are some surprising findings, e.g. predicate pushdown actually slows queries down. This is totally against common wisdom. So I am trying some predicate pushup instead. It's wild.

🤯 2
Huahai01:02:36

It turns out that predicate pushdown does help a lot; you just need to make sure calling the predicates is cheap enough. It is easy to lose a lot of performance in Clojure.

Huahai02:02:17

In this case, turning the argument list into an array is the solution.