Bro, I am sorry, but DataScript works well. Just with this backend.
So, I do not think I wrote an optimal backend, but it does not get stuck for 10 seconds, that's a fact. Unfortunately, I do not know konserve in enough detail to find the cause in your code.
konserve-dynamodb now has 10-20ms write latency, and similar for reads (on an EC2 instance). But this is for single write operations. For some reason, when I issue many writes in parallel it still adds up to around 100ms, while my fix halved the average transact time for S3.
Your multiple-put operation is the optimal thing to do. But it does require a very strong memory model of the underlying storage, usually some form of multi-version concurrency control, e.g. SQL or Dynamo. This limits the scaling of the store, and you effectively rely on a stronger mutable memory manager to piggyback on. Datahike can also run reliably on much weaker memory semantics, such as network filesystems or S3. I will try to fix the dynamo use case though, because it has a very nice latency and I understand the value proposition.
I have no idea why you were stuck for 10 seconds. I cannot reproduce this for datahike-dynamodb. Database creation takes some time now (up to 10 secs) because dynamo takes time to get the table online.
Maybe S3 would actually be fast enough now for you btw. Not sure. It has some variance on how fast it returns, but it averages at around 200ms for a transact call.
I am happy that DataScript works for you, I wanted to have the storage functionality shared with it because I think this software stack is actually fairly clean and it is important for Clojure devs. Datahike is in parts more complicated and maybe some of that could be avoided (e.g. async support), but there are reasons for most of the design decisions and strategy.
But DataScript does not have versioning functionality.
Nope. It has the right memory model, but Nikita was mostly focused on the lean lightweight in-memory use case.
Which is nice. Datomic can be heavy to set up.
That is good to know. I had prototyped the storage backend for the persistent-sorted-set and Nikita adjusted it. We use the same IStorage interface as DataScript, so your code somewhat translates. DataScript does not have nearly the same number of features or the same query performance though. You use different dynamo libraries here as far as I can see. Why did you do that? I need to check the differences. Can you also compile your code natively? What are the latencies?
https://github.com/replikativ/datahike/blob/main/src/datahike/index/persistent_set.cljc#L203
What is misc/do-nanos*?
Found it https://github.com/algoflora/himmelsstuermer/blob/main/src/himmelsstuermer/misc.clj#L81
Bro, it just counts the nanoseconds taken to execute the code inside. What is wrong with it?)
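For readers following along, counting the nanoseconds taken by a body of code can be sketched roughly as below. This is only a minimal illustration, not the actual `misc/do-nanos*` from himmelsstuermer; the names `elapsed-nanos` and `with-nanos` are hypothetical.

```clojure
(defn elapsed-nanos
  "Hypothetical helper: calls the zero-argument function f and returns
  the elapsed wall-clock time in nanoseconds."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (- (System/nanoTime) start)))

(defmacro with-nanos
  "Hypothetical macro: evaluates body and returns the nanoseconds it took.
  The result of body itself is discarded."
  [& body]
  `(elapsed-nanos (fn [] ~@body)))

;; Usage: time a short sleep and print the duration in milliseconds.
(println (/ (with-nanos (Thread/sleep 5)) 1e6) "ms")
```

A helper like this only measures the time spent in the calling thread, so any work a store schedules asynchronously has to be awaited inside the body to be counted.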
Nothing, I found it.
Just to make that clear and save you potentially some trouble, DataScript's durability layer is not made for distributed access. It is only safe inside a single JVM process against a strongly consistent backend that can do atomic updates over multiple keys (e.g. not file system or S3). If you restore a DataScript db in parallel lambda invocations you can read incoherent snapshots.
Datahike can read from anywhere without coordination. The only thing you need to ensure in our memory model is that you have a single writer and do not transact in parallel, but rather send all transactions through that single writer (by setting it up and pointing to it in the config).
I think Datahike and DataScript are very similar in many ways and I am very grateful for DataScript. I just think Nikita has his own take on what DataScript should be that does not really align with what I need Datahike for. I would be more than happy to reduce the maintenance burden and just have the functionality in a joint community project. But the distributed memory functionality is what I really care about.
I suspect you have fairly optimal latency with your DataScript code. There are two differences: a) the nippy serializer and b) writing all changes to the indices in one write request. The latter can be hacked into Datahike relatively easily (basically by just porting your code over and changing it a bit), but abstracting this through konserve will require a bit more work (not a lot though).
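To make point b) concrete, here is a rough sketch of the difference between writing each changed index node separately and writing all of them in one request. It uses a plain Clojure atom as a stand-in for the backend; it is not the konserve, DataScript or DynamoDB API, and `put-each!`, `put-batch!` and the key names are hypothetical.

```clojure
(defn put-each!
  "Writes every entry with its own update: one state change per key, so a
  concurrent reader can observe some keys updated and others not yet."
  [store kvs]
  (doseq [[k v] kvs]
    (swap! store assoc k v)))

(defn put-batch!
  "Writes all entries in a single atomic update: a reader sees either none
  or all of the new keys, and the backend is hit once."
  [store kvs]
  (swap! store merge kvs))

;; Hypothetical index nodes changed by one transaction.
(def index-writes {:eavt-root 1, :aevt-root 2, :avet-root 3})

;; Usage: both variants end in the same state, but take a different
;; number of update steps to get there.
(def store-a (atom {}))
(def store-b (atom {}))
(put-each! store-a index-writes)
(put-batch! store-b index-writes)
(println (= @store-a @store-b))
```

With a real remote store, each separate update is a network round trip, which would explain why many small writes add up even when a single write is fast. The batch variant needs a backend that can update several keys atomically.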
These releases skip a conservative backup creation that konserve has for underlying stores that do not provide atomic updates, since S3, dynamodb and JDBC do provide atomic updates on a per key basis. With this change konserve-dynamodb now has the expected latencies of ~10ms per write operation. I left the update to konserve-jdbc for review for @alekcz360 first.
datahike-dynamodb is still not as fast as it should be though, I think this is because I schedule many write operations in parallel instead of a single batch. I will give this feature a shot at some point in the near future, if you need this right now lmk.
AWS S3 now has d/transact latencies of ~200ms for me, effectively halving it from before.
@pat you probably want to set the same config for cloud-storage as well: https://github.com/replikativ/konserve-jdbc/pull/24/files