#datalevin
2023-11-08
Huahai 05:11:01

You said the API design is questionable. What would be a reasonable design?

leonoel 10:11:38

The object returned by this method call is a handle to a one-shot process with a lifecycle, so it is not a good fit for Clojure's purely functional collection framework. IMO it is perfectly fine to not try to hide the imperative API and to expose java.util.Iterator, or just a blocking function that fetches successive items and returns nil or a custom sentinel on EOF. Then the user can choose to iterate in a loop or use repeatedly + take-while to interop with Clojure collections (with special care to keep laziness from escaping the with-open scope). If you choose the async path, my recommendation would be the https://github.com/leonoel/flow specification. Implementation is not trivial but I would happily provide support, since Datalevin seems rather popular among Missionary and Electric users.
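
A minimal sketch of the blocking-function option, to make the loop vs. repeatedly + take-while choice concrete. Everything here is an assumption for illustration: `Cursor`, `next-item!` and `open-cursor` are hypothetical names, not Datalevin API, and the cursor is backed by an in-memory collection rather than a read transaction.

```clojure
;; Hypothetical protocol: a blocking fetch that returns ::eof when exhausted.
(defprotocol Cursor
  (next-item! [this]))

;; Hypothetical stand-in for a one-shot handle with a lifecycle.
;; A real handle would hold a read transaction; this one only wraps `items`.
(defn open-cursor [items]
  (let [^java.util.Iterator it (.iterator ^Iterable items)
        closed? (atom false)]
    (reify
      Cursor
      (next-item! [_]
        (when @closed?
          (throw (IllegalStateException. "cursor used after close")))
        (if (.hasNext it) (.next it) ::eof))
      java.io.Closeable
      (close [_] (reset! closed? true)))))

;; Option 1: a plain loop; every item is read before with-open closes the cursor.
(defn read-all-loop [items]
  (with-open [c (open-cursor items)]
    (loop [acc []]
      (let [x (next-item! c)]
        (if (= ::eof x)
          acc
          (recur (conj acc x)))))))

;; Option 2: repeatedly + take-while; `into` is eager, so nothing lazy
;; escapes the with-open scope.
(defn read-all [items]
  (with-open [c (open-cursor items)]
    (into []
          (take-while #(not= ::eof %))
          (repeatedly #(next-item! c)))))

;; Pitfall: the lazy seq is returned unrealized, so the first fetch
;; happens only after the cursor has been closed.
(defn leaky [items]
  (with-open [c (open-cursor items)]
    (take-while #(not= ::eof %) (repeatedly #(next-item! c)))))
```

With this stand-in, realizing the result of `leaky` (for example with `doall`) throws, because no item is fetched until after `with-open` has closed the cursor; the two eager variants copy everything out first.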

Huahai 16:11:26

The purely functional collection framework is indeed incompatible with a mutable database. range-seq is not intended to hide that: it IS an imperative Iterator, but we implement seq as a convenience for Clojure users, and the documentation is quite clear on that. I have found the seq-like API backed by a mutable underpinning quite convenient in my work. The intent of the function is to provide a way to go over a range of data within a single read transaction, i.e. it is for performance and correctness.

Huahai 16:11:08

If these two criteria are not of concern, one can write whatever API on top of point queries. However, there will be no guarantee that the data returned are consistent.

Huahai 16:11:34

The main concern for me is to discourage a user from holding a read transaction open for a long time, which blows up the database and returns stale data.

Huahai 16:11:22

Please realize that the database is mutable, and a read transaction is basically reading a snapshot of the database, which may not reflect the current state any more. This is the kind of thing I am concerned with when designing the API.

Huahai 16:11:32

Users are free to implement whatever API they like on top of the primitives that the database provides, but the database should not provide APIs that give people wrong ideas. For example, say we have an async API. We have two choices for implementing it: one is to keep a read transaction open, which has the problems I mentioned above; the other is to base it on point queries, i.e. each point query is a read transaction of its own, and then you lose consistency. It’s not my place to judge which choice is best for users, because each user has different needs. So it is up to users to implement these.

Huahai 16:11:27

What I am saying is, as a database designer, my job is to provide primitives that reflect the underlying reality of the database. It is up to library authors to make choices on how to implement certain user facing APIs.

Huahai 17:11:32

For example, if one knows that there are no concurrent writers for a database, an async API implemented on the basis of point queries is perfectly fine. However, that is not an assumption Datalevin should be making.

Huahai 17:11:11

One should realize that a functional data structure/database forces a choice on its users. They say “database is a value”, which means that they always give you a snapshot of the data that may already be out of date while you are still working with it. It’s your responsibility to deal with that, i.e. you must call d/db every time to get the current value if you care about staleness. range-seq also does this for you by working within a single read transaction, i.e. it provides a snapshot of the data, but I only provide a single pass over it, because I don’t want users to keep this transaction open indefinitely. The intent is for users to copy whatever data they need out of the database and close the transaction quickly.

leonoel 21:11:58

I understand the value of single read transactions and I'm not questioning their benefits. However, the transaction concern is orthogonal to sync vs. async. It is absolutely possible to expose an asynchronous version of range-seq that also runs inside a read transaction; the user is in charge of ensuring the transaction is closed quickly, just like in the synchronous version. I am not questioning the convenience of seq either. My point is that it makes it easy for beginners to write incorrect programs, and the reason I'm concerned is that I have to debug these programs to support the users of the library I maintain. The documentation is indeed clear about that, but the discussion I linked shows that the current design doesn't quite match user expectations. To be clear, my intent was just to share a user experience; I'm not advocating for any change in the API.

Huahai 21:11:37

I am open to adding additional APIs and welcome contributions.
