#datalevin
2022-09-28
Eugen21:09:37

I can add the project that uses datalevin via deps.edn as a project under examples/

Huahai21:09:00

Please do, thanks

Eugen21:09:55

I use Calva and it formats deps.edn automatically. Is it ok if I format the whole file? I don't know how you format the file. I made some effort not to break the formatting, but it's hard.

Huahai21:09:26

it’s ok. I use spacemacs default format. the format is not consistent anyway.

Eugen21:09:03

ok 🙂 then we can close https://github.com/juji-io/datalevin/issues/98 . I saw github actions is done

Huahai04:10:38

Thanks 🙏

Eugen21:09:09

@huahaiy: I would like to try to implement one of https://github.com/juji-io/datalevin/issues/109 and https://github.com/juji-io/datalevin/issues/108 . I think I would start with 108. Do you have any directions for me?

Eugen22:09:16

More specifically, I am not sure we want to change the existing API. We will probably want to add a new one. Any ideas on how this should look? I am getting familiar with the API now

Huahai04:09:24

Right, we will add new functions for this functionality. For Datalog queries, Datomic has qseq, so we should probably support that. Basically, query results are returned as a lazy seq. Similarly, for KV, we can have get-range-seq, corresponding to get-range, that returns a lazy seq, as well as range-filter-seq, and so on.

Huahai04:09:59

Basically, implement ISeq protocols with iterative-kv

Huahai04:09:17

A naive implementation should work, we don’t need complexity such as chunk and so on.

Huahai05:09:53

Keep things simple and straightforward
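
As an illustration of the "naive, no chunking" approach described above, here is a minimal sketch in plain Clojure. It assumes the underlying cursor can be viewed as a java.util.Iterator; `naive-iter-seq` and `counting-iterator` are hypothetical names for this example and are not part of Datalevin's API.

```clojure
;; Hypothetical helper, not part of Datalevin's API.
(defn naive-iter-seq
  "Returns an unchunked lazy seq over a java.util.Iterator,
  realizing one item per step."
  [^java.util.Iterator it]
  (lazy-seq
    (when (.hasNext it)
      (cons (.next it) (naive-iter-seq it)))))

;; Stand-in for a KV cursor: an iterator that counts how many
;; items have actually been pulled from the underlying collection.
(def pulled (atom 0))

(defn counting-iterator [coll]
  (let [inner (.iterator ^java.lang.Iterable coll)]
    (reify java.util.Iterator
      (hasNext [_] (.hasNext inner))
      (next [_] (swap! pulled inc) (.next inner)))))

;; Only the items that are consumed get read from the iterator.
(def first-two
  (doall (take 2 (naive-iter-seq (counting-iterator [:a :b :c :d])))))
```

Because each step realizes exactly one element, the consumer decides how far the cursor advances, which is the property a seq-returning range function would need.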

Eugen05:09:35

ok, thanks.

Huahai05:09:24

We would normally discourage people from using the seq versions of the functions, because each one holds a read transaction open, and the db will grow very fast if we hold too many of these

Huahai05:09:29

because, conceptually, each read transaction keeps a copy of the db of its own.

Huahai05:09:41

these should only be used if the result set is too large to hold in memory.

Eugen06:09:34

I would expose the transaction semantics to the user

Eugen06:09:52

so they can judge when and for how long to keep transactions opened

Eugen06:09:32

either start-transaction / end-transaction or via (with-open [txn (start-transaction params)] ...)

Huahai20:09:44

Exposing transaction and lazy result sets are separate issues

Huahai20:09:04

They should not be considered together

Huahai20:09:43

In any case, we will not introduce a transaction object to users.

Huahai20:09:01

The goal is to not change the existing functions

Huahai20:09:53

The need to have (with-transaction ..) is to ensure atomicity when there are both reads and writes.

Huahai20:09:46

The need for lazy query is to solve the memory problem.

Huahai20:09:57

A casual user should not be aware of the ideas of transactions at all. Simplicity is the goal of this project.

Huahai20:09:33

I hope I have made myself understood.

Huahai20:09:40

Using (with-transaction ..) instead of (start-transaction …) is to enforce the idea that a transaction should be something short and well contained.
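
To make the scoping argument concrete, here is a small sketch of how a macro can tie a transaction's lifetime to a lexical block. Everything in it (`open-txns`, `begin-txn!`, `end-txn!`, `with-txn`) is hypothetical and only stands in for the idea; it is not how Datalevin implements with-transaction.

```clojure
;; All names here are hypothetical; this is not Datalevin's implementation.
(def open-txns (atom 0))

(defn begin-txn! []
  (swap! open-txns inc)
  ::txn)

(defn end-txn! [_txn]
  (swap! open-txns dec)
  nil)

(defmacro with-txn
  "Runs body inside a transaction that is always ended when body exits,
  whether normally or by exception."
  [& body]
  `(let [txn# (begin-txn!)]
     (try
       ~@body
       (finally (end-txn! txn#)))))

;; Normal exit: the body's value is returned and the transaction is closed.
(def result (with-txn (+ 1 2)))

;; Exceptional exit: the transaction is still closed by the finally clause.
(def failed
  (try
    (with-txn (throw (ex-info "boom" {})))
    (catch clojure.lang.ExceptionInfo _ :caught)))
```

With separate start/end functions, nothing stops a caller from forgetting the end call or keeping the transaction open across unrelated work; the macro form makes that mistake harder to write.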

Huahai08:12:36

range-seq is now implemented for embedded mode. It lazily loads data into memory in batches and implements Seqable. As long as one does not hold the head and processes data as it comes in, it should not blow up memory.
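
The batch-loading behaviour can be pictured with a plain Clojure sketch. This is not Datalevin's range-seq; `batched-seq` and `fetch-batch` are hypothetical names, and an in-memory vector stands in for the store.

```clojure
;; Hypothetical sketch; not Datalevin's range-seq implementation.
(defn batched-seq
  "Lazy seq that calls (fetch offset n) for one batch at a time
  and stops when a batch comes back empty."
  [fetch batch-size]
  (letfn [(step [offset]
            (lazy-seq
              (let [batch (fetch offset batch-size)]
                (when (seq batch)
                  (concat batch (step (+ offset (count batch))))))))]
    (step 0)))

;; An in-memory vector stands in for the store.
(def data (vec (range 10)))
(def fetches (atom 0))

(defn fetch-batch [offset n]
  (swap! fetches inc)
  (subvec data
          (min offset (count data))
          (min (+ offset n) (count data))))

;; reduce walks the seq without binding its head to a name,
;; so batches that have been consumed can be garbage collected.
(def total (reduce + 0 (batched-seq fetch-batch 4)))
```

Binding the whole seq to a var or a long-lived local and then traversing it would keep every realized batch reachable, which is what "holding the head" refers to.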

Huahai08:12:48

The eager versions of get-range and range-filter now spill to disk when memory pressure is high. Hopefully, these together could alleviate some of the big data read memory problems.

Eugen09:12:20

thanks, I saw the releases and the changes - did not have time to check them out

Eugen09:12:40

hopefully will be able to do that soon