#datalevin
2022-06-14
alex314159 08:06:50

Hi everyone. Posted a question at https://stackoverflow.com/questions/72606613/datomic-datalevin-performance-and-db-size-is-this-schema-wrong with example code - would love to use Datalevin for my project but performance looks unreasonably slow, wondering if I'm doing something wrong. Thanks!

👀 1
Eugen 19:06:51

curious about this myself

Huahai 05:06:23

It’s normal that transacting 900k datoms takes several seconds. I am curious what you mean by “crashing” after transacting 10 times. What’s the error?

Huahai 05:06:50

is it out of memory?

Huahai 05:06:17

If you have tons of data, the better approach is to form the triples yourself and load them with init-db instead of transacting them; or, if you already have an existing DB, you can call load-datoms on its underlying store. Transacting does tons of reads (so it is possible to use up the memory due to caching) and checks (this code is mostly original Datascript code, not optimized either). You don't need those if you are bulk loading datoms.
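
A rough sketch of this bulk-loading route; the schema, directory, entity ids, and the (init-db datoms dir schema) arity below are illustrative assumptions rather than anything confirmed in the thread:

```clojure
(require '[datalevin.core :as d])

;; Illustrative schema -- adapt to your own attributes.
(def schema {:item/price {:db/valueType :db.type/double}})

;; Form the datoms yourself, assigning entity ids manually.
(def datoms
  (map-indexed (fn [i price] (d/datom (inc i) :item/price price))
               (repeatedly 900000 rand)))

;; init-db writes the datoms directly, skipping transact!'s reads and checks.
;; The (init-db datoms dir schema) arity is assumed -- check your version.
(def conn (d/conn-from-db (d/init-db datoms "/tmp/datalevin/bulk" schema)))

(d/q '[:find (count ?e) . :where [?e :item/price]] (d/db conn))
(d/close conn)
```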

Huahai 05:06:13

Also, the DB file grows in 10x increments, i.e. 100MB -> 1GB -> 10GB -> …, so the size you see is not really the real size.

Huahai 05:06:53

The next major release will introduce a different storage format that reduces the storage size to some extent, mainly by removing the redundant first component of the triples, i.e. there will not be a redundant e in eav, nor a repeated a in ave, so there will be some storage savings. However, no major write speed improvement should be expected, as we will also introduce other indices that need computation time. In any case, if you have a write-intensive application, e.g. log ingestion, Datalevin may not be suitable for that use case. If you use it as an OLTP database, then it should work well.

alex314159 13:06:12

Thanks, this explains a lot! It does look like a memory problem for me. Is load-datoms documented anywhere? I couldn't find it in the API docs. I do need to update in batches of 900k datoms, so I would like to try that route to see if it's workable. Edit: found it in storage.cljc, will play around.
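
A sketch of calling load-datoms on the store directly; the datalevin.storage names and arities here are read off storage.cljc and may differ between versions, so treat them as assumptions rather than a documented public API:

```clojure
(require '[datalevin.storage :as st]
         '[datalevin.datom :as dt])

;; Open the underlying store directly (internal, not the public API).
(def store (st/open "/tmp/datalevin/bulk"))

;; Write a batch of pre-formed datoms without transact!'s reads and checks.
(st/load-datoms store
                (for [i (range 1000)]
                  (dt/datom (+ 1000000 i) :item/price (rand))))

(st/close store)
```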

alex314159 13:06:43

Right - I can create my datoms manually and run d/init-db sequentially, managing the datom ids manually. This sort of works (a lot faster than transact!, and no memory issues), but it looks like I need to close the connection and reopen it before I execute queries. I am getting “Environment map size reached (-30792)”. Do I need to do anything to clean up the state when I use this low-level way of updating the DB? Thanks!
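
For reference, a sketch of the close-and-reopen workaround described above; the directory and schema are placeholders, and the assumption is that reopening lets the connection pick up the grown LMDB map size before querying:

```clojure
;; Close the connection used for bulk loading, then reopen before querying.
(d/close conn)
(def conn (d/get-conn "/tmp/datalevin/bulk" schema))
(d/q '[:find (count ?e) . :where [?e :item/price]] (d/db conn))
```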