#datalevin
2023-07-03
Patrick Brown17:07:10

How are people deploying Datalevin server in production? Does anyone have an experience report on AWS? Is anyone mounting a DS using EFS?

Huahai17:07:18

• We deploy Datalevin server in production. We use Azure, but AWS should be similar. Using Datalevin on EFS is not recommended. “Do not use LMDB databases on remote filesystems, even between processes on the same host. This breaks flock() on some OSes, possibly memory map sync, and certainly sync between programs on different hosts.”

Patrick Brown17:07:40

So, if I’ve got an MDB larger than 3gb, what do you suggest?

Patrick Brown17:07:09

I’m trying to migrate from Datomic cloud BTW

Huahai17:07:15

what about a mdb larger than 3gb?

Huahai17:07:20

3gb is small

Patrick Brown17:07:34

I’ve got a size limitation on containerized workflows in AWS

Huahai17:07:08

don’t know too much about that

Patrick Brown17:07:28

Are you on a VM or using a container on Azure?

Patrick Brown17:07:21

If I do that, then I’m on the hook for all the added complexity of HA VMs. Any idea how serious the issues are with NFS?

Huahai17:07:07

i think that warning is regarding multiple processes/hosts dealing with the same data file, if you don’t do that, it should be fine

Patrick Brown17:07:38

OK, so let me run this by you…

Patrick Brown17:07:15

Single load-balancer pointing at a failover container doing nothing but running dltv with the DSs mounted by NFS, no other server touches the NFS unless a failover occurs? Does this sound like a reliable solution to you?

Huahai17:07:54

it might be problematic when failover occurs though. some pages may not be flushed to disk yet

Huahai17:07:29

unless you use the most safe option, which is not the default right now, but will be in 0.9.0

Huahai17:07:24

right now, you can get the safe option by passing [] to :flags

Patrick Brown17:07:18

Any CPU/Mem configuration guidelines you’d like to add would be appreciated. I’ll look into the safety option. (open-dbi db dbi-name {:flags []}) ?

Huahai17:07:27

no, open-kv

Huahai17:07:14

with datalog db, pass to :kv-opts {:flags []}

Huahai17:07:36

the config is db file wide, not dbi wide
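Putting the last few messages together, a minimal sketch (the paths and the exact `get-conn` arity are assumptions, not from the thread):

```clojure
(require '[datalevin.core :as d])

;; Key-value store: pass the safe (empty) flag vector when opening the
;; db file -- the setting is file-wide, not per-dbi.
(def kv-db (d/open-kv "/data/mykv" {:flags []}))

;; Datalog store: the same kv option goes under :kv-opts.
(def conn (d/get-conn "/data/mydb" {} {:kv-opts {:flags []}}))
```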

Patrick Brown17:07:19

Perfect. Thanks for taking the time to help… One more question. Is there a github repo where people have implemented embedding on iOS with native image?

Huahai17:07:00

don’t know any like that. I am not sure we support iOS

Patrick Brown17:07:43

Theoretically it looks possible, but I obviously don’t know much about it.

Huahai17:07:20

yeah, should be possible, and we will expand the supported platforms after 0.9.0, mostly a matter of CI config, but it does take time to do and test

Patrick Brown17:07:59

@U0A74MRCJ Thanks so much. I’m really enjoying the simplicity you’ve created.

Patrick Brown21:07:56

@U0A74MRCJ Just wanted to follow up with some encouragement. Just keep going! This server setup is incredibly fast. I just decided to put it on a VM and got the whole thing sorted out with no issues first time. Very intuitive how to use it. It’s super impressive what you’ve got.

Huahai21:07:49

thanks for the support!

Huahai17:07:48

Version 0.8.18 is released. Added nippy dump/load format, which is more robust, and fixed a few issues. https://github.com/juji-io/datalevin/blob/master/CHANGELOG.md#0817-2023-07-02

👍 2
🎉 4
phronmophobic22:07:31

I was trying to load data into a db that has several :db/fulltext attributes. It worked fine for a small sample, but loading took increasingly longer. I let it run for 1-2 days and was only about halfway through loading the data (with transactions seeming to load slower and slower). I tried loading from a fresh db after setting (d/datalog-index-cache-limit @conn 0) (I think that turns off the cache? The documentation for this option wasn't clear to me.) Am I doing something wrong, or are there tips/tricks that can help here? More details in 🧵

phronmophobic22:07:59

Happy to share any useful info. The code and data are both open source.

phronmophobic22:07:29

I was able to load the data previously without fulltext and the resulting db was around 70-80gb.

Huahai22:07:15

how big is the gz file?

Huahai22:07:45

that shouldn’t be a big issue, how big is the memory?

Huahai22:07:08

you are probably memory starved

Huahai22:07:32

set a higher Xmx

phronmophobic22:07:34

I set -Xmx12g. It didn't seem to be bumping into memory limitations, but I can try again.

phronmophobic22:07:09

Is there a rule of thumb or should I just keep feeding in more memory?

Huahai22:07:37

the more memory the better of course, LMDB is a MMAP db

Huahai22:07:44

to give you some idea, when I tried to run benchmark for fulltext search on full wiki data, I set both Xmx and Xms to be 24G

👍 2
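For reference, a launch command matching the heap advice above might look like this; it is a config fragment, and the main namespace is hypothetical:

```shell
# Fix both the initial and max heap at 24G, as suggested above.
clojure -J-Xms24g -J-Xmx24g -M -m my.load-ns
```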
phronmophobic22:07:14

ok, I'll give it a try

Huahai22:07:16

the data size is 15GB

phronmophobic22:07:59

Is (d/datalog-index-cache-limit @conn 0) the right way to disable caching?

Huahai22:07:16

if you hit the memory (80% default), spill to disk kicks in, everything will slow to a crawl

phronmophobic22:07:28

oh, maybe that is what was happening

Huahai22:07:57

you can set a higher percentage, but the best is to increase heap size

Huahai22:07:13

we are using clojure, so it is memory hungry

phronmophobic22:07:11

I think I saw that lmdb dbs are copyable between machines. I might look into trying to do the load on a high memory VM and then download it later.

Huahai22:07:35

correct, you can just copy the db (while compacting)

phronmophobic22:07:57

What does compacting mean?

Huahai22:07:07

0.9.0 will introduce compression, hopefully the file will be smaller

🚀 2
phronmophobic22:07:27

I'll try on my laptop, but I don't actually have that much ram (only 16gb)

Huahai22:07:28

compacting is an option in copy, it will remove unused pages
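A sketch of copying with compaction, assuming `datalevin.core/copy` takes a destination directory and a compact? flag (the directory names are made up):

```clojure
(require '[datalevin.core :as d])

(def db (d/open-kv "/data/mydb"))
;; Copy to a new directory; the trailing true asks for compaction,
;; i.e. unused pages are dropped from the copy.
(d/copy db "/backup/mydb" true)
(d/close-kv db)
```

The compacted copy can then be moved to another machine, since LMDB data files are portable.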

Huahai22:07:28

you can set the spill-to-disk ratio to 100% or something, so it will not kick in, but you may get OOM

phronmophobic22:07:32

Ok, I'll give those things a try. Thanks for your help!

Huahai22:07:39

the new full text index introduced in 0.8.x is faster, but it does eat much more memory

Huahai22:07:18

if you don’t really need it, don’t enable fulltext

phronmophobic22:07:21

interesting. I ran clj-kondo analysis on around 18k clojure github repos. The search is pretty useful without fulltext, but I feel like having fulltext search would be awesome.I'm not actually indexing the code itself, mostly the names of keywords, namespaces, and functions and the doc strings.

Huahai22:07:24

{:spill-opts {:spill-threshold 100}} for :kv-opts

👍 2
phronmophobic22:07:05

If I load the data using a high memory VM and then try to query later on a more modest computer, do you think that will also run into problems?

Huahai22:07:12

that will disable spill-to-disk
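Putting the spill options in context, a hedged sketch for a Datalog connection (the path and `get-conn` arity are assumptions):

```clojure
(require '[datalevin.core :as d])

;; Raise the spill threshold to 100% of heap so spill-to-disk never
;; kicks in -- at the risk of OOM, as noted above.
(def conn
  (d/get-conn "/data/mydb" {}
              {:kv-opts {:spill-opts {:spill-threshold 100}}}))
```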

Huahai22:07:15

probably ok to try that.

Huahai22:07:49

since you are indexing code, I would use a custom analyzer

phronmophobic22:07:21

that's a good point. I'm not actually indexing the code itself, mostly the names of keywords, namespaces, and functions and the doc strings.

Huahai22:07:23

maybe the default english analyzer is ok, i don’t know, worth thinking about

👍 2
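A sketch of a custom analyzer for code-ish tokens, assuming the helpers in `datalevin.search-utils`; exactly where the analyzer is plugged in (e.g. the search engine options) is not covered in the thread:

```clojure
(require '[datalevin.search-utils :as su])

;; Split on runs of non-word characters, so my.ns/foo-bar tokenizes
;; into my, ns, foo, bar; then lower-case each token.
(def code-analyzer
  (su/create-analyzer
    {:tokenizer     (su/create-regexp-tokenizer #"[^\w]+")
     :token-filters [su/lower-case-token-filter]}))
```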