This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2023-07-03
Channels
- # aleph (3)
- # announcements (2)
- # babashka (24)
- # beginners (71)
- # biff (5)
- # calva (19)
- # cider (7)
- # clj-kondo (15)
- # cljdoc (3)
- # clojure (76)
- # clojure-australia (1)
- # clojure-china (1)
- # clojure-denver (24)
- # clojure-europe (56)
- # clojure-filipino (1)
- # clojure-hk (1)
- # clojure-indonesia (1)
- # clojure-japan (1)
- # clojure-korea (1)
- # clojure-my (1)
- # clojure-nl (1)
- # clojure-norway (37)
- # clojure-sg (1)
- # clojure-taiwan (1)
- # clojure-uk (6)
- # clojurescript (2)
- # cursive (2)
- # datalevin (71)
- # datomic (9)
- # dev-tooling (5)
- # emacs (19)
- # events (1)
- # gratitude (1)
- # hoplon (6)
- # introduce-yourself (5)
- # jobs (1)
- # juxt (2)
- # lsp (23)
- # nbb (26)
- # off-topic (12)
- # other-languages (97)
- # practicalli (2)
- # releases (2)
- # remote-jobs (1)
- # shadow-cljs (24)
- # tools-deps (17)
- # vim (2)
How are people deploying Datalevin server in production? Does anyone have an experience report on AWS? Is anyone mounting a DS using EFS?
• We deploy Datalevin server in production. We use Azure, but AWS should be similar. Using Datalevin on EFS is not recommended. “Do not use LMDB databases on remote filesystems, even between processes on the same host. This breaks flock() on some OSes, possibly memory map sync, and certainly sync between programs on different hosts.”
So, if I’ve got an MDB larger than 3gb, what do you suggest?
I’m trying to migrate from Datomic cloud BTW
I’ve got a size limitation on containerized workflows in AWS
Are you on a VM or using a container on Azure?
If I do that, then I’m on the hook for all the added complexity of HA VMs. Any idea how serious the issues are with NFS?
i think that warning is regarding multiple processes/hosts dealing with the same data file; if you don't do that, it should be fine
OK, so let me run this by you…
Single load-balancer pointing at a failover container doing nothing but running dtlv with the DSs mounted over NFS, and no other server touches the NFS unless a failover occurs? Does this sound like a reliable solution to you?
it might be problematic when failover occurs, though. some pages may not be flushed to disk yet
unless you use the safest option, which is not the default right now, but will be in 0.9.0
Any CPU/Mem configuration guidelines you’d like to add would be appreciated. I’ll look into the safety option. (open-dbi db dbi-name {:flags []}) ?
Perfect. Thanks for taking the time to help… One more question. Is there a github repo where people have implemented embedding on IOS with native image?
Theoretically it looks possible, but I obviously don’t know much about it.
yeah, should be possible, and we will expand the supported platforms after 0.9.0, mostly a matter of CI config, but it does take time to do and test
@U0A74MRCJ Thanks so much. I’m really enjoying the simplicity you’ve created.
@U0A74MRCJ Just wanted to follow up with some encouragement. Just keep going! This server setup is incredibly fast. I just decided to put it on a VM and got the whole thing sorted out with no issues first time. Very intuitive how to use it. It’s super impressive what you’ve got.
Version 0.8.18 is released. Added nippy dump/load format, which is more robust, and fixed a few issues. https://github.com/juji-io/datalevin/blob/master/CHANGELOG.md#0817-2023-07-02
I was trying to load data into a db that has several :db/fulltext attributes. It worked fine for a small sample, but loading took longer and longer. I let it run for 1-2 days and was only about halfway through loading the data (with transactions seeming to load slower and slower).
I tried loading from a fresh db after setting (d/datalog-index-cache-limit @conn 0)
(I think that turns off the cache? The documentation for this option wasn't clear to me.)
Am I doing something wrong, or are there tips/tricks that can help here?
More details in 🧵
Happy to share any useful info. The code and data are both open source.
The latest code is available at https://github.com/phronmophobic/dewey/blob/main/examples/deweydb/src/com/phronemophobic/dewey/db.clj
I was able to load the data previously without fulltext and the resulting db was around 70-80gb.
852mb.
It's the analysis.edn.gz from https://github.com/phronmophobic/dewey/releases/tag/2023-06-12
I set -Xmx12g. It didn't seem to be bumping into memory limitations, but I can try again.
Is there a rule of thumb or should I just keep feeding in more memory?
to give you some idea, when I tried to run benchmark for fulltext search on full wiki data, I set both Xmx and Xms to be 24G
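(Editor's note: setting both Xms and Xmx as suggested above can be done in a deps.edn alias; a minimal sketch, where the :load alias name is made up for illustration:)

```clojure
;; deps.edn -- minimal sketch; the :load alias name is hypothetical
{:aliases
 {:load {:jvm-opts ["-Xms24g" "-Xmx24g"]}}}
```

Then run the loading process with that alias, e.g. `clj -M:load -m my.loader`. Setting Xms equal to Xmx pre-allocates the heap up front, avoiding resize pauses during a long bulk load.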
ok, I'll give it a try
Is (d/datalog-index-cache-limit @conn 0) the right way to disable caching?
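(Editor's note: putting that call in context, a minimal sketch assuming datalevin.core is aliased as d; the db path and schema here are hypothetical, only the cache-limit call is from the thread:)

```clojure
(require '[datalevin.core :as d])

;; hypothetical path and schema, for illustration only
(def conn (d/get-conn "/tmp/dewey-db"
                      {:doc {:db/valueType :db.type/string
                             :db/fulltext  true}}))

;; per this thread, a limit of 0 disables the Datalog index cache
(d/datalog-index-cache-limit @conn 0)
```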
if you hit the memory limit (80% by default), spill-to-disk kicks in and everything will slow to a crawl
oh, maybe that is what was happening
I think I saw that lmdb dbs are copyable between machines. I might look into trying to do the load on a high memory VM and then download it later.
What does compacting mean?
I'll try on my laptop, but I don't actually have that much ram (only 16gb)
you can set the spill-to-disk ratio to 100% or something, so it will not kick in, but you may get OOM
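(Editor's note: if the spill threshold is tunable at connection time, it would presumably look something like the sketch below; the :spill-opts / :spill-threshold option names and their nesting are assumptions inferred from this discussion, so check the Datalevin docs before relying on them:)

```clojure
;; ASSUMPTION: option names inferred from the discussion, not verified.
;; Raising the threshold to 100% keeps everything in memory, at the
;; risk of OOM, as noted above.
(def conn (d/get-conn "/tmp/dewey-db" schema
                      {:kv-opts {:spill-opts {:spill-threshold 100}}}))
```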
Ok, I'll give those things a try. Thanks for your help!
the new full text index introduced in 0.8.x is faster, but it does eat much more memory
interesting. I ran clj-kondo analysis on around 18k clojure github repos. The search is pretty useful without fulltext, but I feel like having fulltext search would be awesome.
If I load the data using a high memory VM and then try to query later on a more modest computer, do you think that will also run into problems?
that's a good point. I'm not actually indexing the code itself, mostly the names of keywords, namespaces, and functions, and the doc strings.