#core-logic
2018-07-12
bajrachar15:07:05

Is it possible to replace pldb in core.logic with something like LMDB, and if so, how do I go about doing so?

bajrachar15:07:32

In particular, I'm having trouble loading a large data set into memory with pldb

norman15:07:23

how large is large? We use pldb with very large data sets

norman15:07:09

But obviously not so large that we can’t load it in memory

bajrachar15:07:53

well the size of the file I am loading from is close to 6GB

bajrachar15:07:37

When I add the facts, it blows up to 19GB in memory

bajrachar15:07:19

Could this be due to intermediate values created by Clojure data structures?

norman15:07:51

Are you indexing a lot? There’s a lot of work to create the in-memory index

norman15:07:33

I did not optimize that code for memory efficiency.

bajrachar15:07:07

Yes, I do have a few indices

norman15:07:12

I would say our “large” is 100-1000MB, which is definitely a lot less than your “large”

norman15:07:14

At some point an actual database is better. I don’t think that example is current, and if I recall it wasn’t actually a very good example

norman16:07:16

I probably can’t help too much though. I wrote pldb, but our use of core.logic and pldb has been stable for many years now, so it’s not code I touch on a day-to-day basis anymore. And since core.logic is under the Clojure CA and dev process, I haven’t been terribly motivated to actively contribute

bajrachar16:07:26

I think the size bloat could be due to indexing as you pointed out - maybe I can play around with it and see if it reduces further

norman16:07:29

If you don’t need them, then remove them. But if you do, you’ll just be trading memory for CPU
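
For illustration, here is a minimal sketch of how indices are declared in pldb, assuming core.logic is on the classpath. The relation names `finding` and `finding-unindexed` and the sample facts are hypothetical and not from the conversation. As far as I understand the implementation, an argument marked `^:index` gets its own lookup structure alongside the base set of facts, which is where the extra memory would go. Dropping the marker should give the same answers but leave pldb to scan all facts of the relation.

```clojure
(require '[clojure.core.logic :as l]
         '[clojure.core.logic.pldb :as pldb])

;; Hypothetical relations: same shape, one with an index on the
;; first argument, one without any index.
(pldb/db-rel finding ^:index patient code)
(pldb/db-rel finding-unindexed patient code)

;; A tiny fact database holding the same facts under both relations.
(def facts
  (pldb/db
    [finding :p1 :fever]
    [finding :p1 :cough]
    [finding :p2 :rash]
    [finding-unindexed :p1 :fever]
    [finding-unindexed :p1 :cough]
    [finding-unindexed :p2 :rash]))

;; Query with the indexed argument ground: pldb can use the index.
(def indexed-result
  (pldb/with-db facts
    (doall (l/run* [q] (finding :p1 q)))))

;; Same query against the unindexed relation: same answers,
;; found by unifying against every fact of the relation.
(def unindexed-result
  (pldb/with-db facts
    (doall (l/run* [q] (finding-unindexed :p1 q)))))
```

Both results contain the same codes for `:p1`; only the lookup cost and the memory held by the database differ.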

bajrachar16:07:27

Also, I've realized that Clojure data structures by default occupy quite a bit of memory when operated on, unless we use transients

bajrachar16:07:56

So I will also try serializing the db to disk and reading it back, to see if that reduces the size

norman16:07:19

I’m fairly certain the pldb code does not do that, but it might. It’s been a loong time 🙂

norman16:07:43

You can definitely serialize it. At one point we were saving pldbs in riak

hiredman16:07:58

if anything, serializing it will make it larger

hiredman16:07:01

serializing will remove any structural sharing in the data
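
A small sketch of the point about structural sharing, using only Clojure itself and a text round trip through EDN. The map below is a hypothetical stand-in for a database in which one fact is referenced from two places, as an index would do. It is not pldb's actual internal layout, and other serializers may behave differently.

```clojure
(require '[clojure.edn :as edn])

;; One fact tuple, referenced from two places.
(def fact [:p1 :fever])

(def db {:unindexed #{fact}
         :indexed   {:p1 #{fact}}})

;; Round trip through text: print to an EDN string, then read it back.
(def round-tripped (edn/read-string (pr-str db)))

;; In memory, both entries point at the very same vector object.
(def shared-before?
  (identical? (first (:unindexed db))
              (first (get-in db [:indexed :p1]))))

;; After the round trip the data is equal, but the fact has been
;; written out twice and read back as two separate objects.
(def shared-after?
  (identical? (first (:unindexed round-tripped))
              (first (get-in round-tripped [:indexed :p1]))))

(def still-equal? (= db round-tripped))
```

Here `shared-before?` is true, `shared-after?` is false, and `still-equal?` is true: the content survives, the sharing does not.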

bajrachar16:07:04

Thank you for your help @norman

bajrachar16:07:19

I am pretty new to Clojure and core.logic

bajrachar16:07:18

I'm using it for a clinical decision support tool, and as such its knowledge base is pretty large

hiredman16:07:00

https://gist.github.com/terjesb/3181018 might be a good place to start if you want to use core.logic without storing things in memory

bajrachar16:07:25

oh cool - thanks @hiredman -

bajrachar16:07:02

the dataset here being the lucene index?

hiredman16:07:00

In that code, sure, but you can do something similar to extend it to other datastores. I was thinking of it more as an example
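
One possible shape for a relation backed by an external store, as a hedged sketch and not taken from the gist. The `lookup` function and the `store` map are hypothetical placeholders for a real datastore call (an LMDB read, a Lucene query, and so on). The sketch assumes the key argument is already ground when the goal runs, because `project` only sees the value a variable has at that point.

```clojure
(require '[clojure.core.logic :as l])

;; Hypothetical stand-in for an external datastore.
(def store {:p1 [:fever :cough]
            :p2 [:rash]})

;; Hypothetical lookup function; a real one would query the datastore
;; and return a sequence of values for the key.
(defn lookup [k]
  (get store k))

;; A goal that fetches candidate values on demand instead of holding
;; every fact in memory. It only works when `patient` is ground: with
;; an unbound variable, `lookup` finds nothing and the goal fails.
(defn findingo [patient code]
  (l/project [patient]
    (l/membero code (lookup patient))))

(def p1-codes
  (l/run* [q] (findingo :p1 q)))

(def unknown-codes
  (l/run* [q] (findingo :nobody q)))
```

The trade-off is that each query pays for a datastore round trip, and queries that leave the key unbound need a different access path.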

bajrachar16:07:36

ok - I understand - thanks