#datalevin
2023-06-19
Eugen11:06:38

Would it make sense to have an API to open multiple DBIs at once? Right now open-dbi opens a single DBI:

(open-dbi
    [db dbi-name]
    [db dbi-name opts]
    "Open a named DBI (i.e. sub-db) in the LMDB env")
Maybe with a signature like this (not sure if it is compatible with the existing one):
[db opts & dbi-name]

Huahai17:06:47

Not sure. Would open-dbis be easier?

Eugen18:06:41

I think so.

Eugen18:06:21

It could be added as a utility function as well.

Eugen18:06:58

It's something I noticed when I had code like this:

(d/open-dbi lmdb docsearch-table)
(d/open-dbi lmdb semmed-table)
(d/open-dbi lmdb processed-rel-name)
(d/open-dbi lmdb rel-name)

Eugen18:06:07

Should I try a PR? In datalevin.core?

Eugen20:06:32

I'll first try adding a local function that we will use.

Eugen20:06:50

Once I figure it out, I can post it and you can decide whether you want to add it.
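
A local helper along the lines Eugen describes could look like the sketch below. The name open-dbis comes from Huahai's suggestion, but its signature and return value are assumptions made here; it is not part of datalevin.core. It only wraps the two open-dbi arities quoted earlier in the thread.

```clojure
(require '[datalevin.core :as d])

(defn open-dbis
  "Hypothetical helper, not a Datalevin API: opens every named DBI in
  `dbi-names` on the LMDB env `db` by calling d/open-dbi once per name.
  `opts`, when given, is passed unchanged to each call. Returns `db`."
  ([db dbi-names]
   (open-dbis db dbi-names nil))
  ([db dbi-names opts]
   (doseq [dbi-name dbi-names]
     (if opts
       (d/open-dbi db dbi-name opts)
       (d/open-dbi db dbi-name)))
   db))
```

With this in place, the four calls above would collapse to (open-dbis lmdb [docsearch-table semmed-table processed-rel-name rel-name]). Taking the names as a collection rather than as varargs keeps the optional opts argument unambiguous.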

phronmophobic18:06:42

I have a datalevin db and I'm trying to scan the index. I thought using seek-datoms would return a result relatively quickly, but the following starts consuming memory rapidly and takes quite a while to return.

> (time
   (->> (d/seek-datoms
         (d/db conn)
         :ave
         ::basis
         "/fo")
        first))
"Elapsed time: 66873.939042 msecs"
If I make a subsequent call to seek-datoms with a different value, I quickly hit the RAM limit for the process (currently 12g), at which point it appears to stop making progress:
> (time
   (->> (d/seek-datoms
         (d/db conn)
         :ave
         ::basis
         "/bar")
        first))
Am I using seek-datoms incorrectly or is there a better way to scan the index?

phronmophobic18:06:23

Calling seek-datoms 3 times with different start components throws an OutOfMemoryError

Huahai18:06:01

If you know you are doing an operation that takes a lot of memory, you want to disable the cache.

👍 2
phronmophobic18:06:07

I thought seeking to a particular spot in the index would be pretty fast. Is that an incorrect intuition or is there something else going on?

Huahai18:06:44

I think you misunderstood what seek-datoms does. It starts from a spot and returns everything after that.

Huahai18:06:44

If you want a particular value, use index-range.

phronmophobic18:06:41

I'll have to double check, but I think seek-datoms is lazy in datomic.

Huahai18:06:11

Nothing is lazy in Datalevin.

phronmophobic18:06:54

It looks like datoms might be lazy?

datalevin.core/datoms
 [db index]
 [db index c1]
 [db index c1 c2]
 [db index c1 c2 c3]
 [db index c1 c2 c3 c4]
  Index lookup in Datalog db. Returns a sequence of datoms (lazy iterator over actual DB index) which components (e, a, v) match passed arguments.

Huahai18:06:18

That's a wrong docstring; we can change that.

phronmophobic19:06:36

I just tried the following, and it seems to work:

> (let [db (d/db conn)]
    (d/datalog-index-cache-limit db 0)
    (time
     (->> (d/datoms
           db
           :avet
           ::basis
           "/deps.edn")
          first))) 
"Elapsed time: 1.436584 msecs"

phronmophobic19:06:39

It seems like index-range is also lazy. I think this query would scan my whole database otherwise:

> (let [db (d/db conn)]
    (d/datalog-index-cache-limit db 0)
    (time
     (->> (d/index-range
           db
           ::analysis-id
           "2023"
           nil)
          first)))
"Elapsed time: 4.952667 msecs"

Huahai19:06:28

As I said, right now these are not lazy. If your memory is large enough, everything is loaded into memory. If memory is not large enough, the results spill to disk, hence the slowness you saw. Disabling the cache was meant to save memory; otherwise, the results accumulate in memory.

👍 2
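
To make the advice concrete, here is a minimal sketch that combines the two calls already used in this thread: turning the cache off with datalog-index-cache-limit and scanning a single attribute with index-range. The helper name attr-range is hypothetical. Because the result is eager, the whole range is still realized at once, so a tight end bound is what keeps memory use down.

```clojure
(require '[datalevin.core :as d])

(defn attr-range
  "Hypothetical helper: returns the datoms of `attr` whose value lies
  between `start` and `end`, with the index cache disabled so that the
  result does not also accumulate in the cache. `end` may be nil, as in
  the index-range calls shown in this thread."
  [conn attr start end]
  (let [db (d/db conn)]
    ;; A limit of 0 disables the cache, as in the snippets above.
    (d/datalog-index-cache-limit db 0)
    (d/index-range db attr start end)))
```

For example, (first (attr-range conn ::analysis-id "2023" nil)) mirrors the index-range call phronmophobic ran above.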
phronmophobic19:06:47

Ok, I guess I'm just confused about why seek-datoms is so slow. Using index-range to scan all of the values of a property takes less than a second, but doing the same thing with seek-datoms takes over a minute:

(let [db (d/db conn)]
  (d/datalog-index-cache-limit db 0)
  (time
   (->> (d/index-range
         db
         ::analysis-id
         "2023"
         nil)
        first)))
vs
(let [db (d/db conn)]
  (d/datalog-index-cache-limit db 0)
  (time
   (->> (d/seek-datoms
         db
         :avet
         ::analysis-id
         "2023")
        first)))

Huahai19:06:54

We could have lazy versions of things, trading some performance for laziness.

Huahai19:06:10

Right, seek-datoms returns everything after that value, so it may have taken more than 80% of memory, at which point spilling to disk is triggered.

phronmophobic19:06:29

Oh, seek-datoms also includes all of the other properties in the :avet index :face_palm:. I get it now. Thanks!
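
This realization can be illustrated without Datalevin at all. The sketch below models an AVE-ordered index as a plain Clojure sorted set of [attribute value entity] tuples. The data and the function names seek-from and range-within are made up for illustration, and nothing here reflects Datalevin's actual storage layout.

```clojure
;; Toy AVE index: tuples sort by attribute, then value, then entity id.
(def ave-index
  (sorted-set
   [:analysis-id "2022-12-01" 1]
   [:analysis-id "2023-01-15" 2]
   [:analysis-id "2023-06-19" 3]
   [:basis "/bar/deps.edn" 4]
   [:basis "/foo/deps.edn" 5]
   [:name "x" 6]))

(defn seek-from
  "Everything at or after [attr start] in index order. The scan runs to
  the end of the index, so later attributes are included as well."
  [index attr start]
  ;; Entity id 0 sorts before every id used in the toy data.
  (vec (subseq index >= [attr start 0])))

(defn range-within
  "Like seek-from, but stops as soon as the attribute changes."
  [index attr start]
  (vec (take-while #(= attr (first %))
                   (subseq index >= [attr start 0]))))

(count (seek-from ave-index :analysis-id "2023"))    ;; => 5
(count (range-within ave-index :analysis-id "2023")) ;; => 2
```

In the toy data, seeking from "2023" picks up the two matching :analysis-id entries plus every :basis and :name entry that sorts after them, which is the extra work an eager seek over the whole index has to do.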

Huahai19:06:16

Right, so it may not make much sense as an eager operation. We will change that in the future.