#datalevin
2023-05-10
Eugen17:05:35

hi @huahaiy: would it make sense for the datalevin server component to be able to manage multiple LMDB database files? The way PostgreSQL has a cluster of databases (where each db is isolated). I would imagine this might be useful in a context where there are large datalevin stores and you would like to sync/back up at the individual LMDB file level

Huahai17:05:06

it already does

Eugen17:05:35

oh, I did not know that and did not check before asking

Huahai17:05:59

The server manages not just multiple databases, but also multiple users, with full RBAC.

Eugen17:05:26

I should give that a try at some point.

Eugen17:05:03

I think in the next 2 weeks I will have time to work on datalevin pieces and finish the map / iterator PR

Huahai17:05:06

In fact, our use of datalevin in production is server only

Eugen17:05:19

cool 🙂

andersmurphy19:05:28

Not sure if it’s best to ask this here or in datalog channel, but I’m using datalevin so I figured I’d try here first. 😅 I’m new to datalog and datalevin/datascript/datomic. I’m working with a large dataset (millions of signatures) and I’m effectively trying to get the top N latest results for :signature ordered by block-time (timestamp), but with the caveat that I only care about signatures that belong to a particular :program. The only performant way I’ve found to do this is the following, but it feels a tad complicated:

(let [program-address          "foo"
      db                       @db
      ;; resolve the program's entity id from its external id
      [{program-db-id :db/id}] (d/q '[:find [(pull ?p [:db/id])]
                                      :in $ ?program-id
                                      :where
                                      [?p :program/id ?program-id]]
                                    db
                                    program-address)]
  (->> (d/datoms db :ave :signature/block-time) ; datoms ordered by block-time
       (map first)                              ; entity id of each datom
       (partition-all 10)                       ; pull lazily, in chunks
       (mapcat #(d/pull-many db [:signature/program] %))
       (filter (comp #{program-db-id} :db/id :signature/program))
       (take 1)))
Am I missing something obvious? In SQL I'd just add an index on block-time and use `WHERE program-id ... ORDER BY block-time LIMIT N` to get the latest N values for a given program-id. Here I've had to partition the result to lazily pull-many in chunks and then filter by the program-db-id. Any help greatly appreciated. 😀
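[Editor's note: one possible simplification of the snippet above, assuming the DataScript-style API that Datalevin exposes (a scalar `:find ?p .` spec and `d/entity`); `conn` is a hypothetical connection name, and this is an untested sketch, not a vetted answer:]

```clojure
;; Sketch only: resolve the program's entity id with a scalar find,
;; then filter the block-time index via d/entity instead of chunked pulls.
(let [program-address "foo"
      db              @conn ; assumed connection; named `db` in the original
      program-eid     (d/q '[:find ?p .
                             :in $ ?pid
                             :where [?p :program/id ?pid]]
                           db program-address)]
  (->> (d/datoms db :ave :signature/block-time) ; ascending by block-time
       (map first)                              ; entity id of each datom
       (filter #(= program-eid
                   (get-in (d/entity db %)
                           [:signature/program :db/id])))
       (take 1)))
```

This still scans the index lazily in block-time order; it only trades the `partition-all`/`pull-many` chunking for per-entity lookups, which may or may not be faster depending on hit rate.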

Huahai19:05:12

Right. There’s no query optimizer to speak of (beyond some obvious optimizations that we do) at this point, so effectively you have to do these things on your own. My goal is to improve query performance to the point where you can do what you do in SQL. That’s what I am talking about when I say my goal is to bring datalevin performance on par with RDBMS. Stay tuned.

🙏 1
andersmurphy19:05:16

That’s great news! Datalevin has been a fantastic experience so far.

Huahai19:05:14

I believe this goal is achievable, because we can do whatever RDBMS do behind the scenes. I have done the research and some of the work; I just need to finish it.

andersmurphy19:05:30

I’d normally use sqlite for this sort of stuff, but I’d still be writing migrations and working out the schema! 😆

Huahai19:05:31

right, one of the advantages of datalog is that the schema is about the attribute, not about the entity, so it is much more flexible
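[Editor's note: a small illustration of the point above, using the attribute names from the earlier question. The schema shape (`:db/valueType`, `:db/unique`) follows the Datomic-style convention Datalevin uses; the example entities are hypothetical:]

```clojure
;; The schema describes attributes, not entity "tables": any entity may
;; carry any mix of attributes, so no table migrations are needed.
(def schema
  {:program/id           {:db/valueType :db.type/string
                          :db/unique    :db.unique/identity}
   :signature/program    {:db/valueType :db.type/ref}
   :signature/block-time {:db/valueType :db.type/long}})

;; Two differently shaped entities coexist in the same store:
;; {:program/id "foo"}
;; {:signature/program [:program/id "foo"]
;;  :signature/block-time 1683730000}
```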

andersmurphy19:05:40

That’s good to hear. I know ordering/sort-by type stuff is a challenge for Datomic-style dbs, at least that’s my understanding. Mainly wanted to make sure I wasn’t missing something obvious.

Huahai19:05:32

it is a well-known issue that I intend to solve.

🎉 1
Huahai19:05:15

we will retain the flexibility and achieve the desired performance. The solution, like that of many computer science problems, is to add more data structures, i.e. create new types of indices, which is work I have already done. So stay tuned.