2024-10-06 datalevin | Clojure Slack Archive

datalevin 2024-10-06

2024-10-06T14:57:16.575219Z

So an interesting consequence of :auto-entity-time? true being stored under :db/created-at and :db/updated-at is that these attributes can't really be used with the datoms function, since that would mean iterating over every entity in the database rather than just, say, the messages. So in cases where I care about a message's created-at, I'm better off adding my own attribute rather than using :db/created-at, if I plan on using the datoms function. Is my understanding correct? This is all in the context of pagination, by the way. I've found range queries are really good for paginating by day, hour, etc., but they struggle with fixed-step pagination, i.e. the first 100 results, the next 100 results, etc. If I need a sort order, I fall back on datoms. Also, unrelated: is (count (d/datoms :aev ...)) the fastest way to get the current count for an attribute? Thanks.

Huahai 2024-10-06T15:10:37.363299Z

The effort has always been on improving q. If you use a query, you can probably filter things down to what you need.

Huahai 2024-10-06T15:31:04.794539Z

:limit and :offset have not been implemented yet. It looks like we should get it done before 1.0.0, since this has been asked for a lot. I'll do it in the next release.
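Until :limit and :offset are available in q, fixed-step pages (first 100, next 100, ...) can be emulated on the client by sorting the query result and slicing it. This is a minimal sketch, assuming the results have already been returned by a query as a set of tuples; paginate and its arguments are hypothetical names, not part of the Datalevin API. It still realizes the whole result set, so it helps with page slicing only, not with query cost.

```clojure
(ns example.paginate)

(defn paginate
  "Hypothetical helper: sorts `results` (e.g. the set of tuples a query
  returns) by `key-fn`, then returns page number `page` (0-based) of at
  most `page-size` items. Emulates :offset/:limit on the client side;
  the whole result set is still held in memory."
  [results key-fn page page-size]
  (->> results
       (sort-by key-fn)              ; query results are sets, so impose an order
       (drop (* page page-size))     ; offset
       (take page-size)              ; limit
       vec))

;; Example: [message-id created-at] tuples, as a query might return them.
(def results #{[1 300] [2 100] [3 200] [4 500] [5 400]})

(paginate results second 0 2) ;; => [[2 100] [3 200]]
(paginate results second 1 2) ;; => [[1 300] [5 400]]
(paginate results second 2 2) ;; => [[4 500]]
```

For newest-first pages, sort-by also accepts a comparator, e.g. (sort-by key-fn > coll).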

Huahai 2024-10-06T15:34:07.478409Z

The fastest way to count an attribute is not to count the datoms, but to get the underlying DB and call -count. Right now it may return an estimate, but I can add an -actual-count function that returns the actual count. I will expose a count-datoms function in the API in the next release.

Huahai 2024-10-06T15:36:01.714399Z

In Datalevin, index access functions such as datoms are not necessarily faster than a query, especially if the DB is big.

Huahai 2024-10-06T15:37:19.758779Z

These functions materialize full datoms, which has a cost, whereas in queries, values are poured directly into tuples without first being materialized as datoms.


2024-10-06T19:14:27.957469Z

Completely agree with focusing on q. I try hard to avoid the datoms function, as it always feels like a bit of a hack, and it's also very limiting the minute you want to do some q-based filtering.

> :limit and :offset have not been implemented yet. Looks like we should get it done prior to 1.0.0 since this has been asked a lot. Will do in next release.

That would be amazing; it's honestly the only sharp edge I bump into now and then. Most of the time I find some range-query mechanism I can use for pagination (e.g. messages today), but sometimes I just need good old limit and offset. The reason I was asking about count is that I remember reading in the query planner docs that one of the advantages of how Datalevin uses LMDB is that counts are cheap/free. An estimate is good enough for my current use case. Thanks.

Huahai 2024-10-06T19:49:58.160239Z

Yes, the counts are cheap. Just need to expose them to the users.

2024-10-06T19:54:28.549339Z

🚨 Disclaimer: this may be a terrible idea! I've been playing around with creating a backup from within the same JVM process while the database is running, using a slightly modified version of copy. This means we don't have to rely on cron (systemd makes sure the uberjar is always running); we just have a thread that schedules backup jobs. This is with embedded Datalevin.

(ns app.backup
  (:require [datalevin.lmdb :as l]
            [app.db.core :as db])
  (:import [datalevin.db DB]
           [datalevin.storage Store]))

(defn copy
  "Copy a database. `db` is a database value (e.g. the deref of the
  process's existing connection). `dest-dir` is the destination data
  directory path. Will compact while copying if `compact?` is true."
  [db dest-dir compact?]
  ;; Reuse the LMDB environment the running process already has open,
  ;; instead of opening (and later closing) a separate one.
  (let [lmdb (.-lmdb ^Store (.-store ^DB db))]
    (println "Copying...")
    ;; LMDB copy runs in a read transaction, i.e. on a snapshot of the DB.
    (l/copy lmdb dest-dir compact?)
    (println "Copied database.")))

(comment
  (copy @db/*conn* "db-backup-1" true))

So far this has been working fine. The issue with the normal copy function was that it opened and then closed a connection to the LMDB store. Closing the LMDB env would stop the embedded Datalog database from working (it needs a new connection or a restart). So this version gets the underlying LMDB store from the process's existing connection, so we don't need to close it. Is there anything I should watch out for?

Huahai 2024-10-06T20:00:14.618609Z

This is totally fine. LMDB copy is designed to work while the DB is in use: it is just a read transaction, so it works on a snapshot of the DB.
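Because the copy works on a snapshot, the in-process scheduling described earlier can be a plain JVM scheduled executor. This is a minimal sketch, assuming the copy function above is passed in as backup-fn; schedule-backups!, backup-dir-name and the timestamped directory layout are hypothetical, not part of Datalevin.

```clojure
(ns example.backup-schedule
  (:import [java.util.concurrent Executors ScheduledExecutorService TimeUnit]
           [java.time LocalDateTime]
           [java.time.format DateTimeFormatter]))

(defn backup-dir-name
  "Hypothetical naming scheme: one directory per run, stamped with the
  local time, e.g. \"backups/db-20241006-195428\"."
  [base-dir ^LocalDateTime t]
  (str base-dir "/db-" (.format t (DateTimeFormatter/ofPattern "yyyyMMdd-HHmmss"))))

(defn schedule-backups!
  "Calls (backup-fn dest-dir) immediately and then every `period-minutes`
  on a single background thread. Returns the executor so the caller can
  shut it down."
  ^ScheduledExecutorService [backup-fn base-dir period-minutes]
  (let [executor (Executors/newSingleThreadScheduledExecutor)]
    (.scheduleAtFixedRate executor
                          (fn []
                            ;; Catch everything: an exception escaping the task
                            ;; would silently cancel all future runs.
                            (try
                              (backup-fn (backup-dir-name base-dir (LocalDateTime/now)))
                              (catch Throwable e
                                (println "Backup failed:" (.getMessage e)))))
                          0 period-minutes TimeUnit/MINUTES)
    executor))
```

For example, (schedule-backups! #(copy @db/*conn* % true) "backups" 60) would take a compacted snapshot every hour. The sketch neither creates nor prunes backup directories; how an existing or missing destination is handled is up to the copy call, so check that and rotate old backups separately.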

2024-10-06T20:09:49.882899Z

I was hoping you'd say that. It makes backups trivial for us, as we can just copy to block storage and don't have to set up cron (and make sure it's running, etc.). We're working with the constraint that ideally our clients just want to deploy a jar. 🎉

