datalevin

phronmophobic 2025-06-10T21:12:43.892389Z

I'm trying the latest version of datalevin and I'm getting an error when trying to d/open-kv: (built for macOS 15.0 which is newer than running OS). I'm on Apple Sonoma 14.0. Is that too old?

phronmophobic 2025-06-10T21:23:35.540779Z

I can run 0.9.20, but not 0.9.21

Huahai 2025-06-10T22:52:58.113049Z

There are native dependencies that we expect. Unfortunately, we have to use whatever the CI/CD build server uses.

phronmophobic 2025-06-10T22:59:11.191109Z

does setting the mac osx deployment target not work? https://stackoverflow.com/questions/25352389/what-is-the-difference-between-macosx-deployment-target-and-mmacosx-version-min#25362535

Huahai 2025-06-10T22:59:56.456609Z

Don’t know about this

phronmophobic 2025-06-10T23:00:43.408779Z

that’s what I use when building native libs for jars. It should work.

phronmophobic 2025-06-10T23:01:10.605309Z

I’m away from keyboard, but setting the environment variable is usually the easiest method.

Huahai 2025-06-10T23:03:59.382429Z

https://github.com/juji-io/dtlvnative PRs welcome

phronmophobic 2025-06-12T15:49:05.570319Z

I think https://github.com/juji-io/dtlvnative/commit/f2fdcb94d0420f0c4fec85eebf289c3e098ac9bd is the only change you need to make. I'm trying to test, but I don't see where macosx-arm64 is built. I see the script for macosx-x86_64 in the github workflow, but not arm64.

Huahai 2025-06-12T17:26:07.423259Z

It is in https://github.com/juji-io/dtlvnative/blob/master/.cirrus.yml

phronmophobic 2025-06-12T22:44:01.624799Z

Setting the macOS deployment target fixes the original error (built for macOS 15.0 which is newer than running OS), but it's still not loading due to a dependency on Homebrew's libomp. I use MacPorts, which stores libraries in a different path, so the shared library fails to load.

phronmophobic 2025-06-12T22:45:51.137339Z

It's possible to work around this by messing with rpaths, but that's beyond the scope of what I have time to look into and implement.

phronmophobic 2025-06-12T22:46:04.012469Z

Would you like me to file a github issue?

Huahai 2025-06-12T22:47:35.317229Z

sure

phronmophobic 2025-06-10T21:57:08.066089Z

For the key-value db, what does datalevin use for equality? Can maps be used as keys?

Huahai 2025-06-10T22:53:54.447729Z

Yes. We use nippy for serialization of Clojure data structures, so a map works as long as the serialized key is under the key size limit.

phronmophobic 2025-06-10T22:57:41.349749Z

does it compare by hasheq? I guess I was just trying to make sure that unordered maps can reliably be used as lookup keys.

Huahai 2025-06-10T23:01:09.613649Z

No, it compares by bytes

Huahai 2025-06-10T23:02:42.631909Z

Yes, it can be lookup keys, as two equal maps will be serialized into the same bytes

phronmophobic 2025-06-10T23:36:53.752089Z

It doesn't seem like nippy guarantees that two equal maps will serialize to the same bytes:

(require '[taoensso.nippy :as nippy])

;; m1 and m2 are equal maps built in different insertion orders
(let [m1 (-> {}
             (assoc :bar :foo)
             (assoc :foo :bar))
      m2 {:foo :bar
          :bar :foo}]
  {:equal? (= m1 m2)
   :same-bytes? (= (seq (nippy/freeze m1))
                   (seq (nippy/freeze m2)))})
;; {:equal? true
;;  :same-bytes? false}

Does datalevin do something special here?

Huahai 2025-06-10T23:37:20.133619Z

No

phronmophobic 2025-06-10T23:37:57.530679Z

ok, so maps can't be used reliably as lookup keys.
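
A minimal sketch of why equal maps can end up as different bytes, using only clojure.core. It assumes the serializer writes map entries in iteration order, which is consistent with the nippy result above but is an assumption here, not a documented guarantee. `canonical-entries` is a hypothetical helper, not part of nippy or Datalevin.

```clojure
;; Two equal maps built in different insertion orders.
(def m1 (-> {}
            (assoc :bar :foo)
            (assoc :foo :bar)))
(def m2 {:foo :bar
         :bar :foo})

;; Map equality ignores entry order.
(def equal? (= m1 m2))

;; Iteration order does not: small maps keep insertion order, so an
;; entry-by-entry serializer sees the entries in a different sequence.
(def same-order? (= (seq m1) (seq m2)))

;; Hypothetical helper: sort entries by key so equal maps iterate identically.
;; Only works when all keys are mutually comparable (e.g. all keywords).
(defn canonical-entries [m]
  (sort-by key m))
```

Here `equal?` is true while `same-order?` is false, and `(= (canonical-entries m1) (canonical-entries m2))` is true. Sorting the entries is one way to get a stable order before serializing, but it does nothing about byte changes between serializer versions.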

phronmophobic 2025-06-10T23:40:17.041229Z

https://github.com/taoensso/nippy/tree/master?tab=readme-ov-file#stability-of-byte-output
> It has never been an objective of Nippy to offer predictable byte output, and I'd generally recommend against depending on specific byte output.
It seems like comparing bytes for keys might be a footgun.

Huahai 2025-06-10T23:41:49.459599Z

Why would you want to use maps for keys in a database?

phronmophobic 2025-06-10T23:43:00.547079Z

My plan was to use datalevin's key value db as a cache for downloading files where the key is something like:

{:git/sha (:git/sha repo)
 :repo name
 :owner owner
 :file fname}

Huahai 2025-06-10T23:43:20.482389Z

Use a tuple

Huahai 2025-06-10T23:44:26.758359Z

You don’t want to store those keys, it’s wasteful, isn’t it?

phronmophobic 2025-06-10T23:45:20.496539Z

Using a tuple works fine in this case.
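
A rough sketch of turning the cache-key map into a fixed-order vector, in plain Clojure. `key-fields` and `cache-key` are hypothetical names, and the values in the usage line are placeholders.

```clojure
;; The field order is fixed here, so it no longer depends on how the map was built.
(def key-fields [:owner :repo :git/sha :file])

(defn cache-key
  "Hypothetical helper: turns a cache-key map into a fixed-order vector.
  Throws if a field is missing, so a partial key never reaches the store."
  [m]
  (mapv (fn [k]
          (if (contains? m k)
            (get m k)
            (throw (ex-info "missing key field" {:field k :key m}))))
        key-fields))

;; Usage with placeholder values:
(def example-key
  (cache-key {:git/sha "abc123"
              :repo    "dtlvnative"
              :owner   "juji-io"
              :file    "README.md"}))
;; example-key is ["juji-io" "dtlvnative" "abc123" "README.md"]
```

This only fixes the element order. Whether the vector is then stored as a Datalevin tuple or as a nippy-serialized blob depends on the key type given when writing, which is what the `transact-kv` and `put-buffer` docstrings cover.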

phronmophobic 2025-06-10T23:45:38.694869Z

The extra disk space required for the key is negligible.

phronmophobic 2025-06-10T23:47:02.342009Z

Although the fact that the key value db relies on byte equality makes me wonder if there are subtle bugs with using other types of keys across versions.

phronmophobic 2025-06-10T23:48:54.618239Z

and potentially even when the key is read and written using the same version.

Huahai 2025-06-10T23:48:58.728829Z

All key value stores do this.

phronmophobic 2025-06-10T23:49:55.741029Z

presumably, some key value stores use a serialization mechanism that guarantees predictable byte output if they rely on byte equality.

Huahai 2025-06-10T23:52:10.603249Z

It is predictable. As long as you use the same version of nippy. Migrating across major DB versions needs data migration anyway.

Huahai 2025-06-10T23:52:39.710109Z

All a DB can do is to automate it

Huahai 2025-06-10T23:53:45.038359Z

So MySQL automates migration, whereas Postgres doesn’t. Auto migration is on our roadmap.

Huahai 2025-06-10T23:57:09.341699Z

Don’t know what you are talking about, “unpredictable byte layout” is just nonsense.

phronmophobic 2025-06-10T23:57:48.639869Z

> It has never been an objective of Nippy to offer predictable byte output, and I'd generally recommend against depending on specific byte output.
I believe that nippy probably does have predictable byte output for some datatypes, but it's not clear which ones (maps do not). There may be other types that also do not generally have predictable byte output.
> As long as you use the same version of nippy
This is important to know. Basically, it means that if you're using datalevin, you should also pin your nippy dep or some transitive dependency might break your db.
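
A sketch of what pinning could look like, assuming a tools.deps project with a `deps.edn`. The `X.Y.Z` version is a placeholder, not a recommendation: it should be whatever nippy version the Datalevin release in use actually resolves to, which `clojure -Stree` will show.

```clojure
;; deps.edn (sketch)
{:deps {datalevin/datalevin {:mvn/version "0.9.22"}
        ;; A top-level entry takes precedence over transitive versions,
        ;; so another library cannot pull in a different nippy.
        ;; Placeholder: replace X.Y.Z with the version `clojure -Stree` reports.
        com.taoensso/nippy  {:mvn/version "X.Y.Z"}}}
```

The pin then has to be updated deliberately whenever Datalevin is upgraded.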

Huahai 2025-06-11T00:00:28.061609Z

Isn’t it the same for all libraries?

phronmophobic 2025-06-11T00:01:25.116299Z

I don't have anything against datalevin. It's my favorite embedded db. I'm just trying to understand what the guarantees are.

Huahai 2025-06-11T00:01:58.913779Z

The thing is, if you use Datalevin, you should prefer to use its data types, rather than just throwing random things in.

phronmophobic 2025-06-11T00:02:41.833219Z

> Isn’t it the same for all libraries?
I don't think there's a comparable example for how a transitive dep could break a sqlite database.

Huahai 2025-06-11T00:03:34.354919Z

That’s because you cannot store a Clojure map in SQLite

phronmophobic 2025-06-11T00:04:02.113269Z

What datatypes do you suggest? How am I supposed to know which datatypes guarantee predictable byte output?

Huahai 2025-06-11T00:04:26.982269Z

I said tuple

Huahai 2025-06-11T00:05:49.648989Z

Supporting arbitrary data is on the roadmap, but for that, you will need to implement protocols. There is no free lunch

phronmophobic 2025-06-11T00:06:45.448449Z

Is there a difference between a tuple and a clojure vector?

Huahai 2025-06-11T00:07:17.303659Z

Of course

phronmophobic 2025-06-11T00:08:08.210889Z

I see https://cljdoc.org/d/datalevin/datalevin/0.9.22/api/datalevin.built-ins?q=tuple#tuple, but I can't find any other reference to tuple in the guide part of the docs.

Huahai 2025-06-11T00:08:43.696539Z

Tuple is a Datalevin data type. Whereas a Clojure vector is just a blob that will be serialized with nippy

Huahai 2025-06-11T00:09:21.922629Z

You are looking at the wrong thing

Huahai 2025-06-11T00:10:46.155979Z

The built-ins namespace is for Datalog built-in query functions and predicates

phronmophobic 2025-06-11T00:11:24.863279Z

I just searched the docs for "tuple" and that's the only thing that came up besides stuff in the changelog

Huahai 2025-06-11T00:11:38.786859Z

datalevin.core is where most of the info about the public API is

phronmophobic 2025-06-11T00:13:24.268739Z

Do you have a link to any references or docs that can help explain how to use tuples?

phronmophobic 2025-06-11T00:13:38.569179Z

Looking at https://cljdoc.org/d/datalevin/datalevin/0.9.22/api/datalevin.core. I'm not sure where to start.

Huahai 2025-06-11T00:15:54.583039Z

You can start by ctrl-f searching for “tuple” on that page.

Huahai 2025-06-11T00:17:46.588749Z

put-buffer

Huahai 2025-06-11T00:18:29.554289Z

transact-kv

Huahai 2025-06-11T00:19:29.111929Z

The docstrings of these should tell you everything.

phronmophobic 2025-06-11T00:21:53.887679Z

what is a buffer?

phronmophobic 2025-06-11T00:22:11.724499Z

bytebuffer?

Huahai 2025-06-11T00:23:21.878919Z

Yes