datalevin

fs42 2024-08-14T17:37:13.070769Z

I've been playing around with the different key types and value types that you can :put in a DBI. It's kind of confusing if you start mixing data types, like :long and :string, in the same DBI, as you have to know the types before getting the values. For example, [:all] fails because it doesn't know how to convert the different keys and values without being told explicitly. Is it fair to say that you really shouldn't mix types in a DBI? Should you use :data, i.e. EDN, as the default and only consider an explicit type as an optimization?

Huahai 2024-08-15T05:30:00.229399Z

Mixing types is useful, as the number of DBIs is limited. For example, the meta DBI of the Datalog store mixes types.

Huahai 2024-08-15T05:30:50.023969Z

You really need to know what you are doing. For example, if you are mixing types, it doesn't make sense to use [:all].

Huahai 2024-08-15T05:32:34.610209Z

I wouldn't use :data as the default, as one of the points of using a KV store is to do range queries. :data really doesn't support that, except in a few limited cases.

Huahai 2024-08-15T05:34:30.650289Z

The point of adding headers to data in our implementation is to support mixed-type DBIs, so that the types do not interfere with each other.
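To make the header idea concrete, here is a minimal Python sketch. It is not Datalevin's actual encoding, which the thread does not show. A sorted byte-keyed list stands in for an LMDB-backed DBI, and each key gets a one-byte type header. The header values `LONG_HEADER` and `STRING_HEADER`, the `SortedKV` class, and the helper functions are all hypothetical. Every key of one type shares the same leading byte, so a range scan bounded by that header stays within that type, even though longs and strings live side by side.

```python
import bisect

# Hypothetical one-byte type headers; Datalevin's real header values are not shown here.
LONG_HEADER = 0x01
STRING_HEADER = 0x02


def encode_long(n: int) -> bytes:
    # Header, then 8 big-endian bytes with the sign bit flipped so byte order == numeric order.
    return bytes([LONG_HEADER]) + (n + 2**63).to_bytes(8, "big")


def encode_string(s: str) -> bytes:
    # Header, then raw UTF-8 bytes (byte order matches code-point order).
    return bytes([STRING_HEADER]) + s.encode("utf-8")


def decode(key: bytes):
    # The header tells us how to interpret the rest of the key.
    header, body = key[0], key[1:]
    if header == LONG_HEADER:
        return int.from_bytes(body, "big") - 2**63
    if header == STRING_HEADER:
        return body.decode("utf-8")
    raise ValueError(f"unknown header {header:#x}")


class SortedKV:
    """Toy stand-in for a DBI: keys kept in byte order."""

    def __init__(self):
        self.keys = []  # sorted list of encoded keys
        self.vals = {}  # encoded key -> value

    def put(self, key: bytes, val) -> None:
        if key not in self.vals:
            bisect.insort(self.keys, key)
        self.vals[key] = val

    def range_of_type(self, header: int):
        # Keys starting with `header` sort between bytes([header]) and bytes([header + 1]).
        lo = bisect.bisect_left(self.keys, bytes([header]))
        hi = bisect.bisect_left(self.keys, bytes([header + 1]))
        return [(decode(k), self.vals[k]) for k in self.keys[lo:hi]]


kv = SortedKV()
kv.put(encode_long(42), "x")
kv.put(encode_string("bob"), "y")
kv.put(encode_long(-7), "z")
kv.put(encode_string("alice"), "w")

print(kv.range_of_type(LONG_HEADER))    # [(-7, 'z'), (42, 'x')]
print(kv.range_of_type(STRING_HEADER))  # [('alice', 'w'), ('bob', 'y')]
```

The header only keeps the types apart. Comparing a long with a string across headers is still meaningless, so in this model a scan over the whole store would interleave values that cannot be compared.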

Huahai 2024-08-15T05:35:26.109679Z

So, the intent of my design is to support mixed-type DBIs.

Huahai 2024-08-15T05:55:36.557129Z

Note that the mixing is intended for data types only, not for semantics. For example, you don't want to have two types of :string data stored in one DBI, unless you have a way to distinguish them (e.g. by adding your own headers to your strings).

Huahai 2024-08-15T05:56:52.265919Z

I call Datalevin a "simple" database for a reason 🙂.

fs42 2024-08-15T05:59:53.325539Z

Ok - thanks for the explanations! Guess I have to play a little more with those types, as I am still a little confused about how to apply them and for what use cases. When you mention “… adding your own headers to your strings…” - can you elaborate a little more on how you would do that?

Huahai 2024-08-15T06:01:38.998699Z

Well, just put some characters in front as a header. For example, type A strings all start with "A|", something like that.

Huahai 2024-08-15T06:03:39.767019Z

I wouldn't recommend doing that, though. I would use the tuple data type for these, e.g. use a [:keyword :string] tuple as the key.

fs42 2024-08-15T06:04:37.107099Z

My initial idea was that you could use different key types to distinguish different indexes, like having one data structure that you could index on both the date and the name, for example. With different data types for the date (instant) and the name (string), you could do range queries on just the date index or just the name index within the same DBI. Is that a valid use case?

Huahai 2024-08-15T06:04:51.961099Z

So for type A strings, a key would be [:a "a string"]; for type B strings, a key would be [:b "a string"].
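Python tuples compare element by element, which makes them a convenient stand-in for the ordering that a tuple key like [:a "a string"] is meant to give. The sketch below only models that ordering. It is not Datalevin's tuple encoding, and the namespace strings "a"/"b" and the `range_in_namespace` helper are hypothetical. All keys with the same first element sort next to each other. A range scan can therefore seek to the first such key and stop as soon as the first element changes.

```python
import bisect


def range_in_namespace(keys, ns):
    """Seek to the first key whose first element is `ns`, then scan until it changes.

    `keys` must be sorted; the 1-tuple (ns,) sorts before every (ns, s) tuple.
    """
    i = bisect.bisect_left(keys, (ns,))
    out = []
    while i < len(keys) and keys[i][0] == ns:
        out.append(keys[i][1])
        i += 1
    return out


# Two "types" of strings sharing one sorted keyspace, told apart by the first tuple element.
keys = sorted([("b", "apple"), ("a", "pear"), ("a", "fig"), ("b", "kiwi"), ("a", "apple")])

print(range_in_namespace(keys, "a"))  # ['apple', 'fig', 'pear']
print(range_in_namespace(keys, "b"))  # ['apple', 'kiwi']
```

The same string "apple" appears under both namespaces without colliding, which is the job the first tuple element does in place of a hand-made "A|" prefix.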

Huahai 2024-08-15T06:06:01.067979Z

What you describe is indeed the intent of the design.

Huahai 2024-08-15T06:07:06.336489Z

In that design, it doesn't make sense to say [:all] though.

fs42 2024-08-15T06:08:21.313879Z

Yes - I understand the issue with [:all], as you cannot really compare different base types like longs and strings.

Huahai 2024-08-15T06:09:03.348569Z

Right, they can live in the same DBI, but you can only access them one type at a time.

fs42 2024-08-15T06:10:14.686149Z

When you define a key as [:a “a string”], it’s not a primitive/base type anymore… how is that different from edn then?

Huahai 2024-08-15T06:12:24.250989Z

Very different. :data is subject to nippy's encoding, which is not under our control and is not built for range queries. Nippy adds metadata for its own purposes, so the resulting ordering will not be what you expect.

Huahai 2024-08-15T06:13:09.244739Z

The tuple data type is native to DL and is built for range queries. The order coming out is what you would expect.

Huahai 2024-08-15T06:15:40.953949Z

:data only works out for some limited use cases, but that's by accident; you shouldn't expect it to work in general.

Huahai 2024-08-15T06:17:20.045009Z

In general, if you only expect to do point queries, it's fine to use :data; otherwise, you need to know your data types.

fs42 2024-08-15T06:18:46.718299Z

Ok… I have to do some more testing then, because so far :data seems to give me the expected sorted ordering of both strings and integers, and it seemed to work correctly for get-range with the different :open, :greater-than, etc. ranges - just to make sure: when I do not specify any data type, :data is the default, right?

Huahai 2024-08-15T06:19:58.442949Z

It may work out OK for integers, but I don't think it will for strings: nippy adds the length, so the length of the strings will affect the order, which would be unexpected.
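Here is a minimal sketch of why a length prefix breaks string ordering. `length_prefixed` is a hypothetical stand-in that writes a 4-byte big-endian length before the UTF-8 bytes. It is not nippy's exact format, only the general shape of a length-prefixed encoding. Comparing the encoded bytes compares lengths first, so a shorter string sorts before a longer one regardless of its contents.

```python
def length_prefixed(s: str) -> bytes:
    # Hypothetical length-prefixed encoding: 4-byte big-endian length, then UTF-8 bytes.
    b = s.encode("utf-8")
    return len(b).to_bytes(4, "big") + b


def raw(s: str) -> bytes:
    # Plain UTF-8 bytes: byte order matches the expected string order.
    return s.encode("utf-8")


words = ["banana", "b", "cherry", "apple"]

print(sorted(words, key=raw))              # ['apple', 'b', 'banana', 'cherry']
print(sorted(words, key=length_prefixed))  # ['b', 'apple', 'banana', 'cherry']
```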

Huahai 2024-08-15T06:20:25.689159Z

Right, :data is the default.

Huahai 2024-08-15T06:23:02.604299Z

See https://github.com/juji-io/datalevin/issues/15

Huahai 2024-08-15T06:24:23.426499Z

So it doesn't even work for integers if your integer values vary widely.
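A similar sketch for integers assumes an encoding that picks the smallest signed width that fits and marks it with a type byte, as some serializers do. The tags, widths, and the `variable_width` helper are hypothetical and are not taken from nippy or from issue #15. Once the values span different signs and widths, byte order stops matching numeric order. `order_preserving` shows a common fix: a fixed 8-byte big-endian encoding with the sign bit flipped.

```python
# Hypothetical type tags for 1-, 2-, 4- and 8-byte signed integers (dicts keep insertion order).
WIDTH_TAGS = {1: 0x01, 2: 0x02, 4: 0x03, 8: 0x04}


def variable_width(n: int) -> bytes:
    # Pick the smallest signed width that fits, then write tag + two's-complement bytes.
    for width, tag in WIDTH_TAGS.items():
        if -(2 ** (8 * width - 1)) <= n < 2 ** (8 * width - 1):
            return bytes([tag]) + n.to_bytes(width, "big", signed=True)
    raise OverflowError(n)


def order_preserving(n: int) -> bytes:
    # Fixed 8 bytes, sign bit flipped: byte order equals numeric order for 64-bit values.
    return (n + 2**63).to_bytes(8, "big")


nums = [1000, -1, 0, -1000, 1]

print(sorted(nums, key=variable_width))    # [0, 1, -1, 1000, -1000]
print(sorted(nums, key=order_preserving))  # [-1000, -1, 0, 1, 1000]
```

Fixed-width, sign-flipped encodings like `order_preserving` are a standard way to make byte-wise key comparison agree with numeric comparison.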

fs42 2024-08-15T06:33:07.867419Z

Ok - maybe my simple tests didn’t hit those limitations - let me play a little more. The other thing I just learned tonight was that the documentation about the data types in put-buffer includes this paragraph: “…x-type can be one of following scalar data types, a vector of these scalars to indicate a heterogeneous tuple data type, or a vector of a single scalar to indicate a homogeneous tuple data type”. I had completely missed that before; I guess I glanced over it, assuming that only scalar data types were allowed. My bad. Thanks for sharing some insight - I’ll report back my findings! Good night!

Huahai 2024-08-15T06:34:07.750429Z

good night