I've been playing around with the different key-type and value-type that you can :put in a dbi. It's kind of confusing if you start mixing datatypes, like :long and :string, in the same dbi as you have to know the types before getting the value. For example [:all] fails as it doesn't know how to convert the different key-value without explicitly being told. Is it fair to say that you really shouldn't mix types in a dbi? Should you use :data, i.e. edn, as the default and only consider an explicit type for optimizations maybe?
Mixing types is useful, as the number of DBIs is limited. For example, the meta DBI of the Datalog store is mixed.
You really need to know what you are doing. For example, if you are mixing types, it doesn't make sense to use [:all].
I wouldn't use :data as the default, as one of the points of using a KV store is to do range queries. :data really doesn't support that, except in a few limited cases.
The point of adding headers to data in our implementation is to support mixed-type DBIs, so that the types do not interfere with each other.
So, the intent of my design is to support mixed-data DBIs.
Note, the mixing is intended for data types only, not for semantics. For example, you don't want to have two kinds of :string data stored in one DBI, unless you have a way to distinguish them (e.g. adding your own headers to your strings).
I call Datalevin a "simple" database for a reason 🙂.
Ok - thanks for the explanations! Guess I have to play a little more with those types as I am still a little confused how to apply them and for what use cases. When you mention “… adding your own headers to your strings…” - can you elaborate a little more how you would do that?
well, just put some characters in front as a header, for example, type A strings all start with "A|", something like that.
I wouldn't recommend doing that, though. I would use the tuple data type for these: e.g. use a [:keyword :string] tuple as the key
My initial idea was that you could use different types for keys to distinguish different indexes. Like having one data structure that you could index on the date and on the name for example. With those different data types for date (instant) and name (string), you could query for ranges specifically for only the one indexed on date or name in the same dbi. Is that a valid use case?
So for type A strings, a key would be [:a "a string"]; for type B strings, a key would be [:b "a string"]
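For anyone following along, here is a minimal sketch of that tuple-key idea. It assumes a Datalevin version with tuple types in the KV API and `datalevin.core` on the classpath; the path, the DBI name, and the sample strings are made up for illustration.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical path and DBI name, for illustration only
(def db (d/open-kv "/tmp/datalevin/tuple-key-demo"))
(def dbi "tagged-strings")
(d/open-dbi db dbi)

;; Heterogeneous tuple key type: a keyword tag followed by a string
(def k-type [:keyword :string])

(d/transact-kv
  db
  [[:put dbi [:a "apple"]   "a type A value"       k-type :string]
   [:put dbi [:a "avocado"] "another type A value" k-type :string]
   [:put dbi [:b "apple"]   "a type B value"       k-type :string]])

;; Point lookup: pass the same key type that was used on write
(d/get-value db dbi [:a "apple"] k-type :string)

;; Range query restricted to type A strings between "a" and "b"
(d/get-range db dbi [:closed [:a "a"] [:a "b"]] k-type :string)

(d/close-kv db)
```

The keyword tag plays the role of the hand-written "A|" header, but the comparison is left to the tuple encoding rather than to string concatenation.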
What you describe is indeed the intent of the design.
In that design, it doesn't make sense to say [:all] though.
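A rough sketch of the date-and-name case, assuming `datalevin.core` is on the classpath. The path, DBI name, and record ids are hypothetical; each key type is written and read with its type stated explicitly, and [:all] is avoided.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical path and DBI name, for illustration only
(def db (d/open-kv "/tmp/datalevin/two-index-demo"))
(def dbi "person-index")
(d/open-dbi db dbi)

;; Two "indexes" in one DBI: :instant keys and :string keys,
;; both pointing at a made-up :long record id
(d/transact-kv
  db
  [[:put dbi #inst "1985-11-23" 2 :instant :long]
   [:put dbi #inst "1990-05-01" 1 :instant :long]
   [:put dbi "alice"            1 :string  :long]
   [:put dbi "bob"              2 :string  :long]])

;; Range over the date index only
(d/get-range db dbi
             [:closed #inst "1980-01-01" #inst "1989-12-31"]
             :instant :long)

;; Range over the name index only
(d/get-range db dbi [:closed "a" "b"] :string :long)

(d/close-kv db)
```

Both ranges are bounded on both sides and name one key type, so each query stays within keys of that type.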
Yes - I understand the issue with [:all], as you cannot really compare different base types like longs and strings.
right, they can live in the same DBI, but when accessing them, you can only access them one at a time
When you define a key as [:a "a string"], it's not a primitive/base type anymore… how is that different from edn then?
very different. :data is subject to nippy's encoding, which is not under our control and is not built for range queries. It adds metadata for its own purposes, so the resulting order will not be what you expect.
tuple data type is native to DL, it is built to do range queries. The order coming out is what you would expect.
:data only works out for some limited use cases, and that's by accident; you shouldn't expect it to work in general.
In general, if you only expect to do point queries, it's fine to use :data; otherwise, you need to know your data types.
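To make the distinction concrete, a small sketch assuming `datalevin.core` is on the classpath; the path and the two DBI names are hypothetical. The first DBI relies on the default :data type for point lookups, the second declares :long keys so that a range query is meaningful.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical path and DBI names, for illustration only
(def db (d/open-kv "/tmp/datalevin/point-vs-range-demo"))
(def settings "settings")   ; point lookups only, default :data
(def counters "counters")   ; range queries, explicit :long keys
(d/open-dbi db settings)
(d/open-dbi db counters)

(d/transact-kv
  db
  [;; no types given, so key and value are both stored as :data
   [:put settings :theme {:name "dark" :contrast :high}]
   ;; explicit key and value types
   [:put counters 10 "ten"    :long :string]
   [:put counters 20 "twenty" :long :string]
   [:put counters 30 "thirty" :long :string]])

;; Point query on :data
(d/get-value db settings :theme)

;; Range query on typed keys
(d/get-range db counters [:greater-than 10] :long :string)

(d/close-kv db)
```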
Ok… I have to do some more testing then, because so far :data seems to give me the expected sorted ordering of both strings and integers and seemed to work correctly for the get-range with the different :open, :greater-than, etc. - just to make sure: when I do not specify any datatype, then :data is the default, right?
Maybe works out ok for integers, I don't think it will work out for strings, because nippy adds length, so the length of the strings will matter, which would be unexpected
right, :data is default
So it doesn't even work for integers if your integer values vary widely.
Ok - maybe my simple tests didn’t hit on those limitations - let me play a little more. The other thing I just learned tonight was that the documentation about the datatypes at put-buffer, includes the paragraph: — …x-type can be one of following scalar data types, a vector of these scalars to indicate a heterogeneous tuple data type, or a vector of a single scalar to indicate a homogeneous tuple data type — Which I previously completely missed… guess I kind of glanced over it assuming that only scalar datatypes were allowed. My bad. Thanks for sharing some insight - I’ll report back my findings! Good night!
good night