#datalevin
2022-08-25
currentoor16:08:15

Datalevin looks like a good fit for an idea I had, but I need to store unsigned 256-bit integers. What would it take to add bigint support? I need to be able to compare these integers inside queries. Also, in Datomic you can invoke arbitrary methods inside d/q; is that possible in Datalevin? I didn’t see it in the docs or the limitations section. If arbitrary methods are allowed, I could just deserialize the big int as EDN and do my comparison operations on it, but that would have extra overhead.

Huahai16:08:23

What’s needed to store bigint is a way to serialize it into bytes such that the byte ordering is the same as the bigint ordering.

Huahai16:08:45

You can just call arbitrary functions in queries, no different from Datascript

👍 1
currentoor16:08:06

BigInteger has a toByteArray instance method, would that work?

currentoor16:08:32

the constructor can also take in a byte array, for the reverse situation https://docs.oracle.com/javase/8/docs/api/java/math/BigInteger.html#BigInteger-byte:A-

currentoor16:08:31

ah but this byte array will be in two’s-complement representation…

currentoor16:08:03

But that won’t be an issue for positive ints; I’m not sure how negative numbers should be handled
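
For the unsigned 256-bit case from the original question, a fixed-width big-endian encoding sidesteps the sign problem: left-pad every value to 32 bytes, and raw unsigned byte comparison then matches numeric order. A minimal sketch (class and method names are mine, not Datalevin’s):

```java
import java.math.BigInteger;

public class U256Codec {
    // Encode a non-negative BigInteger (< 2^256) as 32 big-endian bytes.
    static byte[] encode(BigInteger v) {
        byte[] raw = v.toByteArray();      // two's complement, big-endian
        int start = raw[0] == 0 ? 1 : 0;   // drop a leading zero sign byte
        int len = raw.length - start;
        byte[] out = new byte[32];
        System.arraycopy(raw, start, out, 32 - len, len);
        return out;
    }

    // Unsigned lexicographic comparison, like memcmp in C.
    static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < 32; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return 0;
    }

    public static void main(String[] args) {
        BigInteger a = BigInteger.ONE.shiftLeft(255); // high bit set
        BigInteger b = BigInteger.valueOf(1000);
        // byte order agrees with numeric order
        System.out.println(Integer.signum(compareBytes(encode(b), encode(a))) ==
                           Integer.signum(b.compareTo(a))); // prints true
    }
}
```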

roklenarcic07:08:28

If you wanna be lazy about it, you can print the bigint to a string and then parse that

roklenarcic07:08:44

Not optimal in terms of storage

tony.kay23:08:10

Don’t the internals of the engine need a comparison operation for putting things into the index? BigInteger already has a compareTo method, so it seems like toByteArray gets you the bytes to store, and the internals could pull the value to be compared, make it a BigInteger, and just use compareTo. That gets the AVE index in the right order, which is the primary concern, isn’t it?
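
That decode-then-compare idea is only a few lines: rebuild each BigInteger from its stored bytes (assumed here to be a positive magnitude) and delegate to compareTo. The trade-off is an object allocation per comparison, which is exactly what native byte comparison avoids. A sketch (names are mine):

```java
import java.math.BigInteger;

public class DecodeCompare {
    // Rebuild BigIntegers from stored big-endian magnitude bytes and
    // compare via compareTo (signum 1 = treat bytes as non-negative).
    static int compare(byte[] a, byte[] b) {
        return new BigInteger(1, a).compareTo(new BigInteger(1, b));
    }
}
```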

Huahai23:08:01

To have good performance, we would like to use native bytes comparison. It’s not a big problem to encode BigInt correctly; we are doing the same for long, integer, double and float already. It’s trickier for BigDecimal. I am thinking of encoding it as a double and using that as a prefix, so the range scan would be roughly in the correct order, and disambiguating with the exact representation of the big decimal.
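
The standard trick for making IEEE-754 doubles byte-comparable — presumably similar in spirit to the long/double/float encoding mentioned above, though this sketch is mine — is to flip the sign bit of non-negative values and all bits of negative ones:

```java
public class DoubleCodec {
    // After this transform, Long.compareUnsigned on the results (or an
    // unsigned byte-wise comparison of their big-endian bytes) agrees
    // with the numeric order of the original doubles.
    static long encode(double d) {
        long bits = Double.doubleToLongBits(d);
        // non-negative: set the top bit, so positives sort above negatives;
        // negative: flip all bits, so more-negative values sort lower
        return bits >= 0 ? bits ^ Long.MIN_VALUE : ~bits;
    }
}
```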

Huahai23:08:38

so big decimal will be represented as 3 parts: a 64-bit double prefix, a 32-bit integer scale, followed by the big integer part
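
That 3-part layout might look roughly like this (an illustrative sketch, not Datalevin’s actual code; the double prefix uses the usual bit-flipping transform so its bytes sort numerically):

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;

public class BigDecimalKey {
    // 8-byte order-preserving double prefix, 4-byte scale, then the
    // unscaled BigInteger bytes to disambiguate values whose double
    // approximations collide.
    static byte[] encode(BigDecimal v) {
        byte[] unscaled = v.unscaledValue().toByteArray();
        return ByteBuffer.allocate(12 + unscaled.length)
                .putLong(orderPreserving(v.doubleValue()))
                .putInt(v.scale())
                .put(unscaled)
                .array();
    }

    // Flip bits so unsigned byte order matches numeric double order.
    static long orderPreserving(double d) {
        long bits = Double.doubleToLongBits(d);
        return bits >= 0 ? bits ^ Long.MIN_VALUE : ~bits;
    }
}
```

The prefix alone gives roughly the right range-scan order; the exact scale and unscaled parts let values with colliding prefixes be told apart.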

tony.kay23:08:32

oh, I thought we were just talking BigInteger

Huahai23:08:27

once we have bigint, it’s natural to have big decimal, the goal is to have feature parity with datomic

Huahai23:08:48

after this is done, the only data type missing is tuple

Huahai23:08:06

the next release will have bigint and bigdecimal. I will get to it after I have some time

tony.kay23:08:34

nice. Yeah, I see the problem with BigDecimal.

tony.kay23:08:51

Seems like you could just use the same approach as I described for bigint, but also store the scale, etc:

BigDecimal(BigInteger intVal, long val, int scale, int prec) 
then you have compareTo again.

tony.kay23:08:27

anyway, good luck with it. It’s a cool project.

Huahai23:08:38

sure, but I don’t assume I always have a Java method to access the bytes, so I am encoding things into the right bytes such that a native C function can compare them very quickly.

Huahai23:08:11

e.g. our comparator for the GraalVM version is a dead simple C function, only a few lines.

tony.kay23:08:33

Ah, right, forgot you’re using native code

Huahai00:08:23

Yeah, comparator performance is important for a btree-based DB

Huahai00:08:25

Because bytes comparison is pretty much most of the real work; the rest is just pointer chasing.

roklenarcic16:08:02

These range compares only apply when using the key-value store mode, right? Not the datalog store?

Huahai19:08:31

Datalog is a layer on top of the key value store. So they apply equally.

roklenarcic19:08:21

Oh I wasn’t aware. I thought that each database was either datalog type or key-value type.

roklenarcic20:08:48

Ah I see, you have kv databases like "datalevin/ave" and such. So if I have a datalog attribute :price, how do I find all the entities that have it between 1000 and 10000 with a kv range query?

Huahai22:08:04

Attributes are stored as int ids; say :price is 5, so the range query is something like [:closed 5-1000-0 5-10000-maxe], where - is concatenation. Of course, the integers are encoded such that the byte order is the same as the value order
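
A hypothetical sketch of such a composite key (my own illustration of the idea, not Datalevin’s actual layout): concatenate the attribute id, an order-preserving value encoding, and the entity id, all big-endian, so plain byte order sorts keys by attribute first, then value, then entity:

```java
import java.nio.ByteBuffer;

public class AveKey {
    // attrId (4 bytes) | value (8 bytes, sign bit flipped so signed
    // longs sort correctly as unsigned bytes) | entityId (8 bytes)
    static byte[] key(int attrId, long value, long entityId) {
        return ByteBuffer.allocate(20)
                .putInt(attrId)
                .putLong(value ^ Long.MIN_VALUE)
                .putLong(entityId)
                .array();
    }

    // Range scan bounds for "attribute 5 between 1000 and 10000":
    //   low  = key(5, 1000, 0)
    //   high = key(5, 10000, Long.MAX_VALUE)
}
```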

Huahai22:08:38

The range query returns datoms

Huahai22:08:05

The above is the current implementation. The next datalog store and query engine will be more sophisticated, so entity ids will most likely come from a bitmap.

roklenarcic10:08:30

Thank you for the explanation