does xtdb support the generation of a monotonically-increasing number to assign to records?
the primary index is a hash index, so we don't use that for range predicates, only for lookup when you have a primary key in hand
for range predicates, we keep page-level metadata about what data is in each page - min/max values and a bloom filter - so these would be able to filter out (hopefully a decent number of) pages in a range query
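For anyone following along, here's a rough Python model of that page-level pruning. It is not XTDB's actual page format: the `Page`/`BloomFilter` classes, the SHA-256-derived bit positions, and the 256-bit/3-hash filter size are all assumptions for illustration. Min/max lets a range query skip pages whose value range can't overlap, and the Bloom filter adds a cheap "definitely not here" check for equality lookups.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: each value sets k bit positions in an m-bit integer."""

    def __init__(self, m_bits=256, k=3):
        self.m, self.k, self.bits = m_bits, k, 0

    def _positions(self, value):
        # Derive k positions from disjoint 4-byte slices of a single SHA-256 digest.
        digest = hashlib.sha256(repr(value).encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m for i in range(self.k)]

    def add(self, value):
        for p in self._positions(value):
            self.bits |= 1 << p

    def might_contain(self, value):
        # False means "definitely absent"; True means "maybe present".
        return all((self.bits >> p) & 1 for p in self._positions(value))


class Page:
    """One page of a single column plus its metadata: min/max (a zone map) and a Bloom filter."""

    def __init__(self, values):
        self.values = list(values)
        self.min_v, self.max_v = min(self.values), max(self.values)
        self.bloom = BloomFilter()
        for v in self.values:
            self.bloom.add(v)


def pages_for_range(pages, lo, hi):
    """Keep only pages whose [min, max] overlaps [lo, hi]; the rest are never read."""
    return [p for p in pages if p.max_v >= lo and p.min_v <= hi]


def pages_for_equality(pages, v):
    """Zone map check first, then the Bloom filter; survivors may still be false positives."""
    return [p for p in pages if p.min_v <= v <= p.max_v and p.bloom.might_contain(v)]


pages = [Page([1, 5, 9]), Page([20, 25, 30]), Page([3, 40, 41])]
candidates = pages_for_range(pages, 10, 22)  # page 0 (max 9) is skipped
```

Note the third page: its values are spread out, so its [3, 41] range overlaps almost any query. That's why pruning only helps "hopefully a decent number of" pages when the data isn't sorted by the filtered column.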
OK, that's interesting. Thank you for the detail! Very helpful and happy to understand more :) Offhand I believe CockroachDB also does the same in terms of keeping ranges (but I think without the bloom filters, if I'm not mistaken). I assume you also do splitting and compaction as ranges expand or contract. Re bloom filters: have you looked at cuckoo filters?
so these aren't range indices (i.e. the pages aren't sorted by the range var, because they're already partitioned/sorted by primary key); they're literally just two values per page that say the min and max values within that page
we don't currently have secondary indices (although we're looking to add them); those would be sorted by the range key
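To illustrate what "sorted by the range key" would buy, here's a minimal sketch of a generic sorted secondary index, not anything XTDB has today. The row shape, `build_secondary_index` and `range_lookup` are hypothetical names. Because the entries are sorted by the secondary key, a range predicate becomes two binary searches plus a contiguous slice, instead of checking every page.

```python
import bisect


def build_secondary_index(rows, key):
    """Return (sorted_keys, ids): entries sorted by the secondary key, each pointing back to a primary id."""
    entries = sorted((row[key], row["id"]) for row in rows)
    return [k for k, _ in entries], [i for _, i in entries]


def range_lookup(index, lo, hi):
    """Ids of rows with lo <= key <= hi, found by binary search rather than a scan."""
    keys, ids = index
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    return ids[start:end]


rows = [
    {"id": "a", "age": 30},
    {"id": "b", "age": 25},
    {"id": "c", "age": 41},
    {"id": "d", "age": 35},
]
age_index = build_secondary_index(rows, "age")
```

Here `range_lookup(age_index, 28, 36)` returns the primary ids `["a", "d"]`, which would then be resolved through the primary (hash) index.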
I've not really looked at cuckoo filters beyond skimming their wikipedia page a while back 🙂
> they're literally just two values that say the min and max values within that page
specifically, "zone maps" is the concept to dig into, e.g. https://vldb.org/pvldb/vol10/p1622-ziauddin.pdf
thank you both for the details and clarification!
if you're using XT2, one of the best IDs you can use from an indexing point of view is a random (v4) or otherwise well-distributed UUID (contrary to usual database wisdom), so it's fine to generate these in the application and pass them in
(assuming it's not a specific business requirement to have auto-incrementing IDs)
@taylor.jeremydavid in some sense it would act as a sort of external identifier... but not in the sense you might think. Still thinking through implementation details, though. My latest thought is that perhaps this isn't really needed. I need to do some additional research and thinking. @jarohen I remember catching a whiff of an opinion some time ago that UUIDs are bad... but never looked into it. Would you happen to have any links for me to understand more about UUIDs?
(random) UUIDs are considered bad for B-tree database indices because they cause a lot of tree maintenance (each insert lands on an arbitrary leaf page, causing page splits and poor cache locality) - tbh, if you search for "uuid database performance" you'll get a load of articles that go into more detail 🙂 in XT2, though, our primary index is a hash trie, so the same rules apply as for a hash map - the more distributed the IDs are, the better. now we just have to convince people who've avoided them as PKs for years to come back to them 😅
that said, quite a lot of those articles seem to assume you're storing the UUID as a string, and then shoot that down for being inefficient (which it is, but most databases have either a binary type or even dedicated UUID support), so maybe take them with a pinch of salt 😃
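A rough way to see the "more distributed is better" point: the sketch below assumes a simplified trie that branches on the leading 8 bits of the ID (an assumption for illustration, not XTDB's actual trie layout) and counts how many top-level branches 1000 IDs fall into. The v7-style generator is hand-rolled from the RFC 9562 layout (48-bit millisecond timestamp, then version, variant and random bits), and both generators use a seeded `random.Random` only so the demo is reproducible; real IDs should come from a proper source such as `uuid.uuid4()`.

```python
import random
import uuid
from collections import Counter


def make_uuid_v4(rng):
    """Random (version 4) UUID built from a seeded RNG, for a reproducible demo."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)


def make_uuid_v7(unix_ms, rng):
    """UUIDv7-style layout (RFC 9562): 48-bit ms timestamp in the top bits, random bits below."""
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | rng.getrandbits(80)
    value = (value & ~(0xF << 76)) | (0x7 << 76)        # version nibble = 7
    value = (value & ~(0xC000 << 48)) | (0x8000 << 48)  # RFC variant bits = 0b10
    return uuid.UUID(int=value)


def branch_counts(ids, bits=8):
    """Ids per top-level branch, assuming the trie branches on the leading `bits` bits."""
    return Counter(u.int >> (128 - bits) for u in ids)


rng = random.Random(42)
v4_ids = [make_uuid_v4(rng) for _ in range(1000)]
start_ms = 1_700_000_000_000
v7_ids = [make_uuid_v7(start_ms + i, rng) for i in range(1000)]  # one id per ms for one second
```

The v4 IDs spread across roughly 250 of the 256 branches, while all 1000 v7-style IDs share the same leading byte and pile into a single branch.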
There are some tables in our (MySQL) DB where we hit the maximum value for an auto-incrementing ID and had to switch to (binary) UUIDs, which... isn't so great in some ways, but kind of preps us for the mindset of moving to XTDB v2 at some point. We have several tables where we've been forced to move to BINARY(16), and converting to/from string UUIDs as needed for external references (such as form fields or URL parameters) hasn't been bad at all -- you just need to think ahead a bit.
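The string/binary round trip described above is short in most languages. Here's a minimal Python sketch using the stdlib `uuid` module; the helper names `to_binary16`/`from_binary16` are hypothetical, and the byte order is simply the hex order of the string.

```python
import uuid


def to_binary16(uuid_str):
    """Canonical (or upper-case / braced) UUID string -> 16 raw bytes for a BINARY(16) column."""
    return uuid.UUID(uuid_str).bytes


def from_binary16(raw):
    """16 raw bytes -> canonical lower-case string for URLs, form fields, etc."""
    return str(uuid.UUID(bytes=raw))


external = "12345678-1234-5678-1234-567812345678"
stored = to_binary16(external)          # what goes into the BINARY(16) column
round_tripped = from_binary16(stored)   # what goes back out to the URL / form
```

Normalising on the way out (lower-case, hyphenated) means the same UUID always renders the same way in URLs, even if it arrived upper-case or wrapped in braces.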
Hey @jf.slack-clojurians, nothing built in (for v1 or v2). What's the use case, out of interest? External identifier? For v2 we have considered implementing SQL's CREATE SEQUENCE functionality (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-sequence-transact-sql?view=sql-server-ver16) - would that work for you? There's an issue for this: https://github.com/xtdb/xtdb/issues/3222 For v1 you probably want a transaction function that increments a stored counter entity. I saw your posts on #biff and I suspect there may be some convenient ways to add a counter in that codebase too.
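The counter-entity idea can be modelled in a few lines. This is plain Python, not XTDB's transaction-function API: `next_id_tx_fn`, `apply_log`, the `"counter/orders"` key and the `record-N` ids are all hypothetical. It assumes, as with XTDB v1's indexer, that transactions are applied one at a time in log order, so the read-increment-write inside the function can't hand out the same number twice.

```python
def next_id_tx_fn(db, counter_id, doc):
    """Hypothetical transaction function: read the counter, bump it, return the writes to apply."""
    n = db.get(counter_id, {"value": 0})["value"] + 1
    return [
        (counter_id, {"value": n}),           # updated counter entity
        (f"record-{n}", {**doc, "seq": n}),   # new record carrying its sequence number
    ]


def apply_log(tx_log):
    """Apply transactions strictly one after another, in log order, as a single indexer would."""
    db = {}
    for fn, args in tx_log:
        for key, value in fn(db, *args):
            db[key] = value
    return db


tx_log = [
    (next_id_tx_fn, ("counter/orders", {"item": "a"})),
    (next_id_tx_fn, ("counter/orders", {"item": "b"})),
]
db = apply_log(tx_log)
```

After both transactions the counter holds 2 and the second record carries `"seq": 2`. The serial application is what makes this safe; the same logic run concurrently in application code would need a compare-and-set or similar.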
@jarohen re hash trie: does this mean that something like a time-based UUID (v7), where the chars (or bytes, depending on storage) at the front are mostly the same, would be bad? EDIT: this would be for v1 though
yeah, I'd guess so - it's important that the ones at the front differ for XT (contrary to usual database wisdom)
Actually I just had an idea: what if the similar chars were at the end instead? Would that resolve the issue of having similar characters?
yep, that'd be fine 🙂 it's the prefix that matters
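A sketch of the "similar chars at the end" idea: random bits first, millisecond timestamp last. This layout is hypothetical and non-standard (it is not an RFC 9562 UUIDv7, and the version/variant fields are meaningless), and `make_suffix_time_id` is an illustrative name. The trade-off is that the IDs no longer sort by creation time, although the timestamp can still be read back from the last 6 bytes.

```python
import random
import time
import uuid


def make_suffix_time_id(rng, unix_ms=None):
    """Non-standard 128-bit id: 80 random bits first, 48-bit ms timestamp last."""
    if unix_ms is None:
        unix_ms = int(time.time() * 1000)
    value = (rng.getrandbits(80) << 48) | (unix_ms & ((1 << 48) - 1))
    return uuid.UUID(int=value)


def created_ms(u):
    """Recover the timestamp from the trailing 6 bytes."""
    return int.from_bytes(u.bytes[-6:], "big")


rng = random.Random(7)  # seeded only to make the demo reproducible
ids = [make_suffix_time_id(rng, 1_700_000_000_000) for _ in range(1000)]
```

Even though all 1000 ids were created in the same millisecond, their leading bytes are spread across nearly all 256 values, while the shared part sits at the end where it doesn't affect prefix-based branching.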
:) excellent! Thank you!
@jarohen sorry, but I have one more question about IDs. Is there any advantage to using parse-uuid to create a java.util.UUID instance and using that as the id... vs. simply using the string version of the UUID?
yep - if it's a j.u.UUID we store that verbatim as the internal ID; if it's a string, we don't check whether it's a UUID, we just hash it to give us our internal ID
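A simplified model of the rule just described: UUID objects are used verbatim, everything else is hashed. The hash function (SHA-256 truncated to 16 bytes) and the type-tagged encoding are assumptions, not XTDB's actual scheme. The practical consequence it shows is that, under this model, the string form and the UUID-object form of the same UUID end up with different internal IDs, so it's worth picking one representation and using it consistently.

```python
import hashlib
import uuid


def internal_id(eid):
    """Model: UUID objects are stored verbatim; anything else (strings included) is hashed."""
    if isinstance(eid, uuid.UUID):
        return eid.bytes
    tagged = f"{type(eid).__name__}:{eid!r}".encode()  # hypothetical encoding before hashing
    return hashlib.sha256(tagged).digest()[:16]


u = uuid.UUID("12345678-1234-5678-1234-567812345678")
as_object = internal_id(u)       # the UUID's own 16 bytes
as_string = internal_id(str(u))  # a hash, unrelated to the UUID's bytes
```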
I see, thank you. I guess (correct me if I'm wrong) this also means that if I provide anything other than a UUID object as an id, then range queries (or any query where you need to peek into the value) on the "id" (the id from the perspective of the application, rather than that of XTDB) would result in a table scan? (I'm not sure, but I'm assuming that a non-UUID, non-string id provided to XTDB gets hashed as well.)
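This doesn't settle the question for XTDB (the thread doesn't say whether the page-level min/max metadata mentioned earlier would kick in for such a query), but it illustrates why a hash-based primary index can't serve a range predicate on its own: hashing scatters neighbouring ids, so ids that are adjacent in application order don't form a contiguous run in hash order. The hash function and `order-NNNN` ids below are assumptions for illustration.

```python
import hashlib


def hash_key(eid):
    """Stand-in for a hashed internal id (SHA-256 truncated to 16 bytes is an assumption)."""
    return hashlib.sha256(str(eid).encode()).digest()[:16]


def hash_order_positions(ids, lo, hi):
    """Positions of the ids satisfying lo <= id <= hi once all ids are ordered by hash key."""
    by_hash = sorted(ids, key=hash_key)
    return [pos for pos, eid in enumerate(by_hash) if lo <= eid <= hi]


def is_contiguous(positions):
    return positions == list(range(positions[0], positions[0] + len(positions)))


ids = [f"order-{n:04d}" for n in range(100)]
positions = hash_order_positions(ids, "order-0010", "order-0019")
```

The ten matching ids end up scattered across the hash order, so finding them means checking candidates individually rather than walking one slice of the index.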