2024-07-20 xtdb | Clojure Slack Archive

xtdb

jf 2024-07-20T14:45:44.138039Z

does xtdb support the generation of a monotonically-increasing number to assign to records?

jarohen 2024-09-12T08:35:56.551409Z

the primary index is a hash index, so we don't use that for range predicates, only for lookup when you have a primary key in hand

jarohen 2024-09-12T08:36:51.635169Z

for range predicates, we keep page-level metadata about what data is in each page - min/max values and a bloom filter - so these would be able to filter out (hopefully a decent number of) pages in a range query

jf 2024-09-12T08:54:11.503959Z

Ok that's interesting. Thank you for the detail! Very helpful and happy to understand more :) Offhand I believe cockroachdb also does the same in terms of keeping ranges (but I think without the bloom filters if I'm not mistaken). I assume you also do splitting and compaction as ranges expand or contract. Re bloom filters: have you looked at cuckoo filters?

jarohen 2024-09-12T08:56:02.506699Z

so these aren't range indices (i.e. sorted by the range var, because they're already partitioned/sorted by primary key), they're literally just two values that say the min and max values within that page

jarohen 2024-09-12T08:56:29.442469Z

we don't currently have secondary indices (although we're looking to add them) - those would be sorted by the range key

jarohen 2024-09-12T08:56:56.972209Z

I've not really looked at cuckoo filters beyond skimming their wikipedia page a while back 🙂

refset 2024-09-12T09:02:23.794339Z

> they're literally just two values that say the min and max values within that page

specifically "zone maps" is the concept to dig into, e.g. https://vldb.org/pvldb/vol10/p1622-ziauddin.pdf
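A minimal sketch of the min/max page-pruning idea discussed above, written in Python for illustration. The `Page` class, `make_page` and `range_query` are hypothetical names and are not XTDB's implementation; the bloom filter is left out of the sketch. Each page records only the smallest and largest value it holds, and a range query skips any page whose [min, max] interval cannot overlap the requested range.

```python
from dataclasses import dataclass


@dataclass
class Page:
    rows: list    # values in this page, NOT sorted by this column
    min_val: int  # smallest value in the page
    max_val: int  # largest value in the page


def make_page(rows):
    # Hypothetical helper: compute the page-level metadata once, at write time.
    return Page(rows=rows, min_val=min(rows), max_val=max(rows))


def range_query(pages, lo, hi):
    """Return (matching values, number of pages actually scanned)."""
    hits, scanned = [], 0
    for page in pages:
        # The page's [min, max] interval does not overlap [lo, hi]: skip it.
        if page.max_val < lo or page.min_val > hi:
            continue
        scanned += 1
        hits.extend(v for v in page.rows if lo <= v <= hi)
    return hits, scanned


pages = [make_page([3, 9, 1]), make_page([40, 22, 35]), make_page([8, 90, 15])]
hits, scanned = range_query(pages, 20, 50)
print(sorted(hits), scanned)
```

The first page is skipped outright. The third page has to be scanned even though it contains no match, because its min/max interval (8 to 90) overlaps the query range; that is why the metadata filters out only "hopefully a decent number of" pages rather than all irrelevant ones.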

jf 2024-09-12T09:54:32.651889Z

thank you both for the details and clarification!

🙏 2

jarohen 2024-07-21T13:18:42.501519Z

if you're using XT2, one of the best IDs you can use from an indexing point-of-view is a random (v4) or otherwise well-distributed UUID (contrary to usual database wisdom), so it's fine to generate these in the application and pass them in

jarohen 2024-07-21T13:20:12.929099Z

(assuming it's not a specific business requirement to have auto-incrementing IDs)

jf 2024-07-21T14:15:48.185229Z

@taylor.jeremydavid in some sense it would act as some sort of external identifier... but not in the sense that you would think about. Still thinking through implementation details though. My latest thought is that perhaps this isn't really needed. I need to do some additional research and thinking. @jarohen I remember catching a whiff of an opinion some time ago that UUIDs are bad... but never looked into it. Would you happen to have any links for me to understand more about UUIDs?

jarohen 2024-07-21T17:00:38.809249Z

(random) UUIDs are considered bad for b-tree database indices because they cause a lot of tree maintenance - tbh, if you search for "uuid database performance" you get a load of articles that go into more detail 🙂 in XT2, though, our primary index is a hash trie, so same rules as a hash-map - the more distributed the IDs are, the better. now we just have to convince people who've avoided them as PKs for years to come back to them 😅

👌 1

jarohen 2024-07-21T17:11:11.261489Z

that said, quite a lot of those articles seem to assume you're storing the UUID as a string, and then shoot that down for being inefficient (which it is, but most databases either have binary or even specific UUID support), so maybe a pinch of salt required 😃

seancorfield 2024-07-22T01:04:35.452679Z

There are some tables in our (MySQL) DB where we hit the maximum for an auto-incrementing ID and we've had to switch to (binary) UUIDs which... isn't so great in some ways but kind of preps us for the mindset of moving to XTDBv2 at some point. We have several tables where we've been forced to move to BINARY(16) and switching to/from string UUID as needed for external references (such as in form fields or url parameters) hasn't been bad at all -- you just need to think ahead a bit.

➕ 1
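A small sketch of the string/binary round trip mentioned above, in Python for illustration. The helper names `to_binary` and `to_text` and the example ID are hypothetical. The 16 raw bytes are what would go into a `BINARY(16)` column; the hyphenated string form is used only at the edges (form fields, URL parameters).

```python
import uuid


def to_binary(text_id: str) -> bytes:
    # Parse the hyphenated string form and return the 16 raw bytes.
    return uuid.UUID(text_id).bytes


def to_text(raw: bytes) -> str:
    # Rebuild the canonical lowercase, hyphenated string from the 16 raw bytes.
    return str(uuid.UUID(bytes=raw))


external = "123e4567-e89b-12d3-a456-426614174000"  # made-up example ID
raw = to_binary(external)
print(len(raw), to_text(raw))
```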

refset 2024-07-20T18:17:14.928059Z

Hey @jf.slack-clojurians nothing built in (for v1 or v2). What's the use-case out of interest? External identifier? For v2 we have considered implementing SQL's CREATE SEQUENCE functionality (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-sequence-transact-sql?view=sql-server-ver16) - would that work for you? There's an issue for this: https://github.com/xtdb/xtdb/issues/3222 - for v1 you probably want a transaction function to increment a stored counter entity - I saw your posts on #biff and I suspect there may be some convenient ways to add a counter in that codebase also

jf 2024-09-05T15:31:53.897319Z

@jarohen re hash trie: does this mean that something like a time-based UUID(v7) where the chars (or bytes; depending on storage) at the front are mostly the same would be bad? EDIT: this would be for v1 though

🤔 1

jarohen 2024-09-05T15:34:14.643119Z

yeah, I'd guess so - important that the ones at the front differ for XT (contrary to usual database wisdom)

👌 1

jf 2024-09-05T16:00:46.595369Z

Actually I just had an idea: what if the similar chars were at the end instead? Would that resolve the issue of having similar characters?

jarohen 2024-09-05T16:02:48.851069Z

yep, that'd be fine 🙂 it's the prefix that matters
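A rough sketch of why the prefix matters, in Python for illustration. It assumes, purely for the sake of the example, that IDs are bucketed by their first byte; this is a stand-in for the branching of a hash trie and not XTDB's actual layout. `time_prefixed_id` is a v7-like layout (timestamp first) and is not a spec-compliant UUIDv7; `time_suffixed_id` is the "similar chars at the end" variant suggested above. All function names are hypothetical.

```python
import random
import uuid

rng = random.Random(42)  # fixed seed so the run is repeatable


def random_id() -> bytes:
    # v4-style UUID: random bits with the version/variant bits set.
    return uuid.UUID(int=rng.getrandbits(128), version=4).bytes


def time_prefixed_id(ms: int) -> bytes:
    # 6-byte millisecond timestamp first, then 10 random bytes.
    return ms.to_bytes(6, "big") + rng.getrandbits(80).to_bytes(10, "big")


def time_suffixed_id(ms: int) -> bytes:
    # Same ingredients, but the slowly-changing timestamp goes at the end.
    return rng.getrandbits(80).to_bytes(10, "big") + ms.to_bytes(6, "big")


def distinct_prefixes(ids, n_bytes=1):
    # Number of different buckets the ids would land in when bucketed by prefix.
    return len({i[:n_bytes] for i in ids})


start_ms = 1_725_550_000_000
stamps = [start_ms + k for k in range(1000)]  # 1000 ids created within one second

n_random = distinct_prefixes([random_id() for _ in stamps])
n_prefixed = distinct_prefixes([time_prefixed_id(ms) for ms in stamps])
n_suffixed = distinct_prefixes([time_suffixed_id(ms) for ms in stamps])
print(n_random, n_prefixed, n_suffixed)
```

Under that assumption, every time-prefixed ID created in the same second lands in the same first-byte bucket, while the random and time-suffixed IDs spread across nearly all 256 buckets.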

jf 2024-09-05T16:03:15.442079Z

:) excellent! Thank you!

jf 2024-09-11T15:08:17.764789Z

@jarohen sorry, but I have one more question about ids. Is there any advantage to using parse-uuid to create a java.util.UUID instance and using that as the id... vs simply just using the string version of the UUID?

jarohen 2024-09-11T15:10:32.483729Z

yep - if it's a j.u.UUID we store that verbatim as the internal ID; if it's a string, we don't check whether it's a UUID, we just hash it to give us our internal ID

jf 2024-09-12T01:43:22.069279Z

I see, thank you. I guess (correct me if I'm wrong) this also means that if I provide anything other than a uuid object for an id, range queries (or any query that needs to peek into the value) on the "id" (the id from the perspective of the application, rather than that of xtdb) would result in a table scan? (I'm not sure, but I'm assuming that a non-uuid, non-string id provided to xtdb will get hashed as well.)
