what are the benchmarks / performance characteristics of joins / nest_one / nest_many? since xtdb is schemaless i can't "help" it infer when a value is a reference to another entity, so what are the tradeoffs of that? while at it, how does the indexer work?
hey @itai, tl;dr is It Depends™ for this one
nest-one/nest-many naturally plan to a nested loop join to start, but then the optimiser tries its best to turn it into a hash left join - if you're using beta6 you can see whether it's managed to do so by prepending EXPLAIN to your query
the format is pretty internal atm, but main thing is that if you see :apply it's still doing a nested-loop join
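To make the difference between the two plans concrete, here is a minimal Python sketch of a nested-loop left join next to a hash left join. It is a generic illustration of the two strategies, not XTDB's implementation; the function names and the `orders`/`customers` data are made up for the example.

```python
from collections import defaultdict

def nested_loop_left_join(left, right, key):
    """For every left row, rescan all of right: O(len(left) * len(right))."""
    out = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if matches:
            out.extend({**l, **r} for r in matches)
        else:
            out.append(dict(l))  # left join: keep unmatched left rows
    return out

def hash_left_join(left, right, key):
    """Build a hash table on right once, then probe it: O(len(left) + len(right))."""
    table = defaultdict(list)
    for r in right:
        table[r[key]].append(r)
    out = []
    for l in left:
        matches = table.get(l[key], [])  # .get avoids inserting empty entries
        if matches:
            out.extend({**l, **r} for r in matches)
        else:
            out.append(dict(l))
    return out

# Hypothetical sample data.
orders = [
    {"order": 1, "customer_id": "a"},
    {"order": 2, "customer_id": "b"},
    {"order": 3, "customer_id": "z"},  # no matching customer
]
customers = [
    {"customer_id": "a", "name": "Ada"},
    {"customer_id": "b", "name": "Bob"},
]
```

Both functions return the same rows; the hash version simply avoids rescanning `customers` for every order, which is why it is the plan you would hope to see.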
> how does the indexer work? broadly speaking (I'll write this up properly before we hit GA) the nodes consume the log into in-memory blocks of ~100k rows, which they then save to the object store. in the background, they then collaborate to compact these blocks into an LSM tree structure, sharded by the hash of the ID of the rows, which is then more suitable for query
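As a rough mental model of that description, the Python sketch below cuts a log into fixed-size blocks and then compacts them into per-shard sorted runs keyed by a hash of the row ID. Everything here is an assumption for illustration: the tiny block size, the shard count, the use of SHA-256, and the single-level "compaction" are stand-ins, and a real LSM tree keeps multiple levels.

```python
import hashlib

BLOCK_ROWS = 4  # the thread mentions ~100k rows per block; tiny here for illustration
SHARDS = 4      # hypothetical fan-out, not an XTDB setting

def shard_of(row_id: str) -> int:
    """Pick a shard from a stable hash of the row ID."""
    digest = hashlib.sha256(row_id.encode()).digest()
    return digest[0] % SHARDS

def index_log(log):
    """Consume the log into fixed-size blocks, as if each were saved to an object store."""
    object_store, block = [], []
    for row in log:
        block.append(row)
        if len(block) == BLOCK_ROWS:
            object_store.append(block)
            block = []
    if block:
        object_store.append(block)  # flush the final partial block
    return object_store

def compact(blocks):
    """Merge blocks into per-shard runs, each sorted by ID."""
    shards = {s: [] for s in range(SHARDS)}
    for block in blocks:
        for row in block:
            shards[shard_of(row["id"])].append(row)
    for rows in shards.values():
        rows.sort(key=lambda r: r["id"])
    return shards

# Hypothetical log of ten rows.
log = [{"id": f"user-{i}", "age": 20 + i} for i in range(10)]
blocks = index_log(log)
shards = compact(blocks)
```

A lookup by ID then only needs to consult the one shard that `shard_of` points to, rather than every block.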
hey thanks for your response
so in cases where i query with where on random fields, how are indexes used?
for example, is select name, age where age > 10 going to do a full scan of the age column?
for a predicate like where age > 10 (relatively unselective, compared to e.g. where email='), we keep metadata on each disk page to understand its min/max values, so we'll only look at disk pages that contain rows with age > 10
a traditional b-tree index isn't that much use for those kinds of predicates, because the matching rows will likely be relatively distributed on disk - unless you can entirely serve the query from the sorted index, you'll have to go look at all the disk pages anyway
(Postgres does the same, for example)
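The min/max page metadata idea can be sketched in a few lines of Python. This is a toy model, assuming a hypothetical page layout of plain dicts; it only shows how pages whose maximum is at or below the threshold can be skipped without reading their rows.

```python
def build_pages(values, page_size):
    """Split a column into pages and record min/max metadata for each page."""
    pages = []
    for i in range(0, len(values), page_size):
        chunk = values[i:i + page_size]
        pages.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return pages

def scan_greater_than(pages, threshold):
    """Return matching values and how many pages actually had to be read."""
    hits, pages_read = [], 0
    for page in pages:
        if page["max"] <= threshold:
            continue  # no row in this page can satisfy value > threshold
        pages_read += 1
        hits.extend(v for v in page["rows"] if v > threshold)
    return hits, pages_read

# Hypothetical age column, four rows per page.
ages = [3, 5, 8, 9, 2, 7, 10, 4, 12, 30, 6, 1, 40, 55, 60, 70]
pages = build_pages(ages, page_size=4)
hits, pages_read = scan_greater_than(pages, threshold=10)
```

With this data, two of the four pages are skipped on metadata alone. How much is skipped in practice depends on how the values are clustered across pages.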
cool thanks so what about selective predicates?
We are currently running a service that utilizes XTDB on a pod, with an allocated memory limit of 20 GB. Although we have configured the JVM with a minimum heap size of 5 GB and a maximum of 8 GB, our observations indicate that the heap usage remains below 5 GB. Despite this, the htop command consistently reports a memory consumption of approximately 18 GB.
We have attempted to profile the application, but the results have not clarified the source of the high memory usage. One possibility we are considering is that RocksDB may be consuming the additional memory. Furthermore, when we increased the memory allocation from 20 GB to 24 GB, the overall consumption also increased to around 20 GB. We are seeking insights or recommendations to help identify and resolve the issue of unexpectedly high memory usage.
hey @namit.shah, tl;dr yes this is expected
XTDB makes heavy use of off-heap memory on the JVM which we manage directly - its on-heap usage should remain relatively low. We then use as much memory as you give us - anything above the (again relatively low) minimum requirements is then used for caches, so it's not required but obviously desirable.
Most JVMs will default the off-heap memory size to be the same (again) as max-heap - e.g. if you specify -Xmx8g you'll get 8GB heap and 8GB off-heap - but you can better control this through -XX:MaxDirectMemorySize if you need to.
If I have -Xmx8g, shouldn't that restrict the memory usage to 16 GB? Or does it not depend on that, and it just tries to use as much as it can?
Rocks (as well as the JVM itself) will also be using some native memory, yes, but this should be relatively small compared to the heap/off-heap memory of the JVM - you can tweak this using the RocksDB configuration supplied to XTDB
This is our current production configuration,
• -Xmx8g
• -XX:MaxDirectMemorySize flag is not set i.e. it should default to 8 GB.
• We have a RocksDB block cache (https://v1-docs.xtdb.com/storage/rocksdb/#blocks-cache) of 2 GB.
• We have a Pod with 24 GB memory.
Right now the total consumption of the Pod is ~99%.
If we increase the Pod memory to 64 GB, will the memory consumption still reach 99% given we use the same parameters? Also, is there any way we can profile this? We tried using a Java profiler (JProfiler, to be specific) but have had no success so far in understanding who is consuming how much memory off the heap.
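For a back-of-the-envelope view of the configuration listed above, the Python sketch below adds up the configured ceilings. It assumes, as suggested earlier in the thread, that direct memory defaults to the max heap size when -XX:MaxDirectMemorySize is unset, which may not hold on every JVM. It also ignores metaspace, thread stacks and any other native allocations, so treat the total as a lower bound on the configured budget, not a prediction of what htop will show. The function name is made up for this example.

```python
GIB = 1024 ** 3

def jvm_memory_budget(max_heap, max_direct=None, rocks_block_cache=0):
    """Sum the configured ceilings, in bytes.

    Assumption: when max_direct is None, direct memory is capped at max_heap.
    """
    direct = max_heap if max_direct is None else max_direct
    return {
        "heap": max_heap,
        "direct": direct,
        "rocksdb_block_cache": rocks_block_cache,
        "total": max_heap + direct + rocks_block_cache,
    }

# Values from the configuration posted above: -Xmx8g, no explicit
# MaxDirectMemorySize, 2 GB RocksDB block cache.
budget = jvm_memory_budget(8 * GIB, None, 2 * GIB)
total_gib = budget["total"] / GIB
```

Under those assumptions the configured ceilings add up to 18 GiB. Setting -XX:MaxDirectMemorySize explicitly and re-running the same arithmetic is one way to check whether the pod limit leaves any headroom.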
> -XX:MaxDirectMemorySize flag is not set i.e. it should default to 8 GB.
I believe the exact behaviour can vary quite a lot across JVMs (i.e. I wouldn't be surprised if some default wasn't being applied), have you tried setting the flag and observing the effect?
We did try that, but it resulted in frequent memory spikes and ultimately in the service being terminated because of full memory consumption. Maybe there are some gaps in our understanding regarding the memory allocation.
Hmm, I think we should try to get to the bottom of that. Would you have any time early next week? Is this instability affecting your prod app currently? Feel free to email hello@xtdb.com
@jarohen @taylor.jeremydavid Not sure how close an eye you keep on Reddit? https://www.reddit.com/r/Clojure/comments/1itge6s/am_i_stupid_or_is_the_new_version_of_xtdb_super/
spotted and replied, cheers @seancorfield - if anyone here has had similar experiences, please do let us know - thanks!
The docs still have the old /status URL and the docker run command does not have -p 8080:8080 to expose the healthz server.
See my reply to the OP over on Reddit.
thanks Sean, will get those updated
updated
This link on the main docs brings users to an outdated example, which adds to confusion on usage: [Learn XTQL Today](https://docs.xtdb.com/static/learn-xtql-today-with-clojure.html)
Maybe just the initial put commands. I was trying to follow along in my own REPL and had to find the put-docs transaction