#xtdb
2023-10-11
dotemacs13:10:15

XTDB2, with data added:

(xt/submit-tx my-node [[:put :videos {:xt/id -1
                                      :video/id "123"
                                      :player1 "Foo"
                                      :player2 "Bar"
                                      :video
                                      [{:timestamp "00:02" :player :player1 :move "grips"}
                                       {:timestamp "00:05" :player :player1 :move "guard"}
                                       {:timestamp "00:10" :player :player2 :move "sweep"}
                                       {:timestamp "00:15" :player :player2 :move "choke"}]
                                      :winner :player1
                                      :winning-move "choke"}]])
I’d like to be able to query the data nested under :video - say, give me all the maps where :move is "grips". How would I go about that? I can see a question about nested data querying from 4 months ago: https://clojurians.slack.com/archives/CG3AM2F7V/p1685635106014059 - is that still the case? Thanks

dotemacs16:10:38

With the help of this forum post: https://discuss.xtdb.com/t/2-x-how-filter-query-results-based-on-nested-vector-values/257/5 I put this together:

(xt/q my-node '{:find [timestamp move player]
                :where [($ :videos {:video [video ...]})
                        [(. video :timestamp) timestamp]
                        [(. video :player) player]
                        [(. video :move) move]
                        [(= move "grips")]]})
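
(For anyone reading along: in plain Clojure terms, the query above is doing roughly the following over the nested :video vector. The moves-by helper below is hypothetical and purely illustrative - it is not an XTDB API.)

;; Plain Clojure sketch of the filtering the Datalog query performs -
;; hypothetical helper, no XTDB involved.
(defn moves-by [doc move-name]
  (for [{:keys [timestamp player move]} (:video doc)
        :when (= move move-name)]
    {:timestamp timestamp :player player :move move}))

(moves-by {:video [{:timestamp "00:02" :player :player1 :move "grips"}
                   {:timestamp "00:05" :player :player1 :move "guard"}]}
          "grips")
;; => ({:timestamp "00:02", :player :player1, :move "grips"})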

refset09:10:47

Hey @U3SG7RX7A thanks for sharing your solution 🙂

refset09:10:37

As per James' response in your Clojurians Slack link, we do want to make it easier to write this kind of nested data expression in a single pass

👍 1
dotemacs14:10:03

Hey @U899JBRPF thanks 👍 I can see that you’re scheduling a discussion on 2.x here: https://discuss.xtdb.com/t/upcoming-query-api-discussion-session-2-x-datalog/261 Any finger-in-the-air guesstimate of when we’re potentially looking at that feature? Thanks in advance

refset14:10:45

<2 weeks, I'd guess (🤞)

👍 1
dotemacs13:10:35

Also, following the example here: https://www.xtdb.com/v2 - unless you’re using Java 17 or newer, you’ll get this error:

xtdb/types/ClojureForm has been compiled by a more recent version
   of the Java Runtime (class file version 61.0), this version of the
   Java Runtime only recognizes class file versions up to 60.0
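
(A quick way to confirm which runtime a REPL is actually on - for reference, class file version 61.0 corresponds to Java 17 and 60.0 to Java 16:)

;; Both are standard JDK calls, usable from any REPL:
(System/getProperty "java.version") ;; e.g. "17.0.8"
(Runtime/version)                   ;; structured Runtime$Version, Java 9+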

refset09:10:40

Good point, thanks, we subsequently bumped the minimum Java version a few months ago (https://github.com/xtdb/xtdb/commit/4f719bbce586d041b7f666185fa4174a43a466b9, specifically)

refset09:10:35

We're actually really hoping to move to 21 soon but there are a few upstream deps we're waiting on

jarohen09:10:44

just to clarify on this one: we should be able to keep the XT clients on Java 11 - it's only if you're looking to use it in-process that there'll be a Java 21 minimum

zclj16:10:16

Hi all, some questions about using XTDB in a low-resource environment. I will run XTDB locally on a single node with RocksDB for all storage, on a machine with 1GB total and half of that allocated to the Clojure application with XT.

During local testing with that setup, I ran my tests continuously creating new orders (containing what you would expect - customer info and a few order lines). While the amount of data is thus low, after about 1000 orders I experienced OOM, first seen from the indexing. After a restart the system was usable again, but OOMed after about 200 more orders. For additional context, this is for a small website which will receive maybe 25-50 orders a month, so getting to 1000 will take some time. However, I want to understand the operational constraints. Is this a reasonable setup and expectation for XT, or is the allocated memory too low?

The second resource consideration is disk space. With a small amount of data the index folder is very much larger than the docs folder. For example, I can have docs at 8MB, tx at 5MB and index at 125MB. Will the index folder keep growing with this multiple? What can I expect here?

Happy to hear if you have any experiences with XT in this kind of constrained environment. You are doing a really great thing with XTDB so I would like to keep using it :)

refset09:10:28

Hey @U1G8B7ZD3 1GB is quite small, but it should be possible to run at that scale. What are your Xms / Xmx configs looking like? Is there anything else running on the machine that needs RAM? Rocks manages its own native allocations so needs sufficient memory headroom to operate (beyond what the JVM claims for itself)

refset09:10:44

> For example, I can have docs at 8MB, tx at 5MB and index at 125MB. Will the index folder keep growing with this multiple? What can I expect here?
it somewhat depends on the 'shape' of the docs (depth vs width, size of keys etc.), but those ratios are pretty normal sounding

refset09:10:14

> You are doing a really great thing with XTDB so I would like to keep using it 🙂
🙏

zclj15:10:58

Thanks for your reply. To clarify, 1GB is the node's total memory. In my setup 512MB is allocated for the Docker container running my Clojure web-app with XTDB as a library.

Regarding the settings, I have looked at the docs for RocksDB (https://docs.xtdb.com/storage/rocksdb/) and set -XX:MaxDirectMemorySize=512m, but re-reading the docs it says the sum of MaxDirectMemorySize and Xmx should be available. So I guess in my case an initial setting would be 256MB each? Given this, what is your gut feeling about these 256MB * 2 levels?

My intent for this deployment is to use XTDB as an "embedded" DB, the way SQLite is often used. It evidently works to some extent, given the test I mentioned, but let me know if you think these levels sound unreasonable. Thanks for the info on the disk size expectations!

👍 1
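(As an illustration only: those levels could be written down as a deps.edn alias along the following lines. The :low-mem alias name and the exact values are assumptions, not a recommendation from this thread.)

;; Hypothetical deps.edn alias - splits the ~512MB container budget between
;; heap and direct memory, and dumps the heap on OOM for later analysis.
{:aliases
 {:low-mem {:jvm-opts ["-Xms256m" "-Xmx256m"
                       "-XX:MaxDirectMemorySize=256m"
                       "-XX:+HeapDumpOnOutOfMemoryError"]}}}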
refset18:10:43

We haven't spent much time on tuning for low-memory environments so far - unfortunately I think the only way to be sure is to try a few changes and see if it's stable

refset18:10:13

We have various tests / benchmarks in the repo you might find useful to simulate some workload if you need help with that

zclj07:10:15

Thanks, I'll check it out

👍 1
zclj06:12:06

Hi again @U899JBRPF, I have some more data on this issue. I keep having OOM problems. When I enable heap dumps on OOM, I can see that XTDB's LRUCache seems to hold on to a lot of values. Looking at the dump, there are more than 800k instances of UUID rooted in the cache. I use UUIDs as xt/ids. At this time there are about 2k-3k orders in the system. There are a lot of duplicates among the instances. I can also see the same behavior with a long list of Longs, which I use as order numbers.

I have tried experimenting with JVM memory params, and also adding more memory to my test node. The only thing that happens is that it takes a bit longer to OOM, and there are even more instances in the same pattern as above. Is there anything I can do about this behavior? Does it indicate that I use XT in some bad way? Is it a bug?

refset17:12:10

Hey @U1G8B7ZD3 how much memory are you giving the JVM here?

refset17:12:32

You can try tuning down the document store LRU cache size to something very low, e.g. by supplying this as a key to the document store module:
> :document-cache (cache/->cache {:cache-size (* 1 1024)})

refset17:12:52

it is sized by number of entries rather than explicit memory size
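
(For context, a sketch of where that key could sit in an XTDB 1.x node config that uses RocksDB for all three stores. The module paths follow the RocksDB docs linked earlier; the xtdb.cache namespace is assumed from the cache/->cache alias above, and the directory names and :cache-size value are placeholders.)

;; Sketch only - standard RocksDB setup from the XTDB 1.x docs, with the
;; :document-cache key from the suggestion above added to the document store.
(require '[xtdb.api :as xt]
         '[xtdb.cache :as cache]   ;; assumed namespace for cache/->cache
         '[clojure.java.io :as io])

(defn rocks-kv [dir]
  {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
              :db-dir (io/file "data" dir)}})

(def node
  (xt/start-node
   {:xtdb/tx-log         (rocks-kv "tx-log")
    :xtdb/document-store (assoc (rocks-kv "docs")
                                :document-cache (cache/->cache {:cache-size 1024}))
    :xtdb/index-store    (rocks-kv "index")}))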

refset17:12:27

I suspect for each of these 800k UUIDs (which are the keys) there will also be 800k docs which are taking up proportionately a lot more memory
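
(A back-of-envelope illustration of why that matters - the ~1 KB average document size here is an assumption, not a measurement:)

;; 800k cached docs at ~1 KB each would already exceed a 512 MB heap on their own.
(* 800000 1024)
;; => 819200000 bytes, roughly 780 MiB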

refset17:12:52

but as mentioned before, XTDB has not been optimized to work in low memory environments - happy to try to help though 🙂

zclj17:12:51

For that particular screenshot I think it was a low memory setting (so as not to wait too long for the OOM - 128M). The most I have tested with is giving the container this runs in 1.5GB, the JVM 512M and the MaxDirectMemorySize flag 512M. The dump looks the same but with 3 million UUIDs instead. I will not be able to justify a more expensive node than the 2GB one. Would you recommend some other JVM configuration with 1.5GB at hand, or is my configuration above reasonable? I will try the LRU cache size option.

When you say that you suspect there are 800k docs taking memory, are those in user space? As I mentioned, the number of entities in the system from my point of view in this experiment is about 3k orders (order-lines, name, address, etc) and 200 product documents. So I'm curious to understand if I might have a problem in my usage that results in keeping those 800k alive, and how it can amount to so many docs. I don't think I'm doing anything exotic: there are some queries for products, transactions for the order-lines, and finally a tx with a transaction function to get an increasing order number for the order tx.

Thanks for the help, even if my use-case is an outlier. If I can solve it and convey some information to you, it might mean more applicable use-cases and more XTDB for the world :)

👍 1
refset20:12:10

Very useful to hear that context - thanks for the write-up
> 3k orders (order-lines, name, address, etc) and 200 product documents
is that across all time? or are there lots of versions/edits to these? I'm also unsure why you should be seeing 800k UUID entries there, unless you're using them elsewhere in the system (i.e. it's a red herring since most have nothing to do with the XT doc-store LRU cache, you just got (un)lucky opening up the first one in that window)

zclj18:12:11

There shouldn't be a lot of edits to the docs. Most of the edit data is stored in a session, which is then transacted at checkout.

zclj18:12:27

I did some more dumps. FYI, before doing Clojure for the last 13ish years my background was in C/C++/C#, so I'm no expert in JVM analysis - I hope I'm not too ignorant here…

☺️ 1
zclj18:12:13

First a dump after just a few minutes

zclj18:12:52

Second one is after 38m

zclj18:12:34

Looking at these, there is growth in the UUIDs and Longs. Using "Compute merged GC Roots", we can see a lot of roots in XT's NotifyingSubscriber - does that tell you anything?

zclj18:12:03

Finally, if I open up a few of those, we can see some more details:

zclj19:12:23

Do you get any ideas that I can further investigate based on this?

refset17:12:01

thanks for the updates - do your documents contain UUIDs? or are you only using them for IDs? looking at the stack in that last screenshot makes me think the UUIDs reside within nested structures inside the documents, as I don't think the LRU structure itself is more than one or two levels of that

refset17:12:17

did reducing the cache size mitigate at all?