This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-04-23
Channels
- # babashka (18)
- # babashka-sci-dev (42)
- # beginners (84)
- # calva (11)
- # cider (5)
- # clj-kondo (11)
- # cljdoc (70)
- # cljs-dev (34)
- # clojure-europe (1)
- # clojurescript (3)
- # conjure (1)
- # core-async (29)
- # data-oriented-programming (10)
- # emacs (13)
- # fulcro (8)
- # gratitude (2)
- # honeysql (1)
- # introduce-yourself (4)
- # kaocha (10)
- # missionary (8)
- # nrepl (4)
- # off-topic (27)
- # portal (32)
- # releases (11)
- # tools-deps (11)
- # xtdb (19)
I am having an issue. I'm pulling a lot of data from an external API and putting some of it into my XTDB instance. Unfortunately the service keeps falling over (small AWS server). Is there a way for me to optimize this so the ::xt/put calls are done in batches or something, so that the service doesn't fall over? (I am basically doing an ::xt/put for each piece of data I filtered.) Locally it's fine, but on the small server... not really, and it now seems stuck in a transaction loop, because when I restart the server it tries again to complete the transaction.
Hey @U45SLGVHV roughly how many puts have you issued per transaction? Do you have xtdb-lucene enabled?
There are various examples of how we batch puts in the repo, such as https://github.com/xtdb/xtdb/blob/e2f51ed99fc2716faa8ad254c0b18166c937b134/test/test/xtdb/fixtures/lubm.clj#L9-L19 - we typically use batches of 1000 for bulk loading
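For reference, a minimal sketch of that batching approach - assuming an xtdb.api node and a seq of docs to ingest (the batch size of 1000 matches the linked fixture; the fn name is hypothetical):
(require '[xtdb.api :as xt])

(defn ingest-in-batches! [node docs]
  ;; submit batches of 1000 puts, waiting for each batch to be indexed
  ;; before submitting the next, so memory pressure stays bounded
  (doseq [batch (partition-all 1000 docs)]
    (xt/await-tx node
                 (xt/submit-tx node (mapv (fn [doc] [::xt/put doc]) batch)))))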
I do have Lucene enabled. I'm not sure how many it filters down to, but I'm guesstimating ~10k puts. I do something like:
(let [transactions (reduce (fn [acc team]
                             (conj acc [::xt/fn :upsert team]))
                           []
                           teams)]
  (xt/submit-tx node transactions))
I removed the whole DB in order to 'cancel' the failed transaction that kept auto-starting again (10k puts). I batched the puts by 100; memory was still high and it went OOM at ~5k, so that's a good improvement. Thank you very much for the fast and on-point response. You should get a crown here on reddit as king of help 🙂
Unfortunately I had to restart my instance (1 GB RAM) quite a few times before this went through to the end (batching did help, for sure). I tried setting the pooling options to idle and max connections to 3 (from the default of 10), but it didn't help. Is XTDB aggressive with RAM usage for this number of records, or am I just using an ultra-small instance to process a very high amount of data?
Hey again @U45SLGVHV - sorry I was AFK most of the rest of the weekend 😅 My hunch is that Lucene may be the bottleneck here... please could you try configuring the :refresh-frequency as per https://docs.xtdb.com/extensions/full-text-search/#_parameters (e.g. to PT5M, or even PT-1S)?
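For example, a minimal sketch of the relevant config section, per the docs page above (the :db-dir path is hypothetical):
;; Lucene store with a relaxed refresh interval instead of the PT0S default
{:xtdb.lucene/lucene-store {:db-dir "/var/lib/xtdb/lucene"
                            :refresh-frequency "PT5M"}}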
> is there a way for me to 'cancel' transaction? In short, there isn't a way to do this out-of-the-box today. However, if really needed, you could always manually mutate the underlying tx-log
if the :refresh-frequency change works, then you may need to separate the bulk-ingestion/replaying phase of node startup from online usage, by closing the node once the bulk-load is ~finished, and then reopening the node with the default configuration (since refresh-frequency PT0S is needed for full consistency with the main KV indexes)
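A sketch of that two-phase startup, assuming hypothetical bulk-config and online-config maps that differ only in the Lucene :refresh-frequency:
(require '[xtdb.api :as xt])

;; phase 1: bulk-load with the relaxed :refresh-frequency, then close the node
(with-open [node (xt/start-node bulk-config)] ; bulk-config is hypothetical
  (doseq [batch (partition-all 1000 docs)]
    (xt/submit-tx node (mapv (fn [doc] [::xt/put doc]) batch)))
  (xt/sync node)) ; block until the indexes have caught up

;; phase 2: reopen with the default (PT0S) configuration for online usage
(def node (xt/start-node online-config)) ; online-config is hypothetical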
Oh, thinking about it again, I should have asked about Rocks and node/memory configs. Are you using Rocks? 🙂 Is your -Xmx configured to use all the memory available, or is there much native memory available for Rocks to use for its own allocations?
(defn create-xtdb-config [{:keys [xtdb]}]
  {:xtdb.jdbc/connection-pool {:dialect {:xtdb/module 'xtdb.jdbc.psql/->dialect}
                               :db-spec (:db-spec xtdb)}
   :xtdb/tx-log {:xtdb/module 'xtdb.jdbc/->tx-log
                 :connection-pool :xtdb.jdbc/connection-pool}
   :xtdb/document-store {:xtdb/module 'xtdb.jdbc/->document-store
                         :connection-pool :xtdb.jdbc/connection-pool}
   :xtdb/index-store {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                                 :db-dir (:index-dir xtdb)}}
   :xtdb.lucene/lucene-store {:db-dir (:lucene-dir xtdb)}
   :xtdb-inspector.metrics/reporter {}})
This is the whole config I am using.
yeah - use all memory available
if you have a profiler running (e.g. https://www.yourkit.com/) then it may be even quicker to take a direct look at the threads
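If Rocks' native allocations do turn out to be the culprit, one option - a sketch based on the shared LRU block cache described in the XTDB RocksDB storage docs, with an arbitrary 128 MB :cache-size chosen for a 1 GB instance - would be to bound the block cache explicitly:
;; share one bounded block cache across the Rocks KV store(s)
{:xtdb.rocksdb/block-cache {:xtdb/module 'xtdb.rocksdb/->lru-block-cache
                            :cache-size (* 128 1024 1024)} ; example size
 :xtdb/index-store {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                               :db-dir (:index-dir xtdb)
                               :block-cache :xtdb.rocksdb/block-cache}}}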