
I am having an issue. I'm pulling a lot of data from an external API and putting some of it into my XTDB instance, but unfortunately the service keeps falling over (small AWS server). Is there a way for me to optimize this so the ::xt/put operations are done in batches or something, so that the service doesn't fall over? (I am basically doing ::xt/put for the data I filtered.) Locally it's fine, but on the small server... not really, and it now seems stuck in a transaction loop, because when I restart the server it tries to complete the transaction again.


Hey @U45SLGVHV roughly how many puts have you issued per transaction? Do you have xtdb-lucene enabled?


There are various examples in the repo of how we batch puts - we typically use batches of 1000 for bulk loading


I do have Lucene enabled. I'm not sure how many it filters, but I'm guesstimating ~10k puts. I do something like:

(let [transactions (reduce (fn [acc team]
                               (conj acc [::xt/fn :upsert team])) [] teams)]
    (xt/submit-tx node transactions))
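A batched version of the submission above might look like this (a sketch, not from the thread; it assumes the same `node`, `teams`, and `:upsert` transaction function, and uses the batches of 1000 suggested above):

```clojure
(require '[xtdb.api :as xt])

;; Sketch: split the upserts into batches of 1000 and submit each batch
;; as its own transaction, awaiting each one so the indexer keeps up and
;; memory pressure stays bounded. `node`, `teams`, and the :upsert
;; transaction function are assumed from the snippet above.
(doseq [batch (partition-all 1000 teams)]
  (->> batch
       (mapv (fn [team] [::xt/fn :upsert team]))
       (xt/submit-tx node)
       (xt/await-tx node)))
```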


how do I prevent the transaction from re-attempting on each start-up?


is there a way for me to 'cancel' a transaction?


I removed the whole DB in order to 'cancel' the failed transaction that would auto-start again (10k puts). I batched the puts by 100; memory was high, and then it went OOM at ~5k, so that's a good improvement. Thank you very much for the fast and on-point response. You should get a crown here on reddit as king of help 🙂


I unfortunately had to restart my instance (1 GB RAM) quite a few times before this went through to the end (batching did help, for sure). I tried setting the pooling options to 3 idle and 3 max connections (down from the default 10), but it didn't help. Is XTDB aggressive with RAM usage for this number of records, or am I just using an ultra-small instance to process a very large amount of data?
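For reference, the pool sizing mentioned here is normally passed via :pool-opts on the JDBC connection pool (a sketch; the keys are HikariCP config names, and `db-spec` stands in for the existing :db-spec):

```clojure
;; Sketch: shrinking the HikariCP pool under :xtdb.jdbc/connection-pool.
{:xtdb.jdbc/connection-pool
 {:dialect {:xtdb/module 'xtdb.jdbc.psql/->dialect}
  :pool-opts {:maximumPoolSize 3  ; max connections (default 10)
              :minimumIdle 3}     ; connections kept idle
  :db-spec db-spec}}              ; your existing db-spec
```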


Hey again @U45SLGVHV - sorry, I was AFK for most of the rest of the weekend 😅 My hunch is that Lucene may be the bottleneck here... please could you try configuring the :refresh-frequency as per (e.g. to PT5M, or even PT-1S)?
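For reference, :refresh-frequency sits on the Lucene module config; a sketch (durations are ISO-8601 strings, and the path and values here are illustrative):

```clojure
;; Sketch: relax the Lucene searcher refresh during bulk loading.
;; PT5M = refresh every 5 minutes; a negative duration such as PT-1S
;; disables the periodic refresh entirely.
{:xtdb.lucene/lucene-store {:db-dir "/var/xtdb/lucene" ; illustrative path
                            :refresh-frequency "PT5M"}}
```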


> is there a way for me to 'cancel' transaction?

In short, there isn't a way to do this out-of-the-box today. However, if really needed, you could always manually mutate the underlying tx-log


if the :refresh-frequency change works, then you may need to separate the bulk-ingestion/replaying phase of node startup from online usage, by closing the node once the bulk-load is ~finished, and then reopening the node with the default configuration (since refresh-frequency PT0S is needed for full consistency with the main KV indexes)
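That two-phase startup might be sketched like this (the names `bulk-load-config`, `default-config`, and `bulk-load!` are hypothetical; the only difference between the two configs would be the Lucene :refresh-frequency):

```clojure
(require '[xtdb.api :as xt])

;; Phase 1 (sketch): bulk load with a relaxed Lucene refresh-frequency,
;; wait for indexing to catch up, then close the node.
(with-open [node (xt/start-node bulk-load-config)]
  (bulk-load! node)  ; hypothetical ingestion fn
  (xt/sync node))

;; Phase 2: reopen with the default config (refresh-frequency PT0S)
;; for full consistency with the main KV indexes.
(def node (xt/start-node default-config))
```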


Oh, thinking about it again, I should have asked about Rocks and node/memory configs. Are you using Rocks? 🙂 Is your -Xmx configured to use all the memory available, or is there much native memory available for Rocks to use for its own allocations?


(defn create-xtdb-config [{:keys [xtdb]}]
  {:xtdb.jdbc/connection-pool {:dialect {:xtdb/module 'xtdb.jdbc.psql/->dialect}
                               :db-spec (:db-spec xtdb)}

   :xtdb/tx-log {:xtdb/module 'xtdb.jdbc/->tx-log
                 :connection-pool :xtdb.jdbc/connection-pool}

   :xtdb/document-store {:xtdb/module 'xtdb.jdbc/->document-store
                         :connection-pool :xtdb.jdbc/connection-pool}

   :xtdb/index-store {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                                 :db-dir (:index-dir xtdb)}}

   :xtdb.lucene/lucene-store {:db-dir (:lucene-dir xtdb)}

   :xtdb-inspector.metrics/reporter {}})
this is the whole config I am using. And yeah - it's set to use all the memory available
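If native memory for Rocks turns out to be the issue, one lever (a sketch based on the XTDB RocksDB module; the size and path are illustrative) is to cap Rocks' block cache explicitly alongside a lower -Xmx:

```clojure
;; Sketch: an explicit LRU block cache for RocksDB, so its native
;; allocations stay bounded and predictable on a small instance.
{:xtdb.rocksdb/block-cache {:xtdb/module 'xtdb.rocksdb/->lru-block-cache
                            :cache-size (* 128 1024 1024)} ; 128 MB, illustrative
 :xtdb/index-store {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                               :db-dir "/var/xtdb/indexes" ; illustrative path
                               :block-cache :xtdb.rocksdb/block-cache}}}
```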


Xmx1024m, on a t3.small (2 GB RAM)


hmm, maybe try Xmx600m and see if that's stable :thinking_face:


although changing the Lucene config might be easier


if you have a profiler running (e.g. YourKit) then it may be even quicker to take a direct look at the threads


have you seen any memory/cpu logs/graphs that correlate?


I don't have any monitoring on that instance, unfortunately. First time I've heard about YourKit - thanks, I will check it out
