hello...I'm trying out v2 locally for the first time and I'm running into some issues with remote storage in an in-process node. It's likely user error, but not sure how to troubleshoot further. Details in 🧵
hey @chuck.cassel: so you don't need to worry about compacting - the data's already durable before it goes through the compaction process
re finish-block: don't call that one as it only finishes the block on the local node - if you're running a multi-node cluster this will then develop inconsistencies between the different nodes.
We do have an (as yet undocumented, see https://github.com/xtdb/xtdb/issues/4553) endpoint on the healthz service which you can use - POST /system/finish-block - this sends a message through the tx-log and so ensures that all of the nodes agree on the block boundary.
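A minimal sketch of calling that endpoint from Clojure with the JDK's built-in HTTP client. Only the method and path (POST /system/finish-block) come from the message above; the base URL and the empty request body are assumptions, so adjust them to wherever your node's healthz service actually listens.

```clojure
(import '(java.net URI)
        '(java.net.http HttpClient HttpRequest HttpRequest$BodyPublishers
                        HttpResponse$BodyHandlers))

;; ASSUMPTION: host and port of the node's healthz service; not stated in the thread.
(def healthz-base-url "http://localhost:8080")

(defn finish-block-request
  "Builds (but does not send) a body-less POST to /system/finish-block."
  ^java.net.http.HttpRequest [base-url]
  (-> (HttpRequest/newBuilder (URI/create (str base-url "/system/finish-block")))
      (.POST (HttpRequest$BodyPublishers/noBody)) ; ASSUMPTION: no request body needed
      (.build)))

(defn finish-block!
  "Sends the request and returns the HTTP status code."
  [base-url]
  (let [client   (HttpClient/newHttpClient)
        response (.send client
                        (finish-block-request base-url)
                        (HttpResponse$BodyHandlers/discarding))]
    (.statusCode response)))
```

Usage would be `(finish-block! healthz-base-url)` against a running node; since the endpoint is not documented yet, its response format may change.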
thanks again, I appreciate the help with all of the odd questions, especially since I just sort of barged in and started building something while you all were in the midst of a major release.
I promise for every one I sent, I rubber-ducked a half-dozen more in the process of typing them out
haha yep, no problem at all - tbh, for me, it's really nice to see people building things on top of XT again, after having spent the last couple of years largely heads-down in RnD mode on XT2
My deps:
{:deps {org.clojure/clojure {:mvn/version "1.12.0"}
        ...
        com.xtdb/xtdb-api {:mvn/version "2.0.0-beta8.1"}
        com.xtdb/xtdb-core {:mvn/version "2.0.0-beta8.1"}
        com.xtdb/xtdb-aws {:mvn/version "2.0.0-beta8.1"}
        com.xtdb/xtdb-kafka {:mvn/version "2.0.0-beta8.1"}}}
Sample code:
(require '[xtdb.api :as xt]
         '[xtdb.node :as xtn])

(with-open [node (xtn/start-node {:log [:kafka
                                        {:bootstrap-servers "localhost:9092"
                                         :topic "xtdb-log"
                                         :create-topic? true}]
                                  :storage [:remote
                                            {:object-store [:s3
                                                            {:bucket "xtdb"
                                                             :credentials {:access-key "minioadmin"
                                                                           :secret-key "minioadmin"}
                                                             :endpoint ""}]
                                             :local-disk-cache "./xtdb-cache"}]})]
  (xt/execute-tx node [[:put-docs :testing {:xt/id "id" :name "Test" :seen-at (System/currentTimeMillis)}]])
  (println (xt/status node))
  (println (xt/q node ["SELECT * FROM testing"])))
; {:latest-completed-tx #xt/tx-key {:tx-id 13, :system-time #xt/instant "2025-06-01T01:28:27.273Z"}, :latest-submitted-tx-id 13}
; [{:xt/id id, :name Test, :seen-at 1748741307258}]
bash
> mc ls --recursive local/tansu/clusters/xtdbtest/topics/xtdb-log
[2025-05-31 20:37:23 EDT] 1.0KiB STANDARD partitions/0000000000/records/00000000000000000000.batch
[2025-05-31 20:49:02 EDT] 1.1KiB STANDARD partitions/0000000000/records/00000000000000000001.batch
[2025-05-31 20:49:13 EDT] 1.1KiB STANDARD partitions/0000000000/records/00000000000000000002.batch
[2025-05-31 20:58:00 EDT] 1.0KiB STANDARD partitions/0000000000/records/00000000000000000003.batch
[2025-05-31 20:58:18 EDT] 1.0KiB STANDARD partitions/0000000000/records/00000000000000000004.batch
[2025-05-31 21:03:35 EDT] 1.0KiB STANDARD partitions/0000000000/records/00000000000000000005.batch
[2025-05-31 21:04:04 EDT] 1.0KiB STANDARD partitions/0000000000/records/00000000000000000006.batch
[2025-05-31 21:05:37 EDT] 1.0KiB STANDARD partitions/0000000000/records/00000000000000000007.batch
[2025-05-31 21:06:31 EDT] 1.0KiB STANDARD partitions/0000000000/records/00000000000000000008.batch
[2025-05-31 21:07:00 EDT] 1.0KiB STANDARD partitions/0000000000/records/00000000000000000009.batch
[2025-05-31 21:16:48 EDT] 1020B STANDARD partitions/0000000000/records/00000000000000000010.batch
[2025-05-31 21:20:14 EDT] 1020B STANDARD partitions/0000000000/records/00000000000000000011.batch
[2025-05-31 21:20:42 EDT] 1.1KiB STANDARD partitions/0000000000/records/00000000000000000012.batch
[2025-05-31 21:28:27 EDT] 1.1KiB STANDARD partitions/0000000000/records/00000000000000000013.batch
[2025-05-31 21:28:27 EDT] 22B STANDARD partitions/0000000000/watermark.json
> mc ls --recursive local/xtdb
# whomp whomp
>
I can see in the MinIO logs that xtdb is connecting successfully and scanning for objects under v06/blocks/ (this happens concurrently with tansu activity on the tx-log topic) but the bucket is empty and there are no PutObject or CreateMultipartUpload calls that I can see.
2025-05-31T21:20:42.656 [200 OK] s3.ListObjectsV2 127.0.0.1:9000/xtdb?list-type=2&prefix=v06%2Fblocks%2F 172.23.0.1 1.23ms ⇣ 1.205793ms ↑ 115 B ↓ 246 B
I'm not sure whether I'm misunderstanding what should be happening here (i.e. maybe I shouldn't expect to see objects here yet, or maybe the node just needs to stay up longer after the txn) or whether there is an issue with my configuration.
some other beginner feedback:
• obvious in hindsight, but it would be helpful if the storage docs mentioned that you need to add the dependency.
• log docs mention the need to add the kafka module, but my above config actually "worked" without adding the dependency: my local node would execute transactions and return query results without reporting any errors in my repl, but was not communicating with the broker or persisting the log.
Hey @chuck.cassel objects only get pushed to the object storage component (whether local disk, MinIO or otherwise) once a certain amount of novelty has accumulated (~100K rows by default) or every 4 hours (configurable). Did you wait longer than 4 hours?
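To make the rule above concrete, here is a rough sketch of the "rows or time" check in plain Clojure. It only illustrates the described behaviour and is not XTDB's code: `default-policy`, `flush-due?` and the map keys are hypothetical names, and the thresholds are the approximate defaults quoted in the message.

```clojure
(import '(java.time Duration))

;; ASSUMPTION: hypothetical names; values are the approximate defaults quoted above.
(def default-policy
  {:rows-per-block 100000
   :flush-duration (Duration/ofHours 4)})

(defn flush-due?
  "True once enough rows have accumulated or enough time has passed."
  [{:keys [rows-per-block flush-duration]} pending-rows ^Duration since-last-flush]
  (or (>= pending-rows rows-per-block)
      (not (neg? (.compareTo since-last-flush ^Duration flush-duration)))))

;; One row written a few minutes ago: nothing is due yet.
(flush-due? default-policy 1 (Duration/ofMinutes 10))
```

Under this reading, a single small transaction followed by a short wait would leave the bucket empty, which matches what was observed.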
Thanks, I hadn't at the time. I ended up spinning up an xtdb-aws image in my compose file and was seeing the same thing, but it's been running overnight, and I see now that the bucket is populated.
Cool, glad to hear. And thanks for the feedback.
These aren't public APIs currently, so we may break any assumptions surrounding them in future - better to raise an issue and chat about what could be done officially. For now you could instead try starting the node again after an initial shutdown, but with flushDuration set to 1ms in the config, which will force a flush, then shut it down again. Forcing/awaiting compaction isn't possible yet. @jarohen may have other suggestions. E.g. https://github.com/xtdb/xtdb/blob/e28ad5be974804eedfc1801204c1e6cd1e368229/src/test/resources/test-config.yaml#L4
so, my idea is to have a 'transient' xtdb system using in-process Clojure nodes with S3 for storage and S2 for the log, where all writes will come from triggered or scheduled tasks doing bulk imports/updates and readers will come online ad-hoc. Since I'll know when writes are complete and I want to minimize the time it takes for new readers to come online and eliminate any possibility of log data loss if nodes are offline for an extended period, I would like to have the update nodes ensure the log is flushed to storage before they terminate.
based on my spelunking, I think I might be able to accomplish this by calling live-index/finish-block! and compactor/compact-all! the way xtdb-bench does...would this a) work, b) lead to any degenerate behavior over time?
what would be the process for extending Log from clojure for use with xtn/start-node? Trying to look through the code, but I'm completely ignorant when it comes to Kotlin, so not sure how everything maps from Kotlin<->JVM<->Clojure.
Would it be sufficient to add a new (defmethod xtn/apply-config! :my/log-imp ...) that calls .setLog with a reified Log implementation? Do I need to implement a Factory, or only if I want to serialize/deserialize the config?
Goal was to implement a Log on top of https://s2.dev/docs/stream
so, I think I'm very close. I ran into an issue with the S2 Java SDK not returning timestamps in subscriptions, but I put together something using the REST API and the world's jankiest SSE client implementation. My local node starts and submits a message to the stream for each transaction I execute, and my subscription receives them and tries to parse and forward to the subscriber, but my execute-tx calls never complete, so I'm assuming the subscriber doesn't like something about the records I'm sending. https://gist.github.com/casselc/f5382146f9413a7c34ce314c094080ac
neat
nothing immediately jumps out - maybe try upping the log level in XT - something like XTDB_LOGGING_LEVEL_INDEXER=debug?
it works!
;; bad
(-> response :body charred/read-json (get-in ["tail" "seq_num"] dec))
;; good
(-> response :body charred/read-json (get-in ["tail" "seq_num"]) dec)
also, the devs published a new version of the Java SDK for me today with timestamp support, so I'll try to get that working too. It uses protobuf, so I won't have to deal with the base64 encoding overhead I have in the http version.
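For anyone puzzled by the bad/good pair above: with `dec` inside the `get-in` form it becomes the third argument, which `get-in` treats as the not-found default, so it is never applied to the value. A small self-contained illustration, using a plain map as a stand-in for the parsed JSON (the value 14 is made up):

```clojure
;; Stand-in for the parsed response body; 14 is an arbitrary example value.
(def body {"tail" {"seq_num" 14}})

;; bad: dec is passed to get-in as the not-found default and is never called
(def bad (get-in body ["tail" "seq_num"] dec))

;; good: get-in returns the value, then dec is applied to it
(def good (-> body (get-in ["tail" "seq_num"]) dec))

;; if the key is missing, the bad form returns the dec function itself
(def bad-missing (get-in {} ["tail" "seq_num"] dec))
```

Here `bad` is 14 (off by one, with no error) while `good` is 13, which is why the misplaced paren was easy to miss.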