xtdb

chucklehead 2025-06-01T01:43:16.687199Z

hello...I'm trying out v2 locally for the first time and I'm running into some issues with remote storage in an in-process node. It's likely user error, but not sure how to troubleshoot further. Details in 🧵

jarohen 2025-06-25T09:38:40.823499Z

hey @chuck.cassel: so you don't need to worry about compacting - the data's already durable before it goes through the compaction process. Re finish-block: don't call that one, as it only finishes the block on the local node - if you're running a multi-node cluster, this will then cause inconsistencies between the different nodes. We do have an (as yet https://github.com/xtdb/xtdb/issues/4553) endpoint on the healthz service which you can use - POST /system/finish-block - this sends a message through the tx-log and so ensures that all of the nodes agree on the block boundary.
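
A minimal sketch of calling that endpoint from Clojure with the JDK's built-in HTTP client. Only the `POST /system/finish-block` path comes from the message above; the base URL (`http://localhost:8080` in the comment) is an assumption about where your node's healthz service listens, and since the response format isn't described here, the sketch only returns the status code.

```clojure
(import '(java.net URI)
        '(java.net.http HttpClient HttpRequest HttpRequest$BodyPublishers
                        HttpResponse$BodyHandlers))

(defn finish-block-request
  "Builds a body-less POST request for the finish-block endpoint.
   `base-url` is an assumption: wherever your healthz service is listening."
  [base-url]
  (-> (HttpRequest/newBuilder (URI/create (str base-url "/system/finish-block")))
      (.POST (HttpRequest$BodyPublishers/noBody))
      (.build)))

(defn finish-block!
  "Sends the request and returns the HTTP status code."
  [base-url]
  (let [client (HttpClient/newHttpClient)
        resp   (.send client
                      (finish-block-request base-url)
                      (HttpResponse$BodyHandlers/discarding))]
    (.statusCode resp)))

(comment
  ;; hypothetical host/port; adjust to your deployment
  (finish-block! "http://localhost:8080"))
```

Building the request separately from sending it makes it easy to inspect the URL and method at the REPL before pointing it at a real node.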

chucklehead 2025-06-26T03:17:35.190529Z

thanks again, I appreciate the help with all of the odd questions, especially since I just sort of barged in and started building something while you all were in the midst of a major release.

chucklehead 2025-06-26T03:18:39.315949Z

I promise for every one I sent, I rubber-ducked a half-dozen more in the process of typing them out

jarohen 2025-06-26T06:20:46.956049Z

haha 😂 yep, no problem at all - tbh, for me, it's really nice to see people building things on top of XT again, after having spent the last couple of years largely heads-down in RnD mode on XT2

chucklehead 2025-06-01T01:44:25.914389Z

My deps:

{:deps {org.clojure/clojure {:mvn/version "1.12.0"}
        ...
        com.xtdb/xtdb-api {:mvn/version "2.0.0-beta8.1"}
        com.xtdb/xtdb-core {:mvn/version "2.0.0-beta8.1"}
        com.xtdb/xtdb-aws {:mvn/version "2.0.0-beta8.1"}
        com.xtdb/xtdb-kafka {:mvn/version "2.0.0-beta8.1"}}}
Sample code:
(with-open [node (xtn/start-node {:log [:kafka
                                        {:bootstrap-servers "localhost:9092"
                                         :topic "xtdb-log"
                                         :create-topic? true}]
                                  :storage [:remote
                                            {:object-store [:s3
                                                            {:bucket "xtdb"
                                                             :credentials {:access-key "minioadmin"
                                                                           :secret-key "minioadmin"}
                                                             :endpoint ""}]
                                             :local-disk-cache "./xtdb-cache"}]})]
  (xt/execute-tx node [[:put-docs :testing {:xt/id "id" :name "Test" :seen-at (System/currentTimeMillis)}]])
  (println (xt/status node))
  (println (xt/q node ["SELECT * FROM testing"])))

; {:latest-completed-tx #xt/tx-key {:tx-id 13, :system-time #xt/instant "2025-06-01T01:28:27.273Z"}, :latest-submitted-tx-id 13}
; [{:xt/id id, :name Test, :seen-at 1748741307258}]
> mc ls --recursive local/tansu/clusters/xtdbtest/topics/xtdb-log
[2025-05-31 20:37:23 EDT] 1.0KiB STANDARD partitions/0000000000/records/00000000000000000000.batch
[2025-05-31 20:49:02 EDT] 1.1KiB STANDARD partitions/0000000000/records/00000000000000000001.batch
[2025-05-31 20:49:13 EDT] 1.1KiB STANDARD partitions/0000000000/records/00000000000000000002.batch
[2025-05-31 20:58:00 EDT] 1.0KiB STANDARD partitions/0000000000/records/00000000000000000003.batch
[2025-05-31 20:58:18 EDT] 1.0KiB STANDARD partitions/0000000000/records/00000000000000000004.batch
[2025-05-31 21:03:35 EDT] 1.0KiB STANDARD partitions/0000000000/records/00000000000000000005.batch
[2025-05-31 21:04:04 EDT] 1.0KiB STANDARD partitions/0000000000/records/00000000000000000006.batch
[2025-05-31 21:05:37 EDT] 1.0KiB STANDARD partitions/0000000000/records/00000000000000000007.batch
[2025-05-31 21:06:31 EDT] 1.0KiB STANDARD partitions/0000000000/records/00000000000000000008.batch
[2025-05-31 21:07:00 EDT] 1.0KiB STANDARD partitions/0000000000/records/00000000000000000009.batch
[2025-05-31 21:16:48 EDT]  1020B STANDARD partitions/0000000000/records/00000000000000000010.batch
[2025-05-31 21:20:14 EDT]  1020B STANDARD partitions/0000000000/records/00000000000000000011.batch
[2025-05-31 21:20:42 EDT] 1.1KiB STANDARD partitions/0000000000/records/00000000000000000012.batch
[2025-05-31 21:28:27 EDT] 1.1KiB STANDARD partitions/0000000000/records/00000000000000000013.batch
[2025-05-31 21:28:27 EDT]    22B STANDARD partitions/0000000000/watermark.json

> mc ls --recursive local/xtdb

# whomp whomp

> 
I can see in the MinIO logs that xtdb is connecting successfully and scanning for objects under v06/blocks/ (this happens concurrently with tansu activity on the tx-log topic) but the bucket is empty and there are no PutObject or CreateMultipartUpload calls that I can see.
2025-05-31T21:20:42.656 [200 OK] s3.ListObjectsV2 127.0.0.1:9000/xtdb?list-type=2&prefix=v06%2Fblocks%2F  172.23.0.1       1.23ms       ⇣  1.205793ms  ↑ 115 B ↓ 246 B
I'm not sure if I'm misunderstanding what should be happening here - i.e. maybe I shouldn't expect to see objects here yet, or maybe the node just needs to stay up longer after the txn - or if there is an issue with my configuration.

chucklehead 2025-06-01T01:52:23.781379Z

some other beginner feedback:
• obvious in hindsight, but it would be helpful if the storage docs mentioned that you need to add the dependency.
• log docs mention the need to add the kafka module, but my above config actually "worked" without adding the dependency and my local node would execute transactions and return query results without reporting any errors in my repl, but was not communicating with the broker or persisting the log.

🙏 1
refset 2025-06-01T14:41:57.113179Z

Hey @chuck.cassel objects only get pushed to the object storage component (whether local disk, MinIO or otherwise) once a certain amount of novelty has accumulated (~100K rows by default) or every 4 hours (configurable). Did you wait longer than 4 hours?
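
To make the rule above concrete, here is a toy model of the "row threshold or time threshold" check. It is not XTDB's code: the map keys and function name are made up for illustration, and the numbers are the approximate defaults quoted in the message (~100K rows, 4 hours).

```clojure
(import '(java.time Duration))

;; Illustrative values from the message above; these are NOT XTDB config keys.
(def default-thresholds
  {:max-rows 100000
   :max-age  (Duration/ofHours 4)})

(defn flush-due?
  "True once either threshold is reached: enough buffered rows,
   or enough time elapsed since the last flush."
  [{:keys [max-rows max-age]} buffered-rows ^Duration since-last-flush]
  (or (>= buffered-rows max-rows)
      (>= (.compareTo since-last-flush ^Duration max-age) 0)))

;; one row, five minutes after startup: nothing is pushed yet
(flush-due? default-thresholds 1 (Duration/ofMinutes 5))   ;; => false

;; the same single row after the node has been up overnight
(flush-due? default-thresholds 1 (Duration/ofHours 5))     ;; => true
```

Under this model, a short-lived node that writes a single document and exits never reaches either threshold, which matches the empty bucket seen earlier in the thread.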

chucklehead 2025-06-01T18:28:37.328619Z

Thanks, I hadn't at the time. I ended up spinning up a xtdb-aws image in my compose file and was seeing the same thing, but it's been running overnight, and I see now that the bucket is populated.

refset 2025-06-01T23:56:09.083789Z

Cool, glad to hear. And thanks for the feedback.

refset 2025-06-24T22:54:06.568949Z

These aren't public APIs currently, so we may break any assumptions surrounding them in future - better to raise an issue and chat about what could be done officially. For now you could instead try starting the node again after an initial shutdown, but change the flushDuration config to 1ms, which will force a flush, then shut down again. Forcing/awaiting compaction isn't possible yet. @jarohen may have other suggestions. E.g. https://github.com/xtdb/xtdb/blob/e28ad5be974804eedfc1801204c1e6cd1e368229/src/test/resources/test-config.yaml#L4

🙏🏻 1
chucklehead 2025-06-24T05:21:36.214559Z

so, my idea is to have a 'transient' xtdb system using in-process Clojure nodes with S3 for storage and S2 for the log, where all writes will come from triggered or scheduled tasks doing bulk imports/updates and readers will come online ad-hoc. Since I'll know when writes are complete and I want to minimize the time it takes for new readers to come online and eliminate any possibility of log data loss if nodes are offline for an extended period, I would like to have the update nodes ensure the log is flushed to storage before they terminate. based on my spelunking, I think I might be able to accomplish this by calling live-index/finish-block! and compactor/compact-all! the way xtdb-bench does...would this a) work, b) lead to any degenerate behavior over time?

chucklehead 2025-06-01T21:06:23.623219Z

what would be the process for extending Log from clojure for use with xtn/start-node? Trying to look through the code, but I'm completely ignorant when it comes to Kotlin, so not sure how everything maps from Kotlin<->JVM<->Clojure. Would it be sufficient to add a new (defmethod xtn/apply-config! :my/log-imp ...) that calls .setLog with a reified Log implementation? Do I need to implement a Factory, or only if I want to serialize/deserialize the config? Goal was to implement a Log on top of https://s2.dev/docs/stream

chucklehead 2025-06-02T08:46:09.777349Z

so, I think I'm very close. I ran into an issue with the S2 Java SDK not returning timestamps in subscriptions, but I put together something using the REST API and the world's jankiest SSE client implementation. My local node starts and submits a message to the stream for each transaction I execute, and my subscription receives them and tries to parse and forward to the subscriber, but my execute-tx calls never complete, so I'm assuming the subscriber doesn't like something about the records I'm sending. https://gist.github.com/casselc/f5382146f9413a7c34ce314c094080ac

jarohen 2025-06-02T10:57:26.438949Z

neat 🙂 👍 nothing immediately jumps out - maybe try upping the log level in XT - something like XTDB_LOGGING_LEVEL_INDEXER=debug?

chucklehead 2025-06-03T00:16:55.829809Z

it works!

;; bad
(-> response :body charred/read-json (get-in ["tail" "seq_num"] dec))
;; good
(-> response :body charred/read-json (get-in ["tail" "seq_num"]) dec)
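
For anyone skimming: the difference is where the closing paren lands. `get-in` takes an optional third argument that is returned when the path is missing, so in the bad version `dec` is passed as that default and never called. A small sketch, with a literal map standing in for the parsed response body (the `42` is made up):

```clojure
(def body {"tail" {"seq_num" 42}})   ; stand-in for the parsed JSON

;; `dec` is only the not-found default here, so the value comes back undecremented
(get-in body ["tail" "seq_num"] dec)         ;; => 42

;; look up the path first, then decrement the result
(-> body (get-in ["tail" "seq_num"]) dec)    ;; => 41

;; if the path were missing, the bad form would return the `dec` function itself
(fn? (get-in {} ["tail" "seq_num"] dec))     ;; => true
```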

chucklehead 2025-06-03T00:20:09.270719Z

also, the devs published a new version of the Java SDK for me today with timestamp support, so I'll try to get that working too. It uses protobuf so won't have to deal with the base64 encoding overhead I have in the http version.