#xtdb
2022-05-27
tatut 11:05:26

Probably only interesting as a curiosity for Clojurians, but I've been dabbling with an XTDB client for Smalltalk: https://github.com/tatut/pharo-XTDB. It's partly inspired by the Kotlin DSL, but Smalltalk is even better suited to that kind of thing. It uses the HTTP API, which works great.

Marz Drel 13:05:22

Hey there, I’m checking out XTDB at the moment. I don’t really have Java or Clojure experience, but I’ve managed to get things going with Kafka as both the tx-log and the document store. Is this a good setup for some research, or should I move to JDBC instead? Also, it looks like I need to call sync after a restart, which seems to be expected. Do you suggest using RocksDB as the index store in this setup?

jarohen 13:05:58

hey @U0143L93962 👋

> Is this a good setup for some research or should I move to JDBC instead?

It's heavily dependent on your context - given a choice, Kafka is more suitable for the tx-log and JDBC is more suitable for the doc-store, but it's also a perfectly reasonable decision to put both in the same place if that makes management easier.

> Do you suggest to use RocksDB as index-store in this setup?

It's certainly a good default choice, yep 🙂

> Also - looks like I need to call sync after restart, which seems to be expected.

This will ensure that XT has indexed all of the transactions currently on the tx-log. If you only need to see the results of a specific transaction, await-tx may well return sooner.

refset 13:05:01

Hey @U0143L93962, if you have it working already, I think you're all set for doing some research; no need to worry about JDBC at this stage (IMO). The JDBC document store has some advantages, but Kafka makes for a faster tx-log. Rocks is a good index-store choice, yep, and it implements compression (which LMDB doesn't).

jarohen 13:05:39

heh 🏃

Marz Drel 13:05:55

Looks like if I don’t call sync after a restart, the data isn’t available at all. All queries return nil unless I call sync. 🤔

Marz Drel 14:05:34

Hmm, I figured out that the problem is with the index store. If I use RocksDB, I can’t read the data back unless I remove the index directory. Then the data is visible again, but only for that session. This doesn’t happen if the index store isn’t defined (i.e. when using the in-memory KV store)… 🙂

Marz Drel 14:05:43

Sorry for spamming you all with my problems. 😄

jarohen 14:05:44

ah, yep - so if you use an in-memory index store, then it'll have to rebuild the indices on restart (hence why you're seeing no data initially) - sync ensures this process is up-to-date before it returns

jarohen 14:05:21

> If I use RocksDB, then I cannot read the data back unless I remove the index directory.

hmm, this shouldn't be necessary - RocksDB should persist the indices between runs

Marz Drel 14:05:41

Yeah, I don’t know if I misconfigured something.

jarohen 14:05:14

are you able to share your setup?

Marz Drel 14:05:30

(require '[clojure.java.io :as io]
         '[xtdb.api :as xt])

(def ^xtdb.api.IXtdb xtdb-node
  (xt/start-node
    {:local-rocksdb {:xtdb/module 'xtdb.rocksdb/->kv-store :db-dir (io/file "data/dev/index-store")}
     :xtdb.kafka/kafka-config {:bootstrap-servers "localhost:9092"}
     :xtdb/tx-log {:xtdb/module 'xtdb.kafka/->tx-log
                   :kafka-config :xtdb.kafka/kafka-config
                   :tx-topic-opts {:topic-name "tx-1" :replication-factor 1}}
     :xtdb/document-store {:xtdb/module 'xtdb.kafka/->document-store
                           :kafka-config :xtdb.kafka/kafka-config
                           :doc-topic-opts {:topic-name "doc-1" :replication-factor 1}}
     :xtdb/index-store {:kv-store :local-rocksdb}
     :xtdb.http-server/server {:port 3000}}))

Marz Drel 14:05:06

Not sure what’s the preferred way to do that here. This might be hard to read in a thread. 🤔

jarohen 14:05:16

are you explicitly closing the node between sessions? otherwise, RocksDB will leave a lock file in the directory and prevent it being opened again

Marz Drel 14:05:40

Yeah, I’ve tried (.close xtdb-node). Should that be enough?

Marz Drel 14:05:07

Looks like this doesn’t help. The code basically looks like this:

(defn -main
  [& args]
  (xt/sync xtdb-node)
  (println (xt/entity (xt/db xtdb-node) 1))
  (.close xtdb-node))

Marz Drel 14:05:02

When I remove the index directory, it shows the data on the first run, but not on subsequent ones; only on the first run after clearing the index data. 🤔

jarohen 14:05:56

that might not close the XT node if there's an error - I'd probably turn xtdb-node into a start-node function, and then call it using with-open in the -main function:

(defn start-node ^xtdb.api.IXtdb []
  (xt/start-node {...}))

(defn -main [& args]
  (with-open [xtdb-node (start-node)]
    (xt/sync xtdb-node)
    (println ...)))

Marz Drel 14:05:14

Hmm. The LOCK file is still there after execution finishes, but I can’t see the data even if the LOCK is removed by hand. Lemme try this.

refset 14:05:59

are all the module versions configured as "1.21.0" in your deps?

Marz Drel 14:05:31

Hmm, the same thing happens. With the data removed, it shows the record, but the LOCK persists after the program finishes. Removing the file by hand doesn’t change the output.

Marz Drel 14:05:05

@U899JBRPF

% clj -X:deps tree|grep xtdb
com.xtdb/xtdb-http-server 1.21.0
  X com.xtdb/xtdb-core 1.21.0 :use-top
  . pro.juxt.clojars-mirrors.xtdb/xtdb-http-server-deps 0.0.2
com.xtdb/xtdb-rocksdb 1.21.0
  X com.xtdb/xtdb-core 1.21.0 :use-top
com.xtdb/xtdb-core 1.21.0
com.xtdb/xtdb-kafka 1.21.0
  X com.xtdb/xtdb-core 1.21.0 :use-top

refset 15:05:46

have you tried rebooting (that should definitely release locks!)? which OS are you using?

Marz Drel 15:05:43

I’ll try. I am on macOS 15.x 12.2.1 on M1.

Marz Drel 17:05:36

@U899JBRPF Still got the same issue after rebooting. 🤔 I can just disable it, but I’m happy to help debug this if there’s anything else I can try. I’d never used RocksDB before, but I’ve fiddled a bit with ldb and everything looks OK; I can query the data just fine. 🤔

Marz Drel 19:05:00

When I try to run the code twice, I get a proper error saying that this instance is holding a lock:

Execution error (RocksDBException) at org.rocksdb.RocksDB/open (RocksDB.java:-2).
While lock file: /Users/jdoe/code/clojure/xtdbdemo/data/dev/index-store/LOCK: Resource temporarily unavailable

Hukka 19:05:50

I don't know RocksDB other than through XTDB, but that error doesn't even sound like it sees the lock; it sounds like the operating system can't do what is requested right away

Marz Drel 19:05:08

You mean the one I pasted just above? That one is fine: it happens when an existing lock is held by another process. My issue is that when I use RocksDB to hold the index, I can’t query any data after I restart the app.

Hukka 19:05:15

Is it fine? The lock doesn't seem to be deleted, or for some reason the operating system won't allow creating a new lock file even if it was

Marz Drel 19:05:30

In the above case, another process is already using this data directory. My problem is that I can’t query data, but there’s no error whatsoever.

Hukka 19:05:44

The whole error seems a bit surprising when talking about a file. Why would Rocks request it in non-blocking mode?

Marz Drel 19:05:17

From a user’s perspective, this makes total sense: this DB is already in use. 🤷

Hukka 19:05:24

😦 Why, and what other process is using it?

Marz Drel 19:05:11

Because I ran my app twice to test what would happen. The first instance locked the DB data, and the second instance can’t acquire the lock anymore.

Hukka 19:05:41

I understood that you don't run them at the same time, i.e. the first run will have finished before the second starts?

Marz Drel 19:05:36

I just did that to test. I left the process running, to query the store via HTTP.

Marz Drel 19:05:53

The same thing happens when I try to write using the CLI tool:

% rocksdb_ldb --db=data/dev/index-store put a b
Failed: IO error: While lock file: data/dev/index-store/LOCK: Resource temporarily unavailable

Marz Drel 19:05:02

(When the process is running of course)

Hukka 19:05:08

I see. I thought you just ran the with-open block twice, as it should have closed and cleaned up after each run

Marz Drel 19:05:38

The code is just this at this point:

(defn -main [& args]
  (xt/sync xtdb-node))

Hukka 19:05:04

I don't know if that's really feasible to test, but does the same code work in Linux?

Hukka 19:05:09

At first, or second or third, glance, there isn't anything suspicious in the code, and rocksdb is the usual choice for an index store

Hukka 19:05:01

Or perhaps an x86 mac. No idea how much these things have been tested with ARM, although officially it should be supported now

Hukka 19:05:39

Well, I don't mean "gets support" but rather that XTDB 1.21 depends on libraries that have been updated and said to work on M1 🙄

Marz Drel 20:05:25

I had some issues related to M1, but solved them by manually bumping some versions. I’ll check on Linux but this will take a while, as I have to install Clojure and other tools. 🙂

Marz Drel 21:05:27

Hmm, no changes on Linux.

Marz Drel 21:05:42

Still exactly same behavior.

Hukka 08:05:30

Hmh. I haven't noticed any problems with rocksdb on linux, but I also don't use Kafka. But I've understood that Kafka is common even as a doc store, so this all sounds surprising

Marz Drel 08:05:06

Yeah, I’m not really sure what’s special or different about this particular setup. It’s basically a demo app.

Marz Drel 08:05:54

I’ll try writing a simple app with only RocksDB stores and see if there are any issues.

jarohen 08:05:57

@U0143L93962 are you able to pop the app up as a tmp github repo, we'll see if we can repro it here?

Marz Drel 11:05:31

Just as a side note - I switched to RocksDB for documents and tx-log (all in different directories) and this issue is no longer there.

Hukka 05:05:36

What if you keep just one, tx or docs, in kafka?

jarohen 08:05:43

hey @U0143L93962, thanks for the repo, I can reproduce it locally 😐 will have a look and see if I can work out what's going on

jarohen 09:05:32

hey, have found the issue. tl;dr: it's a gotcha in XTDB's Kafka document store configuration - the best thing to do for now is to configure a RocksDB local document store as per our docs: https://docs.xtdb.com/storage/1.21.0/kafka/

Kafka's document store uses a local document store for fast random access, which defaults to an in-memory KV store. Unfortunately, though, the latest consumed offset is (for historical reasons) stored in the index store, which in this case is persistent. So when your node restarts, the Kafka doc-log consumer thinks it's up to date and doesn't index any documents into the (now empty) local document store. Doh!
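A minimal sketch of that fix, applied to the config posted earlier in the thread. It assumes the :local-document-store option of xtdb.kafka/->document-store and its nested :kv-store key as described on the 1.21.0 Kafka docs page linked above; node-config and the data/dev/doc-store directory are hypothetical names, so check the keys against that page before relying on this. The idea is that the local document store now lives on disk, like the index store that records the consumed offset, so the two stay consistent across restarts.

```clojure
(require '[clojure.java.io :as io])

;; Sketch only: :local-document-store / :kv-store are assumed from the XTDB 1.21
;; Kafka docs; node-config and "data/dev/doc-store" are hypothetical names.
(def node-config
  {:local-rocksdb {:xtdb/module 'xtdb.rocksdb/->kv-store
                   :db-dir (io/file "data/dev/index-store")}
   :xtdb.kafka/kafka-config {:bootstrap-servers "localhost:9092"}
   :xtdb/tx-log {:xtdb/module 'xtdb.kafka/->tx-log
                 :kafka-config :xtdb.kafka/kafka-config
                 :tx-topic-opts {:topic-name "tx-1" :replication-factor 1}}
   :xtdb/document-store {:xtdb/module 'xtdb.kafka/->document-store
                         :kafka-config :xtdb.kafka/kafka-config
                         :doc-topic-opts {:topic-name "doc-1" :replication-factor 1}
                         ;; Keep the local document cache on disk (RocksDB) instead of
                         ;; the in-memory default, so it survives restarts just like the
                         ;; consumed doc-log offset stored in the index store.
                         :local-document-store
                         {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                                     :db-dir (io/file "data/dev/doc-store")}}}
   :xtdb/index-store {:kv-store :local-rocksdb}
   :xtdb.http-server/server {:port 3000}})
```

This map would then be passed to xt/start-node inside with-open, as suggested earlier in the thread, e.g. (with-open [node (xt/start-node node-config)] (xt/sync node) ...). Note that an index directory already populated under the old config may still hold the stale offset, so starting from a clean data/dev directory is probably the safest way to try it.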

Marz Drel 10:05:53

@U8ZQ1J1RR This happens only if the document store is in Kafka. Whether the tx-log is in Kafka or RocksDB doesn’t seem to make any difference.

Marz Drel 18:05:56

Just started listening to the podcast. Awesome content, can’t wait to listen to the entire backlog! Lots of good talks around design and advanced concepts. Awesome! ❤️

chrisbroome 18:05:32

Same here. All 3 episodes so far have been really interesting.

Marz Drel 18:05:01

There are dozens more available from previous years too. 🙂