#xtdb
2022-12-16
phill 10:12:38

Ha ha, sync after start-node! Having read the quickstart (https://docs.xtdb.com/guides/quickstart/) and some of the reference, I had just figured XTDB wasn't very reliable. I'd commit a bunch of stuff, sync, and close; then start-node again, count the documents, and get zero, or 10,000, or some other tiny fraction, and think "oh well, I guess I'd better start over."
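A minimal sketch of the "sync after start-node" pattern, assuming xtdb-core 1.x is on the classpath. xt/start-node, xt/sync, xt/q and xt/db are the XTDB 1.x API; start-synced-node and count-docs are hypothetical helpers for illustration. xt/sync blocks until the node has indexed the transactions already on its log, so a document count taken afterwards should no longer see a partially replayed index.

```clojure
(require '[xtdb.api :as xt])

;; Hypothetical helper: counts every document visible in the latest db.
(defn count-docs [node]
  (count (xt/q (xt/db node) '{:find [e] :where [[e :xt/id]]})))

;; Hypothetical helper: start a node, then block until it has caught up
;; with its transaction log before anything queries it.
(defn start-synced-node [config]
  (let [node (xt/start-node config)]
    (xt/sync node)
    node))
```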

phill 10:12:18

Now that my attention has been drawn to https://docs.xtdb.com/administration/1.22.1/checkpointing/, that would be another place to clarify! Its very first sentence says checkpoints exist "so that new nodes can start to service queries faster." But that did not seem to be my problem: my node started servicing queries right away!! The problem was that the answers were all wrong.

refset 10:12:02

Hey @U0HG4EHMH, interesting, thanks for the feedback! Note that if you configure persistent indexes, there's no need to replay the transaction log from scratch on startup, although it's still a good idea to sync anyway (e.g. if other nodes are submitting transactions). In the not-too-distant future we plan to add an initialisation phase that should make the situation more visible and intuitive: https://github.com/xtdb/xtdb/issues/1838
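For comparison, a sketch of a node config with persistent indexes, assuming the xtdb-rocksdb module is on the classpath; the directory paths are placeholders. The tx log and document store are also on RocksDB here so the data itself survives a restart; with the index store persisted, a restarted node picks up from its existing indexes rather than re-indexing the whole log.

```clojure
;; Node config map (e.g. passed to xt/start-node); paths are placeholders.
{:xtdb/tx-log         {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                                  :db-dir "data/tx-log"}}
 :xtdb/document-store {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                                  :db-dir "data/doc-store"}}
 ;; Persistent index store: a restart resumes from these indexes.
 :xtdb/index-store    {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                                  :db-dir "data/index-store"}}}
```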

refset 10:12:26

I'm sorry this was confusing, but I'm very glad you figured it out 🙂

tatut 14:12:42

Regarding checkpoints: we've had some issues with large checkpoints in S3 (we needed to configure long HTTP pool acquisition timeouts for the async S3 client), so we had the idea to use an EFS volume shared by all cluster nodes instead. I guess that should work, since it looks like just a regular file-system directory to the JVM?

tatut 14:12:20

I don't know if fetching from EFS is faster than from S3, but at least I wouldn't need any special S3 configuration.

tatut 14:12:58

One obvious downside is that I would need to handle clearing out old checkpoints myself; right now we just have an S3 bucket lifecycle policy that deletes old files automatically.
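A sketch of what tatut's EFS idea might look like, using XTDB's filesystem checkpoint store and assuming the EFS volume is mounted at the same path on every node; "/mnt/efs/xtdb-checkpoints" is a hypothetical mount point. Nothing in this config removes old checkpoints, so that would need a separate cleanup job.

```clojure
;; Sketch: RocksDB index store that checkpoints to a shared EFS directory.
{:xtdb/index-store
 {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
             :db-dir "data/index-store"
             :checkpointer {:xtdb/module 'xtdb.checkpoint/->checkpointer
                            ;; Plain directory store: EFS looks like a local path to the JVM.
                            :store {:xtdb/module 'xtdb.checkpoint/->filesystem-checkpoint-store
                                    :path "/mnt/efs/xtdb-checkpoints"}
                            :approx-frequency (java.time.Duration/ofHours 6)}}}}
```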

Hukka 19:12:13

Started setting up checkpoints with Google Cloud Storage, but got "; (err) You are currently running with version 2.0.0 of google-api-client. You need at least version 1.31.1 of google-api-client to run version 1.32.1 of the Cloud Storage JSON API library." That's a bit weird, considering that 2.0.0 is larger than the mentioned 1.31.1, and that XTDB seems to refer to 1.34.1, which is also newer.

Hukka 19:12:18

Aha, so it just makes sure that the NIO provider is included, and after that all NIO Paths globally understand the gs:// prefix

Hukka 19:12:44

OK, so it was a case of google-api-client 2.0.0 flowing in through other deps and being incompatible with the version of google-api-services-storage that the google-cloud-nio used by XTDB depends on. The weird error message is Google's fault.
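One way to act on this diagnosis, sketched for a tools.deps project: declaring google-api-client as a top-level dependency makes tools.deps use that version instead of the 2.0.0 pulled in transitively. 1.34.1 is the version XTDB was seen referring to above; whether it also suits the other library that wanted 2.0.0 is an assumption to check (an :exclusions entry on that library is the other common option).

```clojure
;; deps.edn sketch: a top-level dep overrides transitive versions.
{:deps {;; ... your existing deps, including the XTDB GCS module ...
        com.google.api-client/google-api-client {:mvn/version "1.34.1"}}}
```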

Jacob O'Bryant 20:12:04

I might've found a pretty bad bug. I have a document with :profile/{user,title,username} attributes. In different parts of the app, I query for the document based on :profile/user and based on :profile/username. Querying by :profile/user always works. However, sometimes I'll update the :profile/title attribute, and then right after, querying by :profile/username returns nothing. If I update the title again, sometimes it starts working again. If I restart the JVM, it works again. I also just noticed that it seemed to start working again after I left my computer for 30 minutes, without updating the title or restarting the JVM. I'm using RocksDB for all three stores (tx log, document store, and index store). Something funny is going on with indexing, I guess. I'll put a code snippet + results in the thread.

Jacob O'Bryant 20:12:45

Here's a code snippet I've been evaluating:

[(xt/q db
       '{:find [(pull profile [*])]
         :in [username]
         :where [[profile :profile/username username]]}
       "jacob")
 (xt/q db
       '{:find [(pull profile [*])]
         :in [user]
         :where [[profile :profile/user user]]}
       (:xt/id user))
 (->> (xt/q db
            '{:find [(pull profile [*])]
              :where [[profile :profile/user]]})
      (map first)
      (filter #(= "jacob" (:profile/username %))))]
Result from a few minutes ago:
[#{}
 #{[{:profile/username "jacob",
     :profile/title "Jacob's profile",
     :profile/user #uuid "e86e5e14-0001-46eb-9d11-134162ce930f",
     :xt/id #uuid "3a98d62c-3258-4bb3-84dd-378ee4a4c7b1"}]}
 ({:profile/username "jacob",
   :profile/title "Jacob's profile",
   :profile/user #uuid "e86e5e14-0001-46eb-9d11-134162ce930f",
   :xt/id #uuid "3a98d62c-3258-4bb3-84dd-378ee4a4c7b1"})]
result after updating the title (working again):
[#{[{:profile/username "jacob",
     :profile/title "Jacob's top 8",
     :profile/user #uuid "e86e5e14-0001-46eb-9d11-134162ce930f",
     :xt/id #uuid "3a98d62c-3258-4bb3-84dd-378ee4a4c7b1"}]}
 #{[{:profile/username "jacob",
     :profile/title "Jacob's top 8",
     :profile/user #uuid "e86e5e14-0001-46eb-9d11-134162ce930f",
     :xt/id #uuid "3a98d62c-3258-4bb3-84dd-378ee4a4c7b1"}]}
 ({:profile/username "jacob",
   :profile/title "Jacob's top 8",
   :profile/user #uuid "e86e5e14-0001-46eb-9d11-134162ce930f",
   :xt/id #uuid "3a98d62c-3258-4bb3-84dd-378ee4a4c7b1"})]

Jacob O'Bryant 20:12:10

For now I'll probably look up documents with the third query, i.e. load them all and then filter outside the query. It's fast enough for my situation, though obviously a bit unsettling 😬

Jacob O'Bryant 20:12:23

Here's a transaction I just ran which caused the bug to manifest:

([:xtdb.api/match
  #uuid "3a98d62c-3258-4bb3-84dd-378ee4a4c7b1"
  {:profile/username "jacob",
   :profile/title "abc 123",
   :profile/user #uuid "e86e5e14-0001-46eb-9d11-134162ce930f",
   :xt/id #uuid "3a98d62c-3258-4bb3-84dd-378ee4a4c7b1"}]
 [:xtdb.api/put
  {:profile/username "jacob",
   :profile/title "hello there",
   :profile/user #uuid "e86e5e14-0001-46eb-9d11-134162ce930f",
   :xt/id #uuid "3a98d62c-3258-4bb3-84dd-378ee4a4c7b1"}]
 [:xtdb.api/fn :biff/ensure-unique #:profile{:username "jacob"}])

Jacob O'Bryant 21:12:40

OK, looks like it's probably something to do with the transaction function. I ran similar transactions several times without it, and the queries worked. In between, I ran two transactions with the tx fn back in, and it broke queries both times. Here's the result of (xt/entity db :biff/ensure-unique):

#:xt{:fn (fn [ctx kvs]
           (when (< 1
                    (count
                     (xtdb.api/q
                      (xtdb.api/db ctx)
                      {:find '[doc],
                       :limit 2,
                       :where (into [] (map (fn [[k v]] ['doc k v])) kvs)})))
             false)),
     :id :biff/ensure-unique}
(I possibly should've had that query use :in instead of inserting the parameters directly into :where, but presumably it should still work?)
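For reference, a sketch of the :in style mentioned here, as a pure function that builds the query and its arguments from the attribute/value map. unique-check-query and the v0, v1, ... parameter names are hypothetical. Judging by the rest of the thread, this is only a style change; the bug itself turned out to be elsewhere.

```clojure
;; Hypothetical helper: builds the uniqueness query from a map of
;; attribute -> value, passing the values through :in instead of
;; splicing them into :where. Returns [query args].
(defn unique-check-query [kvs]
  (let [attrs  (vec (keys kvs))
        ;; One logic variable per attribute: v0, v1, ...
        params (mapv #(symbol (str "v" %)) (range (count attrs)))]
    [{:find '[doc]
      :limit 2
      :in params
      :where (mapv (fn [attr param] ['doc attr param]) attrs params)}
     ;; Values in the same order as the :in symbols.
     (mapv kvs attrs)]))
```

Inside the tx fn, the pair would be used as (apply xtdb.api/q (xtdb.api/db ctx) query args), keeping the original (< 1 (count ...)) check and false return. Since tx fns are evaluated from their stored form, the helper would need to be inlined there or loaded on every node.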

Jacob O'Bryant 21:12:19

OK, it looks like the bug happens when (1) I run a transaction that includes a tx fn, and (2) right before submitting the transaction, I pass it to with-tx:

(let [[node db] ...
      new-title "aaa"
      profile {:profile/username "jacob",
               :profile/title new-title
               :profile/user #uuid "e86e5e14-0001-46eb-9d11-134162ce930f",
               :xt/id #uuid "3a98d62c-3258-4bb3-84dd-378ee4a4c7b1"}
      tx [[:xtdb.api/put profile]
          [:xtdb.api/fn :biff/ensure-unique #:profile{:username "jacob"}]]]
  (println (some? (xt/with-tx db tx)))
  (xt/submit-tx node tx))
I've been re-evaluating that over and over with new values for new-title each time. If I comment out either the :xtdb.api/fn line or the with-tx line, my queries work fine. If I leave both lines in, queries break. It's very consistent.

Jacob O'Bryant 22:12:50

(For now I'll just disable biff/submit-tx's call to with-tx, which means that if there's contention, Biff might keep re-running the tx multiple times instead of aborting immediately. Not a huge deal, so this isn't urgent. https://github.com/jacobobryant/biff/commit/ba6fb05c0539f18d315592dc5d7cb8b3de2eabe4)

Hukka 08:12:22

I haven't checked the whole thing yet, but the first thing I started thinking about is that await-tx (used at https://github.com/jacobobryant/biff/blob/ba6fb05c0539f18d315592dc5d7cb8b3de2eabe4/src/com/biffweb/impl/xtdb.clj#L385) does not necessarily return the transaction you gave it. It makes sure the given transaction has been indexed and then returns the latest indexed transaction, which might already be a different one.
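A sketch of the distinction Hukka points out, assuming XTDB 1.x: keep the map returned by xt/submit-tx and check that specific transaction with xt/tx-committed?, rather than relying on the return value of xt/await-tx. submit-and-check! is a hypothetical helper.

```clojure
(require '[xtdb.api :as xt])

;; Hypothetical helper: submit ops and report whether *this* transaction
;; committed. await-tx only waits here; its return value (the latest
;; indexed tx, possibly a later one) is deliberately ignored.
(defn submit-and-check! [node ops]
  (let [tx (xt/submit-tx node ops)]
    (xt/await-tx node tx)
    (xt/tx-committed? node tx)))
```

A transaction whose ::xt/match fails is still indexed, so await-tx returns normally; tx-committed? is what reports the abort.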

Hukka 08:12:23

Of course with-tx shouldn't change the db, but is it possible that it changes the timing?

Hukka 08:12:45

And it doesn't explain the inconsistent queries you are running against the same db

Jacob O'Bryant 16:12:04

Oh, I wasn't aware of that, thanks for the tip! But yeah, I don't think it's related to the unexpected behavior here.

Jacob O'Bryant 17:12:08

I guess with-tx must have some unintended mutation in its handling of transaction functions

refset 22:12:55

Hey @U7YNGKDHA thanks for reporting this - I'll be investigating it tomorrow and will respond in due course once I have a repro

refset 00:12:10

Okay it's definitely an issue https://github.com/xtdb/xtdb/issues/1851 - this will get attention soon

Jacob O'Bryant 01:12:37

Perfect, thanks for looking into this!

wotbrew 15:12:52

Hey @U7YNGKDHA 👋, just a heads-up that the cause has been identified and a fix is being prepared. It is a RocksDB-specific caching issue. A fix should be available in the next RC (which I'm hoping to provide this week, all being well!). Thanks again for the report! 🙏

Jacob O'Bryant 15:12:31

thank you again!
