#xtdb
2022-04-11
refset 09:04:16

Hey everyone - 1.21.0-beta3 is now available! This essentially just includes a few recent bug fixes (on top of the changes in -beta1 and -beta2) that we are keen to have a few users double-check before the main release. I appended a few extra GitHub issues to the previous beta2 release notes here: https://discuss.xtdb.com/t/ann-1-21-0-beta3s-out/77

🚀 3
xt 1
genekim 15:04:22

I spent a couple of hours yesterday getting xtdb working with JDBC and MySQL running in Google Cloud SQL, with everything shifted to UTC — that decision was a result of the rather incredible experience of my laptop timezone changing while on an east-to-west flight. I had problems with transactions not showing up, and I had incorrectly presumed it was due to timezone issues. But instead, it was something else entirely. I had about 245 transactions to insert, so I put them into (doseq [m meetings] (future (zm/save! m))) — I just didn’t want to wait for all the transactions to finish before getting my REPL prompt back. (My observed throughput was around 2-3 puts/second.) What resulted was only about 230 records being created. (!! I thought it was due to timestamps again.) When I finally took out the future, everything worked again — much to my relief. But it brings up the question of why the put! operations failed when put into a future, and how one can do that safely (if at all) — Many thanks! --Gene

jarohen 15:04:49

@U6VPZS1EK this sounds quite similar to https://github.com/xtdb/xtdb/issues/1603, for which we have a PR going in to the upcoming 1.21.0 release - if it is, it should be fixed in beta3. Could you confirm your XT version?

refset 15:04:29

Is your definition of put!/`save!` approximately just something like (defn put! [node doc] (xt/submit-tx node [[::xt/put doc]])) ? Or are you calling await-tx / sync in there also?

genekim 16:04:04

@U050V1N74 That’s fantastic! (Well, you know, fantastic that this could be a known issue around a race condition that's already fixed. Not fantastic, as we all know that this class of problem is among the most difficult. 🙂) @taylor.jeremydavid Correct, I merely submitted the tx, but did not await/sync — in fact, when I did do an await or sync, it just hung, despite :tx-time being after the current time! I will try upgrading to beta3 in the next hour or so and will keep you posted.

genekim 16:04:06

Thank you so much!!!! FWIW, this level of help/support is an utter delight — I so much appreciate it, and IMHO, it bodes well for any commercial offering you choose to make around xtdb!

🙏 1
Hukka 16:04:25

The table lock in https://github.com/xtdb/xtdb/pull/1707/files#diff-3666bdb5606704c7f8cfedda843dc0895c7f0f353e4716abac0a3619242c7072 seems like a pretty big tradeoff. But I suppose if you need more scalability, there's Kafka then

✔️ 1
genekim 17:04:48

Err… Now only 198 of the 245 records got written. Next run, only 187 records got written. This does seem like a multithreading issue… 🙂 Also, getting this strange error:

(doseq [m (take 15 meetings)]
  (zm/save! jdbc m))
Execution error (AbstractMethodError) at xtdb.node.XtdbNode/submit_tx_async (node.clj:221).
Receiver class xtdb.jdbc.JdbcTxLog does not define or inherit an implementation of the resolved method 'abstract java.lang.Object submit_tx(java.lang.Object, java.lang.Object)' of interface xtdb.db.TxLog.
Okay, gotta get on a Zoom call for a couple of hours, but I hope this helps!

genekim 17:04:36

(I was originally using 1.20.0, and the latest run was using 1.21.0-beta3.)

refset 20:04:29

> Execution error (AbstractMethodError) at xtdb.node.XtdbNode/submit_tx_async (node.clj:221)
Ah, I've seen this recently - I think it's symptomatic of attempting to upgrade to the beta releases, and you can resolve it with a lein clean (IIRC)

genekim 20:04:44

(@taylor.jeremydavid: I was just writing this up as an issue in GitHub… I pulled down 1.21.0-beta3 via deps.edn — are you suggesting I can resolve the issue somehow, or is this a build/release issue on your side?) 🙏

genekim 21:04:14

Solution found, with @taylor.jeremydavid's help: you must upgrade both com.xtdb/xtdb-jdbc and com.xtdb/xtdb-core — “they must be in lockstep.” (I had only updated com.xtdb/xtdb-core to “1.21.0-beta3”.) Ah! Fixed! Thank you @taylor.jeremydavid!

🙌 1
👍 2
refset 18:04:28

Hey @U6VPZS1EK - just making sure you see this: https://github.com/xtdb/xtdb/issues/1603#issuecomment-1097092135 - does that look/sound about right? I did this from memory (not by watching the recording), so maybe I'm missing something.

genekim 21:04:13

Really!!! I had no idea that you were required to realize/deref the futures?? I'll check this out. It would be a great answer if deref’ing is required — I'll definitely search around to find this requirement. (And if this is the case, I suddenly have fear of all the other places in my code where I didn't do this, either!!! 😂😱😱)

genekim 21:04:51

Huh…. Not an authoritative source, but definitely making me think I have a fundamental misunderstanding of futures. :man-facepalming:

genekim 22:04:00

Thank you for all this help!! I’ll keep you posted — I will attempt to reproduce tonight

refset 22:04:11

Ah oops, I didn't mean to imply that it's required to deref futures in general, just that in order for your use-case to work I don't really see a way around it :thinking_face: That said, I'm definitely a future/threading novice, so I may be missing some obvious tricks 😅 I could imagine creating some mechanism (with core.async or whatever) with a decrementing atom counter, where the xt/sync only gets called (or some other code gets 'notified') after the last future succeeds and the count reaches 0 ...but I don't really know if that would actually be useful to you.
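(For illustration, a rough sketch of that counter idea, assuming the zm/save! wrapper from earlier in the thread and an XTDB node bound to node; untested:)

(let [remaining (atom (count meetings))
      all-done  (promise)]
  (doseq [m meetings]
    (future
      (zm/save! m)                        ; hypothetical submit-tx wrapper
      (when (zero? (swap! remaining dec)) ; last future to finish...
        (deliver all-done true))))        ; ...unblocks the waiter
  @all-done                               ; block until every submission is in
  (xt/sync node))                         ; then wait for the node to index

(Caveat: if any save! throws, the counter never reaches zero, so real code would need error handling as well.)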

jarohen 08:04:31

deref'ing the future in this case turns it into a no-op, though?

jarohen 08:04:44

as in, you might as well not bother with the future

✔️ 1
Hukka 07:04:35

Well, if you deref it in the loop. But the idea would be to first create a seq of futures, and after that deref them all to make sure they are realized. That is getting pretty hacky for just doing side effects in parallel, though. In general I recommend using https://github.com/clj-commons/claypoole for all parallel stuff in Clojure, instead of the core functions. In this case, using pdoseq would make sure that everything inside happens as soon as possible, no memory is used for the results, and the top-level call returns only after all the tasks are done
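(A minimal sketch of both approaches, assuming the zm/save! wrapper from earlier in the thread; the pool size is arbitrary:)

;; 1. Plain futures: create them all eagerly, then deref to block until done
(let [futs (doall (map #(future (zm/save! %)) meetings))]
  (run! deref futs))

;; 2. Claypoole: pdoseq runs the body on a thread pool, discards results,
;;    and returns only once every task has completed
(require '[com.climate.claypoole :as cp])
(cp/pdoseq 8 [m meetings] ; 8 = pool size
  (zm/save! m))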

refset 07:04:48

Hey @U8ZQ1J1RR thanks for chiming in! There were a couple of comments on the issue that dug into the underlying problem/confusion some more...I guess Claypoole has some nice answers for that too :thinking_face: https://github.com/xtdb/xtdb/issues/1603#issuecomment-1097563892

Hukka 07:04:30

Let's see

Hukka 07:04:13

OK, so I'm guessing a bit here, but MySQL throws when the parallel submissions can't get the table lock at the same time, and they give up retrying?

✔️ 1
Hukka 07:04:24

And that can in principle happen without any parallelism in the application; just having two nodes can collide? That's… ugh 😞. Sounds like retry helpers are a critical, not optional, feature in any code that is meant to scale.

Hukka 07:04:46

And indeed, with the new table locks there's probably no benefit in local parallelism, since the SQL db is going to make everything sequential anyway

Hukka 07:04:29

Hmh, actually I guess that even with just one node, having a bog-standard Jetty might hit this problem, since it multithreads the request handling

Hukka 07:04:51

Perhaps worth mentioning in https://docs.xtdb.com/clients/clojure/#_submit_tx that submit-tx will not handle exceptions from underlying implementations, and that retrying is the caller's responsibility?
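(Something like this hypothetical caller-side wrapper, sketched purely for illustration; which exceptions are worth retrying on would depend on the JDBC backend:)

(defn submit-tx-with-retry
  "Submit tx-ops to node, retrying up to max-attempts times on failure."
  [node tx-ops max-attempts]
  (loop [attempt 1]
    (let [res (try
                (xt/submit-tx node tx-ops)
                (catch Exception e          ; e.g. a lock timeout from MySQL
                  (if (< attempt max-attempts)
                    ::retry
                    (throw e))))]
      (if (= ::retry res)
        (do (Thread/sleep (* 100 attempt))  ; crude linear backoff
            (recur (inc attempt)))
        res))))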

Hukka 07:04:02

I haven't really checked the implementations, but can MySQL really throw when trying to get a single table locked, and it just reacts to some internal retries running out? Or does that only happen when two different transactions get a lock on two (or more) tables in a different order, thus nobody ever getting all the locks to proceed?

Hukka 08:04:08

I'm in way over my head here since I don't use MySQL, but does "SELECT * FROM %s ORDER BY event_offset LIMIT 1 FOR UPDATE" lock every row in the table, since there is no WHERE, or just the one row, since there's a LIMIT? And if it does lock every row, does it do it in the same order? Can simply running two of those in parallel deadlock, since the transactions start locking in a different order and nobody gets every row?

Hukka 08:04:13

https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html
> For index records the search encounters, locks the rows and any associated index entries
sounds like it could be every index entry it hits, not just what it returns

Hukka 08:04:12

@U6VPZS1EK Have you tested the same code with the Postgres backend? That uses an explicit table lock, and I don't see any other locking operations in that module, so I wonder if that one never deadlocks

refset 08:04:12

> Have you tested the same code with postgres backend?
I've not got this all booted up enough to respond properly to your other excellent comments/questions just yet (i.e. I'd need to spend a few hours digging in!), but having spoken to Gene earlier in the week I know the answer to this question is 'no' 🙂 I also know he's not working on a big or hugely important system that justifies running a separate non-MySQL db at this point

Nundrum 22:04:04

Why can't I list more elements in the :find clause than are present in the :where?

Nundrum 22:04:44

Or maybe a better question is: how do I return a full doc instead of just the element used in the search?

Nundrum 23:04:15

But it seems not in a q(uery)?

alexdavis 23:04:52

Pull is used in the :find binding in a q
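(For example, a minimal sketch; the node and the :user/name attribute are made up:)

(require '[xtdb.api :as xt])

(xt/q (xt/db node)
      '{:find  [(pull ?e [*])]            ; [*] pulls the whole document
        :where [[?e :user/name "Ivan"]]})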

Nundrum 22:04:04

Ah. Thanks. I didn't understand that the first time through.