#xtdb
2021-07-14
richiardiandrea21:07:06

Hi again, keeping up with our Crux evaluation... a question one of my DevOps team colleagues asked me today was: "Is there any technical limitation that would prevent us from storing the indices on Postgres?" I answered that I would forward the question here 😄

nivekuil21:07:01

what's the motivation? the indices are already stateless, which is ideal from an ops perspective 🙂 there's no technical limitation (cf. refset's redis index store) but in general it would be super slow to use a remote kv store. a big part of crux's performance is being able to do the bulk of the query I/O locally
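For context, the index store backend is selected by a single module entry in the node config. A rough sketch in the Crux 1.x format (module symbols are from memory of the docs; the path is a placeholder):

(require '[crux.api :as crux])

;; The index-store KV backend is a pluggable module: RocksDB here, but
;; crux.lmdb/->kv-store or the in-memory store are one-line swaps.
(def node
  (crux/start-node
   {;; tx-log and document-store config omitted for brevity
    :crux/index-store {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                                  :db-dir "/var/lib/crux/indexes"}}}))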

✔️ 4
refset21:07:10

^ thanks @U797MAJ8M - that seems spot on to me

richiardiandrea21:07:14

> what's the motivation?
The motivation is to avoid adding another component (RocksDB, LMDB) to our stack basically

richiardiandrea21:07:47

If we had everything on top of Postgres my DevOps team would have less work to do 😄

👌 2
refset21:07:11

how big are the data sets you're hoping to store?

richiardiandrea21:07:50

I don't think that's the point - and to be fair this is not a blocker on our side - we actually were just curious

richiardiandrea21:07:30

we might actually decide to live with the in-memory index store for the first use cases we implement and in the meantime contribute the Postgres index store ourselves 🙂

nivekuil21:07:52

it seems to be a common misconception, but rocksdb should be totally transparent to the ops team. it has the same operational profile as sqlite and is even more disposable

richiardiandrea21:07:43

well except that you go to their wiki and it is huuuge 😄 I am thinking LMDB is a better fit for us - it seems simpler and more suitable as an index store

refset21:07:14

I asked because if it was guaranteed to only ever just be a few gigs then in-memory indexes are arguably an option too :thumbsup:
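If the data really does stay at a few gigs, the in-memory option is the smallest possible config. A sketch (module name from memory of the crux-core docs):

;; No on-disk index state to manage, but indexes are rebuilt from the tx log
;; on every restart.
{:crux/index-store {:kv-store {:crux/module 'crux.mem-kv/->kv-store}}}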

❤️ 2
richiardiandrea21:07:42

@U899JBRPF yep that makes sense - it is I think ~8 to 10 GB at the moment

nivekuil21:07:04

downside is, you can't checkpoint in-memory indices right?

✔️ 2
nivekuil21:07:14

so you'd be waiting a while for deploys
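The deploy-time tradeoff being referred to: with no local index state, a fresh node has to replay the whole tx log, whereas a disk-backed index store can carry a checkpointer so new nodes restore from a recent checkpoint instead. A hedged sketch (module symbols and key names from memory of the Crux checkpointing docs; path and frequency are placeholders):

{:crux/index-store
 {:kv-store {:crux/module 'crux.rocksdb/->kv-store
             :db-dir "/var/lib/crux/indexes"
             ;; new nodes pull the latest checkpoint before catching up on the tx log
             :checkpointer {:crux/module 'crux.checkpoint/->checkpointer
                            :store {:crux/module 'crux.checkpoint/->filesystem-checkpoint-store
                                    :path "/mnt/crux-checkpoints"}
                            :approx-frequency (java.time.Duration/ofHours 6)}}}}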

richiardiandrea21:07:30

yeah I think we can live with that too

richiardiandrea21:07:54

we are in a very slow-moving industry (public health care)

refset21:07:21

agreed that the 4112 issue is daunting...in practice we have managed to avoid tripping over it by configuring things to be ultra-conservative just in case

👍 2
Steven Deobald22:07:43

Folks who are using full-text search, 1.18.0-rc1 is out (mostly with changes to crux-lucene). We'd appreciate it if you'd kick the tires (tyres?). https://search.maven.org/search?q=g:pro.juxt.crux No changelog yet, but the recent commits give a good sense of what's changed: https://github.com/juxt/crux/commits/master Notably:
• Lucene registers itself as a secondary index, resolving 'tx mismatch' errors and allowing "out of sync" Crux vs. Lucene indices
• Lucene index is now asynchronously updated
• lucene/search is now a public API
• Fix for https://github.com/juxt/crux/issues/1221 (Checkpointing compat)
• Fix for https://github.com/juxt/crux/issues/1538 (Lucene now validates its own index version)
• Fix for https://github.com/juxt/crux/issues/1540 (In-memory Lucene)

🙌 8
nivekuil22:07:35

I had actually disabled lucene temporarily so I could see how well the new async indexing works, and it did so flawlessly! re-enabled and caught up without a hitch

🙏 2
nivekuil05:07:56

one problem is that :poll-wait-duration is now gone from the kafka tx log, and has been renamed and hard-coded here: https://github.com/juxt/crux/blob/master/crux-kafka/src/crux/kafka.clj#L159
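For reference, this is roughly how that option had been exposed on the Kafka tx-log module before rc1 (approximate; apart from :poll-wait-duration itself the surrounding key names are from memory and may be off):

{:crux/tx-log {:crux/module 'crux.kafka/->tx-log
               :kafka-config {:bootstrap-servers "localhost:9092"}
               ;; how long each poll blocks waiting for new tx records
               :poll-wait-duration (java.time.Duration/ofSeconds 1)}}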

nivekuil05:07:51

another problem, and this one is harder to track down, is that the kafka tx log seems to be constantly reconnecting every ~1 second, going off the kafka driver output

nivekuil05:07:13

i.e. it prints out

2021-07-15T05:38:55.588Z machina INFO [org.apache.kafka.common.utils.AppInfoParser:119] - Kafka version: 2.8.0
2021-07-15T05:38:55.588Z machina INFO [org.apache.kafka.common.utils.AppInfoParser:120] - Kafka commitId: ebb1d6e21cc92130
2021-07-15T05:38:55.589Z machina INFO [org.apache.kafka.common.utils.AppInfoParser:121] - Kafka startTimeMs: 1626327535588
2021-07-15T05:38:55.589Z machina INFO [org.apache.kafka.clients.consumer.KafkaConsumer:1120] - [Consumer clientId=consumer-null-154, groupId=null] Subscribed to partition(s): crux-tx-log-0
2021-07-15T05:38:55.589Z machina INFO [org.apache.kafka.clients.consumer.KafkaConsumer:1582] - [Consumer clientId=consumer-null-154, groupId=null] Seeking to offset 11 for partition crux-tx-log-0
and so on every second

nivekuil05:07:06

it looks like https://github.com/juxt/crux/blob/master/crux-kafka/src/crux/kafka.clj#L148 is getting called from tx-sub/handle-polling-subscription in a loop and it's calling (->consumer) every time
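Illustrative sketch only (not the actual crux-kafka code; ->consumer and poll-txs are stand-ins) of why constructing the consumer inside the polling loop would produce the once-per-second subscribe/seek lines above:

;; symptomatic shape: a fresh KafkaConsumer every iteration
(loop []
  (with-open [consumer (->consumer)]   ; reconnect + subscribe + seek each time
    (poll-txs consumer))
  (recur))

;; hoisting it out of the loop keeps one long-lived consumer
(with-open [consumer (->consumer)]
  (loop []
    (poll-txs consumer)
    (recur)))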

jarohen08:07:38

great feedback as always @U797MAJ8M, thanks! 🙏

jarohen09:07:42

@U797MAJ8M: pushed a change for this one as 1.18.0-SNAPSHOT - thanks again for flagging 🙂

nivekuil10:07:53

tools.deps can't find it -- is it published to maven central?

jarohen11:07:36

oh, yep - although Clojure tooling doesn't pick up central snapshots, they're in a different repo

jarohen11:07:42

one mo, will push to Clojars too

jarohen11:07:15

except I can't - Clojars bans shadowing Central artifacts (probably for the best)

jarohen11:07:52

mind adding as a repo? (I realise the irony in this, trying to migrate to Maven Central so that folk don't have to add repos 😅 )
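A sketch of what adding the repo looks like in deps.edn (the URL is the usual Sonatype/Central snapshots location and is an assumption here):

;; deps.edn -- tell tools.deps where Central SNAPSHOTs live
{:mvn/repos {"sonatype-snapshots"
             {:url "https://oss.sonatype.org/content/repositories/snapshots"}}
 :deps {pro.juxt.crux/crux-core {:mvn/version "1.18.0-SNAPSHOT"}}}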

jarohen11:07:09

or I can release rc2

nivekuil11:07:29

I got it, seems to be all fixed now. I think the most convenient thing I've seen about clojure tooling is being able to point a dep straight at a github sha, but I think you need a deps.edn for that to work and i'm not sure how well it works with subprojects
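A sketch of the git-dep approach being described (the sha is a placeholder; :deps/root is the tools.deps key for pointing at a subproject directory, which needs its own deps.edn):

;; deps.edn -- depend on a subproject straight from a git sha
{:deps {pro.juxt.crux/crux-core
        {:git/url "https://github.com/juxt/crux.git"
         :sha "0000000000000000000000000000000000000000" ; placeholder
         :deps/root "crux-core"}}}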

nivekuil11:07:19

maybe there's a better way to interop with lein, but I've just been eyeballing changes in my local crux repo and building jars when I hack around on crux instead of actually using a repl

jarohen11:07:11

yeah - we've had to hold off tools.deps (well, source dependencies) until now, because Crux has Java files to compile. hopefully with the new :deps/prep-lib we can trial it
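For reference, the prep-lib hook lives in the library's own deps.edn and looks roughly like this (the alias and fn names here are hypothetical, not Crux's actual build setup):

;; library deps.edn -- consumers of the git dep run this prep step once
{:deps/prep-lib {:ensure "target/classes"  ; dir that must exist after prep
                 :alias :build             ; alias that provides the prep fn
                 :fn compile-java}}        ; hypothetical fn that compiles the Java sources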