
Hi again, keeping up with our Crux evaluation... a question one of my DevOps colleagues asked me today was: "Is there any technical limitation that would prevent storing the indices on Postgres?" I answered that I would forward the question here 😄


what's the motivation? the indices are already stateless, which is ideal from an ops perspective 🙂 there's no technical limitation (cf. refset's redis index store) but in general it would be super slow to use a remote kv store. a big part of crux's performance is being able to do the bulk of the query I/O locally

✔️ 4

^ thanks @U797MAJ8M - that seems spot on to me
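
For context, a minimal sketch of the split described above, assuming the Crux 1.x module names (`crux.jdbc`, `crux.jdbc.psql`, `crux.rocksdb`): the transaction log and document store live in Postgres, while the index store stays on each node's local disk. The path and connection details are placeholders, not values from this thread.

```clojure
;; crux.edn - sketch only; every value below is a placeholder
{;; shared JDBC connection pool pointing at Postgres
 :crux.jdbc/connection-pool {:dialect {:crux/module crux.jdbc.psql/->dialect}
                             :db-spec {:host "localhost"
                                       :dbname "crux"
                                       :user "crux"
                                       :password "changeme"}}
 ;; durable, shared state lives in Postgres
 :crux/tx-log {:crux/module crux.jdbc/->tx-log
               :connection-pool :crux.jdbc/connection-pool}
 :crux/document-store {:crux/module crux.jdbc/->document-store
                       :connection-pool :crux.jdbc/connection-pool}
 ;; indices stay local to the node; they can be rebuilt from the two stores above
 :crux/index-store {:kv-store {:crux/module crux.rocksdb/->kv-store
                               :db-dir "/var/lib/crux/indexes"}}}
```

With a layout like this, Postgres remains the only stateful service to operate; the RocksDB directory is a disposable local cache.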


> what's the motivation?

The motivation is basically to avoid adding another component (RocksDB, LMDB) to our stack


If we had everything on top of Postgres my DevOps team would have less work to do 😄

👌 2

how big are the data sets you're hoping to store?


I don't think that's the point - and to be fair this is not a blocker on our side - we actually were just curious


we might actually decide to live with the in-memory for the first use cases we implement and in the meantime contribute that ourselves 🙂


it seems to be a common misconception, but rocksdb should be totally transparent to the ops team. it has the same operational profile as sqlite and is even more disposable


well except that you go to their wiki and it is huuuge 😄 I am thinking LMDB is a better fit for us - it seems simpler and more suitable as index store


I asked because if it was guaranteed to only ever just be a few gigs then in-memory indexes are arguably an option too 👍

❤️ 2

@U899JBRPF yep that makes sense - it is I think ~8 to 10 GB at the moment


downside is, you can't checkpoint in-memory indices right?

✔️ 2

so you'd be waiting a while for deploys
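
By contrast, a disk-backed index store can be checkpointed so that a fresh node restores a recent snapshot instead of replaying the whole log. A minimal sketch, assuming the Crux 1.x `crux.checkpoint` module names and that the frequency can be given as an ISO-8601 duration string in EDN; the paths and the six-hour interval are placeholders.

```clojure
;; crux.edn fragment - sketch only; paths and interval are placeholders
{:crux/index-store
 {:kv-store {:crux/module crux.rocksdb/->kv-store
             :db-dir "/var/lib/crux/indexes"
             ;; periodically copy the local index to a checkpoint store
             :checkpointer {:crux/module crux.checkpoint/->checkpointer
                            :store {:crux/module crux.checkpoint/->filesystem-checkpoint-store
                                    :path "/var/lib/crux/checkpoints"}
                            ;; assumed ISO-8601 duration form (roughly every 6 hours)
                            :approx-frequency "PT6H"}}}}
```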


yeah I think we can live with that too


we are in a very slow-moving industry (public health care)


agreed that the 4112 issue is practice we have managed to avoid tripping over it by configuring things to be ultra-conservative just in case

👍 2
Steven Deobald 22:07:43

Folks who are using full-text search, 1.18.0-rc1 is out (mostly with changes to crux-lucene). We'd appreciate if you'd kick the tires (tyres?). No changelog yet, but the recent commits give a good sense of what's changed. Notably:
• Lucene registers itself as a secondary index, resolving 'tx mismatch' errors and allowing "out of sync" Crux vs. Lucene indices
• Lucene index is now asynchronously updated
• lucene/search is now a public API
• Fix for (Checkpointing compat)
• Fix for (Lucene now validates its own index version)
• Fix for (In-memory Lucene)

🙌 8

I had actually disabled lucene temporarily so I could see how well the new async indexing works, and it did so flawlessly! re-enabled and caught up without a hitch

🙏 2

one problem is that :poll-wait-duration is now gone from the kafka tx log, and is renamed and hard coded here


another problem, and this one is harder to track down, is that the kafka tx log seems to be constantly reconnecting every ~1 second, going off the kafka driver output


i.e. it prints out

2021-07-15T05:38:55.588Z machina INFO [org.apache.kafka.common.utils.AppInfoParser:119] - Kafka version: 2.8.0
2021-07-15T05:38:55.588Z machina INFO [org.apache.kafka.common.utils.AppInfoParser:120] - Kafka commitId: ebb1d6e21cc92130
2021-07-15T05:38:55.589Z machina INFO [org.apache.kafka.common.utils.AppInfoParser:121] - Kafka startTimeMs: 1626327535588
2021-07-15T05:38:55.589Z machina INFO [org.apache.kafka.clients.consumer.KafkaConsumer:1120] - [Consumer clientId=consumer-null-154, groupId=null] Subscribed to partition(s): crux-tx-log-0
2021-07-15T05:38:55.589Z machina INFO [org.apache.kafka.clients.consumer.KafkaConsumer:1582] - [Consumer clientId=consumer-null-154, groupId=null] Seeking to offset 11 for partition crux-tx-log-0
and so on every second


it looks like is getting called from tx-sub/handle-polling-subscription in a loop and it's calling (->consumer) every time


great feedback as always @U797MAJ8M, thanks! 🙏


@U797MAJ8M: pushed a change for this one as 1.18.0-SNAPSHOT - thanks again for flagging 🙂


tools.deps can't find it -- is it published to maven central?


oh, yep - although Clojure tooling doesn't pick up central snapshots, they're in a different repo


one mo, will push to Clojars too


except I can't - Clojars bans shadowing Central artifacts (probably for the best)


mind adding as a repo? (I realise the irony in this, trying to migrate to Maven Central so that folk don't have to add repos 😅 )
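
For anyone following along, adding a snapshot repo in `deps.edn` might look like the sketch below. The repository URL and the `pro.juxt.crux/crux-core` coordinates are assumptions (the Sonatype OSSRH snapshots repo and the Maven Central group ID), not something confirmed in this thread; only the `1.18.0-SNAPSHOT` version comes from the message above.

```clojure
;; deps.edn - sketch only; the repo URL and artifact coordinates are assumptions
{:mvn/repos {"sonatype-snapshots"
             {:url "https://s01.oss.sonatype.org/content/repositories/snapshots"}}
 :deps {pro.juxt.crux/crux-core {:mvn/version "1.18.0-SNAPSHOT"}}}
```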


or I can release rc2


I got it, seems to be all fixed now. I think the most convenient thing I've seen about clojure tooling is being able to point a dep straight at a github sha, but I think you need a deps.edn for that to work and i'm not sure how well it works with subprojects
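
A git dependency of the kind mentioned here could be sketched as follows. The repo URL, the `crux-core` subdirectory, and the lib name are assumptions, and the sha is a placeholder to be replaced with a full commit sha. `:deps/root` is the tools.deps key for pointing at a subproject, and it only resolves if that directory has a manifest tools.deps understands (a `deps.edn` or `pom.xml`).

```clojure
;; deps.edn - sketch only; URL, subdirectory and sha are placeholders/assumptions
{:deps {juxt/crux-core {:git/url "https://github.com/juxt/crux.git"
                        :sha "<full-commit-sha>"   ; placeholder, not a real sha
                        :deps/root "crux-core"}}}  ; assumed subproject directory
```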


maybe there's a better way to interop with lein, but I've just been eyeballing changes in my local crux repo and building jars when I hack around on crux instead of actually using a repl


yeah - we've had to hold off tools.deps (well, source dependencies) until now, because Crux has Java files to compile. hopefully with the new :deps/prep-lib we can trial it
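
For reference, `:deps/prep-lib` is declared in the library's own `deps.edn` and tells tools.deps how to prepare a source dependency (for example, compiling Java) before it goes on the classpath. A minimal sketch, in which the `:build` alias and the `compile-java` function are hypothetical names the library would have to define itself:

```clojure
;; deps.edn of the library being depended on - sketch only
{:paths ["src" "target/classes"]
 :deps/prep-lib {:alias :build             ; hypothetical alias defined in this deps.edn
                 :fn compile-java          ; hypothetical function that compiles the Java sources
                 :ensure "target/classes"}} ; prep is considered done if this directory exists
```

A consumer using the library as a git or local dependency would then run `clj -X:deps prep` once before starting a REPL.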