This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-07-14
Hi again, keeping up with our Crux evaluation...a question one of my DevOps team colleagues asked me today was: "Is there any technical limitation that would not allow to store the indices on Postgres?" I answered that I will forward this question here
what's the motivation? the indices are already stateless, which is ideal from an ops perspective. there's no technical limitation (cf. refset's redis index store) but in general it would be super slow to use a remote kv store. a big part of crux's performance is being able to do the bulk of the query I/O locally
^ thanks @U797MAJ8M - that seems spot on to me
> what's the motivation?
The motivation is to avoid adding another component (RocksDB, LMDB) to our stack basically
If we had everything on top of Postgres my DevOps team would have less work to do
I don't think that's the point - and to be fair this is not a blocker on our side - we actually were just curious
we might actually decide to live with the in-memory for the first use cases we implement and in the meantime contribute that ourselves
it seems to be a common misconception, but rocksdb should be totally transparent to the ops team. it has the same operational profile as sqlite and is even more disposable
well except that you go to their wiki and it is huuuge. I am thinking LMDB is a better fit for us - it seems simpler and more suitable as index store
plus - this is not that great to read https://github.com/facebook/rocksdb/issues/4112
I asked because if it was guaranteed to only ever just be a few gigs then in-memory indexes are arguably an option too :thumbsup:
@U899JBRPF yep that makes sense - it is I think ~8 to 10 GB at the moment
yeah I think we can live with that too
we are in a very slow-moving industry (public health care)
agreed that the 4112 issue is daunting...in practice we have managed to avoid tripping over it by configuring things to be ultra-conservative just in case
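For reference, the split discussed above (Postgres for the durable state, a local KV store for the disposable indices) might look roughly like this as a Crux 1.x EDN config file. This is a sketch rather than a tested setup: the host, database name, credentials and the `data/indices` path are placeholders, and the module names should be checked against the Crux docs for the version in use.

```clojure
;; Sketch of a Crux 1.x node config (EDN file form); all values are placeholders.
{;; Shared JDBC connection pool pointing at Postgres (assumed host/db/user).
 :crux.jdbc/connection-pool
 {:dialect {:crux/module crux.jdbc.psql/->dialect}
  :db-spec {:host "localhost" :dbname "crux" :user "crux" :password "change-me"}}

 ;; Durable state (transaction log and documents) lives in Postgres.
 :crux/tx-log {:crux/module crux.jdbc/->tx-log
               :connection-pool :crux.jdbc/connection-pool}
 :crux/document-store {:crux/module crux.jdbc/->document-store
                       :connection-pool :crux.jdbc/connection-pool}

 ;; Indices stay on local disk. Swap in crux.rocksdb/->kv-store for RocksDB,
 ;; or omit :crux/index-store entirely to get the in-memory default.
 :crux/index-store {:kv-store {:crux/module crux.lmdb/->kv-store
                               :db-dir "data/indices"}}}
```

Because the indices are rebuilt from the tx log and document store, the `:db-dir` directory should be safe to throw away; the node would then re-index on its next start.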
Folks who are using full-text search, 1.18.0-rc1 is out (mostly with changes to crux-lucene). We'd appreciate it if you'd kick the tires (tyres?).
https://search.maven.org/search?q=g:pro.juxt.crux
No changelog yet, but the recent commits give a good sense of what's changed:
https://github.com/juxt/crux/commits/master
Notably:
• Lucene registers itself as a secondary index, resolving 'tx mismatch' errors and allowing "out of sync" Crux vs. Lucene indices
• Lucene index is now asynchronously updated
• lucene/search is now a public API
• Fix for https://github.com/juxt/crux/issues/1221 (Checkpointing compat)
• Fix for https://github.com/juxt/crux/issues/1538 (Lucene now validates its own index version)
• Fix for https://github.com/juxt/crux/issues/1540 (In-memory Lucene)
I had actually disabled lucene temporarily so I could see how well the new async indexing works, and it did so flawlessly! re-enabled and caught up without a hitch
Kudos, @U050V1N74
one problem is that :poll-wait-duration is now gone from the kafka tx log, and is renamed and hard coded here https://github.com/juxt/crux/blob/master/crux-kafka/src/crux/kafka.clj#L159
another problem, and this one is harder to track down, is that the kafka tx log seems to be constantly reconnecting every ~1 second, going off the kafka driver output
i.e. it prints out
2021-07-15T05:38:55.588Z machina INFO [org.apache.kafka.common.utils.AppInfoParser:119] - Kafka version: 2.8.0
2021-07-15T05:38:55.588Z machina INFO [org.apache.kafka.common.utils.AppInfoParser:120] - Kafka commitId: ebb1d6e21cc92130
2021-07-15T05:38:55.589Z machina INFO [org.apache.kafka.common.utils.AppInfoParser:121] - Kafka startTimeMs: 1626327535588
2021-07-15T05:38:55.589Z machina INFO [org.apache.kafka.clients.consumer.KafkaConsumer:1120] - [Consumer clientId=consumer-null-154, groupId=null] Subscribed to partition(s): crux-tx-log-0
2021-07-15T05:38:55.589Z machina INFO [org.apache.kafka.clients.consumer.KafkaConsumer:1582] - [Consumer clientId=consumer-null-154, groupId=null] Seeking to offset 11 for partition crux-tx-log-0
and so on every second
it looks like https://github.com/juxt/crux/blob/master/crux-kafka/src/crux/kafka.clj#L148 is getting called from tx-sub/handle-polling-subscription in a loop and it's calling (->consumer) every time
great feedback as always @U797MAJ8M, thanks!
@U797MAJ8M: pushed a change for this one as 1.18.0-SNAPSHOT - thanks again for flagging
oh, yep - although Clojure tooling doesn't pick up central snapshots, they're in a different repo
mind adding
as a repo? (I realise the irony in this, trying to migrate to Maven Central so that folk don't have to add repos)
I got it, seems to be all fixed now. I think the most convenient thing I've seen about clojure tooling is being able to point a dep straight at a github sha, but I think you need a deps.edn for that to work and i'm not sure how well it works with subprojects
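For anyone curious, pointing a dep at a git sha in deps.edn looks roughly like the sketch below. The library name, the sha and the `:deps/root` value are placeholders chosen for illustration; `:deps/root` is the tools.deps key for selecting a subdirectory of a repo, and this would only resolve if that directory contains a manifest tools.deps understands (a deps.edn or pom.xml), which is not confirmed here for the Crux subprojects.

```clojure
;; deps.edn sketch: a git dependency pinned to a commit.
;; The library name, sha and :deps/root values are placeholders.
{:deps
 {juxt/crux-core
  {:git/url   "https://github.com/juxt/crux"
   :sha       "<full 40-character commit sha>"
   ;; :deps/root selects a subdirectory of the repo as the project root
   :deps/root "crux-core"}}}
```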