#xtdb
2020-11-25
nivekuil 05:11:26

what could one do to minimize read-after-write latency on the same node? I think lowering :poll-sleep-duration would do a lot -- is there a minimum/recommendation? I'm also curious what's in the way of a crux node writing to its own index, other than design elegance, as you have the doc and the tx from submit-tx already and that's all the indexer is polling for I think?

refset 11:11:23

We haven't established a baseline minimum but I would be happy to help you figure it out for your setup. I expect Kafka should be able to cope fine with 5ms polling, possibly lower. The cost will be increased idle CPU usage and network traffic
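A rough back-of-the-envelope sketch of the trade-off refset describes, assuming (as a simplification) that every idle poll costs about the same and that a transaction can land just after a poll. The idle poll rate grows as 1000 / interval-ms, while the worst-case extra wait before the next poll sees a new transaction is about one interval. The `poll-tradeoff` helper is hypothetical, not part of Crux.

```clojure
;; Hypothetical helper for reasoning about poll intervals.
;; Assumptions: each idle poll has a similar fixed cost, and query and
;; indexing time are ignored.
(defn poll-tradeoff
  "Rough cost/latency figures for a poll interval given in milliseconds."
  [interval-ms]
  {:idle-polls-per-second (/ 1000.0 interval-ms)
   ;; A tx committed just after a poll waits a full interval before the
   ;; next poll picks it up.
   :max-extra-delay-ms interval-ms})

(doseq [ms [1000 100 10 5]]
  (let [{:keys [idle-polls-per-second max-extra-delay-ms]} (poll-tradeoff ms)]
    (println (format "%4d ms interval: %6.1f idle polls/s, up to %d ms extra delay"
                     ms idle-polls-per-second max-extra-delay-ms))))
```

Going from a 1 s interval to 5 ms multiplies the idle poll rate by 200, which is why the cost shows up as idle CPU and network traffic even when nothing is being written.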

nivekuil 11:11:15

I turned jdbc (postgres) down to 10ms polls and it looks like the idle CPU usage of my container went up ~3x (from very little). Not sure how that scales with db size, but I'd think a shorter interval is probably ok. 1s is quite long as a default

refset 11:11:01

For JDBC, which is what this question relates to, the costs will definitely be higher given all the things an RDBMS has to do. We could definitely implement a non-polling solution, or help you do so, since most (all?) JDBC backends offer some LISTEN/NOTIFY-style functionality for "pushing" updates around. That would probably eliminate the polling overheads completely. We just haven't made it a priority yet 🙂
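To illustrate why a push mechanism removes the polling overhead, here is an in-process analogy. It is not Postgres code and not how Crux implements its tx log: a `java.util.concurrent.LinkedBlockingQueue` stands in for the notification channel. The reader blocks until a writer "notifies" by enqueuing, with a timeout as a fallback so it can re-check periodically. With Postgres, the writer side would be a NOTIFY and the reader a LISTEN. The names `notify!` and `await-next-tx` are hypothetical.

```clojure
(import '(java.util.concurrent LinkedBlockingQueue TimeUnit))

;; Hypothetical stand-in for a notification channel such as Postgres
;; LISTEN/NOTIFY: writers push tx ids, readers block until one arrives.
(def notifications (LinkedBlockingQueue.))

(defn notify!
  "Writer side: announce that a new transaction has been committed."
  [tx-id]
  (.put notifications tx-id))

(defn await-next-tx
  "Reader side: block for up to timeout-ms waiting for a notification.
  Returns the tx id, or nil on timeout (where a real reader might fall
  back to a single poll of the log)."
  [timeout-ms]
  (.poll notifications timeout-ms TimeUnit/MILLISECONDS))

;; Usage: a writer thread notifies after ~50 ms; the reader wakes
;; immediately instead of sleeping in a fixed-interval loop.
(.start (Thread. (fn [] (Thread/sleep 50) (notify! 42))))
(await-next-tx 5000) ; returns 42
```

The reader does no work between notifications, so idle cost no longer scales with how short the interval is.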

nivekuil 11:11:50

thanks for all the offers of help :p don't think this is a priority as of yet, but I'm very excited to try out the lucene stuff in the works! I'm still figuring out what a good production stack would look like, probably giving redpanda (the new c++ kafka) a go next

refset 11:11:25

Cool, sounds interesting! Which JDBC database are you working with? Postgres? If you want to play with Lucene now we already pushed a set of Release Candidate snapshots up to Clojars: https://clojars.org/juxt/crux-lucene/versions/20.11-1.13.1-alpha-SNAPSHOT And the docs: https://github.com/juxt/crux/blob/master/docs/reference/modules/ROOT/pages/lucene.adoc

nivekuil 11:11:32

yup, postgres. I'm sure a lot of special optimization can be done there, maybe making use of hstore for k/v and whatever timescaledb uses for the log

nivekuil 11:11:49

oh, there's docs. very nice, will take a look

nivekuil 05:11:02

also, I think the docs and the code disagree on the default poll-sleep-duration; docs say 1 second, code says 100ms? https://opencrux.com/reference/jdbc.html#_transaction_log_crux_jdbc_tx_log https://github.com/juxt/crux/blob/master/crux-core/src/crux/tx.clj#L490

refset 11:11:31

(reply deleted - I was looking at Kafka!)

refset 11:11:11

oh 😄 I'll have another think in that case

refset 11:11:33

Yes I think I agree the docs are out of sync with what the code is showing. I'll make sure we discuss & update it soon, thanks 🙂
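Given the docs/code mismatch on the default, one option is to set the interval explicitly in the node config instead of relying on either value. The sketch below assumes the 20.x module-style config and assumes, per the JDBC reference page linked above, that `crux.jdbc/->tx-log` accepts `:poll-sleep-duration`. The `:db-spec` values are placeholders; check the exact keys against the docs for your Crux version.

```clojure
(import '(java.time Duration))

;; Sketch of a Postgres-backed node config. Module names follow the 20.x
;; JDBC docs as we understand them; :db-spec values are placeholders.
(def node-config
  {:crux.jdbc/connection-pool
   {:dialect {:crux/module 'crux.jdbc.psql/->dialect}
    :db-spec {:dbname "crux" :host "localhost" :user "crux" :password "change-me"}}

   :crux/tx-log
   {:crux/module 'crux.jdbc/->tx-log
    :connection-pool :crux.jdbc/connection-pool
    ;; Set explicitly instead of relying on the documented or in-code default.
    :poll-sleep-duration (Duration/ofMillis 10)}

   :crux/document-store
   {:crux/module 'crux.jdbc/->document-store
    :connection-pool :crux.jdbc/connection-pool}})

;; With crux-core and crux-jdbc on the classpath:
;; (require '[crux.api :as crux])
;; (def node (crux/start-node node-config))
```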

dominicm 09:11:46

I'm not sure you want to write it locally, as you'd be missing some documents but not others. Some of the log readers are real-time; they stream as soon as it hits the log.

nivekuil 09:11:15

ah yeah that would break crux's guarantee of tx ordering within a cluster. which tx log is real-time? I see both jdbc and kafka have the poll wait option (in kafka it's called poll-wait-duration instead of poll-sleep-duration?)
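A toy illustration of the ordering problem, using hypothetical names and a plain map as the "index" (this is not Crux code). If node A indexes its own transaction as soon as it submits it, but the shared log records node B's conflicting write first, A and B apply the same two puts in different orders and end up disagreeing.

```clojure
;; Each tx is a put of a value for an entity id; the "index" is a map.
(defn apply-txs
  "Apply puts in the given order; later puts win, as in a log replay."
  [txs]
  (reduce (fn [index {:keys [id value]}] (assoc index id value)) {} txs))

(def tx-from-a {:id :counter :value "written by A"})
(def tx-from-b {:id :counter :value "written by B"})

;; The shared tx log decides the order: B's tx happened to be appended first.
(def tx-log [tx-from-b tx-from-a])

;; Node B follows the log order.
(def index-on-b (apply-txs tx-log))

;; Node A indexes its own tx immediately, then applies everything else it
;; reads from the log (skipping its own tx, which it has already indexed).
(def index-on-a (apply-txs (cons tx-from-a (remove #{tx-from-a} tx-log))))
```

Here `index-on-b` ends with A's value and `index-on-a` with B's. Replaying the log in order on every node, including the writer, is what keeps the indexes identical.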

dominicm 09:11:20

I think the rocks one might be. The redis one I wrote is streaming-with-timeout.

nivekuil 09:11:22

I dunno actually, the doc store can be eventually consistent, so maybe it would be in line with crux's semantics to have the doc for the local tx but not the docs for earlier ones?

nivekuil 09:11:02

this is the one right? https://git.sr.ht/~severeoverfl0w/redis-crux/tree/master/src/io/dominic/crux/redis.clj redis seems like a much better fit than s3 stores for docs given the sizes involved but I wonder about the durability

dominicm 09:11:03

That's doc store and also a log

dominicm 09:11:17

Redis can be configured for good durability.

nivekuil 09:11:36

if it were durable enough then I was thinking redis could be used as a write-through cache in front of s3, and the indexers could pull from redis
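A minimal sketch of the write-through idea, with atoms standing in for Redis (fast cache) and S3 (durable store). The function names are hypothetical and nothing here talks to either service. Writes go to the durable store first and then to the cache; reads try the cache and fall back to the store, repopulating the cache on a miss.

```clojure
;; Stand-ins: in a real setup these would be Redis and S3 clients.
(def durable-store (atom {}))  ; e.g. S3: slower, durable
(def cache (atom {}))          ; e.g. Redis: fast, possibly less durable

(defn put-doc!
  "Write-through: persist to the durable store before updating the cache,
  so the cache never holds a document the store lacks."
  [doc-id doc]
  (swap! durable-store assoc doc-id doc)
  (swap! cache assoc doc-id doc)
  doc)

(defn get-doc
  "Read from the cache, falling back to (and re-caching from) the store."
  [doc-id]
  (if-let [doc (get @cache doc-id)]
    doc
    (when-let [doc (get @durable-store doc-id)]
      (swap! cache assoc doc-id doc)
      doc)))
```

If indexers only read through `get-doc`, losing the cache costs latency rather than data, which addresses the durability concern as long as the durable write completes first.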

nivekuil 12:11:58

trying out lucene, but I'm not sure how this is supposed to work. I see content in the lucene directory, but can't seem to get a query to return anything

;; returns stuff
(q {:find  '[?e]
    :where '[[?e :entry/title "Wednesday Poem"]]})

;; returns #{}
(q {:find  '[?e ?v ?s]
    :where '[[(text-search :entry/title "W*") [[?e ?v ?s]]]
             [?e :crux.db/id]]})

refset 12:11:13

Weird. That looks to me like it should work. Did you update the Clojars coordinate for crux-core also?

nivekuil 12:11:15

yup, everything's on 20.11

nivekuil 12:11:34

maybe it was improperly indexed, actually. text-search does work when I add a new doc manually. although there is stuff that looks like my domain data in the index dir

refset 12:11:24

Well I've managed to reproduce the issue locally. I think we may not have accounted for namespaced attributes properly and I can't see that we have a test to check them... Do you also see the problem for non-namespaced attributes?

nivekuil 12:11:22

ah no, my successful test case was just :foo :)

nivekuil 12:11:54

:foo/foo appears to be broken, yup
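For anyone curious why namespaced attributes are an easy case to miss: turning a keyword into a string (for example, a field name) can silently drop or mangle the namespace depending on which function is used. This is a general Clojure illustration with hypothetical helper names, not a claim about what crux-lucene actually does internally.

```clojure
;; Three ways to turn an attribute keyword into a string:
(name :entry/title)          ; => "title" (namespace dropped)
(str :entry/title)           ; => ":entry/title" (leading colon kept)
(subs (str :entry/title) 1)  ; => "entry/title"

;; Hypothetical helpers: only this conversion round-trips to the same keyword.
(defn attr->field [attr] (subs (str attr) 1))
(defn field->attr [field] (keyword field))
```

With `name`, `:foo` and `:foo/foo` both become `"foo"`, so code that works for plain attributes can quietly break only once a namespace is involved.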

refset 13:11:39

great, okay I will write a failing test and open an issue. Thanks for testing the module out - I'm very sorry you hit a stumbling block so soon. We will definitely get it fixed before the main release 🙂

nivekuil 13:11:02

no worries! all part of the fun. glad I could help