#xtdb
2020-09-03
tolitius 00:09:48

using crux-jdbc: crux => rocksdb => postgres. have a 3-node nomad cluster with 3 instances of an app running that use crux. loaded 300K entities, with several more million to load.

my observations: index files are 1GB on each node + 1GB on each laptop that needs to work with the app (i.e. REPL). I expect this index to grow to 20GB once 5 to 6 million are loaded, plus other such indices for other types of entities.

I guess my use case is not very typical because I don’t need queries across entities and only need history per entity: i.e. (crux/entity-history ..)

now to the question: is there a recommended way to completely bypass indices:
• still being able to insert transactions with :crux/put
• not replaying those transactions on start
• not relying on any state other than the SQL database (postgres in this case)
• still being able to call (crux/entity-history ..), but directly from the SQL database

I understand that it’s an I/O call vs. a speedy rocksdb query; this is perfectly ok for the business use case

👌 3
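
For context, a minimal sketch of the two API calls under discussion: a `:crux.tx/put` write followed by an `entity-history` read. It assumes `node` is an already-started Crux node (e.g. one backed by crux-jdbc) and uses the crux.api shapes as of Crux circa 2020; details may differ between versions.

```clojure
(require '[crux.api :as crux])

;; write: submit a put and wait for it to be indexed
;; (assumes `node` is an already-started Crux node)
(crux/await-tx
  node
  (crux/submit-tx node [[:crux.tx/put {:crux.db/id :order-1 :status :new}]]))

;; read: full history for one entity, oldest first, with the documents
(crux/entity-history (crux/db node) :order-1 :asc {:with-docs? true})
```
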
mauricio.szabo 13:09:33

There was some discussion on implementing an index kv-store over Redis, so maybe that's the answer?

tolitius 13:09:12

but that adds redis to the deployment / topology. I’d like to see if there is a way to keep state in one place that is already great at keeping state: postgres (or the like). I don’t mean to say “crux should have this”; I just want to understand whether it is feasible, or even makes sense, within the current crux mindset / future plans / architecture

mauricio.szabo 19:09:52

Yes, but it does not need to be Redis: you could implement a KV over PostgreSQL, for example 🙂. I was looking a little into how to do it, then got a bit off-track because what I was thinking of was quite different from what Crux expects 😅
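
A hypothetical sketch of the kind of key/value table such a Postgres-backed KV store could sit on, using next.jdbc. The table and function names here are illustrative only, not anything Crux or crux-jdbc defines; a real Crux KV implementation would also need ordered key iteration (seek/next), which the BYTEA primary-key index could serve via range queries.

```clojure
(require '[next.jdbc :as jdbc])

;; illustrative schema: `crux_kv` is a made-up table name
(defn create-kv-table! [ds]
  (jdbc/execute! ds ["CREATE TABLE IF NOT EXISTS crux_kv
                      (k BYTEA PRIMARY KEY, v BYTEA NOT NULL)"]))

;; upsert a key/value pair
(defn kv-put! [ds ^bytes k ^bytes v]
  (jdbc/execute! ds ["INSERT INTO crux_kv (k, v) VALUES (?, ?)
                      ON CONFLICT (k) DO UPDATE SET v = EXCLUDED.v" k v]))

;; point lookup by key
(defn kv-get [ds ^bytes k]
  (:crux_kv/v (jdbc/execute-one! ds ["SELECT v FROM crux_kv WHERE k = ?" k])))
```
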

tolitius 19:09:27

yea, I have a hunch I can jam it in via a noop k/v, but while that would work on the way in (writes), I am not sure it would prevent replaying indices / entity lookups on the way out.. reads

refset 22:09:46

So just to confirm, this would only be using the entity and entity-history APIs? A Postgres KV store is viable: though it would ordinarily be very slow, performance should be acceptable for such a limited requirement. You could even avoid persisting the default "triple" indexes if you know you'll never want to use them via Datalog queries

tolitius 22:09:02

> this would be only using the entity and entity-history APIs
yes, this is correct, plus the put (writes). when you say “a Postgres KV store”, do you mean rolling out a new k/v store? the thing is, crux-jdbc already does all the queries I need: i.e. I don’t really need a k/v store, since postgres already serves that purpose ) but the read APIs all go via the index-store, plus the poller is.. always polling (i.e. open-tx-log), which is not needed in this case. that’s why I was not sure it would be as simple as just a noop k/v store
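
For reference, the polling in question boils down to consuming the transaction log, roughly as in the sketch below (crux.api as of circa 2020; assumes the returned cursor can be consumed via `iterator-seq`):

```clojure
(require '[crux.api :as crux])

;; roughly what the indexer's poller does: stream the tx-log from the
;; beginning (nil after-tx-id), including the operations (with-ops? true)
(with-open [log (crux/open-tx-log node nil true)]
  (doseq [tx (iterator-seq log)]
    (println (:crux.tx/tx-id tx) (:crux.tx/tx-time tx))))
```
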

refset 23:09:29

Yep, I was thinking of a new k/v store which would use separate tables from the crux-jdbc backend. Postgres can't trivially SQL-query the contents stored within it (by crux-jdbc), and the temporal indexes also need to be materialised somewhere outside of crux-jdbc in order for such a scheme to work at all. You can't query entity history with nothing but a tx-log and doc-store, which is what crux-jdbc is persisting
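
To make that concrete: `entity-history` needs, per entity, an ordered mapping from (valid time, tx time) to content hash, which neither the tx-log nor the doc-store provides directly. A hypothetical Postgres table materialising just that index might look as follows (everything here is illustrative, not the crux-jdbc schema):

```clojure
(require '[next.jdbc :as jdbc])

;; `ds` is assumed to be a next.jdbc datasource; `entity_history` is a
;; made-up table that materialises the temporal index outside crux-jdbc
(jdbc/execute! ds
  ["CREATE TABLE IF NOT EXISTS entity_history (
      eid          BYTEA       NOT NULL,
      valid_time   TIMESTAMPTZ NOT NULL,
      tx_time      TIMESTAMPTZ NOT NULL,
      tx_id        BIGINT      NOT NULL,
      content_hash BYTEA       NOT NULL,
      PRIMARY KEY (eid, valid_time, tx_time))"])

;; entity history then becomes a single ordered range scan per entity
(defn entity-history-sql [ds eid]
  (jdbc/execute! ds
    ["SELECT valid_time, tx_time, tx_id, content_hash
      FROM entity_history WHERE eid = ?
      ORDER BY valid_time ASC, tx_time ASC" eid]))
```
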

tolitius 00:09:36

right, ok. this is closer to what I thought. the problem with this approach is way too many queries to read out one entity or one entity history. I’ll probably try to see whether I can come up with something different. I don’t mean at all that crux should support it, and I understand that this is most likely too much of a corner case to bend the core.

tolitius 00:09:00

as a result, I think I’ll have a better idea of how to make things better in crux-jdbc as well. for example, two things that I noticed so far:
1. crux-jdbc updates the existing entity (the `docs` topic) with exactly the same data: i.e. there is most likely no reason for this update
2. when you ask for the history, fetch-docs is called as many times as there are history items: i.e. 30 items? 30 queries
things like that..
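
A hypothetical sketch of the batching idea in point 2: fetch all of a history's documents in one round trip rather than one query per item. The `tx_events` table and its column names are an assumption about crux-jdbc's schema and may not match exactly.

```clojure
(require '[next.jdbc :as jdbc]
         '[clojure.string :as str])

;; one IN-clause query instead of N point lookups
;; (table/column names `tx_events`, `topic`, `event_key`, `v` are assumed)
(defn fetch-docs-batched [ds content-hashes]
  (let [placeholders (str/join ", " (repeat (count content-hashes) "?"))]
    (jdbc/execute! ds
      (into [(str "SELECT event_key, v FROM tx_events"
                  " WHERE topic = 'docs' AND event_key IN (" placeholders ")")]
            content-hashes))))
```
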

refset 21:09:59

1. yes, this is a known performance bug (only spotted in the last week or so, admittedly); we have a comment/issue in the backlog to prevent the unnecessary writes
2. interesting! I agree, we can probably do some batching here to improve performance
thanks for taking such a close look!

tolitius 21:09:54

of course, this is all very interesting ) thanks for being so responsive, this is great

💯 3