This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-08-22
Channels
- # babashka (2)
- # beginners (81)
- # calva (5)
- # chlorine-clover (3)
- # cider (1)
- # cljsjs (1)
- # cljsrn (24)
- # clojure (67)
- # clojure-europe (3)
- # clojurescript (37)
- # code-reviews (2)
- # conjure (12)
- # core-async (4)
- # datalog (1)
- # datomic (6)
- # emacs (2)
- # figwheel-main (1)
- # graalvm (12)
- # java (4)
- # kaocha (9)
- # meander (3)
- # other-lisps (1)
- # pathom (14)
- # re-frame (2)
- # sci (32)
- # shadow-cljs (77)
- # sql (88)
- # xtdb (54)
thank you for crux. starting to look at it today ) have a couple of questions: https://gist.github.com/tolitius/bc981b2d6c4e8ff47ae65d3776501ea5
Hey, cool! I'll respond here as the answers are short and turnaround will be quicker. Question 1 - correct, there can only be one version of a document per #inst (/range) per transaction Question 2 - there is no built-in way to aggregate across historical versions currently (this is something we are thinking about though). It might be worth considering splitting these "attributes" into separate documents with distinct IDs, then there is no merge/roll-up necessary and you can put as many as you like in the same transaction at the same point in time. Or have you already discounted that?
The other option to consider is a merge transaction function: https://clojurians.slack.com/archives/CG3AM2F7V/p1596880886033600?thread_ts=1596880886.033600&cid=CG3AM2F7V
great thank you. both answers make sense ) I did not expect you would answer that quickly on a Saturday, so I just peeked back. I added a 3rd question about a tx log replay
https://gist.github.com/tolitius/bc981b2d6c4e8ff47ae65d3776501ea5#replaying-tx-log-on-start
For Q3, have you added rocksdb to your topology? If the indexes are persisted then the startup time should be nearly instant, and the Crux node will only have to catch up on transactions it might have missed (assuming there are other nodes still online submitting transactions while the node in question is offline)
replaying from the beginning of the log should only be necessary when performing an upgrade of Crux
> For Q3, have you added rocksdb to your topology?
no, it’s “app => crux lib => postgres”. is there anything specific I should do (start the node in a certain way) to make sure the indices are persisted?
here is how I start the node:
(cx/start-node {:crux.node/topology '[crux.jdbc/topology]
:crux.jdbc/dbtype adapter
:crux.jdbc/dbname dbname
:crux.jdbc/host host
:crux.jdbc/user username
:crux.jdbc/password password})
stopping it as:
(.close node)
if I start it again after the (.close node)
I see:
2020-08-22T13:17:11,593 [crux-polling-tx-consumer] DEBUG crux.tx - Indexing tx-id: 2
2020-08-22T13:17:11,840 [crux-polling-tx-consumer] DEBUG crux.tx - Indexing tx-id: 4
2020-08-22T13:17:11,879 [crux-polling-tx-consumer] DEBUG crux.tx - Indexing tx-id: 6
2020-08-22T13:17:11,924 [crux-polling-tx-consumer] DEBUG crux.tx - Indexing tx-id: 8
2020-08-22T13:17:11,967 [crux-polling-tx-consumer] DEBUG crux.tx - Indexing tx-id: 10
2020-08-22T13:17:12,008 [crux-polling-tx-consumer] DEBUG crux.tx - Indexing tx-id: 12
2020-08-22T13:17:12,050 [crux-polling-tx-consumer] DEBUG crux.tx - Indexing tx-id: 14
...
The instructions here show the standalone topology (where you're using jdbc) but the idea is the same: https://www.opencrux.com/reference/rocksdb.html
so you really just need to add crux.kv.rocksdb/kv-store to your topology vector and add a location under :crux.kv/db-dir
ah.. interesting. something like this?
(cx/start-node {:crux.node/topology '[crux.jdbc/topology
crux.kv.rocksdb/kv-store]
:crux.jdbc/dbtype adapter
:crux.jdbc/dbname dbname
:crux.jdbc/host host
:crux.jdbc/user username
:crux.jdbc/password password
:crux.kv/db-dir (str (io/file "/anydir/to/keep/indices" "indexes"))})
would it keep the transactions in rocksdb as well?
crux.kv/db-dir is what confused me a bit, since my postgres is the actual DB in my case
no, transactions don't get stored in Rocks also, when used like this. Rocks only holds the indexes. In this sense "db-dir" is misleading... because it's not referring to a Crux db instance, but a RocksDB "db"
got it. I am driving, so hard to try right away ) but thanks a lot! this complicates the deployment a bit because our apps run in nomad on multiple hosts, but I'll try to cook something up. as a follow-up (wishful) question: can I store indices in postgres instead? I understand that this is probably not supported at the moment. just curious if this is something that may appear later )
We don't offer any non-in-process KV stores today. Due to how the query engine works the indexes need to reside as close as possible to the node, but there's certainly a spectrum to be explored beyond in-process-RocksDB-on-local-disk.
yea, that option would be great. there are use cases where the goal is not to be super performant but to get the temporal benefits by reusing all existing infra. so if there is any place that collects +1s.. )
It's slightly more specific, but feel free to thumb-up this one: https://github.com/juxt/crux/issues/617
not to have this question buried under the thread above (could be a new thread):
user
|| ;; http
LB ;; load balancer
/ | \ ;; http / tcp
| | | ;; app nodes with rocksdb index dirs
\ | / ;; jdbc
DB ;; postgres DB
how can a user have consistent history view in case rocksdb indices live on different nodes?
i.e. load balancer will only look at one index at a time
if you can tolerate sticky sessions that will help, but otherwise - on the assumption that you always have a tx-time in hand - you will need to retry your requests until you hit a node which has caught up to that tx-time. The general theme of managing node membership of a cluster (including "ready" states) is definitely something we are working on over the coming months
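The retry-until-caught-up approach described above can be sketched roughly like this (illustrative Clojure, not anyone's production code from the thread; `node`, `tx-time` and the helper names are placeholders, and `crux.api/latest-completed-tx` is assumed to behave as in Crux 1.10.x):

```clojure
(require '[crux.api :as crux])

;; hypothetical helper: true once `node` has indexed up to `tx-time`
(defn caught-up? [node tx-time]
  (when-let [latest (:crux.tx/tx-time (crux/latest-completed-tx node))]
    (not (pos? (compare tx-time latest)))))

;; hypothetical retry loop around a read, given the tx-time
;; returned by the earlier submit-tx
(defn read-when-caught-up [node tx-time eid]
  (loop [attempts 0]
    (cond
      (caught-up? node tx-time) (crux/entity (crux/db node) eid)
      (< attempts 50)           (do (Thread/sleep 100)
                                    (recur (inc attempts)))
      :else (throw (ex-info "node did not catch up in time" {:eid eid})))))
```

Behind a load balancer the retry would typically re-issue the whole HTTP request, so a different node may answer each attempt; the check above is per-node.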
This is the open issue, and discussion, more or less: https://github.com/juxt/crux/issues/527
from the issue:
> wait for that node to become up-to-date
what makes a rocksdb index (files) on node A catch up with an index on node B?
all nodes automatically consume the single stream of tx events (+ docs) stored in jdbc as quickly as they can, entirely deterministically
user => event => load balancer => (either 1, 2 or 3) node => DB
in the topology above writes don’t go to all 3 nodes, they are load balanced since crux nodes are embedded into apps that accept writes / reads
would that be something crux just does not yet support? i.e. writes can’t be load balanced in the above topology?
or maybe (most likely) I misunderstood your point about “the single stream of tx events (+ docs) stored in jdbc”
did you mean that all embedded crux nodes would still see all the events since they listen to “jdbc db”?
in this case I don’t see the reads working:
• I updated an entity `:foo/bar/1` via node `A`: crux/put
• I see that node `B` immediately “saw” it: `[crux-polling-tx-consumer] DEBUG crux.tx - Indexing tx-id: 34`
• I read it with `(crux/entity db :foo/bar/1)` on node `A`: it returns it
• I read it with `(crux/entity db :foo/bar/1)` on node `B`: it does not
are all nodes sharing the same postgres instance and database name configured under :crux.jdbc/dbname? if so, then they will all be writing to the same place, which I'm pretty sure is what you want
> did you mean that all embedded crux nodes would still see all the events since they listen to “jdbc db”?
yes, this exactly
> I read it with “(crux/entity db `:foo/bar/1`)” on node `B` : it does not
Are you awaiting the tx-time before running the query? Also is this insert via basic put op, or are you using a transaction function?
first of all, I really appreciate your steady responses, just want to mention that
secondly, I did some digging, and I do indeed see data show up in index files of node `B` when an entity is updated in `A`:
..
crux.db/iddfoo..bar
action-typed�}�T���s��Y^�
[email protected]�[g�H�J����(�F�a[7��|TTvc�bP�p����<���?���������i�P��c��
yes, they all use the same DB. I think I am really close.
I am not awaiting the tx-time, and yes this is a basic put op
the strange thing is even if I wipe all the index files and restart node `B` to replay the tx logs, I see it replays all the txs (the right number), but it still is unable to see :foo/bar/1
should I not use a basic put?
awesome, I like to be helpful 🙂
> I am not awaiting the tx-time
This is the first thing I would suggest looking at. Are you running the `(crux/entity ...)` on node B via the REPL as an isolated step? (i.e. sufficient time has passed since the transaction submitted from node A)
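The “awaiting the tx-time” step being suggested can be sketched like this (illustrative; `node-a`/`node-b` are placeholder node references, and `crux.api/await-tx` plus the return map of `submit-tx` are assumed to behave as in Crux 1.10.x):

```clojure
(require '[crux.api :as crux])

;; submit on node-a; submit-tx returns {:crux.tx/tx-id ..., :crux.tx/tx-time ...}
(let [tx (crux/submit-tx node-a [[:crux.tx/put {:crux.db/id :foo/bar-1 :v 1}]])]
  ;; block until node-b has indexed at least this transaction
  (crux/await-tx node-b tx)
  ;; a db snapshot taken now should include the put
  (crux/entity (crux/db node-b) :foo/bar-1))
```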
> should I not use a basic put? Using a put should be fine. We solved an issue with transaction functions yesterday that I thought could be related but luckily not.
> sufficient time has passed since the transaction submitted from node A
yes, running it from REPL
what I don’t seem to understand is I do see the data in the index file `000006.log` of node `B`. it gets there in milliseconds after the entity is `(crux/put ..)` on node `A`, but I can’t read it back with either `(crux/entity)` or `(crux/q)`. I am able to read other `:foo/bar/x` on `B` that was “put” on `B`
Can you then read the `:foo/bar/x` that was put on B from A? Or is this problem happening in both directions?
I feel like I am missing something simple. since I just started with crux today, it most likely is some kind of rookie mistake
okay, hmm, so there was something jdbc related that came up during our debugging yesterday that might be relevant, though I'm fuzzy on the details. I'm going to try and reproduce this tomorrow. You're using 1.10.1 yeah? which postgres version in particular? If you are keen to see something working in the meantime you might have better luck with Kafka (running crux-kafka-embedded or Confluent Platform locally). Certainly what you are attempting should be working.

> Certainly what you are attempting should be working
that’s great, that means we’ll solve it
PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
[juxt/crux-jdbc "20.08-1.10.1-beta" :exclusions [seancorfield/next.jdbc
com.zaxxer/HikariCP
org.clojure/java.data]]
[juxt/crux-rocksdb "20.08-1.10.1-beta"]
[juxt/crux-core "20.08-1.10.1-beta"]
[hikari-cp "2.13.0"]
[seancorfield/next.jdbc "1.1.582"]
I got a bit further with this. was able to have `A`, `B` and `C` nodes follow the DB updates
I login to their REPLs and see they are all able to see the value regardless of who writes into the DB
that started to work only after I mounted the crux index dir outside their docker images
which I was planning to do anyway, but I don’t see how it should matter for them to be in sync
however the problem still remains in development:
• I connect a local app with crux as a lib to the same DB
• and keep index files in `/opt/crux/..` on the same machine
• if I do updates through this local app, all `A`, `B` and `C` (remote) nodes see the updates ✅
• if I do the update through one of the remote nodes, they all see the update, but the local app does not ❌
in the same way as it was with remote nodes yesterday:
it immediately updates the index [crux-polling-tx-consumer] DEBUG crux.tx - Indexing tx-id: 382
but the update is not there when queried with entity/q/entity-history/etc.
even if I stop everything, remove index files, restart, it replays all the indices (according to the logs), but it does not see the update
user
|| ;; http
LB ;; load balancer
/ | \ ;; http / tcp
| | | ;; app nodes with rocksdb index dirs
\ | / ;; jdbc
DB ;; postgres DB
| ;; jdbc
REPL ;; app with crux lib pointed to the same DB
one interesting thing I tried is to copy all index files from the remote node to the local one, and it does see the update
so I think the problem is in the way the rocksdb index gets updated from the store (postgres): i.e. I know it does get updated, but I think it does not get updated correctly / or maybe gets updated differently in the case of a non-“write through cache” and does not get correctly read after that..
thanks @U0541KMAZ - we'd certainly like to get a repro of this one, if possible - if they're not sensitive, would you mind sharing the code you used to submit the transactions and/or the indexes themselves? I reckon this one's worthy of a Github issue now, even if it turns out to not be a bug 🙂
if they do contain data that you wouldn't want to share on a public forum, feel free to post them to @U899JBRPF or myself via DM, or <mailto:[email protected]|[email protected]>
@U050V1N74 sure. I won’t be able to share the “full” code, since it is client’s, but it really is just:
(cx/submit-tx db [[:crux.tx/put (assoc some-edn :crux.db/id
(make-id ...))]])
on the write side
and
(crux/entity (crux/db cx/db) id)
;; or
(crux/entity-history (crux/db cx/db) id :asc {:with-docs? true})
on the read side
I will send you indices later today:
• one set of files from remote hosts
• another set of files from the local machine
a couple of other things I tried:
I removed rocksdb from the picture, so it always replays indices on start: the problem still remains
I know that LMAX stuff is finicky (agrona), so I tried java 8, 11, 13 | Oracle / open jdk: the problem still remains
another thing to notice is that “local”, in the case above, is “OS/X Catalina 10.15.2", remote is CentOS
I finally got around to trying to reproduce this using Heroku's free tier postgres service (which is actually pretty nifty, as I've discovered - dead simple to try without needing a credit card!) but had no luck observing these behaviours... everything seems to be working fine for me 😞
hmm, okay so that's one potentially big difference: we're not doing any meaningful testing with OS/X currently. Do you have a VM available to check the behaviour on Linux? I had 3 Rocks nodes communicating through a single Heroku Postgres instance
can you connect to the same heroku postgres instance from OS/X (your laptop maybe?) with the crux-jdbc topology and see whether the indices are updated properly for you in OS/X when you make changes on heroku?