#xtdb
2020-06-26
euccastro01:06:11

in https://opencrux.com/docs#config-backup I read "Crux provides utility APIs for local backup and restore when you are using the standalone mode." Where can I find docs for those APIs?

euccastro01:06:44

i.e., how do I back up / restore a crux instance that is only backed by rocksdb or lmdb?

euccastro01:06:57

searching for how to restore such a backup... (maybe just pass its path to crux/start-node? I'll give that a try...)

euccastro02:06:46

I tried it for the lmdb backend and it works. Nice!
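
(A minimal sketch of that restore-by-start approach, assuming the 2020-era standalone topology and an LMDB KV store - the module names and directory paths here are illustrative, not taken from the thread:)

```clojure
(require '[crux.api :as crux])

;; Point :crux.kv/db-dir at the backed-up directory and start the node
;; as usual; it picks up whatever state that copy holds.
(def node
  (crux/start-node
    {:crux.node/topology '[crux.standalone/topology
                           crux.kv.lmdb/kv-store]
     :crux.kv/db-dir "backups/2020-06-26/db-dir"}))
```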

euccastro02:06:11

lmk if you'd rather I delete this monologue šŸ™‚

euccastro02:06:32

yes, that seems better and more general than fiddling with the underlying stores directly

dvingo02:06:20

for rocksdb at least (not sure about lmdb) you can copy the data directories via the filesystem as long as the node is not running
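
(A rough sketch of that offline copy, assuming the node has been stopped first - the helper and directory paths are made up for illustration:)

```clojure
(require '[clojure.java.io :as io])

;; Recursively copy every file under src into dest, preserving the
;; relative directory structure. Only safe while the node is stopped.
(defn copy-dir [src dest]
  (let [src-path (.toPath (io/file src))]
    (doseq [f (file-seq (io/file src))
            :when (.isFile f)]
      (let [rel    (str (.relativize src-path (.toPath f)))
            target (io/file dest rel)]
        (io/make-parents target)
        (io/copy f target)))))

;; e.g. (copy-dir "data/db-dir" "backups/2020-06-26/db-dir")
```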

euccastro03:06:38

thanks! I don't want to have to stop the node in order to back it up though

euccastro03:06:59

crux.backup/backup and crux.backup/restore are what I'm after, really. The latter is broken. I could work around it by copying :crux.kv/db-dir to :db-dir and ignoring its final log (it wants to copy some :event-log-dir which doesn't exist for standalone K/V stores), but since all it's doing is copying the backup over to :db-dir, I'll just do that myself

jarohen07:06:40

I'm sorry to say this local node backup and restore module has decayed over time. As part of our ongoing beta, we are looking into a more cluster-aware snapshot solution (predominantly for ease of bringing up new nodes rather than backup/restore) - we'll obviously announce any progress on here. If you do want to back up local nodes, the most important thing is to back up the golden stores - the tx-log and the document store (as with non-local nodes, in fact) - the nodes themselves can be deleted and rebuilt so long as these golden stores are intact. If you are using the standalone topology, this means setting :crux.standalone/event-log-dir as part of your options, and then backing that up offline. That said, I'd recommend using one of the other golden stores (Kafka or JDBC) for production usage, rather than standalone - they already have well-established backup/restore processes (not to mention scalability, availability, monitoring, etc.).
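
(For reference, a sketch of the standalone options being described - module names follow the 2020-era config docs and the paths are illustrative; the point is that :crux.standalone/event-log-dir is the golden store to back up offline:)

```clojure
(require '[crux.api :as crux])

(def node
  (crux/start-node
    {:crux.node/topology '[crux.standalone/topology
                           crux.kv.rocksdb/kv-store]
     ;; golden store for a standalone node: back this directory up
     ;; (offline) and the node's indexes can be rebuilt from it
     :crux.standalone/event-log-dir "data/event-log"
     ;; index data - rebuildable, so less critical to back up
     :crux.kv/db-dir "data/db-dir"}))
```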

jarohen07:06:58

Admittedly, we originally envisaged standalone as being for dev-only usage - although we are aware that it is now being deployed into production in places šŸ™‚

euccastro08:06:46

thanks for the detailed explanation!

šŸ™ 3
euccastro03:06:33

I guess for LMDB it is safe to just use mdb_copy on the :crux.kv/db-dir? according to LMDB docs, that's safe even if the DB is in use
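
(A sketch of that, shelling out to the mdb_copy binary - the helper name and paths are illustrative, and mdb_copy is assumed to be installed and on the PATH:)

```clojure
(require '[clojure.java.shell :as shell]
         '[clojure.java.io :as io])

;; mdb_copy <src-env-dir> <dest-env-dir> takes a consistent copy of an
;; LMDB environment, even while it's open elsewhere.
(defn lmdb-hot-backup [db-dir backup-dir]
  (.mkdirs (io/file backup-dir))
  (shell/sh "mdb_copy" db-dir backup-dir))

;; e.g. (lmdb-hot-backup "data/db-dir" "backups/2020-06-26/db-dir")
```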

jarohen07:06:05

indeed - if memory serves, that's how crux.kv/backup is implemented for LMDB

ordnungswidrig09:06:51

> bringing up new nodes rather than backup/restore
@jarohen would this allow ā€œfast seedingā€ of nodes running in lambdas?

jarohen09:06:33

whether it'd be fast enough to bring up a node within a lambda timeout, I don't know - you'd still have some transactions to replay after the checkpoint - but it would certainly help for bringing up nodes faster within an auto-scaling group, say

jarohen09:06:21

although I suppose you do have 15 mins these days, it might well be viable

refset10:06:23

15 minutes?! That's quite a big difference since the last time I looked. I think you could probably get something working very handsomely with sub-1GB data sets, and perhaps a little bigger than that too, depending on S3->Lambda transfer speeds

jarohen10:06:00

would have to consider how/whether you can keep it warm somehow - I doubt it'd be financially viable if it needed to boot up every time

ordnungswidrig10:06:46

I wondered if one could decompose crux in a way that there is a single lambda processing the tx and updating a shared kv store which is used by the lambda. But I think one major blocker is that the kv store must support read isolation (snapshots).

refset10:06:45

@U054UD60U we've not done any experimentation ourselves yet, but it's worth looking at https://github.com/rockset/rocksdb-cloud which has exactly the properties needed. In theory it's a drop-in replacement for normal RocksDB

ordnungswidrig10:06:40

Thatā€™s promising

refset10:06:53

Yeah, the Rockset team has a lot of the ex-Facebookers who have been working on RocksDB for many years. They've spent time decomposing RocksDB itself so that you have compaction workers etc. There are some good videos / blogs around with more info, e.g. https://rockset.com/blog/remote-compactions-in-rocksdb-cloud/

ordnungswidrig10:06:01

I wonder if there are java bindings for rocksdb-cloud though. The repo is confusing

refset10:06:07

I would start by copying everything (including Java bindings) in crux-rocksdb to create crux-rocksdb-cloud and then figure out how to swap-out the binary. Just guessing though šŸ˜…

ordnungswidrig10:06:54

hehe, if only I had time.

ordnungswidrig10:06:30

Maybe Iā€™ll try https://github.com/csm/konserve-ddb-s3 - it was used by another crux-on-aws spike IIRC. In any case I think the single-tx-processing lambda would be key. No gain in having multiple lambdas do the processing concurrently.

šŸ‘ 3
refset16:06:34

Concurrent tx submission is good for a) high availability and b) separation of concerns when deploying multiple apps operating against a single instance. Also check out the CruxIngest API, which is purely for handling submissions
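
(For context, a minimal sketch of a submission - submit-tx is the operation the ingest-only API exposes; the document contents here are made up, and the node is assumed to have been started already:)

```clojure
(require '[crux.api :as crux])

;; Returns a map describing the submitted transaction (tx-id / tx-time);
;; submission doesn't wait for the tx to be indexed.
(crux/submit-tx node
                [[:crux.tx/put {:crux.db/id :example/doc
                                :example/value 42}]])
```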

refset16:06:05

Which tx-log are you looking at using?

ordnungswidrig18:06:09

I've created a dynamodb tx-store.

ordnungswidrig18:06:46

You can submit concurrently to that log but I wonder if you would need multiple lambdas to process the tx into one shared kv store.