#xtdb
2020-05-13
Eric Ihli13:05:45

I'm still troubleshooting the problem I was having yesterday about queries not finding things I put into the database. I added a println in each of crux/api.clj PCruxDatasource/q and crux/standalone.clj StandaloneDocumentStore/submit-docs. crux/submit-tx prints that it's using a CachedObjectStore that is using a KvObjectStore with a CachedSnapshotKvStore with crux.kv.memdb.MemKv. crux/q prints that it's using a CachedObjectStore that is using a KvObjectStore with a CachedSnapshotKvStore with crux.kv.rocksdb.RocksKv. Is that the problem, and what would cause those to use different stores?

refset13:05:26

Hmm, that doesn't sound right but I'll check with the team. Are you using 1.8.3 for both -core and -rocksdb?

dvingo13:05:51

Did you not update your node config to the one I mentioned?

dvingo13:05:32

I'm fairly certain this is your issue.

refset14:05:23

Ah, yes, good spot, it looks like :crux.standalone/event-log-kv-store is meant to be :crux.standalone/event-log-kv

Eric Ihli14:05:34

https://opencrux.com/docs#config-standalone Are you referring to the first row in table 5 there?

Eric Ihli14:05:39

I just renamed the directory that I was using for the event-log-dir and the db-dir so that it would be recreated. Now things work. I assume one of those data directories was in a bad state.

refset14:05:21

Sorry, I have confused myself, :crux.standalone/event-log-kv-store looks correct 🙂
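
For reference, a standalone node configuration along the lines being discussed might look like the sketch below (the directory paths are hypothetical, and this assumes the Crux 1.8.x topology-style config with crux-rocksdb on the classpath); the point of :crux.standalone/event-log-kv-store here is to make the event log use RocksDB rather than the in-memory default:

```clojure
(require '[crux.api :as crux])

;; A sketch of a standalone topology where both the event log and the
;; index store are backed by RocksDB, so submit-tx and q see the same
;; kind of KV store.
(def node
  (crux/start-node
    {:crux.node/topology '[crux.standalone/topology
                           crux.kv.rocksdb/kv-store]
     :crux.standalone/event-log-kv-store 'crux.kv.rocksdb/kv
     :crux.standalone/event-log-dir "data/event-log"   ; hypothetical path
     :crux.kv/db-dir "data/db-dir"}))                  ; hypothetical path
```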

dvingo14:05:58

as a reference

refset14:05:22

Thanks, these are indeed good places to look (and looking at it now the bench examples are worthy of a link in the docs!).

refset14:05:45

I'm relieved to hear you have it working now @UJP37GW2K 🙂 but if it happens again and you can't work around it without deleting the event-log-dir, then it sounds like there may be a bug when changing between topology configurations (some of which may be invalid). You should always be able to keep the event-log-dir and rm the db-dir as often as you like.

Eric Ihli14:05:38

I still have the contents of the data directory and I'm able to reproduce the issue by swapping between a new/fresh directory and the old one.

Eric Ihli14:05:11

Is it worth tarring up the directory and opening an issue on GH?

Eric Ihli14:05:52

Nothing obvious jumped out as the cause when `ls`/`cat`-ing the files in there.

refset14:05:43

Sure, I'd certainly give it a look if you're willing

refset16:05:06

@UJP37GW2K wonderful, thanks! 🙂

rschmukler13:05:13

Hey all! I was wondering if anybody had done any benchmarking w/ Crux or just had a general sense of how many datoms / documents it could hold before performance becomes a consideration. Current write load is around 20 thousand documents / day, each w/ about 50 attributes. Also considering using it for more "real-time" data, which would be about 3,000 records/minute, ~5 attrs each

refset13:05:07

Hey 🙂 we're running nightly benches against various topologies, for our own regression testing and comparative benchmarking. This is a small part of our report from last night's run of the WatDiv test suite:

Crux Bench Logs
APP :watdiv-crux (embedded-kafka-rocksdb)
========
ingest (PT5M13.03S, +16.83%. 7D Min: PT3M59.288S, 7D Max: PT5M25.777S): {:av-count 2823472, :bytes-indexed 1936552400, :doc-count 521585, :entity-count 521585}
docs-per-second: 1666
avs-per-second: 9019
bytes-indexed-per-second: 6186475

refset13:05:54

I believe WatDiv documents typically have 10+ attributes each

refset13:05:07

The upper limits to this scaling are heavily dependent on Rocks, as it is doing all of the heavy lifting, but in general we've been really impressed with Rocks' ability to maintain a consistent throughput (both reads and writes) even whilst compaction is happening

refset16:05:37

@UEC8W94AE hey, just making sure you see these responses. Let me know if there are any other stats that might help

rschmukler17:05:25

@U899JBRPF thank you a ton for these! Super helpful. Does read performance degrade based on the number of writes to an attribute over time? i.e. is the materialized view computed from an EAVT index? Is it cached? Otherwise, does it degrade linearly with an increase of writes over time T?

refset18:05:28

Glad that helped :) as to your questions: no, no, no and no

refset18:05:44

Crux is designed for fast ad-hoc point-in-time queries regardless of the amount of history (valid time or tx time). This is possible because of carefully crafted indexes, including a special z-order curve temporal index

refset19:05:37

The most authoritative place to observe the internal index structure is at the top of codec.clj

rschmukler19:05:43

Awesome, will take a look! Thanks again for the knowledgeable replies, super helpful 🙂

🙏 4
rschmukler13:05:25

I realize it'll likely depend a lot on backend store choice, but also, for the sake of discussion, let's say I choose the most applicable backend store

Eric Ihli21:05:48

I'm trying to view a list of all transactions. I'm using open-tx-log to get the cursor, but I can't figure out how to view the transactions. Any advice? Thanks.

(let [cursor (crux/open-tx-log adb/conn 0 true)
      coll (take 20 cursor)]
  (.close cursor)
  coll)
;; => ([:close-fn #function[crux.node.CruxNode/fn--50885]]
;;     [:lazy-seq-iterator
;;      #object[clojure.lang.SeqIterator 0x59fca46 "[email protected]"]])

refset21:05:29

Hey again, you will need to use iterator-seq before you can use take. For examples, see: https://github.com/juxt/crux/blob/2076b528a0a790b5f063ed3639de4775842a7287/crux-test/test/crux/api_test.clj#L264

🤙 4
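Putting the suggestion together, a minimal sketch of the fix (assuming a started node bound as `node` and crux.api aliased as `crux`): the cursor implements Iterator and Closeable, so turn it into a seq with iterator-seq and realize the results with doall before the cursor closes:

```clojure
(require '[crux.api :as crux])

;; with-open closes the cursor when the body exits; doall forces the
;; lazy seq while the cursor is still open.
(with-open [tx-log (crux/open-tx-log node 0 true)]
  (doall (take 20 (iterator-seq tx-log))))
```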
Eric Ihli21:05:11

Thanks again!

🙏 4