This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-06-21
Happy Monday #crux, I was wondering how to run the benchmarks from https://github.com/juxt/crux/tree/master/crux-bench locally. It assumes some Kafka setup (which I have running) if I run bin/run-bench.sh, but I’m still getting a "Timed out waiting for a node assignment." message. The reason I’d like to run them is that I’m experimenting with some changes to Clojure libraries and I wanted to see how they impact real-world projects. So not urgent, but Crux was a good use case. Thanks!
Hey @U054W022G - thank you & Happy Monday likewise ☺️ Is your Kafka port open on 9092? I'm looking at https://github.com/juxt/crux/blob/master/crux-bench/cloudformation.yaml#L181-L196
Hey Jeremy, hope all is well. Good hint, as I can telnet to ZooKeeper but apparently not to Kafka on 9092, so I’m trying to fix that and I’ll get back to you. Running Kafka/ZK dockerized with https://github.com/wurstmeister/kafka-docker/blob/master/README.md
Hey @U899JBRPF I’ve been able to get past the Kafka connection problem. It is now asking for AWS credentials. Do the benchmarks require an AWS account? I was hoping they would skip the step https://github.com/juxt/crux/blob/master/crux-bench/README.md#setting-up-aws-credentials But if necessary, what sort of tasks is the benchmark performing on AWS? That is to prepare for potential costs.
Cool! So you shouldn't need AWS. Are you happy running the benchmarks on your own hardware?
Right, thanks for pinging. So after a few attempts with lein run -m crux.bench.main I tried ./bin/run-bench.sh which I guess is AWS dependent. So back to using lein run. I can see the following:
λ: nc -z 192.168.99.100 9092
Connection to 192.168.99.100 port 9092 [tcp] succeeded!
λ: echo dump | nc 192.168.99.100 2181 | grep brokers
/brokers/ids/1001
λ: lein run -m crux.bench.main
Would post to Slack:
*Starting Benchmark*, Commit Hash: null
Syntax error (TimeoutException) compiling at (/private/var/folders/km/lcsz0x0j4kg2_4h36m7bvjcm0000gn/T/form-init9219056612490469057.clj:1:125).
Timed out waiting for a node assignment.
Full report at:
/var/folders/km/lcsz0x0j4kg2_4h36m7bvjcm0000gn/T/clojure-6052950630285481569.edn
Which unfortunately is back at the (likely) Kafka connectivity problem. What I solved is that the connectivity with Kafka/ZK can now be established from the command line, but apparently not from lein run
just noticed several hardcoded localhost:9092 references, so going to change those to 192.168.99.100:9092 and see what happens
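One way to check whether the JVM process itself can reach the broker (rather than nc on the command line) is to open a plain TCP socket from the REPL. This is only a minimal sketch: port-open? is a hypothetical helper, not part of crux-bench, and a successful TCP connect does not prove the Kafka client will work, since the broker may still advertise a different host (such as localhost) to its clients.

```clojure
(import '(java.net InetSocketAddress Socket))

(defn port-open?
  "Returns true if a TCP connection to host:port succeeds within timeout-ms.
  Hypothetical helper for illustration, not part of crux-bench."
  [host port timeout-ms]
  (try
    (with-open [socket (Socket.)]
      (.connect socket
                (InetSocketAddress. ^String host (int port))
                (int timeout-ms))
      true)
    ;; Connection refused, timeouts and unknown hosts are all IOExceptions.
    (catch java.io.IOException _
      false)))

;; Addresses taken from the session above; evaluate these at the REPL.
(comment
  (port-open? "192.168.99.100" 9092 2000)
  (port-open? "localhost" 9092 2000))
```

If the first call returns true and the second false, any remaining localhost:9092 setting would be a likely suspect.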
Progress! Now I’m getting a bunch of "CloudWatchException: The security token included in the request is invalid." errors, but waiting to see if I can get to some relevant output
Is there a benchmark in particular that you're hoping to run? Or are you aiming for the full suite?
Whenever I run benchmarks locally I just do it via the REPL, and sidestep a lot of the orchestration code in crux-bench
I see… I guess the most interesting tests for me would be around the crux-core query engine. So I can see just the following after many pages of CloudWatch exceptions:
{"av-count":14005000,"time-taken-ms":409141,"bench-ns":"ts-devices","crux-commit":null,"bench-type":"ingest","bytes-indexed":2986412207,"doc-count":1001000,"crux-node-type":"kafka-rocksdb","success?":true}
{"bytes-on-disk":388133899,"compacted-bytes-on-disk":309455640,"time-taken-ms":19729,"crux-node-type":"kafka-rocksdb","bench-ns":"ts-devices","crux-commit":null,"bench-type":"compaction"}
{"success?":true,"time-taken-ms":426,"crux-node-type":"kafka-rocksdb","bench-ns":"ts-devices","crux-commit":null,"bench-type":"recent-battery-readings"}
{"success?":true,"time-taken-ms":186,"crux-node-type":"kafka-rocksdb","bench-ns":"ts-devices","crux-commit":null,"bench-type":"busiest-devices"}
{"success?":true,"time-taken-ms":35193,"crux-node-type":"kafka-rocksdb","bench-ns":"ts-devices","crux-commit":null,"bench-type":"min-max-battery-level-per-hour"}
Would that be a full output?
Commented out https://github.com/juxt/crux/blob/master/crux-bench/src/crux/bench.clj#L211 and retrying
I think you would need to comment out the cw/reporter config lines in the various start-node config maps
This is an example of how I would run each ns one-by-one (i.e. just ignore everything in the main bench.clj ns) https://github.com/juxt/crux/blob/11fd82577223ac35c9666b74cee8aca2d39a9262/crux-bench/src/crux/bench/tpch_stress_test.clj#L57
In the meantime, I ran the bench with the reporter commented out and got no exceptions! However, it sounds like tests like the one you're pointing at above are not part of the suite. Good to know, I’ll probably scout the namespaces to find what would be good to run
ah, that user/node is a legacy reference, you would now use dev/crux-node https://github.com/juxt/crux/blob/11fd82577223ac35c9666b74cee8aca2d39a9262/dev/dev.clj#L93 which you get running when starting the repl and doing (dev) then (go)
> it sounds like tests like the one you're pointing at above are not part of the suite
This is a good point...we don't run all of the benchmarks in the nightly runs, since the ones we do run normally give such excellent coverage that more data would just be more noise 🙂
As a bit of an aside, I also found these generative tests very helpful when spiking a crux-redis KV module: https://github.com/juxt/crux/blob/6d602bb5b6caed199f10fd8c3711cb034d49248a/crux-test/test/crux/kv_test.clj#L235-L342 ...but they don't live in crux-bench 🙂 I can't think of other generative tests like this though!
I’m working on a replacement for clojure.set and evaluating its impact on real-world projects that depend on set operations. Crux seems to depend on it for querying (no idea to what degree, compile-time or runtime, etc). My thinking is that perhaps I’m lucky and I can run before/after benchmarks to show some improvement. This worked for Datascript (for instance) and I’ll be presenting the results at the next Clojurians meetup.
Crux definitely uses clojure.set operations during query compilation, for aggregates, and for generally returning the results (as per https://github.com/juxt/crux/blob/master/crux-core/src/crux/query.clj) - but I can't see that it's on the "hot path" for the runtime query execution /cc @U050V1N74
yeah, I don't think it's on the hot path, but it'd no doubt be an improvement at compile-time regardless - it's not unlikely that compile-time dominates for cold runs of low-latency queries. nice one @U054W022G 👏
Thanks @U050V1N74 for having a look. I see other sub-modules depending on clojure.set, do you think there could be other interesting hot-paths to be aware of? I’m sure that if you’re doing serious perf work you are not going to forget a set/* call in your path :)
anyway, will see shortly if the change has an impact (assuming crux.bench.tpch-stress-test is a good measure for that)
we do have other benchmarks which are more tailored to ingest, but I can't find any references to clojure.set in ingest
No problem, thanks for helping. How do I read the results from the benchmark? I got a query timeout but the rest seems to be fine, with {"success?":true,"av-count":6258483,"bytes-indexed":1262752421,"doc-count":432844,"time-taken-ms":259212,"bench-ns":"tpch-stress","bench-type":"ingest"} as a result
:success? true is checking against the published TPC-H results, guessing you've spotted time-taken-ms
:bench-type :ingest - we split the benchmarks out into :ingest and :queries - there should be another entry for the latter
:bench-ns is a wider category - that's for the different benchmarks. e.g. ts-devices is a different benchmark
the counts are more for space benchmarks - we added those when we were going through a period of bashing at the disk space usage
also good for smoke tests, can see if a change has resulted in less data (especially if it shouldn't have!)
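To make the fields described above a bit more concrete, here is a small sketch that turns a result map into per-second rates. The map is the tpch-stress ingest result quoted earlier in this thread, written as EDN; per-second is a hypothetical helper and not something crux-bench provides.

```clojure
;; The tpch-stress ingest result from this thread, as a Clojure map.
(def ingest-result
  {:success? true, :av-count 6258483, :bytes-indexed 1262752421,
   :doc-count 432844, :time-taken-ms 259212,
   :bench-ns :tpch-stress, :bench-type :ingest})

(defn per-second
  "Rate of the counter stored under k, per second of :time-taken-ms.
  Hypothetical helper for illustration only."
  [result k]
  (/ (double (get result k))
     (/ (:time-taken-ms result) 1000.0)))

(per-second ingest-result :doc-count)      ; documents ingested per second
(per-second ingest-result :bytes-indexed)  ; bytes indexed per second
```

Comparing these rates across two runs only makes sense when :bench-ns, :bench-type and the node type match.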
ok thanks for the breakdown. Yeah I wasn’t sure whether to expect a single elapsed time for each query. There is a :query-stress entry but it timed out
[{:success? true, :av-count 6258483, :bytes-indexed 1262752421, :doc-count 432844, :time-taken-ms 259212, :bench-ns :tpch-stress, :bench-type :ingest} {:error "java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Query timed out.", :time-taken-ms 1006201, :bench-ns :tpch-stress, :bench-type :query-stress}]
you might find it easier to base yourself on the crux.bench.tpch namespace - there's a run-tpch function in there that takes a node and scale-factor
for mid-sized (i.e. more than microbenches but not long runs) we use SF0.01 - that's the one that TPC-H provide expected results for, too
if you don't want any of the bench harness, you pretty much only need tpch/load-docs! (only have to do that once for a node if you're benching query times) and run-tpch-queries
This is what I get, am I doing it correctly?
(require '[crux.bench.tpch :as tp])
(require '[crux.fixtures.tpch :as tpch])
(let [node (dev/crux-node)
      scale-factor 0.01]
  (tpch/load-docs! node scale-factor tpch/tpch-entity->pkey-doc)
  (tp/run-tpch-queries node {:scale-factor scale-factor}))
Transacting TPC-H tables...
Transacted 1500 customer
Transacted 15000 orders
Transacted 60175 lineitem
Transacted 2000 part
Transacted 8000 partsupp
Transacted 100 supplier
Transacted 25 nation
Transacted 5 region
FAIL in () (tpch.clj:678)
expected: (<= diff epsilon)
actual: (not (<= 22500.0 0.01))
FAIL in () (tpch.clj:678)
expected: (<= diff epsilon)
actual: (not (<= 417.0 0.01))
FAIL in () (tpch.clj:678)
expected: (<= diff epsilon)
actual: (not (<= 3090671.039999999 0.01))
false
think so, that’s straight after opening up a repl and can see only one set of “Transacting” messages
I’m going to take it as a positive :)
;; before "Elapsed time: 339841.871333 msecs"
;; after "Elapsed time: 291307.660398 msecs"
perhaps not that big, but the lib is a drop-in replacement, just a require away, so perhaps it makes some sense for people dealing with sets. Thanks for your support today, much appreciated!
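For reference, the relative improvement implied by the two elapsed times above can be worked out as follows. This is just arithmetic on the two quoted numbers; percent-faster is a hypothetical helper, and a single before/after pair says nothing about run-to-run variance.

```clojure
(defn percent-faster
  "Percentage reduction in elapsed time going from before-ms to after-ms."
  [before-ms after-ms]
  (* 100.0 (/ (- before-ms after-ms) before-ms)))

;; The two "Elapsed time" figures quoted above: a little over 14% less time.
(percent-faster 339841.871333 291307.660398)
```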
will it be becoming available on Maven soon? if so, happy to include it in our CI and overnight bench runs
nice work! Just to confirm though, are those before & after runs just for end-to-end query times? And are they completely distinct (e.g. full node shutdown and restart)? Or is it possible that there may be some effect from warm caches?
Here’s the repro case. Open repo top level and:
(dev)
(go)
(require '[crux.bench.tpch :as tp])
(require '[crux.fixtures.tpch :as tpch])
(let [node (dev/crux-node)
      scale-factor 0.01]
  (tpch/load-docs! node scale-factor tpch/tpch-entity->pkey-doc)
  (time (tp/run-tpch-queries node {:scale-factor scale-factor}))
  (time (tp/run-tpch-queries node {:scale-factor scale-factor})))
Taking the time twice just in case warming up makes a difference, but didn’t see any. Then kill repl, replace all require [clojure.set :as set] with require [tech.droit.fset :as set] and try again.
Would the content-hash replicate, similar to https://github.com/replikativ/hasch ?
I have, for example, a file with 300,000 records. I’ll process it for any new records, deletes, or updates. It would be useful to use the hashing extracted from Crux, or to use Crux itself. Ideally my coworker would get the same hash on their system.
It's an internal function, but you could try re-using the hashing from crux.codec/new-id ?
Crux currently uses 20-byte SHA-1 hashes, which are regarded as purely internal to the Crux system, and so they aren't being depended on for any security properties (unlike the reasoning behind hasch's 64-byte SHA-512 hashes). The various SHA-1 implementations should definitely be system-independent though, see here for details https://github.com/juxt/crux/blob/master/crux-core/src/crux/hash.clj
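To illustrate the system-independence point, here is a minimal sketch that computes a SHA-1 digest with the JDK's MessageDigest. It hashes the UTF-8 bytes of a plain string, so it should not be expected to reproduce Crux's own content hashes, which come from internal code over Crux's own encoding of a document; sha1-hex is a hypothetical helper, not a Crux function.

```clojure
(import '(java.security MessageDigest))

(defn sha1-hex
  "Hex-encoded SHA-1 digest (20 bytes, 40 hex chars) of the UTF-8 bytes of s.
  Hypothetical helper; this is not how Crux derives its ids."
  [^String s]
  (let [digest (.digest (MessageDigest/getInstance "SHA-1")
                        (.getBytes s "UTF-8"))]
    ;; Mask each signed byte to 0..255 before formatting as two hex digits.
    (apply str (map #(format "%02x" (bit-and % 0xff)) digest))))

;; The same bytes give the same digest on any machine.
(sha1-hex "abc")
```

The encoding step is where two systems can diverge: the same record must be turned into exactly the same bytes before hashing.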