This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2016-08-25
@jasonbell Love your blog btw! I’m a data science geek too, so your posts are A+ in my book
@erichmond Thank you, that’s most kind.
I have a simple catalog that imports data from an MS-SQL database into Datomic. After the job has been running for a while, the throughput drops to 0.0 according to the metrics log (metrics on all tasks, :all), although metrics keep getting written to the log file. At the same time I'm monitoring the Datomic tx-report-queue in a separate REPL, and the transactions stop coming. I also enabled debug logging for the MS JDBC driver, and it likewise stops writing new entries. In onyx-dashboard the job is still shown as running. It generally happens after approx. 200k-700k Datomic transactions. When I kill the peer and start a new one, it continues the job, and after some time the throughput drops to zero again. The batch latency looks fine, so it's not a JVM GC issue. No exceptions found in any of the logs (e.g. Onyx, JUL, etc.). It's an MS-SQL database with 42 million records, running on a single peer with a 12 GB heap, a separate JVM for Aeron, and an external ZooKeeper. Workflow: [[partition-keys read-rows] [read-rows prepare-datums] [prepare-datums transact-datomic]]. transact-datomic uses write-bulk-tx-async-calls. Onyx version: 0.9.7; onyx-sql version: "0.9.9.1-20160816.124319-6". Do you have any idea how to troubleshoot this further?
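For context, the three-step pipeline described above could be written out in Onyx's workflow data format roughly as follows. This is a sketch only: the task names come from the message, everything else is assumed.

```clojure
;; Sketch of the workflow from the message above: partition the SQL
;; table into key ranges, read the rows for each range, transform them
;; into Datomic transaction data, and transact them. Task names are
;; taken from the message; this is illustrative, not the poster's
;; actual configuration.
(def workflow
  [[:partition-keys :read-rows]
   [:read-rows :prepare-datums]
   [:prepare-datums :transact-datomic]])
```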
Interesting. Any retries?
I guess if the throughput drops to zero then it's probably not retrying
my last run contains 101 retries on partition keys
Hmm. Each of those will amplify out to 500 rows or whatever you have configured. But you should see things pick back up again for a while when the retries happen.
No exceptions in the log?
retry_segment_rate_1s contains values between 0.99 and 2.00
No exceptions at all
What's the max-pending for the input task?
partitions-keys max-pending 1000, sql-rows-per segment 1000 and batch-size 100
Some blocking buffer could be getting stuck somewhere (which would be a bug). 1000x1000 = 1M rows in flight. Can you try reducing max-pending and rows-per-segment to help me debug it a little?
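To make the back-of-the-envelope number above concrete (the parameter names here are Onyx's :onyx/max-pending and onyx-sql's :sql/rows-per-segment; the arithmetic is a worst-case bound, not a measurement):

```clojure
;; Worst-case rows in flight: up to max-pending input segments can be
;; pending at once, and each partition-keys segment fans out into as
;; many as rows-per-segment rows downstream.
(def max-pending 1000)       ; :onyx/max-pending on the input task
(def rows-per-segment 1000)  ; :sql/rows-per-segment
(def rows-in-flight (* max-pending rows-per-segment))
;; rows-in-flight => 1000000
```

Dropping both values to 200, as suggested below in the thread, bounds this at 200 × 200 = 40,000 rows instead.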
Yes, what size would you prefer?
Let's just try 200 for each
Around 160k Datomic transactions it stopped, but after a short time (10-20 s) it picked up again, which I hadn't observed before.
Interesting. Is it still going?
After 400-500k it stopped, but I noticed that the partition-keys pending-messages count is approx. 950. I did observe pauses in the Datomic report queue.
But still no exceptions in any of the logs. Could it be that something happens with the Datomic connection? (Btw, during the run I'm seeing INFO logs from the Datomic peer; nothing there indicates a problem.)
It's certainly possible, especially if onyx does keep retrying
Stick an onyx/fn on the output task that prints the segment and returns the segment. Then you'll have logging to see if segments are making it to the output
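A minimal sketch of that debugging trick, assuming a hypothetical namespace my.debug; the function name and the catalog wiring shown are illustrative, not from the thread:

```clojure
;; Pass-through function: print each segment so it appears in the peer
;; log, then return it unchanged so the job's behaviour is unaffected.
(defn log-segment [segment]
  (println "segment reached output task:" segment)
  segment)

;; In the catalog, point the output task's :onyx/fn at it (the other
;; keys on the transact-datomic entry stay as they were):
;;   {:onyx/name :transact-datomic
;;    :onyx/fn   :my.debug/log-segment
;;    :onyx/type :output
;;    ...}
```

If segments stop appearing in the log while the job still shows as running, the stall is upstream of the Datomic write; if they keep printing, the problem is on the Datomic side.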
will do 🙂
Everyone give @vijaykiran a big round of applause for contributing a new, improved User Guide - http://www.onyxplatform.org/docs/user-guide/0.9.10-beta1/
Includes anchor linking to each section.
0.9.10 will be out next week after we merge in a few more fixes + some docs. This release includes the Peer Query Server. Each peer can optionally run an HTTP server to respond to a health check and provide a status report.