2018-01-03
I am doing a very large Datomic import (over 14 million rows). I believe I'm doing all the 'best practice' things: 1) batch transactions, 2) pipeline, 3) initial schema without indexes, 4) transactor settings tuned for import. I'm noticing, however, that transactions come in relatively quickly at the start (~4 sec per 100 transactions) and then gradually degrade as the import proceeds (now over 1 min per 100 transactions, after 23,000 transactions). What are some other things to consider that might cause this? The transactor is running locally as dev, and I'm using the client API to a peer server.
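(Not part of the thread — a minimal sketch of what "batch + pipeline" can look like with the client API mentioned above. `import-rows!`, the batch size, and the in-flight window are illustrative assumptions, not the actual import code.)

```clojure
;; Bounded pipeline: submit each batch on a future, but never allow more
;; than `in-flight` transactions to be outstanding at once.
(require '[datomic.client.api :as d])

(defn import-rows!
  [conn rows {:keys [batch-size in-flight]
              :or   {batch-size 1000 in-flight 4}}]
  (loop [batches (partition-all batch-size rows)
         pending clojure.lang.PersistentQueue/EMPTY]
    (cond
      ;; Window full, or out of input with work outstanding: wait on the oldest tx.
      (or (>= (count pending) in-flight)
          (and (empty? batches) (seq pending)))
      (do @(peek pending)
          (recur batches (pop pending)))

      ;; Room in the window and input remaining: submit the next batch.
      (seq batches)
      (recur (rest batches)
             (conj pending
                   (future (d/transact conn {:tx-data (vec (first batches))}))))

      :else :done)))
```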
@donmullen May I suggest using the peer library instead? What storage backend are you using? And what are the threshold values for your transactor? Higher = bigger indexing jobs.
@robert-stuttaford I'm running locally against dev disk storage. Would local-ddb be more efficient? Transactor settings: memory-index-threshold=32m / memory-index-max=512m / object-cache-max=64m; running with -Xms1g -Xmx1g -XX:+UseG1GC -XX:MaxGCPauseMillis=50.
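(For reference: these settings live in the transactor properties file, while the JVM flags go on the transactor command line. A sketch using the values quoted above — the file name and the protocol/host/port lines are assumptions for a local dev setup.)

```
# Sketch of a dev transactor properties file for a bulk import.
protocol=dev
host=localhost
port=4334

# Higher thresholds mean fewer, larger indexing jobs during the import.
memory-index-threshold=32m
memory-index-max=512m
object-cache-max=64m

# The JVM flags are passed on the transactor command line, e.g.:
#   bin/transactor -Xms1g -Xmx1g -XX:+UseG1GC -XX:MaxGCPauseMillis=50 dev-transactor.properties
```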
I seem to be running into memory/GC issues. Likely some rookie Clojure dev mistake in the code somewhere (I'm only recently ramping back up in Clojure).
@donmullen - I'm pretty sure you're holding on to the head of your data sequence. As more of the sequence gets realized, the GC can't free any of the earlier elements because you still have a reference to the head (your `data` binding in your code). This can take some work to get right, and it's also a bit frustrating to debug because the feedback cycles are long. The good news is that it's a fairly common problem in Clojure when dealing with large datasets, so there should be good resources out there to learn more about it. Let me know if this doesn't make sense or if you're not sure where to go from here.
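(A minimal sketch of the difference being described, assuming the client API from above; `row->tx` and the `:import/raw` attribute are hypothetical stand-ins for the real row-to-tx-data conversion.)

```clojure
(require '[clojure.java.io :as io]
         '[datomic.client.api :as d])

;; Hypothetical helper: turn one input line into a tx-data map.
(defn row->tx [line]
  {:import/raw line})

;; Problematic: `data` is referenced again after the doseq (for the count),
;; so the realized sequence stays reachable for the whole import.
(defn import-holding-head [conn rdr]
  (let [data (line-seq rdr)]
    (doseq [batch (partition-all 1000 (map row->tx data))]
      (d/transact conn {:tx-data (vec batch)}))
    (count data)))

;; Better: consume the lazy sequence in a single pass and never touch it
;; again, so each batch becomes garbage as soon as it has been transacted.
(defn import-streaming [conn path]
  (with-open [rdr (io/reader path)]
    (doseq [batch (partition-all 1000 (map row->tx (line-seq rdr)))]
      (d/transact conn {:tx-data (vec batch)}))))
```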
@jeff.terrell Thanks - was reaching the same conclusion.
@donmullen Degradation on import is normal as the amount of data reaches the indexing thresholds; indexing (done by the transactor) slows down the import because it consumes CPU and IO.
You can check your logs for backpressure alarms; that would at least tell you that the slowdown is because the transactor is applying backpressure.