#datomic
2018-01-03
donmullen 13:01:23

I am doing a very large Datomic import (over 14 million rows). I believe I'm doing all the 'best practice' things: 1) batch transactions 2) pipeline 3) initial schema w/o indexes 4) transactor settings tuned for import. I'm noticing, however, that the transactions come in relatively quickly at the start (~4 sec for 100 transactions) and then gradually degrade as the import proceeds (now over 1 min for 100 transactions, after 23,000 transactions). What are some other things to consider that might cause this? The transactor is running locally as dev and I'm using the client API to a peer server.
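For context, a minimal sketch of the batched-import shape being described, assuming the client API (datomic.client.api), a hypothetical row->tx-map that turns one row into an entity map, and an illustrative batch size of 1000:
```clojure
(require '[datomic.client.api :as d])

;; Hypothetical helper: turn one input row into a transaction entity map.
(defn row->tx-map [row]
  {:person/name (:name row)})

(defn import! [conn rows]
  ;; transact in fixed-size batches; the batch size is illustrative
  (doseq [batch (partition-all 1000 rows)]
    (d/transact conn {:tx-data (mapv row->tx-map batch)})))
```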

robert-stuttaford 14:01:25

@donmullen may i suggest using the peer library instead? what storage backend are you using? what are the threshold values for your transactor? higher = bigger indexing jobs

donmullen 14:01:45

@robert-stuttaford I'm running locally against dev (disk) storage. Would local-ddb be more efficient? Transactor settings: memory-index-threshold=32m / memory-index-max=512m / object-cache-max=64m, running with -Xms1g -Xmx1g -XX:+UseG1GC -XX:MaxGCPauseMillis=50
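For reference, those settings live in the transactor properties file; a sketch of a dev transactor config with the values quoted above (the protocol/host/port lines are roughly the stock dev defaults):
```properties
protocol=dev
host=localhost
port=4334
# memory-index settings quoted above: higher thresholds mean fewer,
# larger background indexing jobs
memory-index-threshold=32m
memory-index-max=512m
object-cache-max=64m
```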

donmullen 14:01:57

I will try switching to peer library.
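If switching, a pipelined import with the peer library might look roughly like this; transact-async returns a future, and derefing a bounded window of those futures keeps a limited amount of work queued on the transactor. import-batches! and the in-flight count are assumptions, not a fixed recipe:
```clojure
(require '[datomic.api :as d])

(defn import-batches!
  "Keep up to `in-flight` transactions pending at once; derefing each
   window before submitting the next bounds the work queued on the
   transactor."
  [conn tx-batches in-flight]
  (doseq [window (partition-all in-flight tx-batches)]
    (run! deref (mapv #(d/transact-async conn %) window))))

;; e.g. (import-batches! conn batches-of-tx-data 20)
```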

donmullen 14:01:15

I seem to be running into memory/GC issues. Likely some rookie Clojure dev mistake in the code somewhere (only recently ramping back up in Clojure).

souenzzo 14:01:12

datomic:mem will store everything in RAM. Is 1GB enough for data + indexes + Datomic code?

jeff.terrell 14:01:16

@donmullen - I'm pretty sure you're holding on to the head of your data sequence. As more of the sequence gets realized, the GC can't free up any of the first elements because you still have a reference to the head (your data binding in your code). This can take some work to get right, and it's also a bit frustrating to debug because the feedback cycles are long. But the good news is that it's a fairly common problem in Clojure when dealing with large datasets, so there should be some good resources out there to learn more about it. Let me know if this doesn't make sense or if you're not sure where to go from here.
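Illustrative only (read-rows and import-batch! are made-up helpers): the head-holding shape vs. a single-pass one.
```clojure
;; Holds the head: `rows` is used again after the loop, so the whole
;; realized sequence must be retained for the duration of the import.
(let [rows (read-rows "data.csv")]
  (doseq [batch (partition-all 1000 rows)]
    (import-batch! batch))
  (count rows))

;; Does not hold the head: the lazy sequence is consumed in one pass,
;; so earlier batches become garbage as soon as they are processed.
(doseq [batch (partition-all 1000 (read-rows "data.csv"))]
  (import-batch! batch))
```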

donmullen 14:01:35

@jeff.terrell Thanks - was reaching the same conclusion.

favila 14:01:25

@donmullen degradation on import is normal as the amount of data reaches indexing thresholds; indexing (done by the transactor) slows down the import process because it consumes cpu and io.

donmullen 14:01:05

@favila Is there indexing happening even if no attributes have :db/index true?

favila 14:01:15

yes, there are still :eavt :aevt and :vaet (for refs) indexes

favila 14:01:35

:db/index true only controls the presence of :avet for a given attr
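A small sketch of what that means in practice, assuming the peer library, an existing connection `conn`, and a hypothetical :person/name attribute; :avet lookups via d/datoms only work for attrs installed with :db/index true (or :db/unique):
```clojure
(require '[datomic.api :as d])

;; without :db/index true, this attr would not appear in :avet
(def schema
  [{:db/ident       :person/name
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/index       true}])

;; direct :avet access for the indexed attribute
(seq (d/datoms (d/db conn) :avet :person/name "Ada"))
```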

favila 14:01:30

you can check your logs for backpressure alarms. that would at least tell you that the slowdown is because the transactor is applying backpressure

favila 14:01:20

(transactor logs)

donmullen 14:01:11

@favila - thanks - not seeing alarms, so I think the issue is likely GC/memory in my code.

favila 14:01:25

I don't see any obvious head-holding in your code

favila 14:01:51

normal process monitoring should tell you which process is consuming cpu

favila 14:01:07

something like jstat or jvisualvm can tell you what is happening in each process
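For example (a shell sketch; <pid> is a placeholder for the transactor or importer process id), jstat can show whether GC time is climbing while the import runs:
```
jstat -gcutil <pid> 5000
```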