#datomic
2017-03-15
casperc15:03:43

I am wondering about this case I am encountering when setting txInstant in a unit test:

(let [t1 #inst "2001-01-01"
      t2 #inst "2002-01-01"
      conn (scratch-conn)
      _ (init-schemas conn)]
  (d/transact conn [{:db/id (d/tempid :db.part/tx) :db/txInstant t1}
                    {:db/id (d/tempid :db.part/user) :bygning/id 1 :bygning/attr1 "1"}])
  (d/transact conn [{:db/id (d/tempid :db.part/tx) :db/txInstant t2}
                    {:db/id (d/tempid :db.part/user) :bygning/id 2 :bygning/attr1 "2-1"}])
  (d/transact conn [{:db/id (d/tempid :db.part/tx) :db/txInstant t2}
                    {:db/id (d/tempid :db.part/user) :bygning/id 1 :bygning/attr1 "2"}])
  (prn "without as-of" (d/q '[:find ?v ?tx
                             :where 
                             [_ :bygning/attr1 ?v ?tx]
                             [?tx :db/txInstant ?txInst]]
                           (d/history (d/db conn))))
  (prn "with as-of" (d/q '[:find ?v ?tx
                          :where 
                          [_ :bygning/attr1 ?v ?tx]
                          [?tx :db/txInstant ?txInst]]
                        (d/as-of (d/history (d/db conn)) t2))))
prints
"without as-of" #{["2-1" 13194139534315] ["1" 13194139534313] ["2" 13194139534317] ["1" 13194139534317]}
"with as-of" #{["2-1" 13194139534315] ["1" 13194139534313]}

casperc15:03:43

The second and third transactions are both at :db/txInstant t2, but when doing an as-of, I don’t get the one on :bygning/id 1 and :bygning/attr1 “2”.

casperc15:03:19

Is it because datomic is adding “some extra” time, so all txInstants are unique?

favila15:03:41

@casperc as-of with an instant is resolved to a tx value, as with (-> (d/datoms db :avet :db/txInstant the-instant) first :tx)

favila15:03:18

So the as-of point is precisely 13194139534315, because that is the first match for that instant

favila15:03:51

so the one after that is not seen

casperc15:03:10

Ah, that explains it. Thanks!

favila15:03:38

d/datoms is actually the wrong call to describe it; it's more like d/seek-datoms

favila15:03:48

because inexact matches are allowed
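
For illustration, a rough sketch of how that resolution behaves (an approximation, not Datomic's internals), assuming db is a current database value and t2 the instant from the example above:

(require '[datomic.api :as d])

;; First :db/txInstant datom at or after the instant, in :avet order;
;; its :tx is the tx that an instant-based as-of resolves to.
(defn as-of-tx [db inst]
  (->> (d/seek-datoms db :avet :db/txInstant inst)
       first
       :tx))

;; With two transactions sharing txInstant t2, (as-of-tx db t2) lands on the
;; earlier one, so the later tx falls outside the as-of boundary.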

a.espolov15:03:32

Guys, this query [:find (count ?e) :where [?e :а-entity/а-attribute]] returns an OutOfMemory error. How can I count all the entities?

favila16:03:43

you need to do it lazily with d/datoms
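
For example, a rough sketch of a lazy count over the index (assuming a db value and the attribute from the question):

(require '[datomic.api :as d])

;; :aevt holds every datom for the attribute; reducing over the iterable
;; avoids materializing a query result set. For a cardinality-one attribute
;; the datom count equals the entity count; for cardinality-many, count
;; distinct :e values instead.
(defn count-entities-with-attr [db attr]
  (reduce (fn [n _] (inc n)) 0 (d/datoms db :aevt attr)))

;; usage: (count-entities-with-attr (d/db conn) :а-entity/а-attribute)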

dominicm16:03:18

I had assumed that deref-ing a transaction would be a form of backpressure, but I'm starting to question it. In the docstring the "completion" of the transaction is mentioned, but I'm not sure what it means in this context

favila16:03:38

Completion means transaction committed

favila16:03:05

deref is backpressure only if you wait for it before issuing new txes

danielstockton16:03:30

I think total datom count can also be monitored from the transactor: http://docs.datomic.com/monitoring.html#sec-3

dominicm16:03:32

@favila I am doing a bulk lot of transactions (divided up into chunks). In some cases transacting millions of datoms. I want to avoid overwhelming the transactor

favila16:03:05

For example, (run! #(deref (d/transact-async conn %)) txes) would ensure only one tx is in flight at a time

favila16:03:26

as long as the individual txes are small, txor will not be overwhelmed

favila16:03:36

when indexing kicks in, tx rate will slow

favila16:03:59

however one-at-a-time is very slow, so Cognitect recommends pipelining

favila16:03:09

keep a few uncompleted txes in the air at a time
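
For illustration, a rough pipelining sketch (not Cognitect's official example): keep up to in-flight uncompleted transactions pending, waiting on the oldest before submitting the next chunk, so the transactor sees steady but bounded load:

(require '[datomic.api :as d])

(defn pipeline-transact [conn tx-chunks in-flight]
  (loop [pending clojure.lang.PersistentQueue/EMPTY
         chunks  tx-chunks]
    (cond
      ;; everything submitted: drain the remaining futures
      (empty? chunks)
      (run! deref pending)

      ;; window full: wait for the oldest tx before submitting more
      (>= (count pending) in-flight)
      (do @(peek pending)
          (recur (pop pending) chunks))

      :else
      (recur (conj pending (d/transact-async conn (first chunks)))
             (rest chunks)))))

;; usage: (pipeline-transact conn (partition-all 1000 tx-data) 4)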

dominicm16:03:58

@favila We've just increased the size of the individual txes, to increase the throughput of our queries to generate the txes.

dominicm16:03:11

Will take a look into pipelining. I had wondered how you'd keep a few in flight at a time.

favila16:03:30

large tx sizes are generally bad

favila16:03:45

better to send more txes more frequently than larger txes less frequently

favila16:03:08

I think the guideline they gave is ~1000 datoms per tx

favila16:03:13

although we have done larger just fine

favila16:03:32

but at tens of thousands, tx timeouts become a problem

favila16:03:57

and the jitter upsets other txes from other peers (on an active db)

favila16:03:48

There were some docs somewhere on tuning for a bulk import job too, but I can't find them now

dominicm16:03:59

I'd been given the 10k datoms number. Hmm. We've just increased from ~10/tx to 10*1000 (chunking the job into 1000s)

favila16:03:43

there were some transactor tunables to set, as well as temporarily raising storage capacity if you use e.g. DynamoDB

favila16:03:05

e.g. raising the memory-index-threshold to delay indexing as long as possible, then doing an explicit requestIndex at the end
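
For reference, a hedged sketch of that "index at the end" step (memory-index-threshold and memory-index-max are transactor .properties settings raised for the import; the explicit index request is issued from a peer once the load finishes):

(require '[datomic.api :as d])

(defn finish-bulk-import! [conn]
  (d/request-index conn)                        ;; ask the transactor to start an indexing job
  @(d/sync-index conn (d/basis-t (d/db conn)))) ;; block until indexing has caught up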

favila16:03:25

another technique is to do the bulk import locally (dev storage) on a big machine, then backup + restore to remote storage

favila16:03:51

but that doesn't talk about memory-index-threshold

eraserhd18:03:04

Am I correct in assuming that, in pull expressions, defaults can't be supplied for reverse lookups?

eraserhd18:03:02

e.g. [{(default :foo/_bar []) [:foo/uuid]}]

favila18:03:53

Maybe it can be if it is a component entity (where reverse-lookup is a scalar)

favila18:03:05

component attribute rather

eraserhd18:03:41

Oh, good lord... There's some spec in my own app rejecting it.

favila18:03:47

The thing is, I'm not sure you can default cardinality-many attributes.

eraserhd18:03:00

@favila verified (not possible)
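
For contrast, a minimal sketch of where default is known to work: a forward attribute in the pull pattern (db and entity-id here are placeholders):

(require '[datomic.api :as d])

(d/pull db
        '[:foo/uuid (default :foo/name "unnamed")] ;; "unnamed" returned when :foo/name is absent
        entity-id)

;; the reverse form (default :foo/_bar []) from above is the shape that gets rejected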

djjolicoeur20:03:51

setting the object-cache size on the transactor controls the object-cache size of the transactor, right? If I wanted to set that on a peer I would do so via a java option on the peer, is that right?

marshall20:03:37

Memory index setting on the transactor is used by all peers. Object cache size is set independently on each, but defaults to 50% of the heap
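
As a hedged illustration of the peer side: the object cache there is sized with a JVM system property on the peer's java command line rather than in the transactor's properties file, e.g. -Ddatomic.objectCacheMax=256m, and it can be inspected from the peer process:

(System/getProperty "datomic.objectCacheMax") ;; nil if unset, in which case the 50%-of-heap default applies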

djjolicoeur20:03:33

@marshall thanks, that's what I thought but wanted to make sure

djjolicoeur21:03:14

@marshall is there a good way to ensure that a transactor is running, i.e. something we would monitor? we have an HA setup, and we want to have some monitoring on both the primary transactor and the failover to ensure we get alerted if either dies

marshall21:03:06

HeartbeatMsec and HeartMonitorMsec

marshall21:03:16

for the primary and standby, respectively

djjolicoeur21:03:40

@marshall those are settings internal to both the primary and standby, right? I was looking for something we might be able to monitor externally

marshall21:03:21

are you using CloudWatch?

marshall21:03:26

for metrics

marshall21:03:00

or some other custom metrics callback?

djjolicoeur21:03:37

our transactors don’t run on AWS, so we don’t use CloudWatch. we track metrics in riemann with the yeller/datomic-riemann-reporter. but what I’m looking for is actually less around metrics and more around telling our automation framework that the transactor is up and ready. a port that is open or something along those lines. and if a call to that fails, or a number of calls to that fail, restart the transactor.

djjolicoeur22:03:07

if no such thing exists off the top of your head, that is fine, we will figure something out.

marshall22:03:29

the heartbeat would definitely serve that purpose
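
One hedged way to surface the heartbeat outside CloudWatch is a custom metrics callback (metrics-callback=my.metrics/report in the transactor properties, or datomic.metricsCallback on a peer; my.metrics/report is a hypothetical handler name). The callback is assumed to receive a map of metric name to a summary map once per reporting interval:

(ns my.metrics)

;; remember when a HeartbeatMsec sample last arrived; an external check can
;; alert or restart the transactor when this value stops advancing
(def last-heartbeat (atom (System/currentTimeMillis)))

(defn report [metrics]
  ;; metrics is assumed to look like {:HeartbeatMsec {:lo .. :hi .. :sum .. :count ..} ...}
  (when (:HeartbeatMsec metrics)
    (reset! last-heartbeat (System/currentTimeMillis))))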

marshall22:03:00

i can’t think of much else other than trying to connect

djjolicoeur22:03:37

thanks, I’ll look into what we can do with the heartbeat