
My transactor keeps getting stuck in a state where it can't start; it just hangs on startup for about 50 minutes and then dies with:

Jun 30 21:51:49 ip-172-16-211-135.ec2.internal bash[2005]: Launching with Java options -server -Xms1g -Xmx24g -XX:+UseG1GC -XX:MaxGCPauseMillis=50                   -Xloggc:/var
Jun 30 21:51:53 ip-172-16-211-135.ec2.internal bash[2005]: Starting datomic: ...
Jun 30 21:51:53 ip-172-16-211-135.ec2.internal bash[2005]: System started datomic:
Jun 30 22:55:36 ip-172-16-211-135.ec2.internal bash[2005]: Critical failure, cannot continue: Indexing retry limit exceeded.
Jun 30 22:56:07 ip-172-16-211-135.ec2.internal systemd[1]: datomic.service: Main process exited, code=exited, status=255/n/a


I'm at something of a loss. I can't make it get into this state intentionally, and once it gets into that state it stays that way for up to five days


ohh, i'm running datomic-pro-0.9.5359


Is there such a thing in Datomic as a "rebuild all the indexes from scratch" command, or is this a silly question?

Ben Kamphaus03:07:52

@arthur: doesn’t look like that’s necessarily the solution to your problem. Your system may be underprovisioned (i.e. not enough Dynamo throughput provisioned), or you might have catastrophic GC happening with a heap that large (really no benefit to going more than ~12 GB max heap on a transactor). It’s hard to say what it is, though logs/metrics will show bottlenecks for write throughput and things like a laggy heartbeat that would point to catastrophic GC. If you can’t find an obvious culprit in storage throughput, you may want to reach out to someone at Cognitect support in case it’s a bug or something their diagnostic tools will help you locate.


Seeing some funky behavior where the tx metadata associated with a particular transaction becomes stale after the first transaction. That is, the :db/txInstant and :db/id of the transaction associated with the latest entity do not change after the first transaction, even when the actual entity attributes returned reflect more recent transactions.


Is there some kind of caching of transaction entities?


It can’t be Redundancy Elimination, because of the :db/txInstant

Ben Kamphaus22:07:03

@pheuter not sure I follow what you mean. An entity can span multiple transactions, so you need some logic defining which transaction concerning the entity you want information about.


I see, I was assuming that given a single d/db value, a (pull ?tx-id [*]) will return the latest transaction attributes, no?


for a particular entity

Ben Kamphaus22:07:48

how does your query relate the entity in question to the transaction?


[:find (pull ?e [*]) (pull ?tx-id [*])
  :where [?e ?a ?v ?tx-id]]

Ben Kamphaus22:07:06

how are ?a and ?v bound? if not at all, you should replace them with _ — and that version of the query will return multiple relations of the pulled ?e to the pulled ?tx-id.


in our particular case we’re dynamically generating these queries from a JSON request; they’re being filled in with actual values, so it looks more like:

[:find (pull ?e [*]) (pull ?tx-id [*])
  :where [?e :user/firstName "Bob" ?tx-id]]

Ben Kamphaus22:07:32

so in that case you’ll get the ?tx-id for when that specific fact — the user first name — was specified.


let’s say the query is this:

[:find (pull ?e [*]) (pull ?tx-id [*])
  :where [?e :user/firstName _ ?tx-id]]


we’ll get back a list of users that have a firstName


which ?tx-id will we get back then?

Ben Kamphaus22:07:45

any ?tx-id in the database (note that if you’re using the present database this does not include history) in which a fact about :user/firstName for that ?e was asserted.
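For instance (a hypothetical sketch, not from the discussion — it assumes datomic.api is required as d, and that db and an entity id e-id are in scope), a history database shows every transaction that ever touched the attribute, not just the currently visible assertion:

(require '[datomic.api :as d])

;; On a history db the same datom pattern matches every assertion
;; (and retraction) of :user/firstName for the entity; ?added
;; distinguishes the two.
(d/q '[:find ?tx ?v ?added
       :in $ ?e
       :where [?e :user/firstName ?v ?tx ?added]]
     (d/history db)
     e-id)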


i see, so we can potentially be getting back an older transaction as long as the entity has a first name, even if it has changed since then


is there a way to specify to get the latest transaction associated with that entity?


Damn, I guess that was my question.

Ben Kamphaus22:07:41

out of order reply

Ben Kamphaus22:07:46

No that’s not what’s going on, I don’t think.

Ben Kamphaus22:07:02

transaction ids are for facts about entities

Ben Kamphaus22:07:20

there is no notion of an entity id -> transaction id mapping that transcends facts

Ben Kamphaus22:07:25

entities are projected from facts

Ben Kamphaus22:07:28

transaction ids are for facts

Ben Kamphaus22:07:49

if you want to get the latest transaction associated with an entity, you get the max ?tx-id for [?e _ _ ?tx-id]
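Concretely, that might look like this (a sketch, assuming datomic.api is required as d and that db and an entity id e-id are in scope):

(require '[datomic.api :as d])

;; Latest transaction that asserted any fact about the entity.
;; ?tx binds the transaction position of the datom pattern; max
;; picks the newest, since transaction ids ascend monotonically.
(d/q '[:find (max ?tx) .
       :in $ ?e
       :where [?e _ _ ?tx]]
     db
     e-id)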


yeah, i guess i was hoping to be able to order all transaction facts associated with an entity by txInstant and get the most recent one


that could work

Ben Kamphaus22:07:46

not necessary to order by txInstant and less reliable to do so (transaction granularity means transactions can occur on the same instant) — transaction ids are guaranteed to ascend monotonically, just max that.


good point, thanks! this has been really helpful

Ben Kamphaus22:07:12

of course if you want readable time it might make sense. Here’s an SO answer where I don’t follow the advice I just gave you 🙂


hm, seeing the following error now: clojure.lang.PersistentList cannot be cast to java.lang.Number. The query looks like this:

[:find (pull ?e pattern) (pull ?tx-id [*])
  :where [?e :account/email _]
         [?e _ _ (max ?tx-id)]]


i’m basically trying to avoid having to break it out into two separate queries


Try the max in the find specification


Invalid pull expression (pull (max ?tx-id) [*])


[:find (max ?tx-id) :where [?e _ _ ?tx-id]]

Ben Kamphaus23:07:38

definitely can’t nest function calls inside single where clauses in general, and aggregates can only legally be called in the :find clause, as @marshall points out. I think you’ll run into issues combining that with the pull.

Ben Kamphaus23:07:58

You could probably get max txInstant and still pull the ?tx-id? Or try a subquery, e.g.
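The pasted snippet didn’t survive here, but a subquery along these lines is presumably what was meant (a hypothetical sketch, assuming datomic.api is required as d, a db in scope, and the :account/email attribute from the earlier query):

(require '[datomic.api :as d])

;; Call datomic.api/q from inside a :where clause to compute the
;; latest tx per entity, then pull both the entity and that tx.
(d/q '[:find (pull ?e [*]) (pull ?latest-tx [*])
       :in $
       :where
       [?e :account/email _]
       [(datomic.api/q [:find (max ?tx) .
                        :in $ ?e
                        :where [?e _ _ ?tx]]
                       $ ?e)
        ?latest-tx]]
     db)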




didn’t know subqueries are a thing o.0

Ben Kamphaus23:07:01

not the first thing I’d reach for; depends on how strong your need to do things in one query is. In general, you’re better off breaking up the pull/entity (projection analogues) part of a query from the where clause/select portions; they’re more composable and general purpose that way.


Also keep in mind that since query occurs in-process on the peer and a lot of this data will be cached, there isn't the same cost for multiple queries as there would be in a traditional RDBMS

Ben Kamphaus23:07:53

if you’re building queries (as your earlier discussion implies) via REST or a client layer where you don’t get the same benefit of caching and have to round trip to the peer it’s a different story.


yeah, in this case i’m building a query dynamically from a client request and the desired behavior is to return the latest tx metadata associated with the entities returned


it’s a fair point that queries are heavily cached on the peer, and it may not be the worst thing to make two queries to fulfill each request
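The two-query version might look like this (a sketch, assuming datomic.api is required as d and a db in scope; the per-entity result shape is just one way to package it):

(require '[datomic.api :as d])

;; Two-query approach: one query for the entities, one for the
;; latest tx of each. Both run in-process against the peer cache.
(let [es (d/q '[:find [?e ...]
                :where [?e :account/email _]]
              db)]
  (for [e es]
    (let [tx (d/q '[:find (max ?tx) .
                    :in $ ?e
                    :where [?e _ _ ?tx]]
                  db e)]
      {:entity (d/pull db '[*] e)
       :tx     (d/pull db '[*] tx)})))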


hm, is it possible to pass a particular entity into a subquery as a query arg?

[:find (pull ?e [*]) (pull ?latest-tx [*])
 :in $
 :where
 [?e :account/email _]
 [(datomic.api/q [:find (max ?tx-id) :where [?e _ _ ?tx-id]]
                 $ ?e) [[?latest-tx]]]
 [?e _ _ ?latest-tx]]


getting error: processing clause: [?e _ _ ?tx-id], message: Insufficient bindings, will cause db scan


ah, i forgot the :in clause in the subquery, doh


yay it worked
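For reference, the working query presumably ended up looking something like this — a reconstruction, not the verbatim query, with the missing :in clause added to the subquery:

[:find (pull ?e [*]) (pull ?latest-tx [*])
 :in $
 :where
 [?e :account/email _]
 ;; the subquery now declares :in $ ?e, so ?e arrives bound
 ;; instead of forcing a full db scan
 [(datomic.api/q [:find (max ?tx-id) :in $ ?e :where [?e _ _ ?tx-id]]
                 $ ?e)
  [[?latest-tx]]]
 [?e _ _ ?latest-tx]]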