#datomic
2016-07-24
cezar01:07:29

Question regarding d/q vs d/datoms performance... I notice that for queries which touch a lot of entities, both the runtime and the memory usage of d/q are quite bad. Here's an example from my prototype. The query determines the last time an attendee had an encounter with an RFID reader. This is modeled with an :encounter entity, which contains a timestamp and links via refs to a reader and an attendee. Here's my naive implementation of the filter:

(defn get-max-encounters [db]
  (d/q '[:find (max ?et) ?ea ?er :in $
         :where
         [?e :encounter/time ?et]
         [?e :encounter/attendee ?ea]
         [?e :encounter/reader ?er]]
       db))
The above query takes well over 10 minutes and frequently blows up with an OOM on a Peer with Xmx4G. There are 20,000,000 :encounter entities, around 600 readers, and about 100,000 attendees. Using d/datoms, however, is a whole different story: the "query" completes in less than 40 seconds and memory usage stays well within the Xmx limit. But that means adding extra code to basically do Datomic's job by hand. Unfortunately my code makes assumptions about the number of datoms in the :encounter entity, so in that sense it just feels "wrong" and I'd prefer not to use it. But maybe it's the only way to make the thing work. I'll post the snippet in the next message.

Lambda/Sierra16:07:03

@cezar: What you see is what I would expect. The implementation of d/q realizes all intermediate results in memory. Roughly speaking, each datalog clause in the :where part of the query generates a set of intermediate results in memory. For this reason I would say that, in general, d/q is not appropriate for operations that must scan all (or nearly all) entities in the database.
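
For contrast with the d/q version, here is a minimal sketch of a streaming scan over d/datoms. It is not cezar's snippet (which is not shown here), just one way it could look. It assumes every :encounter entity has all three attributes from the query above; the function name max-encounters-by-scan is made up for illustration. It walks the :aevt index for :encounter/time and keeps only one running maximum per attendee/reader pair, so memory grows with the number of pairs rather than with the number of encounters.

```clojure
(require '[datomic.api :as d])

(defn max-encounters-by-scan
  "Hypothetical helper: returns a map of [attendee-id reader-id] -> latest
   encounter time, built by scanning every :encounter/time datom once."
  [db]
  (reduce
    (fn [acc datom]
      (let [enc  (d/entity db (:e datom))            ; lazy entity for this encounter
            k    [(:db/id (:encounter/attendee enc)) ; same grouping as ?ea ?er
                  (:db/id (:encounter/reader enc))]
            t    (:v datom)                          ; the :encounter/time value
            prev (get acc k)]
        ;; keep the later of the stored time and this datom's time
        (if (or (nil? prev) (pos? (compare t prev)))
          (assoc acc k t)
          acc)))
    {}
    (d/datoms db :aevt :encounter/time)))
```

The per-encounter d/entity lookups are the simple choice here, not necessarily the fastest one; whether this matches the 40-second figure mentioned above is untested.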

Lambda/Sierra16:07:09

On the other hand, if your query were restricted to a single entity (for example, a single attendee) it could be made more efficient by placing that restriction first in the :where clause.

Lambda/Sierra16:07:29

(defn get-max-encounters [db attendee]
  (d/q '[:find (max ?et) ?ea ?er
         :in $ ?ea
         :where
         [?e :encounter/attendee ?ea]  ; narrow results to 1 attendee
         [?e :encounter/time ?et]
         [?e :encounter/reader ?er]]
       db attendee))

donaldball16:07:36

Just spitballing, but if I were faced with this problem, I’d consider storing the most recent encounter in a no-history attribute on the attendee entity. Would that be a bad idea?

cezar21:07:38

@donaldball: The problem is that I may be getting encounters out of sequence, so I can't rely on Datomic's timekeeping for exact attendance calculations. I can for rough approximations, however, and I already do that.

donaldball23:07:37

When applying a transaction that records a new encounter, you could check and see if the attendee does not already have an encounter with a later time, right?
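
A rough sketch of that check as a transaction function, so the comparison runs inside the transactor and stays correct when encounters arrive out of order. The attribute :attendee/last-encounter-time and the function ident :attendee/record-last-encounter are hypothetical names, not part of the schema discussed above; the attribute is assumed to be a cardinality-one instant installed separately (optionally with :db/noHistory).

```clojure
(require '[datomic.api :as d])

;; Hypothetical database function: asserts the new time on the attendee only
;; when no later time is already stored; otherwise contributes no datoms.
(def record-last-encounter
  (d/function
    '{:lang     :clojure
      :requires [[datomic.api :as d]]
      :params   [db attendee t]
      :code     (let [prev (:attendee/last-encounter-time (d/entity db attendee))]
                  (if (or (nil? prev) (pos? (compare t prev)))
                    [[:db/add attendee :attendee/last-encounter-time t]]
                    []))}))

;; Transaction data that installs the function under a hypothetical ident.
(def install-fn-tx
  [{:db/id    (d/tempid :db.part/user)
    :db/ident :attendee/record-last-encounter
    :db/fn    record-last-encounter}])

;; Usage sketch (conn, attendee-id, and encounter-time are assumed to exist):
;; @(d/transact conn install-fn-tx)
;; @(d/transact conn [[:attendee/record-last-encounter attendee-id encounter-time]])
```

The function call can go in the same transaction as the datoms for the new :encounter entity, so the encounter and the cached maximum are written together.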