#datomic
2016-02-05
Ben Kamphaus00:02:04

:b/c is card one or many?

ljosa00:02:16

:c/d and :c/i contain short strings; b/n and n/x are floats.

ljosa00:02:31

whoa! the memcached solution reduced the cold query time from my house (25 ms ping time) from ~30 s to 2.2 s. I think we have our solution!

Ben Kamphaus00:02:56

cool, good to hear. I wonder if there’s a cost in the structure of that pull that’s non-obvious. I’m testing against a larger mbrainz than the sample we provide, and I see a slowdown of several orders of magnitude when I add the second pull expression. I’ll discuss that with the dev team, though, too.

Ben Kamphaus00:02:39

actually never mind, that time is only introduced when I have a typo in one of the pulled attributes, interesting.

ljosa00:02:41

thanks, we'll keep that in mind and see if we notice differences with two-pull queries.

Ben Kamphaus00:02:50

sorry, thinking aloud 🙂

ljosa00:02:59

thank you for your help!

Ben Kamphaus00:02:24

yeah, I’m not sure, I see < 150 msec w/local postgres storage for this query (larger mbrainz than public) with 10,340 count:

(time
  (count
    (d/q '[:find (pull ?t [:track/name :track/release]) (pull ?a [:artist/sortName :artist/startYear])
           :where
           [?a :artist/name "Pink Floyd"]
           [?t :track/artists ?a]]
         (d/db conn))))

Ben Kamphaus00:02:32

anyways, glad the memcached option seems to be helping! 🙂

Ben Kamphaus00:02:29

~500 msec with reverse ref in first pull instead of typo 😛 (again 10,340 total results)

(time
  (count
    (d/q '[:find (pull ?t [:track/name :medium/_tracks]) (pull ?a [:artist/sortName :artist/startYear])
           :where
           [?a :artist/name "Pink Floyd"]
           [?t :track/artists ?a]]
         (d/db conn))))

currentoor18:02:03

Based on this Stack Overflow post I understand how I can get updated-at values using the history db. http://stackoverflow.com/questions/24645758/has-entities-in-datomic-metadata-like-creation-and-update-time But for performance I wanted to retrieve these timestamps together as part of another query. Is that possible? And is this the correct way to do it?

(d/q '[:find (pull ?a structure) ?created-at (max ?updated-at)
       :in $ structure
       :where
       [?a :action/status "foo"]

       [?a :action/id _ ?id-tx]
       [?id-tx :db/txInstant ?created-at]

       [?a _ _ ?all-tx]
       [?all-tx :db/txInstant ?updated-at]
       ]
     (d/db conn)
     ent/ActionStructure)

currentoor18:02:43

Assuming :action/id is a unique attribute that is only set when the entity is created.

Lambda/Sierra18:02:42

@currentoor: "for performance I wanted to retrieve these timestamps together and part of another query" There is usually no need to combine queries for performance reasons.

Lambda/Sierra18:02:08

Smaller, simpler queries usually perform better than large, complex queries.

currentoor18:02:54

Yeah, I can totally see where you're coming from @stuartsierra, but for this specific use case I'm fetching about 1000 entities from the DB, then mapping over them to get their created-at and updated-at timestamps. The timestamp loop makes up about half my total execution time.

currentoor18:02:53

Individually these created-at and updated-at queries are negligible, but in aggregate they take a significant amount of time.

currentoor18:02:18

Do you think they would still take just as long if I put them inside the larger query?

Lambda/Sierra18:02:35

@currentoor: As with any performance question, measure first. But I would not expect the combined queries to perform any better than separate queries.

Lambda/Sierra18:02:27

I would look at the size of the ?updated-at query results. If you have many transactions updating each entity, that could account for some of the cost of the query.

currentoor19:02:52

Hmm. So I know this is heresy, but I'm getting pressured to store created-at and updated-at attributes directly on the entity, just like in other DBs. I know this is reinventing stuff, but what about performance? Do you suspect this would be faster than using Datomic's built-in time facilities?

Lambda/Sierra20:02:20

@currentoor: As always, test and measure. Make sure you have realistic-sized data to test.

currentoor21:02:57

Will do, thanks.

currentoor22:02:42

I'm having trouble getting a set of tx-times with this query.

(defn timestamps [db lookup-refs]
  (d/q '[:find (min ?tx-time) (max ?tx-time)
         :in $ [?eid ...]
         :where
         [?eid _ _ ?tx _]
         [?tx :db/txInstant ?tx-time]]
       (d/history db)
       lookup-refs))
I'm passing in four lookup-refs so I would expect the result to be four tuples, one for each of the lookup-refs. But instead I get this.
[[#inst "2016-02-05T22:22:31.085-00:00" #inst "2016-02-05T22:31:29.292-00:00"]] 
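
For context on the single tuple: Datomic aggregates group by the find variables that are not aggregated, and this :find has none, so every row falls into one group. Below is a minimal plain-Clojure sketch of that grouping behavior. It simulates the grouping rather than calling Datomic, and the rows, entity keys, and function names are all made up for illustration.

```clojure
;; Hypothetical [eid tx-time] rows standing in for the query's bindings.
;; Integers are used in place of #inst values to keep the sketch short.
(def rows
  [[:a 10] [:a 30] [:b 20] [:b 50]])

(defn min-max
  "Returns [min max] of the tx-times found in rows."
  [rows]
  (let [times (map second rows)]
    [(apply min times) (apply max times)]))

(defn aggregate-all
  "No grouping variable in :find, so every row lands in a single group."
  [rows]
  [(min-max rows)])

(defn aggregate-by-eid
  "With the entity in :find, rows are grouped per entity before aggregating."
  [rows]
  (set (for [[eid group] (group-by first rows)]
         (into [eid] (min-max group)))))
```

Here `(aggregate-all rows)` yields one tuple for the whole input, while `(aggregate-by-eid rows)` yields one tuple per entity.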

currentoor22:02:02

Can a query be used to take in a collection and return a collection in the same ordering?

currentoor22:02:47

Oh, I get it: uniqueness is the issue. This works.

currentoor22:02:03

(defn timestamps [db lookup-refs]
  (d/q '[:find ?id (min ?tx-time) (max ?tx-time)
         :in $ [?eid ...]
         :where
         [?eid _ _ ?tx _]
         [?eid :action/id ?id ?tx _]
         [?tx :db/txInstant ?tx-time]]
       (d/history db)
       lookup-refs))
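
On the earlier ordering question: a relation query returns a set, so the input order is not preserved, and one option is to restore it in application code. A minimal sketch, assuming the rows have the [id min-time max-time] shape returned by the query above and that `ids` holds the :action/id values in the desired order. The helper name `in-input-order` is hypothetical.

```clojure
(defn in-input-order
  "Reorders [id min-time max-time] rows to follow the order of ids.
  Ids that have no matching row are dropped."
  [ids rows]
  (let [by-id (into {} (map (juxt first identity)) rows)] ; id -> row
    (keep by-id ids)))

;; Usage with made-up ids and placeholder timestamps:
(in-input-order [3 1 2] #{[1 :t1 :t2] [2 :t3 :t4] [3 :t5 :t6]})
```

Indexing the rows by id once keeps the reordering linear in the number of ids, which matters little for four lookup refs but helps at the ~1000-entity scale mentioned earlier.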