#datomic
2017-08-16
ibarrick15:08:05

I can't figure out what's wrong with the following query: (d/q '[:find ?tx ?e :in ?log ?t1 ?t2 :where [(tx-ids ?log ?t1 ?t2) [?tx ...]] [(tx-data ?log ?tx) [[?e]]] [?tx :db/txInstant ?time]] (d/log conn) t1 t2)

ibarrick15:08:00

It tells me IllegalArgumentExceptionInfo :db.error/invalid-data-source Nil or missing data source. Did you forget to pass a database argument? and it works fine if I take out the [?tx :db/txInstant ?time]

jazzytomato15:08:11

not sure but what if you add $ '[:find ?tx ?e :in $ ?log ?t1 ?t2 :where

favila15:08:10

@ibarrick d/log looks suspicious, are you sure that is serializable?

favila15:08:47

I suspect that just won't work with client api

favila15:08:23

wait you're not talking about client api anymore

favila15:08:23

the clause where you go [?tx :db/txInstant ?time] requires a "data source" (i.e. a datomic db) to satisfy

favila15:08:32

so you need to provide a db as args too

ibarrick15:08:46

@jazzytomato That told me it was expecting 4 arguments but only received 3. I was able to get results by adding (d/db conn) right before (d/log conn) but I'm not positive why I need to include the db and the log or even if I'm getting the results I want with that.

ibarrick15:08:49

Maybe @favila just answered my question though. I switched from the client API because I couldn't get tx-ids or tx-data to work at all with queries from the client api

thegeez15:08:06

@ibarrick didn't test this but maybe this works:

(d/q '[:find ?tx ?e :in ?log ?t1 ?t2 :where [(tx-ids ?log ?t1 ?t2) [?tx ...]]  [(tx-data ?log ?tx) [[?e :db/txInstant ?time]]]] (d/log conn) t1 t2)

ibarrick15:08:35

So why can I query over the return of (d/log conn) (using just tx-ids and tx-data) if it isn't technically a "data source"?

favila15:08:04

look at your query clauses: none of them are actual clauses, they are just functions and destructuring

favila15:08:18

the data flows through just fine

favila15:08:50

it's when you have [$ ?e ?a ?v ?tx] clauses that a data source is involved

favila15:08:29

note also that ?tx = ?e

ibarrick15:08:25

I think I follow you on the first part but I'm not sure why ?tx would be equal to ?e and in my results they appear to not be equal

favila15:08:57

I was looking at @thegeez 's example

favila15:08:39

@ibarrick what is it you really want? tx, entities mentioned, tx-instant?

favila15:08:59

your end goal, not your impl

ibarrick15:08:18

Oh okay. what about the "$"? should my [?tx :db/txInstant ?time] have been [$ ?tx :db/txInstant ?time]?

ibarrick15:08:09

And yes I want exactly what you described

favila15:08:11

I am reminding you that clauses like that have a source var, which can be omitted (and defaults to $ if you do)

favila15:08:51

so if a clause could take a source-var at the beginning, it's a clause that needs a db ("data source")
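
To make the implicit source var concrete, here are the two spellings of the clause from the question, written as plain query data. This is only a sketch: nothing is run against a database, and the var and function names are made up for illustration.

```clojure
;; Implicit source: the data clause reads from the default source, $.
(def implicit-source-query
  '[:find ?tx ?time
    :in $
    :where [?tx :db/txInstant ?time]])

;; Explicit source: the same clause with $ written out as its first element.
(def explicit-source-query
  '[:find ?tx ?time
    :in $
    :where [$ ?tx :db/txInstant ?time]])

(defn data-clause
  "Returns the single :where clause of one of the queries above."
  [query]
  (last query))
```

Either way, `:in` has to list `$` and a db has to be passed as the matching argument, which is what the "Nil or missing data source" error was complaining about.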

ibarrick15:08:57

This seems like the easiest way to get a log of what changes were made to which entities during a time period, ordered by time.

favila15:08:19

you want what changes, or just the entities?

favila15:08:29

why not use the log directly and ignore query?

ibarrick15:08:18

I guess I could just use the log, but how would I extract the timestamps of the transactions? Would I have to do a query for each unique transaction returned from the log?

favila15:08:44

(:db/txInstant (d/entity db ?tx))

favila15:08:55

or you could extract it from the tx data itself

favila15:08:04

(probably not worth the trouble)

favila15:08:45

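(let [db (d/db conn)
      ;; every tx in the log from t 0 up to and including the db's basis-t (end is exclusive)
      txs (-> (d/log conn) (d/tx-range 0 (inc (d/basis-t db))))]
  (->> txs
       (take 1) ; keep only the first tx
       (map (fn [{:keys [t data] :as tx-info}]
              ;; look up the tx entity and add its :db/txInstant entry to the tx-info map
              (conj tx-info (find (d/entity db (d/t->tx t)) :db/txInstant))))))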

favila15:08:56

pretty much equivalent, but also lazy

ibarrick15:08:23

What would the implementation of the first option look like? The ?tx makes me think that goes in the query itself, but I didn't think you could do all that inside a query (and you also mentioned not doing a query)

favila15:08:45

first option?

ibarrick15:08:28

you said: (:db/txInstant (d/entity db ?tx)) or extract it from the tx data itself. I assumed the code sample was the implementation of the latter

favila15:08:42

no, the code sample is doing the first

favila15:08:12

so I gave you an example of reading the tx log and adding a :db/txInstant key to the returned maps, without doing a query

favila15:08:42

That is in this line: (conj tx-info (find (d/entity db (d/t->tx t)) :db/txInstant))))))

favila15:08:08

we're just grabbing the tx entity, then pulling out the :db/txInstant value and adding it to the tx-info map

ibarrick15:08:55

Oh I see. The code segment makes perfect sense I just misunderstood which one that referred to

favila15:08:59

as a query it would look like this:

favila16:08:01

(d/q '[:find ?tx ?tx-data ?time
       :in $ ?log ?t1 ?t2
       :where
       [(tx-ids ?log ?t1 ?t2) [?tx ...]]
       [(tx-data ?log ?tx) ?tx-data]
       [?tx :db/txInstant ?time]
       ]
  (d/db conn) (d/log conn) 1000 1001)

favila16:08:22

but queries are not lazy

favila16:08:48

and this is a hack that doesn't use a db:

favila16:08:00

(d/q '[:find ?tx ?tx-data ?time
       :in ?log ?t1 ?t2
       :where
       [(tx-ids ?log ?t1 ?t2) [?tx ...]]
       [(tx-data ?log ?tx) ?tx-data]
       ;; 50 is the :db/txInstant id
       [(identity ?tx-data) [[?tx 50 ?time _ true]]]
       ]
  (d/log conn) 1000 1001)

favila16:08:11

here we look in the tx-data itself for the tx instant

favila16:08:41

(I wouldn't recommend this approach, but it's just to show what's possible)

favila16:08:23

a safer version needs to determine what the :db/txInstant attribute's eid is
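
One way the safer version might look, as an untested sketch: resolve the attribute's entity id with `d/entid` outside the query and pass it in as an input instead of hard-coding 50. The function name `tx-instants-from-log` and the `?attr` input are made up for illustration, and this assumes the peer library (`datomic.api`) and a live `conn`.

```clojure
(require '[datomic.api :as d])

(defn tx-instants-from-log
  "Returns a set of [tx time] tuples for transactions in [t1, t2),
  reading :db/txInstant out of the tx-data instead of a db source."
  [conn t1 t2]
  (let [db   (d/db conn)
        ;; resolve the attribute's eid rather than assuming it is 50
        attr (d/entid db :db/txInstant)]
    (d/q '[:find ?tx ?time
           :in ?log ?t1 ?t2 ?attr
           :where
           [(tx-ids ?log ?t1 ?t2) [?tx ...]]
           ;; destructure each datom's e, a, v
           [(tx-data ?log ?tx) [[?e ?a ?time]]]
           ;; keep only the datom about the tx entity itself...
           [(= ?e ?tx)]
           ;; ...whose attribute is :db/txInstant
           [(= ?a ?attr)]]
         (d/log conn) t1 t2 attr)))
```

A db is still needed to look up the eid, just not as a query source, so this mostly shows the shape rather than a reason to prefer it.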

ibarrick16:08:35

Yep, that's what the query I got working looked like exactly. Is laziness the only intrinsic advantage of not querying?

favila16:08:30

it's presorted

ibarrick16:08:47

I was just asking about the last query and then you fixed it 😅

favila16:08:56

query assembles a set

ibarrick16:08:02

I really appreciate you taking the time to go over all of this for me.

favila16:08:16

so if you want to sort by time, the non-query version will already be sorted

favila16:08:24

the query version you will need to sort after
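
A small illustration of the sorting step in plain Clojure. The tuples below are hypothetical stand-ins for what the `?tx ?tx-data ?time` query would return; real results would contain datoms rather than keywords.

```clojure
;; Hypothetical stand-in for the set returned by the query version.
(def query-result
  #{[13194139534314 [:datoms-b] #inst "2017-08-16T15:02:00.000-00:00"]
    [13194139534313 [:datoms-a] #inst "2017-08-16T15:01:00.000-00:00"]
    [13194139534315 [:datoms-c] #inst "2017-08-16T15:03:00.000-00:00"]})

(defn sort-by-tx-time
  "Orders [tx tx-data time] tuples by their third element, oldest first."
  [tuples]
  (sort-by #(nth % 2) tuples))

;; tx ids in chronological order
(def sorted-tx-ids (mapv first (sort-by-tx-time query-result)))
```

Since a set has no guaranteed order, the sort has to happen on the query's output; the tx-range version skips this step because the log is already in transaction order.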

favila16:08:56

the query version in theory can make use of parallelism, not sure it matters here

favila16:08:21

honestly some things are really just easier to express without a query too

ibarrick16:08:31

In my case the version without querying gives me a more manageable shape for the data anyways

favila16:08:33

when I just need to read an entire index segment or tx log segment I find the non-query approach to be faster, use less memory, and be more straightforward

favila16:08:16

it's only when I need to do actual pattern matching or walking across entities (i.e. real queries) that using tx-log in a query makes sense

favila16:08:31

I always have the history db as input too in those queries

favila16:08:37

a tx-log query that doesn't have a history db as input is probably not an ideal query; just map over tx-range

ibarrick16:08:20

I think this is all starting to click, thanks!