#datomic
2017-10-18
boldaslove156 00:10:17

Can we find out when an eid was created using entity?

boldaslove156 00:10:47

Or is the only way to use q and get the :db/txInstant?

potetm 01:10:23

@boldaslove156 unless the entity is a transaction, the latter
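A minimal sketch of the q approach mentioned above, assuming "created" means the earliest transaction that touched the entity. The name created-at-query is hypothetical, and the query is run against a history database so that datoms which were later retracted still count.

```clojure
;; Query as plain data: earliest :db/txInstant among all transactions
;; that asserted or retracted a datom about ?e.
;; `created-at-query` is an illustrative name, not part of Datomic.
(def created-at-query
  '[:find (min ?inst) .
    :in $ ?e
    :where
    [?e _ _ ?tx]
    [?tx :db/txInstant ?inst]])

;; Usage, assuming `d` aliases datomic.api and `db` / `eid` are in scope:
;; (d/q created-at-query (d/history db) eid)
```

If the entity is itself a transaction, :db/txInstant can be read directly off the entity instead.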

uwo 17:10:27

how to express a query to find all entities that have the same value for an attribute?

uwo 17:10:03

(where that value is left unbound)

robert-stuttaford 17:10:50

@uwo so you don’t have the value, you just want to find entities that share values for an attribute? you could group entities by values

(->> (group-by :v (seq (d/datoms db :aevt :ns/key)))
     (map (fn [[v datoms]]
            [v (map #(d/entity db (:e %)) datoms)]))
     (into {}))

uwo 17:10:14

hehe. I was wondering if this was something that is expressible in datalog. I had thought of just dropping into sequence fns

robert-stuttaford 17:10:56

datalog = bound values. you said not to bind it. [:find ?e ?v :where [?e :attr ?v]] otherwise 😛 group-by ?v ofc
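A rough sketch of the "query, then group-by ?v" idea, assuming the query result is a collection of [?e ?v] tuples. The function name entities-sharing-value is hypothetical, and the data in the usage comment is made up for illustration.

```clojure
;; Groups [e v] tuples by value and keeps only values shared by
;; more than one entity. Entity ids are returned as a set because
;; query results carry no ordering guarantee.
(defn entities-sharing-value [rows]
  (->> rows
       (group-by second)
       (keep (fn [[v pairs]]
               (when (> (count pairs) 1)
                 [v (set (map first pairs))])))
       (into {})))

;; Usage, assuming `d` aliases datomic.api and `db` is in scope:
;; (entities-sharing-value
;;   (d/q '[:find ?e ?v :where [?e :attr ?v]] db))
```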

uwo 18:10:20

so is there anything wrong with this kind of query

[:find ?t1 ?t2 ?y
 :where
 [?e1 :movie/title ?t1]
 [?e1 :movie/year ?y]
 [?e2 :movie/year ?y]
 [?e2 :movie/title ?t2]
 [(not= ?e1 ?e2)]]
it doesn’t smell right to me, but I can’t say why. totally legit?

favila 18:10:24

not totally getting the point of the query? you want every possible pair of titles in a year?

favila 18:10:48

but yes, the not= (or something like it) is needed to avoid the self-join

favila 18:10:26

you can use != (native operator, not clojure function) and maybe the query engine can use that info

favila 18:10:32

also I would reorder the clauses

favila 18:10:53

put the years first, then the not, then the titles
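Applied to the movie query, that reordering might look like the sketch below. The name same-year-pairs-query is hypothetical; whether the reordering or != actually helps depends on the data and the query engine.

```clojure
;; Same query as before, with the year clauses first, then the
;; inequality on entities, then the title lookups.
;; `same-year-pairs-query` is an illustrative name.
(def same-year-pairs-query
  '[:find ?t1 ?t2 ?y
    :where
    [?e1 :movie/year ?y]
    [?e2 :movie/year ?y]
    [(!= ?e1 ?e2)]
    [?e1 :movie/title ?t1]
    [?e2 :movie/title ?t2]])

;; Usage, assuming `d` aliases datomic.api and `db` is in scope:
;; (d/q same-year-pairs-query db)
```

Note that this still only requires the two entities to differ, not their titles.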

uwo 18:10:25

basically my colleagues are trying to express a query like “find all entities that have the same value for attribute X and different values for attribute Y” and are writing queries like the above

manutter51 18:10:14

And I’m the colleague 🙂 So imagine you have a bunch of Address entities with city, state, and zip (etc), and you want to query for “Show me all cities with more than one zip code,” how would you query for that?

favila 18:10:47

(->> (d/q '[:find (count-distinct ?zip) ?c
            :where
            [?addr :city ?c]
            [?addr :zip ?zip]]
       db)
     (remove (fn [[zipcount]] (== zipcount 1))))

favila 19:10:10

Pure datalog without aggregation is the approach you have been doing. It looks like a self-join

favila 19:10:13

(d/q '[:find ?c
       :where
       [?a1 :city ?c]
       [?a2 :city ?c]
       [(!= ?a1 ?a2)]
       [?a1 :zip ?zip1]
       [?a2 :zip ?zip2]
       [(!= ?zip1 ?zip2)]]
  db)

favila 19:10:46

not sure which would be faster. Depends on how smart the query optimizer is about short-circuiting vs how fast it is simply to aggregate everything

manutter51 19:10:24

Cool, thanks for the examples, I’m a longtime SQL coder just getting past the “Learn Datalog Today” stage so this is very helpful.

manutter51 19:10:42

@favila My actual query is different from the simple example I gave, so I had to modify your queries to fit. The one with aggregation finished in about 1 sec, and the “self-join” I killed after about 8 minutes of running at 700% CPU.

favila 19:10:21

well that answers that!

favila 18:10:35

@uwo there's no check that t1 and t2 differ

favila 18:10:52

two different movies (?e1 ?e2) could be in same year and have same title

favila 18:10:48

you could use aggregation @manutter51

laujensen 18:10:56

I have a bunch of data which is time-stamped. I want to count every occurrence of a data-point, grouped by the day/month of the timestamp. What's the datalog way to go about that?

uwo 18:10:11

sorry, yeah i should have read the example I pasted beforehand. my bad

laujensen 19:10:37

@marshall, thanks, but how do I go about disregarding all information except date and month?

marshall 19:10:33

ah. they’re separate attributes? you may need to do a custom aggregation

laujensen 20:10:49

Right. Then when I query [?x ?y] I only get one result and it doesn't allow [?x ?y] ..., or [?x ?y ...]. How do I get a list of filtered results?

hmaurer 20:10:49

@marshall hi! I have a quick question on Datomic Cloud: will the pricing include a license to use it? and roughly how much will it cost?

marshall 20:10:40

@laujensen I'll have to try a couple things. I might do the aggregations in app code and/or use nested or multiple queries depending on the data size. I'll try to get back to you tomorrow
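One way the "aggregations in app code" option could look, as a sketch only: it assumes the query returns [?e ?inst] tuples where ?inst is a java.util.Date, and that days are bucketed in UTC. The names month-day and counts-by-month-day are hypothetical.

```clojure
(import '(java.time ZoneOffset))

;; Reduces a java.util.Date to [month day-of-month] in UTC,
;; discarding the year and the time of day.
(defn month-day [^java.util.Date inst]
  (let [d (.. inst toInstant (atZone ZoneOffset/UTC) toLocalDate)]
    [(.getMonthValue d) (.getDayOfMonth d)]))

;; Counts data points per [month day] from [entity instant] tuples.
(defn counts-by-month-day [rows]
  (frequencies (map (fn [[_e inst]] (month-day inst)) rows)))

;; Usage, assuming `d` aliases datomic.api, `db` is in scope, and
;; :data/timestamp is a stand-in for the real attribute:
;; (counts-by-month-day
;;   (d/q '[:find ?e ?inst :where [?e :data/timestamp ?inst]] db))
```

Keeping ?e in the find clause matters here, since query results are sets and two data points with the same timestamp would otherwise collapse into one row.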

marshall 20:10:06

@hmaurer the license is included via purchasing through AWS marketplace. Solo topology will be around $1 a day. That's cost of the AWS infrastructure and the cost for Datomic

marshall 20:10:23

Production topology will be more

hmaurer 20:10:34

@marshall that sounds amazing for hobbyists

marshall 20:10:44

That's definitely the hope :)