Fork me on GitHub
#datomic
<
2015-12-30
>
Ben Kamphaus00:12:55

@kschrader: ok, that’s expected behavior. I ran with analogous query and similar ones in mbrainz. The important thing is to look at the tuples you generate w/o the aggregate:

[:find ?artist ?release ?track
 :in $ ?name
 :where
 [?artist :artist/name ?name]
 [?track :track/artists ?artist]
 [?release :release/artists ?artist]
]
Returns 8250 tuples (the cartesian product of relations between ?track and ?release)

Ben Kamphaus00:12:11

but you can limit to unique values for each with count-distinct:

Ben Kamphaus00:12:53

[:find ?artist (count-distinct ?release) (count-distinct ?track)
 :in $ ?name
 :where
 [?artist :artist/name ?name]
 [?track :track/artists ?artist]
 [?release :release/artists ?artist]
]
Returns: 17592186047016, 30, 275

tcrayford01:12:02

@curtosis: @davebryand note that partitions only impact the clustering of the E portion of the sort, unlikely to have much impact outside eavt and aevt indexes. Potentially huge impacts on both of those though.

kschrader16:12:14

@bkamphaus: that’s the behavior that I want

kschrader16:12:29

didn’t know that count-distinct existed

kschrader16:12:12

the count behavior seems non-obvious to me, I can’t think of a case where I’d want that to happen

Ben Kamphaus16:12:32

Glad to help. If you’re building queries to do aggregation, might be worth a skim of the aggregates section of the docs: http://docs.datomic.com/query.html#aggregates - I have to confess to having needed to do that myself to reason about what you ran into and find the solution simple_smile

kschrader16:12:35

yeah, good to know

kschrader16:12:49

I searched Google for datomic count

kschrader16:12:12

and then searched within the query page for count

kschrader16:12:48

and the first thing that comes up is reference to count in the Not clauses section

kschrader16:12:51

The following query uses a not clause to find the count of all artists who are not Canadian:

kschrader16:12:28

didn’t occur to me that there might be another function to do counts that isn’t mentioned until further down the page

kschrader16:12:39

just FYI, how I got stuck

Ben Kamphaus16:12:23

I get that it seems non-obvious, it’s a case where count basically does row/tuple counting, so it depends on the shape of the relation from the query. There are different aspects where you run into issues using aggregates given the set of tuples model, :with allows duplicates for values so you can e.g. sum multiple instances of the same value, count-distinct will only count unique things if the relation you construct in the query ends up with multiple values from the implied many-to-many, i.e. when relating to reverse refs to an entity id.

kschrader16:12:50

is there a way to do this that’s faster? seems to be taking a lot of time in the console, but if you explore entities in the console it seems like they pull up the ref counts right away

Ben Kamphaus16:12:16

@kschrader: haven’t tested but it will probably be faster to get the counts with two separate queries and merge the results. count-distinct, while returning the answer you want, will still operate over the cartesian product from the many-to-many of ?project to ?story per ?org.

kschrader16:12:16

got it, that explains why every count-distinct I add seems to increase query time exponentially simple_smile

kschrader17:12:43

@bkamphaus: is there anyway to get a count of zero in an aggregation function?

kschrader17:12:05

right now it just leaves out the values when I run them and no results are found

kschrader17:12:29

and get-else doesn’t work on a cardinality-many ref

Ben Kamphaus18:12:42

@kschrader: even a custom aggregate won't do anything if there are no relations to aggregate over. I’d just check outside the query, if empty then 0.

kschrader18:12:36

ok, was hoping to have something that some of the less technical users here could just drop in the Console

kschrader18:12:40

but I can do that

alandipert19:12:59

anyone know if there's a better way to do something like this using q directly? https://gist.github.com/alandipert/73e923a690061d18ba3a

alandipert20:12:42

used the technique to query over AWS w/ amazonica, still interested to know if there's better ways - https://gist.github.com/alandipert/d2cb38ee869448182c4b