Fork me on GitHub

@kschrader: ok, that’s expected behavior. I ran with analogous query and similar ones in mbrainz. The important thing is to look at the tuples you generate w/o the aggregate:

[:find ?artist ?release ?track
 :in $ ?name
 [?artist :artist/name ?name]
 [?track :track/artists ?artist]
 [?release :release/artists ?artist]
Returns 8250 tuples (the cartesian product of relations between ?track and ?release)


but you can limit to unique values for each with count-distinct:


[:find ?artist (count-distinct ?release) (count-distinct ?track)
 :in $ ?name
 [?artist :artist/name ?name]
 [?track :track/artists ?artist]
 [?release :release/artists ?artist]
Returns: 17592186047016, 30, 275


@curtosis: @davebryand note that partitions only impact the clustering of the E portion of the sort, unlikely to have much impact outside eavt and aevt indexes. Potentially huge impacts on both of those though.


@bkamphaus: that’s the behavior that I want


didn’t know that count-distinct existed


the count behavior seems non-obvious to me, I can’t think of a case where I’d want that to happen


Glad to help. If you’re building queries to do aggregation, might be worth a skim of the aggregates section of the docs: - I have to confess to having needed to do that myself to reason about what you ran into and find the solution simple_smile


yeah, good to know


I searched Google for datomic count


and then searched within the query page for count


and the first thing that comes up is reference to count in the Not clauses section


The following query uses a not clause to find the count of all artists who are not Canadian:


didn’t occur to me that there might be another function to do counts that isn’t mentioned until further down the page


just FYI, how I got stuck


I get that it seems non-obvious, it’s a case where count basically does row/tuple counting, so it depends on the shape of the relation from the query. There are different aspects where you run into issues using aggregates given the set of tuples model, :with allows duplicates for values so you can e.g. sum multiple instances of the same value, count-distinct will only count unique things if the relation you construct in the query ends up with multiple values from the implied many-to-many, i.e. when relating to reverse refs to an entity id.


is there a way to do this that’s faster? seems to be taking a lot of time in the console, but if you explore entities in the console it seems like they pull up the ref counts right away


@kschrader: haven’t tested but it will probably be faster to get the counts with two separate queries and merge the results. count-distinct, while returning the answer you want, will still operate over the cartesian product from the many-to-many of ?project to ?story per ?org.


got it, that explains why every count-distinct I add seems to increase query time exponentially simple_smile


@bkamphaus: is there anyway to get a count of zero in an aggregation function?


right now it just leaves out the values when I run them and no results are found


and get-else doesn’t work on a cardinality-many ref


@kschrader: even a custom aggregate won't do anything if there are no relations to aggregate over. I’d just check outside the query, if empty then 0.


ok, was hoping to have something that some of the less technical users here could just drop in the Console


but I can do that


anyone know if there's a better way to do something like this using q directly?


used the technique to query over AWS w/ amazonica, still interested to know if there's better ways -