Fork me on GitHub
#datomic
<
2020-10-04
>
nando00:10:23

I'm getting an inconsistent result using the sum aggregate function on dev-local. If I include only the sum function, the result is much less than it should be. If I add the count function to the same query, the result of the sum function is then correct. Here's the query with only the sum function. There are multiple batch items per batch and I need the total weight of all batch items.

[:find ?e ?formula-name ?doses ?date (sum ?weight) 
         :keys e formula-name doses date total-weight 
         :in $ ?e
         :where [?e :batch/formula ?fe]
         [?fe :formula/name ?formula-name]
         [?e :batch/doses ?doses]
         [?e :batch/date ?date]
         [?bi :batch-item/batch ?e]
         [?bi :batch-item/weight ?weight]]
=> :total-weight 1027800

nando00:10:39

Here's the query with both the sum and count aggregate functions:

[:find ?e ?formula-name ?doses ?date (sum ?weight) (count ?bi)
         :keys e formula-name doses date total-weight count
         :in $ ?e
         :where [?e :batch/formula ?fe]
         [?fe :formula/name ?formula-name]
         [?e :batch/doses ?doses]
         [?e :batch/date ?date]
         [?bi :batch-item/batch ?e]
         [?bi :batch-item/weight ?weight]]
=> :total-weight 2009250, :count 45 I've confirmed that 2009250 is the correct amount. What am I not understanding here?

Joe Lane00:10:31

[:find ?e ?formula-name ?doses ?date (sum ?weight)
         :keys e formula-name doses date total-weight
         :with ?bi
         :in $ ?e
         :where [?e :batch/formula ?fe]
         [?fe :formula/name ?formula-name]
         [?e :batch/doses ?doses]
         [?e :batch/date ?date]
         [?bi :batch-item/batch ?e]
         [?bi :batch-item/weight ?weight]]

Joe Lane00:10:37

What does that return?

nando00:10:00

Reading now .... so duplicates are being excluded from the sum?

nando00:10:24

It's correct now!

nando00:10:33

That's subtle. Thanks @joe.lane

Joe Lane00:10:02

Does the concept of a set vs a bag make sense to you from the docs?

nando00:10:10

I understood immediately that duplicates might be excluded from (sum ...) when I saw the example, but that's not what one would expect from a sum function.

nando00:10:52

2 + 2 = 2 ???

nando00:10:07

So I think it might be good to point this out in the sum section of the documentation (if it isn't there already)

Joe Lane00:10:19

It's not related to the sum aggregate though, it's related to whether or not you want a bag vs a set of the ?bi lvar.

Joe Lane00:10:41

It's a more general concept.

nando00:10:43

;; query
[:find (sum ?count) 
 :with ?medium
 :where [?medium :medium/trackCount ?count]]
I see an example is in there, but I didn't understand the signficance.

nando00:10:32

I understand it doesn't only apply to the sum aggregate. I'm only saying that if it has a non-obvious impact on a specific aggregate function, it might be helpful for beginners like me to point that out.

seancorfield00:10:22

Interesting. I hadn't learned enough about Datomic to realize it specifically deals in sets by default instead of bags...

nando00:10:35

It is still quite vague to me when a query would return a set.

nando00:10:42

I guess it has to be kept always in mind, because as the example in the documentation on With Clauses shows, it isn't only an issue with some aggregate functions.

nando01:10:02

@joe.lane Here's a specific suggestion for the docs that might help to make this more clear for beginners. In the subsection on sum where it says "The following query uses sum to find the total number of tracks on all media in the database." You might change that to something like "The following query uses sum to find the total number of tracks on all media in the database. Note carefully the use of the with-clause in the query so that all trackCounts are summed. If the with-clause is excluded, only unique trackCounts will be summed."