#data-science
2022-06-26
Benjamin11:06:53

Hi, I'm checking out dataset. How do I group by a column and then aggregate the values? Here is my naive first try:

(-> [{:a 10 :b "fo"}
     {:a 10 :b "fa"}
     {:a 10 :b "fa"}]
    ds/->dataset
    (ds/group-by-column :b)
    (update-vals
     (fn [ds]
       (dfn/sum (ds :a)))))
;; => {"fo" 10.0, "fa" 20.0}

chrisn13:06:08

That is one way, and it is simple to write. The reductions namespace also has https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.reductions.html#var-group-by-column-agg for this. It will be a bit faster because the aggregation happens during the group-by reduction instead of in a second pass over each group.
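
For reference, a minimal sketch of what that could look like for the example data above. The `ds-reduce` alias, the `data` var, and the `:sum-a` output column name are illustrative assumptions, not something from the thread, and the dataset is wrapped in a vector on the assumption that the function takes a sequence of datasets:

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.reductions :as ds-reduce])

;; Same rows as in the question above.
(def data
  (ds/->dataset [{:a 10 :b "fo"}
                 {:a 10 :b "fa"}
                 {:a 10 :b "fa"}]))

;; Group on :b and sum :a within the same reduction pass.
;; :sum-a is a made-up name for the aggregate column.
;; The dataset is passed inside a vector (a sequence of datasets).
(ds-reduce/group-by-column-agg
 :b
 {:sum-a (ds-reduce/sum :a)}
 [data])
```

Unlike the `update-vals` version, this should return a dataset (one row per distinct `:b`, with a `:sum-a` column) rather than a map, and the row order is not something to rely on.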

metasoarous17:06:14

Awesome; Thanks! I've been writing more or less the exact same code, so good to know the faster way of doing this.

quadron09:06:34

is it possible to reduce over rows?

quadron09:06:52

as in reducing over a sequence of maps?