#data-science
2022-07-03
Benjamin10:07:09

Yo, wasn't there an easy way to drop the last row? I thought I read something somewhere about passing a negative number, but apparently not for ds/drop-rows.

Benjamin10:07:50

(ds/select-rows d (range (dec (ds/row-count d))))
works of course

genmeblog11:07:28

Looks like drop is implemented as a select, so good approach with select-rows and range https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/base.clj#L405

genmeblog11:07:11

you can also do (drop-rows d [(dec (row-count d))])

Benjamin15:07:56

(tc/add-column
     d
     :crypton-volume-mean
     (:crypton-volume-mean
      (tc/aggregate
       d
       {:crypton-volume-mean #(dfn/mean (% :crypton-volume))}))
     :cycle)
is there a better way to add an aggregation to every column?

genmeblog08:07:49

Could you share an example? I'm not sure I understand the task.

genmeblog09:07:08

Is it something like this?

genmeblog09:07:12

(def ds (tc/dataset {:group (repeatedly 100 #(rand-nth [:u :i :o :p]))
                   :a (repeatedly 100 rand)
                   :b (repeatedly 100 #(rand 5))}))

ds
;; => _unnamed [100 3]:
;;    | :group |         :a |         :b |
;;    |--------|-----------:|-----------:|
;;    |     :i | 0.21923487 | 0.02757318 |
;;    |     :o | 0.47121141 | 1.03039666 |
;;    |     :i | 0.26676569 | 4.16760503 |
;;    |     :u | 0.38287395 | 2.24760673 |
;;    |     :o | 0.96585848 | 0.01097307 |
;;    |     :o | 0.88531448 | 4.46218351 |
;;    |     :p | 0.19173693 | 3.57395669 |
;;    |     :p | 0.53615392 | 2.74081683 |
;;    |     :p | 0.43978083 | 4.67902393 |
;;    |     :p | 0.72916344 | 1.01818710 |
;;    |    ... |        ... |        ... |
;;    |     :o | 0.75565644 | 0.38405762 |
;;    |     :p | 0.06499388 | 2.39519054 |
;;    |     :p | 0.33801667 | 4.25192711 |
;;    |     :u | 0.76001839 | 3.49197309 |
;;    |     :u | 0.71527541 | 1.33661718 |
;;    |     :u | 0.46813365 | 3.31539727 |
;;    |     :o | 0.90347413 | 0.59158482 |
;;    |     :i | 0.40986820 | 1.26976567 |
;;    |     :u | 0.62212172 | 2.57981844 |
;;    |     :u | 0.77436201 | 1.90787845 |
;;    |     :p | 0.95894755 | 4.35116263 |

(def agg (-> ds
           (tc/group-by [:group])
           (tc/aggregate-columns [:a :b] dfn/mean)
           (tc/rename-columns {:a :mean-a :b :mean-b})))

agg
;; => _unnamed [4 3]:
;;    |    :mean-b | :group |    :mean-a |
;;    |-----------:|--------|-----------:|
;;    | 2.20742202 |     :i | 0.53935031 |
;;    | 2.00721651 |     :o | 0.60963595 |
;;    | 2.69337664 |     :u | 0.48111990 |
;;    | 2.69662835 |     :p | 0.50979658 |

(-> (tc/left-join ds agg :group)
    (tc/drop-columns :right.group))

;; => left-outer-join [100 5]:
;;    | :group |         :a |         :b |    :mean-b |    :mean-a |
;;    |--------|-----------:|-----------:|-----------:|-----------:|
;;    |     :i | 0.21923487 | 0.02757318 | 2.20742202 | 0.53935031 |
;;    |     :i | 0.26676569 | 4.16760503 | 2.20742202 | 0.53935031 |
;;    |     :i | 0.97345184 | 3.68318639 | 2.20742202 | 0.53935031 |
;;    |     :i | 0.19880123 | 1.53596251 | 2.20742202 | 0.53935031 |
;;    |     :i | 0.13353903 | 4.57865084 | 2.20742202 | 0.53935031 |
;;    |     :i | 0.49446615 | 0.12514693 | 2.20742202 | 0.53935031 |
;;    |     :i | 0.69925871 | 0.14210092 | 2.20742202 | 0.53935031 |
;;    |     :i | 0.77344282 | 0.86134163 | 2.20742202 | 0.53935031 |
;;    |     :i | 0.56868174 | 2.86614456 | 2.20742202 | 0.53935031 |
;;    |     :i | 0.37321600 | 4.43636290 | 2.20742202 | 0.53935031 |
;;    |    ... |        ... |        ... |        ... |        ... |
;;    |     :p | 0.13506468 | 1.93568665 | 2.69662835 | 0.50979658 |
;;    |     :p | 0.23051436 | 4.22573832 | 2.69662835 | 0.50979658 |
;;    |     :p | 0.22231685 | 3.38299530 | 2.69662835 | 0.50979658 |
;;    |     :p | 0.98586194 | 0.82699630 | 2.69662835 | 0.50979658 |
;;    |     :p | 0.82490726 | 2.79908028 | 2.69662835 | 0.50979658 |
;;    |     :p | 0.35192255 | 1.06928476 | 2.69662835 | 0.50979658 |
;;    |     :p | 0.79226756 | 2.31301546 | 2.69662835 | 0.50979658 |
;;    |     :p | 0.11991132 | 2.11139863 | 2.69662835 | 0.50979658 |
;;    |     :p | 0.06499388 | 2.39519054 | 2.69662835 | 0.50979658 |
;;    |     :p | 0.33801667 | 4.25192711 | 2.69662835 | 0.50979658 |
;;    |     :p | 0.95894755 | 4.35116263 | 2.69662835 | 0.50979658 |

genmeblog09:07:44

Or if you don't need grouping, just call add-columns

genmeblog09:07:47

(tc/add-columns ds {:mean-a (dfn/mean (:a ds))
                    :mean-b (dfn/mean (:b ds))})

;; => _unnamed [100 5]:
;;    | :group |         :a |         :b |    :mean-a |    :mean-b |
;;    |--------|-----------:|-----------:|-----------:|-----------:|
;;    |     :i | 0.21923487 | 0.02757318 | 0.52913219 | 2.43060934 |
;;    |     :o | 0.47121141 | 1.03039666 | 0.52913219 | 2.43060934 |
;;    |     :i | 0.26676569 | 4.16760503 | 0.52913219 | 2.43060934 |
;;    |     :u | 0.38287395 | 2.24760673 | 0.52913219 | 2.43060934 |
;;    |     :o | 0.96585848 | 0.01097307 | 0.52913219 | 2.43060934 |
;;    |     :o | 0.88531448 | 4.46218351 | 0.52913219 | 2.43060934 |
;;    |     :p | 0.19173693 | 3.57395669 | 0.52913219 | 2.43060934 |
;;    |     :p | 0.53615392 | 2.74081683 | 0.52913219 | 2.43060934 |
;;    |     :p | 0.43978083 | 4.67902393 | 0.52913219 | 2.43060934 |
;;    |     :p | 0.72916344 | 1.01818710 | 0.52913219 | 2.43060934 |
;;    |    ... |        ... |        ... |        ... |        ... |
;;    |     :o | 0.75565644 | 0.38405762 | 0.52913219 | 2.43060934 |
;;    |     :p | 0.06499388 | 2.39519054 | 0.52913219 | 2.43060934 |
;;    |     :p | 0.33801667 | 4.25192711 | 0.52913219 | 2.43060934 |
;;    |     :u | 0.76001839 | 3.49197309 | 0.52913219 | 2.43060934 |
;;    |     :u | 0.71527541 | 1.33661718 | 0.52913219 | 2.43060934 |
;;    |     :u | 0.46813365 | 3.31539727 | 0.52913219 | 2.43060934 |
;;    |     :o | 0.90347413 | 0.59158482 | 0.52913219 | 2.43060934 |
;;    |     :i | 0.40986820 | 1.26976567 | 0.52913219 | 2.43060934 |
;;    |     :u | 0.62212172 | 2.57981844 | 0.52913219 | 2.43060934 |
;;    |     :u | 0.77436201 | 1.90787845 | 0.52913219 | 2.43060934 |
;;    |     :p | 0.95894755 | 4.35116263 | 0.52913219 | 2.43060934 |

Benjamin09:07:37

Ah, that is cool. I will save this as a reference.

👍 1
chrisn17:07:23

‘assoc’ also works

chrisn17:07:14

So a ->> op across columns terminated by apply assoc

👀 1
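
A minimal sketch of what the "->> op across columns terminated by apply assoc" suggestion might look like. The small dataset, the `with-means` name, and the `mean-` column prefix are made up for illustration. The scalar mean is repeated once per row here so the example does not depend on whether assoc broadcasts scalars.

```clojure
(require '[tablecloth.api :as tc]
         '[tech.v3.datatype.functional :as dfn])

;; Hypothetical three-row dataset, just for this sketch.
(def ds (tc/dataset {:a [1.0 2.0 3.0]
                     :b [10.0 20.0 30.0]}))

;; Build a flat [name values name values ...] sequence with ->>,
;; then pass it all to assoc in one call.
(def with-means
  (->> [:a :b]
       (mapcat (fn [col]
                 [(keyword (str "mean-" (name col)))
                  ;; one copy of the column mean per row
                  (repeat (tc/row-count ds) (dfn/mean (ds col)))]))
       (apply assoc ds)))

(tc/column-names with-means)
```

This should give the same kind of result as the add-columns call above, with the column list driven by data instead of written out by hand.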