#data-science
2024-03-06
Mattias 12:03:20

Hey all, I’m watching Kira’s much-appreciated London Clojurians talk on Tablecloth. There is something near the beginning that I can’t replicate, and it bugs me that I don’t understand it. It’s about displaying/printing datasets after a group-by. Kira gets output in the REPL with just this (starting a couple of lines down):

...
(tc/as-regular-dataset)
(tc/select-rows 0)
:data
But I have to do this (full example from the start):
(-> tst2
    (tc/group-by [:name])
    (tc/as-regular-dataset)
    (tc/select-rows 0)
    :data
    first
    tc/print-dataset)

Mattias 12:03:16

How would you manage to output a sub-dataset without both the extra “first” and the explicit print-dataset call? Some wiring in the background? 🤔😅

genmeblog 13:03:42

Here is the answer!

(-> tst2
    (tc/group-by [:name])   ;; creates a grouped dataset
    (tc/as-regular-dataset) ;; removes the grouping information and reveals the internal structure
    (tc/select-rows 0)      ;; selects the first row
    :data                   ;; selects the `:data` column, which contains only one element
    first                   ;; a column is a sequence here, so we take its first element...
    tc/print-dataset)       ;; ... which is a sub-dataset

genmeblog 13:03:27

From a certain perspective, a dataset is a map of columns, and columns are seqs.
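To make the “first” step concrete, here is a minimal sketch in plain Clojure. It does not use Tablecloth: it models a dataset as a hypothetical map from column name to a vector of values, and select-row is a made-up helper standing in for (tc/select-rows 0). The :name / :group-id / :data layout roughly mirrors the columns a grouped dataset exposes after tc/as-regular-dataset, assuming group names are maps of the grouping columns; the :score column and all values are invented.

```clojure
;; Hypothetical plain-map model of a dataset (NOT Tablecloth's real internals):
;; a map from column name to a vector of values.
(defn select-row
  "Hypothetical helper: keep only row i of every column.
  Each column stays a vector, now holding a single element."
  [ds i]
  (into {} (map (fn [[k col]] [k [(nth col i)]])) ds))

;; Roughly the shape of a grouped dataset after as-regular-dataset:
;; one row per group, with each group's sub-dataset stored in :data.
(def grouped
  {:name     [{:name "Alice"} {:name "Bob"}]
   :group-id [0 1]
   :data     [{:name ["Alice" "Alice"] :score [1 2]}
              {:name ["Bob"] :score [3]}]})

;; Selecting a row and then the :data column still yields a column
;; (a one-element vector), not the sub-dataset itself...
(-> grouped (select-row 0) :data)
;; => [{:name ["Alice" "Alice"], :score [1 2]}]

;; ...so `first` is needed to pull the sub-dataset out of that column.
(-> grouped (select-row 0) :data first)
;; => {:name ["Alice" "Alice"], :score [1 2]}
```

In real Tablecloth the elements of :data are datasets rather than plain maps, but the need for first comes from the same “a column is a seq” structure.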

genmeblog 13:03:01

Regarding the print-dataset call: it depends on how your REPL prints a dataset, but IMHO it’s not a necessary step here.

genmeblog 13:03:46

print-dataset just prints a dataset to *out*, I suppose. Without it, the form above should return a dataset and let the REPL print it properly.
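The difference between printing and returning can be sketched in plain Clojure. print-dataset-like is a hypothetical stand-in that assumes tc/print-dataset behaves like println (it writes to *out* and returns nil); sub-ds is a made-up plain map, not a real Tablecloth dataset.

```clojure
;; Hypothetical stand-in for a print helper, assumed to behave like println:
;; it writes its argument to *out* and returns nil.
(defn print-dataset-like [ds]
  (println ds))

;; A made-up sub-dataset in a plain-map model (column name -> values).
(def sub-ds {:x [1 2] :y [3 4]})

;; The printed text goes to *out*; with-out-str captures it as a string.
(with-out-str (print-dataset-like sub-ds))

;; The call itself returns nil, so a REPL shows the printed text followed by nil.
(print-dataset-like sub-ds)

;; Returning the value instead lets the REPL print it with its own printer.
sub-ds
;; => {:x [1 2], :y [3 4]}
```

Dropping the explicit print call and returning the value is what lets the REPL (or a notebook tool) decide how to render the result.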

Mattias 14:03:53

Thank you! ✌️

Daniel Slutsky 22:03:18

Announcing the 1st meeting of the Scicloj real-world-data group: https://clojureverse.org/t/real-world-data-meeting-1/