This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-07-02
Jo, is ds/mapseq-reader the correct thing to use when, for example, I want to provide data for #clerk? (I have a dataset and I'd like a seq of maps.)
Yeah OK, I see the docstring literally says what I described 😅
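For readers following along, here is a minimal sketch of what that call looks like. The sample data and the names people and rows are made up for illustration; only ds/->dataset and ds/mapseq-reader come from tech.v3.dataset.

```clojure
(require '[tech.v3.dataset :as ds])

;; Hypothetical sample data; any dataset is handled the same way.
(def people
  (ds/->dataset [{:id 1 :name "bob"}
                 {:id 2 :name "alice"}]))

;; mapseq-reader yields one map per row, keyed by column name.
;; vec realizes it into a plain vector of maps.
(def rows (vec (ds/mapseq-reader people)))

(count rows)          ; number of rows in the dataset
(:name (first rows))  ; the :name value of the first row
```

The resulting sequence of maps can then be handed to whatever expects row-oriented data.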
How can I perform an aggregation like
SELECT
  category,
  stragg(id, ', ') AS ids
FROM table
GROUP BY category
using tech.ml.dataset?
stragg is an imaginary function that aggregates all values in the id column into a comma-separated string.
For example, given a dataset like
(ds/->dataset [{"id" 1, "name" "bob"} {"id" 2, "name" "bob"}, {"id" 3, "name" "alice"}])
| name  | id |
|-------|---:|
| bob   |  1 |
| bob   |  2 |
| alice |  3 |
I would like to create a dataset like
| name  | ids  |
|-------|------|
| bob   | 1, 2 |
| alice | 3    |
;; assumes (require '[tablecloth.api :as tc] '[clojure.string :as str])
(-> (tc/dataset [{"id" 1, "name" "bob"} {"id" 2, "name" "bob"}, {"id" 3, "name" "alice"}])
    (tc/fold-by ["name"] (partial str/join ", ")))
;; => _unnamed [2 2]:
;; | name  | id   |
;; |-------|------|
;; | bob   | 1, 2 |
;; | alice | 3    |
And there is the reducer in tech.v3.dataset.reductions (https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.reductions.html#var-reducer) for larger datasets; it avoids constructing the intermediate column values:
tech.v3.dataset.reductions-test> (ds-reduce/group-by-column-agg
                                  "name"
                                  {"id" (ds-reduce/reducer
                                         "id"
                                         (fn [ctx val]
                                           (let [first? (nil? ctx)
                                                 ^StringBuilder ctx (or ctx (StringBuilder.))]
                                             (when-not first? (.append ctx ", "))
                                             (.append ctx val)))
                                         #(.toString ^Object %))}
                                  [ds])
name-aggregation [2 2]:
| name  | id   |
|-------|------|
| bob   | 1, 2 |
| alice | 3    |
This works great, but on my real dataset I get
1. Unhandled java.lang.Exception
Column appId has value whose length (109583) is greater than max-chars-per-column (65536).
when trying to export it to a CSV file using ds/write!. Is there any workaround?
CIDER also can't show a preview of the dataset because the columns are too long. Is there any workaround for this?
https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.print.html
I guess the print options are covered here: https://techascent.github.io/tech.ml.dataset/quick-reference.html
max-chars-per-column can be changed in write!:
https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.html#var-write.21
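As a rough sketch of that suggestion: the write! docs list :max-chars-per-column as a CSV/TSV option that defaults to 65536, so passing a larger value in the options map should get past the exception. The dataset, the file name wide.csv, and the limit of 200000 below are made up for illustration; pick a limit above your longest value.

```clojure
(require '[tech.v3.dataset :as ds])

;; Hypothetical dataset with one very long string value (110000 chars),
;; standing in for the aggregated appId column from the error message.
(def wide
  (ds/->dataset [{"name"  "bob"
                  "appId" (apply str (repeat 110000 "x"))}]))

;; Raise the per-column character limit above the longest value.
;; "wide.csv" is a placeholder output path.
(ds/write! wide "wide.csv" {:max-chars-per-column 200000})
```

This only affects serialization to CSV/TSV; how the dataset prints in the REPL is controlled separately by the print options linked above.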