Jo, is ds/mapseq-reader the correct thing to use when I want to provide data for #clerk, for example? (I have a dataset and I'd like a seq of maps.) Yeah, OK, I figure the docstring literally says my description 😅

Konrad Claesson 12:07:48

How can I perform an aggregation like

SELECT
    stragg(id, ', ') AS ids
FROM table
GROUP BY category

where stragg is an imaginary function that aggregates all values in the id column into a comma-separated string. For example, given a dataset like
(ds/->dataset [{"id" 1, "name" "bob"} {"id" 2, "name" "bob"}, {"id" 3, "name" "alice"}])
|  name | id |
|-------|----|
|   bob |  1 |
|   bob |  2 |
| alice |  3 |
I would like to create a dataset like
| name  | ids  |
|-------|------|
| bob   | 1, 2 |
| alice | 3    |


tc/fold-by is your friend here:


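(require '[tablecloth.api :as tc]
         '[clojure.string :as str])

;; Group by "name" and fold the remaining column with a joining function.
(-> (tc/dataset [{"id" 1, "name" "bob"} {"id" 2, "name" "bob"}, {"id" 3, "name" "alice"}])
    (tc/fold-by ["name"] (partial str/join ", ")))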

;; => _unnamed [2 2]:
;;    |  name |   id |
;;    |-------|------|
;;    |   bob | 1, 2 |
;;    | alice |    3 |
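
For comparison, here is a minimal plain-Clojure sketch of the same grouping on a seq of maps, with no dataset library involved. It only illustrates what the fold-by call above does conceptually; it is not how tablecloth implements it, and the names `rows` and `ids-by-name` are made up for this example.

```clojure
(require '[clojure.string :as str])

;; Hypothetical input: the same three rows as a plain seq of maps.
(def rows
  [{"id" 1, "name" "bob"}
   {"id" 2, "name" "bob"}
   {"id" 3, "name" "alice"}])

;; Group the rows by "name", then join each group's ids with ", ".
(defn ids-by-name [rows]
  (->> (group-by #(get % "name") rows)
       (map (fn [[k group]]
              {"name" k
               "ids"  (str/join ", " (map #(get % "id") group))}))))

(ids-by-name rows)
```

The result is one map per distinct name, with the ids of that group joined into a single string. Row order in the result should not be relied on.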


And for larger datasets, here is a version that avoids constructing the intermediate column values:

tech.v3.dataset.reductions-test> (ds-reduce/group-by-column-agg 
                                  {"id" (ds-reduce/reducer 
                                         (fn [ctx val]
                                           (let [first? (nil? ctx)
                                                 ^StringBuilder ctx (or ctx (StringBuilder.))]
                                             (when-not first? (.append ctx ", "))
                                             (.append ctx val)))
                                         #(.toString ^Object %))}
name-aggregation [2 2]:

|  name |   id |
|-------|------|
|   bob | 1, 2 |
| alice |    3 |
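
The reducing function above follows a simple pattern: start with no accumulator, create a `StringBuilder` on the first value, and append a separator before every later value. Below is a minimal sketch of that pattern with plain `reduce`, outside of `ds-reduce`. The helper name `join-ids` is hypothetical, and the empty-input handling is an assumption of this sketch, not something taken from the thread.

```clojure
;; Accumulate values into one StringBuilder instead of first
;; collecting them into an intermediate sequence.
(defn join-ids [values]
  (let [sb (reduce (fn [^StringBuilder ctx v]
                     (let [first? (nil? ctx)
                           ^StringBuilder ctx (or ctx (StringBuilder.))]
                       ;; Separator goes before every value except the first.
                       (when-not first? (.append ctx ", "))
                       ;; str picks the String overload and avoids reflection.
                       (.append ctx (str v))))
                   nil
                   values)]
    ;; A nil accumulator means there were no values at all.
    (if sb (.toString ^StringBuilder sb) "")))

(join-ids [1 2])
(join-ids [3])
```

With the inputs from the example dataset, the two calls return "1, 2" and "3".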

Konrad Claesson 14:07:55

This works great, but on my real dataset I get

1. Unhandled java.lang.Exception
   Column appId has value whose length (109583) is greater than max-chars-per-column (65536).
when trying to export it to a CSV file using ds/write!. cider also can't show a preview of the dataset because the column values are too long. Is there any workaround?