data-science

Ben Sless 2026-01-13T07:46:05.101899Z

Another tmd question: can I run a reduction over multiple columns that produces multiple columns and accepts the new columns' previous values as arguments? Roughly, for a ds with columns a, b, producing new columns x, y: (fn [a b x y] -> {:x (f a b x y) :y (g a b x y)})
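
To pin down the shape being asked for, here is a minimal sketch in plain Clojure with no tech.ml.dataset involved. `scan-xy` and the example step function are hypothetical names made up for illustration; the point is only that each step sees the previous x and y, so the rows must be visited in order.

```clojure
;; Hypothetical helper, not part of any library: a sequential scan over two
;; input columns that threads the previous x and y into every step.
(defn scan-xy
  "Walks as and bs in order, calling (step a b prev-x prev-y), which must
  return {:x new-x :y new-y}. init supplies the x and y seen by the first row.
  Returns the two output columns as {:x [...] :y [...]}."
  [step init as bs]
  (loop [as (seq as), bs (seq bs), prev init, xs [], ys []]
    (if (and as bs)
      (let [{:keys [x y] :as nxt} (step (first as) (first bs) (:x prev) (:y prev))]
        (recur (next as) (next bs) nxt (conj xs x) (conj ys y)))
      {:x xs :y ys})))

;; Example step: x is a running sum of a, y accumulates (* b x).
(defn example-step [a b x y]
  (let [x' (+ x a)]
    {:x x' :y (+ y (* b x'))}))

(scan-xy example-step {:x 0 :y 0} [1 2 3] [10 20 30])
;; => {:x [1 3 6], :y [10 70 250]}
```

This boxes every value and allocates a map per row, so it only illustrates the semantics, not the efficient version.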

Ben Sless 2026-01-15T14:40:19.685159Z

I ended up writing a regular boring loop. The code was kind of ugly, but it worked. I think an efficient API for multi-reduce is hiding here: something like a function of the columns and a callback that receives the values to write out, all positional.

respatialized 2026-01-17T04:33:33.445139Z

https://cnuernber.github.io/dtype-next/tech.v3.datatype.reductions.html This API came to mind

Ben Sless 2026-01-14T11:38:24.794089Z

That sounds about like what I wanted, but I want to add a column and refer to its previous value. This example is pretty much the code I want to run; I want a proper reduction, not a map. I may be forced to just run reduce in the end, but it would have been nice to get a ds in the end. (It's also possible to get syntax highlighting on Slack.) (I'm on the Zulip too, I'm just being lazy juggling between three machines.)

Harold 2026-01-14T15:58:45.543089Z

Harold 2026-01-14T15:59:31.216469Z

Ah, if you need the previous row to compute the current one, then row-map isn't suitable (it does things in parallel and potentially out of order, for speed). Maybe something like the above? Still not completely sure what you're up to, but I'm sure that's on me. Thanks for the hint about the syntax highlighting, I didn't know about that.

Ben Sless 2026-01-14T16:51:18.428049Z

My gripe with this reduction is that it's quite inefficient. That's the most straightforward solution, but I want something closer to iterating over the underlying arrays. Maybe there's no facility for it in t.m.d at the moment. I was slightly obtuse in my description, so I share some of the blame here. Maybe the most efficient solution will just be looping over Columns and wrapping them back in a DS in the end.

Harold 2026-01-14T16:56:19.939969Z

👍 We definitely have performance-sensitive code that operates on columns, building up new columns, which end up in the original dataset or other datasets. Since a dataset is just a map of column name to column, 'wrapping columns in a dataset' is free (`assoc` / `merge`). It comes down to understanding the real data dependencies.
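
A rough sketch of that column-level approach, under some assumptions: the dataset has numeric `:a` and `:b` columns with no missing values, and the recurrence (a running sum for `:x`, an accumulated product for `:y`) is made up purely for illustration. `add-running-xy` is a hypothetical name; `ds/row-count` and `dtype/->double-array` are used as I understand them from the tech.v3.dataset and tech.v3.datatype docs.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.datatype :as dtype])

(defn add-running-xy
  "Hypothetical example: returns dataset with new :x and :y columns, where each
  row's x and y depend on the previous row's x and y."
  [dataset]
  (let [n          (ds/row-count dataset)
        ;; Copy the inputs into primitive arrays once, up front
        ;; (assumes numeric columns with no missing values).
        ^doubles a (dtype/->double-array (:a dataset))
        ^doubles b (dtype/->double-array (:b dataset))
        xs         (double-array n)
        ys         (double-array n)]
    ;; Strictly sequential loop: x and y carry the previous row's outputs.
    (loop [i 0, x 0.0, y 0.0]
      (when (< i n)
        (let [x' (+ x (aget a i))
              y' (+ y (* (aget b i) x'))]
          (aset xs i x')
          (aset ys i y')
          (recur (inc i) x' y'))))
    ;; Wrapping the results back up is just assoc on the dataset.
    (assoc dataset :x xs :y ys)))

(add-running-xy (ds/->dataset {:a [1 2 3] :b [10 20 30]}))
```

The loop itself touches only primitive arrays, so the per-row cost stays small; the dataset is only involved at the start and the end.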

Ben Sless 2026-01-14T17:01:35.320729Z

I'll give it a try and see how it goes

Ben Sless 2026-01-17T18:15:59.149469Z

This sort of does what I wanted, but has more moving pieces than I would have liked. It also needs more arities and not a made-up API, but I think you get the gist.

Harold 2026-01-13T13:16:27.043779Z

Almost certainly the answer to your question is 'yes', though I'm not 100% sure I know what you're asking. The powerful row-map function probably helps: https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.html#var-row-map Check it out:

user> (add-lib _ 'org.scicloj/noj)
#'user/_
user> (require '[tech.v3.dataset :as ds])
nil
user> (def ds (ds/->dataset {:a [0 1 2] :b ["foo" "bar" "baz"] :x [:α :β :γ] :y [1/2 2/2 3/2]}))
#'user/ds
user> ds
_unnamed [3 4]:

| :a |  :b | :x |     :y |
|---:|-----|----|--------|
|  0 | foo | :α | 0.5000 |
|  1 | bar | :β |      1 |
|  2 | baz | :γ |  1.500 |
user> ;; heh, ratios printed as decimals
user> (meta (:y ds))
{:categorical? true, :name :y, :datatype :object, :n-elems 3}
user> (type (first (:y ds)))
clojure.lang.Ratio
user> ;; The powerful row-map function...
user> (ds/row-map ds (fn [{:keys [a b x y]}]
                       {:x (str a b x y)
                        :y (str y x b a)}))
_unnamed [3 4]:

| :a |  :b |        :x |        :y |
|---:|-----|-----------|-----------|
|  0 | foo | 0foo:α1/2 | 1/2:αfoo0 |
|  1 | bar |   1bar:β1 |   1:βbar1 |
|  2 | baz | 2baz:γ3/2 | 3/2:γbaz2 |
user> ;; and if by 'previous value' you mean a value in a different row, that's possible too:
user> (ds/row-map ds (fn [{:keys [a b x y]}]
                       {:x (str a b x (nth (:y ds) (mod (dec a) 3)))
                        :y (str y x (nth (:b ds) (mod (dec a) 3)) a)}))
_unnamed [3 4]:

| :a |  :b |        :x |        :y |
|---:|-----|-----------|-----------|
|  0 | foo | 0foo:α3/2 | 1/2:αbaz0 |
|  1 | bar | 1bar:β1/2 |   1:βfoo1 |
|  2 | baz |   2baz:γ1 | 3/2:γbar2 |
user> ;; Importantly `ds` is a _value_, so it is safe to read (even during row-map)

Harold 2026-01-13T13:28:52.656219Z

These conversations are perhaps better had on Zulip, where there are more interested and friendly folk, and where super-futuristic things like messages being preserved through time and syntax highlighting for shared code are supported :simple_smile: https://clojurians.zulipchat.com/#narrow/channel/151924-data-science