Another tmd question - can I run a reduction over multiple columns which produces multiple columns and accepts the new columns' previous value as an argument?
roughly, for a ds with columns a, b and producing new columns x y
(fn [a b x y] -> {:x (f a b x y) :y (g a b x y)})
I ended up writing a regular boring loop. The code was kind of ugly, but it worked. I think an efficient API for multi-reduce is hiding here: something like a function of the columns and a callback that receives the values to write out, all positional.
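A minimal sketch of the shape being asked for, in plain Clojure with no tmd calls. `mimo-scan` is a made-up name, and the bodies of `f` and `g` are arbitrary stand-ins for the step functions in the question. It assumes the scan is seeded with initial values `x0` and `y0`, which the question does not specify.

```clojure
;; Hypothetical step functions standing in for f and g from the question.
(defn f [a b x _y] (+ a b x))
(defn g [a b _x y] (+ (* a b) y))

(defn mimo-scan
  "Walks as and bs in order. Each step receives the previous x and y
  (x0 and y0 for the first row) and produces the next pair.
  Returns {:x [...] :y [...]} with one entry per input row."
  [f g x0 y0 as bs]
  (let [steps (rest (reductions (fn [[x y] [a b]]
                                  [(f a b x y) (g a b x y)])
                                [x0 y0]
                                (map vector as bs)))]
    {:x (mapv first steps)
     :y (mapv second steps)}))

(mimo-scan f g 0 0 [1 2 3] [10 20 30])
;; => {:x [11 33 66], :y [10 50 140]}
```

This allocates a pair per row, so it shows the semantics rather than the efficient version.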
This API came to mind: https://cnuernber.github.io/dtype-next/tech.v3.datatype.reductions.html
That sounds like what I wanted, but I want to add a column and refer to its previous value. This example is pretty much the code I want to run. I want a proper reduction, not a map. I may be forced to just run reduce in the end, but it would have been nice to get a ds out of it. (It's also possible to get syntax highlighting on Slack.) (I'm on the Zulip too, I'm just being lazy juggling between three machines.)
Ah, if you need the previous row to compute the current one, then row-map isn't suitable (it does things in parallel and potentially out of order, for speed).
Maybe something like the above? Still not completely sure what you're up to, but I'm sure that's on me.
Thanks for the hint about the syntax highlighting, I didn't know about that.
My gripe with this reduction is that it's quite inefficient. It's the most straightforward solution, but I want something closer to iterating over the underlying arrays. Maybe there's no facility for it in t.m.d at the moment. I was slightly obtuse in my description, so I share some of the blame here. Maybe the most efficient solution will just be looping over Columns and wrapping them back in a DS in the end.
👍 We definitely have performance-sensitive code that operates on columns, building up new columns, which end up in the original dataset or other datasets. Since a dataset is just a map of column name to column, 'wrapping columns in a dataset' is free (`assoc` / `merge`).
It comes down to understanding the real data dependencies.
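A rough sketch of the "loop over the columns, then wrap the results" approach, using only core Clojure primitive arrays. `scan-arrays` is a hypothetical helper, and the two arithmetic expressions are arbitrary stand-ins for the real step functions. It assumes both inputs are double arrays of equal length.

```clojure
(defn scan-arrays
  "Hypothetical helper: sequential scan over two double arrays.
  Row i sees the x and y produced for row i-1 (x0 and y0 for row 0).
  Returns {:x double-array, :y double-array}."
  [^doubles as ^doubles bs x0 y0]
  (let [n  (alength as)
        xs (double-array n)
        ys (double-array n)]
    (loop [i 0, x (double x0), y (double y0)]
      (if (< i n)
        (let [a  (aget as i)
              b  (aget bs i)
              x' (+ a b x)         ; stand-in for f
              y' (+ (* a b) y)]    ; stand-in for g
          (aset xs i x')
          (aset ys i y')
          (recur (inc i) x' y'))
        {:x xs :y ys}))))

(let [{:keys [x y]} (scan-arrays (double-array [1 2 3])
                                 (double-array [10 20 30])
                                 0 0)]
  [(vec x) (vec y)])
;; => [[11.0 33.0 66.0] [10.0 50.0 140.0]]
```

With tmd on the classpath, I'd expect `(merge ds (scan-arrays (double-array (:a ds)) (double-array (:b ds)) 0 0))` to attach the results as columns `:x` and `:y`, going by the "dataset is a map of column name to column" point above. That last step is an assumption and untested here.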
I'll give it a try and see how it goes
ftr, also posted on Zulip in #tech.ml.dataset.dev, topic "MIMO scan / reduce": https://clojurians.zulipchat.com/#narrow/channel/236259-tech.2Eml.2Edataset.2Edev/topic/MIMO.20scan.20.2F.20reduce/with/568668922
This sort of does what I wanted, but it has more moving pieces than I would have liked. It also needs more arities and not a made-up API, but I think you get the gist.
Almost certainly the answer to your question is 'yes', though I'm not 100% sure I know what you're asking.
The powerful row-map function probably helps:
• https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.html#var-row-map
Check it out:
user> (add-lib _ 'org.scicloj/noj)
#'user/_
user> (require '[tech.v3.dataset :as ds])
nil
user> (def ds (ds/->dataset {:a [0 1 2] :b ["foo" "bar" "baz"] :x [:α :β :γ] :y [1/2 2/2 3/2]}))
#'user/ds
user> ds
_unnamed [3 4]:
| :a | :b | :x | :y |
|---:|-----|----|--------|
| 0 | foo | :α | 0.5000 |
| 1 | bar | :β | 1 |
| 2 | baz | :γ | 1.500 |
user> ;; heh, ratios printed as decimals
user> (meta (:y ds))
{:categorical? true, :name :y, :datatype :object, :n-elems 3}
user> (type (first (:y ds)))
clojure.lang.Ratio
user> ;; The powerful row-map function...
user> (ds/row-map ds (fn [{:keys [a b x y]}]
                       {:x (str a b x y)
                        :y (str y x b a)}))
_unnamed [3 4]:
| :a | :b | :x | :y |
|---:|-----|-----------|-----------|
| 0 | foo | 0foo:α1/2 | 1/2:αfoo0 |
| 1 | bar | 1bar:β1 | 1:βbar1 |
| 2 | baz | 2baz:γ3/2 | 3/2:γbaz2 |
user> ;; and if by 'previous value' you mean a value in a different row, that's possible too:
user> (ds/row-map ds (fn [{:keys [a b x y]}]
                       {:x (str a b x (nth (:y ds) (mod (dec a) 3)))
                        :y (str y x (nth (:b ds) (mod (dec a) 3)) a)}))
_unnamed [3 4]:
| :a | :b | :x | :y |
|---:|-----|-----------|-----------|
| 0 | foo | 0foo:α3/2 | 1/2:αbaz0 |
| 1 | bar | 1bar:β1/2 | 1:βfoo1 |
| 2 | baz | 2baz:γ1 | 3/2:γbar2 |
user> ;; Importantly `ds` is a _value_, so it is safe to read (even during row-map)
These conversations are perhaps better had on Zulip, where there are more interested and friendly folk, and where super-futuristic things like messages being preserved through time and syntax highlighting for shared code are supported 🙂 https://clojurians.zulipchat.com/#narrow/channel/151924-data-science