#data-science
2024-04-18
Hendrik08:04:23

What is an idiomatic way to express this pandas instruction with tech.ml.dataset?

ds[ds["Column A"]<5] = 42

Daniel Slutsky08:04:59

Do you actually wish to mutate your dataset in-place like in pandas? Or to create a new dataset with the values less than 5 replaced with 42?

Hendrik08:04:56

In the spirit of immutability: Get a new dataset 🙂

👍 1
Daniel Slutsky09:04:23

Good question, I am realizing I don't know what might be the most efficient way. The following does work:

(require '[tech.v3.dataset :as ds])

(def ds
  (ds/->dataset {"Column A" (range 10)}))

(ds/column-map ds
               "Column A"
               #(if (< % 5) 42 %))

_unnamed [10 1]:

| Column A |
|---------:|
|       42 |
|       42 |
|       42 |
|       42 |
|       42 |
|        5 |
|        6 |
|        7 |
|        8 |
|        9 |

But there may be more efficient ways if your dataset is big.

Daniel Slutsky09:04:29

If performance is not an issue, then I think the above is just fine.
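
A note on the `column-map` call above: as far as the tech.ml.dataset docstring describes it, the three-argument form maps the function over all columns of the dataset, which works here because the dataset has exactly one column. The following is a minimal sketch for a wider dataset, assuming the four-argument arity whose last argument is a sequence of column names. `ds2`, `replaced`, and "Column B" are hypothetical names used only for illustration.

```clojure
(require '[tech.v3.dataset :as ds])

;; Hypothetical two-column dataset; "Column B" exists only for illustration.
(def ds2
  (ds/->dataset {"Column A" (range 10)
                 "Column B" (range 10 20)}))

;; The last argument restricts the mapping to "Column A", so the function
;; receives a single value per row. The result is a new dataset; ds2 is
;; left unchanged.
(def replaced
  (ds/column-map ds2
                 "Column A"
                 #(if (< % 5) 42 %)
                 ["Column A"]))

;; Inspect the updated column as a plain vector.
(vec (replaced "Column A"))
```

"Column B" should come through untouched, since only the result column is replaced.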

Daniel Slutsky09:04:33

It would be helpful to bring this discussion to the Clojurians Zulip chat: https://scicloj.github.io/docs/community/chat/ Some people there will benefit from this question and can probably help, too.

Hendrik09:04:15

Thanks for your help. In the future, I will ask those kinds of questions on Zulip.

🙏 1