data-science

Hendrik 2024-04-18T08:22:23.291309Z

What is an idiomatic way to express this pandas instruction with tech.ml.dataset?

ds[ds["Column A"]<5] = 42

Daniel Slutsky 2024-04-18T08:38:59.086859Z

Do you actually wish to mutate your dataset in-place like in pandas? Or to create a new dataset with the values less than 5 replaced with 42?

Hendrik 2024-04-18T08:58:56.482289Z

In the spirit of immutability: Get a new dataset 🙂

👍 1
Daniel Slutsky 2024-04-18T09:15:23.975169Z

Good question. I realize I don't know what the most efficient way might be. The following does work:

(require '[tech.v3.dataset :as ds])

(def ds
  (ds/->dataset {"Column A" (range 10)}))

(ds/column-map ds
               "Column A"
               #(if (< % 5) 42 %))

_unnamed [10 1]:

| Column A |
|---------:|
|       42 |
|       42 |
|       42 |
|       42 |
|       42 |
|        5 |
|        6 |
|        7 |
|        8 |
|        9 |

But there may be more efficient ways if your dataset is big.
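
A minimal sketch of one alternative formulation, not benchmarked here and not necessarily faster: since a dataset behaves like a map from column names to columns, `assoc` can return a new dataset with a replaced column, and `tech.v3.datatype/emap` can build that column elementwise. The names `example-ds` and `replaced` and the `:int64` result type are assumptions made for this sketch.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.datatype :as dtype])

;; Same toy data as above; `example-ds` is a name chosen for this sketch.
(def example-ds
  (ds/->dataset {"Column A" (range 10)}))

;; `assoc` returns a new dataset; `example-ds` itself is left unchanged.
;; `dtype/emap` applies the function elementwise and declares the result
;; datatype (assumed here to be :int64, matching the integer input).
(def replaced
  (assoc example-ds
         "Column A"
         (dtype/emap #(if (< % 5) 42 %)
                     :int64
                     (get example-ds "Column A"))))

;; Inspect the new column as a plain vector.
(vec (get replaced "Column A"))
```

The original dataset can still be inspected afterwards with `(vec (get example-ds "Column A"))`, which is the non-mutating behaviour asked for.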

Daniel Slutsky 2024-04-18T09:16:29.549969Z

If performance is not an issue, then I think the above is just fine.

Daniel Slutsky 2024-04-18T09:17:33.924899Z

It would be helpful to bring this discussion to the Clojurians Zulip chat: https://scicloj.github.io/docs/community/chat/ Some people there will benefit from this question and can probably help as well.

Hendrik 2024-04-18T09:23:15.893019Z

Thanks for your help. In the future, I will ask these kinds of questions on Zulip.

🙏 1