#data-science
2023-09-02
Ronny Li 13:09:45

My Jupyter notebook stopped working, so I just tried out tech.ml.dataset (TMD) for the first time on a real-world dataset! My unsolicited opinion:
• Filtering rows, adding/removing columns, and joining all behave as I would expect (coming from someone with 10+ years in Python and R). It's a nice combination of pandas and Clojure IMO.
  ◦ In all of the above, TMD is less verbose than pandas since we don't have to repeat the name of the dataset if we use the thread macro. In pandas we often have to repeat the dataset variable 2-3 times per operation.
  ◦ In particular, creating new columns based on if/else statements is so natural in TMD and so unnatural in pandas.
• I really appreciate that I finally have a one-liner for inspecting CSVs in Clojure.
• The oddest function I came across was group-by-column-agg.
  ◦ It's a bit verbose getting the values of the columns I grouped on and having to repeat the dataset variable for each aggregation field.
    ▪︎ (EDIT: Never mind, the documentation example at https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.reductions.html#var-group-by-column-agg was weird, but when I tried writing it the more intuitive way, it also worked!)
Didn't mess around with dates yet, but this has been a great first experience and I think I'll be able to drop pandas much sooner than I expected!

👍 2
💯 8
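
For readers who have not used TMD, the threading style and the if/else column described above might look roughly like the sketch below. It assumes tech.ml.dataset is on the classpath; the `orders` data, the column names, and the thresholds are invented for illustration and do not come from the thread.

```clojure
;; Sketch only: assumes the tech.ml.dataset library is available.
;; The data and column names are hypothetical.
(require '[tech.v3.dataset :as ds])

(def orders
  (ds/->dataset [{:customer "a" :amount 120.0}
                 {:customer "b" :amount 35.0}
                 {:customer "c" :amount 80.0}]))

(def result
  (-> orders
      ;; keep only rows whose :amount is above 50
      (ds/filter-column :amount #(> % 50.0))
      ;; add a :tier column from a plain if/else over each row map
      (ds/row-map (fn [row]
                    {:tier (if (> (:amount row) 100.0) "large" "small")}))
      ;; drop a column we no longer need
      (ds/remove-columns [:customer])))
```

The dataset is named once at the top of the `->` form and each step receives the previous step's result. The CSV one-liner mentioned above is presumably the same `ds/->dataset` call given a file path instead of a sequence of maps.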
chrisn 15:09:23

Ronny - that is great to hear! We have heard similar things in the past and I really should be collecting these types of comments. That documentation example has now confused several people, so it's time to change it. Thanks for speaking up. I was trying to emphasize that the function takes a sequence of datasets, but instead the fact that the number of datasets matched the number of map entries just caused confusion. An experienced data science user like yourself is my top priority in terms of converting data science/analytics developers to Clojure. We don't always have the tools they need, but when we do I find the feedback they give is just golden -- the feedback is appreciated.

🙏 2
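
As a rough illustration of the point about `group-by-column-agg` taking a sequence of datasets, here is a minimal sketch. It assumes tech.ml.dataset is on the classpath; the `stocks` data is invented, and this is not the documentation example discussed in the thread.

```clojure
;; Sketch only: assumes the tech.ml.dataset library is available.
;; The stocks data is hypothetical.
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.reductions :as ds-reduce])

(def stocks
  (ds/->dataset [{:symbol "AAA" :price 10.0}
                 {:symbol "AAA" :price 20.0}
                 {:symbol "BBB" :price 5.0}]))

(def summary
  (ds-reduce/group-by-column-agg
   :symbol                               ; column to group on
   {:price-avg (ds-reduce/mean :price)   ; one map entry per output column
    :price-sum (ds-reduce/sum :price)}
   [stocks]))                            ; a sequence of datasets, here of length one
```

The aggregation map has two entries while the dataset sequence has one element, which shows that the two counts are independent of each other.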
chrisn 16:09:41

For those reading this thread - I just updated the docs so the link perhaps shows a different set of problems but the original problem is gone.

Ronny Li 16:09:22

Thank you for your hard work!

🤘 2
chrisn 19:09:48

Thanks ✌️