#data-science
2020-11-14
Santiago06:11:56

@daslu ☝️ the type of thing I felt clj would be the perfect candidate for

👍 3
zane00:11:16

I’d be interested in hearing more about what you mean. 🙂

Santiago07:11:01

Most of my time as a data scientist these days is not spent thinking about which model to use, or which packages or which language etc. It’s mostly “how can I turn this into a DAG to make everything reproducible?” I think immutable data and a functional style of programming are ideal for creating not only the actual DAG pipeline but also the individual steps, because reproducibility is a first-class concept 🙂 Clojure AFAIK doesn’t have something like this, and the dagli library that you posted, @U050CT4HR, seems to go in a nice direction: a single library to build and contain every step of an ML pipeline.

Santiago09:11:32

we use http://dvc.org at work and I’m personally in love with it because it’s language agnostic. I have a DAG that gets data, cleans, transforms and splits it, trains models, and saves artefacts (including plots and metrics), with everything stored in S3. we have steps written in babashka and R, and will probably add a Python deployment script, all in one workflow
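
To make the "pipeline as a DAG" idea concrete, here is a minimal sketch in Python. It is not how dvc is implemented or configured; the stage names are hypothetical labels taken from the steps listed in the message above, and the helper functions are illustrative only. It shows two things a DAG gives you: a valid execution order, and the set of downstream stages that would need to re-run when one stage changes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stage names based on the steps mentioned above.
# Each stage maps to the set of stages it depends on.
PIPELINE = {
    "get_data": set(),
    "clean": {"get_data"},
    "transform": {"clean"},
    "split": {"transform"},
    "train": {"split"},
    "save_artefacts": {"train"},
}


def run_order(pipeline):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(pipeline).static_order())


def stale_stages(pipeline, changed):
    """Return `changed` plus every downstream stage, in execution order."""
    order = run_order(pipeline)
    stale = {changed}
    for stage in order:
        # A stage is stale if any of its dependencies is stale.
        if pipeline[stage] & stale:
            stale.add(stage)
    return [stage for stage in order if stage in stale]


if __name__ == "__main__":
    print(run_order(PIPELINE))
    print(stale_stages(PIPELINE, "transform"))
```

In this sketch, changing the `transform` stage marks only `transform`, `split`, `train` and `save_artefacts` as needing a re-run; `get_data` and `clean` are left alone.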

zane17:11:42

Thanks for the response, @UFPEDL1LY! http://dvc.org was new to me, and believe it or not I’ve been looking around for something like it!

zane17:11:59

Other candidates:
• https://github.com/Factual/drake (written in Clojure(!), deprecated)
• https://www.digdag.io/
• https://airflow.apache.org/
• make

Santiago17:11:44

give it a try, you won’t regret it 🙂 the team behind it is also super approachable. they also have another tool called https://cml.dev which you use as a GitHub Action. I’m not sponsored by them btw, I wish haha

zane17:11:09

Based on the README it’s more or less exactly what I’ve been looking for.

zane17:11:25

Will do, and thanks for the pointer to https://cml.dev!

Santiago18:11:45

👍 anytime ;D