#babashka
2022-12-28
lispyclouds 13:12:13

With my very minimal experience of Airflow: do you mean using Airflow via tasks, or running bb tasks inside Airflow? What use case are you imagining?

lispyclouds 13:12:13

Airflow natively supports Python; the other options, like the Docker or Bash operators, felt quite clunky.

sheluchin 13:12:17

I'm interested in using Airflow to orchestrate but writing the actual tasks in bb clojure. I saw the BashOperator and wondered if there's potential for bb there.

lispyclouds 13:12:25

Yeah, you could call out to bb foo.clj from the BashOperator and orchestrate using Airflow. Using bb tasks for orchestration could be redundant.
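
A minimal sketch of what calling bb from the BashOperator might look like, assuming Airflow 2.x import paths (the `schedule` argument assumes 2.4 or later) and that the `bb` binary is on the worker's PATH. The DAG id, task ids, and the script names `extract.clj` and `load.clj` are hypothetical; none of this was tested in the conversation.

```python
import shlex
from datetime import datetime


def bb_command(script, *args):
    """Build a shell-quoted command line that runs a babashka script."""
    return shlex.join(["bb", script, *args])


def build_dag():
    """Create a two-task DAG in which each task is a bb script.

    Airflow is imported lazily so bb_command stays usable without it.
    Import paths are an assumption based on Airflow 2.x.
    """
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="bb_example",  # hypothetical name
        start_date=datetime(2022, 12, 28),
        schedule=None,  # manual triggering only; assumes Airflow >= 2.4
        catchup=False,
    ) as dag:
        extract = BashOperator(
            task_id="extract",
            bash_command=bb_command("extract.clj"),  # hypothetical script
        )
        load = BashOperator(
            task_id="load",
            bash_command=bb_command("load.clj"),  # hypothetical script
        )
        # Airflow owns the ordering; the bb scripts only hold task logic.
        extract >> load
    return dag
```

In a real DAG file you would also assign `dag = build_dag()` at module level so the scheduler can discover it. Keeping the dependency arrow in Airflow, rather than in bb's own task graph, matches the point about avoiding redundant orchestration.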

lispyclouds 14:12:52

If you just want to call out to certain tasks, this should work. But I'd recommend orchestrating via Airflow rather than via the task dependencies.

sheluchin 14:12:39

I may have been a little unclear. I'm not interested in bb's tasks functionality for this use case, but just using bb scripts to stand in as tasks which Airflow orchestrates, rather than bash or Python. I'll use Python if I must, but if using bb like this provides a comparable experience, that would be nice.

lispyclouds 14:12:01

Ah, got it now! Yep, the BashOperator should just work. I have tried orchestrating built-in commands with it and it works fine. It should work just as well for bb calls.

sheluchin 14:12:33

You mention it was pretty clunky though? Anything in particular that didn't feel right?

lispyclouds 14:12:00

Well, it's mostly that at some point you reach some complex logic and the shell script grows into something unmaintainable. But with bb I don't think you'd have that issue 🙂

lispyclouds 14:12:17

It's the line between how much of the conditional logic should live in the bash scripts vs. in Airflow that makes it clunkier.

sheluchin 14:12:37

Thanks for sharing your experience, @lispyclouds. I'm not totally settled on Airflow yet but it's compelling, especially if I can leverage bb/Clojure for the task logic.

lispyclouds 14:12:31

Yeah I’m interested to know how it goes! Could be something I would be doing at some point too!

Carsten Behring 18:12:38

I have used DVC in the past for data pipelines. It is fully CLI + YAML based, so it works as well with Clojure as with any other language.

sheluchin 14:12:32

Thanks for the tip, @Carsten Behring. I saw DVC mentioned in one of the #data-science videos recently but otherwise haven't heard of it. Looks pretty interesting. I might check that out because a few different sources have cautioned against Airflow for ETL. DVC looks like more of a lightweight solution which I find appealing. Looks like I'm going to be watching your presentation: https://clojureverse.org/t/nlp-in-clojure-session-2-summary-recording-clojure-python-dvc-metamorph/9504