#announcements
2023-05-11
quantisan01:05:49

We started designing the interface for Stepwise.Next and would love some community feedback. If you're using Stepwise, AWS Step Functions (or maybe Airflow, etc.), or use Clojure to orchestrate workflows, I'd love to hear about your use cases and pain points. Please don't hesitate to reach out to me here or at [email protected]. Here's my blog post outlining the problem we're trying to solve and a work-in-progress interface proposal. Apologies if this isn't the right channel! https://www.quantisan.com/simplifying-step-functions-and-stepwise-lessons-learned-and-a-new-approach/

👍 13
joshuamz02:05:26

Looks very interesting. I know you're building your tasks on top of AWS Step Functions, but are you planning to target different platforms? I'm interested in orchestrating work in Kubernetes, the Clojure way, but haven't invested much time and effort on that yet. Very cool to see more infrastructure tooling written and designed with the Clojure mindset.

joshuamz03:05:51

Aside from this tool, in your post you mentioned you're solely using Clojure for everything, dropping other technologies. I'm particularly interested in your replacement to Terraform. I've invested much time with that tool and I found it quite compelling. Did you write a Clojure layer on top of it? Or are you using a different approach for infrastructure management?

👀 2
quantisan06:05:56

Great suggestion, @joshuamz, about keeping the library platform-agnostic. We're currently using Step Functions and don't have plans to support other platforms at the moment. While I'm not very familiar with Kubernetes, my understanding is that Step Functions operates at a higher layer, closer to the business logic than the infrastructure. I'm curious, could you explain what you mean by orchestrating work in Kubernetes? We're only using Stepwise.Next to replace Terraform for these state machines and associated assets, such as a couple of IAM roles and an S3 bucket. We'll continue using Terraform to manage the other 90% of our assets. If you're interested in replacing Terraform with Clojure, you might want to check out https://github.com/hashicorp/terraform-cdk. They already support using Java to declare Terraform resources. Someone is likely already working on porting that to Clojure.

Ivan08:05:40

What is the reason to have run-here and run-on-lambda as fns rather than part of the configuration?

(sfn/run-here client
              {make-sauce               {:concurrency 5}
               put-ingredients-on-dough {:concurrency 1}
               bake                     {:concurrency 1}})


(sfn/run-on-lambda client
                   {make-dough {:timeout         30
                                :memory-size     512
                                :max-concurrency 100}})
I would expect something like
(defn make-dough ...
)  ;; the steps

(def pizza-making-sm ...
)  ;; the wiring / state machine

(def config {make-sauce {:platform :local :concurrency 5 ...}
             bake {:platform :lambda ...}}
) ;; the technical details / infra / config

(sfn/run config pizza-making-sm)
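
A minimal sketch of how a single entry point like this could dispatch on :platform. Everything here is an assumption for illustration: `start-local!` and `start-lambda!` are hypothetical stand-ins for whatever run-here and run-on-lambda do, steps are keyed by keyword rather than by fn, and the state machine argument is left out. None of it is Stepwise API.

```clojure
;; Hypothetical stand-ins for the side-effecting starters; here they only
;; return a description of what would be started.
(defn start-local!  [steps] {:started :local  :steps (set (keys steps))})
(defn start-lambda! [steps] {:started :lambda :steps (set (keys steps))})

(defn split-by-platform
  "Groups a {step -> opts} config map into {platform -> {step -> opts}},
  dropping the :platform key from each step's opts."
  [config]
  (reduce-kv (fn [acc step opts]
               (assoc-in acc [(:platform opts) step] (dissoc opts :platform)))
             {}
             config))

(defn run
  "Starts every platform mentioned in config and returns the results."
  [config]
  (let [{:keys [local lambda]} (split-by-platform config)]
    (cond-> []
      local  (conj (start-local! local))
      lambda (conj (start-lambda! lambda)))))

;; Hypothetical config in the shape proposed above.
(def config
  {:make-sauce {:platform :local  :concurrency 5}
   :bake       {:platform :lambda :timeout 30}})
```

Calling `(run config)` starts both platforms from one process, which is the trade-off in question: the config reads as plain data, but a single call now owns the side effects for every platform.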

Ivan08:05:09

> SFN error handling configuration as metadata
this is interesting 🙂

🙇 1
valtteri14:05:40

Sounds interesting!! Have you considered a “data-driven” approach to describing the FSMs? I mean something similar to what malli and reitit do. E.g.

[:sfn/name :pauls-pizza-making-machine
 [:sfn/parallel
  [[:fn make-dough :config {,,,}]
   [:fn make-sauce :config {,,,}]]]
 [:fn put-ingredients-on-dough :config {,,,}]
 [:sfn/wait 2 :minutes]
 [:sfn/choice
  [:test-fn (comp not is-pizza-acceptable?)
   [:sfn/fail]]]]
This would make it trivial to create and manipulate FSM definitions programmatically. Dev-time validation could be added as well
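
A rough sketch of what "manipulate programmatically" could look like with a definition in that shape. The vector grammar is the hypothetical one from the message above, step fns are represented by symbols so the definition stays pure data, and `step-fns` and `with-timeout` are made-up helper names.

```clojure
(require '[clojure.walk :as walk])

;; Hypothetical data-driven definition, following the shape suggested above.
(def pizza-machine
  [:sfn/name :pauls-pizza-making-machine
   [:sfn/parallel
    [[:fn 'make-dough :config {}]
     [:fn 'make-sauce :config {}]]]
   [:fn 'put-ingredients-on-dough :config {}]
   [:sfn/wait 2 :minutes]])

(defn fn-node?
  "True for nodes shaped like [:fn f :config opts]."
  [node]
  (and (vector? node)
       (= 4 (count node))
       (= :fn (first node))
       (= :config (nth node 2))))

(defn step-fns
  "Returns the step named by every [:fn f :config opts] node, depth-first."
  [definition]
  (->> (tree-seq vector? seq definition)
       (filter fn-node?)
       (mapv second)))

(defn with-timeout
  "Returns the definition with :timeout set in the config of every step."
  [definition seconds]
  (walk/postwalk
   (fn [node]
     (if (fn-node? node)
       (update node 3 assoc :timeout seconds)
       node))
   definition))
```

For example, `(step-fns pizza-machine)` lists the three steps in order, and `(with-timeout pizza-machine 30)` returns a new definition without touching the original.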

valtteri14:05:50

Nothing wrong with the DSL approach though. 🙂

quantisan22:05:45

@Ivan, am I reading your intention correctly that using a configuration map with a single entry point would make for a more consistent API and follow the principle of configuration as data? It would also make extending to additional platforms (as Joshua suggested above) trivial for the API, as it's just a matter of adding acceptable values for the :platform key. I chose run-* as fns because they are first-class operations of this API that produce a critical side effect. Put another way, the JAR that's meant to run on AWS Lambda should only spin up the Lambda workers, and the JAR that runs in a container should only run the local processes. So we'd end up needing a separate interface for each platform. Moreover, we found from Stepwise v1 that we often want to isolate our local workers into different containers, e.g. we currently put (run-here client urgent-workers) in one container and (run-here client batch-workers) in another. We do that with a -main with CLI actions that point to each (i.e. $ my-app start urgent-workers and $ my-app start batch-workers from the command line). Thoughts?
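
A minimal sketch of the -main dispatch described above, assuming worker groups are plain maps. `start!` is a hypothetical stand-in for (run-here client workers) that only reports what it would start, and the group contents are invented for illustration.

```clojure
;; Hypothetical worker groups: group name -> {step -> opts}.
(def worker-groups
  {"urgent-workers" {:make-sauce {:concurrency 5}}
   "batch-workers"  {:bake       {:concurrency 1}}})

(defn start!
  "Stand-in for (run-here client workers); only reports what it would start."
  [workers]
  {:started (set (keys workers))})

(defn dispatch
  "Maps CLI args such as [\"start\" \"urgent-workers\"] to a worker group."
  [[action group]]
  (let [workers (get worker-groups group)]
    (cond
      (not= "start" action) {:error (str "unknown action: " action)}
      (nil? workers)        {:error (str "unknown worker group: " group)}
      :else                 (start! workers))))

(defn -main [& args]
  (println (dispatch args)))
```

With this shape, each container runs the same JAR with a different argument, so only the named group of workers starts in that container.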

quantisan23:05:07

@valtteri, Stepwise v1 actually takes an EDN as a state machine definition. We deliberately moved away from that because it was hard to read the gist of a control flow apart from the configurations. Libraries like reitit are routing libraries designed to manage the intricacies at a single junction in a procedure. From a readability perspective, they have the benefit of not needing to deal with time, and surfacing details is a benefit for them but was a cost for us. The "aha" moment for this new interface was realizing that Clojure already has a fantastic procedural programming operation in -> and others. Therefore, we want the new interface to work as you would expect from ->. One compromise we made involves ops like sfn/parallel and sfn/map, which are similar to but not exactly the same as map and apply. We could hide them behind map and apply to make the interface more Clojure-like, but we chose not to because 1) they are not exactly the same ops, and 2) they produce a side effect. I could put more thought into making them "just work" with Clojure fns, though...
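
For reference, this is the plain-Clojure reading of -> that the message refers to, sketched with ordinary fns on a map. The step bodies are invented for illustration and nothing here is Stepwise.Next API.

```clojure
;; Each step takes the order map and returns an updated one.
(defn make-dough [order]
  (assoc order :dough :ready))

(defn put-ingredients-on-dough [order]
  (assoc order :toppings (:wants order)))

(defn bake [order]
  (assoc order :baked? true))

(defn make-pizza
  "Threads the order through each step, top to bottom."
  [order]
  (-> order
      make-dough
      put-ingredients-on-dough
      bake))
```

The control flow reads as a straight line of step names, with no configuration mixed in, which is the readability property being described.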

👍 2
phronmophobic20:05:31

Earlier this year, I wrote an ETL workflow in clojure that uses AWS Batch with spot instances. Stepwise seems really neat. Any idea how to compare the pricing of using stepwise vs Batch+Spot instances? It seems like Stepwise is focused more on an incremental approach. Is that right? Does it also support batching and scheduling?

quantisan01:05:11

@phronmophobic workflows (AWS Step Functions) differ from ETL (AWS Batch) in that a workflow is meant to be run many times, but each execution is relatively small conceptually, e.g. make pizzas for customers. ETL, by contrast, is meant to process a big blob of a single thing in many pieces, e.g. make me 1 ton of dough. So the two are not mutually exclusive. You can use SFN to coordinate Batch jobs (AWS provides built-in integrations between the two) as part of a larger workflow, e.g. an SFN workflow that runs BatchJob1 -> BatchJob2 -> Clojure Lambda fn to clean up metadata -> trigger SageMaker training -> ...
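
A rough sketch of the first three steps of such a workflow, written as a Clojure map in the shape of an Amazon States Language definition. The state names, job names, and the `<...>` placeholders are hypothetical; the resource strings are the Step Functions service integrations for Batch and Lambda. `state-order` is a made-up helper that follows "Next" links to show the sequence.

```clojure
;; Hypothetical workflow: two Batch jobs followed by a Lambda clean-up step.
(def workflow
  {"StartAt" "BatchJob1"
   "States"
   {"BatchJob1"       {"Type"       "Task"
                       "Resource"   "arn:aws:states:::batch:submitJob.sync"
                       "Parameters" {"JobName"       "batch-job-1"
                                     "JobDefinition" "<job-definition-arn>"
                                     "JobQueue"      "<job-queue-arn>"}
                       "Next"       "BatchJob2"}
    "BatchJob2"       {"Type"       "Task"
                       "Resource"   "arn:aws:states:::batch:submitJob.sync"
                       "Parameters" {"JobName"       "batch-job-2"
                                     "JobDefinition" "<job-definition-arn>"
                                     "JobQueue"      "<job-queue-arn>"}
                       "Next"       "CleanUpMetadata"}
    "CleanUpMetadata" {"Type"       "Task"
                       "Resource"   "arn:aws:states:::lambda:invoke"
                       "Parameters" {"FunctionName" "<lambda-function-arn>"}
                       "End"        true}}})

(defn state-order
  "Follows \"Next\" links from \"StartAt\" and returns the state names in order."
  [{:strs [StartAt States]}]
  (loop [state StartAt
         acc   []]
    (if (nil? state)
      acc
      (recur (get-in States [state "Next"]) (conj acc state)))))
```

The `.sync` suffix on the Batch resource makes the state wait for the job to finish before moving on, which is what lets the workflow chain jobs in sequence.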

Lyn Headley16:07:00

Stepwise also supports processing large blobs of things in many pieces, correct? My understanding is that your worker functions can simply return regular maps which will get serialized to persistent storage if they are large, and can be broken up into chunks using the step functions map task type. All with a convenient clojure interface. Is that right? I am really impressed by this by the way and looking forward to the next version!

quantisan20:07:45

@Lyn Headley that's correct! we have a feature to selectively/entirely offload the payload map to S3

borkdude09:05:22

Babashka https://github.com/babashka/fs: file system utility library for Clojure v0.4.18 (2023-05-11) • https://github.com/babashka/fs/issues/48: support input-stream in fs/copy • https://github.com/babashka/fs/issues/91: add 1-arity to xdg-*-home to get subfolder of base directory (https://github.com/eval) • https://github.com/babashka/fs/issues/94: updates to which: add :paths opt, allow absolute paths for program (https://github.com/lread)

👍 9
🎉 17
borkdude21:05:45

https://github.com/babashka/process Clojure library for shelling out / spawning sub-processes Version 0.5.19 released! Changes since last announcement in December 2022 (v0.4.14)! • https://github.com/babashka/process/issues/124: Allow :cmd to be passed in map argument • https://github.com/babashka/process/issues/113: Support redirecting stderr to stdout (https://github.com/lread) • https://github.com/babashka/process/issues/112: Support :pre-start-fn in exec • https://github.com/babashka/process/issues/100: preserve single-quotes in double-quoted string • Auto-load babashka.process.pprint if clojure.pprint was already loaded

🎉 18