clojuredesign-podcast

neumann 2024-01-25T19:37:06.129119Z

How do you test code that is littered with I/O? Aside from the I/O, is there anything left worth testing? Can the REPL and tests work together? In our latest episode, we start testing our code only to discover we need the whole world running first! https://clojuredesign.club/episode/108-testify/

phronmophobic 2024-02-02T07:14:16.880399Z

I think the secret to these types of pipelines is that you can reify them as workflows. The workflow is an acyclic graph of steps, and each step can be characterized by:
• the steps it depends on,
• the inputs it requires,
• the outputs it produces.
You can then have a workflow runner that does the dirty work of running each step and keeping track of the inputs and outputs. You can guarantee that each step will either:
• succeed,
• fail, or
• time out.
If steps are executed in process, you may have to worry about them toppling over the whole program by going into an infinite loop, gobbling up all the program's memory, or otherwise misbehaving, but that's not always a big issue. An alternative is to run steps out of process or on another machine, but that's often overkill.
Anyway, the key idea is that it's not hard to build trivial steps that always succeed, fail, or time out, and you can then build your workflow runner to handle these three cases. Once you know your workflow runner can handle these three cases, you're free to plug in any real-world steps you want. You can then slowly add in logging, automatic retries, pausing, resuming, partial reruns, manual recovery, progress tracking, resource monitoring, etc., as needed.
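A minimal Python sketch of the in-process runner described above, assuming steps are plain functions registered under hypothetical names. It uses a thread pool for timeouts, which only stops waiting on a hung step rather than killing it (the in-process caveat mentioned above). As an extra assumption, steps downstream of a step that didn't succeed are marked "skipped" instead of being run.

```python
import concurrent.futures
import time
from graphlib import TopologicalSorter  # Python 3.9+

def run_workflow(steps, timeout_s=1.0):
    """Run steps in dependency order and record each step's outcome.

    steps maps a (hypothetical) step name to (deps, fn), where fn receives
    a dict of {dep_name: dep_output} and returns the step's output.
    """
    graph = {name: deps for name, (deps, _fn) in steps.items()}
    results, status = {}, {}
    pool = concurrent.futures.ThreadPoolExecutor()
    try:
        for name in TopologicalSorter(graph).static_order():
            deps, fn = steps[name]
            if any(status.get(d) != "succeeded" for d in deps):
                status[name] = "skipped"  # an upstream step did not succeed
                continue
            future = pool.submit(fn, {d: results[d] for d in deps})
            try:
                results[name] = future.result(timeout=timeout_s)
                status[name] = "succeeded"
            except concurrent.futures.TimeoutError:
                # We stop waiting, but the worker thread itself keeps running.
                status[name] = "timed out"
            except Exception:
                status[name] = "failed"
    finally:
        pool.shutdown(wait=False)
    return results, status

# Trivial steps that always succeed, fail, or time out.
def always_ok(_inputs):
    return 1

def always_fails(_inputs):
    raise ValueError("boom")

def always_hangs(_inputs):
    time.sleep(0.5)  # longer than the timeout used below

steps = {
    "a": ((), always_ok),
    "b": (("a",), lambda inputs: inputs["a"] + 1),
    "c": (("a",), always_fails),
    "d": (("a",), always_hangs),
    "e": (("c",), always_ok),  # depends on a step that always fails
}
results, status = run_workflow(steps, timeout_s=0.1)
```

Once the runner behaves correctly for these three trivial cases, real steps can be swapped in without changing the runner.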

JR 2024-01-29T21:16:24.514909Z

It seems there's a similarity between the extractor methods (plus the data they create) and the DDD idea of an anti-corruption layer. Both protect you from changes in the services you're consuming and reshape their responses so the data you're working with is closer to your problem domain. Do I have that right?

neumann 2024-01-29T23:08:18.227639Z

@john.t.richardson.dev I'd say they share the same goal of decoupling your application logic from the schema of the external system. In a practical sense, the recommendations I've seen for an anti-corruption layer treat it more like a proxy service in a microservice environment. The internal services call the anti-corruption proxy instead of calling the external API directly. It's outside the scope of Sportify! (a monolith), but in general, I'm not a fan of proxy services. I believe a service should integrate directly with its external dependencies but limit that surface area as much as possible through a clear "ingestion transform". If you have a number of services that need data from an external system, at some point it may make sense to create an internally shared view of that external data. If so, I would recommend a journal-oriented dataflow (not microservice) architecture. That's a whole different conversation.
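A minimal sketch of an "ingestion transform", written in Python as a stand-in for the Clojure extractor functions discussed in the episode. The payload shape, field names, and status codes are invented for illustration; the point is that only this one function knows the external schema, and the namespaced-style keys mimic Clojure keywords.

```python
def ingest_game(raw: dict) -> dict:
    """Translate one (hypothetical) external game payload into our domain shape.

    This is the only place that knows the external field names; everything
    downstream works with the domain keys below.
    """
    status_codes = {"F": "final", "I": "in-progress"}  # assumed external codes
    return {
        "game/id": raw["gameId"],
        "game/home-team": raw["teams"]["home"]["name"],
        "game/away-team": raw["teams"]["away"]["name"],
        "game/status": status_codes.get(raw["status"], "scheduled"),
    }

raw = {"gameId": "g-1", "status": "F",
       "teams": {"home": {"name": "Hawks"}, "away": {"name": "Owls"}}}
game = ingest_game(raw)
```

If the external API renames a field, only `ingest_game` changes.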

neumann 2024-01-29T23:17:41.040239Z

For what it's worth, I think journal-based approaches are the way to go for integrations involving lots of data. If you've never read it, an influential and formative article in this space is: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying by Jay Kreps (co-creator of Kafka).
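A toy, in-memory illustration of the journal idea from the Kreps article: entries are only ever appended, and each consumer tracks its own offset, so any derived view can be rebuilt by replaying from the start. The class names and entry shape are hypothetical; a real system would use something like Kafka for durability and distribution.

```python
class Journal:
    """Append-only, in-memory log; entries are never updated or deleted."""

    def __init__(self):
        self._entries = []

    def append(self, entry) -> int:
        self._entries.append(entry)
        return len(self._entries) - 1  # offset of the new entry

    def read_from(self, offset: int):
        """Return (entries at or after offset, next offset to read)."""
        return self._entries[offset:], len(self._entries)


class Consumer:
    """Tracks its own position, so it can resume or replay independently."""

    def __init__(self, journal, offset=0):
        self.journal, self.offset = journal, offset
        self.view = {}  # a derived, rebuildable view: latest value per key

    def poll(self) -> int:
        entries, self.offset = self.journal.read_from(self.offset)
        for key, value in entries:
            self.view[key] = value
        return len(entries)


j = Journal()
j.append(("player-1", {"score": 10}))
live = Consumer(j)
first_batch = live.poll()
j.append(("player-1", {"score": 12}))
j.append(("player-2", {"score": 7}))
second_batch = live.poll()

rebuilt = Consumer(j)  # a new consumer replays the whole journal from offset 0
replayed = rebuilt.poll()
```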

neumann 2024-01-29T23:22:37.192839Z

When working for "Big Esports", @nate and I created services that would poll microservice APIs for changes and generate changelogs of data. When the engineers of one such microservice couldn't figure out what happened to their data, we provided them with changelogs of their own data, which they then used to debug their own service. That led them to add more logging to their service for the future, but I can't help but believe the big, mutable database-based microservice is a broken paradigm. For what it's worth, that experience contributed to the thinking we express in https://clojuredesign.club/episode/029-problem-unknown-log-lines/.
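A sketch of the polling-to-changelog idea, assuming each poll returns a full snapshot shaped as {id: record}. The fetching side is left out, and the record shapes are hypothetical; this is one simple way to do it, not the services described above.

```python
def diff_snapshots(before: dict, after: dict) -> list:
    """Compare two {id: record} snapshots and emit change events, sorted by id."""
    changes = []
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            changes.append({"op": "added", "id": key, "new": after[key]})
        elif key not in after:
            changes.append({"op": "removed", "id": key, "old": before[key]})
        elif before[key] != after[key]:
            changes.append({"op": "changed", "id": key,
                            "old": before[key], "new": after[key]})
    return changes

# Two consecutive polls of a (hypothetical) teams endpoint.
before = {"t1": {"name": "Hawks", "wins": 3}, "t2": {"name": "Owls", "wins": 1}}
after = {"t1": {"name": "Hawks", "wins": 4}, "t3": {"name": "Foxes", "wins": 0}}
changes = diff_snapshots(before, after)
```

Appending each poll's `changes` to a log gives a history of the other service's data that it never recorded itself.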

neumann 2024-02-13T22:21:24.346549Z

@smith.adriane Thanks for sharing that list! I need to go try some of these out!

neumann 2024-01-25T19:38:33.277439Z

How would you describe the difference between REPL-driven and Test-driven development?

neumann 2024-01-25T20:39:50.907969Z

@jr0cket Here's the episode on testing that you asked for! 😁

neumann 2024-02-02T15:32:29.118449Z

@smith.adriane Yes! Exactly! And a workflow (we called them "pipelines") is a composable part. We were able to spin up just part of the overall workflow (a "nested workflow") and iterate on it in isolation. We would feed it cached data and route its output to another cache, the terminal, etc. In our case, the pipeline had to operate in real time (<200 ms end to end), so we connected it all with core.async. That's another beautiful aspect of workflows: you have a ton of flexibility in how the data comes in and goes out. The logic doesn't know or care. It just gets the data via the workflow system.
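A synchronous Python stand-in (not core.async) for the composability described above: a pipeline is just a step, so a sub-pipeline can be run on its own, and the "driver" decides where items come from and where results go. The step names and event shapes are hypothetical.

```python
def make_pipeline(*steps):
    """Compose steps left to right. The result is itself just another step."""
    def pipeline(item):
        for step in steps:
            item = step(item)
        return item
    return pipeline

def drive(pipeline, source, sink):
    """The 'workflow system' side: pull from any iterable, push to any callable."""
    for item in source:
        sink(pipeline(item))

def parse(raw):
    player, points = raw
    return {"player": player, "points": int(points)}

def score(event):
    return {**event, "fantasy": event["points"] * 2}

scoring = make_pipeline(score)        # a "nested" sub-pipeline
full = make_pipeline(parse, scoring)  # pipelines compose like any other step

cached_events = [{"player": "ann", "points": 3}]  # e.g., replayed from a cache
cache_out = []
drive(scoring, cached_events, cache_out.append)   # iterate on the sub-pipeline alone

live_out = []
drive(full, [("bob", "5")], live_out.append)      # same logic, different source and sink
```

In the real-time setup described, sources and sinks were core.async channels; plain iterables and callbacks are used here only to show the same decoupling.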

phronmophobic 2024-02-02T18:05:22.141179Z

Yea, one of the cooler features I've used in these pipelines is that if it runs into an error, you can check the logs, fix the bug, and then resume the pipeline from where you left off.

neumann 2024-02-02T19:31:48.666219Z

Yes! That is extremely useful. Anytime the application persists or caches in the middle of the pipeline, you have a point you can resume from and visibility into the state of things. I've been able to inspect that intermediate data to figure out the bug pretty quickly.
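A minimal checkpoint-and-resume sketch, assuming each step's output is JSON-serializable and saved to a file named after the (hypothetical) step. On a rerun, steps with an existing checkpoint are skipped and their saved output is reused, so you can inspect the intermediate files, fix the bug, and pick up where you left off.

```python
import json
import os
import tempfile

def run_with_checkpoints(steps, data, checkpoint_dir):
    """steps: list of (name, fn). Persist each step's output; reuse it on rerun."""
    for name, fn in steps:
        path = os.path.join(checkpoint_dir, f"{name}.json")
        if os.path.exists(path):
            with open(path) as f:
                data = json.load(f)  # resume: reuse the persisted intermediate result
            continue
        data = fn(data)  # may raise; earlier checkpoints stay on disk
        with open(path, "w") as f:
            json.dump(data, f)
    return data

calls = []

def fetch(_):
    calls.append("fetch")  # record how often the expensive step really runs
    return [1, 2, 3]

def buggy_total(xs):
    raise ValueError("bug!")

def fixed_total(xs):
    return sum(xs)

with tempfile.TemporaryDirectory() as d:
    try:
        run_with_checkpoints([("fetch", fetch), ("total", buggy_total)], None, d)
    except ValueError:
        pass  # inspect d/fetch.json, fix the bug, then rerun
    result = run_with_checkpoints([("fetch", fetch), ("total", fixed_total)], None, d)
```

This sketch does not write checkpoints atomically; a crash mid-write could leave a truncated file.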

Nick 2024-02-06T23:23:08.743259Z

@smith.adriane Is there a library you use for the "workflow runner", or do you write your own for each project/application based on its requirements?

phronmophobic 2024-02-06T23:27:17.212079Z

I've used a few libraries and frameworks that handle workflows in the large (e.g., AWS Data Pipeline, Apache Storm). I've written some ad hoc in-process workflow runners for various projects, but haven't taken the time to wrap them in a library. I really wish there were a good in-process workflow library, but I'm not aware of one.

Nick 2024-02-06T23:31:48.125029Z

I wish for the same. I've used "plumbing" from the Prismatic guys, and it was nice in parts. But what you wrote above (a library that supports the three cases and then elegantly allows you to add in the other things as needed) would be awesome.

phronmophobic 2024-02-06T23:35:12.107109Z

Yea, I would also love it to support showing the workflow state so you can make a simple UI that lets you cancel, pause, and resume tasks.

Nick 2024-02-06T23:40:03.418749Z

Yes, that's a great callout as well. In the plumbing stuff we did (since the return value is a graph that we could parse), we were able to make Graphviz diagrams of the code, and it's helpful for getting oriented and understanding what's going on (especially if you didn't write the code). Having a UI for cancel, pause, and resume would be a great next step.
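Assuming a workflow's dependencies are available as a plain map of {step: [steps it depends on]} (a hypothetical shape, not Plumbing's own representation), emitting Graphviz DOT text takes only a few lines:

```python
def to_dot(graph: dict) -> str:
    """Render {step: [dependencies]} as Graphviz DOT, with edges dep -> step."""
    lines = ["digraph workflow {"]
    for step in sorted(graph):
        lines.append(f'  "{step}";')
        for dep in sorted(graph[step]):
            lines.append(f'  "{dep}" -> "{step}";')
    lines.append("}")
    return "\n".join(lines)

dot = to_dot({"fetch": [], "parse": ["fetch"], "score": ["parse"]})
```

Saving `dot` to `workflow.dot` and running `dot -Tsvg workflow.dot -o workflow.svg` renders the diagram.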

phronmophobic 2024-02-06T23:44:45.617019Z

There are a couple of libraries in the space, though:
• https://github.com/nubank/nodely
• various stuff from https://twitter.com/ryrobes, including https://github.com/ryrobes/flowmaps
• more listed at https://clojurians.slack.com/archives/CQT1NFF4L/p1657482280025899
When I've looked into them previously, they all seemed to be missing some feature I was looking for.