#architecture
2023-06-29
mister_m01:06:44

Does anyone know off hand where I can learn more about building the sort of robust job scheduling/executing/tracking system discussed in this slightly tongue-in-cheek article about the problems with "pipelines"? https://cohost.org/tef/post/1764930-how-not-to-write-a

mister_m01:06:39

I would guess that "Designing Data-Intensive Applications" probably covers a significant part of the problems outlined in the article

Rupert (All Street)06:06:38

There's a fair number of books and blog posts on "Enterprise Architecture". People who spend too long on architecture (and have lost their understanding of/sympathy with programming) are sometimes called 'Architecture Astronauts': they design solutions from a very high level. Many people end up choosing either the architecture or the tools (e.g. Kafka/SQS) first, then building code around them. I think these end up hitting many of the issues described in the blog post because people think "so many companies are using this architecture/tool that it mustn't have any crucial limitations." Turns out the limitations are often huge. Instead, I often think it's best just to design the right solution from first principles and think about all the edge cases at design time. Then incorporate the architecture/tools you need, but don't rely on them to make your system good/resilient/performant.

potetm15:06:41

Release It! by Michael Nygard covers a lot of similar ground.

potetm15:06:06

Rich touches on queues/pipelines in a number of talks.

potetm15:06:51

> It's fine to glue things together with queues. You've just got to avoid persistence, and demand backpressure or load shedding, and you won't end up in the same mess over and over and over again. That's the important bit. That and "if you run things, you need to keep track of their states"

Honestly, this is the best sentence in the whole article.
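(A rough sketch of the "demand backpressure or load shedding" part of that quote, assuming an in-process, non-persistent queue built on Python's standard `queue.Queue`. The `submit` helper and the tiny `maxsize` are made up for illustration and do not come from the article.)

```python
import queue

def submit(q, item, timeout=None):
    """Try to enqueue `item`; return False if the bounded queue stays full.

    timeout=None -> load shedding: reject immediately when the queue is full.
    timeout=N    -> backpressure: block the producer for up to N seconds first.
    """
    try:
        if timeout is None:
            q.put_nowait(item)
        else:
            q.put(item, timeout=timeout)
        return True
    except queue.Full:
        return False

# A bounded queue is what makes either strategy possible:
# an unbounded one just hides the overload until memory runs out.
jobs = queue.Queue(maxsize=2)

# Four submissions against two slots: the last two are shed.
results = [submit(jobs, n) for n in range(4)]
```

The caller gets an explicit `False` and has to decide what to do with the rejected item (retry later, report an error upstream), which is the point: overload becomes visible to the producer instead of piling up silently.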

potetm15:06:49

You can also figure a lot of it out for yourself. You just gotta keep asking questions like, "What happens when X goes slow? What happens if X goes down? What happens if we 1000x load?"

Rupert (All Street)16:06:26

Some other key questions (if you are considering topics/queues):
• What do I do if there is an item on the queue that is causing the consumer to constantly fail and retry?
• What do I do if I need to reprocess a large batch of historical items without slowing down current items?
• The items are expensive/slow to process - but I want to make a tiny change to the items - how do I get the update done quickly without fully reprocessing each item?
• How do I delete items on the queue that I no longer want to be processed?
• If the read side failed to process items correctly but has taken them all off the queue, who is responsible (read side or write side) for putting all items back on the queue for reprocessing?
• If the sender needs confirmations or responses back - how do I do this?
• If the sender makes breaking changes to the items and the read side needs to be upgraded in lock step to support the breaking change - has the queue actually helped at all with decoupling?
• I now need an additional consumer to observe the queue - do I create a new queue for it? Is the reader or writer team responsible for setting that up?
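(One possible answer to the first question, as a minimal sketch: cap the retries and move the failing item to a dead-letter list so it cannot block everything else. The names `drain`, `MAX_ATTEMPTS` and `handler` are hypothetical, and a real broker would offer its own mechanism for this.)

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry cap, tune per workload

def drain(items, handler, max_attempts=MAX_ATTEMPTS):
    """Process every item; park items that keep failing instead of retrying forever."""
    pending = deque((item, 0) for item in items)
    done, dead = [], []
    while pending:
        item, attempts = pending.popleft()
        try:
            handler(item)
            done.append(item)
        except Exception as exc:
            attempts += 1
            if attempts >= max_attempts:
                # Give up: record the item and the error for later inspection.
                dead.append((item, repr(exc)))
            else:
                # Requeue at the back so healthy items are not stuck behind it.
                pending.append((item, attempts))
    return done, dead

def handler(item):
    # Stand-in consumer that always fails on one particular item.
    if item == "bad":
        raise ValueError("cannot process")

done, dead = drain(["a", "bad", "b"], handler)
```

Here `done` ends up holding the two healthy items and `dead` holds the single poison item together with its last error, which also gives you something to look at other than the logs.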

mister_m02:06:26

Appreciate the insights, thank you all

Evan Bernard20:07:58

(thread necromancy mode activated) thanks for this thread! we’re thinking that we’re not able to continue using rabbitMQ for reasons that smell to me like we’re just holding a message queue/broker incorrectly, and i’m hoping to steer us from reinventing our message queue pipeline just to end up with the same problems. this post and thread give me some food for thought

Rupert (All Street)21:07:40

I think a key solution to the issue with message queues/topics in Enterprise architecture is simply:

> A queue/topic should only be written to/read from by the same component.

This solves a lot of ownership issues, like what happens if the message structure changes or reruns are required. There is only one component that can possibly be responsible for all of these issues, since it is the only one that can read from or write to the queue/topic.

By analogy: if a component has a database, another component would never write data directly to that component's database; instead it would make a REST request. Since a topic/queue is basically a database, the same rule should apply. Components can interact by other mechanisms (e.g. REST) instead of over queues/topics.

When you have this rule you often realise the component doesn't really need a topic/queue in the first place.
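(A small sketch of that rule, under the assumption that a plain method call stands in for the REST request. `ReportService` and its methods are invented for illustration. The queue is private, so only this component writes to it or reads from it, and it also keeps track of the state of every job it accepts.)

```python
import queue

class ReportService:
    """Hypothetical component that is the sole writer and reader of its own queue."""

    def __init__(self):
        self._queue = queue.Queue()   # private: no other component touches this
        self._status = {}             # job_id -> "queued" | "running" | "done" | "failed"
        self._next_id = 0

    def submit(self, payload):
        """Public entry point (stand-in for a REST endpoint)."""
        self._next_id += 1
        job_id = self._next_id
        self._status[job_id] = "queued"
        self._queue.put((job_id, payload))
        return job_id

    def status(self, job_id):
        """Callers ask the owner for job state instead of inspecting the queue."""
        return self._status.get(job_id, "unknown")

    def work_once(self):
        """Process at most one queued job; return False if there was nothing to do."""
        try:
            job_id, payload = self._queue.get_nowait()
        except queue.Empty:
            return False
        self._status[job_id] = "running"
        try:
            self._handle(payload)
            self._status[job_id] = "done"
        except Exception:
            self._status[job_id] = "failed"
        return True

    def _handle(self, payload):
        # Stand-in for the real work; rejects empty payloads.
        if payload is None:
            raise ValueError("empty payload")

svc = ReportService()
ok_id = svc.submit({"report": "weekly"})
bad_id = svc.submit(None)
while svc.work_once():
    pass
```

Because the message format never leaves the component, changing it or rerunning failed jobs is a local decision. Swapping `self._queue` for a direct call to `_handle` would not change the public interface at all, which is the "doesn't really need a queue" observation.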

👍 2
Evan Bernard21:07:06

my biggest source of joy reading this so far is reading attributes of the exact solution we have (and are wanting to replicate…)

> There’s just one problem: someone has suggested splitting up the [job] into seperate processes. That same someone suggests tying the parts back together with a message broker.

✔️

> Not just head of line blocking, but the whole “have you tried reading the logs” school of state management.”

✔️

> It’s still a little janky, each worker has to know the name of the worker after it to glue everything together, but it’ll work fine enough.

✔️