#architecture
2023-08-03
didibus01:08:29

@smith.adriane linked me to this: https://bravenewgeek.com/you-cannot-have-exactly-once-delivery/ I've been having an issue related to something similar. I think someone asked about something in the same vein recently. You want to commit state to a DB and send out notifications to other systems in an atomic manner. But there's no real way to accomplish this and get exactly-once delivery of the messages.

One option: you first send all messages, and if they were all sent successfully, you commit to the DB. If any of the messages failed to send, or the commit to the DB failed, you retry it all. This causes the successful messages to be resent.

Now this gets even more complicated if the DB commit uses optimistic locking, because a failure to commit can be a stale-read issue. You read data from the DB, that data is at version 3, it tells you to send some messages and update some value in the DB. By the time you go to commit, you realize it's at version 4, so you retry from the read. Now the messages you were sending depend on, say, the balance read from the DB, and version 4 shows a different balance, so the messages you send again are actually different from before. It's not even that the receivers get duplicate messages: they actually received a bad, stale message first, and are now receiving the correct one, assuming the commit to the DB succeeds.

So you decide to reverse the order. Commit to the DB first; now you avoid sending stale messages from an out-of-date read. But now there's a chance you fail to send the messages after the commit. There's also a chance another commit happens before the messages are sent, and your messages are now out of order. Now you struggle to even guarantee at-least-once delivery.

So you can try to instead commit the messages to the DB in a single atomic write, and have a separate system working off an ordered change data capture log of the DB. But if there was any issue with the message itself, you're stuck. For example, the message might contain the receiver as an attribute, and messages are sent to the location specified on the message itself. If that location was misconfigured, that separate process will fail to send the messages, since the recipient's address is wrong, and it will keep failing. It might be too late to recover now; the context needed to fix the message, or to find the appropriate recipient, might be lost. Maybe your change data capture only retains things for 24 hours, maybe you don't have an easy way to fix the bad messages in the change data capture, etc. There really doesn't seem to be a way to make things resilient to failure here.
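A rough sketch of that first option, for concreteness (hypothetical names: `read_state`, `build_messages`, `send`, and `commit_if_version` stand in for the real DB read, message construction, message send, and conditional write):

```python
def process(read_state, build_messages, send, commit_if_version, apply_update):
    while True:
        state, version = read_state()          # e.g. balance at version 3
        messages = build_messages(state)       # messages depend on that balance
        for msg in messages:
            send(msg)                          # side effects happen before we know...
        if commit_if_version(apply_update(state), version):
            return                             # ...whether our view of the DB was current
        # Someone else committed first (it's now version 4): the messages we just
        # sent were computed from stale state, and the retry may send different ones.
```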

didibus01:08:30

Right now, I've decided to reverse the order:
1. Commit to the DB.
2. If the commit succeeded, send the messages.
3. If a message fails to send, retry just that message a few times.
4. If a message fails all its retries, log the message that failed to be sent, but don't throw or fail the operation; just log the failed message and continue.

Then have an on-call process to manually pull failed messages from the logs and re-send them, fixing the message content if need be (say that's why the message was failing to be sent, like it had a bad address for where to send it). And this secretly relies on the assumption that you'll never fail to log failed messages 😛
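A minimal sketch of this flow, assuming hypothetical `db`, `send_message`, and `log` handles for the real database client, message sender, and logger:

```python
import time

def commit_and_notify(db, send_message, log, new_state, messages, max_retries=3):
    db.commit(new_state)                          # 1. commit to the DB first
    for msg in messages:                          # 2. then send notifications
        for attempt in range(max_retries):
            try:
                send_message(msg)                 # 3. retry just the failing message
                break
            except Exception as exc:
                if attempt == max_retries - 1:
                    # 4. give up on this message but keep going; on-call pulls
                    #    failed messages from the logs and re-sends them
                    log.error("failed to send %r: %s", msg, exc)
                else:
                    time.sleep(2 ** attempt)      # simple backoff between retries
```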

👍 2
hiredman02:08:40

There is generally no reason the process that writes the actions-to-do to the database needs to be different from the process that does the actions

hiredman02:08:58

What you are doing is using the database as a write-ahead log, a crash-recovery mechanism. In the optimistic case you write all those ops to the DB, do them, then mark them as done, all in one go

hiredman02:08:33

All your context and whatever is there then, if you are worried about it

hiredman02:08:21

In the pessimistic case, recovering from a crash or something, then yes, you lost whatever context you didn't write to the WAL (so write everything you need)
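A sketch of this WAL approach, assuming a hypothetical `wal` table with `payload` and `done` columns and a generic `db` handle (not any particular library's API):

```python
def write_ops(db, ops):
    # Normal path, step 1: record the pending ops plus all the context needed
    # to redo them, in one transaction.
    with db.transaction() as tx:
        for op in ops:
            tx.execute("INSERT INTO wal (id, payload, done) VALUES (%s, %s, FALSE)",
                       (op["id"], op["payload"]))

def run_ops(db, do_op, ops):
    # Normal path, step 2: do the side effects, then mark each op done.
    for op in ops:
        do_op(op)
        db.execute("UPDATE wal SET done = TRUE WHERE id = %s", (op["id"],))

def recover(db, do_op):
    # Crash recovery: anything written but not marked done is redone
    # (at-least-once, so do_op should tolerate being repeated).
    for op in db.query("SELECT id, payload FROM wal WHERE done = FALSE ORDER BY id"):
        do_op(op)
        db.execute("UPDATE wal SET done = TRUE WHERE id = %s", (op["id"],))
```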

didibus02:08:35

I thought about this, I think. So write messages to the DB, then send them, then update the DB saying they were sent, all in the same thread of execution. But other threads would be doing the same. So say your commit succeeded, you send your messages, other threads also committed more messages to the DB before you're done sending yours, and your messages fail to send... Now what do you do?

hiredman02:08:27

The way you make the message sends themselves more robust is to make them two-phase commits

hiredman02:08:48

You pre-send the message to the receiver, then the receiver holds it until you send a commit message (from the WAL in the database)
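A sketch of that handshake with a hypothetical `receiver` client exposing `prepare`/`commit` calls, and the same hypothetical `wal` table as above:

```python
def send_two_phase(db, receiver, wal_id, message):
    receiver.prepare(wal_id, message)   # phase 1: receiver stores the message, does nothing yet
    db.execute("UPDATE wal SET prepared = TRUE WHERE id = %s", (wal_id,))
    receiver.commit(wal_id)             # phase 2: receiver now processes the message
    db.execute("UPDATE wal SET done = TRUE WHERE id = %s", (wal_id,))

# Crash recovery: entries prepared but not done get the commit re-sent; entries
# never prepared can safely be prepared again (the receiver de-dupes on wal_id).
```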

didibus02:08:38

Ya, I can see that, it's just very involved now for all the receivers. But you did give me an idea: I can probably lock the message sending. Like, mark the messages as being sent, so other threads know not to pick them up. But if another thread sees them marked as being sent more than a minute ago, it assumes the sender failed, and that thread marks them as being sent itself, taking over the sending of them?

hiredman02:08:02

That is basically the crash-recovery scenario for the WAL

hiredman02:08:32

The crash zone might be a thread or might be a process

didibus02:08:00

Ya, it'd be both in my case, as it's a fleet of threaded request handlers

hiredman02:08:57

You can potentially have what are sort of like multiple-master or split-brain kinds of failures now

didibus02:08:03

Technically, I don't know if the receiver received the message and succeeded in processing it on their side. Which is where 2PC would add even more robustness. But I think in my case, I can live with that being the receiver's problem. At least I know I put the message in the message box.

didibus02:08:36

I guess the issue here is that it retries failed messages only the next time the same DB record is touched.

didibus02:08:06

Would maybe need some other job that goes over and retries instead.

seancorfield04:08:09

Gregor Hohpe: "Your Coffee Shop Doesn't Use Two-Phase Commit" https://ieeexplore.ieee.org/document/1407829 (2005)

seancorfield04:08:00

(funnily enough, this whole "send out notifications" thing is what I'm working on right now at World Singles Networks)

didibus05:08:01

Fun article! This is one of the most interesting problems I've had in a while. It kind of started because we had optimistic writes to a DB. But as part of that, notifications must also be sent out to other systems, which will side-effect on them. We thought we'd send the notifications first, and if all succeed, commit the write to the DB, but since it's optimistic, it can fail by losing the race. Thus it retries the whole thing, but from the new DB state. And then I realized that means our notifications aren't even just duplicates: because the retry starts from a different DB state, it can result in different notifications. Unfortunately, the retry is also handled client-side, thus our request regenerates a new UUID on the notification. So even if it results in the same notifications, they get a different ID, and so deduping for idempotency doesn't work on the receivers.

phronmophobic05:08:29

Lots of good advice and discussion here 😄.
> when you place your order, the cashier marks a coffee cup with your order and places it into a queue.
If you pay close attention at most Starbucks, only "complicated" orders get added to the queue. If the order is quick (eg. filling a coffee cup) then it is handled right away. This procedure is actually something Starbucks does differently from most coffee shops. I think this is actually a direct consequence of https://en.wikipedia.org/wiki/Little%27s_law and comes from queueing theory. For quick tasks, it's actually much more efficient to skip the queue and just fulfill the request immediately.
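For reference, Little's law relates the average number of orders in the system to the arrival rate and the time each order spends there:

```latex
L = \lambda W
% L: average number of orders in the system (including the queue)
% \lambda: average arrival rate of orders
% W: average time an order spends in the system
% Fulfilling quick orders immediately keeps W small, and with it the queue length L.
```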

phronmophobic05:08:30

For the original discussion, there's an inherent trade-off between certainty about the state of a distributed transaction that may fail or be cancelled, and efficiency.

At one end of the spectrum are tasks where failure/cancellation is rare and the consequences of failure/cancellation are low. For these tasks you can take the YOLO approach and just try it once and ignore failures or cancellations. At the other end of the spectrum are tasks where the consequences of failure or incorrect cancellations are high (like large money transfers). These typically have multi-phase transactions with large time windows to allow the system to stabilize. In the middle you can make various trade-offs between availability, consistency, efficiency, and waiting for the system to stabilize.

A common sweet spot is if you can make tasks idempotent and have acknowledgements. You keep retrying tasks (probably with exponential backoff) until you get an acknowledgement or a sufficiently long timeout is triggered (signaling failure). If tasks are idempotent, then you actually can have exactly-once delivery for most practical purposes.

For your actual use case, it sounds like your approach is similar to what I've usually ended up doing in the past. One minor alternative is to write messages to be sent to a message queue rather than sending directly. If you're interested in more resources, I would recommend going through the docs for any of the popular message queue services, since they deal with this problem directly and generally provide guidance for different recipes with the various trade-offs available.
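A sketch of that sweet spot, with hypothetical `send` and `wait_for_ack` functions and an idempotency key the receiver de-dupes on:

```python
import time
import uuid

def deliver(send, wait_for_ack, payload, timeout_s=60):
    key = str(uuid.uuid4())                    # receiver de-dupes on this key
    deadline = time.monotonic() + timeout_s
    attempt = 0
    while time.monotonic() < deadline:
        send(payload, idempotency_key=key)     # safe to repeat: the task is idempotent
        if wait_for_ack(key, wait_s=min(2 ** attempt, 10)):
            return True                        # "exactly once" for practical purposes
        attempt += 1                           # no ack yet, back off and retry
    return False                               # give up and signal failure
```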

msolli09:08:24

This is exactly the type of scenario I created https://github.com/msolli/proletarian for (Postgres only). The idea is you commit both your business data and a description of the side effects you want - i.e. a Job - to the DB in a transaction. The writes either succeed or fail together. Meanwhile, in another thread or process, a worker will pick up your job and perform the side-effecting work, like sending a message.

Now, the sending of the message might fail. So you have to think about that. Do you care? It is a business decision. So the jobs retry if they need to. Until they finally give up - then what? This is also a business decision. Maybe you alert someone, or you update some entity and set it to a failed state.

Then there is the possibility that your side-effecting third party didn't successfully acknowledge your message. That is, they got the message and processed it, but you didn't get the acknowledgement. As the linked article stated, there's really no way for you to know if the receiver just didn't get the message, or if the acknowledgement was lost on its way back to you (the Two Generals problem). At this point you again have to decide what's worse: the message not being sent, or the message received twice (or even several times). Or, if you're lucky, your third-party side-effecting service lets you specify an idempotency key, so that they can de-duplicate the message on their end (usually within some timeframe, like 24 hours).
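A generic sketch of that pattern (not proletarian's actual API), assuming a psycopg2 connection and hypothetical `orders` and `jobs` tables where `jobs` has a `done` flag:

```python
import json

def place_order(conn, order):
    # Both writes commit or roll back together.
    with conn, conn.cursor() as cur:
        cur.execute("INSERT INTO orders (id, total) VALUES (%s, %s)",
                    (order["id"], order["total"]))
        cur.execute("INSERT INTO jobs (type, payload) VALUES (%s, %s)",
                    ("send-confirmation", json.dumps(order)))

def run_next_job(conn, handle_job):
    # A worker in another thread/process picks up one queued job; SKIP LOCKED
    # keeps concurrent workers from grabbing the same row.
    with conn, conn.cursor() as cur:
        cur.execute("""SELECT id, type, payload FROM jobs
                       WHERE done = FALSE
                       FOR UPDATE SKIP LOCKED LIMIT 1""")
        row = cur.fetchone()
        if row:
            job_id, job_type, payload = row
            handle_job(job_type, json.loads(payload))   # the side effect; may raise, retried later
            cur.execute("UPDATE jobs SET done = TRUE WHERE id = %s", (job_id,))
```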

gklijs10:08:15

A way around this, is to split the work using 'events'. Whatever it is that needs to send the actual notifications can listen to a stream of these events. It still needs some way to know when a notification was sent successfully, so it can be in sync with the events it still needs to read. But it at least decouples the database from sending the notifications.

lukasz15:08:49

Echoing what @U06BEJGKD said: I also built a PG-based queue/worker thing (with scheduled messages) and process "send notification" type jobs out of band, after writes/transactions succeed. Before that when I used RabbitMQ it was quite similar - just using a different tool. In both cases, we had to have a process to deal with jobs that failed - by either retrying or dropping them, depending on the product/biz requirements

gordon23:08:21

If a name is helpful, I've seen some of the designs described in this thread called "Transactional Outbox" https://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/transactional-outbox.html

🆒 2
didibus00:08:32

Thanks, ya I knew of the name. But this AWS article does what the article I linked talks about: if they claim exactly once, they are lying. I'm actually using Dynamo and SQS in my case. The issue with it is twofold:
• DynamoDB CDC streams are persisted for only 24h. So if the event-processing service fails to send the messages from the stream, you as an on-call have 24h to figure out the issue, fix it, etc. Or if it was due to SQS downtime, you're hoping it will get resolved in less than 24h.
• If the failures to send the messages from the event-processing service are due to the message itself being wrong, you just can't recover. If you update the message in the DB, it creates a new CDC entry at a different ordering, so you can't really fix up broken messages. And even if you could, you only have 24h to do it haha, before the CDC entry gets deleted.

didibus00:08:12

There are other issues as well.
• In that design, the event processing is single-threaded, and sends messages one by one. So if any of those messages is having issues being sent, all message sending is blocked (to preserve order).
◦ Ideally, you would want to send messages in order but by group, like per customer. If a transaction for one customer is having issues being sent, it should not stop all the other customers. (See the sketch below.)
• That single-threadedness is slow. If you have a lot of events, it kind of has to keep up; if events are backed up, a customer might wait a while to actually have their transaction processed.
• It requires even more infra to set up and maintain: a stream, an additional service to process the stream, the lambda for it, etc.
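For the per-customer ordering point above, a sketch with an SQS FIFO queue via boto3 (hypothetical queue URL and event shape): messages sharing a MessageGroupId stay ordered, but one customer's stuck group doesn't block the others.

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/notifications.fifo"  # hypothetical

def publish_transaction_event(customer_id, event):
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps(event),
        MessageGroupId=customer_id,                 # ordering is per customer
        MessageDeduplicationId=event["event_id"],   # de-dupe on a stable id (5-minute window)
    )
```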

phronmophobic00:08:52

> But this AWS article does what the article I linked talks about: if they claim exactly once, they are lying.
I don't think the AWS docs are lying. Notably, they mention that the consuming service should be idempotent:
> Duplicate messages: The events processing service might send out duplicate messages or events, so we recommend that you make the consuming service idempotent by tracking the processed messages.

didibus00:08:01

That's why I like what @U0NCTKEV8 said. The Flight Service could write the events to a separate attribute of the DDB table, and then proceed to synchronously send them, deleting them from the DB when successful.

Now you just need some kind of lock for which active thread should send the messages for a particular customer. You can do it like this: when you add your events to the DB, you see that none are there, so you commit yours; if you succeed the commit, you know you are first, so you should proceed to send the events. When you go to delete the events from the DB, if you see there are still some left after the deletion, you proceed to send those as well, etc. Other threads that concurrently work on that customer will see that there are already events when they are adding theirs, so they know someone else is sending them. They can just write theirs and move on.

The only downside is if the sender fails. So you might need a background job that checks if there are events older than X time, and if so, tries sending them again, because otherwise the sender probably failed. You could also have another thread processing the customer again see that there are events, but they are too old, so it assumes it should go and send them.
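A rough sketch of that on DynamoDB with boto3 (hypothetical `customers` table and `pending_events` attribute; the stale-sender takeover is left out):

```python
import boto3

table = boto3.resource("dynamodb").Table("customers")   # hypothetical table

def enqueue_events(customer_id, events):
    # Append our events; the old value tells us whether the list was empty
    # before, i.e. whether this thread is responsible for sending.
    resp = table.update_item(
        Key={"customer_id": customer_id},
        UpdateExpression=("SET pending_events = "
                          "list_append(if_not_exists(pending_events, :empty), :new)"),
        ExpressionAttributeValues={":empty": [], ":new": events},
        ReturnValues="UPDATED_OLD",
    )
    previously_pending = resp.get("Attributes", {}).get("pending_events", [])
    return len(previously_pending) == 0          # True -> this thread should send

def drain_events(customer_id, send):
    # Send and delete until the list is empty; events appended by other threads
    # while we work get picked up on the next pass.
    while True:
        item = table.get_item(Key={"customer_id": customer_id}).get("Item", {})
        pending = item.get("pending_events", [])
        if not pending:
            return
        for event in pending:
            send(event)
        removals = ", ".join(f"pending_events[{i}]" for i in range(len(pending)))
        table.update_item(Key={"customer_id": customer_id},
                          UpdateExpression=f"REMOVE {removals}")
```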

didibus00:08:34

@smith.adriane
> If you require the message to be delivered exactly once, with message ordering, you can use https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html.
Also, I kind of meant that, similarly to the "Exactly once" claim, they're kind of squinting when they say ya, you can use DDB streams; it's stripped of the nuances that will matter for fault tolerance.

phronmophobic00:08:04

Yea, it seems like the FIFO queues have deduplication (ie. options to make adding messages to the queue idempotent), but it is true that it's just kicking the can down the road.

didibus00:08:11

Ya, I was referring to how in the article you linked me they say:
> The way we achieve exactly-once delivery in practice is by faking it. Either the messages themselves should be idempotent, meaning they can be applied more than once without adverse effects, or we remove the need for idempotency through deduplication

didibus00:08:50

It's nice that SQS FIFO can do the deduplication for you, but if I remember correctly, it has a few issues, like the dedup window is only 5 minutes, and some other limitations.

phronmophobic00:08:00

for better or worse, services will sell things with the "exactly once delivery" label, which will either be:
• exactly once with idempotency (a typically good option)
• usually exactly once, most of the time, if you ignore errors and failures (bad, probably dishonest)

phronmophobic00:08:23

> It's nice that SQS FIFO can do the deduplication for you, but if I remember correctly, it has a few issues, like the dedup window is only 5 minutes, and some other limitations.
Yea, there are trade-offs here with respect to latency, efficiency, and memory usage. Not saying this is always a good option, but it's probably not a bad tradeoff for many applications.

didibus00:08:20

Ya, but I think I agree with the article, it's lying to your face. The messages are delivered more than once. So it's not exactly once. Meaning, you still need to either design your receivers in an idempotent way, or you need to make sure your publisher has a way for duplicates to dedupe themselves.

phronmophobic00:08:36

I wouldn't go so far as to say lying since the docs explicitly tell you that the receiver needs to be idempotent.

phronmophobic00:08:07

and for better or worse, they're using "exactly once delivery" to mean basically the same thing that almost any other message service will use it to mean.

didibus00:08:05

I mean, it is marketing fluff though. They could say at-least-once delivery, and then say automated deduplication feature, etc.

phronmophobic00:08:58

Sort of. If you want exactly-once delivery, making your service idempotent is the way you do it 95% of the time.

didibus00:08:58

Like with SQS, 6 minutes later a duplicate can get delivered. So it's not guaranteeing exactly once. Haha, anyways, I never had an issue with it before, but since the article pointed it out, I think I do have an issue with how it has perpetuated the idea that it is possible to have exactly-once semantics. The team that consumes my messages was arguing that to me: hey, you send us duplicates, so we crash, you have to fix this.

didibus00:08:48

And I was trying to tell them that you can't have exactly-once semantics; they have to handle duplicates, drop them, or be idempotent.

👍 2
phronmophobic00:08:28

Or have a really good customer support team!