aleph 2023-03-20 | Slack Archive

p-himik10:03:13

I'm using Integrant and just noticed that something must've happened between 0.5.0 and 0.6.0. The Integrant component that creates an instance of the server calls .close on the instance in ig/halt-key!. In 0.5.0 that was enough. In 0.6.0, all new requests (despite the docstring mentioning only in-flight requests) that come in within 15 seconds after halting the component still go through, only to fail down the line because the rest of the system the handler is using is already halted. Even if a new instance of the server is started within those 15 seconds, all new requests still go to the old instance. Probably because it still listens on the port? (Not sure why the new instance doesn't fail then, since the port is busy.) I assume it was caused by this item in the changelog: "Add options to configure graceful shutdown timeout". But what then is the right workflow here? Perhaps there's a bug, given that a comment in the relevant changes says "1. Stop listening to incoming requests", but I clearly observe that incoming requests are still being listened to.

p-himik10:03:33

Added a call to aleph.netty/wait-for-close to ig/halt-key!. Now halting the system takes 15 seconds, with all new incoming requests being handled, and with a subsequent timeout.

Arnaud Geiser10:03:25

Hello! It means you still have established connections. As soon as '.close' is called, the socket should not accept new connections. If you don't want to wait for those connections to be closed, you can pass 'shutdown-timeout: 0'. That way, Aleph will be stopped right away.
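
For reference, a minimal sketch of the setup being discussed, assuming Aleph 0.6.x and Integrant. The namespace, the ::server key, the handler, and the config shape are hypothetical; :shutdown-timeout is the start-server option the changelog item refers to (given in seconds, as far as the docstring goes), and 0 is the value suggested above.

```clojure
(ns example.server
  (:require [aleph.http :as http]
            [aleph.netty :as netty]
            [integrant.core :as ig]))

;; Hypothetical handler, for illustration only.
(defn handler [_request]
  {:status 200
   :headers {"content-type" "text/plain"}
   :body "ok"})

;; Hypothetical Integrant key. With :shutdown-timeout 0 the server is not
;; expected to wait for open connections when it is closed.
(defmethod ig/init-key ::server [_ {:keys [port]}]
  (http/start-server handler {:port port
                              :shutdown-timeout 0}))

(defmethod ig/halt-key! ::server [_ server]
  ;; Ask the server to shut down...
  (.close ^java.io.Closeable server)
  ;; ...and block until it has actually closed.
  (netty/wait-for-close server))
```

With the default timeout instead of 0, the same halt-key! would block for up to the timeout while connections are still open, which matches the 15 seconds observed above.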

p-himik11:03:41

I thought so too, but I got rid of my WS setup specifically to test that. So all I have is Sente trying to re-establish its connection by issuing new requests every second. This screenshot was obtained after I started halting the system and then refreshed the web page. As you can see, there are no pending requests. No other web page issues any requests to the server. The only thing that's going on is Sente issuing those requests.

p-himik11:03:03

So it feels like Sente keeps the server alive by just poking at it, but don't know for sure.

p-himik11:03:45

Alright, it's not Sente - disabled it on the frontend and nothing has changed. However, pretty much all other requests have Connection: keep-alive. Could this be the culprit?

dergutemoritz16:03:10

Reading the above, I was gonna suggest keep-alive to be the culprit, too.

p-himik16:03:50

If that's indeed the case then I'd argue it should be somehow handled on the Aleph side, simply because it seems that nowadays browsers use that header by default.

Matthew Davidson (kingmob)06:03:06

That header is the default, yes, but it only has meaning for the old HTTP/1.0. From HTTP/1.1 on, it's just assumed to be always true, and browsers have to explicitly say to close the conn. Keep-alive code might be related, but that behavior's been there a long time, so we'd have to see what, if anything, changed there between 0.5 and 0.6. Personally, I suspect the shutdown changes. @p-himik Can you file an issue with a minimal example, and we'll take a closer look?
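
The HTTP/1.0 vs HTTP/1.1 rule described above can be sketched as a small predicate. This is only an illustration of the protocol semantics, not Aleph code; persistent-connection? is a hypothetical helper.

```clojure
(require '[clojure.string :as str])

(defn persistent-connection?
  "Hypothetical helper: should the connection stay open after this message?
  `version` is \"HTTP/1.0\" or \"HTTP/1.1\"; `connection` is the value of the
  Connection header, or nil when the header is absent."
  [version connection]
  (let [options (->> (str/split (or connection "") #",")
                     (map (comp str/lower-case str/trim))
                     set)]
    (cond
      ;; An explicit "close" always ends the connection.
      (contains? options "close") false
      ;; HTTP/1.1 connections are persistent by default.
      (= version "HTTP/1.1")      true
      ;; HTTP/1.0 needs an explicit "keep-alive" to stay open.
      :else                       (contains? options "keep-alive"))))
```

So a browser speaking HTTP/1.1 keeps its connection open whether or not it sends Connection: keep-alive, unless it sends Connection: close.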

p-himik07:03:39

Ah, hmm, interesting. Yeah, will try to come up with a minimal example.

Arnaud Geiser05:03:28

The way we handle graceful shutdown definitely has an impact on the time it might take to shut down a server (I would say that's on purpose, but you might disagree). We wait for all active connections to be closed within 15 seconds through a Netty ChannelGroup (those MIGHT include 'keep-alive' connections, I would say yes but I'm unsure, it can be tested easily though). If your (HTTP) clients are gentle enough, those connections will be closed as soon as they are not in use. If not, you will have to wait for those 15 (configurable) seconds. Now... maybe we can do better. Any obvious way to do it? I don't know. We are pretty happy with the current solution. It can even speed up the integration tests by using a 0 shutdown-timeout.

Arnaud Geiser06:03:17

A bit of reading to get the context : https://github.com/netty/netty/pull/3706#issuecomment-1303066857

p-himik09:03:46

To me it seems counter-intuitive that a connection that doesn't transmit any data and is only alive because of keep-alive is something that prevents server shutdown. At a higher level, a client still needs to initiate a new request - keep-alive is a low level thing, so from the perspective of the client it doesn't even care about it. But the server does care for some reason and then "stop listening to new requests" becomes "keep on serving requests for 15 more seconds on top of the system that's already been shut down". I have no idea whether detecting things like "a connection was kept alive but is inactive" is possible at all. But if it is, wouldn't it be a better approach to close such inactive connections upon shutdown?

Arnaud Geiser10:03:49

I totally agree with your last statement. I don't know what is doable or not currently. The thing is... the previous Aleph behaviour was to not wait for ongoing requests; now it does wait, which is probably what most of us would expect. I would definitely prefer not waiting for idle connections, but I don't know what it involves (yet?). I would say the "workaround" today is to configure an idle-timeout to close those connections and thus have a quick shutdown.

p-himik10:03:46

Yeah, that's what I did. :)
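
A minimal sketch of the idle-timeout workaround mentioned above, assuming Aleph 0.6.x. The handler, the port, and both timeout values are made up for illustration; as far as the start-server docstring goes, :idle-timeout is in milliseconds and :shutdown-timeout in seconds, but check the docs for your version.

```clojure
(require '[aleph.http :as http])

;; Hypothetical handler, for illustration only.
(defn handler [_request]
  {:status 200
   :headers {"content-type" "text/plain"}
   :body "ok"})

;; Assumed values: drop connections that have seen no I/O for 5 seconds,
;; and keep the default-sized window for requests that are really in flight.
(def server-options
  {:port 8080
   :idle-timeout 5000      ; milliseconds
   :shutdown-timeout 15})  ; seconds

(defn start! []
  (http/start-server handler server-options))
```

The idea is that idle keep-alive connections are closed shortly after their last request, so by the time .close is called there is usually little left for the graceful shutdown to wait on.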
