Re: taxonomy of dependencies from Jonathan de Boyne Pollard on 2015-06-08 (supervision)

From: Jonathan de Boyne Pollard <J.deBoynePollard-newsgroups_at_NTLWorld.com>
Date: Mon, 08 Jun 2015 18:29:39 +0100

Laurent Bercot:
> What is true for datacenters even more than for hobbyist PCs, however,
> is that you definitely do not want cascading failure. And instant
> restart is a recipe for cascading failure: if your dnscache cannot
> start for some reason and dies instantly, and your supervisor restarts
> it immediately, your CPU loses itself in that loop and now you have a
> whole machine down instead of just one process down.
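
To make the failure mode in the quoted paragraph concrete, here is a minimal Python sketch of one common way a supervisor can throttle restarts: enforce a minimum interval between successive starts. The names `restart_delay`, `simulate_crash_loop` and the one-second `MIN_INTERVAL` are hypothetical, chosen only to echo the one-second wait discussed in this thread; this is not daemontools' actual code.

```python
# Hypothetical minimum spacing between successive starts (seconds), echoing
# the one-second wait attributed to daemontools' supervise in this thread.
MIN_INTERVAL = 1.0


def restart_delay(last_start: float, now: float,
                  min_interval: float = MIN_INTERVAL) -> float:
    """Seconds to wait before the next start.

    If the previous run already lasted at least min_interval, restart at once;
    otherwise wait out the remainder, so a process that dies instantly is
    started at most once per min_interval instead of spinning the CPU.
    """
    return max(0.0, min_interval - (now - last_start))


def simulate_crash_loop(window: float, min_interval: float = MIN_INTERVAL) -> int:
    """Count how often a process that exits immediately gets started within
    `window` seconds of simulated time. Uses a fake clock, so nothing sleeps."""
    if min_interval <= 0:
        # Without a throttle the loop below would never advance the clock.
        raise ValueError("min_interval must be positive")
    now, last_start, starts = 0.0, None, 0
    while True:
        if last_start is not None:
            now += restart_delay(last_start, now, min_interval)
        if now >= window:
            return starts
        last_start = now
        starts += 1  # the process starts and dies with no time elapsing


print(simulate_crash_loop(10.0))  # → 10
```

With the throttle, an instantly dying process is started ten times in ten seconds of simulated time; with no throttle at all, the same loop would never let the clock advance, which is exactly the CPU-eating loop described above.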

You're assuming that all software works like daemontools, forgetting
that not even yours does. (-: As I pointed out, nosh makes what
happens in the event of termination user-configurable, including the
decision to even restart at all. dnscache terminating gracefully can be
distinguished from it dying from (say) a segmentation violation. I
already both mentioned this and pointed to the fact that the mechanism
can duplicate systemd's Restart=on-abort and the like. The world has
actually come around to the idea of auto-restart being a decision, not
the hardwired, you-must-wait-one-second, it-always-restarts,
one-size-fits-all behaviour of daemontools' "supervise" that I was telling
the pseudonymous person about. In this world, your argument that immediate
restart is of necessity a cascading failure breaks down. Of course it
isn't, because it's an immediate restart *on success* (not one of the
"bad" signals) here. There's no actual failure to cascade or otherwise
in the first place. So I repeat: Sometimes, one does _not_ want these
things. If it's doing a graceful restart, I want dnscache back up
*right now*, not 1 second from now. There's no "price" to this. Take
careful note of the words "If it's doing a graceful restart".
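
The distinction drawn above (restart immediately after a graceful exit, but not blindly after a crash) can be sketched as a small policy function. This is a hypothetical Python illustration, not nosh's or systemd's implementation: the policy names mirror three of systemd's `Restart=` values (`no`, `always`, `on-abort`), the clean-signal set follows systemd's documented default (SIGHUP, SIGINT, SIGTERM, SIGPIPE), and `Termination`, `restart_plan` and its `crash_delay` parameter are assumptions made up for the example.

```python
import signal
from dataclasses import dataclass
from typing import Optional

# systemd's documented default "clean" signals (besides exit status 0).
CLEAN_SIGNALS = {signal.SIGHUP, signal.SIGINT, signal.SIGTERM, signal.SIGPIPE}


@dataclass
class Termination:
    exit_code: Optional[int] = None  # set if the process called exit()
    signo: Optional[int] = None      # set if an uncaught signal killed it


def is_clean(t: Termination) -> bool:
    """Graceful termination: exit status 0 or one of the clean signals."""
    return t.exit_code == 0 or t.signo in CLEAN_SIGNALS


def is_abort(t: Termination) -> bool:
    """Killed by an uncaught signal outside the clean set (e.g. SIGSEGV)."""
    return t.signo is not None and t.signo not in CLEAN_SIGNALS


def restart_plan(t: Termination, policy: str,
                 crash_delay: float = 1.0) -> Optional[float]:
    """Seconds to wait before restarting, or None for "do not restart"."""
    if policy not in ("no", "always", "on-abort"):
        raise ValueError(f"unknown policy: {policy!r}")
    if policy == "no" or (policy == "on-abort" and not is_abort(t)):
        return None
    # Graceful exit: bring the service straight back. Anything else
    # (non-zero exit, crash) is throttled so an instant failure cannot spin.
    return 0.0 if is_clean(t) else crash_delay


print(restart_plan(Termination(exit_code=0), "always"))             # → 0.0
print(restart_plan(Termination(signo=signal.SIGSEGV), "always"))    # → 1.0
print(restart_plan(Termination(signo=signal.SIGTERM), "on-abort"))  # → None
```

In this sketch a graceful exit under `always` brings the service back with no delay at all, while a segmentation violation is held off for `crash_delay`; under `on-abort`, a SIGTERM is treated as clean and leads to no restart, so only the "bad" signals trigger one.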
Received on Mon Jun 08 2015 - 17:29:39 UTC
