Re: Service watchdog

From: Laurent Bercot <ska-supervision_at_skarnet.org>
Date: Tue, 19 Oct 2021 09:47:09 +0000

>Yes, in my usecase this would be used at the place where sd_notify()
>is used if the service runs under systemd. Then periodically executed
>watchdog could check the service makes progress and react if it
>doesn't.
>
>The question is how to implement the watchdog then - it could be either
>a global service or another executable in service directory, which
>would be started periodically by runsv.

  If a single notification step is enough for you, i.e. the service
goes from a "preparing" state to a "ready" state and remains ready
until the process dies, then what you want is implemented in the s6
process supervisor: https://skarnet.org/software/s6/notifywhenup.html

  Then you can synchronously wait for service readiness
(s6-svwait $service) or, if you have a watchdog service, periodically
poll for readiness (s6-svstat -r $service).

  But that's only valid if your service can only change states once
(from "not ready" to "ready"). If you need anything more complex, s6
won't support it intrinsically.

  The reason why there isn't more advanced support for this in any
supervision suite (save systemd but even there it's pretty minimal)
is that service states other than "not ready yet" and "ready" are
very much service-dependent and it's impossible for a generic process
supervisor to support enough states for every possible existing service.
Daemons that need complex states usually come with their own
monitoring software that handles their specific states, with integrated
health checks etc.

  So my advice would be:
  - if what you need is just readiness notification, switch to s6.
It's very similar to runit and I think you'll find it has other
benefits as well. The drawback, obviously, is that it's not in busybox
and the required effort to switch may not be worth it.
  - if you need anything more complex, you can stick to runit, but you
will kinda need to write your own monitor for your daemon, because
that's what everyone does.

  Depending on the details of the monitoring you need, the monitoring
software can be implemented as another service (e.g. to receive
heartbeats from your daemon), or as a polling client (e.g. to do
periodic health checks). Both approaches are valid.

  Don't hack on runit, especially the control pipe thing. It will not
end well.
  (runit's control pipe feature is super dangerous, because it allows a
service to hijack the control flow of its supervisor, which endangers
the supervisor's safety. That's why s6 does not implement it; it
provides similar - albeit slightly less powerful - control features
via ways that never give the service any power over the supervisor.)

--
  Laurent
Received on Tue Oct 19 2021 - 11:47:09 CEST

This archive was generated by hypermail 2.4.0 : Tue Oct 19 2021 - 11:47:40 CEST