Re: Rare runsv logging problem

From: Gerrit Pape <pape_at_smarden.org>
Date: Tue, 29 Jul 2014 15:11:11 +0000

On Fri, Jul 25, 2014 at 05:16:08PM -0700, Caleb Spare wrote:
> I've been using runit for a while now and it has been mostly
> wonderful. I'm noticing a persistent issue and I'm not sure how to
> debug it.

> The problem I'm seeing is that, very occasionally, runsv will get into
> a bad state where svlogd is not running. (I'm not sure if it fails to
> start svlogd or if this happens later on after it has been running
> properly.) When the problem occurs, pstree shows something like this:
>
> runsvdir-+-runsv-+-foo---5*[{foo}]
> | `-svlogd
> |-runsv-+-bar---21*[{bar}]
> | `-svlogd
> `-runsv---baz---250*[{baz}]
>
> Here you can see that the baz process does not have an associated
> svlogd process. Further:
>
> $ sudo sv s foo
> run: foo: (pid 4885) 526260s; run: log: (pid 875) 526517s
> $ sudo sv s baz
> run: baz: (pid 2337) 2983swarning: baz: unable to open supervise/ok:
> file does not exist
> ; run: log: (pid 2337) 2983s
>
> Two strange things there: the warning about supervise/ok and also that
> the pid for 'log' is the same as for 'baz'.

The warning actually should be fine and applies to the log service, but
not how the output os formatted. For the final line, this is a bug in
sv, here's a patch:

diff --git a/runit-2.1.1/src/sv.c b/runit-2.1.1/src/sv.c
index 43224b5..5f3aa71 100644
--- a/runit-2.1.1/src/sv.c
+++ b/runit-2.1.1/src/sv.c
_at_@ -153,7 +153,7 @@ int status(char *unused) {
   int rc;
 
   rc =svstatus_get();
- switch(r) { case -1: if (lsb) done(4); case 0: return(0); }
+ switch(rc) { case -1: if (lsb) done(4); case 0: return(0); }
   rc =svstatus_print(*service);
   if (chdir("log") == -1) {
     if (errno != error_noent) {

> When runsv is in this bad state, the output from baz goes right to
> runsvdir and ends up in /var/log/upstart/runsvdir.log.
>
> The fix I've been using is to 'sv d baz' and then kill the offending
> runsv process. Runsvdir will quickly restart it and then everything
> will be working:

> I'm unsure what causes this rare problem. We only do simple things
> with the runit: sv {t,d,u}. When we deploy services, we rsync a
> directory from elsewhere on the box into /etc/services/<name> and then
> 'sv t <name>'. That source dir only has ./run, ./finish, and
> ./log/run.

Since the runsv service is running, but not the appendant log service,
and a restart of runsv fixes the issue, I guess it's a race when
deploying the service directories through rsync. It may be that runsv
starts up in the very moment where service/run already exists, but the
subdirectory service/log/ not yet.

Regards, Gerrit.
Received on Tue Jul 29 2014 - 15:11:11 UTC

This archive was generated by hypermail 2.3.0 : Sun May 09 2021 - 19:44:18 UTC