Re: Rare runsv logging problem

From: Caleb Spare <cespare_at_gmail.com>
Date: Tue, 29 Jul 2014 11:17:15 -0700

Thanks Gerrit! You've exactly pinpointed the issue. I can reproduce it reliably:

1. Create service/ and service/run, and let it get picked up by runsvdir
2. Create service/log/ and service/log/run

Now this is in the bad state I originally described.

I suppose a workaround for us is pretty simple: if the service
directory does not exist (initial deploy), do the rsync to a temp
directory and mv it into place.

Do you think that runsv should more gracefully handle this order of events?

Thanks for your help.

-Caleb

On Tue, Jul 29, 2014 at 8:11 AM, Gerrit Pape <pape_at_smarden.org> wrote:
> On Fri, Jul 25, 2014 at 05:16:08PM -0700, Caleb Spare wrote:
>> I've been using runit for a while now and it has been mostly
>> wonderful. I'm noticing a persistent issue and I'm not sure how to
>> debug it.
>
>> The problem I'm seeing is that, very occasionally, runsv will get into
>> a bad state where svlogd is not running. (I'm not sure if it fails to
>> start svlogd or if this happens later on after it has been running
>> properly.) When the problem occurs, pstree shows something like this:
>>
>> runsvdir-+-runsv-+-foo---5*[{foo}]
>> | `-svlogd
>> |-runsv-+-bar---21*[{bar}]
>> | `-svlogd
>> `-runsv---baz---250*[{baz}]
>>
>> Here you can see that the baz process does not have an associated
>> svlogd process. Further:
>>
>> $ sudo sv s foo
>> run: foo: (pid 4885) 526260s; run: log: (pid 875) 526517s
>> $ sudo sv s baz
>> run: baz: (pid 2337) 2983swarning: baz: unable to open supervise/ok:
>> file does not exist
>> ; run: log: (pid 2337) 2983s
>>
>> Two strange things there: the warning about supervise/ok and also that
>> the pid for 'log' is the same as for 'baz'.
>
> The warning actually should be fine and applies to the log service, but
> not how the output os formatted. For the final line, this is a bug in
> sv, here's a patch:
>
> diff --git a/runit-2.1.1/src/sv.c b/runit-2.1.1/src/sv.c
> index 43224b5..5f3aa71 100644
> --- a/runit-2.1.1/src/sv.c
> +++ b/runit-2.1.1/src/sv.c
> _at_@ -153,7 +153,7 @@ int status(char *unused) {
> int rc;
>
> rc =svstatus_get();
> - switch(r) { case -1: if (lsb) done(4); case 0: return(0); }
> + switch(rc) { case -1: if (lsb) done(4); case 0: return(0); }
> rc =svstatus_print(*service);
> if (chdir("log") == -1) {
> if (errno != error_noent) {
>
>> When runsv is in this bad state, the output from baz goes right to
>> runsvdir and ends up in /var/log/upstart/runsvdir.log.
>>
>> The fix I've been using is to 'sv d baz' and then kill the offending
>> runsv process. Runsvdir will quickly restart it and then everything
>> will be working:
>
>> I'm unsure what causes this rare problem. We only do simple things
>> with the runit: sv {t,d,u}. When we deploy services, we rsync a
>> directory from elsewhere on the box into /etc/services/<name> and then
>> 'sv t <name>'. That source dir only has ./run, ./finish, and
>> ./log/run.
>
> Since the runsv service is running, but not the appendant log service,
> and a restart of runsv fixes the issue, I guess it's a race when
> deploying the service directories through rsync. It may be that runsv
> starts up in the very moment where service/run already exists, but the
> subdirectory service/log/ not yet.
>
> Regards, Gerrit.
Received on Tue Jul 29 2014 - 18:17:15 UTC

This archive was generated by hypermail 2.3.0 : Sun May 09 2021 - 19:44:18 UTC