Re: Rare runsv logging problem

From: Caleb Spare <cespare_at_gmail.com>
Date: Fri, 25 Jul 2014 21:45:30 -0700

On Fri, Jul 25, 2014 at 9:35 PM, James Powell <james4591_at_hotmail.com> wrote:
> My question is why are you running Upstart? Runit has its own init so Upstart is pointless.

Ubuntu uses upstart; this configuration is how runit is packaged in
that distro. Replacing PID 1 on our servers with runit is a large
effort I don't intend to undertake. We use various distro-packaged
services that are run by Upstart.

But runit has been great for managing our application processes thus
far, modulo the issue I described.

> Runit's binary should maintain runsv. It also could depend on the run script also having an improper handling.

Can you explain what this means?

>
> Sent from my Windows Phone
> ________________________________
> From: Caleb Spare <cespare_at_gmail.com>
> Sent: 7/25/2014 5:16 PM
> To: supervision_at_list.skarnet.org
> Subject: Rare runsv logging problem
>
> Hi,
>
> I've been using runit for a while now and it has been mostly
> wonderful. I'm noticing a persistent issue and I'm not sure how to
> debug it.
>
> On the servers we're running Ubuntu and we use runit 2.1.1 via the
> default package that comes with the distro. Upstart runs runsvdir and
> we use runit to manage all of our application processes. Each
> application has a simple ./run and ./log/run; the latter execs svlogd
> (this is all a typical configuration, as I understand it).
>
> The problem I'm seeing is that, very occasionally, runsv will get into
> a bad state where svlogd is not running. (I'm not sure if it fails to
> start svlogd or if this happens later on after it has been running
> properly.) When the problem occurs, pstree shows something like this:
>
> runsvdir-+-runsv-+-foo---5*[{foo}]
>          |       `-svlogd
>          |-runsv-+-bar---21*[{bar}]
>          |       `-svlogd
>          `-runsv---baz---250*[{baz}]
>
> Here you can see that the baz process does not have an associated
> svlogd process. Further:
>
> $ sudo sv s foo
> run: foo: (pid 4885) 526260s; run: log: (pid 875) 526517s
> $ sudo sv s baz
> run: baz: (pid 2337) 2983swarning: baz: unable to open supervise/ok:
> file does not exist
> ; run: log: (pid 2337) 2983s
>
> Two strange things there: the warning about supervise/ok and also that
> the pid for 'log' is the same as for 'baz'.
>
> When runsv is in this bad state, the output from baz goes right to
> runsvdir and ends up in /var/log/upstart/runsvdir.log.
>
> The fix I've been using is to 'sv d baz' and then kill the offending
> runsv process. Runsvdir will quickly restart it and then everything
> will be working:
>
> runsvdir-+-runsv-+-foo---5*[{foo}]
>          |       `-svlogd
>          |-runsv-+-baz---25*[{baz}]
>          |       `-svlogd
>          `-runsv-+-bar---20*[{bar}]
>                  `-svlogd
>
> I'm unsure what causes this rare problem. We only do simple things
> with runit: sv {t,d,u}. When we deploy services, we rsync a
> directory from elsewhere on the box into /etc/services/<name> and then
> 'sv t <name>'. That source dir only has ./run, ./finish, and
> ./log/run.
>
> Any ideas of what we might be doing wrong, or how to otherwise avoid
> this issue? Or if not, what I could do to further debug?
>
> Sorry for the long email; I wanted to be thorough in my description
> and avoid making assumptions about what could be causing this problem.
>
> Thanks,
> Caleb Spare
Received on Sat Jul 26 2014 - 04:45:30 UTC
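For anyone wanting to watch for the symptom described in the quoted message (the 'log' pid reported by sv being identical to the service pid), here is a minimal sketch of a check. The function name same_pid_in_status is hypothetical and not part of runit; it only parses text shaped like the 'sv s' output shown above, and it assumes a grep that supports -o (GNU and BSD grep do).

```shell
#!/bin/sh
# same_pid_in_status: hypothetical helper, NOT part of runit.
# Reads `sv status`-style text on stdin and exits 0 if the first two
# "(pid N)" entries (service and log) carry the same pid, which is the
# symptom reported in the message above. Exits non-zero otherwise.
same_pid_in_status() {
    # Pull out every "(pid N)" token, then keep only digits and newlines.
    pids=$(grep -o '(pid [0-9]*)' | tr -cd '0-9\n')
    main=$(printf '%s\n' "$pids" | sed -n 1p)   # service pid
    log=$(printf '%s\n' "$pids" | sed -n 2p)    # log pid
    [ -n "$main" ] && [ "$main" = "$log" ]
}
```

It could be used as, for example, `sv s baz 2>/dev/null | same_pid_in_status && echo "baz: log pid matches service pid"`. Discarding stderr there is an assumption that the supervise/ok warning is written to stderr, as the interleaved output above suggests.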
