Re: dependant services

From: Steve Litt <slitt_at_troubleshooters.com>
Date: Mon, 8 Jun 2015 13:44:27 -0400

On Mon, 08 Jun 2015 07:00:09 -0700
Avery Payne <avery.p.payne_at_gmail.com> wrote:

> On 5/14/2015 3:25 PM, Jonathan de Boyne Pollard wrote:
> > The most widespread general purpose practice for "breaking" (i.e.
> > avoiding) this kind of ordering is of course opening server sockets
> > early. Client and server then don't need to be so strongly
> > ordered.
> This is where I've resisted using sockets. Not because they are bad
> - they are not. I've resisted because they are difficult to make
> 100% portable between environments. Let me explain.

Hi Avery,

Note that as I read further and further in your email, my understanding
of your intended subject narrowed, so please do as I say, not as I do,
and read this whole thing before responding :-)

Just so we're all on the same page, am I correct that the subject of
your response here is *not* "socket activation", the awesome and
wonderful feature of systemd?

You're simply talking about a service opening its socket before it's
ready to exchange information, right?
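
For reference, a UCSPI superserver does exactly that: it binds its
listening socket the moment it starts, whether or not anything useful
is behind it yet. Here's a sketch using djb's ucspi-tcp, with sshd in
its inetd mode (-i); I'm not claiming this is how you'd want to run
sshd for real:

=====================================
#!/bin/sh
# tcpserver binds port 22 on all addresses (the 0) immediately,
# then spawns one sshd -i (inetd mode) per incoming connection.
exec tcpserver -v 0 22 /usr/sbin/sshd -i
=====================================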

[snip discussion of socket usage disadvantages in some context]

>
> Personally, I would do the following:
>
> * Create a socket directory in whatever passes for /var/run, and name
> it /var/run/ucspi-sockets.
>
> * For each service definition that has active sockets, there would be
> /var/run/ucspi-sockets/{directory} where {directory} is the name of
> the service, and inside of that is a socket file named
> /var/run/ucspi-sockets/{directory}/socket. That is about as generic
> and "safe" as I can get, given that /var/run on Linux is a symlink
> that points to /run in some cases. It is consistent - the admin
> knows where to find the socket every single time, and is assured that
> the socket inside of the directory is the one that connects to a
> service. It is a reasonable name - the odds
> of /var/run/ucspi-sockets being taken for anything else but that are
> fairly low, and the odds of me stepping on top of some other
> construct in that directory are low as well, because any existing
> sub-directory in that location is probably there for the same reason.
>
> * Make socket activation an admin-controlled feature that is disabled
> by default. You want socket activation, you ask for it first. The
> admin gets control, I get more headache, and mostly everyone can be
> happy.

Isn't this all controlled by the service? sshd decides when to open its
socket; the admin has nothing to do with it.
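
That said, under UCSPI the admin *could* own the socket, if the run
script creates it along the lines you describe. A sketch, assuming
s6-ipcserver from skarnet's s6 and a hypothetical daemon that speaks
on stdin/stdout:

=====================================
#!/bin/sh
# the run script, not the daemon, creates and owns the socket
dir=/var/run/ucspi-sockets/mydaemon
mkdir -p "$dir"
exec s6-ipcserver "$dir/socket" /usr/local/bin/mydaemon
=====================================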


[Snip 2 paragraphs discussing the complexity of sockets used in a
certain context]

>
> If I were to write support for sockets in, I would guess that it
> would probably augment the existing ./needs approach by checking for
> a socket first (when the feature is enabled), and then, failing to
> find one, proceed to peer-level dependency management (when it is
> enabled).

Maaaaannnnn, is all this brouhaha about dependencies?

=====================================
#!/bin/sh
# if the network isn't up yet, fail; the supervisor will retry us
if /usr/local/bin/networkisdown; then
  sleep 5
  exit 1
fi
# -D keeps sshd in the foreground so the supervisor can track it
exec /usr/sbin/sshd -D -q
=====================================

Is this all about using the existence of a socket to decide whether to
exec your service or not? If it is, personally I think it's too
generic, for the reasons you said: On an arbitrary service,
perhaps written by a genius, perhaps written by a poodle, having a
socket running is no proof of anything. I know you're trying to write
generic run scripts, but at some point, especially with dependencies on
specific but arbitrary processes, you need to know about how the
process works and about the specific environment in which it's working.
And it's not all that difficult, if you allow a human to do it. I think
that such edge case dependencies are much easier for humans to do than
for algorithms to do.

If this really is about recognizing when a process is fully functional,
because the process being spawned depends on it, I'd start collecting a
bunch of best-practice, portable scripts called ServiceXIsDown and
ServiceXIsUp. So if your service depends on your being "on the
Internet" and having DNS up and running, you could make the following
internetIsDown:

=============================
#!/bin/sh
# internetIsDown: exit 0 (true) if the Internet looks down, 1 if up.
# ping exits 0 on a reply, 1 on no reply, >1 on other errors.
if ping -W 1 -w 1 -c 1 google.com > /dev/null 2>&1; then
  exit 1   # got a reply: the Internet is up, so "is down" is false
else
  exit 0   # no reply, or something wrong: treat as down, do not proceed
fi
=============================

Sorry for the DP101 shellscript grammar: Shellscripts are a second
language for me.

Anyway, each program that other services depend on could have one or
more best-practice "is it up" test shellscripts. Some would involve
sockets, some wouldn't. I don't think this is something you can code
into the actual process manager without a kudzu field of if statements.
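
For the socket-involving kind, the test can be as simple as asking
whether anything accepts a connection. A sketch, assuming a netcat
that understands -z and -w (OpenBSD netcat does):

=============================
#!/bin/sh
# sshdIsUp: exit 0 (true) if something accepts connections on port 22
exec nc -z -w 1 127.0.0.1 22 > /dev/null 2>&1
=============================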

[snip a couple paragraphs that were way above my head]

>
> Of course, there are no immediate plans to support UCSPI, although
> I've already made the mistake of baking in some support with a bcron
> definition. I think I need to go back and revisit that entry...

I'm a big fan of parsimonious scope and parsimonious dependencies, so
IMHO the less that's baked in, the better.

> As a side note, I'm beginning to suspect that the desire for "true
> parallel startup" is more of a "mirage caused by desire" rather than
> by design. What I'm saying is that it may be more of an ideal we
> aspire to rather than a design that was thought through. If you have
> sequenced dependencies, can you truly gain a lot of time by
> attempting parallel startup? Is the gain for the effort really that
> important? Can we even speed things up when fsck is deemed mandatory
> by the admin for a given situation? Questions like these make me
> wonder if this is really a feasible feature at all.

Avery, I'm nowhere near your knowledge level on init systems, but I've
wondered that myself. A 2 second boot would be nice, but at what cost?
How much indeterminacy are we willing to put up with for a 30 second
savings that happens once a day or once a month? When services like
dhclient take aeons (15 seconds) to complete, is it our duty to code a
one-size-fits-all solution to make sure things not depending on having
an IP address get done in parallel during those 15 seconds? And if it
is, couldn't we do so simply by manually ordering our services? (I
have that capability on daemontools and daemontools-encore now.) If
the number of services is reasonable and the services quickly become
ready, parallel instantiation saves time only to the extent that we
have multiple processors. On the other hand, if an admin brings up 60
processes, many poorly written, some of which require half a minute to
become useful, is he really doing himself a favor? As for
stable-system changes, would init or process management really be the
best place to put code that reacts to network ups and downs and
storage being hotswapped?
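
By "manually ordering" I don't mean anything fancier than run scripts
that check their prerequisites. A sketch; /service/dhclient is
hypothetical, and the ': up' match leans on svstat's output format:

=====================================
#!/bin/sh
# refuse to start until the dhclient service is up; supervise retries
if ! svstat /service/dhclient | grep -q ': up '; then
  sleep 2
  exit 1
fi
exec /usr/sbin/mydaemon
=====================================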

Consecutive instantiation will never boot in 2 seconds, but it's
determinate, and everyone understands it. In most situations, that's an
awful lot of value to give away just to boot 20 to 30 seconds quicker.
Plus there's this: even original daemontools is nowhere near serial.
Correct me if I'm wrong, but I believe that with daemontools, svscan
keeps spinning, testing and trying, while the supervise processes are
instantiating their daemons.

Sometimes, in my more cynical moods, I wonder whether "parallel
instantiation" is less of a helpful feature than it is a marketing
bulletpoint, much like "magnesium paddle shifter."

SteveT

Steve Litt
June 2015 featured book: The Key to Everyday Excellence
http://www.troubleshooters.com/key