Re: dependant services

From: Avery Payne <avery.p.payne_at_gmail.com>
Date: Mon, 08 Jun 2015 07:00:09 -0700

On 5/14/2015 3:25 PM, Jonathan de Boyne Pollard wrote:
> The most widespread general purpose practice for "breaking" (i.e.
> avoiding) this kind of ordering is of course opening server sockets
> early. Client and server then don't need to be so strongly ordered.
This is where I've resisted using sockets. Not because they are bad -
they are not. I've resisted because they are difficult to make 100%
portable between environments. Let me explain.

First, there is the question of "what environment am I running in?" This
breaks down into several sub-questions: "what variable settings do I
have", "what does my directory structure look like", and "what tools are
available". That last one - what tools are installed - is the one that
kills me, because while I can be assured that the bulk of a framework
will be present, there is no guarantee that I will have UCSPI tools
around.
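That "what tools are available" question can at least be probed
mechanically at script start. A minimal sketch in plain sh - the tool
names are real UCSPI servers (s6-ipcserver from s6, unixserver from
ucspi-unix, tcpserver from ucspi-tcp), but the fallback behavior is my
own invention:

```shell
#!/bin/sh
# Probe PATH for a UCSPI superserver before committing to socket
# activation. If none is found, the script should fall back to the
# plain run-loop behavior.
found=""
for tool in s6-ipcserver unixserver tcpserver; do
    if command -v "$tool" >/dev/null 2>&1; then
        found="$tool"
        break
    fi
done
if [ -n "$found" ]; then
    echo "UCSPI server available: $found"
else
    echo "no UCSPI server on PATH - fall back to a plain run-loop"
fi
```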

Let's say I decide to support only frameworks that package UCSPI out of
the box, so that I am assured socket activation is 100% possible -
ignoring the fact that I just jettisoned several other frameworks in the
process simply to support this one feature. So we press on with the
design assumption "it is safe to assume that UCSPI is installed and
therefore can be encoded into run scripts". Now we have another problem
- integration. Using sockets means I need a well-defined namespace to
locate the sockets themselves, and that means a well-known area in the
filesystem, because the filesystem is what organizes the namespace. So
where do the sockets live? /var/run? /run? /var/sockets?
/insert-my-own-flavor-here?

Let's take it a step further: I decide on some name - I'll pull one out
of a hat and simply call it /var/run/ucspi-sockets - and ignore all of
the toes I'm stepping on in the process, including the possibility that
some distribution already has that name reserved. Once I have (a) the
assurance that UCSPI is supported and (b) a place for UCSPI to get its
groove on, we have the next problem: getting all of the services to play
nice within this context. Do I write everything to depend on UCSPI
sockets so that I get automatic blocking? Do I make it entirely the
choice of the administrator to activate this feature via a "switch" that
can be thrown? Or is it used for edge cases only? Getting consistency
out of it would be great, but then I back the admin into a corner with
"this is design policy and you get it, like it or not". If I go with
admin-controlled, that means yet another code path in an already bloated
./run script that may or may not activate; the admin has their day with
it, but the number of potential problem vectors grows. Or I can
hybridize it and do it for edge cases only, but now the admin is left
scratching their head asking "why is it here, but not there? It's not
consistent - what were they thinking?"
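For concreteness, the admin-controlled "switch" could be nothing more
than a flag file in the service directory. A sketch, where the path
./env/SOCKETS is purely hypothetical and not an existing convention in
my scripts:

```shell
#!/bin/sh
# Hypothetical opt-in switch: the admin runs "touch env/SOCKETS" in the
# service directory to enable socket activation; absence of the file
# means the classic behavior.
if [ -e ./env/SOCKETS ]; then
    mode="socket activation"
else
    mode="classic run-loop (default)"
fi
echo "startup mode: $mode"
```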

Personally, I would do the following:

* Create a socket directory in whatever passes for /var/run, and name it
/var/run/ucspi-sockets.

* For each service definition that has active sockets, there would be
/var/run/ucspi-sockets/{directory}, where {directory} is the name of the
service, and inside of that a socket file named
/var/run/ucspi-sockets/{directory}/socket. That is about as generic and
"safe" as I can get, given that /var/run on Linux is, in some cases, a
symlink that points to /run. It is consistent - the admin knows where
to find the socket every single time, and is assured that the socket
inside of the directory is the one that connects to a service. It is a
reasonable name - the odds of /var/run/ucspi-sockets being taken for
anything else are fairly low, and the odds of me stepping on top of some
other construct in that directory are low as well, because any existing
sub-directory in that location is probably there for the same reason.

* Make socket activation an admin-controlled feature that is disabled by
default. You want socket activation, you ask for it first. The admin
gets control, I get more headache, and almost everyone can be happy.
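The naming convention above is mechanical enough to capture in a tiny
helper. A sketch in plain sh - the UCSPI_SOCKET_ROOT override and the
socket_path name are my own illustration, not an existing interface:

```shell
#!/bin/sh
# Map a service name to its socket path under the convention above.
# ROOT can be overridden for systems where /var/run is really /run.
ROOT="${UCSPI_SOCKET_ROOT:-/var/run}"
socket_path() {
    printf '%s/ucspi-sockets/%s/socket\n' "$ROOT" "$1"
}
socket_path bcron
# -> /var/run/ucspi-sockets/bcron/socket
```

A ./run script using this would then mkdir -p the per-service directory
and end in something like `exec s6-ipcserver "$(socket_path myservice)"
./mydaemon` - s6-ipcserver takes the socket path followed by the program
to run, and any other UCSPI-compatible listener would slot in the same
way.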

We've answered the "where" and the "when"; now we are left with the
"how". I suspect that you and Laurent would argue that I shouldn't be
using sockets inside of ./run at all - that this belongs in the layer
above, in service management proper - meaning that the entire construct
shouldn't exist at that level. Which means I shouldn't even support it
inside of ./run. Which means I can't package this feature in my
scripts. And we're back to square one.

Let's say I ignore this advice (at my own peril) and provide support for
those frameworks that don't have external management layers on top of
them. This was the entire reason I wrote my silly peer-level dependency
support to begin with, so that "other folks" would have one or two of
these features available to them, even though they don't have external
management like nosh or s6-rc or anopa. It's a poor man's solution, but
I'm not presenting it any other way, you get what you see. So doing
UCSPI sockets as an optional feature is probably OK, as long as it's
clear that I'm not giving you full management out of the box.

If I were to write support for sockets in, I would guess that it would
probably augment the existing ./needs approach by checking for a socket
first (when that feature is enabled), and then, failing to find one,
proceeding to peer-level dependency management (when that is enabled).
That gives four combinations:

* "No sockets and no peer dependencies" - the default out-of-box
experience, and the one that is 100% compatible with all frameworks.
Nothing is checked, everything run-loops as expected, and you can drop
the ./run scripts into things like nosh or s6-rc or anopa with
confidence.

* "Sockets but no peer dependencies" - a check for a socket is
performed; if one is present, it is used, otherwise the service
run-loops.

* "No sockets but peer dependencies" - no socket is looked for; the
script walks the dependency tree and starts things that way, run-looping
as needed.

* "Sockets and peer dependencies" - if a socket is found it is used; if
not, peer dependencies are used; and if either fails, the service
run-loops.

This is "graceful degradation": sockets receive preferential treatment,
falling back to peer resolution when the socket feature is enabled but
no socket is available, and if you accidentally enable a feature where
it isn't needed or wanted, things don't blow up horribly, because the
end result is a run-loop that can be caught and controlled.
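The four combinations boil down to a short decision ladder. A sketch in
plain sh - the flag files (env/SOCKETS, env/NEEDS) and the socket and
needs locations are illustrative, not an existing convention, and the
actual run-loop fallback on failure would come from the supervisor's
restart behavior rather than from this code:

```shell
#!/bin/sh
# Decide how to start a service, degrading gracefully:
# socket activation -> peer dependencies -> plain run-loop.
# All paths are relative to a hypothetical service directory.
start_strategy() {
    svcdir="$1"
    if [ -e "$svcdir/env/SOCKETS" ] && [ -S "$svcdir/socket" ]; then
        echo "socket-activation"
    elif [ -e "$svcdir/env/NEEDS" ] && [ -d "$svcdir/needs" ]; then
        echo "peer-dependencies"
    else
        echo "run-loop"
    fi
}

# With neither feature enabled (the out-of-box default), we always
# land on the run-loop:
start_strategy /nonexistent-service
# -> run-loop
```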

Both features would be selectable by the admin, and both are independent
of each other - enabling neither, one, the other, or both are all
possibilities. In situations where dependency management is externally
handled, you would simply keep both features turned off. In the case of
sockets, things would launch and dependencies would block. In the case
of peer-dependencies, the home user - who doesn't give two cares about
this and just "wants it to work" - gets what they want, ease of use. If
you want the full "belt and suspenders experience" turn on both
switches, sit back, and enjoy the light show. Everyone wins.

Of course, there are no immediate plans to support UCSPI, although I've
already made the mistake of baking in some support with a bcron
definition. I think I need to go back and revisit that entry...

As a side note, I'm beginning to suspect that the desire for "true
parallel startup" is more a "mirage caused by desire" than something
arrived at by design. What I'm saying is that it may be an ideal we
aspire to rather than a design that was thought through. If you have sequenced
dependencies, can you truly gain a lot of time by attempting parallel
startup? Is the gain for the effort really that important? Can we even
speed things up when fsck is deemed mandatory by the admin for a given
situation? Questions like these make me wonder if this is really a
feasible feature at all.
Received on Mon Jun 08 2015 - 14:00:09 UTC
