Who is responsible for a retry (was Re: Is System V.4 fork reliable?)

Mon Jul 30 00:21:52 AEST 1990

In article <7885 at tekgvs.LABS.TEK.COM> terryl at sail.LABS.TEK.COM writes:
[many wise words about KISS principle and the spirit of unix deleted]

But IMHO it's not quite appropriate here. I think the question here is:

		Who should retry if a fork fails? 

To see the problem I think that we should generalize a little. Just
consider the case of disk reads for a moment. Surely every one of us
appreciates that the device drivers issue retries(%) if a read fails,
so that an error from a read in an application can be considered
a permanent error.

(%: Maybe, if I were about to write a program which tests for flaky
disk blocks, I wouldn't be so happy with kernel retries ...)

Of course, an application can choose to retry after bad reads, and I've
had cases of "ill" disks where running a program in the background
for some hours helped me to recover 100% of the "bad blocks" by patiently
retrying ... just 1 out of 100 reads or so happened to be successful.
On the other hand, I would never wrap disk reads in "normal" programs
in retry logic - why bother: the kernel drivers generally solve the
problem well.
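
For illustration only, such a recovery program might read each block
roughly like this. This is a minimal sketch, not the program I actually
used: the name read_block_retry and the limit READ_RETRY_MAX are made up
for this example.

```c
#include <sys/types.h>
#include <unistd.h>

/* Assumed for illustration: how many times to try one block. */
#define READ_RETRY_MAX 100

/*
 * Hypothetical helper: read len bytes at offset from fd, retrying
 * failed reads up to READ_RETRY_MAX times.  Returns the byte count
 * from the first successful read(), or -1 if every attempt failed.
 */
ssize_t read_block_retry(int fd, void *buf, size_t len, off_t offset)
{
    int attempt;
    ssize_t n = -1;

    for (attempt = 0; attempt < READ_RETRY_MAX; attempt++) {
        /* Reposition before every attempt. */
        if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
            return -1;          /* cannot seek: retrying will not help */
        n = read(fd, buf, len);
        if (n >= 0)
            return n;           /* this attempt happened to succeed */
    }
    return n;                   /* still failing after all attempts */
}
```

This only makes sense for a special-purpose recovery tool; a "normal"
program would simply call read() once and trust the driver.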

Now, why is the situation so different with "fork"?

As I understand all the traffic here, the "real" problem is in fact
that an EAGAIN error may stand for two very different problems:
The one is more a "long-term" problem (no slots in the process
table, or the per-user limit reached, which could also be due to zombies
caused by careless programming techniques), the other is a very short-term
problem, which is difficult to correct in the kernel because of the
complexity of the algorithms in that area.

So I think the complaints here *are* right from the view of an application
developer, but instead of wrapping all the forks in application programs
in retry logic, I think there's a more pragmatic (though not
ideal) approach: Why not enhance the interface to fork in the standard
library with a retry capability? For many of us, "library + kernel" are
more or less a monolithic block (we can't change either easily 1/2:-)), so
if an error from fork could then be treated as the described "long-term"
error condition, everything would be fine.

Well, only a suggestion, maybe someone will post such a piece of code
soon ...
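
To make the suggestion a bit more concrete, here is a rough sketch of
what such a wrapper might look like. The name fork_retry and the two
constants are invented for this example and are not part of any
standard library; the right number of attempts and the delay would
need tuning on a real system.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Both values are assumptions for illustration, not tested defaults. */
#define FORK_RETRY_MAX   5   /* total attempts before giving up */
#define FORK_RETRY_DELAY 1   /* seconds to sleep between attempts */

/*
 * Hypothetical wrapper around fork(): retries only on EAGAIN, so that
 * a failure reported to the caller can be treated as the "long-term"
 * condition.  Returns exactly what fork() returns.
 */
pid_t fork_retry(void)
{
    int attempt;
    pid_t pid;

    for (attempt = 1; ; attempt++) {
        pid = fork();
        /* Done on success (parent or child), on any error other
         * than EAGAIN, or once the attempts are used up. */
        if (pid != -1 || errno != EAGAIN || attempt >= FORK_RETRY_MAX)
            return pid;
        sleep(FORK_RETRY_DELAY);    /* give the short-term shortage time to clear */
    }
}
```

An application would call fork_retry() wherever it calls fork() today
and keep its existing error handling. Note that a full process table
still costs the caller a few seconds of waiting before the error shows up.
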
-- 
Martin Weitzel, email: martin at mwtech.UUCP, voice: 49-(0)6151-6 56 83