Sys V fork IS broken!

Tue Jul 31 17:00:49 AEST 1990

In article <13447 at smoke.BRL.MIL> gwyn at smoke.BRL.MIL (Doug Gwyn) writes:
|In article <1990Jul28.195032.18746 at watdragon.waterloo.edu> tbray at watsol.waterloo.edu (Tim Bray) writes:
|>  if ((child = fork()) == -1)
|>    FatalSystemError("Serious system trouble! Can't create process!");
|>... I think describing absence of special-purpose backoff & retry code for
|>handling process creation failure by the OS as "bugs in application programs"
|>is pretty arrogant and unrealistic.
|
|The bug is that your application makes no attempt to recover from a known
|class of error, EAGAIN in this case. [...]

Well, yes, but ...

This is a "known class of error" that has been added to the meaning
of fork over the years.  (I don't know when in the various branches
of the family tree, but probably around the same time that support
for memory paging was being added.)  I tend to concur with an earlier
poster who suggested that returning EAGAIN when the problem is only a
temporary lack of resources is analogous to returning EAGAIN because
the disk buffer cache is temporarily full, instead of putting the
process to sleep until a buffer cache entry is emptied on its behalf.

Obviously, from the intensity with which it is being argued that this
really is an error, the situation is not that simple.  Could someone
please explain why?  Is it too difficult (or impossible) to distinguish
between a transient lack of resources and a deadlocked one?  Or would
the people claiming that "this is a policy decision that should not
be in the kernel" also claim that having the kernel automatically
wait for a buffer cache entry is a mistake according to the same design
philosophy?  (If not, and if it is not just because of practical
considerations like detectability and certainty of success, what is
the difference?)

While Doug claims that Tim's code above ignores a known class of
error, this was not always a known class of error - in earlier
versions of Unix it was not a class of error at all.

Certainly, as someone who has been writing code using fork since
version 5 days, I must admit that until now I had never noticed the
change from "error from fork is usually not recoverable" to "error
from fork is possibly recoverable if you try again in a while"
between S3 and S5.

Perhaps a document should be written for new system releases listing
the changes to programming practice that they call for - it could cover
any change that has required a significant proportion of the standard
program set to be examined and fixed for the new (or newly noticed)
desired programming method.

Requiring programmers to change their normal programming practices
should not be done without justification (which I think can be provided
in this case), or without clear explanation (which is often lacking).
-- 
Algol 60 was an improvement on most          | John Macdonald
of its successors - C.A.R. Hoare             |   jmm at eci386