UNIX signal question
Chris Torek
chris at umcp-cs.UUCP
Wed Dec 25 12:55:30 AEST 1985
In article <106 at humming.UUCP> arturo at humming.UUCP (Arturo Perez) writes:
>> The signal(2) system call [when used on SIGCLD] checks to see if
>> any zombie child(ren) are present and sends the calling process
>> another SIGCLD if there are. ... Note that the reinstallation
>> of the handler must follow the call to wait, or infinite recursion
>> results.
>> Bob Lenk
>> {hplabs, ihnp4}!hpfcla!rml
> This isn't correct. The problem is that the implicit 'signal(SIGCLD,
> SIG_DFL)' is done AFTER the signal trapping function returns. Thus,
> if you call signal from within the trapping function it doesn't do
> you any good. At least, this is the way it works on our SYSV/BSD
> hybrids.
Judging from the replies I have received, I would say that Bob Lenk
is correct. I suspect your system is following 4.2BSD signal
semantics: what you described does not make sense as written, but
with a little reinterpretation it implies 4.2-style signal handling.
[I figured that as I originally raised the question, I should try to
point out the correct answers.]
------------------
To try to give everyone a better `feel' for the AT&T code, here is
what happens. Note that this is based on my interpretation of code
I have not seen; the implementation is obvious, so I think this
description is correct, but do not count on it.
Assume process 1000 is parent, 1001 and 1002 are children;
1000 has done a `signal(SIGCLD, getchild);'; and 1001 has
just exited. (1002 is about to exit.) The AT&T kernel
notes the exit of 1001, finds its parent (1000), and
discovers that SIGCLD is being caught. It therefore resets
1000's SIGCLD behaviour to SIG_DFL, and sets the bit for
SIGCLD delivery in 1000's signal delivery mask. 1001 is
left as an ordinary zombie, awaiting a wait() by 1000.

Now process 1002 is run; it exits. The kernel notes this,
finds 1000, and discovers that SIGCLD is not being caught
(nor ignored). 1002 is therefore also left as a zombie.

Finally 1000 is run. SIGCLD is delivered, sending the
program into its `getchild' routine. This routine does
one wait, collecting either process 1001 or 1002---just
which is unimportant, but let us say it collects 1002.
The routine does whatever it wants with the information
returned by wait, then, just before returning, calls
`signal(SIGCLD, getchild);'.

In the signal code in the kernel, the special case code
for SIGCLD now searches for zombies. It finds 1001, owned
by 1000; and this is what it was looking for: An exited
child owned by the calling process. It therefore does
*not* set SIGCLD to getchild, but rather to SIG_DFL; and
the bit in the delivery mask is once again set.

On return from the system call the bit is noticed and
`getchild' is called once again. This time it collects
1001. Again 1000 sets SIGCLD to getchild, but this time
there are no exited children, so the signal is simply
set as usual.
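
The walk-through above can be sketched as a toy user-space model.
This is emphatically not AT&T kernel source: the structure and
function names (`proc_model', `model_child_exit', and so on) are
invented for illustration, and the code follows only the simplified
description given here, with a single pending bit rather than a queue.

```c
/* Toy model of the SIGCLD behaviour described in this article.
 * All names are hypothetical; this is not kernel code. */

enum disposition { DISP_DFL, DISP_CAUGHT };

struct proc_model {
    enum disposition sigcld; /* parent's current SIGCLD disposition */
    int pending;             /* SIGCLD pending bit: a flag, not a count */
    int zombies;             /* exited children not yet waited for */
};

/* A child exits: it always becomes a zombie.  If the parent is
 * catching SIGCLD, reset to default and post the signal. */
void model_child_exit(struct proc_model *p)
{
    p->zombies++;
    if (p->sigcld == DISP_CAUGHT) {
        p->sigcld = DISP_DFL;
        p->pending = 1;
    }
}

/* wait(): collect one zombie.  Returns 1 if one was collected. */
int model_wait(struct proc_model *p)
{
    if (p->zombies == 0)
        return 0;
    p->zombies--;
    return 1;
}

/* signal(SIGCLD, getchild): the special case.  If a zombie exists,
 * leave the disposition at default and post SIGCLD again;
 * otherwise install the handler as usual. */
void model_signal_catch(struct proc_model *p)
{
    if (p->zombies > 0) {
        p->sigcld = DISP_DFL;
        p->pending = 1;
    } else {
        p->sigcld = DISP_CAUGHT;
    }
}

/* Run the parent until no SIGCLD is pending.  Each delivery stands
 * for one call of getchild: one wait, then reinstall the handler.
 * Returns the number of times the handler ran. */
int model_run_parent(struct proc_model *p)
{
    int calls = 0;

    while (p->pending) {
        p->pending = 0;        /* signal is delivered */
        calls++;
        model_wait(p);         /* getchild collects one child */
        model_signal_catch(p); /* ... then reinstalls itself */
    }
    return calls;
}
```

Driving the model with the scenario above (one model_signal_catch,
two model_child_exit calls, then model_run_parent) runs the handler
twice and leaves SIGCLD caught with no zombies remaining.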
This probably describes the actual kernel code, with one exception:
I suspect the SysV kernel sets SIGCLD to the address of getchild
in the kernel signal function, and sets it to SIG_DFL in the same
kernel code that invokes every other user signal handler; but I
think the description is clearer without this. (The reason this
works is that the default action of SIGCLD is indistinguishable
from that of a caught SIGCLD when SIGCLD delivery is already pending;
oddly enough, this is because signals are *not* queued, despite
the SysV manual page.) The apparent behaviour of the kernel to
user code is the same either way, and Bob Lenk's posting is correct.
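
For concreteness, here is a minimal sketch of the handler shape Bob
Lenk describes: wait first, reinstall second. Several things here are
my assumptions, not part of the original discussion. It uses the POSIX
name SIGCHLD rather than SIGCLD, the helper `spawn_and_reap' is
hypothetical, and it runs one child at a time so that it does not
depend on the SysV re-posting behaviour (which many current systems do
not provide through signal()).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped = 0;

/* Collect one child, THEN reinstall the handler.  Under the SysV
 * semantics described above, reinstalling before the wait would
 * re-post the signal while the zombie still exists. */
static void getchild(int sig)
{
    int saved_errno = errno;
    int status;

    (void)sig;
    if (wait(&status) > 0)
        reaped++;
    signal(SIGCHLD, getchild);
    errno = saved_errno;
}

/* Hypothetical driver: fork nchildren children one at a time and
 * return how many the handler collected. */
int spawn_and_reap(int nchildren)
{
    sigset_t block, old, wait_mask;
    int i;

    reaped = 0;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    /* Keep SIGCHLD blocked except while we sit in sigsuspend(). */
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);

    signal(SIGCHLD, getchild);
    for (i = 0; i < nchildren; i++) {
        int before = reaped;
        pid_t pid = fork();

        if (pid < 0)
            break;              /* fork failed; stop early */
        if (pid == 0)
            _exit(0);           /* child: exit at once */
        while (reaped == before)
            sigsuspend(&wait_mask); /* handler runs in here */
    }
    signal(SIGCHLD, SIG_DFL);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return reaped;
}
```

A caller would simply invoke spawn_and_reap(n) and compare the result
with n. A handler written for portability today would more likely use
sigaction() and loop on waitpid() with WNOHANG, but that is a different
technique from the one under discussion.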
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP: seismo!umcp-cs!chris
CSNet: chris at umcp-cs ARPA: chris at mimsy.umd.edu