Children's exit() status
lenb at houxs.UUCP
Tue Feb 23 03:13:02 AEST 1988
Okay UNIX Sys V hackers, here's a question for you.
In the following scenario, how should a parent process
wait for its children to complete?
REQUIREMENT:
I have a parent process that forks 30 identical children.
The children conduct some measurements, and when done,
each sends a single IPC message with results back to the
parent and exits.
The children are identical, so they should all have roughly
equal life span, though that time may vary between 5 and 15 minutes.
The parent needs to be woken when the first child exits --
a straightforward wait(). The parent must also know if
any children complete in error.
It is preferable that the parent check the children's exit status
for any errors, since the system may indicate strange situations in
the exit status, and the children are already designed to use exit(code).
POSSIBLE SOLUTIONS:
Here's what I've thought of so far:
There seem to be 2 types of solutions, either use wait() with or
without SIGCLD, or use blocking message receives.
I'd like to use wait(), because the children have a meaningful
exit status. The question is: is it possible that my program would
be woken up only 20 times for 30 children? That is, could I miss
child deaths because several occur "simultaneously"? (Simultaneously
meaning that while I'm awake checking one child's return code, another
2 children die -- and the next wait() misses one or both of them.)
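
To make the wait() scenario concrete, here is a minimal sketch of a loop
that calls wait() once per outstanding child and inspects each status.
It assumes the POSIX-style WIFEXITED/WEXITSTATUS macros are available
(older systems decode the status word by hand), and the names
spawn_children and reap_children are hypothetical, not from any existing
program. As far as I understand it, a child that has exited stays in the
process table until it is waited for, so one wait() per child should
return every status even when several children die at once; treat that
as something to verify on the target system.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork nchildren processes; child i exits with codes[i]. */
static void spawn_children(int nchildren, const int *codes)
{
    int i;

    for (i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        assert(pid != -1);
        if (pid == 0)
            _exit(codes[i]);    /* stand-in for the real measurement work */
    }
}

/*
 * Call wait() until nchildren children have been collected.
 * Returns the number of children reaped; *failed counts the children
 * that were killed by a signal or exited with a nonzero code.
 */
static int reap_children(int nchildren, int *failed)
{
    int reaped = 0;

    *failed = 0;
    while (reaped < nchildren) {
        int status;
        pid_t pid = wait(&status);

        if (pid == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            break;              /* ECHILD: nothing left to wait for */
        }
        reaped++;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            (*failed)++;
    }
    return reaped;
}
```

A caller would run spawn_children(30, codes) and then reap_children(30,
&failed); if the return value is less than 30, some deaths were missed.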
If I *do* miss children's deaths, then upon each wakeup from wait(),
I could call kill(pid, 0) on each of the children to see if they're all dead.
I wouldn't miss any deaths that way, but I'd still miss some exit codes.
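
A minimal sketch of the kill(pid, 0) check, assuming the parent saved
each child's pid at fork() time; count_existing is a hypothetical name.
Signal 0 delivers nothing and only tests whether the pid exists. One
caveat worth checking: on the systems I know of, a child that has exited
but has not yet been waited for still counts as existing, so this test
only reports a child as gone after its status has been collected.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * Count how many of the given pids still exist.
 * kill(pid, 0) sends no signal; it only performs the existence and
 * permission checks. EPERM means the process exists but is not ours.
 */
static int count_existing(const pid_t *pids, int npids)
{
    int i, existing = 0;

    for (i = 0; i < npids; i++) {
        if (kill(pids[i], 0) == 0 || errno == EPERM)
            existing++;
    }
    return existing;
}
```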
If I'm going to miss exit codes, I could use signal(SIGCLD, SIG_IGN)
after the first child's death to wait() for the last child's death.
Then I'd check to see if I have 30 messages waiting. There are
warnings about using this signal in signal(2), so this is no good.
Another possibility is to have the children send a software signal
to the parent just before they die. I wouldn't miss any deaths,
but this is no help with exit codes.
Another solution is to use vanilla blocking message receives.
I know how many children I have, and could expect that number
of messages. I'd have to change the children to not send a message
if they encountered a problem -- the message in effect acting as
a "normal" return code. However, error codes from built-in exit()s
would be lost, unless the children were redesigned to send the code in a
message before exiting. I'd also lose any system information encoded in
the exit code.
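
A minimal sketch of the message-only variant, in which each child sends
its intended exit code in a System V message before exiting and the
parent blocks in msgrcv() once per child. The struct layout and the name
collect_by_message are assumptions for illustration, not the actual
program. Note the weakness described above: a child killed before its
msgsnd() leaves the parent blocked, and nothing the system puts in the
exit status is visible here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical message layout; msgsnd() requires the leading long mtype. */
struct result_msg {
    long mtype;         /* always 1 in this sketch */
    int  child_code;    /* code the child is about to pass to exit() */
};

/*
 * Fork nchildren children; child i reports codes[i] in a message and
 * then exits with the same code. The parent receives one message per
 * child. Returns the number of nonzero codes reported, or -1 on error.
 */
static int collect_by_message(int nchildren, const int *codes)
{
    struct result_msg msg;
    size_t body = sizeof msg - sizeof msg.mtype;  /* size excluding mtype */
    int i, errors = 0;
    int qid = msgget(IPC_PRIVATE, 0600);

    if (qid == -1)
        return -1;

    for (i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        assert(pid != -1);
        if (pid == 0) {
            msg.mtype = 1;
            msg.child_code = codes[i];
            msgsnd(qid, &msg, body, 0);
            _exit(codes[i]);
        }
    }

    /* Blocking receive, once per child; hangs if a child never sends. */
    for (i = 0; i < nchildren; i++) {
        if (msgrcv(qid, &msg, body, 0, 0) == -1) {
            errors = -1;
            break;
        }
        if (msg.child_code != 0)
            errors++;
    }

    while (wait(NULL) > 0)
        ;                           /* collect the exited children */
    msgctl(qid, IPC_RMID, NULL);    /* remove the queue */
    return errors;
}
```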
Has anybody out there run into this type of situation?
Any facts, clues or pointers appreciated. If you reply,
please cc: email since I don't often read news. Thanks.
Len Brown
201-949-0092
{ ihnp4 etc. }!houxs!lenb
More information about the Comp.unix
mailing list