Reaping zombie processes

Rick Ace pixar!rta at ucbvax.berkeley.edu
Fri Mar 31 09:25:45 AEST 1989


Here's the lowdown on exiting and zombie processes, circa SunOS 3.5.  It
may or may not be different under 4.0.

Process exit begins when the process either 1) exits voluntarily via the
"exit" syscall, or 2) is forced to exit by an uncaught signal.  In both
cases the kernel enters a routine called exit() [those of you with source
can sing along, the rest just have to believe me :-].
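
For those who want to watch the two entry paths from user space, here is a
minimal sketch.  It is modern POSIX C, not SunOS 3.5 code, and the helper
name child_fate() is made up for illustration.  It forks a child that either
exits voluntarily or is killed by a signal it cannot catch (SIGKILL), then
reports which one the parent saw.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not from the original post.
 * Forks a child that ends in one of the two ways described above.
 * Returns the child's exit code (>= 0) for a voluntary exit, the
 * negated signal number if a signal killed it, or -1000 on error.
 */
int child_fate(int die_by_signal)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1000;

    if (pid == 0) {
        if (die_by_signal)
            raise(SIGKILL);  /* cannot be caught: the kernel forces the exit */
        _exit(3);            /* voluntary exit via the exit syscall */
    }

    /* Parent: collect the child's status. */
    if (waitpid(pid, &status, 0) != pid)
        return -1000;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return -1000;
}
```

Calling child_fate(0) should report the exit code 3, and child_fate(1)
should report the negated value of SIGKILL.  Either way the child went
through the same kernel exit path; only the status handed to the parent
differs.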

Upon entering exit() (the kernel's exit(), that is), the kernel sets the
SWEXIT flag in the struct proc of the process.  This flag advises the
paging and swapping logic that the process is on its way out and should be
held in core so its demise will be quick.

The next step taken is to release the user virtual memory occupied by the
process.  This encompasses the text, data, and stack segments, but not the
kernel's "u. area" for the process (yet).

Now the kernel runs through all open file descriptors, closing each one.
This can result in calls to the "close" routines within device drivers.
The drivers are at liberty to suspend the process if they so choose (for
example, a tty driver may suspend the process until all characters in the
output queue have been delivered to the hardware).  Each driver is unique
in its behavior, so the reasons for suspending a process will vary.  One
would hope that the programmer who coded the driver would implement a
timeout, which would give up and resume the user process after a
reasonable amount of time, but unfortunately this is more the exception
than the rule.  If a device driver should choose to suspend the process,
"ps" will report the process as "exiting".  In this case, the WCHAN column
of the "ps" display will in an obscure way reflect the event the device
driver is waiting on before it wakes the process from its sleep.  When "ps" reports
a process as "exiting", the process is most likely delayed in the
close-the-file-descriptors phase of exiting.

After all of the file descriptors are closed, the kernel then discards the
page tables and "u. area" of the process, and places the process in the
"zombie" state, which is signified by the value SZOMB in the p_stat field
of the proc structure.  At this point, the proc structure is the only
vestige of the process remaining on the system (it's pretty minimal, see
/usr/include/sys/proc.h), and its purpose is to maintain process exit
status and accounting information for the parent.  A process in this state
will appear as a "zombie" in the "ps" display.  When the parent reaps the
process using wait(), wait3(), or whatever else is fashionable these days,
the proc struct is discarded and the process is completely gone.
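
The zombie-then-reaped sequence can be observed from user space.  The sketch
below is an illustration under assumptions, not SunOS 3.5 code: it uses the
POSIX calls waitid() with WNOWAIT and waitpid(), which postdate the system
described here, and the helper name zombie_then_reaped() is made up.  It
waits for the child to exit without reaping it, confirms with kill(pid, 0)
that the process entry still exists, reaps the child, and then confirms
that the entry is gone.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not from the original post.
 * Returns 1 if the exited child was still present as a zombie before
 * the wait and gone after it, 0 if that was not observed, -1 on error.
 */
int zombie_then_reaped(void)
{
    siginfo_t info;
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(7);                       /* child: exit immediately */

    /* Block until the child has exited, but leave it unreaped. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;

    /* Signal 0 delivers nothing; it only checks that the pid still
     * names a process.  A zombie still counts. */
    if (kill(pid, 0) != 0)
        return 0;

    /* Reap the child; this is what lets the kernel drop the entry. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 7)
        return 0;

    /* After the reap the pid should no longer name a process. */
    if (kill(pid, 0) == 0 || errno != ESRCH)
        return 0;
    return 1;
}
```

A return value of 1 means the exit status (7 here) survived in the zombie
until the parent collected it, which is the whole job of the leftover proc
structure.  A parent that never calls one of the wait routines leaves the
zombie in the "ps" display until the parent itself exits.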

Regarding "gcore":  Since the VM of the process is discarded very shortly
after the kernel sets the SWEXIT flag, when "gcore" sees SWEXIT, it
concludes that the process has no VM to dump, so it tells you that the
process is exiting and gives up.  It cannot dump memory because there is
no memory left to dump.

Rick Ace
Pixar
3240 Kerner Blvd, San Rafael CA 94901
...!{sun,ucbvax}!pixar!rta

[[ Thank you very much!  That was most informative.  --wnl ]]


