processes that get stuck

Antony A. Courtney antony at lbl-csam.arpa
Mon Mar 12 17:56:47 AEST 1990


In article <THOTH.90Mar11213935 at springs.cis.ufl.edu> thoth at springs.cis.ufl.edu (Gilligan) writes:
>
>  While we're all discussing defunct and exiting processes, how about
>that SunOS 4.x that sometimes puts processes in permanent disk wait?
>Processes show up with a D in the STAT field of a ps -gux.  These
>processes are unkillable and can only be removed with a halt and a
>reboot.  They tend to collect other processes as well.  If you ever
>put an emacs into this totally-hosed-state it is quite likely that any
>others you start up after it will follow it into D-space.
>  I have on occasion been using X windows and watched the xload
>skyrocket to 15 as all my shells and compilations go to hell.  The
>L1-a is the only solution.
>  It hasn't happened as much since we got 8 megs of ram for all of our
>computers.
>
>  Comments? explanations?
>--
>( My name's not really Gilligan, It's Robert Forsman, without an `e' )

We noticed this a lot, too.

The best explanation that I could come up with was the following:

We frequently noticed that when this happened, we were also getting occasional
messages in syslog saying the system had "lost interrupt from controller".  My
theory on what was going on is this:

Your application requests access to some file.  The inode of the file, or a
block of the file, is allocated and locked by the kernel on behalf of your
process.  The kernel then starts the disk controller on the read(), and
puts your process to sleep on the block pending the DMA transfer from the
disk controller.  IF THE SYSTEM LOSES THE INTERRUPT TELLING IT THE DMA TRANSFER
HAS COMPLETED, OR IF THE DMA TRANSFER NEVER OCCURS, YOUR PROCESS SLEEPS ON
THIS BLOCK INDEFINITELY.  Furthermore, any other process that attempts to
access this file will find that block locked and will sleep pending a
brelse() of the block.  Since your first process is never woken up, it never
releases the block, and those subsequent processes also sleep indefinitely.
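
To make the theory concrete, here is a toy user-space model of it.  This is
only a sketch, not SunOS kernel code: NPROC, struct buf and its fields,
request_block(), disk_interrupt() and stuck_after() are all names I made up
for illustration, and the real buffer cache is far more involved.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROC 4                         /* hypothetical number of processes */

enum pstate { RUNNABLE, DISK_WAIT };    /* DISK_WAIT is the "D" shown by ps */

struct buf {
    bool busy;                          /* block locked on behalf of a process */
    int  owner;                         /* process that started the transfer */
};

static enum pstate procs[NPROC];
static struct buf blk;

/* Process p asks for the block. */
static void request_block(int p)
{
    assert(p >= 0 && p < NPROC);
    if (!blk.busy) {                    /* first requester: lock it, start DMA */
        blk.busy = true;
        blk.owner = p;
    }
    /* The owner sleeps awaiting the completion interrupt; everyone else
       sleeps awaiting the owner's brelse().  Either way it is disk wait. */
    procs[p] = DISK_WAIT;
}

/* Completion interrupt: the owner wakes and releases the block, which in
   turn wakes every process that was sleeping on it. */
static void disk_interrupt(void)
{
    if (!blk.busy)
        return;
    blk.busy = false;                   /* stands in for the owner's brelse() */
    for (int p = 0; p < NPROC; p++)
        if (procs[p] == DISK_WAIT)
            procs[p] = RUNNABLE;
}

/* Run the scenario once and count the processes left in disk wait. */
static int stuck_after(bool interrupt_delivered)
{
    blk.busy = false;
    blk.owner = -1;
    for (int p = 0; p < NPROC; p++)
        procs[p] = RUNNABLE;

    for (int p = 0; p < NPROC; p++)
        request_block(p);               /* process 0 becomes the owner */

    if (interrupt_delivered)
        disk_interrupt();

    int stuck = 0;
    for (int p = 0; p < NPROC; p++)
        if (procs[p] == DISK_WAIT)
            stuck++;
    return stuck;
}
```

If the interrupt is delivered, stuck_after(true) leaves nobody in disk wait.
If it is lost, stuck_after(false) leaves all NPROC processes there, because
nothing else in the model ever clears blk.busy.  That is the pile-up described
above.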

We have not had this happen for quite a while.  (We have also been running
SunOS 4.1Beta since X-mas.)  We may also have replaced our disk controller;
I'm not sure.

Has anyone else experienced this problem?  Is my 'theory' a valid explanation?



				antony
--
*******************************************************************************
Antony A. Courtney				antony at lbl.gov
Advanced Development Group			ucbvax!lbl-csam.arpa!antony
Lawrence Berkeley Laboratory			AACourtney at lbl.gov


