Mysterious Sun-4 bug

Hugh LaMaster lamaster at pioneer.arc.nasa.gov
Fri Jun 28 02:26:31 AEST 1991


I previously wrote:

>The bug has appeared in 4.1, 4.1 + various patches (almost 4.1.1), 4.1.1,
>with and without DBE installed, with and without FDDI (i.e., with NFS
>traffic over ethernet).  The same symptom has appeared in all cases:
>a process which is usually doing NFS I/O will hang in "D" state.  The 
>offending process cannot be killed, and eventually other processes
>start hanging as well.  During this period, Sybase activity
>will have been very heavy.  The Sybase dataserver process itself, however,
>never hangs (note: Sybase is set up so that its I/O is local, *and*
>Sybase is using its own raw partitions).  Even though Sybase itself
>never hangs, *if Sybase asynchronous I/O is turned OFF,
>the problem rarely if ever appears.*


1) We are not running with /tmp in swap with tmpfs.  However, I understand
that tmpfs can cause a similar-sounding problem, which may be related.  It
could be a bug somewhere in the allocation of swap space.

2) I should have made it clear that the Sybase raw partitions are local
to the machine running Sybase, and that no NFS I/O is done on the database
files.  Only user-type files are mounted from the fileserver over NFS.
Also, lockd and statd are not running.  I believe that there is no need
for them to be running, since Sybase is not reading or writing over NFS,
and is not complaining about lock requests failing.

3)  We had another hang yesterday afternoon.  The processes which hung
this time looked like the following:


       F  UID   PID  PPID CP PRI NI     SZ  RSS WCHAN    STAT TT  TIME COMMAND
20008000 1002  9562  9542  0  -1  0 149376    0 kernelma DW   pa  0:00 model
20008001 1002  9529  4227  0  -1  0 149376   72 kernelma D    pb  0:00 model
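
For anyone who wants to watch for this condition automatically, here is a
minimal Python sketch that picks the "D" state processes out of a saved
long-format ps listing.  It is only an illustration: the function name
hung_lines is made up, and it assumes that the first non-blank line is the
header shown above and that COMMAND is a single word.  Wide values can run
together in the left-hand columns of ps output, so the sketch locates the
STAT field by counting from the right-hand end of each line instead of from
the left.

```python
def hung_lines(ps_output):
    """Return the ps lines whose STAT field starts with 'D'.

    Assumptions (not guaranteed by ps itself): the first non-blank line
    is the header, and COMMAND contains no spaces, so the fields to the
    right of STAT are always TT, TIME and COMMAND.
    """
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    if not lines:
        return []

    header = lines[0].split()
    # Negative index of STAT counted from the end of the header,
    # e.g. -4 for "... STAT TT TIME COMMAND".
    stat_from_right = header.index("STAT") - len(header)

    hung = []
    for ln in lines[1:]:
        fields = ln.split()
        # Skip lines too short to contain a STAT field at all.
        if len(fields) < -stat_from_right:
            continue
        if fields[stat_from_right].startswith("D"):
            hung.append(ln)
    return hung
```

Fed the listing above, the function would return both "model" lines, since
their STAT values are "DW" and "D".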


A pstat -Ts showed the following:

[149] pstat -Ts
>pstat: number of files is preposterous (14019)
>1470/1470 inodes
>454/4090 processes
>460952/781032 swap
>

We have a lot of swap space allocated in order to run some of these big jobs.
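
The pstat figures are easier to compare as ratios.  The following Python
sketch is only an illustration, assuming each table line has the form
"used/total name" (optionally preceded by ">") and skipping anything else,
such as the "preposterous" complaint.  The names TABLE_LINE and table_usage
are hypothetical.

```python
import re

# One table line: optional ">" prefix, then "used/total name".
TABLE_LINE = re.compile(r"^\s*>?\s*(\d+)/(\d+)\s+(\w+)\s*$")


def table_usage(pstat_output):
    """Map each table name to used/total as a fraction between 0 and 1."""
    usage = {}
    for line in pstat_output.splitlines():
        match = TABLE_LINE.match(line)
        if not match:
            # Not a "used/total name" line (e.g. an error message).
            continue
        used = int(match.group(1))
        total = int(match.group(2))
        name = match.group(3)
        if total > 0:
            usage[name] = used / total
    return usage
```

On the output above this gives 1.0 for inodes, about 0.11 for processes,
and about 0.59 for swap.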



-- 
  Hugh LaMaster, M/S 233-9,  UUCP:                ames!lamaster
  NASA Ames Research Center  Internet:            lamaster at ames.arc.nasa.gov
  Moffett Field, CA 94035    With Good Mailer:    lamaster at george.arc.nasa.gov 
  Phone:  415/604-1056                            #include <std.disclaimer> 


