Daemons stuck in 'D' "short-term" wait state

Brent Chapman capmkt!brent at uunet.uu.net
Thu Mar 2 17:30:31 AEST 1989


There has been a fair amount of discussion of this on the Sun-Nets mailing
list lately (questions about Sun-Nets go to sun-nets-request at brillig.umd.edu).
This is not a problem with TOPS, but with the server-side NFS.

Apparently there is some subtle filesystem inconsistency in an inode which
can cause an NFS daemon to deadlock when trying to access that inode.  The
NFS client who originally issued the request never gets a response, so it
issues another request, which is caught by a different NFS server daemon,
which then goes and gets itself deadlocked, and so on, until all your NFS
daemons are hung.  There doesn't appear to be any way to unhang them or to
kill them; the only solution anyone has found is to reboot the server
(ugh...).
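
To make the cascade concrete, here is a toy model of it in Python. It is only a sketch of the behaviour described above, not of the real NFS server code: the function name, the daemon numbering, and the retry limit are all made up for illustration, and "deadlocked" is modelled as a daemon that simply never returns to the idle pool.

```python
from collections import deque

def simulate_retries(num_daemons, bad_inode, requested_inode, max_retries=10):
    """Toy model of the hang: one client request, retransmitted on timeout.

    Hypothetical helper, not real NFS code. Returns (hung_daemons, idle_count).
    """
    idle = deque(range(num_daemons))   # daemons waiting for work
    hung = []                          # daemons stuck on the bad inode
    for _attempt in range(1 + max_retries):
        if not idle:
            break                      # every daemon is hung; nobody answers
        daemon = idle.popleft()        # some idle daemon picks up the request
        if requested_inode == bad_inode:
            hung.append(daemon)        # it blocks forever and sends no reply
            continue                   # client times out and retransmits
        idle.append(daemon)            # normal case: reply sent, daemon is idle again
        break
    return hung, len(idle)

# One request for the bad inode is enough to take out all four daemons.
print(simulate_retries(4, bad_inode=7, requested_inode=7))
# A request for any other inode leaves the pool untouched.
print(simulate_retries(4, bad_inode=7, requested_inode=3))
```

The point of the model is that a single client retrying a single request is sufficient; no extra load is needed once the bad inode has been touched.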

This little nasty bites me (I'm running SunOS 3.5) once every few months; it
hits others more often (some folks with multiple servers and lots of disk
activity were complaining of this happening weekly or even daily).
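
One way to spot the condition is to look for nfsd processes whose ps state is 'D'. The Python sketch below parses ps-style text that has already been captured. The column names (PID, STAT, COMMAND) and the sample listing are assumptions made up for illustration, so adjust them to whatever your ps actually prints; the function name is hypothetical too.

```python
def find_d_state(ps_text, name="nfsd"):
    """Return PIDs of processes matching `name` whose state starts with 'D'.

    Assumes a header line containing PID, STAT and COMMAND columns
    (an assumption about the ps output format, not a guarantee).
    """
    lines = ps_text.strip().splitlines()
    header = lines[0].split()
    pid_col = header.index("PID")
    stat_col = header.index("STAT")
    cmd_col = header.index("COMMAND")
    stuck = []
    for line in lines[1:]:
        # Split into at most cmd_col + 1 fields so the command keeps its spaces.
        fields = line.split(None, cmd_col)
        if len(fields) <= cmd_col:
            continue                   # malformed or truncated line
        if fields[stat_col].startswith("D") and name in fields[cmd_col]:
            stuck.append(int(fields[pid_col]))
    return stuck

# Made-up sample listing, for illustration only.
SAMPLE = """
  PID TT STAT  TIME COMMAND
   87 ?  D     0:12 (nfsd)
   88 ?  D     0:09 (nfsd)
   89 ?  I     0:10 (nfsd)
  102 co S     0:01 -csh (csh)
"""

print(find_d_state(SAMPLE))
```

A daemon briefly in 'D' during ordinary disk I/O is normal, so a check like this is only meaningful if the same PIDs stay in 'D' across several samples.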

From the accounts I've seen, I suspect it's somehow tied to high disk
load; the few times I've seen it, it's always happened in the middle of a
lot of bashing on the disk.  Others have reported running into it after
accidentally starting a 'find' on an NFS partition from 30 clients at the
same time, and while running with quotas enabled (which apparently
increases disk activity). 

Someone (I forget who, and I've already deleted the message) said they'd
checked the Sun Online Bugs Database, but didn't find anything relevant.

-Brent
--
Brent Chapman					Capital Market Technology, Inc.
Computer Operations Manager			1995 University Ave., Suite 390
brent at capmkt.com				Berkeley, CA  94704
{cogsci,lll-tis,uunet}!capmkt!brent		Phone:  415/540-6400


