BSD Unix machines hanging

Thomas Narten narten at purdue.arpa
Sun Oct 5 04:18:01 AEST 1986


We have been experiencing a rather odd and intermittent problem with
our Unix machines. It is not confined to a particular machine or Unix;
it has happened with 4.2, 4.2 NFS, and 4.3 BSD on VAX 780, 785 and
uVAX II machines.

Symptoms: The machines appear to lock up, users cannot get characters
echoed, and the console is hung. In short, the machine seems dead. The
only way to recover is a reboot.

However, the machine is still running in a sense. One can ping the
machine in question, and it responds. One can open a TCP connection to
the machine, and the connection succeeds, but hangs at that point.

When this happens, we have repeatedly halted the CPU, examined the PC,
and continued the system, hoping to catch the machine in a tight loop
somewhere. It is not in a tight loop. In fact, when this nailed one of
our idle machines, the system was spending all of its time in the
context switch routine "Swtch". Other attempts have found the PC in
unrelated procedures on each halt.

This has hit most of our machines at one time or another, but usually
only one at a time. Sometimes it's a month between hangs, sometimes
several in a day.

I suspect that we are tweaking some sort of networking bug in which the
setting of the processor priority level gets messed up, leaving the
machine at a higher priority level than it should be, so that user
processes are no longer scheduled.

Evidence supporting this includes an increase in network traffic on our
Ethernets over the last 6 months. Also, the last time one of the
machines hung, the last message on the console was a "qe0: restart"
message, indicating that the DEQNA Ethernet board had become wedged.
However, the problem is not restricted to machines with a DEQNA.

Has anyone else run into a similar problem?

Thomas
----------



More information about the Comp.unix.wizards mailing list