help: random crashing of DS5000 running ULTRIX 4.1

Michael C. B. Ashley mcba at newt.phys.unsw.OZ.AU
Mon May 27 10:22:33 AEST 1991


Hi,

This is a rather long message describing a problem I have with a
machine crashing. If anyone could shed some light on a possible
solution, I would be most grateful.

I have a DS5000/200PX running ULTRIX 4.1 (Rev. 52), and the machine
crashes an average of once a day. The symptoms of the crash are that the
system does not respond to keyboard entry, or to /etc/ping from another
machine. If the screen saver is activated, the console screen remains
off despite mouse movement or keyboard presses. If the screen saver is
not activated then the screen remains on (no error messages visible),
and the mouse will still move the cursor.

No error messages are observed in the output of /etc/uerf. The machine
(and memory and disks) pass every diagnostic in /usr/field, and I have
run the hardware (V5.3) ROM tests for days at a time without picking up
any errors. Last week the system board was replaced; however, the
problem remains.

Once I noticed a message similar to "swap error" appearing in the
Session Manager message area at the instant of a crash. As far as I can
see my swap space is configured correctly (about 300 MBytes of swap for
48 MBytes of memory). I have tried rebuilding the kernel a few times
with minor changes, all with no effect. Running /etc/sec/auditd doesn't
show anything unusual at the time of the crash (although the
buffering of auditd would probably prevent the interesting information
from being written to disk).

The machine will run without crashing if I disconnect the ethernet. The
crashes aren't related to some user's program, since there aren't any
users other than root at the moment.

Needless to say, this is a very frustrating problem. Can anyone make any
suggestions as to what I should do next? I have two ideas:

  (1) Maybe my copy of ULTRIX is corrupt. It came from a TK50, a
      rather unreliable medium in my experience. I have run
      /etc/stl/fverify to try to check the files, and everything
      appears to be OK, although it is difficult to be sure because the
      *410.inv files show lots of checksum errors since they have
      been overwritten by *411.inv files.

  (2) Since the crashes appear to be related to the ethernet, maybe I
      need the "ln*.o kernel fix" that has been mentioned recently
      with respect to using tcpdump and LAT with ULTRIX 4.1. Note
      that our ethernet is teeming with exotic packets from all sorts
      of machines, and regularly crashes a couple of VT1000s we have
      in the building (they die with "illegal opcode 28", despite a
      recent ROM upgrade, but that is another story ...).

Thanks for any suggestions!
Michael Ashley mcba at newt.phys.unsw.oz.au


