Sun4/280S S/N 922E0914 and associated service orders

Daniel R. Ehrlich ehrlich at shire.cs.psu.edu
Thu Aug 24 23:58:52 AEST 1989


A short history of our problem.

July 13, 1989		System installed by Sun FE.  SunOS 4.0.3 installed.

July 14-Present	System crashes frequently (2-3 times per
			day) with one of the following errors:
			"BAD TRAP.  Kernel write data fault."
			"Watchdog reset."

A Sun UNIX Technical Support Engineer has been logging in on occasion to
our system to look at the crash dumps generated by the "BAD TRAP" crashes.
There is no way I know of to force a dump after one gets the "Watchdog
reset" error, so there are no dumps from this error.

It has been pointed out to Sun that in the module machdep.c (at least in
SunOS 4.0) there are #ifdef's in the fault handling code that depend on
the CPU (4_260 vs 4_110) the module is compiled for.  Unfortunately Sun
does not supply this module in source form with the binary distributions,
so it is not possible to determine which type of CPU machdep.o was
compiled for.  The gut feeling around here is that this is a possible
cause of the "BAD TRAP" errors.

The "Watchdog reset" errors seem to occur when both 7053 disk controllers
are busy.  One can usually generate a "Watchdog reset" in single-user mode
by running fsck(8) in parallel on disks attached to the two controllers.
One might conclude that the 7053 controller has a timing problem and is
not being a good VME bus neighbor.  The other, more ominous possibility is
that more than one 7053 has never been fully tested in a machine with a
501-1491 CPU board installed, which has a faster clock than the older
4/260 CPUs.
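
For anyone who wants to try the same reproduction, here is a minimal
sketch of the parallel fsck run, not a tested procedure.  The function
name parallel_fsck, the FSCK override variable, and the device names in
the usage comment are all hypothetical; substitute one raw partition
from a disk on each controller.  The -n flag makes fsck answer "no" to
every repair prompt, so the check only reads the disks.

```sh
#!/bin/sh
# Sketch only: start two read-only fsck passes at once, one per
# controller, then report both exit codes.
# FSCK is a hypothetical override; it defaults to the real fsck.
parallel_fsck() {
    ${FSCK:-fsck} -n "$1" &      # disk on the first controller
    pid_a=$!
    ${FSCK:-fsck} -n "$2" &      # disk on the second controller
    pid_b=$!

    wait "$pid_a"; rc_a=$?
    wait "$pid_b"; rc_b=$?

    echo "fsck $1 exit=$rc_a, fsck $2 exit=$rc_b"
    [ "$rc_a" -eq 0 ] && [ "$rc_b" -eq 0 ]
}

# Usage (device names are assumptions, not taken from our system):
#   parallel_fsck /dev/rxd0a /dev/rxd2a
```

If the machine resets before the final line is printed, that is the
failure described above; a clean run prints both exit codes.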

For reference here is our current configuration.  Please note that as
shipped from Sun the ALM-II and the SCSI adapter were not installed.
Also, please note that both types of crashes were occurring BEFORE these
boards were installed and have continued unabated since they were
installed.

	Slot #		Board Description
	  1		501-1491-05	Sun 4/200 CPU w/FPU2
	  2
	  3		501-1203-04	ALM-II Sixteen Port Async
	  4		501-1550-03	Xylogics 472 Tape Controller
					(Fujitsu 1600/6250 tape drive attached)
	  5		501-1249-04	7053 SMD Disk Controller
					(Two NEC D2363 disks attached)
	  6		501-1254-03	32Mb Memory Array
	  7		501-1249-03	7053 SMD Disk Controller
					(One NEC D2363 disk attached)
	  8		501-1217-03	SUN 3 VME SCSI Controller
			501-1220-01	VME 3x2 Adapter
					(ExaByte 8mm tape drive attached)

Any ideas, thoughts, or comments would be appreciated.

-- Dan Ehrlich
   Computer Science Department
   The Pennsylvania State University
   333 Whitmore Laboratory
   University Park, PA   16802


