785 CPU bug strikes again

Rick Ace rick at nyit.UUCP
Sat Nov 16 06:48:30 AEST 1985


Back in October, Chris Torek posted the following description of a
bug that appeared in a VAX-11/785 CPU:

> On our 785, this bug takes the form of computing the value 0x7dfffffc
> on `extzv $0,$4,-4(r0),r0' instructions when r0 has the value 0x80000000.

Well, last week one of our 780s got a 785 upgrade, and it started crashing
with "panic: Segmentation fault".  Investigation revealed that the crash
occurred at the very same EXTZV instruction that Chris's 785 was stumbling
over, but our CPU mangled a different bit:  the effective address calculated
by the CPU in our case was 0x5ffffffc (the correct value is 0x7ffffffc),
which caused 4.2BSD to die hard and fast.  It was rather odd, though, because
most of the time the kernel sailed right through this instruction without a
hitch.  I wrote a small test program to exercise the bug:

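	main()
	{
		while (1) {
			/* r0 = 0x80000000, so -4(r0) should address 0x7ffffffc */
			asm("	movl	$0x80000000,r0");
			/* extract the 4-bit field at bit offset 0 of -4(r0) into r0 */
			asm("	extzv	$0,$4,-4(r0),r0");
		}
	}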

Sometimes the program would dump core right away; other times it took a few
minutes.  I demonstrated the failure to Marc Merrill, our Field Service
technician, who agreed that *something* was amiss.  He ran diagnostics, but
they found
nothing.  Marc then swapped CPU boards between the failing 785 and another
working 785 until the problem moved.  The failing board was the M7468 data
path module, and replacing it with a spare from the F-S office put us back
on the air.

DEC - if you're listening - maybe you should beef up your FA&T diagnostics
to catch this specific problem.  (It didn't show up under VMS!)

-----
Rick Ace
Computer Graphics Laboratory
New York Institute of Technology
Old Westbury, NY  11568
(516) 686-7644

{decvax,seismo}!philabs!nyit!rick


