mchk 2 --- tbuf error on 750 running 4.2 BSD

Gene Spafford spaf at gatech.CSNET
Thu Aug 1 15:15:01 AEST 1985


In article <2496 at sun.uucp> dcmartin at sun.UUCP (David C. Martin) writes:
>Okay, I will.  I already mailed John, but perhaps this could be rehashed
>one more time.  The problem does lie in the L0003 board, but the solution 
>is easy.  VMS has microcode to alleviate these parity problems, and 
>using the /boot program which reads microcode off the disk, the problem
>can be easily solved.  Mike Karels wrote up a patch and we have been running
>it at UC Berkeley for quite some time with favorable results.  If there is
>sufficient need, I will dig this up for those of you who need it, the microcode
>loading program was previously posted to the NET, so check your archives for 
>that.
>
Nope, that isn't the whole fix.  The microcode fix only cures about 1/4
to 1/3 of the tbuf crashes (from our experience with the 3 750s in our
lab).  I installed the microcode-loading boot just about a week after
the machines came in, and it didn't cure the problem.  The new microcode
fixes a different bug that causes tbuf faults.

Also, before anyone posts something about how the whole thing can be
cured by a patch to the machine check processing code -- I know about
that patch too, and it doesn't fix the problem.

To repeat, the problem is a well-known HARDWARE problem, and if your
field service people don't believe it, tell them to call the Ultrix
support center for confirmation; everybody there should know all about
the problem. Most of the old boards with the bad lot of chips (I have
been told that the only way to identify some of them is to unsolder the
chips and read the lot numbers off the bottom) have been replaced or
installed in VMS systems where the problem will go unnoticed.
Unfortunately, some field service people don't know about the problem,
or blame it on Unix (because they don't understand).  One site I know
of had the field engineer swap out the L0003 board twice, and the
problem didn't go away.  He claimed that it had to be Unix and that,
since Unix is a non-supported product, he was not responsible for
anything else.  The
problem was that the two boards he swapped out were spares that had
been sitting at the local office for months, and they had the faulty
chips.  Don't let this happen to you!

-- 
Gene "4 months and counting" Spafford
The Clouds Project, School of ICS, Georgia Tech, Atlanta GA 30332
CSNet:	Spaf @ GATech		ARPA:	Spaf%GATech.CSNet @ CSNet-Relay.ARPA
uucp:	...!{akgua,allegra,hplabs,ihnp4,linus,seismo,ulysses}!gatech!spaf


