mchk 2 --- tbuf error on 750 running 4.2 BSD

Steve Grandi grandi at noao.UUCP
Wed Jul 31 02:56:05 AEST 1985


> Okay, I will.  I already mailed John, but perhaps this could be rehashed
> one more time.  The problem does lie in the L0003 board, but the solution 
> is easy.  VMS has microcode to alleviate these parity problems, and 
> using the /boot program which reads microcode off the disk, the problem
> can be easily solved.  Mike Karels wrote up a patch and we have been running

Unfortunately, loading the proper microcode is not the complete solution.
Witness the following console output--

Jul  4 02:10
machine check 2: cp tbuf par fault
	va 802246f4 errpc 8001433b mdr 505 smr 8 rdtimo 0 tbgpar 3 cacherr 1
	buserr 8 mcesr c pc 80014336 psl c00000 mcsr 80318
panic: mchk
trap type 2, code = 0, pc = 80000fa2
panic: Reserved operand
trap type 2, code = 0, pc = 80000fa2
panic: Reserved operand
trap type 2, code = 0, pc = 80000fa2
panic: Reserved operand
trap type 2, code = 0, pc = 80000fa2
panic: Reserved operand
trap type 2, code = 0, pc = 80000fa2
panic: Reserved operand

4.2 BSD UNIX #5: Mon Jun 24 17:12:19 MST 1985
real mem  = 5238784
avail mem = 4198400
using 231 buffers containing 524288 bytes of memory
etc.

Maybe the combination of microcode rev. 98 (which we are already using) and 
the rev. 7 L0003 board (which will be installed Someday, Real Soon Now)
will cure the problem and eliminate these irritating crashes.  But I doubt it.

Now the real question: Does anyone know why the system sometimes goes into
the mchk/Reserved operand panic loop shown above instead of trying its normal
recovery?  This happens on about half of our tbuf parity faults.
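
For anyone who wants to check the "about half" figure against their own
console logs, here is a minimal sketch of a tally script.  It assumes the
log has been saved as plain text and that each reboot prints a banner
containing "BSD UNIX #", as in the output above.  The function name
tally_mchk is hypothetical, and the matching is based only on the messages
shown in this post.

```python
import re

def tally_mchk(log_text):
    """Count tbuf machine checks and how many led to a Reserved operand loop.

    Assumption: a reboot banner containing "BSD UNIX #" ends each incident.
    """
    faults = 0            # machine check 2 tbuf parity faults seen
    looped = 0            # faults followed by at least one Reserved operand panic
    in_fault = False      # inside the output that follows a machine check
    counted_loop = False  # this fault was already counted as a loop
    for line in log_text.splitlines():
        line = line.strip()
        if re.match(r"machine check 2: .*tbuf par fault", line):
            faults += 1
            in_fault = True
            counted_loop = False
        elif in_fault and line == "panic: Reserved operand":
            if not counted_loop:
                looped += 1
                counted_loop = True
        elif "BSD UNIX #" in line:
            in_fault = False
    return faults, looped
```

Feeding it the text of a saved console log returns a pair (faults, looped);
the ratio of the two is the fraction of tbuf parity faults that ended in the
panic loop.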
-- 
Steve Grandi, National Optical Astronomy Observatories, Tucson, AZ, 602-325-9228
{arizona,decvax,hao,ihnp4,seismo}!noao!grandi  noao!grandi at lbl-csam.ARPA


