/750 Machine check 2 (again)

dave at edcaad.UUCP
Fri Aug 26 22:48:00 AEST 1983


Thanks to extensive discussions with our helpful local DEC people,  and
several  flaky  /750s, I can add some details to the work of Peter Col-
linson (ukc!pc), Tucker Withington  (vaxine!ptw),  and  Dennis  Ritchie
(research!dmr) on the /750's machine check 2 handler in 4.?BSD:

1.   If the TB PARITY ERROR bit in the stored Error Summary Register is
     set (mcf->mc5_mcesr&4), and irrespective of the state of the other
     bits in this register, recovery may be attempted.   We  have  seen
     these errors with bits 0 and 3 set.

2.   It appears that the TB must be invalidated, by mtpr(TBIA,  0),  as
     soon  as possible, and in any case before the Error Summary Regis-
     ter is cleared by mtpr(MCESR, 0xf).

3.   It is NOT always  possible  to  recover  from  these  errors.   An
     instruction may be resumed if:

     a)   It has not affected the processor mode.  This can  be  deter-
          mined  by  comparing  the processor mode in the machine check
          frame with the mode in the interrupt frame.  A panic must  be
          issued if they differ.

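     b)   Its op-code is a single byte and has a one bit in the
          following table.  Each row covers one high-order op-code
          nibble, starting at 0x0; the low-order nibble selects the
          bit, with bit 0 rightmost: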

                  0000111101101011        REI,RET,etc.
                  1111111110111111        JSB
                  1111111111111111
                  1111111111111111
                  1111111111111111
                  0000000000101111        EMODF,CVTFD,etc.
                  0000111100000000        Double Prec. FP
                  1100000101001010        EMUL,EDIV,etc.
                  1111111111111111
                  1111111111111111
                  1111111111111111
                  0000001111111111        PUSHR,POPR,etc.
                  1111111111111111
                  1111111111111111
                  1111111111111111
                  0000000111111111        CALLG,CALLS,etc.
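
The table can be turned into a lookup array mechanically.  The sketch
below is a stand-alone user-level illustration, not the kernel code.  It
assumes the leftmost character of a row is bit 15, and it takes three
all-ones rows for nibbles 0x2 to 0x4 so that the EMODF row lands on 0x5
(EMODF is op-code 0x54).  The names row_bits, row_to_word and
resumable_opcode are made up for the example.

```c
#include <stddef.h>

/* Rows of the table, one per high-order op-code nibble 0x0..0xF.
 * Assumption: the leftmost character is bit 15, the rightmost bit 0. */
const char *row_bits[16] = {
	"0000111101101011",	/* 0x0_: REI, RET, etc. */
	"1111111110111111",	/* 0x1_: JSB */
	"1111111111111111",
	"1111111111111111",
	"1111111111111111",
	"0000000000101111",	/* 0x5_: EMODF, CVTFD, etc. */
	"0000111100000000",	/* 0x6_: double precision FP */
	"1100000101001010",	/* 0x7_: EMUL, EDIV, etc. */
	"1111111111111111",
	"1111111111111111",
	"1111111111111111",
	"0000001111111111",	/* 0xB_: PUSHR, POPR, etc. */
	"1111111111111111",
	"1111111111111111",
	"1111111111111111",
	"0000000111111111"	/* 0xF_: CALLG, CALLS, etc. */
};

/* Convert one 16-character row of '0'/'1' into a 16-bit word. */
unsigned
row_to_word(const char *row)
{
	unsigned w = 0;
	size_t i;

	for (i = 0; i < 16 && row[i] != '\0'; i++)
		w = (w << 1) | (unsigned)(row[i] == '1');
	return w;
}

/* 1 if the single-byte op-code has a one bit in the table, else 0. */
int
resumable_opcode(unsigned opcode)
{
	unsigned word = row_to_word(row_bits[(opcode >> 4) & 0xf]);

	return (int)((word >> (opcode & 0xf)) & 1);
}
```

With these rows, JSB (0x16), REI (0x02) and CALLS (0xFB) come out as not
resumable, while MOVL (0xD0) does.  The 0x7 row as printed converts to
0xc14a, whereas the InstrBitMap array in the fix lists 0xc18a for that
entry; the sketch does not settle which is intended.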


Further, VMS disables the cache if cache errors occur less than 100ms
apart.  Likewise, if it detects Translation Buffer failures less than
100ms apart, it disables half the Translation Buffer and runs on the
other half.
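
The 100ms rule amounts to remembering when the previous error happened.
The following is only a sketch of that bookkeeping, not the VMS code and
not the 4.1c code mentioned below: err_tracker, err_record and the
millisecond timestamp argument are all hypothetical, and actually
disabling the cache or half the TB is left to the caller.

```c
#include <stdint.h>

#define ERR_WINDOW_MS 100	/* threshold quoted above */

struct err_tracker {
	int	 seen;		/* nonzero once a first error is recorded */
	uint32_t last_ms;	/* time of the previous error, in ms */
	int	 degraded;	/* set once two errors fall inside the window */
};

/*
 * Record an error at time now_ms.  Returns 1 if the unit should be
 * disabled (or halved), i.e. some error so far arrived less than
 * ERR_WINDOW_MS after the one before it.  The result is sticky.
 */
int
err_record(struct err_tracker *t, uint32_t now_ms)
{
	if (t->seen && (uint32_t)(now_ms - t->last_ms) < ERR_WINDOW_MS)
		t->degraded = 1;
	t->seen = 1;
	t->last_ms = now_ms;
	return t->degraded;
}
```

One tracker would be kept for the cache and another for the TB, each
zeroed at boot.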

Code to implement all these features for 4.1c  BSD  has  been  written;
when it has been tested it will be posted.  Unfortunately, testing is a
matter of sitting and waiting for the hardware.  In the meantime,  here
are the fixes to /usr/sys/vax/machdep.c to improve machine check handl-
ing.

1.	The error messages for the different machine check types for the
	/750 should read as follows:

char *mc750[] = {
	0,		"ctrl str par",	"cp tbuf",	0,
	0,		0,		"ucode lost",	"bad ird"
};

2.	The /750's case in the first switch in machinecheck() should look
	like:

#if VAX750
	case VAX_750:
		printf("%s fault\n", mc750[type&0x7]);
		break;
#endif
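
Four of the eight entries in mc750[] are null pointers, and handing a
null pointer to %s is not safe in general.  A minimal sketch of a
guarded lookup, assuming a fallback string is acceptable; mc750_names
and mc750_name are hypothetical names, and whether the hardware ever
reports those type codes is not established here.

```c
#include <stddef.h>

/* Same layout as mc750[] above; 0 marks a code with no message. */
static const char *mc750_names[8] = {
	0,		"ctrl str par",	"cp tbuf",	0,
	0,		0,		"ucode lost",	"bad ird"
};

/* Return a printable name for a /750 machine check type. */
const char *
mc750_name(unsigned type)
{
	const char *s = mc750_names[type & 0x7];

	return s != NULL ? s : "unknown";
}
```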

3.	The /750's case in the second switch in machinecheck() should be:

#if VAX750
	case VAX_750: {
		register struct mc750frame *mcf = (struct mc750frame *)cmcf;
		mtpr(TBIA, 0);	/* Assume bad - ala VMS */
		printf("\tva %x errpc %x mdr %x smr %x rdtimo %x tbgpar %x cacherr %x\n",
		    mcf->mc5_va, mcf->mc5_errpc, mcf->mc5_mdr, mcf->mc5_svmode,
		    mcf->mc5_rdtimo, mcf->mc5_tbgpar, mcf->mc5_cacherr);
		printf("\tbuserr %x mcesr %x pc %x psl %x mcsr %x\n",
		    mcf->mc5_buserr, mcf->mc5_mcesr, mcf->mc5_pc, mcf->mc5_psl,
		    mfpr(MCSR));
		mtpr(MCESR, 0xf);
		if ((type&0xf)==MC750_TBPAR
		 && (mcf->mc5_mcesr&0x4)
		 && ResumeableInstr(mcf)) {
			printf("tbuf par!?!: flushing and returning\n");
			return;
		}
		break;
		}
#endif

4.	The following routine should be added to machdep.c:

#if VAX750
static u_short InstrBitMap[] = {
	0x0f6b,	0xffbf,	0xffff,	0xffff,
	0xffff,	0x002f,	0x0f00,	0xc18a,
	0xffff,	0xffff,	0xffff,	0x03ff,
	0xffff,	0xffff,	0xffff,	0x01ff
};

static int
ResumeableInstr(mcf)
	register struct mc750frame *mcf;
{
	register u_int OpCode;
	register u_int ret;

	/*
	 *  If instruction changed mode cannot resume
	 *  (this part untested)
	 */
	if ((mcf->mc5_svmode&03) != ((mcf->mc5_psl&PSL_CURMOD)>>24)) {
		printf("CP mode changed\n");
		return (0);
	}
	/*
	 *  VMS has the process mapped in to the system's
	 *  address space.  Don't think UNIX does.
	 *  (this part tested)
	 */
	/* fetch as unsigned so op-codes >= 0x80 are not sign-extended */
	OpCode = ( mcf->mc5_errpc&0x80000000 ?
		*((u_char *) mcf->mc5_errpc) : fubyte(mcf->mc5_errpc) );
	ret = ((InstrBitMap[(OpCode&0xf0)>>4])>>(OpCode&0xf))&1;
	printf("Instruction %x %sresumable\n", OpCode, (ret ? "" : "not "));
	return (ret);
}
#endif /* VAX750 */
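
The mode comparison in ResumeableInstr() depends on C operator
precedence: != binds more tightly than &, so each mask needs its own
parentheses.  A stand-alone sketch of the difference follows.  It
assumes PSL_CURMOD is 0x03000000 (PSL bits 25:24); the function names
are invented for the illustration.

```c
/* Assumed value of PSL_CURMOD (current-mode field, PSL bits 25:24). */
#define PSL_CURMOD_ASSUMED	0x03000000

/*
 * How "svmode&03 != (psl&PSL_CURMOD)>>24" parses without extra
 * parentheses: the != is evaluated first, then ANDed with svmode.
 */
int
mode_changed_unparenthesized(unsigned svmode, unsigned psl)
{
	return svmode & (03 != ((psl & PSL_CURMOD_ASSUMED) >> 24));
}

/* Each mask applied first, then the two modes compared. */
int
mode_changed(unsigned svmode, unsigned psl)
{
	return (svmode & 03) != ((psl & PSL_CURMOD_ASSUMED) >> 24);
}
```

With a saved mode of 1 and a PSL mode of 1 the unparenthesized form
reports a change where there is none; with a saved mode of 0 and a PSL
mode of 3 it misses a real one.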

              David Rosenthal {vax135|mcvax}!edcaad!dave


