VAX 11/785 memory ECC errors

Dan Forsyth dan at msdc.UUCP
Thu Feb 13 07:36:46 AEST 1986


We're having a problem with one of our 785s that manifests itself in
the following way:

    mcr0: soft ecc addr 515e syn 16
    mcr0: soft ecc addr 515e syn 16
    mcr0: soft ecc addr 515e syn 16
    mcr0: soft ecc addr 515e syn 16
    mcr0: soft ecc addr 515e syn 16
    mcr0: soft ecc addr 515e syn 16
    mcr0: soft ecc addr 515e syn 16
    mcr0: soft ecc addr 515e syn 16

    ?INT STK INVAL ...

    <normal, error free reboot>

We have nothing but DEC memory on the thing, so I'm assuming that the
BRL release of 4.2BSD that we're running should be able to handle these
errors correctly.  I've looked at it and it seems to do reasonable
things (but who am I to judge).

This scenario is now occurring at least three times a week; no other
memory errors show up at all.  Within a given incident every message
reports the same "addr"; the value differs from one incident to the
next, but "syn" is always 16.  And each time the system crashes.
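
For anyone wanting to check a saved console log for the same pattern, here
is a minimal sketch that tallies the soft ECC messages by controller,
address and syndrome.  It assumes only the line format quoted above; the
function names are made up for illustration, and the "addr" and "syn"
fields are kept as opaque strings since nothing here says what radix the
kernel prints them in.

```python
import re
from collections import Counter

# Matches console lines of the form quoted in the post, e.g.
#   mcr0: soft ecc addr 515e syn 16
# Leading whitespace is tolerated; the addr and syn fields are captured
# as plain strings with no assumption about their radix.
ECC_LINE = re.compile(r"\s*(mcr\d+): soft ecc addr (\S+) syn (\S+)\s*$")


def tally_soft_ecc(lines):
    """Count soft ECC messages per (controller, addr, syn) triple.

    Lines that do not match the expected format (blank lines, the
    "?INT STK INVAL" halt message, and so on) are silently skipped.
    """
    counts = Counter()
    for line in lines:
        m = ECC_LINE.match(line)
        if m:
            counts[m.groups()] += 1
    return counts


def syndromes_seen(counts):
    """Return the set of distinct syndrome strings in a tally."""
    return {syn for (_controller, _addr, syn) in counts}
```

Feeding it the eight lines quoted above should give a single entry,
("mcr0", "515e", "16"), with a count of 8.  Run over several incidents,
a tally with many addresses but one syndrome would match what we are
seeing here.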

DEC has interpreted these addresses to refer to array 0.  We're now on
the third board.  This weekend DEC replaced array 0 and the lower memory
controller.  The system stayed up about 9 hours.

Does anyone have any experience with such behavior?  Is it definitely
hardware, or is the kernel doing something it shouldn't?  Do we go for
a new memory backplane next?

Thanks,

Dan Forsyth ({agkua,gatech,mcnc}!msdc!dan)
Medical Systems Development Corporation, Atlanta, GA


