cp rds fault

utzoo!decvax!watmath!dmmartindale
Wed Mar 24 13:36:29 AEST 1982


	We had the same sort of symptom when our 780 was first installed.  It
would run fine for a few weeks, then start crashing every few hours with cp rds
faults, always rebooting itself and cleaning up.  I'd run the microdiagnostics
and there would be no error, and the problem would promptly vanish.  I
eventually discovered that it was a memory board (Trendata in this case)
which wasn't being initialized properly during the powerup sequence, and
after a powerup every other quadword on that card had bad ECC bits.  Since
UNIX clears memory with a movc5 (which does longword writes) and the memory
controller writes the bad data back into memory when it gets a multiple-bit
ECC error on a longword write, the memory contents are still garbage after
UNIX has tried clearing them.  Then, as soon as the process which was allocated
this memory referenced it, the RDS fault would occur.  (RDS, for Read Data
Substitute, means an uncorrectable ECC error.)  We would
reboot and run for several more hours before UNIX even tried to use the bad
memory again, since the bad card was near the high end of the 4 Mbytes.  When I
got around to running microdiagnostics, they would fix the problem; I suspect
that the microdiagnostics start out by clearing memory using clrq's, which
WILL overwrite the bad data in memory and fix the problem until the next
power flicker.
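
	For anyone who wants to see the mechanism spelled out, here is a toy
model in Python.  It is only a sketch under assumptions: ECC is kept per
quadword, a longword write is a read-modify-write of the containing quadword,
and a quadword write replaces data and check bits without reading first.  The
class and function names are made up for illustration and say nothing about
how the real controller is built.

```python
# Toy model of ECC-protected memory organised in 8-byte quadwords.
# All names are illustrative; this is not the real controller logic.

QUAD = 8   # bytes covered by one set of ECC check bits (assumed)
LONG = 4   # bytes in a longword


class ToyEccMemory:
    def __init__(self, nquads, bad_quads=()):
        self.data = [bytes(QUAD) for _ in range(nquads)]
        self.ecc_ok = [True] * nquads
        # Simulate a bad powerup: these quadwords come up with garbage
        # contents and check bits that do not match.
        for q in bad_quads:
            self.data[q] = b"\xde\xad\xbe\xef" * 2
            self.ecc_ok[q] = False

    def write_quad(self, q, value):
        # Full-width write: nothing has to be read first, so the data
        # and freshly generated check bits simply replace the old ones.
        self.data[q] = bytes(value)
        self.ecc_ok[q] = True

    def write_long(self, q, half, value):
        # Partial write: the quadword must be read, merged, and rewritten.
        if not self.ecc_ok[q]:
            # The read half hits an uncorrectable error, so the old bad
            # contents are written back and the location stays bad.
            return
        start = half * LONG
        old = self.data[q]
        self.data[q] = old[:start] + bytes(value) + old[start + LONG:]

    def read_quad(self, q):
        if not self.ecc_ok[q]:
            raise RuntimeError("RDS fault at quadword %d" % q)
        return self.data[q]


def clear_with_longwords(mem):
    # Stands in for a clear done with longword writes.
    for q in range(len(mem.data)):
        for half in (0, 1):
            mem.write_long(q, half, bytes(LONG))


def clear_with_quadwords(mem):
    # Stands in for a clear done with quadword writes.
    for q in range(len(mem.data)):
        mem.write_quad(q, bytes(QUAD))


def faulting_quads(mem):
    # Return the quadword numbers that would fault when read.
    bad = []
    for q in range(len(mem.data)):
        try:
            mem.read_quad(q)
        except RuntimeError:
            bad.append(q)
    return bad
```

In this model, a memory built with every other quadword bad still faults on
those quadwords after clear_with_longwords(), and stops faulting only after
clear_with_quadwords().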
	So, if you get repeated RDS faults but aren't getting any soft ECC
errors on the memory and the memory diagnostics run fine, take a look at the
physical address that generated the fault.  If it's always on the same
memory card, suspect the problem above.  You should be able to verify it
by exchanging this card with another one and seeing if the problem migrates.
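
	The "always on the same memory card" check can be sketched as below.
The card size (CARD_BYTES) and the assumption that cards occupy contiguous,
non-interleaved address ranges are placeholders of mine; substitute whatever
your own memory configuration actually uses.

```python
from collections import Counter

# ASSUMPTION: every card holds the same amount of memory and cards are
# laid out back to back with no interleaving. 256 Kbytes is a placeholder.
CARD_BYTES = 256 * 1024


def card_of(phys_addr, card_bytes=CARD_BYTES):
    # Card number (counting from 0) that holds this physical address.
    return phys_addr // card_bytes


def suspect_card(fault_addrs, card_bytes=CARD_BYTES):
    # Return the card number if every logged fault landed on one card,
    # otherwise None (faults spread over cards, or nothing logged).
    cards = Counter(card_of(a, card_bytes) for a in fault_addrs)
    if len(cards) == 1:
        return next(iter(cards))
    return None
```

Feed it the physical addresses from the fault reports; a single card number
coming back is the cue to try swapping that card with another.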
	Has anyone else seen this problem?

			Dave Martindale, watmath!dmmartindale


