UDA50 and bad blocks (and a bug in dump - 4.2BSD)

Chris Torek chris at umcp-cs.UUCP
Tue Aug 6 14:33:32 AEST 1985


[Please note that all my responses are based only on what I *think* is
true; I have almost no hard data on UDA50s.]

>As I understand it the UDA-50 controller transparently does bad-block
>forwarding PROVIDED the blocks were flagged the last time the disc
>surfaces were formatted. The problem occurs when you start to get
>previously good sectors reporting hard errors - even though you
>cannot read the contents of the sector it might not really be a
>bad block.

Let us agree on some definitions first:

	bad sector: a sector from which data written cannot reliably
		be reread.
	soft error: an error that is correctable, in this case by
		using ecc information.
	hard error: an error that is not correctable (the original
		data cannot be reconstructed).

(Note that all these errors can only be detected by attempting to
read a sector.)

Anyway, one may get a hard error that is not due to a bad sector,
if the data has been lost due to (e.g.) write current failure rather
than media problems.

Now, as to UDA50 bad sector forwarding:  DEC SDI (Standard Disk
Interface?) format specifies that there are some number of RCT
(Replacement and Cacheing Table?) areas on the disk (in the case
of an RA81 there are four).  For each sector, the controller will
look in the RCT tables to see if the sector has been forwarded.
It will never add a sector to these tables itself.

>We recently had a problem on an RA-80 with gradually increasing
>intensity of soft and then hard errors.

How unusual :-).  (I wish I knew the magic words that would transform
RA81s into Fuji Eagles.)

>The DEC service engineers just said well, if you ran VMS...

Sigh.  However, partial good news: DEC is rumored to have a standalone
program called "rabads", which can be used to add sectors to the
RCT tables.  If your field service rep hasn't heard of it, try to
contact someone in Ultrix support.  (I have not actually seen this
program myself, however.)

>[...] Then of course a hard error occurred in the inode area of
>the user filesystem [...].

>Then we found out that DEC diagnostics cannot just read a disc to
>find errors, it must first write known data...well the last backup
>was fairly recent.  The diagnostics laboured happily over the night
>and reported about 16 sectors [...]. Reformatting the disc and adding
>the bad sector info reported 20 sectors revectored, and then
>retesting the disc gave a similar number of fresh bad blocks. The
>problem turned out to be the read/write amplifier board - there
>was nothing wrong with the head/disc assembly.

I recall some ECOs on the r/w board: problems with the write
current levels, I believe.  In any case we still get an inordinate
number of "lost rd/wr ready drive error" errors (code 11, subcode
4, in MSCP lingo).  I wonder.

>Lessons: (well they were new to me)

>1. BUG IN DUMP: it reads inodes in 8k chunks - fine... but if one
>sector out of the 16 is unreadable you've lost the lot. By that
>stage it is probably impossible to recompile dump with a smaller
>block size.

When the driver does bad block forwarding itself this is less of
a problem, since it occasionally recovers the data.  However, your
point is well taken: dump should retry a failed 8k read as sixteen
512-byte reads, so that one bad sector does not cost the whole chunk.
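
For illustration, here is roughly the kind of fallback I mean.  This
is only a sketch, not dump's actual code; the names (read_chunk,
disk_fd) and the zero-fill policy are assumptions of mine:

	#include <unistd.h>
	#include <string.h>
	#include <sys/types.h>

	#define CHUNK		8192	/* dump reads inodes in 8k chunks */
	#define DEV_BSIZE	512	/* one disk sector */

	/*
	 * Read CHUNK bytes at byte offset off into buf.  If the big
	 * read fails, retry the chunk one sector at a time so that a
	 * single bad sector costs only 512 bytes, not the whole 8k.
	 * Unreadable sectors are zero filled; the number of sectors
	 * lost is returned.
	 */
	int
	read_chunk(int disk_fd, char *buf, off_t off)
	{
		int i, lost = 0;

		if (lseek(disk_fd, off, 0) != -1 &&
		    read(disk_fd, buf, CHUNK) == CHUNK)
			return (0);		/* the usual case */

		for (i = 0; i < CHUNK / DEV_BSIZE; i++) {
			char *p = buf + i * DEV_BSIZE;

			if (lseek(disk_fd, off + (off_t)i * DEV_BSIZE, 0) == -1 ||
			    read(disk_fd, p, DEV_BSIZE) != DEV_BSIZE) {
				memset(p, 0, DEV_BSIZE);
				lost++;		/* only this sector is lost */
			}
		}
		return (lost);
	}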

>2. If the software added bad blocks to the hardware revectoring
>table on its own account then there would have been a race in our
>case to see whether we first filled up the bad-block table with
>not-really-bad blocks or clobbered one of the inode blocks. No
>operating system can survive having its directory structures
>corrupted (even, I am told, VMS) and if that happens there is
>nothing to do but a dump/reformat/restore. Until that occurs, and
>if the errors are in file data areas only, it is a fairly simple
>matter to allocate sectors with hard errors to dummy files that
>can be ignored.

The bad block forwarding should only be done after the "bad" sector
has been tested, since the driver sometimes reports bad sectors
when they have only transient hard errors.  Dave Gehrt's driver
does this.  Also, when the block is forwarded the replacement sector
must be initialized with a "forced error" if the original data is
suspect.  This error will vanish when the block is rewritten later.

Unix *can* recover from losing directories; losing inodes is worse
(the files are essentially gone) but not all is lost: if the
remainder of the disk is readable, fsck will usually handle it,
even if you have to copy just the readable portions to a new drive
first.

Replacement is done like this: when you get a bad block report,
1. read the original data, and remember whether it succeeds,
2. copy that data to RCT sector 1 ("spare" sector) (all RCTs),
3. write test pattern, if fail, replace,
4. read test pattern, if fail or doesn't match, replace,
5. ignore the error, copy the spare sector back and return
   (with forced error iff step 1 failed),
6. replace: allocate a replacement sector,
7. write the replacement sector entry in all the RCTs (i.e.,
   mark the RCT entry in use),
8. issue M_OP_REPLACE command to replace the original sector,
9. copy the spare sector back to the original logical
   block (which has now been remapped).

Of course there are all sorts of things that can go wrong, so it's
not quite that simple.
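
A rough sketch of that sequence in C, with most of the error
handling stripped out.  Every helper routine here (read_lbn,
copy_to_rct_spare, alloc_replacement and friends) is a stand-in of
my own invention, not the real driver or MSCP interface:

	#include <string.h>

	#define SECSIZE	512

	/* hypothetical helpers: each returns 0 on success, -1 on failure */
	extern int  read_lbn(long lbn, char *buf);
	extern int  write_lbn(long lbn, char *buf, int forced_error);
	extern int  copy_to_rct_spare(char *buf);	/* step 2, all RCT copies */
	extern int  copy_from_rct_spare(char *buf);
	extern long alloc_replacement(void);		/* step 6 */
	extern int  mark_rct_entry(long lbn, long rbn);	/* step 7 */
	extern int  mscp_replace(long lbn, long rbn);	/* step 8, M_OP_REPLACE */

	int
	forward_bad_sector(long lbn)
	{
		char buf[SECSIZE], pat[SECSIZE], chk[SECSIZE];
		int i, have_data;
		long rbn;

		/* 1. try to save the original data */
		have_data = (read_lbn(lbn, buf) == 0);

		/* 2. stash it in the RCT spare sector */
		if (copy_to_rct_spare(buf) < 0)
			return (-1);

		/* 3 and 4. write a test pattern and read it back */
		for (i = 0; i < SECSIZE; i++)
			pat[i] = i;
		if (write_lbn(lbn, pat, 0) == 0 && read_lbn(lbn, chk) == 0 &&
		    memcmp(pat, chk, SECSIZE) == 0) {
			/* 5. the sector tests good: restore the data and
			 *    return, with a forced error iff step 1 failed */
			copy_from_rct_spare(buf);
			return (write_lbn(lbn, buf, !have_data));
		}

		/* 6 and 7. really bad: allocate a replacement block
		 *  number and mark its RCT entry in use (in all RCTs) */
		if ((rbn = alloc_replacement()) < 0 ||
		    mark_rct_entry(lbn, rbn) < 0)
			return (-1);

		/* 8. tell the controller to revector lbn to rbn */
		if (mscp_replace(lbn, rbn) < 0)
			return (-1);

		/* 9. copy the saved data back through the original lbn,
		 *    which is now remapped to the replacement sector */
		copy_from_rct_spare(buf);
		return (write_lbn(lbn, buf, !have_data));
	}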

You are indeed in trouble when the RCT fills up.  At that point
your only option is to replace the HDA.  Fortunately an RA81 RCT
holds over 600 sectors.  (RA81s tend to come with 50-200 bad sectors
already mapped!)

>The difficulty inherent in any automatic bad-block table rewriting
>lies in judging when the unreliability of a given sector becomes
>intolerable; certainly a single instance of failure which is cured
>by rewriting it should not be sufficient.

(This was covered above.)

>This leads to variable criteria depending on the location within
>the disc partition. I would suggest that the simplest solution to
>implement and to use would be a user program allowing manual entry
>of a block into the re-vector table (all volunteers one step forward
>please).

I intend (in my copious spare time :-) ) to someday allow an ioctl
in the UDA50 driver that forwards a given sector.  This can then
be done by hand or by some program that tallies /usr/adm/messages
or whatever.  In the meantime rabads (if it exists) is a fairly
wieldy solution (wieldy being the opposite of unwieldy).
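
Something along these lines, perhaps.  The ioctl name, the structure,
and the 'u' group letter below are made up purely for illustration;
nothing like them exists in the current driver:

	#include <sys/ioctl.h>

	struct uda_forward {
		long	uf_lbn;		/* logical block to forward */
	};
	#define	UDAIOCFORWARD	_IOW('u', 1, struct uda_forward)

A small user program (run by hand, or by whatever watches
/usr/adm/messages for hard error reports) would then just open the
raw device and issue that ioctl on the suspect LBN.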

>3. reliable file systems? We may have umpteen cloned superblocks
>in 4.2BSD but for a reliable system we would also need duplicated
>inodes. Try mounting a filesystem with unreadable inodes and see
>what happens.

Depends on how you define "reliable".  4.2 can recover from many
kinds of disk trashings, but you're going to lose *something*.
(And you're not supposed to *mount* filesystems with bugs anyway.)

>4. How do you tell which block is giving the hard error [....]

The driver should report the LBN (logical block number) from the
hard error datagram.  (There may be no datagram; in such cases one
must guess.)

>Cameron Davidson
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris at umcp-cs		ARPA:	chris at maryland


