How does Microport System V/AT handle bad blocks?

Steve Nuchia steve at nuchat.UUCP
Sun Dec 25 05:57:56 AEST 1988


In article <464 at tarpit.UUCP> rd at tarpit.UUCP (Bob Thrush) writes:
[concerning microbug phantom disk errors on second drive]

>The 2nd disk (that I'm having trouble with) is mostly used as the news
>spool directory, so it is definitely getting a whole lot different 
>activity than it did before the onset of the problems.  Each time the

From my extensive experience with this problem, if it gets you it
gets you in proportion to the frequency of write access.  News
spool is about the worst thing to put out there but I kept mine
there because I didn't want the errors eating anything I wanted
to keep.  Now I'm using Interactive on Bell Tech.  Still have
some problems but nothing like Microport.  I spent a year and
a half of my life working with those clowns.  Boy am I a sucker.

>problem shows up, I find that each subsequent fsck finds more problems,
>usually associated with duplicates in the free list.  I wind up
>mkfs'ing the news file system to correct(?) the problem.  I am usually

The problem here is a BUG in FSCK.  There is a workaround.  I know
of at least two people in Microport who have been assigned to fix
it; I don't know if either of them made any more progress than I did.

The bug is that, for large filesystems, fsck's free block bitmap
gets corrupted.  The bitmap is built in phase 1, corrupted in phase 2
by an as-yet undiscovered mechanism, and used to rebuild a bad freelist
in phase 5/6.  Note that it will report a bad freelist on a perfectly
good filesystem, then proceed to trash it, if you let it.  When it
rebuilds a random freelist it uses some blocks assigned to files
as freelist chain blocks, corrupting the files.  When some of those
blocks fall in directories you really get filesystem hash.
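
To make the failure mode above concrete, here is a toy model in Python.
It is only a sketch: the block numbers, file names, and helper functions
are all invented for illustration, and the "damage" step simply clears
two bits by hand, since the real corruption mechanism is not known.

```python
# Toy model only; these are not fsck's real data structures.
NBLOCKS = 16

# Hypothetical files and the data blocks each one owns.
files = {"a": [2, 3, 4], "dir": [7], "b": [10, 11]}

def build_bitmap(files, nblocks):
    """Phase 1 analogue: mark every block claimed by a file as in use."""
    in_use = [False] * nblocks
    for blocks in files.values():
        for b in blocks:
            in_use[b] = True
    return in_use

def rebuild_freelist(in_use):
    """Phase 6 analogue: every block not marked in use goes on the freelist."""
    return [b for b, used in enumerate(in_use) if not used]

def clobbered(files, freelist):
    """Blocks that belong to a file yet also appear on the freelist."""
    owned = {b for blocks in files.values() for b in blocks}
    return sorted(owned & set(freelist))

# With an intact bitmap the rebuilt freelist never overlaps file data.
good = build_bitmap(files, NBLOCKS)
clean = clobbered(files, rebuild_freelist(good))

# Simulated damage between bitmap construction and freelist rebuild:
# two "in use" bits are lost (assumed here, purely for the demonstration).
bad = list(good)
bad[3] = False
bad[7] = False
damaged = clobbered(files, rebuild_freelist(bad))

print(clean)    # prints []
print(damaged)  # prints [3, 7]
```

Block 3 is file data and block 7 is a directory block; once either is
handed out as a freelist block, the file or directory contents are gone.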

The workaround is to run fsck on your filesystem but NOT ALLOW it
to REBUILD THE FREELIST.  Then run fsck -f on it.  The -f option
says to just run phase 1 and 5/6, and it can be allowed to rebuild
the freelist since it didn't scribble on its bitmap in phase 2.

My analysis of the code says that this is a compiler bug, but
there is the possibility that it is a subtle architecture
dependency in fsck itself.  In any case the mechanism appears
to involve aliasing of one or more blocks in fsck's "virtual memory"
code -- it manages a file-backed buffer pool using some of the
most twisted code I've ever laid eyes on.  The problem is not
sensitive to optimization when compiling fsck.  It is extremely
sensitive to the size and contents of your filesystem.  In my
experience filesystems that are small enough to not require a
temporary file are safe.
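
For readers unfamiliar with the term, the sketch below shows what
"aliasing" in a block cache means.  It is a hypothetical illustration in
Python, not a diagnosis: the BufferPool class and the 16-bit key
truncation are assumptions made up for the example, and nothing above
establishes that this is what actually goes wrong in fsck.

```python
class BufferPool:
    """Toy block cache; a dict stands in for the temporary backing file."""

    def __init__(self, key_bits=None):
        self.key_bits = key_bits  # None means the full block number is used
        self.backing = {}

    def _key(self, blkno):
        # Hypothetical defect: the block number is truncated to key_bits
        # bits, so two distinct blocks can map to the same cache entry.
        if self.key_bits is None:
            return blkno
        return blkno & ((1 << self.key_bits) - 1)

    def write(self, blkno, data):
        self.backing[self._key(blkno)] = data

    def read(self, blkno):
        return self.backing.get(self._key(blkno))

def aliased(pool, blk_a, blk_b):
    """Return True if writing blk_b destroys what was stored for blk_a."""
    pool.write(blk_a, "state for a")
    pool.write(blk_b, "state for b")
    return pool.read(blk_a) != "state for a"

# Block numbers that fit in 16 bits never collide with each other.
small = aliased(BufferPool(key_bits=16), 5, 600)

# Block numbers 65536 apart share a 16-bit key and overwrite each other.
large = aliased(BufferPool(key_bits=16), 5, 5 + 65536)

# With the full block number as the key there is no collision.
fixed = aliased(BufferPool(), 5, 5 + 65536)
```

Under these assumptions `small` and `fixed` come out False and `large`
comes out True, which is at least the same shape as the observed
behaviour: small filesystems are fine and large ones get hurt.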

>BTW, I got a complete rundown of the meaning of the hard disk i/o
>errors from Randy Jarrett who copied a posting <358 at uport.UUCP>
>by Marc de Groot (then of Microport).  When I return from the
>holidays, I'll repost that if there is interest.  Thanks, Randy
>(and Marc).

Please do.
-- 
Steve Nuchia	      South Coast Computing Services
uunet!nuchat!steve    POB 890952  Houston, Texas  77289
(713) 964 2462	      Consultation & Systems, Support for PD Software.


