Reliability of (Sys V) file systems on power failure

Karl Denninger karl at naitc.naitc.com
Thu Sep 27 01:31:08 AEST 1990


In article <1990Sep24.231148.18053 at ico.isc.com> rcd at ico.isc.com (Dick Dunn) writes:
>> Every UNIX I have seen behaves in the manner you describe.  If you
>> hit the red switch or experience a power outage without performing a
>> graceful shutdown, you deserve whatever you get...
>
>Years ago, that was generally true...and it was one of the major objections
>to using UNIX in "commercial" systems.  As a result, essentially all
>variants of UNIX have had file system changes to "harden" them against
>problems caused by power failure.  Damage from a power outage should be
>limited to files being written at the time the power went away, and should
>be localized (e.g., a frozzed/missing block of data, not an entire file
>gone or destroyed).  Going back to the original question:  If you're
>seeing major file system damage due to power failures, there's something
>wrong that should be fixed.  I'm not just spouting applehood/motherpie; I
>haven't seen a file system damaged by power failure in years.  I've even
>tried to damage file systems by getting things as busy as I could, then
>turning off machines.  (Of course, the T-storm just now gathering over the
>hills will probably destroy all my files and prove me to be drastically
>wrong.:-)

OK, I've seen filesystem damage of this type on your operating system
(2.0.2), and another employee here has seen the same thing on his copy of
ISC 2.2.

To put it bluntly, there's something wrong that should be fixed.

>The software in hardened file systems is pretty good at ensuring that
>things get written when they should, as they should, so that fsck can pick
>up the pieces.

OK, so why did my /etc/default/boot file get whacked a few months back when
we had a power failure?

(For those who don't know: /etc/default/boot is READ ONLY, and without it
you can't boot the machine!)

>(Detail:  One pin out of a PC power supply is POWER GOOD.  On a low-
>voltage condition, the power supply is expected to drop POWER GOOD; the
>motherboard logic must use this to drive RESET on the bus.  Bus cards
>must honor RESET as an indication of either system start-up or power
>failure.  If this doesn't work, you've got a hardware problem.)

The host adapter was an Adaptec 1542B; the disk was a Maxtor (which has
power-safe logic that disables the write gate when power goes out of safe
margins).

>If you need constant availability of systems, an UPS is essential.  If data
>integrity is paramount, an UPS helps but there are other things you need to
>do as well.  My point is that file systems and hardware are expected to be
>robust enough that you should *not* tolerate power failures corrupting
>file systems.

OK, Mr. Dunn, the gauntlet has been thrown down.  If you want details of the
failures we have had with YOUR OS (btw, SunOS 4.1 doesn't seem to take these
hits), you're welcome to call me here.

I await your response.


--
Karl Denninger	AC Nielsen
kdenning at ksun.naitc.com
(708) 317-3285
Disclaimer:  Contents represent opinions of the author; I do not speak for
	     AC Nielsen on Usenet.


