Disk Mirroring (was Re: Altos 5000)

Dick Dunn rcd at ico.isc.com
Sat Sep 1 03:23:29 AEST 1990

dtynan at altos86.Altos.COM (Dermot Tynan) writes, starting from the

> > > Even "reliable" disks eventually die.
> > True.  So do reliable controllers.

> I don't know what your hardware background is...

Hmmm...you probably don't want a bio right now, but I did spend some fair
time working in a disk test engineering group.  I won't make any great
claim based on that, only that my experience with disk failures is more
than casual and anecdotal.  Whatever...

>...but let me assure you that the
> following statement is Law:
> 	MTBF(controllers) >> MTBF(disks)	..........................(i)

Now, see, here's how flame-fests get started...you assert something as a
"Law" when I "know" it's not so.  In the past (ten years or so, let's say)
you were close enough to right.  It's really no longer true.  Depending on
a handful of factors, either
	MTBF(controllers) > MTBF(disks)
	MTBF(controllers) ~ MTBF(disks)
> No-one can claim to produce a completely fault-free system.  Most of the
> rhetoric is exactly that.  "Fault Tolerant", "Fault Resilient", etc...

We agree there, and so we move on (as you suggest) to trying to find the
hot spots for failures.

> ...In general
> terms, if you want to make a system more resilient to failure, the first
> place to look is in any non-solid-state system.  Ie, anything with moving
> parts.  In the average system, this means the disk drives...

This is a good place to start.  It's conventional wisdom and common sense.
(I'll add that the second place to look is wherever you've got true analog
circuits--which is *also* in the disk subsystem, though it may be split
between controller and drive.)

But now consider:  *Every*body knows that the disks are potentially a
serious weak point--not only are they mechanical, but they hold your
"permanent" data.  Even the disk manufacturers know it, and they don't like
being the fall guys for every system failure.  So they find ways to make
their disks more reliable.  Now, it's not exactly news that the disk boys
are in the hot seat, but in the past it was relatively harder to make
reliable disks at a decent price, so we accepted higher failure rates and
did other things to mitigate them.

Disk manufacturers are doing a much better job these days.  It's not
cheap--the price of disks is one of the larger chunks of the total price of
most systems.  What's really happened is that the disk manufacturers and
system architects have agreed that disk reliability is important enough
that they are spending enough money there to bring the reliability of the
disk subsystem in line with the reliability of the rest of the system.
That's just good engineering--it doesn't make sense to have one part of a
system (particularly a critical part) far less reliable than the rest of
the system...you go spend money on the unreliable part until it's good
enough or until it's not wise to spend any more on it.  The change in
recent years is that it's possible to buy good enough reliability without
screwing up the overall system cost.

The true MTBF of small disks has probably increased by almost a factor of
10 in the last decade.

> ...Disk mirroring will slow down disk writes (which aren't the bulk of
> disk operations, anyway), but it will double your disk reliability.

1.  Yes, writes aren't the bulk of the operations.  However, they can
commonly vary from about 1/3 (two reads for every write) to 1/10 of the
total load.  Your point is good, but you have to be a little careful about
how much weight you give it.

2.  Disk mirroring will double the reliability of the disks themselves,
but that doesn't translate into a doubling of the reliability of even the
disk subsystem, let alone the whole machine.

>...Certainly "journaling" is another approach.  However, it puts the onus on
> the person writing the application, rather than hiding it in the OS...

Not necessarily.  As an application writer, you might do that if the
system doesn't support it.  But you folks are system designers; you'd put
it in the system.  (Nothing novel about that...after all, you've modified
the file system for mirroring, right?  You could just as easily have
implemented journaling.)

> ...Altos, like most companies is a slave to its user community...
> ...They want mirroring.  We implemented it...

All understood...design by customer is uncomfortable.  But I'm more
interested in looking at the real technical aspects of mirroring.

> 	MTBF(controller) >> MTBF(disks)		Get it?

Now, now, don't get too pushy...:-)

I still say "get better disks."  MTBF of good modern disks is many years of
power-on time.  You will get card failures in that amount of time based on
connector oxidation, if nothing else.

> > I've seen as many motherboard and controller
> > failures as disk failures.  I don't pretend my experience is typical...
>...I suggest that you have some serious design flaws here.  See Law (i).

I don't design hardware, and Law (i) isn't a law.  But while we're talking
about MTBF, let's note that
	MTBF(hardware) >> MTBF(software)
for most systems.  That's another reason I suggested journaling; it gives
a second version of your data created by different code than the first.

> Furthermore, even if the controller *does* die, you can snap on a new
> controller, and continue, a lot faster than you can replace a disk, and
> restore from backups...

*After* you figure out that you've got a bad controller.  Depending on the
failure mode, you might have done some real damage in the meantime.

> > In this case, I'm not arguing that
> > mirroring is worthless, but I do argue that it's inordinately expensive
> > and only addresses one small part of the overall reliability problem...

> A third time:
> 	MTBF(controller) >> MTBF(disks)

while (strcmp(grab_input(),"MTBF(controller) >> MTBF(disks)") == 0)
	puts("buy better disks!");

> What exactly do you mean when you say "expensive"...

I mean that the cost of disk mirroring is a doubling of the cost of disk
drives in the system...and they're already a major part of the cost of the
Dick Dunn     rcd at ico.isc.com -or- ico!rcd       Boulder, CO   (303)449-2870
   ...I'm not cynical - just experienced.
