SURVEY: Fault Tolerant UNIX (Repost)

the main man root at tndsyd.oz.au
Wed Nov 28 10:48:47 AEST 1990


I am currently writing a book about "fault tolerance". One of its chapters
is aimed at measuring the Mean Time Between Failures (MTBF) of a
UNIX computer system. Although I work for TANDEM computers, this is
an independent survey conducted for my own use only. It has nothing
to do with the business of, or any issues relating to, TANDEM computers.

I would appreciate your help. All information will be kept in strict
confidence and seen by me alone (not even by other employees at my company).
The closing date for the survey is 25/11/90. Results will be published
at the end of November 1990. No information identifying those who took
part in the survey will be disclosed.

PREREQUISITES:
This survey is aimed at those of you who are responsible for "looking after"
a UNIX computer system. If you are a System Administrator whose
responsibilities may require you to do any of the following, then
this survey is aimed at you:

1) Installing application software.
2) Installing system software, including upgrades.
3) Creating new file systems (e.g., formatting, running mkfs or similar).
4) Installing and/or managing communications software.
5) Configuring a new UNIX kernel (e.g., installing a new device driver).
6) Adding users and groups, and performing system maintenance (i.e., general administration).

QUESTIONS:
-------------------------------------------------------------------------
1) How many of the tasks listed above do you consider to be your
   responsibility?

2) How long (in years) have you been responsible for these tasks?

3) Is your computer system (A) a supercomputer, (B) a large mainframe,
   (C) a small mainframe, (D) a mid-range minicomputer, (E) a minicomputer,
   (F) a PC, or (G) other (please specify)?

4) How reliable is your operating system (i.e., how often does it crash)?
   (A) it never fails (B) it fails yearly (C) it fails monthly
   (D) it fails weekly (E) it fails daily (F) it fails more than
   6 times a year (G) other (please specify).

5) How reliable is your hardware (i.e., how often does it crash)?
   (A) it never fails (B) it fails yearly (C) it fails monthly
   (D) it fails weekly (E) it fails daily (F) it fails more than
   6 times a year (G) other (please specify).

6) Is your system connected to an uninterruptible power supply (UPS)?

7) If your system suffers an outage (failure) due to an operating system
   failure or hardware failure, do you contact your vendor or supplier
   and report the failure?

8) If your system suffers an outage because of an administrative
   error, do you contact your vendor or supplier and report the failure?
   (Please be honest.)

9) Would you contact your vendor and report an outage if it was caused by:
	a) a power outage.
	b) an act of God (an earthquake, for example).
	c) a communications fault.
	d) sabotage.
	e) terrorism or war.
	f) other (please specify).

10) Does your site have disaster recovery procedures?

11) If you had to call an engineer to come and fix a hardware fault,
    how long do you think it would take him/her to arrive on site?

12) In your own words, briefly describe what you understand by each
    of the following terms:
	a) data integrity
	b) high availability
	c) continued availability
	d) fault tolerance

--------------------------------------------------------------------------
Your help is very much appreciated. Thank you.