One trashed file system can ruin your whole day

Alex S. Crain alex at umbc3.UMD.EDU
Wed Jul 6 04:55:37 AEST 1988


	This is my experience with major disaster recovery. It's kind of long.

	Well, I knew it was coming, but as usual I was totally unprepared
for what happened. The scenario...

	I'm debugging a mail filter with adb. At some point adb halts on
SIGALRM. I continue, but adb is unhappy and confused. I try to restart the
program, adb breaks at a weird place, I continue, adb dumps core, and the
kernel panics with

	panic:iaddress > 2^24

	Now, I'm smart enough to figure that this is bad, so I'm not too
surprised when the reboot dies with the same message. Well, to make a long
story short, 6 hours later things are back to normal. My best guess is that
the kernel threw its guts up onto some random place in the file system,
probably over things like the freelist, a few directories, etc.

	As far as *why* it did that, I have no clue. I've been playing with
some custom syscalls, so it could either be a vicious kernel bug, or a vicious
device driver bug. I'm not willing to continue testing to find out :-).

	I'm not posting to complain, though, but rather to share some
experience. I was faced with the problem of fscking the hard disk and
reloading some of the software without reformatting. What I should have done
was....

	1) Boot floppy unix. Floppy unix lives on disk 2 of the foundation set,
and uses a floppy mounted file system. You can get a root shell by first saying
that you do not want to save your files, but then changing your mind. The
conversation goes like...

	Do you want to save any existing user files?
		no
	This action will destroy any data on the hard disk. do you 
want to continue?
		no
	#

You can also substitute your own custom file system for disk 3 of the
foundation set. Just mount disk 3, cpio everything off, and make a new mounted
file system with your changes. Don't forget /lib/shlib; it has to be there so
that it can get loaded. Once it's loaded, it can go away.

UNDOCUMENTED FEATURE #1: /unix doesn't have to use /etc/init. If /etc/init is
not present, /unix runs the shell script /etc/profile in single user mode. If
this script contains the line 'exec sh', you get a single user shell.

So I should have had a custom floppy file system that would mount the hard disk
and give me a root shell. (/etc/profile => mount /dev/fp002 /mnt; exec sh)
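
A minimal sketch of preparing that profile ahead of time, assuming you keep a
staging copy of the floppy file system in some directory before writing it
out. The function name make_rescue_profile and the staging directory are
hypothetical; only the two commands written into the file come from the
description above, and the script is written for a modern POSIX sh rather than
the shell on the foundation set.

```shell
#!/bin/sh
# Sketch: write the rescue /etc/profile into a staging copy of the floppy
# file system. make_rescue_profile and the staging directory are hypothetical;
# the two commands in the generated file are the ones quoted in the post.
make_rescue_profile() {
    stage=$1                      # root of the staged floppy file system
    mkdir -p "$stage/etc" || return 1
    # Quoted 'EOF' keeps the shell from expanding anything in the body.
    cat > "$stage/etc/profile" <<'EOF'
mount /dev/fp002 /mnt
exec sh
EOF
}
```

Remember that this only has an effect if /etc/init is absent from the same
file system, as described in undocumented feature #1.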

	2) fsck the hard disk. fsck won't let you fsck a mounted file system.

UNDOCUMENTED FEATURE #2: fsck will let you fsck the *raw* device of a mounted
file system.

fsck isn't on the floppy, so we run the copy on the hard disk:

	# /mnt/etc/fsck /dev/rfp002
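
The raw device here is just the block device name with an "r" in front of the
last path component (/dev/fp002 is mounted, /dev/rfp002 is checked). A small
sketch of that mapping, assuming the naming convention holds for the device
you care about; raw_device is a hypothetical helper, not a system command, and
the script only prints the fsck command instead of running it.

```shell
#!/bin/sh
# Sketch: derive the raw device name from a block device name by prefixing
# the last path component with "r" (/dev/fp002 -> /dev/rfp002).
# raw_device is a hypothetical helper, not a system command.
raw_device() {
    dir=$(dirname "$1")
    base=$(basename "$1")
    echo "$dir/r$base"
}

# Print the fsck command rather than running it, so this is safe to try.
echo "/mnt/etc/fsck $(raw_device /dev/fp002)"
```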

	3) delete init files on the hard disk

	# rm /mnt/etc/init
	# rm /mnt/etc/rc

	4) make a startup file on the hard disk.

	# cat > /mnt/etc/profile
	exec sh
	^d
	
	5) Boot the hard disk from a floppy. This is disk 4 of the foundation
set. It is just like /unix, but lives on a floppy, so you can boot from the
hard drive even if /unix is corrupt. To do this, do

	# sync;sync;sync;
	# /etc/reboot "a message"
	"a message"
	[Hit any key to continue]

The message string gets reboot to issue the [Hit any key...] message and wait
until you put disk 4 in the drive.

	6) Now unix is running with a single user shell. Disks 5-12 contain
a big cpio file of the foundation set. If you ever want to make a new
foundation set, make a big cpio file of /, /bin, /dev, /etc, etc., and use
that when the system asks for disk 5.

Anyway, put the foundation set in /tmp, and then recover whatever files you 
might need from the root shell. You will need good copies of /etc/init,
/etc/inittab, /etc/rc, and whatever is referenced from rc. 

Use /bin/sum to check file validity (the sums should match between new & old
copies).
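
A minimal sketch of that check, assuming a sum command that prints a checksum
and block count for its standard input. same_sum is a hypothetical helper, and
the paths in the usage comment are assumptions about where the good and
suspect copies end up on your system.

```shell
#!/bin/sh
# Sketch: compare the checksum of a recovered file against a known-good copy.
# same_sum is a hypothetical helper. sum reads from stdin so that the file
# name cannot affect the comparison.
same_sum() {
    a=$(sum < "$1") || return 2   # 2 = could not read the first file
    b=$(sum < "$2") || return 2   # 2 = could not read the second file
    [ "$a" = "$b" ]               # 0 = sums match, 1 = sums differ
}

# Usage (paths are assumptions): report each file that differs.
# for f in etc/init etc/inittab etc/rc; do
#     same_sum "/tmp/$f" "/$f" || echo "$f differs"
# done
```

A matching sum is not proof that two files are identical, but a mismatch is a
reliable sign that the copy on the hard disk needs replacing.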

	That's roughly how disaster recovery *could* go. It *did* go vaguely
like that, with a lot of redundancy, and many more fscks, since I found out
all of this by trial & error.

	I hope this helps someone avoid some pain someday.

-- 
					:alex.
					Systems Programmer
nerwin!alex at umbc3.umd.edu		UMBC
alex at umbc3.umd.edu


