fsck -p not checking everything

jpl at allegra.att.com
Tue Feb 28 22:27:14 AEST 1989


We ran into a similar problem with the 4.3 fsck.  The basic problem was
this: in preen mode (-p), fsck waited for its parallel child fscks to
complete using

	if (preen) {
		union wait status;
		while (wait(&status) != -1)
			sumstatus |= status.w_retcode;
	}

However, if a child terminated abnormally (was killed by a signal), its
w_retcode was 0, so fsck failed to detect the error.  We changed the code to

	if (preen) {
		union wait status;
		while (wait(&status) != -1) {
			if (status.w_termsig) {
				printf("child died with signal %d during pass %d\n",
					status.w_termsig, passno);
				sumstatus |= 8;
			} else
				sumstatus |= status.w_retcode;
		}
	}

The revised loop also treats abnormal termination (MUCH more serious than a
bit of file system corruption) as an error.  How could a child fsck
terminate abnormally, you might ask?  There's a line in pass1 that looks like

	ndb = howmany(dp->di_size, sblock.fs_bsize);

ndb (the number of data blocks) is subsequently used as an array index.
But we found that with a suitably huge di_size, howmany could make ndb go
negative, so the array reference made fsck dump core.  We cleaned that one
up by adding the check

	if (ndb < 0) {
		if (debug)
			printf("bad size %d ndb %d:",
				dp->di_size, ndb);
		goto unknown;
	}

Until we put in these fixes, we had a file system that would make fsck
drop core, but fsck -p didn't notice it, so the condition persisted for
weeks.  We finally caught the problem when we ran a fsck without the -p,
and noticed that it died on that file system.

John P. Linderman  Department of Bounced fsck's  allegra!jpl



More information about the Comp.sys.sun mailing list