fsck -p not checking everything
jpl at allegra.att.com
Tue Feb 28 22:27:14 AEST 1989
We ran into a similar problem with the 4.3 fsck. The basic problem was
this. fsck waited for parallel fscks to complete using

    if (preen) {
        union wait status;
        while (wait(&status) != -1)
            sumstatus |= status.w_retcode;
    }

However, if a child process terminated abnormally (killed by a signal),
w_retcode was 0, so fsck failed to detect the error. We changed the code
to be

    if (preen) {
        union wait status;
        while (wait(&status) != -1) {
            if (status.w_termsig) {
                printf("child died with signal %d during pass %d\n",
                    status.w_termsig, passno);
                sumstatus |= 8;
            } else
                sumstatus |= status.w_retcode;
        }
    }

This treats abnormal termination (MUCH more serious than a bit of file
system corruption) as an error as well. How could a process terminate
abnormally, you might ask? There's a line in pass1 that looks like

    ndb = howmany(dp->di_size, sblock.fs_bsize);

ndb (the number of data blocks) is subsequently used as an array index.
But we found that with a suitably huge di_size, howmany could make ndb go
negative, so the array reference caused a core dump. We cleaned that one
up by adding the check

    if (ndb < 0) {
        if (debug)
            printf("bad size %d ndb %d:",
                dp->di_size, ndb);
        goto unknown;
    }

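As a rough illustration of why ndb can go negative: howmany(x, y) is
defined in <sys/param.h> as (((x)+((y)-1))/(y)), so the addition can
overflow before the division happens. The sketch below is not the fix
described above. It is a hypothetical alternative that rejects a bad size
before doing the arithmetic, and it assumes the size field is a signed
32-bit integer. The names HOWMANY and checked_howmany are made up for the
example.

```c
#include <stdint.h>

/* Same arithmetic as the BSD howmany() macro; renamed to avoid clashing
 * with a system header that already defines it. */
#define HOWMANY(x, y) (((x) + ((y) - 1)) / (y))

/*
 * Hypothetical helper: compute the number of bsize-byte blocks needed
 * for `size` bytes. Returns 0 and stores the count in *ndb on success,
 * or -1 if the size is negative or so large that size + (bsize - 1)
 * would overflow a signed 32-bit integer.
 */
static int checked_howmany(int32_t size, int32_t bsize, int32_t *ndb)
{
    if (bsize <= 0 || size < 0)
        return -1;
    if (size > INT32_MAX - (bsize - 1))
        return -1;              /* the addition inside HOWMANY would overflow */
    *ndb = HOWMANY(size, bsize);
    return 0;
}
```

A caller would treat a -1 return the same way the check above treats
ndb < 0, that is, as a bad inode size.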
Until we put in these fixes, we had a file system that would make fsck
drop core, but fsck -p didn't notice it, so the condition persisted for
weeks. We finally caught the problem when we ran a fsck without the -p,
and noticed that it died on that file system.
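For readers on a current POSIX system, where wait() takes an int rather
than a union wait, the same status handling might look roughly like the
sketch below. It is only an illustration, not the fsck source: the names
fold_status, reap_all, spawn_child and CHILD_DIED are invented for the
example, and the value 8 is simply the bit used in the fix above.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_DIED 8    /* same bit the fix above ORs in for a signal death */

/* Fold one wait() status into the running summary. */
static int fold_status(int sumstatus, int status)
{
    if (WIFSIGNALED(status)) {
        fprintf(stderr, "child died with signal %d\n", WTERMSIG(status));
        return sumstatus | CHILD_DIED;
    }
    if (WIFEXITED(status))
        return sumstatus | WEXITSTATUS(status);
    return sumstatus;
}

/* Reap every outstanding child; wait() returns -1 once none are left
 * (this sketch does not retry if wait() is interrupted by a signal). */
static int reap_all(void)
{
    int status, sumstatus = 0;

    while (wait(&status) != -1)
        sumstatus = fold_status(sumstatus, status);
    return sumstatus;
}

/* Hypothetical demo helper: fork a child that either kills itself with
 * signal `sig` (if nonzero) or exits with `code`. */
static void spawn_child(int code, int sig)
{
    pid_t pid = fork();

    if (pid == 0) {
        if (sig)
            raise(sig);         /* abnormal termination */
        _exit(code);
    }
}
```

Calling spawn_child(0, SIGKILL) followed by reap_all() should yield a
summary with the CHILD_DIED bit set, where the original loop would have
reported 0.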
John P. Linderman Department of Bounced fsck's allegra!jpl