Root filesystem bad free list problem -- HELP!! {whimper}

RobertsonAL alanr at drutx.UUCP
Fri Apr 27 08:35:12 AEST 1984


This is almost certainly due to a bug in the concurrency management
in the UNIX free list handler.
We experienced the same problem here for two months (!!) on 3-8 machines,
involving PDP-11's running 3.0, vaxen and 3b-20's running System 5.
It is fixed in UNIX/370 (where concurrency is a BIG issue -- many
users, lots of CPUs, and a kernel that is not single-threaded),
and (if I recall correctly) the problem went like this:
	
	1)	the superblock free list runs out, and the O/S
		begins rebuilding it,
while
	2)	someone allocates, then frees, a block before 1) completes.


	This fouls things up, since the free list is not locked
	for the duration of the rebuild.

	UNIX/370 fixed this by locking the free list for the duration
	of 1) above.  The cost is that response time on the machine
	occasionally glitches while the free list gets rebuilt.
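
For anyone who wants to see the shape of the race, here is a toy Python
model.  It is NOT kernel code and not the actual UNIX free list logic: the
FreeList class, its rebuild_steps() and free() methods, the two-step
rebuild, and the block numbers are all made up for illustration.  It treats
the rebuild as a read step followed by a write-back step and lets a free()
land in between.  With no lock the freed block gets overwritten; the locked
variant (standing in for the UNIX/370 approach described above) makes the
free wait until the rebuild is done.

```python
class FreeList:
    """Toy free-block cache (hypothetical model, not real kernel code)."""

    def __init__(self, on_disk, locked_rebuild):
        self.cache = []                  # blocks ready to hand out
        self.on_disk = list(on_disk)     # blocks the rebuild will load
        self.locked_rebuild = locked_rebuild
        self.rebuilding = False
        self.deferred = []               # frees that had to wait on the lock

    def rebuild_steps(self):
        """Refill the cache in two steps so a caller can interleave a free."""
        self.rebuilding = True
        snapshot = list(self.cache)      # step 1: read the current list
        yield                            # window where other activity can run
        # step 2: write back; anything added since step 1 is overwritten
        self.cache = snapshot + self.on_disk
        self.on_disk = []
        self.rebuilding = False
        self.cache.extend(self.deferred) # waiters proceed once unlocked
        self.deferred = []

    def free(self, block):
        if self.rebuilding and self.locked_rebuild:
            self.deferred.append(block)  # stands in for sleeping on the lock
        else:
            self.cache.append(block)     # unprotected update


def run(locked_rebuild):
    fl = FreeList(on_disk=[10, 11, 12], locked_rebuild=locked_rebuild)
    rebuild = fl.rebuild_steps()
    next(rebuild)                        # list ran out; rebuild begins
    fl.free(99)                          # someone frees a block mid-rebuild
    for _ in rebuild:                    # rebuild completes
        pass
    return fl.cache


if __name__ == "__main__":
    print("unlocked:", run(False))
    print("locked:  ", run(True))
```

In this model run(False) returns [10, 11, 12], so block 99 has silently
dropped off the free list, while run(True) returns [10, 11, 12, 99].  The
deferred list is also where the response-time glitch would show up: the
caller of free() has to sit and wait for the rebuild.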

We tried the goat; it had absolutely no effect (except for a residual
smell we can't seem to get rid of).

We NEVER FIXED THE PROBLEM -- it came by itself, then it went away by
itself, without us ever being able to track it down to the exact logic
that was causing it.  And we did try mightily.
The UNIX/370 folks I talked to indicated that this same problem exists
in EVERY version of UNIX since Version 7.

You just get lucky almost all of the time, until you don't get lucky
anymore.  If you want to file this problem with WECO, please call me,
and we'll gladly substantiate your claim.  THIS OUGHT TO GET FIXED!!!

	-- Alan Robertson
	   ihnp4!drutx!alanr
	   AT&T Information Systems Laboratories
	   Denver, Colorado
	   Room 31Y-27, x4796


