inode table full

Rick Ace rick at nyit.UUCP
Mon Mar 24 23:55:13 AEST 1986


Barry Shein writes,

> There's definitely an inode bug in the original 4.2 tape distribution.
> Not sure what the fix is tho it's been discussed a few times on this
> list. The way to find out if that is your problem is to use pstat
> to determine if any inodes have a ref count of -1 (will, I believe,
> appear as ff or 255 on the output of pstat.) If so, you got it and
> I believe it can lead to inode table full messages. Temporary fix?
> Re-boot and pray for the best...

Yes, there are several bugs, all of which revolve around the subject
of file descriptor management and its interaction with devices such
as terminals, whose open and close routines can sleep at a priority
greater than PZERO.

Consider the case where a 4.2bsd user program issues a close() syscall.
The kernel can (not necessarily in this order):

	1.  Free the file descriptor (clear u.u_ofile[fd] and
	    u.u_pofile[fd]).
	2.  Decrement the f_count value for the corresponding "file"
	    table entry.  If the count goes to zero, release the entry.
	3.  Decrement the i_count value for the "inode" structure.
	4.  In the case of a character-special device like a tty,
	    call the driver's d_close routine.

Problem:  there are cases where the kernel gets halfway through doing
an open() or close(), sleeps at a priority greater than PZERO, and is
interrupted by a signal before the rest of the operation is complete,
leaving the file/inode/user tables in an inconsistent state.

One scenario that can cause i_count to go below zero goes like this:
A user program calls close() to close a tty file descriptor.  UNIX
decrements f_count and i_count and then calls ttyclose().  If the tty's
output character queue is not empty, the kernel sleep()s at a priority
greater than PZERO, waiting for the queue to drain.  Normally, once the
queue has drained, the kernel awakens and proceeds to clear the u_ofile
and u_pofile entries for the file descriptor.  Assume, though, that while
the process is sleep()ing on t_outq, it receives a signal.  The kernel
aborts the sleep AND NEVER CLEARS U_OFILE AND U_POFILE.  When the process
subsequently issues another close() call to that file descriptor (either
explicitly, or implicitly via the "exit" syscall), f_count and i_count
are decremented AGAIN, SPURIOUSLY.  i_count can fall below zero, where it
behaves like a very large count that will never reach zero.  Result:  the
inode table entry is never freed and stays jammed until the next reboot.
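
The scenario above can be sketched as a small user-level model.  This is
not 4.2bsd kernel source:  the toy_* structures and functions and the
"interrupted" flag are hypothetical names made up for illustration, and
the model follows only the ordering described in this message (counts
decremented, then the driver close, then the descriptor slot cleared).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for the kernel tables; NOT the real 4.2bsd structures. */
struct toy_inode { int i_count; };
struct toy_file  { int f_count; struct toy_inode *f_inode; };

#define TOY_NOFILE 4
struct toy_user  { struct toy_file *u_ofile[TOY_NOFILE]; };

/*
 * Model of the ordering described above: drop the counts, call the
 * driver close, and clear the descriptor slot only if the driver close
 * returns normally.  'interrupted' is a hypothetical flag standing in
 * for a signal that arrives while ttyclose() sleeps on the output queue.
 * Returns 0, EBADF, or EINTR.
 */
int toy_close_buggy(struct toy_user *u, int fd, int interrupted)
{
	struct toy_file *fp;

	assert(fd >= 0 && fd < TOY_NOFILE);
	fp = u->u_ofile[fd];
	if (fp == NULL)
		return EBADF;
	fp->f_count--;
	fp->f_inode->i_count--;
	if (interrupted)
		return EINTR;	/* sleep aborted: slot is never cleared */
	u->u_ofile[fd] = NULL;
	return 0;
}

/* Interrupted close() followed by the implicit close at exit. */
int toy_scenario_buggy(void)
{
	struct toy_inode ino;
	struct toy_file f;
	struct toy_user u = { { NULL } };

	ino.i_count = 1;
	f.f_count = 1;
	f.f_inode = &ino;
	u.u_ofile[0] = &f;

	(void)toy_close_buggy(&u, 0, 1);	/* signal during ttyclose() */
	(void)toy_close_buggy(&u, 0, 0);	/* second close, e.g. at exit */
	return ino.i_count;			/* one decrement too many */
}
```

In this model toy_scenario_buggy() starts with a single reference and
performs two decrements, so the count it returns is below zero.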

The kernel performs two main tasks during close():
	1.  Adjust all share counts on "inode" and "file" table entries,
	    freeing these entries when appropriate.
	2.  Call device-specific logic to close the device.

When the kernel calls the device's d_close routine, it assumes the risk
that the routine will sleep and be interrupted by a signal.  It is
therefore imperative that the kernel do either:  all of #1 followed by
all of #2, or all of #2 followed by all of #1.  4.2bsd begins some of
the work in #1, then does #2, and finally finishes #1, giving rise to
the bugs.  There are places where a process can reference "file" table
entries it does not own anymore.

The essence of our fix was to rearrange the kernel's close() logic to
do task #1 completely first, and then do task #2.  It is possible in
this case for close() to return an EINTR error code while closing a
tty file descriptor, even though u_ofile and u_pofile have been cleared.
This seems preferable to the alternative (#2, followed by #1), which
would leave the descriptor open after an interrupted close, because
most programs don't examine the value returned by close().
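
A minimal sketch of the reordered logic, again as a user-level model
rather than our actual kernel change:  the toy_* names and the
"interrupted" flag are hypothetical, introduced only to show that doing
all of task #1 before task #2 keeps the counts consistent even when the
driver close is interrupted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for the kernel tables; NOT the real 4.2bsd structures. */
struct toy_inode { int i_count; };
struct toy_file  { int f_count; struct toy_inode *f_inode; };

#define TOY_NOFILE 4
struct toy_user  { struct toy_file *u_ofile[TOY_NOFILE]; };

/*
 * Task #1 in full (clear the descriptor slot, adjust the counts), then
 * task #2 (the driver close).  'interrupted' is a hypothetical flag
 * standing in for a signal that arrives while the driver close sleeps.
 * Returns 0, EBADF, or EINTR.
 */
int toy_close_fixed(struct toy_user *u, int fd, int interrupted)
{
	struct toy_file *fp;

	assert(fd >= 0 && fd < TOY_NOFILE);
	fp = u->u_ofile[fd];
	if (fp == NULL)
		return EBADF;

	/* Task #1: all table bookkeeping happens before anything can sleep. */
	u->u_ofile[fd] = NULL;
	fp->f_count--;
	fp->f_inode->i_count--;

	/* Task #2: the driver close may be interrupted, but the tables are done. */
	return interrupted ? EINTR : 0;
}

/* Interrupted close() followed by a second close of the same descriptor. */
int toy_scenario_fixed(void)
{
	struct toy_inode ino;
	struct toy_file f;
	struct toy_user u = { { NULL } };

	ino.i_count = 1;
	f.f_count = 1;
	f.f_inode = &ino;
	u.u_ofile[0] = &f;

	(void)toy_close_fixed(&u, 0, 1);	/* returns EINTR, slot already clear */
	(void)toy_close_fixed(&u, 0, 0);	/* returns EBADF, counts untouched */
	return ino.i_count;
}
```

Here the second close finds an empty descriptor slot and fails with
EBADF instead of decrementing the counts again, so the single reference
is released exactly once.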

-----
Rick Ace
Computer Graphics Laboratory
New York Institute of Technology
Old Westbury, NY  11568
(516) 686-7644

{decvax,seismo}!philabs!nyit!rick


