Trouble killing processes

Wed May 18 05:51:13 AEST 1988

> The real fix would of course be in the kernel.  I would suggest setting
> a timeout on each system call.  This way, an lseek on a dead tape drive,
> say, would fail after n secs of cpu.  Some sort of context might need
> to be saved before the syscall starts, so things can be restored.  This
> could be expensive.  Comments?

Probably not a good idea.

"lseek" is a bad example; in all current UNIX systems that I'm familiar with,
"lseek" only sets a "seek pointer" in memory - it never goes near the device.
This pointer is then used by the driver to position the tape before doing any
I/O operation.
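To illustrate on a present-day system, seeking far past the end of an empty
file only moves the offset.  The file stays empty because no I/O has taken
place.  This is a minimal sketch under two assumptions: a regular scratch file
stands in for the tape, and "seek_without_io" is a made-up helper name.

```python
import os
import tempfile

def seek_without_io(offset):
    """Hypothetical helper: lseek() far into an empty scratch file and
    report (new offset, file size afterwards)."""
    fd, path = tempfile.mkstemp()          # regular file standing in for a device
    try:
        # lseek() only updates the offset the kernel keeps for this open file;
        # nothing is read from or written to the underlying storage.
        new_pos = os.lseek(fd, offset, os.SEEK_SET)
        size = os.fstat(fd).st_size        # still 0: a seek alone doesn't extend the file
        return new_pos, size
    finally:
        os.close(fd)
        os.unlink(path)

print(seek_without_io(1_000_000))  # → (1000000, 0)
```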

A more germane example *might* be an I/O operation, or an "ioctl" that
positions the tape, on a dead tape drive.  However, the *only* reason either
would need a timeout is that the tape driver is buggy and doesn't immediately
detect a dead drive, or that it lacks a timeout scheme *in the driver* to
detect one.  Even such a timeout could be tricky; some magtape operations can
take a *very* long time to complete.

Basically, system calls should take as long as they need to; this could very
well be infinite ("pause()" or "sigpause()") or, worse, finite but
indeterminate.  In either case, no timeout can be imposed.
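The "pause()" case is easy to observe from user space.  In this sketch, pause()
returns only after a caught signal arrives.  The call itself carries no
deadline, so any bound comes from the caller, which here sets one with alarm().
The sketch assumes that Python's signal module is a thin wrapper over the UNIX
calls and that it runs in the main thread, as Python requires for signal
handlers.  "wait_for_signal" is a hypothetical name.

```python
import signal

def wait_for_signal(seconds):
    """Hypothetical helper: block in pause() until SIGALRM is caught and
    return the list of signal numbers the handler saw."""
    caught = []

    def on_alarm(signum, frame):
        caught.append(signum)

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    try:
        signal.alarm(seconds)   # without this, pause() could wait forever
        signal.pause()          # returns only after a handled signal is delivered
    finally:
        signal.alarm(0)                             # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)  # restore previous disposition
    return caught

print(wait_for_signal(1))
```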

A typical "wedged" process is either waiting for something that *must* complete
(in which case its unkillability is unfortunate but unavoidable) or is hung due
to a kernel bug (in which case the real fix is, of course, in the kernel - but
it's not to kludge in a timeout).

(P.S. the timer obviously doesn't want to be based on CPU time - a blocked
process tends to consume CPU time *extremely* slowly, if at all.)
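The P.S. is easy to check.  This sketch blocks in sleep for about a second and
compares the elapsed wall-clock time with the CPU time the process consumed
over the same interval.  A limit of "n secs of cpu" would take far longer than
n seconds of real time to trip, if it ever did.  "cpu_vs_wall_while_blocked" is
a made-up name, and time.process_time() is assumed to report user plus system
CPU time for the process.

```python
import time

def cpu_vs_wall_while_blocked(seconds):
    """Hypothetical helper: block for `seconds` and return
    (wall-clock seconds elapsed, CPU seconds consumed)."""
    wall0 = time.monotonic()
    cpu0 = time.process_time()   # user + system CPU time of this process
    time.sleep(seconds)          # process is blocked, not running
    return time.monotonic() - wall0, time.process_time() - cpu0

wall, cpu = cpu_vs_wall_while_blocked(1.0)
print(f"wall {wall:.3f}s, cpu {cpu:.6f}s")
```

On a typical system the CPU figure comes out to a tiny fraction of the wall
figure.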