4.2bsd kernel auto-nicing, scheduling

Don Speck speck@vlsi.caltech.edu
Tue Feb 18 08:50:44 AEST 1986


    When I have to kill infinite-looping processes, I usually find
them at nice 0, while much more deserving processes are at nice 4,
getting nowhere.  This is brought about by the formula that the
4.2bsd softclock() routine uses to determine when to "nice" a
process (names simplified for clarity):

	/* non-root, still at the default nice, over 10 minutes of user time */
	if (uid != 0 && nice == NZERO && utime > 10*60)
		nice = NZERO+4;

Only processes doing "real work" are penalized.  Runaway sendmail
and syslog daemons go unbridled, since they are system overhead;
likewise, page-thrashers and processes looping on EOF go un-niced,
since they accumulate only system time, little or no user time.
Processes already at nice 1 are left alone, so a malicious user can
get an advantage over the "real work" jobs at nice 4 (and magnify
that advantage simply by starting 20 infinite-loop jobs at nice 20,
as one user demonstrated; the load average went so high that the
nice 4 job got no runtime).

    No doubt the intention of this code was to improve keyboard
response, but the side-effects are awful:  no one can get any
heavy-duty computing done.  Thus it seems that the code ought to
be removed, but if nothing gets nice'd, response gets pretty slow.

    One possibility would be to make the test unbiased:

	/* count system time too, and catch nice 1 through 3 as well */
	if (utime+stime > 10*60 && nice >= NZERO && nice < NZERO+4)
		nice = NZERO+4;

but now we're even more likely to zap long-lived login shells
and editors; the cure is no better than the bug.

    What we really want is to favor processes that someone is
waiting for, and disfavor "overnight" types of processes.  The
problem is, how do you tell which is which?  This is what nice(1)
is for, but we've tried waving a carrot (steeply declining cpu
charges for nice'd processes) at the users, and they still don't
use nice.

    Despite the lack of mind-reading hardware, the system managers
can often tell when a job is not being waited for:  when the
owner hasn't typed anything for a long time, we renice(8) his
running processes unmercifully.  Complaints, if any, relate
more to perceived uneven application of this policy than to
the policy itself.  Unfortunately, this type of scheduling is
crude and often too little, too late.
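
    (For concreteness, here is a minimal sketch of such an
idle-user watchdog.  It is not our actual script; the /etc/utmp
scan and tty access-time test are the usual idiom, but the
one-hour IDLE_LIMIT and the nice value of 10 are placeholders.)

	#include <sys/types.h>
	#include <sys/stat.h>
	#include <sys/time.h>
	#include <sys/resource.h>
	#include <utmp.h>
	#include <pwd.h>
	#include <stdio.h>
	#include <string.h>
	#include <time.h>

	#define	IDLE_LIMIT	(60*60)	/* an hour without a keystroke */

	int
	main()
	{
		struct utmp ut;
		struct stat st;
		struct passwd *pw;
		char tty[40], name[sizeof ut.ut_name + 1];
		time_t now;
		FILE *fp = fopen("/etc/utmp", "r");

		if (fp == NULL)
			return 1;
		(void) time(&now);
		while (fread((char *)&ut, sizeof ut, 1, fp) == 1) {
			if (ut.ut_name[0] == '\0')	/* unused slot */
				continue;
			(void) sprintf(tty, "/dev/%.*s",
			    (int)sizeof ut.ut_line, ut.ut_line);
			if (stat(tty, &st) < 0)
				continue;		/* no such tty */
			if (now - st.st_atime < IDLE_LIMIT)
				continue;		/* typed recently */
			(void) strncpy(name, ut.ut_name, sizeof ut.ut_name);
			name[sizeof ut.ut_name] = '\0';	/* ut_name may lack a NUL */
			if ((pw = getpwnam(name)) != NULL)
				(void) setpriority(PRIO_USER,
				    pw->pw_uid, 10);	/* renice all his jobs */
		}
		(void) fclose(fp);
		return 0;
	}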

    I would prefer that the scheduler itself knew something
about interactive processes, and gave them a higher percentage
of the cpu.  Interactive processes are distinguished by a very
bursty pattern of cpu usage:  compute, I/O wait, compute, I/O
wait, ...; i.e., frequent waiting for slow events, while the
processes that we want to disfavor do not wait for slow events.

    My proposal is a modification to kern_synch.c such that when
a process sleeps in the kernel at pri > PZERO (i.e., waiting on
a slow event), it gets a short-term boost upon waking, probably
by forgiving some of its p_cpu (recent cpu usage).
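
    Concretely, the change might look something like this (a
sketch only, untested; "rp" and setpri() are my shorthand for
sleep()'s pointer to the current process and the routine that
recomputes its priority, and halving p_cpu is just a placeholder
for whatever forgiveness factor works out):

	/* at the end of sleep() in kern_synch.c, once the
	 * process has actually been awakened */
	if (pri > PZERO) {		/* we slept on a slow event */
		rp->p_cpu /= 2;		/* forgive recent cpu usage */
		(void) setpri(rp);	/* recompute priority from p_cpu */
	}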

    Shells waking up after a command exits would get a boost.
Processes waking up from a read() on a tty or pipe would get
a boost if they did, indeed, have to wait; processes doing
single-character reads would probably find typeahead waiting
for them, not have to go to sleep, and not get the boost.
Disk I/O and paging would NOT get a boost (they are not slow
events).

    Does anyone see a way that this could be abused?  I am
reminded of VMS, which gives a short-term priority boost for
doing I/O; doing single-character writes to a tty would make
one's priority soar, so much so that no one else could get any cpu.
That wouldn't happen here, since tty output usually just stuffs
another character in the output buffer, with no sleep required,
hence no priority boost granted.  Are there any other ways that
this could go wrong?  Will I cause lots of context switches from
rapidly varying priorities?  Will pipe readers and writers start
preempting each other in a battle of priorities?

    Comments, analysis, suggestions, flames to:

Don Speck	seismo!cit-vax!speck  or  speck@vlsi.caltech.edu


