Conjecture on why 4.0 is hurting on small systems

Lutz Mike J rit!prague!mjl at cs.rochester.edu
Thu Mar 16 09:29:14 AEST 1989


I've been following this discussion for a while, and the note from Wayne
Hathaway about the global page replacement strategy is in line with some
simple experiments I've been making.  Basically, releases prior to 4.0 had
two distinct memory regions: the buffer pool for file block caching and
read-ahead, and the page pool for programs.  The boundaries between the
two were fixed, so file I/O requests only competed with each other for
memory; similarly, processes competed against each other for page frames.

With 4.0, this all changed, as anyone who's booted a system can see.  The
old message about XXX,XXX bytes used for buffers is gone.  Instead, as I
understand it, the buffer and virtual memory manager compete for page
frames from one big pool.  While this makes it easier to do some things
(notably mapped files), the different access patterns for files vs.
process code + data may be the cause of the poor performance on small (4MB
is small????) machines.
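
To make the difference concrete, here's a toy model of the two schemes.
This is *not* SunOS code: the pool sizes and the strict LRU policy are
my assumptions, picked only to show how a big sequential file scan
behaves against each arrangement (Python, just to keep the sketch
short):

    from collections import OrderedDict

    class LRUPool:
        """Fixed-size pool of frames with least-recently-used replacement."""
        def __init__(self, frames):
            self.frames = frames
            self.resident = OrderedDict()   # page id -> None, in LRU order
            self.evicted = []

        def touch(self, page):
            if page in self.resident:
                self.resident.move_to_end(page)    # hit: refresh LRU position
                return
            if len(self.resident) >= self.frames:  # miss with the pool full:
                victim, _ = self.resident.popitem(last=False)
                self.evicted.append(victim)        # steal the oldest frame
            self.resident[page] = None

    # Workload: an editor sitting on 8 resident pages while a compile
    # streams 64 file blocks through the system.
    editor = ["emacs-%d" % i for i in range(8)]
    blocks = ["file-%d" % i for i in range(64)]

    # Pre-4.0 scheme: separate buffer pool and page pool, 16 frames each.
    buf_pool, page_pool = LRUPool(16), LRUPool(16)
    for p in editor:
        page_pool.touch(p)
    for b in blocks:
        buf_pool.touch(b)       # file I/O competes only with other file I/O
    print("split pools: editor pages lost =",
          sum(v.startswith("emacs") for v in page_pool.evicted))

    # 4.0-style scheme: one global pool of the same 32 frames.
    pool = LRUPool(32)
    for p in editor:
        pool.touch(p)
    for b in blocks:
        pool.touch(b)           # file blocks now steal the editor's frames
    print("global pool: editor pages lost =",
          sum(v.startswith("emacs") for v in pool.evicted))

In the split arrangement the scan can only recycle other file buffers;
in the global pool it marches straight through the editor's frames.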

For example, suppose you have Emacs in one window, a shelltool/cmdtool in
another window, and a couple of other applications (like a clock).  You've
been doing light work in all the windows, so response is OK.  Now you
decide to compile and link a large program.

Consider what happens: cpp reads in the original source and writes out the
expanded version; similarly, the various passes of cc proper, the
assembler, and the linker all read previous information and write new
stuff.  Now in the "old" days, all the file activity would be confined to
the buffer cache.  If the files were small enough, the cache could hold
the output from the previous pass (the input to this one) while creating
the output from this pass.  Once a pass is over, its input is not needed,
and will soon be overwritten.  With sufficient buffer space, this works
fine, and favors the common Unix file processing pattern:
sequential access.
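
To see why "the cache holds the input to this pass" matters, here's a
back-of-the-envelope model of one pass writing its output and the next
pass reading it straight back through a dedicated buffer cache.  The
block counts and the LRU policy are my assumptions, not measurements of
cc:

    from collections import OrderedDict

    def run_pipeline(cache_frames, blocks_per_pass, passes=4):
        cache = OrderedDict()          # block id -> None, kept in LRU order
        hits = misses = 0

        def touch(blk):
            if blk in cache:
                cache.move_to_end(blk)             # reference refreshes LRU
            else:
                if len(cache) >= cache_frames:
                    cache.popitem(last=False)      # evict least recently used
                cache[blk] = None

        for n in range(passes):
            out = ["pass%d-blk%d" % (n, i) for i in range(blocks_per_pass)]
            for blk in out:            # this pass writes its output blocks
                touch(blk)
            for blk in out:            # the next pass reads them back in order
                if blk in cache:
                    hits += 1
                else:
                    misses += 1
                touch(blk)
        return hits, misses

    print(run_pipeline(cache_frames=32, blocks_per_pass=16))  # fits: all reads hit
    print(run_pipeline(cache_frames=8,  blocks_per_pass=16))  # too small: all reads miss

As long as a pass's output fits, every read comes out of the cache; the
moment it doesn't, sequential access under LRU degenerates and every
read goes back to the disk.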

Out in process page space, the passes come and go quickly, so there are
typically many free pages around.  Sure, a few pages might be stolen from
Emacs, but this is relatively rare.

Now in 4.0, with the common pool, the Emacs pages are ripe for replacement
by *either* file data or process pages.  As both the file data and
compiler pass pages are "newer", they will tend to flush out Emacs pages
rapidly.  Also, unless there is some control in 4.0 that I'm unaware of,
it looks like old stale file data will be kept even if *more* Emacs pages
must be stolen.  Finally, Emacs code pages are particularly vulnerable, as
they are pure, and thus need not be written back.  Of course, as you madly
type in the Emacs window, trying to keep decent response, these pages will
be reloaded, only to be released in a few hundred milliseconds in response
to more file I/O.
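
Just to illustrate that last point, here's a toy reclaimer that, when it
needs a frame, grabs a clean page before a dirty one, since the clean
page costs no write-back.  Whether 4.0 really applies this preference I
don't know; it's an assumption:

    # Resident pages: the editor's pure text pages are clean, its heap dirty.
    pages = ([{"id": "emacs-text-%d" % i, "dirty": False} for i in range(8)] +
             [{"id": "emacs-heap-%d" % i, "dirty": True}  for i in range(4)])

    def steal_frame(resident):
        """Prefer a clean victim: it can be dropped with no write-back."""
        for victim in resident:
            if not victim["dirty"]:
                resident.remove(victim)
                return victim["id"], 0     # zero disk writes to reclaim it
        victim = resident.pop(0)
        return victim["id"], 1             # dirty victim: write it back first

    # A burst of file I/O needs 10 frames; watch whose pages go first.
    for _ in range(10):
        victim, writes = steal_frame(pages)
        print("stole", victim, "write-backs:", writes)

All eight pure text pages go before a single dirty page does, and those
are exactly the pages you fault back in the instant you touch the
keyboard.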

This is all conjecture, based on simple "black box" experiments I've been
performing.  I have no access to 4.0 source code, so I can't verify either
the general hypothesis or the specific page replacement tactics.  I'm sure
Guy Harris will correct me if I'm all wet :-).

Mike Lutz
Rochester Institute of Technology
mjl at cs.rit.edu


