Prevention of Convoys in NFS daemons

Sun Mar 24 22:32:00 AEST 1991

The following is a simple idea which would fix an egregious bottleneck in
NFS service.  It should be relatively simple to implement.

I thought of this a couple of years ago while watching my servers crawl
under various load conditions.  The idea seems obvious to this
practitioner.

I am posting this article because the current debate over intellectual
property rights and software patents makes me feel like even simple ideas
like this should be published to establish their status as prior art
and/or obvious art.

Lately I have heard of a software-only NFS-accelerator product called
eNFS.  The marketing literature for the product has been pretty
uninformative about what it actually does.  It is my hope that this idea
would be incorporated into many NFS-server products.  Considering its
obviousness, I hope that the high-performance NFS vendors have already
thought of this also.  Finally, I hope that all vendors will put their
efforts into product innovation instead of litigation.

The problem:

The stock implementation of NFS service (as delivered by Sun) runs a small
number of "nfs daemons" (nfsd processes) to handle requests from NFS
clients.  Each of these daemons blocks (accepts no more requests) while it
services its current request.  If you have N nfsd processes, then at most
N NFS requests can be in service at once.

The stock implementation makes no effort to prevent all the nfsd processes
from blocking on the same shared resource, e.g. a disk arm.  If there is a
burst of heavy traffic on one disk arm, then all or most of your nfsd
processes will sit in a convoy waiting for action from that disk arm.
Meanwhile, the other arms in the system will sit idle, even if additional
I/O requests are pending for them, because the I/O requests are sitting in
a queue waiting for an nfsd to pick them up.
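
To make the convoy concrete, here is a toy model in Python.  It is only a
sketch under assumptions of my own choosing, not a measurement of any real
server: requests sit in one shared FIFO queue, every request costs the same
fixed service time, each arm serves one request at a time, and a daemon
stays tied up until the arm finishes its request.  The function name, the
arm labels and the numbers are all made up for illustration.

```python
import heapq

def shared_pool_latencies(requests, n_daemons, service_time):
    """Toy model of a shared nfsd pool.

    requests: list of (arrival_time, arm) in queue order (assumed sorted
    by arrival time).  Returns the completion latency of each request.
    """
    # Min-heap of the times at which each daemon next becomes idle.
    daemon_free = [0] * n_daemons
    heapq.heapify(daemon_free)
    arm_free = {}      # arm -> time at which that arm next becomes idle
    latencies = []
    for arrival, arm in requests:
        # The request waits in the shared queue until some daemon is idle.
        pickup = max(arrival, heapq.heappop(daemon_free))
        # The daemon then waits, blocked, until the arm itself is idle.
        start = max(pickup, arm_free.get(arm, 0))
        finish = start + service_time
        arm_free[arm] = finish
        heapq.heappush(daemon_free, finish)   # daemon is tied up until then
        latencies.append(finish - arrival)
    return latencies

# A burst of 8 requests for arm0, followed by a single request for arm1.
burst = [(0, "arm0")] * 8 + [(0, "arm1")]
print(shared_pool_latencies(burst, n_daemons=4, service_time=10))
# prints [10, 20, 30, 40, 50, 60, 70, 80, 60]
```

In this model the lone arm1 request takes 60 ticks even though arm1 was
idle the whole time and could have served it in 10.  The other 50 ticks
are spent in the shared queue waiting for a daemon to come free.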

"Packing" a large MH folder is one good way to demonstrate this.  Not only
will it take a while for the packing client to get it done (all those
renames are pretty slow), but response time for all of your other
filesystems will become poor, because the nfsd processes are being hogged
by one client and are mostly sitting in a single convoy waiting for
something to happen.

The solution:

A better implementation would dedicate an nfsd process to each arm, so
that heavy activity on one arm would not cause an inordinate increase in
time spent on the queue for I/O requests on other arms.  Think of it as
ensuring fairness in distribution of I/O requests.  The same idea could be
implemented by eliminating the multiple nfsd processes and using some
other parallel mechanism to get NFS work to the disks.
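
One way to picture the per-arm arrangement is the following Python sketch.
It is an illustration only, not how any real nfsd is built: threads stand
in for nfsd processes, and the class name PerArmDispatcher, its methods
and the arm labels are hypothetical.  Each arm gets its own queue and its
own dedicated worker, so a backlog on one arm cannot tie up the worker
that serves another.

```python
import queue
import threading

class PerArmDispatcher:
    """Route each request to a queue served by one worker per arm."""

    def __init__(self, arms):
        self.queues = {arm: queue.Queue() for arm in arms}
        self.workers = []
        for arm in arms:
            t = threading.Thread(target=self._serve, args=(arm,), daemon=True)
            t.start()
            self.workers.append(t)

    def _serve(self, arm):
        q = self.queues[arm]
        while True:
            job = q.get()
            if job is None:          # shutdown sentinel
                break
            func, done = job
            try:
                func()               # stand-in for the actual disk I/O
            finally:
                done.set()           # signal completion to the submitter

    def submit(self, arm, func):
        """Queue func on the given arm; returns an Event set on completion."""
        done = threading.Event()
        self.queues[arm].put((func, done))
        return done

    def shutdown(self):
        for q in self.queues.values():
            q.put(None)
        for t in self.workers:
            t.join()

def demo():
    d = PerArmDispatcher(["arm0", "arm1"])
    release = threading.Event()
    # Backlog on arm0: every job blocks until released (a "slow disk").
    slow = [d.submit("arm0", release.wait) for _ in range(8)]
    fast = d.submit("arm1", lambda: None)
    # The arm1 job completes while the whole arm0 backlog is still stuck.
    ok = fast.wait(timeout=5) and not any(e.is_set() for e in slow)
    release.set()
    for e in slow:
        e.wait()
    d.shutdown()
    return ok

print(demo())   # prints True
```

The design choice here is simply that the queueing happens per arm rather
than in one shared pool.  A real server would also need some mapping from
a filesystem or file handle to its arm, which this sketch leaves out.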

Liudvikas Bukys
<bukys at cs.rochester.edu>