NFS [un]reliability

Karl Kleinpaste karl at cbrma.UUCP
Mon Nov 10 12:37:40 AEST 1986


mike at louis.UUCP writes:
>Recently we have been doing a study of NFS fileservers and we have
>come across unreliability in NFS (i.e., writing something to a remote
>file and finding something different when reading it back) when the
>server was under extreme load. Now we are starting to notice the same
>behaviour on our existing Sun fileservers. 
>
>The question is, have others noticed this and does anyone know why
>it happens? 

[mumble]

Yes, I've seen such a thing.  At OSU, there is a small set of Suns
(11?), 3 of which are Sun-2s and the rest are recently-purchased
Sun-3s.  Unfortunately, one of the Sun-2s is the server for *all* the
rest.  Some would call this a Bad Thing, and they would be right.  It
is equipped with 2 Eagle drives for a decent amount of disc, and all
those other Suns are usually quite busy during office hours.

This problem was first noticed in, of all things, the "hack" game, and
more recently in GNU Emacs.  GNU Emacs has lisp code to detect whether
a file has changed on disc more recently than the last time the
current user either read the file in or wrote his changes out.
Periodically, when the server node is seriously overloaded (which is
the case more and more often), GNU Emacs utters the evil phrase, "File
has changed on disc; save anyway [y or n]?"  It is *believed* (that
is, we can't quite prove it yet) that this is due to the following
sequence of events: [a] Joe User saves his file, which causes
additional work for an already-overloaded server; [b] GNU Emacs
stat(2)'s the file to get its modification time and keeps that as
its saved write-time; but [c] the server is so overloaded that the
file hadn't finished being written at the time of the stat(2), so
the modification time moves forward again once the write completes;
[d] Joe goes on and hacks at his file a while longer; [e] he issues
another save for it, at which time [f] GNU Emacs stat(2)'s the file
again, compares the result against its saved write-time, and [g]
finds that the last modification time is later than the saved
write-time.
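
The suspected sequence can be sketched in a few lines of Python.  This
is only an illustration under assumptions: Buffer, simulate_late_write
and demo are hypothetical names, not GNU Emacs code, and the
late-finishing server write is faked on a local file with os.utime
rather than reproduced over NFS.

```python
import os
import tempfile


class Buffer:
    """Hypothetical stand-in for an editor buffer; not GNU Emacs code."""

    def __init__(self, path):
        self.path = path
        self.recorded_mtime = None

    def save(self, text):
        with open(self.path, "w") as f:
            f.write(text)
        # Step [b]: stat the file as soon as the write call returns
        # and remember the modification time we were told.
        self.recorded_mtime = os.stat(self.path).st_mtime_ns

    def changed_on_disc(self):
        # Steps [f] and [g]: stat again and compare against the
        # remembered time.
        return os.stat(self.path).st_mtime_ns > self.recorded_mtime


def simulate_late_write(path, delay_ns=2_000_000_000):
    """Assumption: model step [c] by pushing the mtime forward, as if
    the server completed the write after the editor's stat."""
    st = os.stat(path)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + delay_ns))


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        buf = Buffer(path)
        buf.save("first draft\n")
        quiet = buf.changed_on_disc()   # nothing else touched the file
        simulate_late_write(path)       # step [c]
        stale = buf.changed_on_disc()   # editor now sees a "foreign" change
        return quiet, stale
    finally:
        os.remove(path)
```

Calling demo() compares the two cases side by side: the check stays
quiet when nothing moves the modification time, and fires once the
time advances behind the editor's back, even though nobody but Joe
ever wrote the file.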

Potent words of evil tend to get uttered by Joe when he sees GNU
Emacs' comment, because (generally speaking) he hasn't the FAINTEST
idea what caused it.

> And, of course, does anyone know how to stop it?

OSU is choosing to solve the whole problem (that is, overall
performance, not just GNU Emacs and similar programs' foolish
comments) by replacing the Sun-2 file server with >1 Sun-3 file
servers.  You do what you have to.  Unfortunately, it costs
significant $$$ to do what you have to in such cases.
-- 
Karl Kleinpaste


