Another reason I hate NFS: Silent data loss!

Boyd Roberts boyd at prl.dec.com
Mon Jun 17 18:45:33 AEST 1991


In article <17105 at darkstar.ucsc.edu>, jik at cats.ucsc.edu (Jonathan I. Kamens) writes:
> My point is that the problem is fixable, and it isn't even difficult to fix. 
> Whether Sun (not to mention other vendors) has or will ever fix it is another
> question entirely....
> 

Sure, the problem is fixable.  The protocol is the problem!
It would appear that's the last thing Sun are likely to change.

All this nonsense about statelessness is just a smoke screen.
As soon as anyone proposes a change the immediate response is
`but then it's not _stateless_'.  Well, as far as I'm concerned:

    s/stateless/bug-full/

The whole thing is a charade.  You see that real disk there?  What's
contained on it?  Is it files?  Is it data?  Is it state?  Yes it is!

I could never understand this nonsense.  What makes them so sure that
when a crashed server comes up your data will still be intact?  If
a server crashes your system calls should error, no re-trying;  error --
plain and simple.  How will NFS ensure that the kernel or fsck or
the buffer cache won't have trashed my file as a result of the
crash?  Don't say `inode generation number',  it's just not a defense.
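
To make that concrete, here is a toy sketch in Python of what a generation
number check does and doesn't catch.  Everything in it is made up for
illustration: `ToyServer', `FileHandle' and inode 7 are hypothetical, and
the only assumption carried over from NFS is that a file handle names an
inode plus a generation number.  It is not real server code.

```python
import errno
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    """Toy stand-in for a file handle: inode number plus generation."""
    inode: int
    generation: int


class ToyServer:
    """Hypothetical in-memory server; not real NFS server code."""

    def __init__(self):
        self.generation = {}   # inode number -> current generation
        self.contents = {}     # inode number -> file data

    def create(self, inode, data):
        # Reusing an inode number bumps its generation, so old handles go stale.
        self.generation[inode] = self.generation.get(inode, 0) + 1
        self.contents[inode] = data
        return FileHandle(inode, self.generation[inode])

    def read(self, handle):
        # The only check made: does the handle's generation match the inode's?
        if self.generation.get(handle.inode) != handle.generation:
            raise OSError(errno.ESTALE, "stale file handle")
        return self.contents[handle.inode]


server = ToyServer()
old = server.create(7, b"my thesis")

# Case 1: crash damage.  The inode and generation survive, the data does not.
server.contents[7] = b"\0" * 9
damaged = server.read(old)          # no error: the handle still validates

# Case 2: inode 7 is freed and reused for another file.
new = server.create(7, b"someone else's file")
try:
    server.read(old)
    stale_detected = False
except OSError as exc:
    stale_detected = exc.errno == errno.ESTALE
```

In this sketch the reused inode is caught (case 2), but the trashed file
in case 1 reads back without complaint, because the check says nothing
about what is in the file.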

That UDP `protocol' really sucks the mop.  Soft/hard mounts.  What a joke.
What's needed is a connection based stream protocol.  Then you know the
difference between remote slow and remote dead.  It's all a question
of flow control.  NFS has none.  Not even sequence numbers.  
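
For what sequence numbers buy you, here is a minimal sketch in Python.
It is neither TCP nor anything in NFS: `SequencedReceiver' and the request
strings are hypothetical, and it assumes the sender numbers its requests
0, 1, 2, ... and resends a request with the same number when it times out.

```python
class SequencedReceiver:
    """Toy receiver that applies each numbered request at most once, in order."""

    def __init__(self):
        self.expected = 0      # next sequence number we are willing to apply
        self.applied = []      # requests applied so far, in order

    def receive(self, seq, request):
        if seq < self.expected:
            return "duplicate"     # a retransmission; already applied
        if seq > self.expected:
            return "gap"           # something was lost; sender must resend
        self.applied.append(request)
        self.expected += 1
        return "applied"


rx = SequencedReceiver()
results = [
    rx.receive(0, "create foo"),
    rx.receive(1, "remove foo"),
    rx.receive(1, "remove foo"),   # client timed out and sent it again
    rx.receive(3, "write bar"),    # request 2 never arrived
]
```

The resent `remove foo' is recognised as a duplicate instead of being
run a second time, and the missing request shows up as a gap rather than
going unnoticed.  Flow control proper (windows, back-pressure) is a
separate matter that this sketch does not attempt.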

We run a lot of NFS here, and it's as flaky as C shell.  Two of the machines
here just go to sleep every once in a while, when the traffic gets a little
heavy.  God knows why.  It's going to take a lot of pondering to track it
down.  Even then, it's probably a fundamental design problem that can't,
or won't, be fixed.



Boyd Roberts			boyd at prl.dec.com

``When the going gets weird, the weird turn pro...''


