Lock Daemon, lockf fails

dan at bbn.com
Sat Feb 10 06:05:03 AEST 1990


With regard to Sun's locking daemon and network locking in general, here
are some problems we've found:

1. Sun's lockd and statd (3.4) behave badly when other machines implement
NFS but not lockd, such as machines running Ultrix releases before 3.0.
If you NFS-mount such a machine's filesystem onto a Sun and then try (on
the Sun) to lock a file on the Ultrix machine, your process will hang
forever and cannot be killed.  To get around this, we had to precede the
locking call with a test that sends an inquiry to the host holding the
file to be locked; if we learn that either lockd or statd is missing (you
must check for both), we use local locking on the file instead.
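
The check described in item 1 might be sketched roughly as follows.  This
is not the code we actually used; it is a minimal illustration that assumes
a POSIX system with the standard rpcinfo(8) utility on the PATH, and that
rpcinfo exits with status 0 when the remote program answers.  The names
rpcinfo_probe() and choose_lock_mode() are hypothetical.  The program
numbers 100021 (lock manager) and 100024 (status monitor) are the standard
Sun RPC assignments.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NLM_PROG 100021UL   /* RPC program number of the lock manager (lockd) */
#define SM_PROG  100024UL   /* RPC program number of the status monitor (statd) */

enum lock_mode { LOCK_LOCAL_ONLY, LOCK_VIA_NETWORK };

/*
 * Hypothetical probe: runs "rpcinfo -u host prog" and returns 1 if it
 * exits with status 0, else 0.  Assumption: rpcinfo is installed and
 * reports an unreachable or unregistered program with a nonzero status.
 */
int rpcinfo_probe(const char *host, unsigned long prog)
{
    char prognum[32];
    pid_t pid;
    int status;

    assert(host != NULL);
    snprintf(prognum, sizeof prognum, "%lu", prog);

    pid = fork();
    if (pid < 0)
        return 0;                       /* cannot probe: treat as absent */
    if (pid == 0) {
        execlp("rpcinfo", "rpcinfo", "-u", host, prognum, (char *)NULL);
        _exit(127);                     /* exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Use network locking only when BOTH daemons answered the probe. */
enum lock_mode choose_lock_mode(int has_lockd, int has_statd)
{
    return (has_lockd && has_statd) ? LOCK_VIA_NETWORK : LOCK_LOCAL_ONLY;
}
```

A caller would do something like choose_lock_mode(rpcinfo_probe(host,
NLM_PROG), rpcinfo_probe(host, SM_PROG)) before deciding whether to issue
the locking call.  Note that the probe itself can take a while to time out
if the remote host is down.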

2. Other vendors are no better.  The Ultrix 3.0 lockd sometimes pauses for
2 minutes when you first try to use it in a process.  We are still
tracking this one down, but it seems to depend on configuration issues
such as where you're getting your hostnames.  A given configuration
(/etc/svcorder, /etc/hosts, and so on, plus the up/down status of the
other machines NFS-mounted to the one in question) will either always
show this problem or never show it.  A trace of the process shows it
repeatedly sending a message to some other host and waiting 5 seconds for
a response.

3. Another problem we have seen under Ultrix 3.0 is that it often takes
several (non-blocking) fcntl calls before a file lock is granted.  We're
not sure what precipitates this behavior. We've seen this after locking
and unlocking a file with one process: when we try to lock the same file
through another process, several attempts are required. (To demonstrate
this bug, run a test program that simply calls fcntl in a loop, reporting
the number of iterations necessary to acquire a lock.) There are patches
for this bug, once you realize what's going on.  (On a DECstation, you
should upgrade to 3.1 before applying the patches; they don't work so well
under 3.0.)
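
A test routine of the kind mentioned in item 3 could look something like
the sketch below.  It assumes a POSIX fcntl() and an already-open, writable
descriptor; the function name count_lock_attempts() and the max_tries
limit are our own inventions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Repeatedly issue a non-blocking F_SETLK request for a write lock on
 * the whole file.  Returns the number of attempts it took to get the
 * lock, or -1 if the lock was not granted within max_tries attempts or
 * fcntl() failed for a reason other than contention.
 */
int count_lock_attempts(int fd, int max_tries)
{
    struct flock fl;
    int tries;

    assert(fd >= 0);

    fl.l_type = F_WRLCK;     /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "to end of file" */

    for (tries = 1; tries <= max_tries; tries++) {
        if (fcntl(fd, F_SETLK, &fl) == 0)
            return tries;    /* lock granted on this attempt */
        if (errno != EACCES && errno != EAGAIN)
            return -1;       /* real error, not just "lock busy" */
    }
    return -1;
}
```

With no other process holding a lock on the file, one would expect a
correctly working system to report a single attempt; a larger count after
another process has locked and unlocked the file is the symptom described
above.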

4. It's worth pointing out a SunOS fcntl locking "feature" that may not be
obvious: if you open() a single file twice, giving file descriptors fd1
and fd2, establish a lock through fd1, and then close(fd2), the lock
established through fd1 is lost.
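
The behavior in item 4 can be checked with a sketch like the one below.
It is an illustration only, assuming a POSIX system; the helper names are
hypothetical.  Since F_GETLK does not report locks held by the calling
process, a forked child does the checking.  (As far as we can tell, this
is the per-process ownership of fcntl locks as POSIX specifies it, so
expect the same result on other conforming systems, not just SunOS.)

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill in a request covering the whole file with a write lock. */
static void whole_file_write_lock(struct flock *fl)
{
    fl->l_type = F_WRLCK;
    fl->l_whence = SEEK_SET;
    fl->l_start = 0;
    fl->l_len = 0;
}

/*
 * Fork a child that asks (via F_GETLK) whether another process holds a
 * conflicting lock on path.  Returns 1 if so, 0 if not, -1 on error.
 */
int lock_seen_by_child(const char *path)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct flock fl;
        int fd = open(path, O_RDWR);

        if (fd < 0)
            _exit(2);
        whole_file_write_lock(&fl);
        if (fcntl(fd, F_GETLK, &fl) < 0)
            _exit(2);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) > 1 ? -1 : WEXITSTATUS(status);
}

/*
 * Lock through fd1, close fd2, and see whether the lock survived.
 * Returns 1 if closing fd2 dropped the lock, 0 if the lock is still
 * held, -1 on error.
 */
int close_drops_lock(const char *path)
{
    struct flock fl;
    int fd1, fd2, before, after;

    fd1 = open(path, O_RDWR | O_CREAT, 0600);
    if (fd1 < 0)
        return -1;
    fd2 = open(path, O_RDWR);
    if (fd2 < 0) {
        close(fd1);
        return -1;
    }
    whole_file_write_lock(&fl);
    if (fcntl(fd1, F_SETLK, &fl) < 0) {
        close(fd1);
        close(fd2);
        return -1;
    }
    before = lock_seen_by_child(path);  /* should be 1: lock is held */
    close(fd2);                         /* the step that loses the lock */
    after = lock_seen_by_child(path);
    close(fd1);
    if (before != 1 || after < 0)
        return -1;
    return after == 0;
}
```

The practical consequence is that library code which opens and closes a
file on its own can silently release a lock the caller still believes it
holds.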

Mark Sommer and Dan Franklin


