File locking on networks

Nathaniel Mishkin mishkin at apollo.uucp
Thu Jan 16 01:29:22 AEST 1986


Here's how we (Apollo) deal with locking.  It's not perfect, but in
practice (e.g. on our internetwork of 1000+ workstations on 7 networks)
it works quite well:

There are two nodes associated with every lock:  the home node (i.e.
the node the file lives on), and the locking node (i.e. the node that
the process requesting the lock is running on).  The existence of a lock
is registered on both the home node and the locking node.  However, the
record on the home node is the one that really matters to the world,
since every lock request for a file on that node goes to it, not to any
of the locking nodes.  (Obviously, the home node and the locking node
can be identical, but that case is trivial, so I won't consider it.)
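
To make the double registration concrete, here is a minimal sketch in
Python.  It is an illustration only, not our actual code: the names
(Lock, Node, register, home_locks, held_locks) and the node names "H"
and "N" are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Lock:
    home_node: str     # node the file lives on
    path: str          # file on the home node
    locking_node: str  # node the requesting process runs on
    pid: int           # process holding the lock


@dataclass
class Node:
    name: str
    # Locks on files that live on this node, keyed by path.
    # This is the record the rest of the world consults.
    home_locks: dict = field(default_factory=dict)
    # Locks held by processes on this node, keyed by (home node, path).
    held_locks: dict = field(default_factory=dict)


def register(home: Node, locker: Node, path: str, pid: int) -> Lock:
    """Record one lock on both the home node and the locking node."""
    lock = Lock(home.name, path, locker.name, pid)
    home.home_locks[path] = lock
    locker.held_locks[(home.name, path)] = lock
    return lock


home, locker = Node("H"), Node("N")
lock = register(home, locker, "/doc/report", 42)
print(home.home_locks["/doc/report"])
print(locker.held_locks[("H", "/doc/report")])
```

Both tables live in ordinary process memory here, which matches the
point that locks are volatile.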

Locks are held in volatile storage (i.e. virtual memory, not disk) and
hence evaporate when a node goes down.  If a node is explicitly shut
down, many locks are released simply because the processes holding them
are killed.  Any remaining locks held BY the node shutting down are
force-unlocked.  Then the node broadcasts an "unlock all" message
to all other nodes.  Each recipient of such a message force-unlocks all
locks held BY the recipient ON files on the node that sent the message.

When a node boots, it broadcasts an "unlock all" message too.
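
What a recipient does with an "unlock all" message can be sketched
roughly as follows.  This is a hypothetical Python illustration of the
rule stated above, not our implementation; the function name
on_unlock_all and the representation of held locks as
(home node, path) pairs are assumptions.

```python
def on_unlock_all(held_locks, sender):
    """Return the recipient's held locks after an "unlock all" from sender.

    held_locks is a set of (home node, path) pairs for locks held by
    processes on the recipient.  Every lock on a file that lives on the
    sending node is dropped; locks on other nodes' files are kept.
    """
    return {(home, path) for (home, path) in held_locks if home != sender}


# The recipient holds two locks on files on node "A" and one on node "B".
held = {("A", "/src/main.c"), ("A", "/src/util.c"), ("B", "/doc/notes")}
remaining = on_unlock_all(held, sender="A")
print(remaining)
```

The same handler serves both the shutdown broadcast and the boot
broadcast, since the message is the same in either case.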

When a node N locks a remote file, it sends a message to the remote (home)
node asking if it is OK to lock.  If the home node says "no, because
process P on node M has the file locked", N sends a message to M asking
if he really has that file locked.  If M says he doesn't have the file
locked, N tells the home node to force-unlock the file, and then N tries
to lock the file again.  This strategy is helpful in case a node has
missed an "unlock all" message.  (Since broadcasts aren't propagated
across bridges between networks, this can happen.)  Note that if node
M is unreachable, this scheme doesn't help.
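
The request, verify, and retry sequence can be simulated in a few lines
of Python.  Treat this as a sketch under assumptions: messages are
replaced by direct calls on in-memory objects, and the names (Node,
try_lock, reachable, home_locks, held_locks) are invented for the
example rather than taken from our system.

```python
class Node:
    """Hypothetical in-memory stand-in for a workstation."""

    def __init__(self, name):
        self.name = name
        self.reachable = True
        self.home_locks = {}     # path -> (locking node name, pid)
        self.held_locks = set()  # (home node name, path, pid)


def try_lock(network, requester, home_name, path, pid):
    """Return True if requester obtained the lock, False otherwise."""
    home = network[home_name]
    for _ in range(2):  # the original request plus one retry
        holder = home.home_locks.get(path)
        if holder is None:
            # Home node says OK: register on both nodes.
            home.home_locks[path] = (requester.name, pid)
            requester.held_locks.add((home_name, path, pid))
            return True
        holder_name, holder_pid = holder
        holder_node = network[holder_name]
        if not holder_node.reachable:
            return False  # cannot verify, so the lock stands
        if (home_name, path, holder_pid) in holder_node.held_locks:
            return False  # the lock is genuine
        del home.home_locks[path]  # stale entry: force-unlock, then retry
    return False


network = {name: Node(name) for name in ("H", "M", "N")}
# H still believes process 7 on M holds the lock, but M has no record of it.
network["H"].home_locks["/f"] = ("M", 7)
granted = try_lock(network, network["N"], "H", "/f", 9)
print(granted, network["H"].home_locks["/f"])
```

Setting reachable to False on the holder's node reproduces the case
where the scheme doesn't help: the request is refused and the stale
entry stays on the home node.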

So what happens if you run into a "bad" case -- an internet partition or
a crashed node that hasn't been rebooted?  Well, someone will try to open
a file (and try to get a lock since all opens must be accompanied by
locks) but will get the error "object is in use".  We supply tools for
USERS to see who (what node and process) has the lock.  The user can
then decide whether it's safe to forcibly break the lock (there's another
tool to do that).  It's not a perfect scheme, but let's remember that
people run on Unix systems all the time with NO locking (even in the
local case), so it's clearly a step up.

            -- Nat Mishkin
               Apollo Computer
               apollo!mishkin


