File locking on networks

Doug Gwyn <gwyn> gwyn at brl-tgr.ARPA
Sun Jan 12 13:21:05 AEST 1986


> I don't see how the above proposal solves anything.  Take case (2).
> The system that contains the data notices a lock conflict.  It pings
> the system holding the lock.  It gets "network not reachable".  It
> voids the lock and the database is now accessible.  OK, but the
> database is in an inconsistent state.  Maybe when it breaks the lock it
> does a database cleanup.  OK, now suppose the comm link comes back up.
> The system that was out of touch still thinks it holds the lock; it's
> been pinging the server trying to get an I/O request in (for example).
> When the link comes up, the I/O request will get thru.  What does the
> server do with this request?  If it satisfies it, it has permitted the
> database to be changed by someone who doesn't have the lock.  It must
> reject the request (e.g. a Unix read() or write() call) specifying some
> kind of lock failure error code.  The application program on the remote
> machine thinks it owns the lock.  It must be written to go back to the
> top of the transaction and try to obtain the lock again, when it gets
> this error code.  There are no such provisions in the System V locking
> facilities.  Thus programs written for those facilities will break when
> moved onto networks.

The model I have in mind requires the owner of the actual file
(where the data is stored) to be the master of the file's locks.
Whenever the master has to communicate with a slave about a locked
region and runs into a problem, it cancels that slave's lock.
Similarly, each time a slave accesses a locked region, it tells
the master about it, and in case of disagreement about the state
of the locks, the master so informs the slave, which must correct
its local records.
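
A minimal sketch of the master side of this model, under stated
assumptions: the class and method names (LockMaster, acquire,
comm_failure, access) are hypothetical, regions are matched exactly
rather than as overlapping byte ranges, and the network is not modeled
at all. It only illustrates that the master's table is the single
authority and that a communication problem cancels the slave's locks.

```python
class LockMaster:
    """Hypothetical master-side lock table for one file (illustrative names)."""

    def __init__(self):
        # Maps a region, e.g. (start, end), to the slave that holds its lock.
        # Assumption: regions are compared for exact equality, not overlap.
        self.locks = {}

    def acquire(self, slave, region):
        """Grant the lock unless a different slave already holds it."""
        holder = self.locks.get(region)
        if holder is not None and holder != slave:
            return False
        self.locks[region] = slave
        return True

    def release(self, slave, region):
        """Free the lock; a no-op if the slave no longer holds it."""
        if self.locks.get(region) == slave:
            del self.locks[region]

    def comm_failure(self, slave):
        """Cancel every lock held by a slave the master could not reach."""
        for region in [r for r, s in self.locks.items() if s == slave]:
            del self.locks[region]

    def access(self, slave, region):
        """Slave reports an access to a locked region.

        Returns True if the master agrees the slave holds the lock;
        False tells the slave to correct its local records.
        """
        return self.locks.get(region) == slave
```

With this table, a slave whose lock was cancelled during an outage
gets False from access() once the link returns, and another slave can
acquire the region in the meantime.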

Clearly, this can (as you say) make locks go away if the comm link
is flaky, but the locking should be running on top of virtual
circuits anyway, so that only long-lasting communication failure
breaks a lock, and that is as severe a problem as losing a disk
(something that happens a lot around here).

I agree with your analysis of the necessary actions on the slave
when a lock breaks.  The slave is either trying to free a lock
(which is already done by the comm link breakage) or is trying to
do I/O on the locked region, which should return an error if the
master and slave do not agree as to the status of the lock.
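
The slave-side behavior can be sketched as follows. Everything here is
an assumption made for illustration: the LockLost exception, the
in-process Master stub standing in for the remote file owner, and the
run_transaction helper with its max_tries and before_write parameters.
None of it corresponds to the System V locking interface. The sketch
shows I/O on the locked region failing when master and slave disagree,
and the application going back to the top of the transaction to obtain
the lock again. Cleanup of partial updates when a lock is broken is
not modeled.

```python
class LockLost(Exception):
    """Hypothetical error: master and slave disagree about the lock."""


class Master:
    """In-process stand-in for the machine that owns the file."""

    def __init__(self):
        self.holder = None  # slave currently holding the lock, if any
        self.data = []      # stands in for the locked region's contents

    def lock(self, slave):
        if self.holder is not None:
            return False
        self.holder = slave
        return True

    def unlock(self, slave):
        # Freeing a lock that was already broken is a harmless no-op.
        if self.holder == slave:
            self.holder = None

    def break_lock(self):
        # What the master does when it cannot reach the lock holder.
        self.holder = None

    def write(self, slave, value):
        # I/O on the locked region fails if the caller's lock is gone.
        if self.holder != slave:
            raise LockLost(slave)
        self.data.append(value)


def run_transaction(master, slave, values, max_tries=3, before_write=None):
    """Write all values under the lock, restarting if the lock is lost."""
    for attempt in range(max_tries):
        if not master.lock(slave):
            continue  # lock held elsewhere; try again
        try:
            for i, value in enumerate(values):
                if before_write is not None:
                    before_write(attempt, i)  # test hook, e.g. simulate outage
                master.write(slave, value)
            master.unlock(slave)
            return True
        except LockLost:
            continue  # back to the top of the transaction
    return False
```

A program written this way survives a broken lock; one that assumes a
lock, once granted, stays granted has no place to handle the error.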

Are Gilmore and I the only ones who care about this?
Does anyone have an elegant solution to the problem?
(Disallowing locks is not elegant!)


