4.0{,.1{,+}} ypbind subnet-sensitive bug - report and fix

Ken Manheimer klm at goon.cme.nbs.gov
Sat May 6 12:02:49 AEST 1989


Issue: Sun OS 4.0.1 rc.local hangs if the designated subnet mask differs
       from the standard one for the network's address class.  This is
       fixable by one transposition in the rc.local startup sequence.

			     Environment
			     -----------
OS:	Sun OS 4.0.1 both with and without "Net-Security" patches applied

CPUs: 	Sun 3/280 (file server), 3/180, 3/60, 3/50 (clients)

Subnet mask: 255.255.255.0, designated in /etc/netmasks file
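
To illustrate what a "non-standard" mask means here, the following is a
minimal Python sketch (not part of the original report).  It derives the
standard mask from the address class and compares the resulting network
with the one given by the designated mask.  The host address 172.16.5.20
and the helper names are assumptions made up for the example; the actual
network number is not given in this report.

```python
import ipaddress


def classful_mask(addr: str) -> str:
    """Return the 'standard' netmask implied by the address class."""
    first_octet = int(addr.split(".")[0])
    if first_octet < 128:
        return "255.0.0.0"        # class A
    if first_octet < 192:
        return "255.255.0.0"      # class B
    if first_octet < 224:
        return "255.255.255.0"    # class C
    raise ValueError("class D/E addresses have no standard netmask")


def network_of(addr: str, mask: str) -> str:
    """Return the network that addr belongs to under the given mask."""
    return str(ipaddress.ip_network(f"{addr}/{mask}", strict=False))


host = "172.16.5.20"              # hypothetical host address
designated = "255.255.255.0"      # the mask named in /etc/netmasks above

print(network_of(host, classful_mask(host)))   # 172.16.0.0/16
print(network_of(host, designated))            # 172.16.5.0/24
```

Under the standard mask the whole class B network looks directly
attached; under the designated mask only one subnet of it does.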

			 Problem Description
			 -------------------
If there are any remote mounts, the boot hangs forever on them.  It will
also hang on any of nfsd, rarpd, sendmail, and (maybe) rpc.statd;
skipping past these with ^C shortly produces a message like:

Cannot register service: RPC: Unable to send; errno = Network is unreachable
rpc.statd: unable to register service (SM_PROG, SM_VERS, udp)

A boot that is "unhung" by using ^C to get past the hanging
initializations is useless - you cannot log in to the machine (locally or
remotely), and none of the net services mentioned above are provided.

Disabling ypbind (either by commenting out the relevant rc.local lines or
by 'mv'ing /usr/etc/ypbind aside so it is not found) circumvents all of
these problems, but sacrifices access to the yellow pages.  Starting
ypbind by hand after the boot has completed compensates to some degree,
but is chancy - it works, but ypbind is then more prone than usual to
hanging at random times.

				 Fix
				 ---
The fix simply entails moving the startup of routed (in.routed) in
/etc/rc.local from just after the point where the netmask is set (the
rc.local ifconfig lines) to just before it.  All of the above problems
are alleviated and ypbind appears to operate normally.  NOTE that it is
probably essential to have /etc/gateways explicitly designate the local
host as the default gateway for this to work, though I have not yet
confirmed this.
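
As a rough aid for checking whether a given rc.local already has the
ordering described above, here is a small Python sketch (again, not part
of the original report).  It reports whether the first line that starts
in.routed comes before the first ifconfig line that sets a netmask.  The
two rc.local fragments and the function names are hypothetical
stand-ins; they are not the actual text of the Sun OS 4.0.1 rc.local,
which will contain much more.

```python
def start_order(rc_text):
    """Return (routed_line, netmask_line) as 0-based line indexes.

    Either value is None if no matching line is found.  Anything after
    a '#' is treated as a comment and ignored.
    """
    routed = netmask = None
    for i, line in enumerate(rc_text.splitlines()):
        code = line.split("#", 1)[0]
        if routed is None and "in.routed" in code:
            routed = i
        if netmask is None and "ifconfig" in code and "netmask" in code:
            netmask = i
    return routed, netmask


def routed_before_netmask(rc_text):
    """True if in.routed is started before the netmask is set."""
    routed, netmask = start_order(rc_text)
    if routed is None or netmask is None:
        raise ValueError("could not find both in.routed and a netmask line")
    return routed < netmask


# Hypothetical fragments, for illustration only.
AS_SHIPPED = """\
ifconfig ie0 netmask 255.255.255.0
in.routed
"""

FIXED = """\
in.routed
ifconfig ie0 netmask 255.255.255.0
"""

print(routed_before_netmask(AS_SHIPPED))   # False
print(routed_before_netmask(FIXED))        # True
```

The check is purely textual; it does not say anything about whether the
/etc/gateways entry mentioned above is also needed.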

			     Conjectures
			     -----------
Although I have no certain knowledge of the mechanism involved (nor do I
have access to source), I do have a conjecture about what is causing the
problem (with credit to Barry Warsaw for helping hash out this
provisional explanation).  My suspicion is that just after the netmask
change (or perhaps during it, what with the snazzy new features in
ifconfig that engage yp for setting it), but before routed has been
started, ypbind is called upon to establish a connection that it cannot
make, and it hangs trying.  Everything that subsequently depends on a
query to ypbind then hangs as well, hence the behavior of mount, nfsd,
and so forth.

Sun has previously released OS upgrades (3.3, 3.4) with obvious
subnetworking faults that had evidently never been exercised.  These
caused myriad and grievous problems for those of us who rely on
subnetwork partitioning to deal with what would otherwise be
unmanageably large networks.  This appears to be very similar to, if not
the same as, the problem that occurred before.  I hope that if this does,
in fact, turn out to be the case, more pre-distribution attention will be
paid to subnetting in the future.

Aargh,

Ken Manheimer		 	klm at cme.nbs.gov or ..!uunet!cme-durer!klm
National Institute of Standards and Technology
(Formerly "National Bureau of Standards")
CME Factory Automation Systems, Software Support


