3.5 Diskless client boot problems

Functionally Obsolete DHOLAKIA at intellicorp.com
Fri Feb 24 01:05:02 AEST 1989


Here is a note on some diskless boot problems that we had with Sun OS 3.5.

Setup :
        Diskless nodes (3/150s, 3/60s etc) running off two servers (3/280s), all
running OS3.5.  One server is the yellow pages (yp) master and the other is
the slave; both run ypbind and rarpd.

Problem : 
        Some of the diskless nodes would broadcast the ether address and
then hang (getting no response to the request for the internet address).
The only way to get them to boot would be to kill ypbind and rarpd on the
associated server and then restart rarpd.  This would have the problem
nodes booting like crazy and you could then restart ypbind (ypbind and a
new rarpd actually...).  This seemed to indicate that the yellow pages was
somehow screwing up the boot process. 

There was no discernible pattern (architecture, date of install, prom
revision etc...) to which nodes had the problem.

Solution :
	After some fruitless conversations with Sun Support, we gathered the
evidence and stared at it and noticed that all of the diskless Suns that had
this problem had a single digit someplace in the last two fields of the
ethernet address.  e.g.

	    8:0:20:1:3:D8	or	    8:0:20:1:AE:1
		     ^					^
Since ether addresses also come in the flavour 8:0:20:1:68:73 (two digits
in each of the last fields), the ethers file had at some point been prettied
up by padding the single digits with zeroes.  e.g.

	    8:0:20:1:03:D8	or	    8:0:20:1:AE:01
		     ^^					^^

This was a hangover from pre-yp times and posed no problem to the diskless
boot path taken in the absence of yp.  Yp, however, turns out to be
sensitive to this matter, e.g.

	ypmatch 8:0:20:1:3:D8 ethers.byaddr ---->  myopia (or whatever)
			 ^
	ypmatch 8:0:20:1:03:D8 ethers.byaddr ----> nil
			 ^^
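
The behaviour above is what you would expect from a lookup keyed on the
literal address string rather than on its numeric value.  A minimal Python
sketch of that idea follows; the dictionary is only a toy stand-in for the
ethers.byaddr map (an assumption for illustration, not how yp stores it), and
match() is a hypothetical helper, not a yp call.

```python
# Toy stand-in for a map keyed on the literal address string.
# The single entry mirrors the example above; real maps hold many entries.
ethers_byaddr = {"8:0:20:1:3:D8": "myopia"}


def match(key, table):
    """Exact string lookup: the fields are never interpreted as numbers."""
    return table.get(key)


# Same address numerically, but only the first spelling is a key in the table.
print(match("8:0:20:1:3:D8", ethers_byaddr))
print(match("8:0:20:1:03:D8", ethers_byaddr))
```

The second lookup comes back empty even though 3 and 03 are the same number,
because the two strings differ.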

This was easy enough to fix: take the padding zeroes back out of the ethers
file.
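
For a large ethers file, removing the padding by hand is tedious.  Here is a
rough Python sketch of one way to do it, assuming each entry is an address
followed by whitespace and a hostname, and that comment lines start with "#".
The function names are made up for this example, and the yp maps would still
have to be rebuilt afterwards in whatever way your site normally does it.

```python
def strip_padding(addr):
    """Drop leading zeroes from each colon-separated field, keeping case."""
    # "03" -> "3", "00" -> "0", "D8" stays "D8".
    return ":".join(field.lstrip("0") or "0" for field in addr.split(":"))


def normalize_ethers_line(line):
    """Rewrite the address on one ethers-style line.

    Blank lines and comment lines are passed through untouched.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line
    parts = stripped.split(None, 1)
    addr = strip_padding(parts[0])
    rest = parts[1] if len(parts) > 1 else ""
    return (addr + "\t" + rest).rstrip() + "\n"


# Sample input using the padded address from the note above.
sample = ["# ethers file\n", "8:0:20:1:03:D8\tmyopia\n"]
for entry in sample:
    print(normalize_ethers_line(entry), end="")
```

Run it over a copy of the file first and compare the result with the original
before installing it.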

MORAL :  It is not just the p's and q's that matter; check the zeroes as well.

-Rajiv

Comment about Sun support...

A few comments about the Sun support person who was assigned to help us
out.  I don't know if very many people have had such experiences but this
one was particularly unfortunate.

(a) The first exchange between us was "Can you set up an account for me to
log into your system and fix the problem...?".  I'd have hoped to hear a few
questions and some diagnostic recommendations before a suggestion like
that.  I don't know if this is SOP in the Sun community and support
circles but I am not thrilled about letting a foreign object into my
system.  [[ SOP == Standard Operating Procedure.  --wnl ]]

(b) This person billed himself as a yp expert and we went through the
usual checks (tabs in the hosts and ethers files etc).  After none of these
yielded anything, the tone of the call degraded markedly.

Our yp setup takes advantage of a feature that allows you to bunch your
ypfiles into a single directory instead of scattering them in /etc; in our
case this was /etc/ypfiles on the ypserver.  Also, the passwd file on the
ypserver does not consult yellow pages, so as to restrict access to the
ypserver.

We were told that "your yp setup is all messed up..." and had to put up
with grumbling about how "...can't expect to get support for non standard
setups " in the hope of getting some clue to the problem.  

When we finally did point him to the appropriate sections of the manual
which illustrated the use of such a setup, we were told that that option
"did not work" and that the supporter always told his customers not to use
it.  We asked if this was a known bug documented someplace, because there
was no change in the SunOS4.0 manual either.  We got a whole bunch of hand
waving about it being documented someplace in the software release
bulletins (no specific reference available...).

(c)  The sad part about this is that the tone of the support call was more
accusatory than helpful and could easily have intimidated a new user.



More information about the Comp.sys.sun mailing list