Sun-Spots Digest, v6n26

Fri Mar 11 09:43:08 AEST 1988

SUN-SPOTS DIGEST         Wednesday, 9 March 1988       Volume 6 : Issue 26

Today's Topics:
                              Administrivia
                          Re: Bug in unlink (2)
                Re: lpd connection to terminal server (2)
                 Re: Mysterious ethernet misbehavior (4)
                   Re: Spurious level 3 interrupts (2)
                        Re: Stuck in caps lock (2)
                Re: Sun-4 bcopy warning ; fast 68020 bcopy
                          Re: adding new clients
                     Bug in rpc.yppasswdd (with fix)
                          New improved calentool
                                Monthtool

Send contributions to:  sun-spots at rice.edu
Send subscription add/delete requests to:  sun-spots-request at rice.edu
Bitnet readers can subscribe directly with the CMS command:
    TELL LISTSERV AT RICE SUBSCRIBE SUNSPOTS My Full Name
Recent backissues are stored on "titan.rice.edu".  For volume X, issue Y,
"get sun-spots/vXnY".  They are also accessible through the archive
server:  mail the word "help" to "archive-server at rice.edu".

----------------------------------------------------------------------

Date:    Wed, 09 Mar 88 14:25:55 CST
From:    William LeFebvre <phil at Rice.edu>
Subject: Administrivia

Greetings!  I currently have a tremendous backlog of messages still
waiting to appear in a digest (179 kilobytes not including this digest---
enough for 9 digests).  Rather than the backlog getting smaller, it seems
to have increased!  Since I usually try to find the best in bad
situations, I decided to be more topical than chronological.  So for this
issue I grouped together replies to the same question, messages on
identical topics, etc.  As a result, this digest asks few questions and
answers many!  I am beginning to feel that more drastic action may be
necessary to lessen the backlog, such as increasing the digest size from
20K to 30K.  If anyone has objections to this idea, please send me mail.
I am also open to other suggestions.

A note to all BITNET readers and potential BITNET readers:  remember that
you are now getting your messages via a BITNET listserver.  You no longer
need to solicit me for add and delete requests.  See the information
immediately after "Today's Topics" to find out how to subscribe.  You can
remove your BITNET address from the listserver with the following CMS
command:  "TELL LISTSERV AT RICE SIGNOFF SUNSPOTS".

			William LeFebvre
			Department of Computer Science
			Rice University
			<phil at Rice.edu>

------------------------------

Date:    Tue, 1 Mar 88 16:16:04 MST
From:    dbd%benden at lanl.gov (Dan Davison)
Subject: Re: bug in unlink (1)

The bug reported by Steve Maurer (steve at ut-sally.UUCP) in unlink was
described in one of the Sun Software Technical Bulletins (STBs) recently.
I don't have the issue handy but it was the issue that was the size of a
small city's phone book.  The issue contained the Customer Distributed
Buglist and makes for pleasant just-before-bedtime reading.  In the unlink
bug discussion I recall there also being a workaround, but I don't recall
the details.  Send me mail if you want it posted.

dan davison
dbd at benden.lanl.gov theoretical biology, los alamos national laboratory

------------------------------

Date:    Thu, 3 Mar 88 16:49:12 EST
From:    Root Boy Jim <rbj at icst-cmr.arpa>
Subject: Re: Bug in unlink (2)

>Steve Maurer:
> I would appreciate any information or advice anyone might have on how to
> remove such links....

Try moving the data somewhere else, and point clri at the offending
directory. While not a general solution, it should fix things for you.

(Root Boy) Jim Cottrell	<rbj at icst-cmr.arpa>
National Bureau of Standards
Flamer's Hotline: (301) 975-5688

------------------------------

Date:    Wed, 2 Mar 88 20:12:16 EST
From:    Terry Slattery <tcs at usna.mil>
Subject: Re: lpd connection to terminal server (1)

Starting with the enclosed 'ttcp' program might be a useful way to get
your connection between lpd and the terminal server.  The usage message
says:

Usage: ttcp -t [-options] host <in
	-l##	length of bufs written to network (default 1024)
	-s	source a pattern to network
	-n##	number of bufs written to network (-s only, default 1024)
	-p##	port number to send to (default 2000)
	-u	use UDP instead of TCP
Usage: ttcp -r [-options] >out
	-l##	length of network read buf (default 1024)
	-s	sink (discard) all data from network
	-p##	port number to listen at (default 2000)
	-B	Only output full blocks, as specified in -l## (for TAR)
	-u	use UDP instead of TCP

The -r flag is for receive and -t is transmit.  It prints out transfer
rate info which is useful for network testing (use default -l and -n plus
-s to get 1Mby transfer).  I also use it on occasion to create a network
pipe between untrusted hosts:

	On source machine:	tar cf - . | ttcp -t rem_host
	On dest machine:	ttcp -r | tar xvf -

I'm sure you can think of other uses. Modifying for bi-directional
operation shouldn't take long.

	-tcs

[[ The program has been placed in the archives as "sun-source/ttcp.c" and
is 11924 bytes in length.  It can be retrieved via anonymous FTP from the
host "titan.rice.edu" or via the archive server with the request "send
sun-source ttcp.c".  For more information about the archive server, send a
mail message containing the word "help" to the address
"archive-server at rice.edu".  --wnl ]]

------------------------------

Date:    Thu, 3 Mar 88 14:28:25 EST
From:    Root Boy Jim <rbj at icst-cmr.arpa>
Subject: Re: lpd connection to terminal server (2)

I don't know what you are trying to do, but perhaps you are using the
wrong tool for the job. Presumably the `raw TCP datastream' has another
end somewhere on another machine. Why not mail to a fake account on the
other machine which does what you want to do.

For example, we have in /usr/lib/aliases the following lines:

laser: "|/usr/ucb/lpr -Plzr -p"
prt:   "|/usr/ucb/lpr       -p"
lpr:   "|/usr/ucb/lpr"
lzr:   "|/usr/ucb/lpr -Plzr"

Since I don't know what you're doing I don't know whether this will solve
your problem, but somebody might find this useful.

I can't remember whether this prints the mail header or not, but if it
does, you might pipe it through "sed '1,/^$/d'" first in the alias file.
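
For anyone who would rather strip the header inside a filter program than
with sed, the same idea can be sketched in C.  This is only an illustration:
the function name skip_mail_header is made up, and it assumes the whole
message is already in memory as one string whose first line is not empty
(true of any mail header).  Like the sed command, it discards everything
through the first empty line.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the body of a mail
 * message, i.e. the text after the first empty line.  If there is
 * no empty line, the result points at the terminating '\0', so the
 * body is empty (sed '1,/^$/d' would likewise delete everything).
 */
const char *skip_mail_header(const char *msg)
{
    const char *p = msg;

    assert(msg != NULL);
    while (*p != '\0') {
        const char *eol = strchr(p, '\n');

        if (eol == NULL)
            return p + strlen(p);   /* last line unterminated, no body */
        if (eol == p)
            return eol + 1;         /* empty line: body starts after it */
        p = eol + 1;                /* move on to the next header line */
    }
    return p;
}
```

A filter built this way would read the message, call skip_mail_header(), and
hand the remainder to lpr.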

(Root Boy) Jim Cottrell	<rbj at icst-cmr.arpa>
National Bureau of Standards
Flamer's Hotline: (301) 975-5688

------------------------------

Date:    Fri, 26 Feb 88 19:06:40 CST
From:    kane%fang at gswd-vms.gould.com (Patrick E Kane)
Subject: Re: Mysterious ethernet misbehavior (1)

Try killing the "rwhod" processes that you really don't need.  The rwhod
program likes to broadcast packets which cause nodes with
"/usr/spool/rwho" on a remote system to send NFS packets to their server.

I modified our local rwhod (that runs on our diskless nodes) to not write
out rwho info, but still send it.  I found that having 20 diskless
Suns running the standard rwhod would effectively flush my root server's
disk buffer cache every few minutes.

Pat Kane

P.S.
My root server has a very small ( < 2 Megs) kernel address space.

------------------------------

Date:    Mon, 29 Feb 88 10:07:36 CST
From:    Jim Knutson <knutson%SW.MCC.COM at mcc.com>
Subject: Re: Mysterious ethernet misbehavior (2)

There are two things to watch for when running lots of diskless clients on
a single ethernet.  One is running rwhod without the broadcast only hack
and two is things run from cron.

The best way to run rwhod is to modify it or obtain a modified copy of
rwhod and run it broadcast only on all clients.  The clients would then
NFS mount the server's copy of /usr/spool/rwhod.  This prevents all clients
from trying to do simultaneous writes on the receipt of a single rwho
broadcast packet.  I wish Sun would distribute this.  I sent the fix to
them back with release 2.2.

The other problem to watch for is something run from cron.  If all your
diskless clients are using the same crontab file, and the clocks are all
in sync, then it is likely that they will all request a copy of the same
thing at exactly the same time.  This can not only flood your server with
disk requests but also the net as well.  At the University of Texas, we
hit 95%+ saturation of the net on the quarter hour with about 100 suns due
to atrun firing up from cron.  This can be resolved by staggering startup
times in cron or by subnetting to keep the packets local to a server and
its clients.
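
One way to stagger start times without hand-editing every client's crontab
is to derive a delay from the host name.  The following is only a sketch
under that assumption: the function name stagger_minutes and the
multiply-and-add string hash are illustrative choices, not part of cron or
of any Sun release.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: map a host name to a minute offset in
 * [0, window) so that clients sharing one crontab do not all
 * fire at the same instant.  The same name always yields the
 * same offset; different names usually yield different ones.
 */
unsigned stagger_minutes(const char *hostname, unsigned window)
{
    unsigned long h = 5381;
    const unsigned char *p;

    assert(hostname != NULL);
    if (window == 0)
        return 0;                /* no window means no stagger */
    for (p = (const unsigned char *)hostname; *p != '\0'; p++)
        h = h * 33 + *p;         /* simple string hash, wraps on overflow */
    return (unsigned)(h % window);
}
```

A wrapper run from the shared crontab could sleep for
stagger_minutes(host, 15) minutes before starting the real job, which
spreads a quarter-hour burst across the whole quarter hour.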

Jim Knutson
knutson at mcc.com
knutson at milano.uucp

------------------------------

Date:    Tue, 1 Mar 88 10:12:17 CST
From:    boyle%antares at anl-mcs.arpa
Subject: Re: Mysterious ethernet misbehavior (3)

I'm not exactly an old ethernet hand, but at Argonne we have a network of
4 servers (3/280s) and 40 clients (3/60s and 3/140s).  In addition, there
are several other Unix machines on our net:  a VAX, Encore Multimax,
Sequent B21000, Alliant FX/8, Intel Hypercubes, and AMT DAP 510 on a Sun
host.

We sought Sun's advice about configuring the network, and they recommended
a backbone for the 4 servers and other machines.  Each server has a second
ethernet board, and that ethernet goes to its 10 clients.  This system
works very well, and we have had none of the problems you describe.
(Incidentally, we run NFS among the Suns and some of the other machines.)

Since the second ethernet boards are relatively cheap (a few K$), I
recommend you try this configuration.  40 clients on one net sounds like a
lot to me.

Jim Boyle

------------------------------

Date:    4 Mar 88 02:14:25 GMT
From:    ksr!benson at uunet.uu.net (Benson Margulies)
Subject: Re: Mysterious ethernet misbehavior (4)

I reported periodic storms of "ie: no carrier" and "ethernet jammed" messages.

Contrary to those who believe that there are necessarily hardware-related
problems, we were suffering from cron-ic load problems. That is, every few
minutes all of the workstations would hit the network to page in whatever
cron told them to do. When the clock sync was working particularly well,
the results were collision storms.

This sure looks like a candidate for Sun documentation, or even a cron
feature to randomize the times slightly.

Benson I. Margulies                         Kendall Square Research Corp.
harvard!ksr!benson			    ksr!benson at harvard.harvard.edu

------------------------------

Date:    Sat, 27 Feb 88 01:22:33 +0200
From:    leonid at TAURUS.BITNET
Subject: Re: Spurious level 3 Interrupts (1)

When we first got our 3/180 there was no problem; then suddenly we had a
hardware problem, eventually traced to the backplane.  The
symptom was that at some (early) point it would shout "Spurious level 3
Interrupt" and crash.  We replaced the backplane and all went fine,
except that the replacement backplane was an old revision, older than our
original, and since then we would get the spurious interrupt message once or
twice a day with no apparent damage (no crashes etc.).

Anyhow, we asked our SUN rep to replace the backplane with a real new one.
So he did, and since then we have no more of these messages.

My guess, folks, is that if you get such messages, it means that your backplane
revision number is not compatible with your CPU board revision number.
Isn't it simple?

Leonid

------------------------------

Date:    7 Mar 88 15:27:23 GMT
From:    jc at piaget.uucp
Subject: Re: Spurious level 3 interrupts (2)

I recently encountered a site with a second Ethernet controller and they
were getting spurious interrupt messages associated with it.  One
knowledgeable person I talked to said that that's one of the things you
get with a second Ethernet controller and that there is a) no known fix
for it and b) no known cause.  Can anybody tell me more about the cause
and (hopefully) a fix for this problem?

--jc

John Cornelius
(...!sdcsvax!piaget!jc)

------------------------------

Date:    Thu, 3 Mar 88 14:17:14 EST
From:    Michael Sykora <sykora at violin.ctr.columbia.edu>
Subject: Re: Stuck in caps lock (1)

>From:    AARON KONSTAM <79343382 at TRINITY.BITNET>
> Second, there is some combination of keys that one can hit on the keyboard
> that puts one irreversibly in upper case....

The key being hit is probably "F1".  Just hit it again to get out of
[CAPS] mode.

[[ Mr. Konstam wasn't clear about what environment he was using in which
he got stuck.  This solution works for SunView's shelltool.  The one that
follows works (I assume) for X.  --wnl ]]

Mike Sykora
System Manager
Computer Communications Research Lab
Center for Telecommunications Research
Columbia University
e-mail: sykora at ctr.columbia.edu

------------------------------

Date:    Tue, 08 Mar 88 15:53:24 PST
From:    Craig Leres <leres%lbl-helios at lbl-rtsg.arpa>
Subject: Re: Stuck in caps lock (2)

You didn't say, but I assume that you're running X10r4 on your Sun. If
this is the case, here's the solution. The following comment is from the
routine ConvertEvent() in libsun/events.c:

    /*
     * The static count is keeping track of how many
     * keys I have down for the given function.
     * Only need to do this for shift and meta.
     * On an up event I decrease the count.  If it is
     * not the last one up then I convert to a down event
     * which really won't do anything.  I should ignore
     * the event, but this works.
     * At odd times the sun keyboard gets confused and I
     * miss an UP event.  This may get you stuck in
     * shift mode.  I assume there is only 2 shift keys
     * and only two meta keys.  If count ever goes above
     * 2 I make it 2 again, assuming I have missed an up
     * event.  If you get stuck in shifted mode, just hit
     * both shift keys and you should be fixed.
     */

So sometimes when you type too quickly (remember that you generate two key
events for each key stroke) a shift down key event gets lost AND (I swear
to God, it happened to me just now!) your X server becomes hosed.  As we
see from the above Enlightening Comment, pressing both shift keys at the
same time resets things.

		Craig

------------------------------

Date:    Thu, 3 Mar 88 07:57:50 EST
From:    suneast!ozone!murph at sun.com (Joe Murphy, Manager ECD Hardware)
Subject: Re: Sun-4 bcopy warning ; fast 68020 bcopy

>  Although the main loop does do eight movl
>instructions to move the data, the fastest possible version would be
>completely unrolled; in other words, no looping at all....--wnl

Sorry wnl, but I don't agree.  You are ignoring the instruction cache of
the 68020.  Even with an external cache, it is better to fetch an
instruction from the internal instruction cache than it is to fetch
external to the chip (frees up the bus for actually moving the data for
one thing).  However an unwound loop is still a win because you don't have
to break the pipe as often.  The tradeoff is the cost of caching in the
larger unwound loop (plus the replacement of other instructions you might
have had to kick out) vs. the increased speed of the loop you have cached
in.  The optimal amount of loop "unwinding" is dependent on how much data
you want to move.  I did an analysis a couple of years ago for the 3/110
color map update that showed for a total of 4.5 clocks on the read, and
4x4 clocks on the write (the color map is a byte-wide device), and for 256
x 3 bytes moved as long words, the optimal loop size was 31 movl's per
iteration.  Experimental results correlated with this quite well.  For a
generic routine where you don't know how much data you will be moving in
advance, like bcopy, unwinding the loop a little bit is probably a good
idea;  8, 16, or 32 would be my guess as to the "optimal" amount.
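
For concreteness, "unwinding a little bit" might look like the following in
C.  It is a sketch only: the name copy_words_unrolled8 is made up, the unroll
factor of eight is just the smallest of the guesses above, and a real bcopy
would also have to deal with byte counts, alignment, and overlapping regions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy n words (unsigned int) from src to dst with the main loop
 * unrolled eight times.  The regions must not overlap.
 */
void copy_words_unrolled8(unsigned int *dst, const unsigned int *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    /* Main loop: eight moves per branch, still small enough to cache. */
    while (n >= 8) {
        dst[0] = src[0]; dst[1] = src[1];
        dst[2] = src[2]; dst[3] = src[3];
        dst[4] = src[4]; dst[5] = src[5];
        dst[6] = src[6]; dst[7] = src[7];
        dst += 8; src += 8; n -= 8;
    }
    /* Tail: the remaining zero to seven words, one at a time. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
}
```

Whether the compiler turns each assignment into a single movl, and whether
eight is the right factor, has to be checked on the machine in question.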

[[ You are quite correct.  I had forgotten about the instruction cache.
This brings up an interesting point regarding a 68010 (if anyone cares
about them any more):  a bcopy whose main loop is small enough to get the
'010 in loop mode might be faster than any amount of unrolling.  --wnl ]]

One thing to be wary of BTW on the Sun3 when considering user level
"bcopy"'s is that the 3/2xx series has special bcopy hardware that the
kernel takes advantage of to keep the large amount of non-repeating
sequential accesses from trashing the cache.  The 3/xx and 3/1xx machines
don't have external caches, and don't have any special hardware, so your
best "bcopy" loop has as good a chance of being optimal as the one
available via the bcopy library call, or better.

For machines without internal instruction caches, you are correct, the
optimal "loop" is the completely unwound one.

-murph

[[ Of course none of this has helped to solve the original problem....
what is a near-optimal bcopy for the Sun4?  Is the one that Sun
distributes in the library sufficient?  --wnl ]]

------------------------------

Date:    Fri, 26 Feb 88 09:27:17 EST
From:    uunet!dmnhack!phb at ut-sally.UUCP (Paul Breslin)
Subject: Re: adding new clients

The problems associated with adding diskless clients make me wonder why
Sun doesn't build 3/50's and 3/60's with an optional mini-winchester built
in. (Something akin to a hard-card for the IBM-PC.) A small 30 or 40 Mb
disk with a generic root and swap partition would save on network
bandwidth and eliminate the hassles of configuration. You would boot it
up, mount some NFS partitions, configure a custom kernel and be rolling
within about an hour or two. Such small disks only cost a few hundred
dollars (not counting Sun's huge markup) and wouldn't increase the cost
too much.

------------------------------

Date:    Fri, 26 Feb 88 14:11:38 PST
From:    dredge at cheshire.stanford.edu (Michael Eldredge)
Subject: Bug in rpc.yppasswdd (with fix)

Product: rpc.yppasswdd (versions through Sun OS3.5)
	This bug is in the 3.0, 3.4, and 3.5 versions.

Problem: Incorrectly handles updates when the name is a prefix of another.

Example (more or less):
	/etc/passwd:
			...
		jeffery:PASSWD1:100:10:Longer Name:/u/jeffery:/bin/csh
		jeff::101:10:Short Name:/u/jeff:/bin/csh
			...

	% yppasswd jeff
	Old yppasswd:<cr>
	New yppasswd:passwd2
	Again:passwd2
	can't change passwd
	% yppasswd jeff
	Old yppasswd:PASSWD1<cr>	# jeffery's passwd
	New yppasswd:passwd2
	Again:passwd2
	%

	/etc/passwd:
			...
		jeffery:PASSWD2:100:10:Longer Name:/u/jeffery:/bin/csh
		jeffery::100:10:Longer Name:/u/jeffery:/bin/csh
			...

	Note that entry for 'jeff' is gone!  Very bad!

Fix:
	In the source (which we took from 3.0 since that is the most
	recent that we have):

	There is a rewrite of the function "getpwnam()".  When it compares
	the given name with each entry in the passwd file it just does
	a strncmp() with the length of the given name.  Thus if the given
	name is shorter than (a prefix of) an entry, strncmp() will match.
	The fix is to make sure that the lengths of both the given name
	and the name from the passwd file are the same and THEN do the
	strncmp().
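
The difference between the two comparisons can be seen in isolation with a
small sketch.  The function names match_buggy and match_fixed are made up
for illustration and do not appear in rpc.yppasswdd.c; strchr() is used here
where the daemon source uses the equivalent BSD index().

```c
#include <assert.h>
#include <string.h>

/* Original test: true whenever name is a prefix of the passwd line. */
int match_buggy(const char *name, const char *line)
{
    assert(name != NULL && line != NULL);
    return strncmp(name, line, strlen(name)) == 0;
}

/* Fixed test: the login field (up to the first ':') must equal name. */
int match_fixed(const char *name, const char *line)
{
    const char *colon;
    size_t cnt;

    assert(name != NULL && line != NULL);
    cnt = strlen(name);
    colon = strchr(line, ':');
    return colon != NULL
        && (size_t)(colon - line) == cnt
        && strncmp(line, name, cnt) == 0;
}
```

With the example entries, match_buggy("jeff", ...) accepts the "jeffery"
line because it comes first in the file, while match_fixed() rejects it and
goes on to accept only the "jeff" line.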

diff rpc.yppasswdd.c rpc.yppasswdd.c-fix
265a266,267
> 	char *e;
> 	char *index() ;
273,274c275,279
< 	while ((p = fgets(line, BUFSIZ, pwf)) && strncmp(name, line, cnt))
< 		continue;
---
> 	while ((p = fgets(line, BUFSIZ, pwf))) {
> 		e = index(line, ':') ;
> 		if (e && (e-line)==cnt && strncmp(line,name,cnt)==0)
> 			break ;
> 		}

Michael Eldredge 
Manager Electrical Engineering Computer Facility
Stanford University
dredge at hitchrack.stanford.edu

[[ Thanks to Keith Vincent <keith%lccr.sfu.cdn at ean.ubc.ca> who also
pointed out this bug.  -wnl ]]

------------------------------

Date:    Thu, 25 Feb 88 09:18:59 PST
From:    Bill Randle <billr at tekred.tek.com>
Subject: New improved calentool

Here is a copy of the new improved calentool, originally distributed
on the Sun Users' Group tape (1987). See README2 for a list of new
features and additions.

	-Bill Randle
	Tektronix, Inc.
	billr at tekred.TEK.COM

[[ The source has been placed in the archives as two separate shar files:
"sun-source/calentool.shar.1" and "sun-source/calentool.shar.2".  They are
50197 and 48343 bytes, respectively.  They can be retrieved via anonymous
FTP from the host "titan.rice.edu" or via the archive server with the
request "send sun-source calentool.shar.1 calentool.shar.2".  For more
information about the archive server, send a mail message containing the
word "help" to the address "archive-server at rice.edu".  --wnl ]]

------------------------------

Date:    8 Mar 88 18:44:53 GMT
From:    Sarah Metcalfe <sarahm at cognos.uucp>
Subject: Monthtool

A number of people sent me mail concerning Monthtool.  Unfortunately, my
mailtool crashed and I lost a lot of messages.  Luckily the core file had
all the headers in it, but I don't have paths for the following people:

    vixen!ronbo
    cfa247!joe
    esj at bikini.cis.ufl.edu
    emmy.umd.edu!dna at eneevax.umd.edu
    bdrc!jwc at mcnc.org
    hope.lanl.gov!dwf
    studguppy.lanl.gov!roberts
    esmond at msr.epm.ornl.gov
    allegra!dnelson

Can these people please resend their messages?  Thanks!

Sarah Metcalfe         decvax!utzoo!dciem!nrcaer!cognos!sarahm
Cognos Incorporated    P.O. Box 9707, 3755 Riverside Drive, 
                       Ottawa, Ontario, CANADA  K1G 3Z4
                       (613) 738-1440

------------------------------

End of SUN-Spots Digest
***********************