Sun-Spots Digest, v6n40

William LeFebvre Sun-Spots-Request at RICE.EDU
Fri Apr 1 09:28:56 AEST 1988


SUN-SPOTS DIGEST         Thursday, 31 March 1988       Volume 6 : Issue 40

Today's Topics:
                        Re: Ethernet problems (2)
                  Re: Ethernet problems/ collision rates
                   Re: Strange Ethernet error messages
                     Re: Mysterious Ethernet problems
                  Re: TCP packet size bug in 3.4 AND 3.5
             Re: Sun-3/2xx user level vs. kernel level bcopy

Send contributions to:  sun-spots at rice.edu
Send subscription add/delete requests to:  sun-spots-request at rice.edu
Bitnet readers can subscribe directly with the CMS command:
    TELL LISTSERV AT RICE SUBSCRIBE SUNSPOTS My Full Name
Recent backissues are stored on "titan.rice.edu".  For volume X, issue Y,
"get sun-spots/vXnY".  They are also accessible through the archive
server:  mail the word "help" to "archive-server at rice.edu".

----------------------------------------------------------------------

Date:    Fri, 18 Mar 88 20:48:04 PST
From:    Craig Leres <leres%lbl-helios at lbl-rtsg.arpa>
Subject: Re: Ethernet problems (1)
Reference: v6n20,v6n28

Actually, this problem is the result of the collaboration of two bugs in
the SunOS kernel. One bug is the failure to correctly recognize all
possible IP broadcast addresses. But this wouldn't hurt if it weren't for
another bug that makes the system think it's ok to forward a packet when
it isn't. Clearly, you shouldn't forward a packet if you only have one
network interface. Nor should you forward a packet out the same interface
it came in on.
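
To make the first bug concrete, here's a minimal sketch of the check a
host has to make (this is not the SunOS code; the function and its
interface are made up), accepting both the all-ones form and the
all-zeros form that 4.2BSD-derived systems use:

	#include <stdio.h>
	#include <stdint.h>
	#include <arpa/inet.h>

	/* Nonzero if dst should be treated as an IP broadcast on the network
	 * described by (net, mask).  All values are in host byte order. */
	int
	is_ip_broadcast(uint32_t dst, uint32_t net, uint32_t mask)
	{
		uint32_t host = dst & ~mask;

		if (dst == 0xffffffff)		/* limited broadcast */
			return 1;
		if ((dst & mask) != net)	/* not this network */
			return 0;
		/* Directed broadcast: all-ones host part, plus the all-zeros
		 * form that 4.2BSD-derived systems also use. */
		return host == (uint32_t)~mask || host == 0;
	}

	int
	main()
	{
		uint32_t net  = ntohl(inet_addr("128.3.0.0"));
		uint32_t mask = ntohl(inet_addr("255.255.0.0"));

		printf("%d %d %d\n",
		    is_ip_broadcast(ntohl(inet_addr("128.3.255.255")), net, mask),
		    is_ip_broadcast(ntohl(inet_addr("128.3.0.0")), net, mask),
		    is_ip_broadcast(ntohl(inet_addr("128.3.1.2")), net, mask));
		return 0;
	}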

> Hosts simply should not forward packets of any sort, and they certainly
> should not *under any circumstances* forward a broadcast packet.  I don't

I don't think you really mean this; if hosts stopped forwarding packets,
the Internet would cease to exist! Perhaps your definition of a host is a
system with only one network interface, as opposed to a gateway, which has
more than one network interface? In any case, neither hosts nor gateways
should forward broadcast packets.

> Of course, there is this nice kernel variable "ipforwarding" which can be
> used to disable forwarding and which you might think can be used to stop
> this antisocial behavior.  Guess again.  In a 4.2BSD system, if you turn
> off ipforwarding, all that will happen is that you'll swap ICMP Network
> Unreachable messages for ARPs (at a possible packet savings, as you'll

ARP requests are broadcasts; they must be received by all stations but are
of interest only to the one station that is being arp'ed for. Bogus ARP
requests must be received and discarded by all stations. So as it turns
out, turning off ipforwarding is a BIG win. Instead of wasting cycles on
all systems, you only waste cycles on the system that did the broadcast.

Another way to reduce this problem is to reduce the number of broadcasts
that occur on your Ethernet. One way we've done this at LBL is to outlaw
rwho. It's just too expensive when you have more than a handful of hosts
participating.

Our Net Police have worked overtime to disable ipforwarding and turn off
rwho; as a result, our lab-wide ethernet is pretty healthy.

Craig

------------------------------

Date:    Sun, 20 Mar 88 12:14:25 EST
From:    steve at cs.umd.edu (Steven D. Miller)
Subject: Re: Ethernet problems (2)
Reference: v6n20,v6n28

From: Craig Leres <leres%lbl-helios at lbl-rtsg.arpa>
> ...One bug is the failure to correctly recognize all possible IP
> broadcast addresses.

Agreed.  I wonder... is there any time when one might want to forward a
packet that was sent to the local broadcast address?  I can't think of
any, but someone else may have different ideas.  If one never wants to
forward local wire broadcasts, it would be nice if the device drivers got
modified to pass back an indication of whether or not a particular
incoming IP packet had been sent to the local wire broadcast address.  If
so, one could hack the code so that it would never be forwarded.  All this
mucking about with IP addresses and guessing whether the sender was
broadcasting would then go away.
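
Just to sketch the idea (the flag name and the interface are invented for
illustration, not taken from any real driver), the driver-level change
might look roughly like this:

	#include <stdio.h>
	#include <string.h>

	#define ETHER_ADDR_LEN	6
	#define PKT_LINK_BCAST	0x01	/* hypothetical flag: "arrived as a
					 * link-level broadcast" */

	static const unsigned char ether_bcast[ETHER_ADDR_LEN] =
		{ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

	/* Called on each received frame; the IP layer could then test the
	 * flag and refuse to forward the packet, instead of guessing from
	 * the IP destination address. */
	int
	tag_link_broadcast(const unsigned char *ether_dst, int *pkt_flags)
	{
		if (memcmp(ether_dst, ether_bcast, ETHER_ADDR_LEN) == 0) {
			*pkt_flags |= PKT_LINK_BCAST;
			return 1;
		}
		return 0;
	}

	int
	main()
	{
		unsigned char dst[ETHER_ADDR_LEN] =
			{ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
		int flags = 0;

		printf("%d %#x\n", tag_link_broadcast(dst, &flags), flags);
		return 0;
	}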

> Clearly, you shouldn't forward a packet if you only have one network
> interface.  Nor should you forward a packet out the same interface it came
> in on.

Agreed.  The "forward iff > 1 IP interface" rule holds in 4.3BSD.  I
disagree that the second should hold; what if you're playing gateway, and
someone sends you a packet that should have gone to another gateway on the
local net?  You should send a redirect, but it would be nice to forward
the packet anyway.

> ...Perhaps your definition of a host is a system with only one network
> interface as opposed to a gateway which has more than one network
> interface?

Same meaning, different terminology.

> ...So as it turns out, turning off ipforwarding is a BIG win. Instead of
> wasting cycles on all systems, you only waste cycles on the system that
> did the broadcast.

Processing an incoming ARP doesn't take a whole lot of code, and should
not take much CPU.  I think that trashing the net is the real problem
here, not wasting a few cycles.

> ...One way we've done this at lbl is to outlaw rwho....

This sounds reasonable, but I admit that I'm hooked on rwho.

> Our Net Police have worked overtime to disable ipforwarding and turn off
> rwho...

This, too, sounds familiar.  One must be ever-vigilant to keep this sort
of behavior from cropping up; one well-meaning OS upgrade or system
configuration change, and a quiet Ethernet can turn noisy again.

-Steve

------------------------------

Date:    Fri, 18 Mar 88 10:59:43 PST
From:    celeste at coherent.com (Celeste C. Stokely)
Subject: Re: Ethernet problems/ collision rates
Reference: v6n20,v6n28

This is in reply to the person with the le0 error messages, and also the
person asking about collision rates.

1. Concerning the messages:
	le0: Received packet with ENP bit in rmd cleared
	le0: Received packet with STP bit in rmd cleared

Under normal operation the LANCE driver should never encounter receive
descriptors with either the ENP or STP bit cleared.

The driver sets up its buffers to be large enough to hold the maximum size
packets allowed by the Ethernet spec.  This means that it has no need to
chain receive buffers together so that an individual packet straddles
multiple receive buffers.  Translating this into receive descriptor bits,
there should be exactly one descriptor for each incoming packet, and that
descriptor should have both the start-of-packet (STP) and end-of-packet
(ENP) bits set.

However, if there's traffic on the net in violation of the Ethernet spec,
it's possible for an incoming packet to be too big to fit into a single
receive buffer.  In this case, the packet will span multiple descriptors,
with the ENP bit clear on all but the last and the STP bit clear on all
but the first.
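
In driver terms, the check amounts to something like the following sketch
(the bit values are the Am7990's receive descriptor status bits as I
recall them; everything else is simplified illustration, not the actual
driver):

	#include <stdio.h>

	/* Receive message descriptor (RMD1) status bits on the LANCE. */
	#define RMD_ENP	0x0100		/* end of packet */
	#define RMD_STP	0x0200		/* start of packet */

	/* With receive buffers sized for a maximum-length Ethernet frame,
	 * every legal packet fits in exactly one descriptor, so a good
	 * descriptor has both STP and ENP set. */
	void
	check_rmd(unsigned int rmd1)
	{
		if (!(rmd1 & RMD_ENP))
			printf("le0: Received packet with ENP bit in rmd cleared\n");
		if (!(rmd1 & RMD_STP))
			printf("le0: Received packet with STP bit in rmd cleared\n");
	}

	int
	main()
	{
		check_rmd(RMD_STP | RMD_ENP);	/* normal packet: silent */
		check_rmd(RMD_STP);		/* first chunk of a giant packet */
		check_rmd(RMD_ENP);		/* last chunk of a giant packet */
		return 0;
	}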

That's where the first two error messages are coming from.  The error
message about babbling confirms the condition indicated by the first two
messages.  

The two error messages indicate that the chip has decided that things are
screwed up enough that it should stop its transmitter and receiver
sections and has done so.  The driver will restart them upon getting this
error.

The bottom line is that it is very likely that there's other equipment on
the net that's operating in violation of the Ethernet spec by sending out
giant packets.

2. Concerning what are reasonable Ethernet collision rates:
Collisions are completely expected on Ethernet. Collision is one of the
ways that more than 1 machine can live on the cable. The problem comes
when there are too many collisions. Here are my rules of thumb for "how
many":

0% - 2%   -- All is well. Textbook perfect, healthy (collision-wise) net.
2.5% - 5% -- Not super, but ok. I expect this with a lot of nd clients on
             a net.
5% - 10%  -- Uh-oh, bad problems developing. Get out the diagnostic tools.
             Find where the problem is, and fix it. (Has a machine lost the
             ability to detect collisions, and so is blabbering whenever it
             feels like it?)
> 10%     -- Serious trouble. Users probably complaining loudly. You should
             have fixed the problem before it got this bad, but at least it
             should be easy to find by now. Fix it now.

Of course, these are my guidelines, but they've worked well for me over
the years.

ALSO, please remember that the formula for computing the collision rate is:
(Collisions/Opkts)*100=collision rate
[Opkts is the number you get in netstat -i]
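
If you'd rather let the machine do the arithmetic, here's a trivial
sketch; the Opkts and Collisions figures below are made up, so plug in
the numbers netstat -i reports for your interface:

	#include <stdio.h>

	int
	main()
	{
		double opkts  = 1843021.0;	/* hypothetical Opkts value */
		double collis = 55290.0;	/* hypothetical Collisions value */
		double rate   = (collis / opkts) * 100.0;

		printf("collision rate: %.1f%%\n", rate);
		if (rate <= 2.0)
			printf("all is well\n");
		else if (rate <= 5.0)
			printf("not super, but ok\n");
		else if (rate <= 10.0)
			printf("bad problems developing\n");
		else
			printf("serious trouble\n");
		return 0;
	}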

..Celeste Stokely
Coherent Thought Inc.
UUCP:   ...!{ames,sun,uunet}!coherent!celeste  Domain: celeste at coherent.com
Internet: coherent!celeste at ames.arpa or ... at sun.com or ... at uunet.uu.net
VOX:  415-493-8805 
SNAIL:3350 W. Bayshore Rd. #205, Palo Alto CA  93404

------------------------------

Date:    Fri, 18 Mar 88 20:56:44 PST
From:    Craig Leres <leres%lbl-helios at lbl-rtsg.arpa>
Subject: Re: Strange Ethernet error messages
Reference: v6n28

Here's a rehash of a posting I made to sun-spots last June.

The Ethernet driver gives the LANCE chip a block of memory large enough to
hold 40 full sized packets. The errors:

	le0: Received packet with STP bit in rmd cleared
	le0: Received packet with ENP bit in rmd cleared

are indications that the LANCE received packets that were bigger than the
driver was expecting. The error:

	le0: Receive buffer error - BUFF bit set in rmd

indicates that the LANCE chip ran out of memory to put incoming packets
in. The Intel interface will spew the message:

	ie0: giant packet

when it receives packets that are too big.
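
For what it's worth, the "giant packet" test is just a length check
against the Ethernet maximum of 1518 bytes (1500 data bytes plus the
header and CRC); a sketch, with the function and its use invented for
illustration:

	#include <stdio.h>

	#define ETHER_MAX_FRAME	1518	/* 1500 data bytes + 14-byte header
					 * + 4-byte CRC */

	/* Returns nonzero (and complains) if a received frame is a giant. */
	int
	giant_packet(int length)
	{
		if (length > ETHER_MAX_FRAME) {
			printf("ie0: giant packet\n");
			return 1;
		}
		return 0;
	}

	int
	main()
	{
		giant_packet(1518);	/* legal maximum: silent */
		giant_packet(4000);	/* what a babbling DEQNA might look like */
		return 0;
	}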

One source of large packets is devices that violate the minimum
inter-packet spacing. Some combinations of transceivers and interfaces see
these too-closely-spaced back-to-back packets as a single large packet.

Another source of large packets is old DEQNA Ethernet interfaces. When
they receive more packets than they can handle, they transmit all ones for
a short spell. This garbage looks like an impossibly large broadcast
packet.

Craig

------------------------------

Date:    Mon, 21 Mar 88 20:31:32 PST
From:    paula at boeing.com
Subject: Re: Mysterious Ethernet problems
Reference: v6n30

In Sun-Spots v6n30, leonid at TAURUS.BITNET described a problem that looks
identical to what we're seeing with five of our 3/280 servers.  With one
exception (see below), these machines are running 3.4.  Our building is
wired with thick Ethernet, and most machines connect to the cable through
at least one IsoLan fan-out unit.  We have ~60 Suns (80% diskless) and
perhaps another 60 other Ethernet boxes ranging from Ungermann-Bass NIU's
and PC's with 3-Com cards to Xerox and Symbolics Lisp machines.  The
'traffic' tool consistently shows a steady 30% background load with
frequent much higher spikes.  For the past month or so, we've averaged
about one server per day going down with a continuous stream of

	ie0: lost interrupt: resetting

console messages.  Once a machine gets into this state, the only recourse
is to reboot.  The accumulating evidence seems to be pointing to a problem
with these servers' connection to the net.

	- One of the servers was moved to another room about two weeks ago
	  and has not crashed since.

	- Replacement of the cpu in one of the servers did not prevent that
	  machine from crashing.

	- I vaguely remember hearing that this problem results from a bug in
	  the 3.4 ethernet driver.  I have installed 3.5 on one of the machines,
	  but it is too soon to tell if that fixed it.

	- The man who maintains our building's Ethernet tells me that 4-6 weeks
	  ago he changed the way those five servers connect to the backbone 
	  cable.  Previously, the servers were connected to an IsoLan fan-out 
	  unit which connected to the backbone.  Now, the fan-out unit connects
	  to the backbone through another IsoLan.  He's working on rearranging
	  things back to the old configuration.

I have been talking to Sun about this.  I initially called it in as a
software problem.  The fellow who took the call was very helpful, but
really didn't think it was a software problem.  The call was re-directed
to hardware and our local field engineer was out the next day to try a new
cpu in one of the machines.  That now appears not to have corrected the
problem, which seems to be a software bug exacerbated by some
configuration of cables and/or fan-out units.  As I learn more, I will let
you know.  Is there anyone else out there who has seen this problem?

Paul Allen
Boeing Advanced Technology Center
paula at boeing.com
...!uw-beaver!ssc-vax!bcsaic!paula

------------------------------

Date:    Sun, 20 Mar 88 15:11:55 cst
From:    grunwald%guitar.cs.uiuc.edu at a.cs.uiuc.edu (Dirk Grunwald)
Subject: Re: TCP packet size bug in 3.4 AND 3.5
Reference: v6n28

I applied the patch to allow larger TCP packets & measured the results.
The server was a 3/280 (idle during the tests); the tests themselves were
run on a 3/50.

Here's what I found:

Test				w/512 bytes		w/1024 bytes
--------------------------------------------------------------------------
cp latex /dev/null		11 (9 -> 13)		15 (14 -> 19)
cp latex /usr/tmp		1:04			1:03
latex paper.tex			2:22 -> 2:03		2:28 -> 2:05
rcc				1:32			1:36

Times are in seconds or minutes:seconds, with the range (if available)
marked as low -> high.
The first two tests just measure disk throughput. As you can see, the
change seemed to actually degrade things for the simple test, but when you
put some contention on the wire, the difference seems meaningless.

The third test basically checks paging & more random disk traffic. Latex
is a big program on our hosts & the paper is pretty big -- lots of files
get read in. However, the difference doesn't seem very great.

The 'rcc' task uses 'rsh' to do a remote 'cc' on an Intel 310 system. It
should use the ethernet a lot, since files get copied there & back. Again,
the difference is in the noise.

Because the difference isn't that great, I left everything at 512 byte
packets, mainly because test #1 ran faster that way & everything else
seemed about the same.

I didn't measure the performance change on the 3/280. I doubt that it's
all that great.

Dirk Grunwald
Univ. of Illinois
grunwald at m.cs.uiuc.edu

------------------------------

Date:    Fri, 18 Mar 88 11:01:08 PST
From:    root at lll-crg.llnl.gov (Gluteus Vaximus)
Subject: Re: Sun-3/2xx user level vs. kernel level bcopy

> From:    suneast!ozone!murph at sun.com (Joe Murphy, Manager ECD Hardware)
> One thing to be wary of BTW on the Sun3 when considering user level
> "bcopy"'s is that the 3/2xx series has special bcopy hardware that the
> kernel takes advantage of to keep the large amount of non-repeating
> sequential accesses from trashing the cache.

That's a fine observation, but what's the solution?  We had a group at UC
Berkeley trying to do Astronomical Image Processing on a 3/260.  They
required the ability to copy 1/4 Mb 30 times a second.  Their first effort
gave them 1/3 the performance of a 3/160.

I got into the picture when they started asking around about how to turn
the cache off.  (They were pragmatic about the problem - they were stuck
with the machine, now they needed to make it usable.)  Eventually I was
able to back engineer a semi-solution from a description of the cache
addressing algorithm: it happened that the source and destination arrays
in the image processing system were a multiple of 64Kbytes offset from
each other.  By moving one of the arrays 24 bytes relative to the other we
were actually able to get slightly better than 3/160 performance (only a
few percent better).

There's some interesting periodic math about why a 24-byte offset (and any
offset with mod(abs(D-S),65536)>16 and mod(abs(D-S),16)=8) is optimal, but
the real question is: how do we get even better performance?  Is there any
way to get the kernel to do our copies for us using its internal bcopy?
The whole reason the group at Berkeley bought the 3/260 instead of a 3/160
was because they'd determined that the 3/160 just wouldn't be fast enough.
Now they find themselves stuck with this slow memory bandwidth.
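
In case anyone wants to check an offset of their own against that
condition, here's a throwaway sketch (D and S stand for the destination
and source addresses; the test values are arbitrary):

	#include <stdio.h>
	#include <stdlib.h>

	/* The empirical condition quoted above:
	 * mod(abs(D-S),65536) > 16 and mod(abs(D-S),16) = 8. */
	int
	offset_ok(long d, long s)
	{
		long diff = labs(d - s);

		return (diff % 65536) > 16 && (diff % 16) == 8;
	}

	int
	main()
	{
		printf("offset 24:    %s\n", offset_ok(24, 0) ? "good" : "bad");
		printf("offset 65536: %s\n", offset_ok(65536, 0) ? "good" : "bad");
		printf("offset 65560: %s\n", offset_ok(65560, 0) ? "good" : "bad");
		return 0;
	}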

It should be pointed out, by the way, that no cache should exhibit this
kind of brain-damaged behavior.  Worst-case performance should be memory
bandwidth, not sub-memory bandwidth.  The behavior of the 3/2xx cache is
totally unacceptable in this case.  I'm also curious: does the 4/xxx
series suffer the same cache problems?  It's trivial to test; just write a
program that copies 1/4 Mb 100 times and time it from the shell.
Eg:
	#include <strings.h>		/* for bcopy() */

	#define N (1024*256)

	char src[N];
	/* Strip the leading comment delimiter below to shift dst by 24 bytes: */
	/* char spacer[24]; /**/
	char dst[N];

	int
	main()
	{
		int i;

		for (i = 100; i; i--)
			bcopy(src, dst, N);
		return 0;
	}

Casey

------------------------------

End of SUN-Spots Digest
***********************


