UNIX IPC Datagram Reliability under - (nf)

rpw3 at fortune.UUCP
Tue Jan 31 22:06:39 AEST 1984



[This lengthy tutorial probably belongs in net.arch, but the discussion
has been here so far.]

O.k., nobody has come forth to defend "UNIX domain datagrams", so here it is...

	>>> Why datagrams SHOULD be "unreliable". <<<

The internet datagram "style" is based on the observation that
the end processes in any communication have to be ultimately
responsible for "transaction integrity" so they might as well be
responsible for all of it. No amount of intermediate error checking and
retransmission can GUARANTEE reliable synchronization if the ultimate
producer and consumer do not do the handshake. The layers on layers
of protocols don't hack it, if the critical state is outside the end
process. Nodes can crash; links can crash; nodes and links can go down
and up. Servers (e.g. mail) still have to do their own ultimate lost
message and duplication checking.  (I will not argue that point
further. If you disagree, go see your local communications wizard and
get him/her to explain.) (Also, a moment of silence for anyone who
thinks X.25 is a "reliable" protocol.)

Given that the responsibility for ultimate error correction lies in the
end-point processes, the transmission and switching portion of the net
can get A LOT cheaper and simpler. Instead of trying (vainly) to GUARANTEE
that no data is lost (with the attendant headaches of very careful buffer
management, flow-control, load shedding, load-balancing, re-routing,
synchronizing, etc.), in the internet datagram style (DoD IP, Xerox NS, etc.)
the transmission system makes a "good effort" to get your packet from
here to there. The only thing that IS demanded is that the probability
of receiving a bad (damaged) packet that is claimed to be good should
be VERY small. (Since that is a one-way requirement, it's fairly easy.)
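That one-way requirement is what a header checksum buys you. As a
concrete sketch (mine, not from the discussion above), here is the
16-bit one's-complement checksum that DoD IP uses over its header;
a receiver that recomputes it and gets a nonzero answer throws the
packet away:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's-complement checksum, as used by DoD IP.
 * Computed over the header with the checksum field itself
 * included, a good packet sums to 0xffff, so the recomputed
 * checksum of a good packet is 0. Nonzero means "damaged --
 * toss it". */
uint16_t cksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    while (len > 1) {                       /* sum 16-bit words */
        sum += (uint32_t)buf[0] << 8 | buf[1];
        buf += 2;
        len -= 2;
    }
    if (len)                                /* odd trailing byte */
        sum += (uint32_t)buf[0] << 8;
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

The sender stores cksum() of the header (with the checksum field
zeroed) into that field; the receiver's recomputation over the whole
header then comes out 0 exactly when no bits were damaged in a way
the sum can see.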

So if the packet has a bit error, throw it away; if the outgoing queue
won't hold the packet, throw it away (that line's overloaded anyway);
if the route's not valid anymore, toss it. Somebody (the end process)
will try again soon anyway. (Two notes: 1. It is considered polite
BUT NOT NECESSARY to send an error packet back, if you know where "back"
is; and 2. if the system is to be generally considered usable, the
long-term error rate should be less than 1%, although short-term losses
of 10% or more don't hurt anything.)
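The whole per-packet policy above fits in one decision function. This
is an illustrative sketch of mine (the names and the three tests are
hypothetical, not any real router's code):

```c
#include <assert.h>

/* Possible fates of a packet at an intermediate node. */
enum verdict { FORWARD, DROP_BAD, DROP_NOROUTE, DROP_FULL };

/* "Good effort" forwarding: any trouble at all, toss the packet
 * and let the end process try again. No connection state, no
 * retransmission, no timeouts. */
enum verdict route_packet(int cksum_ok, int have_route, int queue_free)
{
    if (!cksum_ok)   return DROP_BAD;     /* bit error: throw it away */
    if (!have_route) return DROP_NOROUTE; /* route's not valid: toss it */
    if (!queue_free) return DROP_FULL;    /* line's overloaded anyway */
    return FORWARD;                       /* out the next link it goes */
}
```

Note what is NOT here: no per-connection table, no saved copy of the
packet, no timer. That is where the savings in memory and CPU ticks
come from.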

This seemingly cavalier attitude results in ENORMOUS savings in complexity,
memory, and CPU ticks for the intermediate nodes, which merely make a
(good but not perfect) attempt to throw the packet out the next link.
Packet switching rates of several hundred to several thousand per second
are easily attainable with cheap micros. The routers don't have to have
any "memory" (other than the routing tables). They are not responsible
for "connections", or "re-transmissions", or "timeouts". They don't know
a terminal from a file (since they don't know either!).

Secondly, the CPU/memory load of handling the connections/retransmissions/etc.
is spread out where there are lots of resources -- at the end points. The
backbone nodes just move data, so they can move lots of it. (Think of a
hundred IBM PC's using your VAX to move files back and forth. Who do you
want to do the busy work, the VAX or the PC's?)

Thirdly, the end process always had to do about 70-90% of the work anyway,
duplicating the work the network was doing (and sometimes triplicating the
work that the kernel was duplicating, on top of that); the added 10-30%
is easily justified by the savings in the net (or in the kernel, if we are
talking about process-to-process on a single host -- I didn't forget).
The total number of CPU ticks on an end-point processor can even go DOWN,
because of the smaller number of encapsulations (layers) packets have to go
through. (In the simplest case, there are only three layers: client, datagram
router or internet, and physical.)
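In that simplest case, each layer just wraps one more header around the
client's data. A sketch of what the encapsulation looks like on the
wire (all field names and sizes here are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical datagram (internet-layer) header. */
struct dgram_hdr {
    uint32_t src, dst;      /* internet addresses */
    uint16_t len, cksum;    /* payload length, header checksum */
};

/* Build a frame: physical-link header, then the datagram header,
 * then the client's bytes. Three layers, three copies, done. */
size_t encapsulate(uint8_t *frame, const struct dgram_hdr *dh,
                   const uint8_t *data, size_t n)
{
    static const uint8_t link_hdr[2] = { 0x01, 0x02 }; /* physical layer */
    size_t off = 0;

    memcpy(frame + off, link_hdr, sizeof link_hdr); off += sizeof link_hdr;
    memcpy(frame + off, dh, sizeof *dh);            off += sizeof *dh;
    memcpy(frame + off, data, n);                   off += n;
    return off;   /* total frame length */
}
```

Compare that with a connection-oriented stack, where each of five or
more layers may want its own header AND its own state machine.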

Lastly, there are some applications (voice, time-of-day) where you do not
want the network trying to "help" you. A voice packet that is perfect but
is late because it got retransmitted might as well have been lost -- it's
useless. Ditto time-of-day.

(whew! is it soup yet?)
 
So "unreliable" when talking about datagrams means "not perfect",
and is a desirable attribute. Desirable, since the cost of "reliability"
is very high and the goal illusory in any case. On a single processor,
it makes sense sometimes to have other (reliable) inter-process primitives
besides datagrams, if (1) throughput is paramount and (2) the set of
cooperating processes will NEVER be distributed. But the overhead of
handling the "retransmission" can be made small (and processes DO die
sometimes, even on uni-processors), so the argument for "reliable" IPC
is weaker than most people think.
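To back up that "can be made small" claim: here is a sketch of the
receiver's half of a stop-and-wait scheme -- the end process's own
duplicate check, the thing no intermediate node can do for it. (This
is my illustration; the struct and function names are invented.)

```c
#include <assert.h>
#include <stdint.h>

/* Per-conversation state at one end point: just the next
 * sequence number we expect. That is the whole "connection". */
struct endpoint {
    uint32_t expected_seq;
};

/* Returns 1 if the packet carries new data (deliver it),
 * 0 if it is a retransmitted duplicate (ack it again, drop it).
 * Either way the sender eventually stops resending. */
int accept_seq(struct endpoint *ep, uint32_t seq)
{
    if (seq == ep->expected_seq) {
        ep->expected_seq++;     /* new data: advance */
        return 1;
    }
    return 0;                   /* seen it before (or out of order) */
}
```

One word of state and one comparison per packet -- which is roughly
the overhead being weighed against a whole "reliable" transport layer.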

Rob Warnock

UUCP:	{sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD:	(415)595-8444
USPS:	Fortune Systems Corp, 101 Twin Dolphins Drive, Redwood City, CA 94065
