KEEPALIVEs do not always work.

Thu May 30 03:48:23 AEST 1985

Here is a rather complex bug that shows up in the BSD implementation of TCP.
It is really a problem with TCP, not BSD, but a good solution is hard to
decide on.  What's your opinion?

Scenario:

Suppose two programs (for example, rlogin or telnet) have a TCP/IP
connection over a socket with keepalives set.  With keepalives on, a null
packet is sent every thirty seconds.  The idea is that if this null packet
is not acknowledged, the idle timer will eventually go off and the
connection will be terminated.
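On a BSD-style socket interface, keepalives are turned on per socket with the SO_KEEPALIVE option.  Here is a minimal sketch in Python.  It assumes a modern system with the standard socket module.  The helper name enable_keepalive is made up for illustration.  The probe interval and idle limit come from the kernel, not from this call.

```python
import socket


def enable_keepalive(sock: socket.socket) -> bool:
    """Set SO_KEEPALIVE on sock and report whether the option is now on."""
    # SO_KEEPALIVE asks the kernel to probe the peer when the connection
    # has been idle; the probe timing is chosen by the system, not here.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # getsockopt returns the option value as an int; nonzero means "on".
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    # An unconnected TCP socket is enough to set and read back the option.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("keepalive on:", enable_keepalive(s))
```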

For discussion, let's call the two hosts A and B.  Suppose host B goes
down.  The keepalive packets from A to B will then never be acknowledged, so
when the idle timer expires A will know the connection is severed.  The idle
timer is 15 minutes in 4.2BSD.
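The timing described above can be modelled in a few lines.  This Python sketch uses the numbers given in the post: a probe every 30 seconds and a 15 minute idle limit.  It also assumes a simplified timer that is checked just before each probe is sent.  The names time_until_abort and probe_answered are hypothetical, not 4.2BSD code.

```python
KEEPALIVE_INTERVAL = 30   # seconds between probes, as stated in the post
MAX_IDLE = 15 * 60        # 4.2BSD idle limit quoted in the post, in seconds


def time_until_abort(probe_answered, horizon=24 * 3600):
    """Return the time (seconds) at which A aborts, or None if it has not
    aborted by `horizon`.

    probe_answered(t) -> bool is a hypothetical callback saying whether the
    probe A sends at time t draws a valid reply from B.
    """
    t = 0
    idle = 0  # seconds since A last heard a valid segment from B
    while t < horizon:
        t += KEEPALIVE_INTERVAL
        idle += KEEPALIVE_INTERVAL
        if idle >= MAX_IDLE:
            return t  # idle timer fires: A drops the connection
        if probe_answered(t):
            idle = 0  # B acknowledged the probe, so the line is alive
    return None


# B crashes at t = 0 and never comes back: every probe goes unanswered.
print(time_until_abort(lambda t: False))  # 900
# B stays up: every probe is acknowledged, so A never aborts.
print(time_until_abort(lambda t: True))   # None
```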

Problem:

Suppose host B goes down and then comes back up in less than 15 minutes
(obviously it can't be a VAX :-) ).  When B gets a keepalive packet, it has
no record of the connection, so it correctly sends a RST (reset) back to A.

The problem is that, since the keepalive packet is a window update
pointing one past the valid window, the RST that B sends in reply is
outside A's window and is ignored by A.  Yet the idle timer on A is
cleared by this invalid RST, so A will never abort the connection.  The
socket on A will hang open forever!
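The hang can be reproduced with the same kind of toy model.  This Python sketch uses the post's 30 second probe interval and 15 minute idle limit.  It models A as clearing its idle timer whenever any segment arrives, before the out-of-window RST is thrown away.  The names simulate_reboot and reset_idle_on_any_segment are hypothetical.  Setting the flag to False is shown only for comparison: it is what would happen if the invalid RST did not clear the timer.

```python
KEEPALIVE_INTERVAL = 30   # seconds between probes, as stated in the post
MAX_IDLE = 15 * 60        # 4.2BSD idle limit quoted in the post, in seconds


def simulate_reboot(crash_at, down_for, reset_idle_on_any_segment=True,
                    horizon=24 * 3600):
    """Model host A probing host B, which crashes at `crash_at` and comes
    back `down_for` seconds later with no memory of the connection.

    Returns the time at which A aborts, or None if A is still open at
    `horizon`.
    """
    back_up_at = crash_at + down_for
    t = 0
    idle = 0  # seconds since A's idle timer was last cleared
    while t < horizon:
        t += KEEPALIVE_INTERVAL
        idle += KEEPALIVE_INTERVAL
        if idle >= MAX_IDLE:
            return t  # idle timer fires: A drops the connection
        if t < crash_at:
            idle = 0  # B still has the connection and ACKs the probe
        elif t >= back_up_at:
            # B answers with a RST.  As described in the post, the RST is
            # outside A's window and is discarded, but its arrival has
            # already cleared A's idle timer.
            if reset_idle_on_any_segment:
                idle = 0
        # Otherwise B is down and the probe simply vanishes.
    return None


# B is down for 5 minutes, well under the 15 minute idle limit.
print(simulate_reboot(600, 300))         # None: A's socket hangs open
print(simulate_reboot(600, 300, False))  # 1470: A would time out
```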

Possible solutions:

A) Make RSTs outside the window valid.
	The TCP protocol says that RSTs outside the current window
	are to be ignored.  Let's not violate the protocol.

B) Send a different keepalive packet (i.e. one inside the window).
	But what packet is a guaranteed no-operation?
	Any data inside the window has already been seen by B,
	and therefore no acknowledgement is needed.

C) Don't use keepalives.
	Then how do you keep idle connections alive?

D) Don't use TCP.
	Then what, ISO?  (Are you kidding? :-( )
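Solution A turns on the rule that a RST is only believed if it falls in the receive window.  Here is a sketch of that check in Python.  It assumes RFC 793's acceptability test for zero-length segments, since a RST normally carries no data.  The name rst_acceptable is hypothetical, not a BSD routine.  It shows that a RST just outside the window, on either side, is discarded.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap modulo 2**32


def rst_acceptable(seg_seq, rcv_nxt, rcv_wnd):
    """Return True if a RST with sequence number seg_seq should be honoured.

    Uses RFC 793's acceptability test for a zero-length segment:
      RCV.WND == 0:  SEG.SEQ == RCV.NXT
      RCV.WND  > 0:  RCV.NXT <= SEG.SEQ < RCV.NXT + RCV.WND  (mod 2**32)
    """
    if rcv_wnd == 0:
        return seg_seq % SEQ_SPACE == rcv_nxt % SEQ_SPACE
    # Distance from the left edge of the window, allowing for wraparound.
    offset = (seg_seq - rcv_nxt) % SEQ_SPACE
    return offset < rcv_wnd


rcv_nxt, rcv_wnd = 1000, 4096
print(rst_acceptable(1000, rcv_nxt, rcv_wnd))         # True: left edge
print(rst_acceptable(999, rcv_nxt, rcv_wnd))          # False: one before
print(rst_acceptable(1000 + 4096, rcv_nxt, rcv_wnd))  # False: one past
```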

Comments?