Bug in 4.3bsd networking
Larry Philps
larry at hcr.UUCP
Wed Aug 9 03:35:37 AEST 1989
Subject: Running out of mbufs can cause a system panic
Index: sys/uipc_socket2.c 4.3BSD
Description:
When a 4.3BSD system is accepting a new TCP connection, the following
scenario can take place.
i) sonewconn gets called to allocate a new struct socket for the
connection. It calls soqinsque with the old and new sockets
to queue the new one on the list of connections waiting to
be accepted by the user. As well, so_head in the new socket
structure is set to point at the original socket.
ii) tcp_usrreq is called with the new socket and the cmd PRU_ATTACH.
It in turn calls tcp_attach.
iii) tcp_attach calls in_pcballoc to allocate a new internet protocol
control block. It then calls tcp_newtcpcb to allocate a new
tcp protocol control block.
iv) This is where the problem starts. Say you are out of available
mbufs and can't get another without sleeping so that the call
to tcp_newtcpcb fails. (Very hard to reproduce, but it happened
to me!)
v) tcp_attach decides to take the connection apart, so it calls
in_pcbdetach to undo the in_pcballoc that completed succesfully.
vi) in_pcbdetach calls sofree to free the data held in the socket.
sofree undoes the the effect of the soqinsque done in sonewconn
by calling soqremque, then it sets so->so_head to NULL.
vii) We return up the call chain to sonewconn who calls soqremque to
undo the effect of its soqinsque. Ooops. sofree has already
done this, and a NULL value in so_head will cause a segmentation
violation in soqremque.
Repeat-By:
Good Luck! The code in question was running on a multiprocessor version
of System V Release 3 with BSD networking, and the number of locks
involved in acquiring physical pages from the shared memory made the
probability of failing to get an mbuf much higher than on most single
processor implementations. Even so, the code had been running for over
6 months before we saw this happen.
You will probably have to hack tcp_attach to "fail" the tcp_newtcpcb
call under some conditions if you want to see this happen.
Fix:
The simplest thing to do to prevent this problem is to avoid the second
call to soqremque. This is easy to do since the routine that did the
first one, sofree, set so_head to NULL afterwards. So, just supress the
second call unless so_head is still non-NULL. This will be the case
if the in_pcballoc fails instead of the tcp_newtcpcb.
*** uipc_socket2.c.orig Thu Jul 20 13:22:04 1989
--- uipc_socket2.c Thu Jul 20 13:22:07 1989
***************
*** 194,200 ****
soqinsque(head, so, 0);
if ((*so->so_proto->pr_usrreq)(so, PRU_ATTACH,
(struct mbuf *)0, (struct mbuf *)0, (struct mbuf *)0)) {
! (void) soqremque(so, 0);
MFREE(m, junk);
goto bad;
}
--- 194,213 ----
soqinsque(head, so, 0);
if ((*so->so_proto->pr_usrreq)(so, PRU_ATTACH,
(struct mbuf *)0, (struct mbuf *)0, (struct mbuf *)0)) {
! /*
! * If we got most of the way through the attach before
! * failing on the tcp control block, in_pcbdetach could
! * have been called, which in turn would have called sofree.
! * sofree does soqremque(so, 0) and soqremque(so, 1) then
! * sets so->so_head = NULL. Thus another call here will cause
! * a segmentation violation becasue soqremque does not check
! * for a valid so->so_head field.
! *
! * We fix this here by not calling soremque again if so->head
! * is already NULL.
! */
! if (so->so_head)
! (void) soqremque(so, 0);
MFREE(m, junk);
goto bad;
}
Larry Philps HCR Corporation
130 Bloor St. West, 10th floor Toronto, Ontario. M5S 1N5
(416) 922-1937 {utzoo,utcsri,uunet}!hcr!larry
More information about the Comp.bugs.4bsd.ucb-fixes
mailing list