unix domain loop or mbuf loss
Alex White
arwhite at watmath.UUCP
Tue Feb 21 17:54:53 AEST 1984
Subject: Loss of mbufs - or - system hang
Index: sys/uipc_usrreq.c 4.2BSD
Description:
Queued connections in the Unix domain are torn down incorrectly
on a soclose. As distributed, the system hangs while the kernel
loops; the fix distributed in:
From: madden at sdccsu3.UUCP (Jim Madden)
Newsgroups: net.bugs.4bsd
Subject: 4.2 IPC machine hang
Article-I.D.: sdccsu3.1238
Posted: Mon Nov 7 03:09:21 1983
Organization: U.C. San Diego, Computer Center
will fix the hang, but will cause you to lose 3 mbufs every
time you have a queued connection which hasn't yet been accepted
when you do the soclose.
Repeat-By:
First do a netstat -m. Run the following in the background:
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Listener: bind to "xxx", listen, and never accept. */
main()
{
    struct sockaddr_un address;
    int s;

    s = socket(AF_UNIX, SOCK_STREAM, 0);
    address.sun_family = AF_UNIX;
    strcpy(address.sun_path, "xxx");
    if (bind(s, (struct sockaddr *)&address,
        sizeof(address.sun_family) + strlen(address.sun_path)) < 0) {
        perror("bind");
        exit(1);
    }
    listen(s, 5);
    pause();    /* hold the queued connections until killed */
}
Then run as many instances of the following as you want - 8 seems
to be the max.
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Client: connect to "xxx" and exit, leaving the connection queued. */
main()
{
    struct sockaddr_un address;
    int s, i;

    s = socket(AF_UNIX, SOCK_STREAM, 0);
    address.sun_family = AF_UNIX;
    strcpy(address.sun_path, "xxx");
    if ((i = connect(s, (struct sockaddr *)&address,
        sizeof(address.sun_family) + strlen(address.sun_path))) < 0) {
        perror("connect");
        exit(1);
    }
}
Kill the first process. Do a netstat -m and compare; you will find that
there are 8 extra mbufs allocated to socket structures, 8
to protocol control blocks, and 8 to socket addresses.
(Note - if you didn't put in the fix from madden at sdccsu3,
you will loop in the kernel.)
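The same situation (connections queued on a listener that is closed
before any accept) can also be set up from a single process. What
follows is only a sketch, written for a present-day POSIX system rather
than for 4.2BSD. The function name queue_then_close, its path argument,
and the BACKLOG constant are all made up for illustration. It does not
detect the leak itself; on an affected kernel you would still compare
netstat -m before and after.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

#define BACKLOG 5   /* same backlog as the listener in the report */

/*
 * Hypothetical helper: bind a Unix-domain listener at `path`, queue up
 * to n connections on it without ever calling accept, then close it.
 * Returns the number of connects that succeeded, or -1 if the listener
 * could not be set up.
 */
int queue_then_close(const char *path, int n)
{
    struct sockaddr_un addr;
    int ls, i, ok = 0;

    assert(path != NULL);
    if (n > BACKLOG)        /* stay within the backlog so connect cannot block */
        n = BACKLOG;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    unlink(path);           /* remove a stale socket file, if any */

    ls = socket(AF_UNIX, SOCK_STREAM, 0);
    if (ls < 0)
        return -1;
    if (bind(ls, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(ls, BACKLOG) < 0) {
        close(ls);
        return -1;
    }

    for (i = 0; i < n; i++) {
        int cs = socket(AF_UNIX, SOCK_STREAM, 0);

        if (cs < 0)
            continue;
        if (connect(cs, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            ok++;           /* now queued on the listener, never accepted */
        close(cs);          /* the client goes away, as in the report */
    }

    close(ls);              /* the soclose with connections still queued */
    unlink(path);
    return ok;
}
```

Calling queue_then_close("/tmp/xxx", 3) from a small main, with a
netstat -m before and after, follows the same steps as the two programs
above. The path is an arbitrary choice.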
Fix:
Madden's fix was totally wrong: the other protocols all free
the socket in their cleanup routines. Some do so to a disastrous
extent, such as udp_usrreq, which in PRU_ABORT does a sofree right
after invoking in_pcbdetach (which also does one), and before doing
a soisdisconnected on the socket it just freed!
(I suspect you should delete the sofree call.)
However, for the above described problem first take out the
fix from madden at sdccsu3.
Then in uipc_usrreq.c, in unp_drop, change
    unp_disconnect(unp);
to
    unp_detach(unp);
    sofree(unp->unp_socket);
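To see why a drop that only disconnects leaves three objects behind per
queued connection, here is a toy user-space model. It is not kernel
code: the toy_* structures and functions are invented for illustration,
and what toy_detach and toy_sofree release is an assumption rather than
something taken from the 4.2BSD source. The model saves the socket
pointer before detaching. If the real unp_detach releases the storage
holding unp (again an assumption), reading unp->unp_socket afterwards
would touch freed memory, so saving the pointer first seems the safer
ordering.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for the three mbuf-backed objects named in the report.
 * These are NOT the kernel's struct socket / struct unpcb. */
struct toy_socket;
struct toy_pcb {
    struct toy_socket *unp_socket;
    char *unp_addr;                 /* stands in for the socket address */
};
struct toy_socket {
    int so_error;
    struct toy_pcb *so_pcb;
};

static int toy_live;                /* objects currently allocated */

/* Allocate one queued connection: socket + pcb + address = 3 objects. */
struct toy_socket *toy_create(void)
{
    struct toy_socket *so = calloc(1, sizeof *so);
    struct toy_pcb *unp = calloc(1, sizeof *unp);
    char *addr = calloc(1, 16);

    if (so == NULL || unp == NULL || addr == NULL) {
        free(so);
        free(unp);
        free(addr);
        return NULL;
    }
    unp->unp_socket = so;
    unp->unp_addr = addr;
    so->so_pcb = unp;
    toy_live += 3;
    return so;
}

/* Assumed behaviour of a detach: release the address and the pcb. */
static void toy_detach(struct toy_pcb *unp)
{
    unp->unp_socket->so_pcb = NULL;
    free(unp->unp_addr);
    free(unp);
    toy_live -= 2;
}

/* Assumed behaviour of a socket free: only once no pcb is attached. */
static void toy_sofree(struct toy_socket *so)
{
    if (so->so_pcb != NULL)
        return;
    free(so);
    toy_live -= 1;
}

/* Drop that only disconnects: records the error, releases nothing. */
void toy_drop_disconnect_only(struct toy_pcb *unp, int err)
{
    unp->unp_socket->so_error = err;
}

/* Drop in the spirit of the change above: detach, then free the socket. */
void toy_drop_detach_and_free(struct toy_pcb *unp, int err)
{
    struct toy_socket *so = unp->unp_socket;    /* save before unp goes away */

    assert(so != NULL);
    so->so_error = err;
    toy_detach(unp);
    toy_sofree(so);
}

int toy_live_count(void)
{
    return toy_live;
}
```

In this model, each connection dropped with toy_drop_disconnect_only
leaves toy_live_count() three higher, which matches the three mbufs per
queued connection reported above. toy_drop_detach_and_free returns the
count to where it started.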
**DISCLAIMER: This works and fixes the above bug. I haven't the
foggiest idea whether it breaks other things; for example,
the flow of control is different in the above routines
for datagram service, which has a different set of queues, and I
really don't know if it will blow it for them. If somebody has a
better fix, please send it to me.