Xenix TCP/IP

Fri Feb 16 20:09:45 AEST 1990

In article <1990Feb13.131255.3683 at dlcq15.datlog.co.uk> cpm at dlcq15.datlog.co.uk (Paul Merriman) writes:

   Problem 1)
   ---------

   Occasionally we get a kernel panic as follows:-

   TRAP 0000000E in SYSTEM, error code 06000000
   eax=FF030202 ebx=00000000 ecx=4A000001 edx=00000030
   esi=0008A204 edi=4A000001 ebp=06000620 fl=00010282
   udc=00030018 es=00000018 fs=0003003F gs=0000003F
   tr=00000100 pc=0090020:0001A12b ksp=060005B8

   kernel: PANIC: non-recoverable kernel page fault

I have seen this also.  I assume you have tcp/ip 1.0.1d.  We were
trying to get things going over StarLAN and the WD driver was buggy.
We contacted WD and got a new driver.  We still get the panics, and
sometimes a message which says: "qenable would have been called with
NULL in wdsched() for XWAIT" and then a panic.  What module (use nm)
is at the pc above?  SCO told us they have an even more recent WD
driver than the one we got from WD.  They said they just fixed a bug on
Friday, February 9, 1990!
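
If it helps with the nm lookup, here is a rough sketch of mapping a pc
to the nearest symbol.  It assumes nm prints "address type name" on
each line with hex addresses (your nm may format things differently),
and that the offset half of the pc (0001A12b) is the value to look up.
The symbol names and addresses in SAMPLE_NM are made up for
illustration; they are not from a real kernel.

```python
import bisect

# Hypothetical nm output: the names and addresses are invented for this sketch.
SAMPLE_NM = """\
0001a000 T example_open
0001a100 T example_sched
0001a200 T example_put
         U example_undefined
"""

def parse_nm(text):
    """Parse lines of the assumed form '<hex addr> <type> <name>'."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # undefined symbols and blank lines carry no address
        addr, _type, name = parts
        try:
            symbols.append((int(addr, 16), name))
        except ValueError:
            continue  # not a hex address, ignore the line
    symbols.sort()
    return symbols

def symbol_for(symbols, pc):
    """Return (name, offset) for the nearest symbol at or below pc, else None."""
    addrs = [addr for addr, _name in symbols]
    i = bisect.bisect_right(addrs, pc) - 1
    if i < 0:
        return None
    addr, name = symbols[i]
    return name, pc - addr

if __name__ == "__main__":
    hit = symbol_for(parse_nm(SAMPLE_NM), 0x0001A12B)
    if hit:
        print("%s+0x%x" % hit)
```

On the real system you would feed it the output of nm run against the
kernel image instead of SAMPLE_NM.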

   Problem 2)
   ----------

   This has been seen on the above Unisys machines with Western Digital network
   card and a Compaq with 3Com card.

   A number of processes which have socket connections to other machines break
   their connections. It should be mentioned here that these processes use 
   non-blocking writes and an alarm call to determine when to "give up" on the
   write and break the connection. In one case you could not then connect to
   the machine across the network (telnet, rlogin), though the machine is 
   running and can be accessed from the console. In some cases the connections 
   have managed to re-establish themselves some time later. 
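
For anyone following along, the "non-blocking write, then give up"
pattern described above looks roughly like the sketch below.  Note
that it swaps the alarm call for a select() with a deadline, which is
an assumption on my part about an equivalent approach and not the
original poster's code; write_with_deadline is a made-up name.

```python
import select
import socket
import time

def write_with_deadline(sock, data, give_up_after):
    """Try to send all of data; return False if the deadline passes first.

    Uses select() with a timeout where the original description uses an
    alarm call (an assumed substitute, not the poster's implementation).
    """
    sock.setblocking(False)
    deadline = time.monotonic() + give_up_after
    view = memoryview(data)
    while len(view) > 0:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # gave up: the caller would break the connection
        _, writable, _ = select.select([], [sock], [], remaining)
        if not writable:
            return False  # select timed out with the socket still not writable
        try:
            sent = sock.send(view)
        except BlockingIOError:
            continue  # buffer filled between select() and send(); retry
        view = view[sent:]
    return True

if __name__ == "__main__":
    a, b = socket.socketpair()
    print(write_with_deadline(a, b"hello", 1.0))
    a.close()
    b.close()
```

If the receiving host has gone deaf, the send buffer fills, the socket
stops being writable, and the function returns False once the deadline
passes.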

We have a similar problem where our main host on the network goes
deaf (can send out packets but not receive them).  It seems to be load
related, i.e., it occurs when we have lots of activity into the
machine (4 or more telnet sessions).  I used the streams watch
utility, sw, but couldn't see anything unusual.

We have another problem here with SCO TCP/IP: one host, the main one,
spits out "Note: tcp sum: source <ip-address> sum <hex number>" every
now and again.  I assume that these are warnings that a packet has
been received with a TCP checksum error.  The scary thing is that the
network is very clean and the IP address of the source is sometimes
the IP address of another Xenix box on the network.
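
For reference, the TCP checksum is the standard Internet checksum: a
16-bit one's complement sum over a pseudo-header (source and
destination addresses, protocol, TCP length) followed by the TCP
segment.  The sketch below shows how a receiver could verify it.  The
function names are mine, and I am only assuming this is the check
behind the "tcp sum" message; I have not seen the SCO code.

```python
import struct

def ones_complement_sum(data):
    """16-bit one's complement sum of data, zero-padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total

def internet_checksum(data):
    """Checksum value a sender places in the header (complement of the sum)."""
    return ~ones_complement_sum(data) & 0xFFFF

def tcp_pseudo_header(src_ip, dst_ip, tcp_length):
    """src_ip and dst_ip are 4-byte addresses; 6 is the IP protocol number of TCP."""
    return struct.pack("!4s4sBBH", src_ip, dst_ip, 0, 6, tcp_length)

def tcp_checksum_ok(src_ip, dst_ip, segment):
    """True if the segment (header plus data, checksum field included) sums to 0xFFFF."""
    pseudo = tcp_pseudo_header(src_ip, dst_ip, len(segment))
    return ones_complement_sum(pseudo + segment) == 0xFFFF
```

A single flipped bit anywhere in the segment or in the addresses makes
the check fail, so a bad source address in the warning does not by
itself tell you which field was damaged.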

We have been told that the new TCP/IP code is in QA at SCO right now.
Our plan of attack is to try and get a copy of the new (newer :-) WD
driver and see if that helps things.  We have not tried 3com boards.
Maybe we should.  Also, it was suggested that we try doing some
telnets to ourselves (which uses the loopback driver) to see if the
problem is driver related or socket related.  (If it just weren't so
intermittent...)
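
The loopback idea could be automated along these lines.  This is only
a sketch: it uses plain TCP echo connections to 127.0.0.1 in place of
real telnet sessions, on the assumption that they exercise the same
socket code while bypassing the network driver.  The client count,
payload size, and function names are arbitrary choices of mine.

```python
import socket
import threading

def echo_server(listener, n_clients):
    """Accept up to n_clients connections and echo back whatever each sends."""
    def serve(conn):
        try:
            with conn:
                while True:
                    data = conn.recv(4096)
                    if not data:
                        break  # client closed its end
                    conn.sendall(data)
        except OSError:
            pass  # connection reset: the client side records the failure
    workers = []
    for _ in range(n_clients):
        try:
            conn, _addr = listener.accept()
        except OSError:
            break  # accept timed out: a client never connected
        t = threading.Thread(target=serve, args=(conn,))
        t.start()
        workers.append(t)
    for t in workers:
        t.join()

def loopback_test(n_clients=4, payload=b"x" * 1024, rounds=50, timeout=5.0):
    """Return how many clients had all their traffic echoed back intact."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(timeout)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(n_clients)
    port = listener.getsockname()[1]
    server = threading.Thread(target=echo_server, args=(listener, n_clients))
    server.start()

    ok = [False] * n_clients

    def client(i):
        try:
            with socket.create_connection(("127.0.0.1", port), timeout=timeout) as s:
                for _ in range(rounds):
                    s.sendall(payload)
                    got = b""
                    while len(got) < len(payload):
                        chunk = s.recv(4096)
                        if not chunk:
                            return  # server went away mid-transfer
                        got += chunk
                    if got != payload:
                        return  # data corrupted in transit
                ok[i] = True
        except OSError:
            pass  # timeout or reset: leave ok[i] False

    clients = [threading.Thread(target=client, args=(i,)) for i in range(n_clients)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    server.join()
    listener.close()
    return sum(ok)

if __name__ == "__main__":
    print("clients that completed cleanly:", loopback_test())
```

If every loopback client completes while the same test against the
host's network address stalls, that would point at the driver and away
from the socket layer.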

I hope this rambling helps.  You have my sympathy.
--
Adnan Yaqub
Star Gate Technologies, 29300 Aurora Rd, Solon, OH, 44139, USA, +1 216 349 1860
[...cwjcc!ncoast ...uunet!abvax ...ism780c ...sco ...mstar]!sgtech!adnan