Stuck messages in queues with btrieve

Robert Story story at can503.UUCP
Wed Sep 20 21:24:31 AEST 1989


In article <295 at can503.UUCP> I wrote the following:
>The problem seems to arise under heavy load, with 3 to 6 users all running
>our financial application and printing documents.  A process will send a
>message to btrieve, then set an alarm for 60 seconds and sit on the msgrcv
>call.
>With a large load one or two of the processes will get the alarm signal.  
>Examination of the message queues with ipcs shows messages from/to Btrieve in
>the queues but attempts to read these messages with msgrcv() and message type
>set to zero show an empty queue.  A call to msgctl() with IPC_STAT reports 
>messages in the queue but the pointers to the first and last messages are 0.
>Subsequent messaging to btrieve carries on as normal.

We had a person from SCO on site for a week, and last Saturday we found the
problem.  IT IS a kernel bug.  If the kernel takes a page fault while copying
to or from the user's data area, it puts the process to sleep.  In the
meantime another process that is also using the message queues can steam
through and do its thing.  When the original process wakes up, the queue
pointers it was working with have been rearranged underneath it and, of
course, weird things begin to happen.  Sometimes the free list turned up on
queue 1, or queue 0 turned up on the free list, which explains why ipcs
thought there were messages when there weren't.  This problem has been fixed
in the AT&T 3.1 code and the SCO 3.2 code by using semaphores in the critical
areas.
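
The symptom (a queue whose count says it holds messages while msgrcv() finds
none) can be checked for from user space.  The following is only a sketch
under my own assumptions, not the tool we used: the enum and the name
check_queue are invented for illustration.  It compares msg_qnum from
IPC_STAT against a zero-length, non-blocking msgrcv() probe, which normally
fails with E2BIG and leaves the message in place.  Two caveats: a zero-length
message would be consumed by the probe, and on a busy queue another process
can drain it between the two calls, so a single "inconsistent" result is a
hint rather than proof.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

enum queue_state {                /* hypothetical result codes */
    QUEUE_ERROR = -1,
    QUEUE_EMPTY = 0,
    QUEUE_HAS_MESSAGES = 1,
    QUEUE_INCONSISTENT = 2        /* count > 0 but nothing readable */
};

static enum queue_state check_queue(int qid)
{
    struct msqid_ds ds;
    struct { long mtype; char mtext[1]; } probe;
    int readable;

    assert(qid >= 0);             /* a negative id means msgget() failed */

    if (msgctl(qid, IPC_STAT, &ds) == -1)
        return QUEUE_ERROR;

    /* Message type 0 means "first message of any type". */
    if (msgrcv(qid, &probe, 0, 0, IPC_NOWAIT) == -1) {
        if (errno == ENOMSG)
            readable = 0;         /* nothing on the queue */
        else if (errno == E2BIG)
            readable = 1;         /* a message exists and was left alone */
        else
            return QUEUE_ERROR;
    } else {
        readable = 1;             /* a zero-length message was consumed */
    }

    if (ds.msg_qnum > 0 && !readable)
        return QUEUE_INCONSISTENT;
    return readable ? QUEUE_HAS_MESSAGES : QUEUE_EMPTY;
}
```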

I hope this helps others.  It cost our company a lot of money to discover
this one.  The bug surfaced only just before a major release, so things were
pretty tense here.  I had a good time, though.  It's not every day I get to
assist in debugging kernel code.  If anyone wants more details, please e-mail
me.

-- 
[ Robert Story    ..{!utzoo!censor,!uunet!zardoz!avcoint}!avcocan!story     ]
[ SnailMail : AFS 201 Queens Avenue London Ontario Canada N6A 1J1           ]
[        or : AFS 3349 Michelson Drive Irvine California USA 92715-1606     ]
[ Voice     : +1 519 672-4220 xtn 233                                       ]



More information about the Comp.unix.xenix mailing list