Lots of weird hangs reported - could be a 3.51m bug?

Bruce Lilly bruce at sonyd1.Broadcast.Sony.COM
Fri Feb 8 08:07:22 AEST 1991

In article <1991Feb6.220445.8640 at athena.mit.edu> pschmidt at athena.mit.edu (Peter H. Schmidt) writes:
>  To be brief, I think we've hit on a bug in 3.51m.
>My own problem always has the same basic structure: some background program
>gets weird, like smgr stopping the processing of cron jobs; then getty's on

I've seen this once.

>tty000 and ph1 become unable to answer the phone; a few minutes later, the
>system clock freezes, and at this point all I can do is mouse on the windows -
>typing is never echoed, and the SHFT-SUSP hotkey takes ~2 minutes to change

In cases where no typing echoes and things really slow down, it's usually
because all of the clists are full (second bar in first group goes to
zero). This has always been due to a jabbering quasi-modem -- yanking the
plug on it restores both it and the computer to sanity.

>Unfortunately, this behavior is maddeningly inconsistent.  It is not related
>to disk I/O, or power supply voltages.  The programs fail in random order, and
>it can happen in as little as 12 hours, or after over a week.  Sometimes, just
>for variety, I get a panic, but never the same one.  Before 3.51m, I would get

I've also had quite a few panics.

>what now seems like the same behavior, but at intervals of months.  However, I
>can't say for sure that the increased frequency started with 3.51m.  It has
>only gotten really bad in the past 4 months.

I'd say unhesitatingly that things got noticeably worse after
``upgrading'' from 3.51 to 3.51m.  But I don't want to go back
because of MeterMaid and a few other added features.
When I ran 3.51, systems would often stay up and running for 4-5
months. Now I'm lucky if things stay up for 2 months, and I
occasionally have to restart a dead or comatose process.

>MeterMaid always shows ample clists and serial buffers, and a decent amount of

I've never seen this dip below about 95% -- I wish it were
configurable so I could use the space elsewhere.

>this, or ideas on how to fix it?

I guess we just have to wait for fixdisk 3 :-).

