4D-200 series hangs frequently

Chris Wagner jwag at moose.asd.sgi.com
Fri Jun 21 12:28:10 AEST 1991


In article <84172 at bu.edu>, jdh at pub.bu.edu (Jason Heirtzler) writes:
|> We have a problem with our 4D-220 and 4D-240 series machines
|> hanging a lot.  The symptoms vary from just the window system
|> locking up (you can still rlogin in) to sometimes the whole
|> machine will hang -- and with five 4D-200 series machines, we
|> probably average one machine hung each day.  Sometimes, when
|> the machine(s) keep running, various combinations of running
|> /etc/gl/restart_gl and using the window system "hot key" sequence
|> (F12-/-whatever) will return the console to normal.  But then,
|> the other times, only /etc/reboot (or pressing "reset") will do.
|> 
|> After numerous calls to the hotline, there's been no improvement,
|> and I'm sure I've personally installed every "dot dot" release
|> since the early 3.1 days.  Everything is running release 3.3.2
|> at the moment, and I'm waiting for my latest call to be returned
|> with another "Gee.. dunno.. have you tried 3.3.3?"
|> 
-- 

The problems you describe most likely stem from a few different
issues. When trying to improve things, it usually helps to start
by classifying the 'hangs'.

For example, if the graphics wedge but the rest of the system (network, etc.)
seems OK, then look in /usr/adm/SYSLOG for any messages from the
graphics hardware, and take some ps listings to see whether a particular
process that is doing graphics is usually present when it happens.
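
A rough sketch of that SYSLOG check, in Python purely for illustration.
The keyword list and the function name are my own assumptions, not a
documented set of graphics messages, so adjust them to what your
machines actually log:

```python
import re

# Hypothetical keywords; replace them with whatever your graphics
# hardware and window system really write to SYSLOG.
KEYWORDS = ("graphics", "gfx", "gl")


def suspicious_lines(lines, keywords=KEYWORDS):
    """Return the lines that mention any keyword as a whole word.

    Matching is case-insensitive; trailing newlines are stripped.
    """
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]
```

To use it, pass in the open log file, e.g.
suspicious_lines(open("/usr/adm/SYSLOG")), and compare the hits
against the times of the hangs.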


As for the whole-machine hangs, classifying them again
helps to zero in on the cause.

So:
1) Any NFS hard mounts?

2) Any suspicious entries in SYSLOG (like disk errors)?

3) Can you ping it?

4) Are the front panel LED digits blinking?

5) Can you rsh in (not necessarily rlogin)?
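
One way to keep the classification consistent from one hang to the next
is to write the answers down the same way every time. A minimal sketch,
assuming three yes/no observations; the function name and the category
labels are made up for illustration, not anything the hotline uses:

```python
def classify_hang(ping_ok, rsh_ok, graphics_ok):
    """Map three yes/no observations to a rough hang category.

    The labels are hypothetical; pick whatever names suit your logbook.
    """
    if not ping_ok:
        # Nothing answers on the network at all.
        return "network-dead"
    if not rsh_ok:
        # Answers ping, but a remote command cannot be run.
        return "answers-ping-only"
    if not graphics_ok:
        # The rest of the system is fine; only the window system is wedged.
        return "graphics-only"
    return "no-hang"
```

For instance, classify_hang(True, True, False) gives "graphics-only",
which is the case where SYSLOG and ps listings are the place to look.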

Some statistics may also help, such as running netstat -m
to determine network memory usage, and sar to determine system load.
Sometimes these statistics can help characterize what you're
doing that's slightly different from others and is therefore bringing
out some bug (software or hardware).

I would also suggest running the ecc(1) command to be sure that
your memory is ok.

A listing of the number of users, and how they are logged in
(telnet, rlogin, ftp?), is also useful.


This data should be able to help the hotline - keep bugging them!!!

(and by the way, have you tried 3.3.3? :-)

----
Chris Wagner (jwag at sgi.com)


