emacs or csh problem?

Thu Jul 26 23:32:10 AEST 1990

OK - I could use some help from you, the people who use DECstations.

I have submitted a problem to Digital about trouble starting jobs from
/bin/csh, specifically gnu emacs.  Unfortunately, Digital seems highly
skeptical that I am really having a problem:

   "I think I mentioned that we have hundreds of
   people here using emacs and the reported seg fault problem has never been
   seen here."

	-- unnamed Digital Ultrix engineer's latest response to the bug report

If other people are having the problem, please let me know (and perhaps
Digital).  I will also let them know that I am not alone with the
problem (or maybe I am!).  Here is the problem:

You are running in /bin/csh as normal, and try to start an application (like
the gnu emacs that Digital distributes in its unsupported subset):

% emacs
Segmentation fault
% 

In fact, if you keep trying to run emacs, it just keeps faulting:

% emacs
Segmentation fault
% emacs
Segmentation fault
% emacs
Segmentation fault
% emacs
Segmentation fault
% 

However, if you run another task:

% emacs
Segmentation fault
% emacs
Segmentation fault
% emacs
Segmentation fault
% emacs
Segmentation fault
% ls
fish
% emacs
<emacs starts up OK>

The Segmentation fault occurs almost instantly, and core rarely dumps.  The
few times core has dumped, it is always a core dump of /bin/csh, not emacs.
Since the dumps are of the csh, and the fault happens so fast, I believe the
csh is having trouble starting the emacs task, rather than emacs causing the
problem.
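
Since nobody here can make it happen at will, one thing that might help is
a loop that starts a command over and over and counts the runs that die on a
signal.  What follows is only a rough sketch in /bin/sh, not something we
have run on the DECstations.  The function name try_start is made up, and I
am assuming that the shell reports death by signal as an exit status above
128, which most Bourne-style shells do.

```shell
#!/bin/sh
# try_start COUNT CMD [ARGS...]  (hypothetical helper, not a system tool)
# Run CMD COUNT times and report how many runs appear to have died on a signal.
try_start() {
    count=$1; shift
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if "$@" >/dev/null 2>&1; then status=0; else status=$?; fi
        # Assumption: status above 128 means 128 + signal number
        # (139 would be signal 11, SIGSEGV).  A command that simply
        # exits with a large status would be miscounted here.
        if [ "$status" -gt 128 ]; then
            failures=`expr $failures + 1`
            echo "run $i: exit status $status"
        fi
        i=`expr $i + 1`
    done
    echo "$failures of $count runs died on a signal"
}
```

Something like "try_start 1000 /bin/csh -fc /bin/true" would exercise csh
starting a child, though I do not know whether that follows the same path as
an interactive csh starting emacs.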

I have built two newer versions of gnu emacs and it still happens.  I should
also mention that these faults happen quite rarely to some users (I may see
it once a month), while other users see it a few times a week or even daily!
But no one at my site can make it happen at will.  We have looked for a
pattern, like it happens when you first log on, or after you have been
logged on for days, or just to vt100s, or just X Windowing emacs, etc.  No
pattern is obvious to us.  Also, we have looked at everyone's .cshrc and .login and
they are all quite different, but everyone sees this problem to some extent.
Also, we have diskless systems (with local swap disk), "dataless" systems
(with a local swap and root partition, but NFS mounting /usr), and "diskfull"
(all disk locally attached) that all experience the problem.

We have been suffering with this problem since the first release of UWS for
RISC - and only our RISC DECstation 3100s show this bug.  It has never happened
on our VAX/Ultrix systems.  We have installed UWS 2.1, UWS 2.2, and
Ultrix 3.1D/2.2D on each DECstation, and no change.

Once, we believe another task failed the same way.  It was the expire program
that runs nightly on one DECstation to clean up network news.  Also possibly
related is a strange observation that occasionally things do not seem to get
started from crontab.  Some of our scripts that run out of crontab append to a
log file as their first step - the next day after the job did not run, we look
at the log file and it is not touched!  It also seems to happen rarely, but
once in a while the same task at night will not run for 2 or 3 days, then it
will run nightly for months correctly.  The expire program that failed to start
was in a /bin/csh script started by cron, and all of these other scripts that
failed are /bin/csh scripts.
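
To tell "cron never started the job" apart from "csh died before the
script's first line ran", one could have cron run a small /bin/sh wrapper
that writes to the log itself before handing off to the csh script.  This is
a sketch only.  The name cronwrap and the log format are my own invention,
and it assumes /bin/sh is not affected by whatever is hurting csh.

```shell
#!/bin/sh
# cronwrap LOGFILE CMD [ARGS...]  (hypothetical helper)
# Append a "starting" line to LOGFILE, run CMD, then append its exit status.
cronwrap() {
    log=$1; shift
    echo "`date`: starting $*" >> "$log"
    if "$@"; then status=0; else status=$?; fi
    echo "`date`: finished $* with status $status" >> "$log"
    return $status
}
```

If the log shows a "starting" line with a non-zero status and no output from
the script itself, that points at csh rather than cron.  If there is no
"starting" line at all, cron never ran the entry.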

Again, if you think you are experiencing these problems, please let me
know so I can let Digital know it is not a problem unique to my site.  Also,
if you know the problem, PLEASE LET ME KNOW - it's driving us crazy!!!!!!!!!

--Kid.