hp-ux 7.0/800 select() strangeness?

Chris Hanson cph at zurich.ai.mit.edu
Thu Sep 6 19:40:07 AEST 1990


   From: MAH at awiwuw11.wu-wien.ac.at (Michael Haberler)
   Date: 30 Aug 90 15:16:05 GMT

   I have encountered a strange behaviour of several programs which use
   select(2) on hp-ux 7.0 on the Series 800. All of these programs are
   'ported' BSD code, so I have the suspicion there's something in common:

   It seems that programs which have select(2) in their inner loop sometimes
   start using enormous amounts of system cpu time, as if the select() call
   were returning immediately and the program were polling. Among those programs
   are Xemacs 18.55, Greg Minshall's tn3270, and named4.8.3.

   Xemacs tends to do this especially if the X server terminates before emacs.
   I didn't find an explanation for named's behaviour. With tn3270, it looks like
   a modem disconnect and thus eof on the tty would cause tn3270 to loop.

I managed to get emacs into that state last night, and debugged it.
What happened was as follows.

I normally run several subprocesses under emacs.  At the time that the
problem occurred, there were two active subprocesses, and two exited
subprocesses.  Emacs still had all four subprocesses in its tables.
Emacs's command reader checks all of the subprocesses periodically for
input, using the `select' call on the input file descriptors of the
processes, and due to some peculiarities of its design, it was
checking all four of the subprocesses, even though two of them no
longer existed.

This `select' call was returning with a single bit set, which
indicated that the input file descriptor from one of the dead
subprocesses had some input that could be read.  Emacs then dutifully
went into a `read' call on that descriptor, which fortunately was set
to non-blocking mode, and the `read' call returned saying that of
course there was no data.

In summary: we have two processes and a pipe from one to the other.
The read side of the pipe has been set to non-blocking mode by the use
of O_NONBLOCK.  The process on the write side of the pipe finishes by
calling `exit'.  The process on the read side receives SIGCHLD and
uses `waitpid' to extract the exit status of the now-dead subprocess.
It then does a `select' on the read side of the pipe, which returns
indicating that the pipe has some data to be read.  The process calls
`read' on the pipe, which returns zero indicating no data is
available.  Etc.
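
For anyone who wants to try this outside emacs, the scenario above can be
sketched in C roughly as follows.  This is only a minimal reconstruction
under assumptions, not the code emacs runs: the function name
`probe_dead_writer' and the one-second timeout are made up for illustration.
On present-day POSIX systems `select' reports a pipe with no remaining
writer as readable and `read' then returns zero to mark end-of-file, so
the sketch shows what a system does rather than settling what HP-UX 7.0
ought to do.

```c
#include <sys/types.h>
#include <sys/select.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: the writer exits, the reader collects its status,
   then calls select and read on the non-blocking read side of the pipe.
   Stores both results and returns 0, or returns -1 if setup fails. */
int probe_dead_writer(int *select_result, ssize_t *read_result)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0)
        _exit(0);               /* child: the write side simply exits */

    /* Parent keeps only the read side, so no writer remains once the
       child is gone. */
    close(fds[1]);

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        close(fds[0]);
        return -1;
    }

    int flags = fcntl(fds[0], F_GETFL, 0);
    if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fds[0]);
        return -1;
    }

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fds[0], &readfds);
    struct timeval timeout = { 1, 0 };   /* assumed: wait at most 1 second */
    *select_result = select(fds[0] + 1, &readfds, NULL, NULL, &timeout);

    char buf[64];
    *read_result = read(fds[0], buf, sizeof buf);

    close(fds[0]);
    return 0;
}
```

Calling `probe_dead_writer' from a small `main' and printing the two
results is enough to compare systems.  A caller that keeps selecting on
the descriptor after seeing the zero-byte read will spin in exactly the
way described above.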

Now I'm no expert, but it's my belief that `select' shouldn't indicate
that the pipe has input in this situation.

For information: this behavior has been observed (by others) when the
subprocess is using a PTY to communicate with emacs, although it has
not been debugged and thoroughly examined in such a case.

PS: Emacs is being changed so that it does not attempt to use `select'
on connections to dead processes.  Version 18.56 will not have this
problem.  If anyone is interested in a patch for 18.55, they should
contact me directly by e-mail.
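
The idea behind that change can be sketched as below.  This is not the
actual patch: `struct subproc', its fields, and `build_read_set' are
hypothetical names invented for the illustration, and the real process
table in emacs looks different.  The point is only that descriptors
belonging to exited subprocesses are left out of the set handed to
`select'.

```c
#include <sys/select.h>

/* Hypothetical table entry; not the structure emacs really uses. */
struct subproc {
    int infd;    /* descriptor the subprocess's output is read from */
    int alive;   /* nonzero until the exit status has been collected */
};

/* Fill *set with the input descriptors of live subprocesses only.
   Returns the nfds argument for select (highest descriptor plus one),
   or 0 if no live subprocess has a usable descriptor. */
int build_read_set(const struct subproc *table, int count, fd_set *set)
{
    int maxfd = -1;

    FD_ZERO(set);
    for (int i = 0; i < count; i++) {
        if (!table[i].alive || table[i].infd < 0)
            continue;            /* skip dead or closed entries */
        FD_SET(table[i].infd, set);
        if (table[i].infd > maxfd)
            maxfd = table[i].infd;
    }
    return maxfd + 1;
}
```

With a table of two live and two exited subprocesses, as in the case I
debugged, only the two live descriptors end up in the set, so `select'
has no chance to report the dead ones.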


