sys5, job control, etc.

Karl Kleinpaste karl at cbrma.UUCP
Sat Mar 8 07:40:22 AEST 1986


> The one and only real thing which bugs me about system V is the
> total lack of job control. e.g. in the mail program and having to check
> something out .. fancy being faced with either writing your mail
> out (and later having to read it in again), or having to fork up
> another shell. With a big load up either of these options can be
> painful. The functionality of ^Z and fg is tremendous!

*Sigh*.  Yes, ^Z, along with fg, bg, stop, and all those other
neat-nifty functions, is sure useful.  But it's important to
realize that it's possible to do something job-control-like (notice
the -LIKE) in plain old vanilla UNIX System V (even 5.0, such as I'm
running).  There is a facility in any version of UNIX absolutely
anywhere for stopping processes.  It's called ptrace(2).

(You may now depart momentarily to the men's or ladies' room in order
to lose your lunch.  You may then feel a mad desire to spend 2 or 3
minutes laughing hysterically; one person actually did so when I
described the following implementation to him.)

I've implemented pseudo-job-control in a SysV version of csh.  This is
a csh which started out life in 2.8BSD, that is, utterly
job-control-less.  In fact, it didn't even understand the concept of a
complete job, but just a bunch of arbitrary, unrelated processes.
Quite some time back, I convinced it to believe in the unit of a
`job,' which I defined as a set of processes spawned during a single
(cumulative) call to execute().  This csh at that point understood how
to wait for a whole job, and even to deal with processes terminating
due to SIGPIPE.  (Previously, it couldn't cope with SIGPIPE very well,
because you don't necessarily hear about the dead pipe-writing process
before the dead pipe-reading process.)
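
The idea of a job as a set of processes that is waited for as a unit
can be sketched roughly as follows.  This is an illustration, not the
csh code: struct job, job_spawn(), job_wait(), and JOB_MAX are
hypothetical names, and the children are trivial stand-ins for the
members of a pipeline.  The point is that wait() reports deaths in
whatever order they happen, so each status is recorded and the job is
only finished once every member has been reaped.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define JOB_MAX 8                 /* hypothetical limit on processes per job */

struct job {
    pid_t pid[JOB_MAX];           /* every process spawned for this job */
    int   status[JOB_MAX];        /* wait() status of each, -1 until reaped */
    int   nprocs;                 /* processes in the job */
    int   nleft;                  /* processes not yet reaped */
};

/* Fork one member of the job.  The child either kills itself with
 * `sig' (if non-zero) or exits with `exit_code'. */
static int job_spawn(struct job *j, int exit_code, int sig)
{
    pid_t p;

    if (j->nprocs >= JOB_MAX)
        return -1;
    p = fork();
    if (p < 0)
        return -1;
    if (p == 0) {
        if (sig) {
            signal(sig, SIG_DFL); /* make sure the signal is fatal */
            raise(sig);
        }
        _exit(exit_code);
    }
    j->pid[j->nprocs] = p;
    j->status[j->nprocs] = -1;
    j->nprocs++;
    j->nleft++;
    return 0;
}

/* Wait for the whole job.  Deaths may be reported in any order, e.g.
 * the pipe reader before the writer that died of SIGPIPE. */
static int job_wait(struct job *j)
{
    assert(j->nleft <= j->nprocs);
    while (j->nleft > 0) {
        int st, i;
        pid_t p = wait(&st);

        if (p < 0)
            return -1;
        for (i = 0; i < j->nprocs; i++) {
            if (j->pid[i] == p) {
                j->status[i] = st;
                j->nleft--;
                break;
            }
        }
        /* a pid that is not in this job is simply ignored */
    }
    return 0;
}
```

A caller would start from a zeroed struct job, call job_spawn() once
per pipeline member, then job_wait(), and afterwards inspect each
status[] entry with WIFSIGNALED()/WTERMSIG() to tell a SIGPIPE death
from a real failure.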

From there, it was no really major expansion to convince the csh that
it should be able to call ptrace(2) just before exec-ing a program
(unless the program is setuid or setgid, since ptrace defeats that), and that
it should know what to do with a process or group of processes which
show the standard ptrace-style exit (aka `stopped') status.  All that
remained was to give it a standard mechanism for generating a
SIGTSTP-equivalent signal.  Since no one I know uses SIGQUIT very
much, SIGQUIT became an emulated SIGTSTP, and I `stty quit ^Z' all the
time now.  You can still force a core dump of a program by first
stopping it with SIGQUIT/SIGTSTP, and then issuing the new builtin
command `core,' which restarts the job with SIGQUIT pending against
it.  Presto, core dump.  So no capability was lost, except that the
ability to core dump is now delayed one stage.
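
The stop, resume, and `core' steps described above can be sketched
roughly as below.  This is a sketch under assumptions, not the csh
code: it uses the Linux <sys/ptrace.h> spellings (PTRACE_TRACEME,
PTRACE_CONT) as stand-ins for the numeric System V requests, the
child stops itself with SIGSTOP instead of exec-ing a program, and
spawn_traced(), job_stop(), job_resume(), and job_core() are
hypothetical names.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that asks to be traced, then idles.  A real shell
 * would exec a program here; the self-sent SIGSTOP stands in for the
 * stop that a traced exec would produce. */
static pid_t spawn_traced(void)
{
    int st;
    pid_t p = fork();

    if (p < 0)
        return -1;
    if (p == 0) {
        signal(SIGQUIT, SIG_DFL);
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(127);
        raise(SIGSTOP);
        for (;;)
            pause();
    }
    /* Wait for the initial stop, then let the child run. */
    if (waitpid(p, &st, 0) != p || !WIFSTOPPED(st))
        return -1;
    if (ptrace(PTRACE_CONT, p, NULL, NULL) < 0)
        return -1;
    return p;
}

/* Emulated ^Z: SIGQUIT reaches the traced child, which stops instead
 * of dying.  Returns 1 if the ptrace-style stop status was seen. */
static int job_stop(pid_t p)
{
    int st;

    assert(p > 0);
    if (kill(p, SIGQUIT) < 0)
        return -1;
    if (waitpid(p, &st, 0) != p)
        return -1;
    return WIFSTOPPED(st) && WSTOPSIG(st) == SIGQUIT;
}

/* fg/bg: restart the child and throw the SIGQUIT away. */
static int job_resume(pid_t p)
{
    return ptrace(PTRACE_CONT, p, NULL, NULL) < 0 ? -1 : 0;
}

/* `core': restart the child with SIGQUIT pending against it.
 * Returns the signal that killed it, or 0 if it was not killed. */
static int job_core(pid_t p)
{
    int st;

    if (ptrace(PTRACE_CONT, p, NULL, (void *)(long)SIGQUIT) < 0)
        return -1;
    if (waitpid(p, &st, 0) != p)
        return -1;
    return WIFSIGNALED(st) ? WTERMSIG(st) : 0;
}
```

Whether a core file is actually written in job_core() depends on the
system's core-size limits; the sketch only shows that the signal
withheld at stop time can be handed back when the job is restarted.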

This csh now also understands a `suspend' builtin, as well as fg, bg,
and stop commands, albeit with a slightly different syntax, and there
is no concept (yet) of the `current' job.  (Fg, bg, stop, and core all
assume the last job created is `current.')  In fact, there's a
suspended superuser csh sitting right now under the same csh which
started this incarnation of rn.

Note that this mechanism is inherently dangerous.  There is still no
means of preventing one signal from getting distributed to every
process attached to the terminal, and hence there are problems with
managing SIGINT; in general, it's only safe to restart fg jobs in the
fg, and bg jobs in the bg.  Also, sub-processes of the csh-started
processes are not controlled by ptrace, so things which create
children like this (notably, make(1)) should be run in the bg.

However, it's possible to turn job control off (with the jobs
builtin), so you can avoid all the problems entirely if you like.  The
result then is a csh which is like it was before I put in the job
control emulation, understanding and waiting for whole jobs, but not
doing any real management of them.

One last thing is that there are 2 bugs in ptrace which ought to be
fixed in order to make things maximally wonderful.  One is that ptrace
should be extended slightly to allow ptrace(-1, 0, 0, 0) to turn off
ptrace-ing; csh would like to do this any time it starts up, so that
hitting *it* with SIGQUIT doesn't stop it.  Two is that the routine
stop() in os/sig.c should not merely wakeup the parent when a ptrace'd
child stops, but rather should psignal(parent, SIGCLD); without this,
bg jobs which get hit with signals are not restarted right away as
they should be.  I asked net.unix-wizards about these fixes about 6
weeks ago.  I got no responses at all, so I just assumed that no one
thought they would be any problem.
-- 
Karl Kleinpaste


