Autoconfig

Thu Jun 30 04:29:51 AEST 1988

I'm led to believe from reading "Building Berkeley UNIX* Kernels with
Config", that if your system works, you should be able to power down
your system, pull out a controller from the bus (replacing it with a
grant card), and reboot the system, and your system will still boot,
as long as the controller that you removed wasn't critical for
booting.  Unfortunately, this usually does not seem to work for us.

Depending on which controller I pull out, different things happen.
For some boards it works.  An example of this is a
DHV11.  If I remove this, the system still boots fine.  For some
controllers, the system hangs when it gets to the point in the
autoconfig sequence where the missing controller would normally be
found.  A more peculiar mode of failure happens when I remove a
second disk controller.  The autoconfig sequence finds the first
controller twice!  And both times it finds it at the same CSR address.
It assigns each disk drive to two different device names.  The
autoconfig sequence then merrily continues on, and seems to be working
fine, until the system finally gets to the point where it tries to
give you a /bin/sh.  At this point it hangs.

Does someone have any idea what is going on, and how I can get things
to work, so that I can remove controllers without building a new
kernel?

We use VAXstation II's, running 4.3BSD+NFS (from U of Wisc).  The disk
controllers are Sigma RQD11-EC's (ESDI MSCP Qbus controllers).

I also have another, perhaps related, problem, which maybe someone has
an idea about.  We have a uVax-II with two of the aforementioned disk
controllers and the aforementioned kernel.  It also has a Wespecorp
tape controller.  I want to put in a DHV11, but whenever I do, it
doesn't work right.  With the DHV11 in, autoconfig seems to find it
fine, but if I try to run 'stty' on one of the DHV11's terminal lines
(let's say "stty all > /dev/ttyS0"), it hangs.  If I do this from the
Bourne Shell, I can ^C out of it, but I get some sort of error (I
don't remember the exact message...  perhaps something like "no such
device").  If I do this from the C Shell, ^C and ^Z don't do anything.
Another problem that seems to occur with the DHV11 in, is that some C
programs, occasionally, when trying to dump core, cause the whole
system to become wedged.

I'm pretty sure I have the right device numbers on /dev/ttyS0, because
we have other systems with a DHV11 and the same kernel, and the DHV11
works on them.  The other systems don't, however, have a tape
controller and two disk controllers.  Another piece to the puzzle is
that the tape controller in the past seemed to be causing us some
problems.  Whenever a filesystem on a disk controller that was
farther out on the bus than the tape controller was dumped to tape,
any process, including the process accessing that disk drive, would
hang.  The fix for this was to move the tape controller to be farther
out on the bus than all the disk controllers.
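
On the device-number point above: here is a minimal sketch of how one
might double-check the major and minor numbers on a device node such
as /dev/ttyS0 and compare them between machines.  It is written in
Python purely for illustration, the helper name device_numbers is made
up, and the values to expect would have to come from the kernel's
device switch table, which isn't shown here.

```python
import os
import stat


def device_numbers(path):
    """Return (kind, major, minor) for a device node, or None if
    the path is not a character or block device."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "char"
    elif stat.S_ISBLK(st.st_mode):
        kind = "block"
    else:
        return None
    # st_rdev holds the device number encoded in the node itself
    return kind, os.major(st.st_rdev), os.minor(st.st_rdev)


if __name__ == "__main__":
    # /dev/ttyS0 is the node named in this post; /dev/null is only a
    # fallback so the sketch prints something on machines without it.
    for path in ("/dev/ttyS0", "/dev/null"):
        try:
            print(path, device_numbers(path))
        except OSError as err:
            print(path, "could not be examined:", err)
```

Running the same check on a machine where the DHV11 works and on the
one where it doesn't would show whether the two nodes really carry the
same numbers.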

I thought for a while that perhaps the problem was that we weren't
using the official DEC CSR addresses and interrupt vectors for the disk
controllers and DHV11. I didn't think with Unix this should make any
difference as long as everything was spaced out enough.  (The official
DEC CSR addresses and interrupt vectors are a real pain, because if you
add another disk controller, you have to go and perform hairy
calculations and then use those to guide yourself in flipping dip
switches on the DHV11).  In any case, I went through all the work of
making all the CSR addresses and interrupt vectors be up to DEC
standard, and this changed nothing.

Anyone have any ideas?

|>oug /\lan
   (or nessus at athena.mit.edu
       nessus at mit-eddie.uucp)