2xDEQNA + ra disk = hang? (4.3+NFS uVAX)

der Mouse mouse at thunder.mcrcim.mcgill.edu
Fri May 24 17:13:42 AEST 1991


Gotta call for help on this one.  We have a MicroVAX here (try to
contain your winces :-) which has two DEQNAs and a disk controller
using the UDA-50 driver (I would call it a UDA-50, but I don't think
that's the proper term for the Q-bus board; it's an RQ??-50 or
third-party emulator or some such; I can get details if it matters).

This machine had a very old Ultrix (1.something) running on it, and
everything worked fine, so there is nothing drastically wrong with the
hardware setup.

I am trying to put mtXinu 4.3+NFS on this machine.  (This is not the
machine I really want to get this working on; this is a machine we have
on loan from another department for use as a scratch machine debugging
the problem, because the real target machine cannot be taken down
randomly to debug the problem.)

Everything works fine, provided I somehow disable recognition of at
least one of the DEQNAs, by almost any method: by taking the board out
of the system, by removing it from the configuration, by adding code to
if_qe.c to ignore it...and it doesn't matter whether I disable qe0 or
qe1; in each case, the other one works fine.

But if I build a kernel that tries to use both qe0 and qe1, the system
hangs.  When I crashed it, a post-mortem stack trace seemed to implicate
the uda driver.  So the next thing I did was to install Chris Torek's
uda driver; with it in there, I get "uda0: lost interrupt" followed by
a bus reset.  The system remains hung, and after a short time (probably
somewhere between 10 and 30 seconds, from memory) this repeats.  It has
kept on repeating for as long as I've had patience to let it.

So I started in debugging it.  First action was to add code to if_qe to
disable initialization of the second qe at various points.  After a bit
of this (the edit-compile-test cycle is not the zippiest), I convinced
myself that the call to if_ubaminit() in qeinit() was at fault.  I then
moved the test in there and localized it to the loop that calls
if_ubaalloc() for the receive mbuf clusters.
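
For anyone wanting to repeat the exercise, the gating described above
can be sketched roughly as follows.  This is a user-space illustration
only, not code from if_qe.c: the stage names, qe_skip_unit,
qe_skip_from, qe_stage_enabled() and qe_init_sketch() are all
hypothetical, and the real qeinit() is not split into stages like this.

```c
#include <assert.h>

/* Hypothetical init stages; the real qeinit() is not structured this way. */
enum qe_stage {
    QE_STAGE_RESET,     /* stands in for resetting the board */
    QE_STAGE_UBASETUP,  /* stands in for the non-loop part of if_ubaminit() */
    QE_STAGE_RXALLOC,   /* stands in for the receive if_ubaalloc() loop */
    QE_STAGE_START,     /* stands in for starting the receiver */
    QE_STAGE_COUNT
};

/* Hypothetical debug knobs: which unit to cut short, and from which stage. */
static int qe_skip_unit = 1;
static int qe_skip_from = QE_STAGE_RXALLOC;

/* Return nonzero if the given unit should run the given stage. */
static int
qe_stage_enabled(int unit, int stage)
{
    assert(stage >= 0 && stage < QE_STAGE_COUNT);
    return !(unit == qe_skip_unit && stage >= qe_skip_from);
}

/*
 * Walk the stages in order and stop at the first disabled one.
 * Returns the number of stages that ran for this unit.
 */
static int
qe_init_sketch(int unit)
{
    int stage, ran = 0;

    for (stage = 0; stage < QE_STAGE_COUNT; stage++) {
        if (!qe_stage_enabled(unit, stage))
            break;
        /* the real per-stage work would go here */
        ran++;
    }
    return ran;
}
```

Moving qe_skip_from one stage later per rebuild, and noting the first
setting at which the hang comes back, narrows the fault down to a
single stage.  With the settings shown, unit 0 runs every stage and
unit 1 stops just before the receive allocation loop.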

I must admit this has me baffled.  No problem is caused by running
through this code once, for the "working" qe.  It's only when it's
called twice, for qe0 and qe1 both, that there's any problem.  And even
then, it's the uda driver, not the qe driver, that loses interrupts!

And I'm convinced it's a software problem because it happens on both
this machine and the real target machine, and this one ran Ultrix just
fine with the same hardware.  And there's another uVAX in another lab
with two DEQNAs, ra disks, and Ultrix, running fine.

I'll be nosing around looking for any further hints, but this is
getting weird enough that I don't really have any confidence left that
I can find it in a reasonable amount of time, so I thought I'd ask the
net and see if anyone could help....

Anyone?

					der Mouse

			old: mcgill-vision!mouse
			new: mouse at larry.mcrcim.mcgill.edu



More information about the Comp.unix.wizards mailing list