strange problems (looking for help)

Rick Ace rick at nyit.UUCP
Tue Apr 29 00:16:13 AEST 1986


> I wonder if anyone recognizes the following symptoms as symptoms of
> something concrete I can try to fix.  We are running 4.3BSD on a
> VAX11/785.  The disks are 3 RA81s on a single UDA.  The uda device
> driver is version 6.12 from Berkeley (9/16/85) which seems to be equal
> to or derived from a DEC driver from January 84.  I am not getting any
> kernel error messages at all.  Here is symptom number 1:
> 
> % ls -l data
> -rw-r--r--  1 pcraig     4480000 Apr 17 11:19 data
> 
> % cmp data data
> data data differ: char 1777665, line 30650
> 
> % cmp data data
> data data differ: char 1654785, line 28531
> 
> % cmp data data
> data data differ: char 1683457, line 28955

...

> All of our symptoms could be explained by bad reads.  That is, if we
> don't always get the same data off the disk when we read it we would
> get the symptoms we're getting.  However, we have never gotten any sort
> of disk read error messages on the console or anywhere else.  Thanks.
> 
> Steve Hubert
>  Dept. of Stat., U. of Wash, Seattle
>  {decvax,ihnp4,ucbvax!lbl-csam}!uw-beaver!entropy!hubert
>  hubert%entropy at uw-beaver.arpa

Sounds like flaky hardware.  The trouble is figuring out which piece
of gear is the culprit.  Here are some ideas:

1.  The UDA50 is sick.  See if Field Service will swap it for a spare
and try your experiments again.

2.  Another peripheral on the UNIBUS with the UDA50 is misbehaving and
corrupting the data transfer between the UDA50 and the UBA.  Try your
experiment after removing all UNIBUS devices except the UDA50 (remember
to install grant cards and NPG jumpers where necessary).  I've seen a
malfunctioning UNIBUS device make trouble for its neighbors before!

3.  The UNIBUS DD11 backplane has a problem.  This is a bit of a pain
to troubleshoot unless you have a spare backplane.  If your backplane
is in two or more sections, though, you can shorten it to one section
and run the experiment, then repeat with each of the other sections
in turn.

4.  The UBA or the UNIBUS cable is malfunctioning.  Again, ask Field
Service to swap as much gear as they can.

5.  Other UNIX wizards suggested possible problems in the KA785 CPU
and the memory controllers; these are also suspect.  Ask Field Service
to check the revision level of your CPU hardware, and to apply any
FCOs that you don't already have (i.e., get your money's worth for
your service contract).
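
Since each of the swaps above ends with "run the experiment again,"
it may help to have a test that is more repeatable than running cmp
by hand.  Here is a minimal sketch in Python (not anything from the
original setup): it reads the file several times and reports the
offsets of blocks that did not come back the same on every pass.  The
function names, the 4096-byte block size, and the pass count are all
assumptions made for illustration.

```python
BLOCK = 4096  # assumed comparison granularity, not taken from the post


def read_blocks(path, block=BLOCK):
    """Read the whole file and return its contents as a list of blocks."""
    blocks = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            blocks.append(chunk)
    return blocks


def unstable_blocks(path, passes=5, block=BLOCK):
    """Return sorted byte offsets of blocks that differed between passes.

    The first pass is the reference; every later pass is compared
    against it block by block.
    """
    reference = read_blocks(path, block)
    bad = set()
    for _ in range(passes - 1):
        current = read_blocks(path, block)
        if len(current) != len(reference):
            # A pass returned a different amount of data; flag the
            # offset where the shorter pass ended.
            bad.add(min(len(current), len(reference)) * block)
        for i, (a, b) in enumerate(zip(reference, current)):
            if a != b:
                bad.add(i * block)
    return sorted(bad)
```

On healthy hardware unstable_blocks("data") should return an empty
list.  Keep in mind that the file has to be larger than whatever
caching sits in front of the disk, or the rereads may never touch
the hardware at all.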

If I read the DEC PDP-11 Bus Handbook correctly, it appears that the
data lines on the UNIBUS are not parity-checked.  This would explain
why you're not seeing any diagnostic printf's from the kernel:
UNIBUS data can get mangled undetectably on its journey from the
UDA50 to the UBA.
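
One thing the cmp output in the quoted message does allow is a quick
alignment check on where the differences start.  The sketch below
(Python, with a helper name made up for illustration) takes the three
character positions cmp reported, converts them to byte offsets, and
finds the largest power of two that divides all of them.

```python
# First differing character reported by each cmp run in the quoted
# message.  cmp counts characters from 1, so subtract 1 to get a
# zero-based byte offset.
cmp_chars = [1777665, 1654785, 1683457]
offsets = [c - 1 for c in cmp_chars]


def largest_common_power_of_two(values):
    """Largest power of two that divides every (nonzero) value."""
    size = 1
    while all(v % (size * 2) == 0 for v in values):
        size *= 2
    return size


align = largest_common_power_of_two(offsets)
print(align)                          # 4096
print([o // align for o in offsets])  # [434, 404, 411]
```

All three differences begin exactly on a 4096-byte boundary.  That is
consistent with damage that starts at the beginning of some
4096-byte unit (a file system block or a single transfer, say) rather
than stray bytes at random positions, though three samples are thin
evidence and the result does not say which component is at fault.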

The tried-and-true "swap it for a spare" approach is often the most
expedient route to solving problems like yours.  For what it's worth,
we're using an RA81/UDA50 on a VAX-11/780-5 (that's a CPU that was
born a 780 but received a 785 CPU transplant later in life) under
4.2BSD with the RIACS UDA50 driver, so such a hardware configuration
*can* work.  Our UDA50 sits alone on its own UBA, though, because it
won't play nice with the boys on the other UBA.

-----
Rick Ace
Computer Graphics Laboratory
New York Institute of Technology
Old Westbury, NY  11568
(516) 686-7644

{decvax,seismo}!philabs!nyit!rick


