SCSI Errors on Sparc 1+ with HP97549

Kenton C. Phillips kenton at space.mit.edu
Wed Apr 24 10:00:00 AEST 1991


We're having problems with an HP 1GB (HP-97549S) disk drive on a
Sparcstation 1+, running SunOS 4.1.1.  The system crashes with SCSI
errors, and can be made to do so consistently by issuing a command such as
'chmod 666 *' in a directory with several hundred files.  (However, 'echo
*' doesn't have a problem---it lists those hundreds of files.)  It has
occurred to me that chmod (also chown) might have a problem handling a huge
number of files, but that shouldn't crash the system, and it certainly
shouldn't generate SCSI bus errors.  It seems more likely that the
numerous fast disk accesses made by the chmod process are exciting a bug.
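
For anyone who wants to try the same access pattern on their own
hardware, here is a minimal sketch of the trigger.  It is not what we
actually ran (that was just 'chmod 666 *' from the shell); the function
name chmod_many, the file count of 500, and the file names are all
made up for illustration.  It creates a batch of empty files in a scratch
directory and then changes the mode of each one in quick succession.

```python
import os
import stat
import tempfile


def chmod_many(directory, count=500, mode=0o666):
    """Create `count` empty files in `directory`, then chmod each one.

    Hypothetical helper: approximates 'chmod 666 *' in a directory
    holding several hundred files.
    """
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"file{i:04d}")
        # Create an empty file; only the inode matters for this test.
        with open(path, "w"):
            pass
        paths.append(path)

    # One mode change per file, back to back, which is roughly the burst
    # of small metadata updates that chmod on a wildcard produces.
    for path in paths:
        os.chmod(path, mode)
    return paths


if __name__ == "__main__":
    # Use a throwaway directory; point `dir=` at the suspect disk to
    # exercise that drive specifically.
    with tempfile.TemporaryDirectory() as scratch:
        changed = chmod_many(scratch)
        print(f"chmod applied to {len(changed)} files")
```

Passing dir= to TemporaryDirectory (for example a directory on the sd1
filesystem) would aim the burst at the HP drive rather than the default
temporary area.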

We've replaced the CPU board on the Sparcstation 1+.  We've sent the disk
back to the vendor, to verify that it works.  We've put it on a
Sparcstation 1 (SunOS 4.0.3, with a re-made filesystem, of course), and it
works fine.  The disk
has the 9049 PROM in it, which is supposed to rectify some earlier
problems with the HP drives.  We have identical drives on other systems
(an SLC and an IPC, both 4.1.1) with no problems.  We've had field
engineers in, who couldn't find what was wrong.

Here are some logged error messages (sd1 is the HP disk; sd0, sd2 are
internal Quantum 105's):

> esp0: No command for reconnect of Target 1 Lun 0
> esp0:   polled command timeout
>         State=UNKNOWN Last State=RESEL
>         Latched stat=0x17<XZERO,MSG,CD,IO> intr=0xc<FCMP,RESEL> fifo 0x0
>         last msg out: <unknown msg 0xff>; last msg in: IDENTIFY
>         DMA csr=0x80000000
>         addr=fff14030 last=fff12030 last_count=2000
>         Cmd dump for Target 1 Lun 0:
>         cdb=[ ]
>         pkt_state 0x0 pkt_flags 0x9 pkt_statistics 0x0
>         cmd_flags=0x100 cmd_timeout 0
>         Mapped Dma Space:
>                 Base = 0x0 Count = 0x0
>         Transfer History:
>                 Base = 0x0 Count = 0x0
> panic: polled command timeout
> syncing file systems... [1] 1 [1] 1 [1] 1 [1] 1 [1] 1 [1] 1 [1] 1 [1] 1 [1]
> 1 [1] 1 [1] 1 [1] 1 [1] 1 [1] 1 [1] 1 [1] 1 [1] 1 [1] 1 [1] 1 give up!

Does anyone have any suggestions?  It's gotten to the point where
defenestration seems like a reasonable option.

Thanks in advance for any information.

Kenton C. Phillips
Computer Systems Manager
MIT Center for Space Research
kenton at space.mit.edu
