panic: bad rmfree under Ultrix 2.0-1

John Sloan jsloan at wright.EDU
Fri Feb 12 02:52:42 AEST 1988


We're hoping someone has seen this before and knows more about it than
anyone we've been able to talk to at DEC so far. (We're still talking,
so by the time you read this the problem may have been resolved. If
anyone else is having this problem and we get some useful information,
we'll pass it along.)

Crash:		panic: bad rmfree

System:		VAX 785
		Ultrix-32 2.0-1

Background:

We're running 2.0-1 on a 785 with two RA81s. The second RA81 was
installed a couple of weeks ago, we installed Ultrix immediately
afterward, and we began to experience the panic described above right
after that. We also run Ultrix on a 750 (and have been since the 1.0
days), and have never seen this problem there.

During the install on the _785_, we changed the partitioning on BOTH
RA81s. Both RA81s are partitioned exactly the same. Among other things
we increased the swap area on both drives. Our _750_ has an RA80 (ra0)
and an RA81 (ra1), and we increased the swap space on the RA81 on the 750
in the same fashion when 2.0-1 first came out and have been running it
without incident since then. The new partitioning and file system
structure on the 785 look like this (the NEW RA81 is ra1):

/dev/rra1a
Current partition table:
partition     bottom        top       size    overlap
    a              0      15883      15884    c
    b          15884      82763      66880    c
    c              0     891071     891072    a,b,d,e,f,g,h
    d         131404     254396     122993    c,g,h
    e         254397     377389     122993    c,h
    f         377390     891071     513682    c,h
    g          82764     246923     164160    c,d
    h         246924     891071     644148    c,d,e,f

Filesystem    total    kbytes  kbytes  percent
   node       kbytes    used    free   used    Mounted on
/dev/ra0a       7423    5692     989    85%    /
/dev/ra0g      77983   52209   17976    74%    /usr
/dev/ra0h     306551   28387  247509    10%    /usr/local
/dev/ra1a       7423       9    6672     0%    /tmp
/dev/ra1g      77983   13927   56258    20%    /usr/spool
/dev/ra1h     306551      11  275885     0%    /thor/users
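
(Checking the table by hand: the ra1 partitions actually in use are
a, g, and h above, plus -- assuming swap is on the conventional b
partition -- b, and those four tile the drive exactly: a ends at block
15883 and b starts at 15884, b ends at 82763 and g starts at 82764, g
ends at 246923 and h starts at 246924, and h runs out to 891071, the
top of c. So the overlaps the table reports all involve the unused d,
e, and f partitions.)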

We believe that our /etc/fstab, /etc/rc.local, and our kernel are set
up correctly for swapping on the second disk (as they are on our
750).
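
For concreteness, the three pieces we mean look roughly like this.
This is a sketch from memory rather than a copy of our files, and the
4.2BSD-style syntax (which Ultrix inherited) and the choice of the b
partitions for swap are assumptions to check against config(8),
fstab(5), and swapon(8):

    # kernel config file: declare both swap partitions
    config  vmunix  root on ra0 swap on ra0 and ra1

    # /etc/fstab: colon-separated entry marking ra1b as swap
    /dev/ra1b::sw::

    # /etc/rc.local: enable every swap device listed in fstab
    swapon -a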

When we first started having this panic, the new RA81 was ra0. To see
whether the problem followed the drive we backed up, swapped drive
plugs, and restored; the problem does appear to follow the new drive
(which is now ra1). In fact, it panic'ed during the restore to the new
RA81.  Of the three crash dumps we've examined (out of perhaps a
dozen), this is typical of what dbx -k tells us.

csh> dbx -k /usr/adm/crash/vmunix.5 /usr/adm/crash/vmcore.5
dbx version 2.0 of 4/2/87 22:10.
Type 'help' for help.
reading symbolic information ...
[using memory image in /usr/adm/crash/vmcore.5]
sbr 80061470 slr 7e00
p0br 804ffa00 p0lr 160 p1br 7fd00200 p1lr 1fffdc
(dbx) where
sleep(0x80110aa0, 0x14) at 0x80025e04
biowait(0x80110aa0) at 0x80006436
bwrite(0x80110aa0) at 0x80005d3b
dirremove(0x7fffed94) at 0x80041077
ufs_unlink(0x800bd5d8, 0x7fffed94) at 0x8004179e
unlink() at 0x8000a5b5
syscall() at 0x8004ea5f
Xsyscall(0x7fffe15c) at 0x80001d7b
(dbx) q
csh>

This is consistent with hand-tracing the PCs in the kernel stack dump
printed on the console against a printout of the namelist from
/vmunix.
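
(The by-hand lookup is just "find the last namelist symbol at or below
the PC", and the same thing can be done mechanically. A sketch, using
the pc 0x80025e04 from the sleep frame above; the "x" prefix forces
awk to compare the hex addresses as strings, which works because
nm -n prints them zero-padded in ascending order:

csh> nm -n /vmunix | awk '("x" $1) <= "x80025e04"' | tail -1

The line printed names the routine containing the PC; for this one it
should be _sleep.)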

In all three cases the system was doing a bread or bwrite on the new
RA81.  We think the problem might be swap-space related on the new
disk: rmfree is used to deallocate entries in a resource map, resource
maps seem to be used mostly in virtual memory management, and the
system dies in a sleep while it is presumably trying to switch
processes to wait for the I/O to complete. We don't have Ultrix
source, so this is all conjecture. At all times the system had only
one user or none (though things may have been running in the
background).
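
For what it's worth, the classic UNIX resource-map design (V6's
malloc/mfree, carried into 4BSD as rmalloc/rmfree) is an array of
(address, size) pairs describing the free pieces of a resource, the
swap map among them, and Ultrix is 4.2BSD-derived. Below is a minimal
C sketch of that BSD-style rmfree logic -- a reconstruction of the
design, NOT DEC's code -- showing the one condition we know of that
yields this panic string: freeing a range that overlaps a piece
already marked free, i.e. a double free or a corrupt map.

/*
 * Minimal sketch of a 4BSD-style rmfree() -- reconstructed from the
 * classic design, NOT DEC's actual code (we don't have Ultrix source).
 * A resource map is an array of (addr, size) pairs, sorted by address
 * and terminated by a zero size, describing the FREE pieces of some
 * resource; the swap map is one such.  rmfree() hands a range back to
 * the map.  If that range overlaps a piece that is already free, the
 * map is corrupt (or the range was freed twice) and the only safe
 * thing left to do is panic -- hence "bad rmfree".
 */
#include <stdio.h>
#include <stdlib.h>

struct mapent {
        long    m_size;         /* size of this free piece; 0 ends the map */
        long    m_addr;         /* first unit of this free piece */
};

static void
panic(const char *msg)          /* stand-in for the kernel's panic() */
{
        fprintf(stderr, "panic: %s\n", msg);
        abort();
}

void
rmfree(struct mapent *map, long size, long addr)
{
        struct mapent *bp;

        /* find the first free piece that starts beyond the freed range */
        for (bp = map; bp->m_size != 0 && bp->m_addr <= addr; bp++)
                continue;

        /* the freed range may not run into the piece before or after it */
        if ((bp > map && (bp - 1)->m_addr + (bp - 1)->m_size > addr) ||
            (bp->m_size != 0 && addr + size > bp->m_addr))
                panic("bad rmfree");

        /* a real rmfree would now coalesce the range into the map */
}

If the Ultrix rmfree makes the same check, then swap bounds left
inconsistent by the repartitioning could free blocks the map already
thinks are free, which is why the repartitioning stays on our suspect
list.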

We can't get it to fail consistently. We ran the DEC standalone disk
formatter and scrubber many times. The first time they both reported
68 bad blocks. The second time 66. The third and fourth and fifth
times 67 bad blocks. The blocks reported bad are always a subset of the
originally reported 68. We ran the DEC s/a disk exerciser for 14 hours
without incident. We ran the Ultrix disk exerciser and it panic'ed
within a few minutes, but it does not do so consistently. DDC and Field
Service so far haven't been able to tell us anything else, although
DDC seems unusually knowledgeable about Ultrix lately (which is nice to
see).

We think we may have done something really stupid with the
repartitioning or changing the swap space, but we've RTFM'ed until
we're blue in the face, talked to DEC diagnostic center, swapped disks,
and done similar (but not identical) things with our 750 without
problems. We are reluctant to repartition the disks because [1] since
we can't get it to fail consistently, we won't know whether it's fixed
(although that is our next fallback position), and [2] we need the
extra virtual memory.

Has anyone else seen this? Any ideas? Have we missed something obvious?

Thanks. A lot.

John Sloan (CSNET: jsloan at SPOTS.Wright.edu, USENET:...!cbosgd!wright!jsloan)


-- 
John Sloan                     	 Wright State University Research Center
CSNET: jsloan at SPOTS.Wright.Edu  3171 Research Blvd., Kettering, OH 45420
UUCP: ...!cbosgd!wright!jsloan           (513) 259-1384   (513) 873-2491
Logical Disclaimer: belong(opinions,jsloan). belong(opinions,_):-!,fail.


