SunOS 4.1 multi-user dump causes crashes

Fuat C. Baran fuat at cunixf.cc.columbia.edu
Sat Jun 2 08:02:53 AEST 1990


Last weekend we upgraded our Sun-4/280s from SunOS 4.0.1 to SunOS 4.1.
Since then they have crashed (panic: writeback error) every time we try
to back up the disks in multi-user mode using dump (our usual procedure
for daily and weekly backups).  Backups are done to a 1/2 inch tape drive
on a Xylogics 472 tape controller.

The systems crash with the GENERIC kernel as well as with a custom
configured kernel.  The hardware configuration is a 4/280 with rev 26 CPUs
(PROM 3.0) (as well as a rev 22 CPU with PROM 2.8.4 and a rev 14 CPU with
PROM 1.7), three 16 MB memory boards, one ALM-II, one Xylogics 472 tape
controller with one tape drive, one Xylogics 450/451(?) controller, and
two Hitachi DK815-10 drives.  Most of the time the system hangs after the
panic, though once we were able to get a core dump.

Output on the console at crash time is (addresses vary slightly):

Memory Error Register 1d4<INTR,INTENA,CE_ENA,WBACKERR>
DVMA=1, context=0, virtual address=fff3cfc0
pme=0, physical address=fc0
panic: writeback error
syncing file system...  {at this point it hangs and we have to reset
                         from the CPU board, though in one of the 20
                         or so crashes it saved a core image}
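
For anyone comparing their own console logs against ours, here is a rough
sketch of how the lines above could be picked apart.  It only splits the
register value from the flag names the kernel printed and checks the low
bits of the two addresses.  The function names and the 8 KB page size are
my assumptions, not anything taken from Sun documentation.  In our output
the physical address (fc0) is just the page offset of the virtual address,
which looks consistent with pme=0, but that is a reading of the numbers
and not a diagnosis.

```python
import re

# Assumption: 8 KB pages on the Sun-4 MMU. The offset check below gives the
# same answer for 4 KB pages, so the conclusion does not hinge on this value.
PAGE_SIZE = 8192


def parse_memerr(line):
    """Split 'Memory Error Register 1d4<A,B,...>' into (value, [flag names]).

    Returns None if the line is not a memory error register report.
    The flag names are taken as printed; no bit positions are assumed.
    """
    m = re.search(r"Memory Error Register\s+([0-9a-fA-F]+)<([^>]*)>", line)
    if m is None:
        return None
    value = int(m.group(1), 16)
    flags = [name for name in m.group(2).split(",") if name]
    return value, flags


def page_offset(addr):
    """Return the offset of addr within its page (hypothetical helper)."""
    return addr & (PAGE_SIZE - 1)


if __name__ == "__main__":
    report = "Memory Error Register 1d4<INTR,INTENA,CE_ENA,WBACKERR>"
    value, flags = parse_memerr(report)
    print(hex(value), flags)

    vaddr = 0xFFF3CFC0  # "virtual address=fff3cfc0" from the console
    paddr = 0xFC0       # "physical address=fc0" from the console
    print("offsets match:", page_offset(vaddr) == paddr)
```

Since the addresses vary slightly between crashes, the same check can be
repeated on each set of console lines to see whether the pattern holds.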

A stack backtrace of the vmcore file shows:

_panic(0xf80d1272,0x0,0x1bdc,0xfff3fbdc,0x0,0xf80bcf20) + 6c
_ecc_error(0xffff6004,0xf80a3120,0xc000,0xf80e86f0,0x0,0xf80d1272) + 1c4
_memerr(0x0,0x0,0xffff8000,0x1f0,0xc0,0xd4) + 80
memory_err(?)
_splx(0xf817fc74,0xff005f74,0xff005f74,0x0,0x1,0x64c000) + 14
_hat_pagesync(?)
_page_sortadd(0xf81c4d84,0xf817fc9c,0x80,0x0,0x566000,0xf817fbd4) + 1c8
_pvn_getdirty(0xf817fc9c,0xf81c4d84,0x0,0x12000,0x566000,0xff005f74) + 29c
_pvn_vplist_dirty(0xff005f74,0x0,0x100,0x0,0xf817fcc4,0xf817fc9c) + 110
_spec_putpage(0xff005f74,0x0,0x0,0x100,0x0,0xf8128348) + 1dc
_spec_sync(0x0,0xf80cab90,0xf80cb850,0xf80de9d8,0xff005f70,0xff0fd234) + 98
_sync(0xf81c4fe0,0x120,0xf80c85f8,0xf80c8718,0xf81c5000,0xf80cab48) + 3c
_syscall(0xf81c5000) + 3b4

Since our summer semester started on Tuesday, we haven't had the
opportunity to do exhaustive tests such as single-user vs. multi-user, tar
vs. dump, remote dumps, etc., though we have successfully used rdump on our
Encore Multimax systems to back them up onto the Sun tape drives.

Sun software support is currently "working on it".  We made enough of a
fuss that they have given it "high priority".  The first response I got was
"All dumps have to be done single-user, and multi-user dumps are not
supported.  If you want, we can design a custom program to do it, though
you'll have to contract us to develop it," though they retracted this when
I asked for that statement in writing.  Since it crashes the OS, it is a
bug regardless of what is and isn't supported in the application, and they
have finally begun to look into it.  So far they haven't gotten back to me
with an analysis, a fix, or an estimate of how long either will take.

Does anyone else have a similarly configured system running SunOS 4.1?
Can you do backups with the system multi-user?

Does anyone have any ideas as to what the problem is?  We currently are
forced to take the systems standalone to do backups.  Needless to say,
these machines are in constant use 24 hours a day by students working on
homework, and they don't appreciate a 2-3 hour interruption of service for
backups, no matter what time of day or night we schedule it for.  One
other alternative is to give up and downgrade to SunOS 4.0.1, which for
the most part worked (ignoring such things as NFS bugs, VNODE hangs,
etc.)...

Any help, suggestions, or reports of similar occurrences would be
appreciated.

Internet: fuat at columbia.edu         U.S. MAIL: Columbia University
BITNET: fuat at cunixf                            Center for Computing Activities
UUCP: ...!rutgers!columbia!cunixf!fuat         712 Watson Labs, 612 W115th St.
Phone: (212) 854-5128    Fax: (212) 662-6442   New York, NY 10025


