strange happenings/bugs on a 3B1 (medium long)

Sat Jul 29 00:56:22 AEST 1989

First, some background.  Backing up 3b1s can be a royal pain on floppy
disks, and we can't afford to buy tape backups for all of our machines
(40+ in 4 states).  Our solution: a new backup program.  We're
currently hacking on `afio', which was posted to the net some time
ago.  In the process we've come across several bugs that we'd like to
bring to the attention of the net at large, and see if anyone has
solutions.

	Our mods to afio: first, in order to speed/simplify verification
of the floppy, we keep 1 floppy disk worth of data in-core at all times.
Second, we've added an option to automatically compress(1) the data
before writing (which necessitates a fork(2)).  For 360k floppies,
this is no big deal.  However, for those machines where we've
installed 3 1/2 inch drives, the 795k buffer required causes forks to
fail with great regularity, presumably because we're out of swap
space.  We have at least 2 1/2 M RAM on each machine, with 5000 blocks
of swap space.
	Our solution: use enough shared memory segments to hold the
disk image.  This works perfectly as long as the output of afio is
directed to a regular file -- forking is MUCH faster.  (By the way,
the maximum size of a shared memory segment on the 3b1 running 3.51 is
262144 bytes == 2^18.)  But if we attempt to write a block of data from
the shared memory segment to /dev/{r}fp021, the write(2) hangs and the
process is unkillable -- we have to reboot to get rid of the process.
Our current work-around is to malloc(3) an additional buffer the size
of a disk block, memcpy(3) from the shared memory segment to the
malloc()ed buffer, and write the malloc()ed buffer (ICK!!).
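	A minimal sketch of that work-around, for anyone who wants to
try it.  This is not the actual afio patch.  The names segments_needed(),
write_via_bounce() and BLOCK_SIZE are hypothetical, and the 512-byte block
size is an assumption.  It is written in ANSI C for readability, so an
older compiler may need K&R-style declarations.  The idea is only that
write(2) is always handed a malloc()ed address, never an address inside
the shared memory segment.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SEG_MAX    262144L  /* largest shm segment reported above (2^18) */
#define BLOCK_SIZE 512L     /* ASSUMED disk block size; adjust to taste  */

/* How many segments of at most SEG_MAX bytes hold `total` bytes. */
long segments_needed(long total)
{
    return (total + SEG_MAX - 1) / SEG_MAX;
}

/*
 * Copy `len` bytes starting at `src` (e.g. an address returned by
 * shmat(2)) to `fd`, one block at a time, through a malloc()ed bounce
 * buffer.  A short write is treated as a failure.
 * Returns 0 on success, -1 on failure.
 */
int write_via_bounce(int fd, const char *src, long len)
{
    char *bounce = malloc((size_t)BLOCK_SIZE);
    long done = 0;

    if (bounce == NULL)
        return -1;

    while (done < len) {
        long chunk = len - done;

        if (chunk > BLOCK_SIZE)
            chunk = BLOCK_SIZE;

        /* shared memory -> private buffer -> device */
        memcpy(bounce, src + done, (size_t)chunk);
        if (write(fd, bounce, (size_t)chunk) != (ssize_t)chunk) {
            free(bounce);
            return -1;
        }
        done += chunk;
    }

    free(bounce);
    return 0;
}
```

For the 795k buffer mentioned above, segments_needed(795L * 1024L) comes
to 4 segments of at most 262144 bytes each.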
	The next problem, which we DON'T have a workable solution for
yet, is difficulty opening /dev/tty.  I know that there have been
discussions of this on the net in the past, but I don't recall ever
seeing a definitive statement of the cause/cure (possibly because
there isn't one :-)).  Running afio directly or 1 shell `deep' works
fine (so far, anyway).  Running a shell script which runs another
shell script which runs afio causes all open(2)s of /dev/tty to fail.
This makes it extremely difficult to prompt for additional disks.  Any
suggestions (other than "don't do that")?
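	One thing that might be worth trying, sketched below under
heavy assumptions: report why the open(2) failed, and if stderr is
still attached to a terminal, prompt through a dup(2) of it instead.
The function name open_prompt_fd() and its `path' argument are
hypothetical (the argument exists only so the fallback can be exercised
with a bogus path).  strerror(3) is assumed to be available.  This does
not explain or cure the /dev/tty failure itself, and reading from
stderr only works if the terminal was opened read/write.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Try to open `path` (normally "/dev/tty") for prompting.  If that
 * fails, say why, then fall back to a duplicate of stderr when stderr
 * is a terminal.  Returns a file descriptor, or -1 if neither works.
 */
int open_prompt_fd(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd >= 0)
        return fd;

    /* Record the reason; errno is the only clue we get. */
    fprintf(stderr, "open %s failed: %s\n", path, strerror(errno));

    if (isatty(2))
        return dup(2);      /* assumed fallback, not a known fix */

    return -1;
}
```

The caller would use open_prompt_fd("/dev/tty") and treat -1 as "cannot
prompt, abort the backup".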
	We can mail a small (98 line) C program which demonstrates the
`hanging write(2) from shared memory' bug to anyone who would like an
immortal process of his/her very own.  We will also make the patches
to afio available as soon as we get it working.  WARNING:  We're well
aware that compress(1)ing the data before backing it up makes the
backup much more fragile, but we have so much data that needs to be
backed up on a daily basis that anything else (short of tape drives
which we can't afford right now) is unworkable.
	Finally, there are apparently problems with the GDGETA and
GDSETA ioctl(2)s as documented in gd(7).  While GDGETA will return a
struct gdctl, GDSETA apparently only sets the in-core copy of this
buffer.  Dismounting the floppy, closing it and re-opening it reveals
that the struct gdctl on the disk is unchanged.  We also can't figure
out how to do the checksum mentioned in <sys/gdisk.h>.

We have open tickets at the AT&T Hotline for all the problems
mentioned above, but no answers yet.  RE the hanging write problem:
one of the technicians at the hotline told us "we don't support C
programming...".  Interesting...

Thanks in advance for any help.

Mail can be addressed to me, Rich Kuhns (rjk), or to Jeff Buhrt (buhrt), at
{the_known_world}!newton.physics.purdue.edu!sawmill!{rjk, buhrt}

PS An additional feature we've added to afio which makes users very
happy is the ability, when a verify on the (for example) 3rd disk
fails, to reformat the disk (or a new one) directly from afio and
continue with the backup from the point where it failed.