Questions bru-ing in my mind

Peter S. Shenkin shenkin at cunixf.cc.columbia.edu
Sat Dec 29 04:07:22 AEST 1990


[Consisting of further comments on Dave's reply, plus a new thread at the end.]

In article <1990Dec26.200730.7738 at odin.corp.sgi.com> olson at anchor.esd.sgi.com (Dave Olson) writes:
>In <1990Dec24.155048.29640 at cunixf.cc.columbia.edu> shenkin at cunixf.cc.columbia.edu (Peter S. Shenkin) writes:
>| > = olson at anchor.esd.sgi.com (Dave Olson)
>
>| [[ bru -eZ ]]
>| But if I could do this, I could do something like
>| 	bru -evZ / >& tmpfile
>| at the beginning of my backup day.  It would take a few hours to run,
>| but then I could feed tmpfile into an awk script or other simple program which
>| would divvy the files up into single-volume-sized groups, and then I could
>| bru the groups one by one.
>
>I've been convinced by you and other people's responses that -eZ
>should work.  I'll change that for the next major release, probably
>with a warning message at the start that it will take a long time.

Thanks, Dave.  Just a comment:  'bru -eZ' won't take any longer than 
'bru -cZ', just as 'bru -e' won't take any longer than 'bru -c', and 
warning messages get to be very annoying.  I suggest you save the warning 
message for the manual -- and put it under Z, not e!  In fact, my 4D/25 
can't stream the tape drive using 'bru -cZ', even with virtually nothing 
else going on.
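
For the record, the post-processing step really is trivial.  Assuming the
-evZ listing can be massaged into one "size pathname" line per file -- I
haven't checked the exact output format, so take the field numbers with a
grain of salt -- a nawk script along these lines would split tmpfile into
volume-sized groups:

	nawk 'BEGIN { vol = 140000000; n = 1 }  # bytes per volume, minus slop
	{
	    # start a new group when this file would overflow the current one
	    if (tot + $1 > vol && tot > 0) { n++; tot = 0 }
	    tot += $1
	    print $2 > ("group." n)             # computed filename needs nawk
	}' tmpfile

Each resulting group.N is then a file-list small enough to fit on one
volume, to be bru'ed one group at a time.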

>Note that making such a partitioning at any time prior to the backup
>always introduces the 'risk' that the size of that directory tree may
>increase due to new files, growing files, core files, etc., so it may
>not fit no matter what you do...

This is possible, but hardly likely if you are either (1) leaving a
margin of error appropriate to your system, or (2) backing up a single-user
workstation by popping a tape in as you leave for the day.  Yes, of course
you might have jobs running in background that could be growing files,
but I still think we deserve to be able to make the best guess we can, and
of course bear the consequences when we guess wrong.  After all, if you 
couldn't shoot yourself in the foot with it, it wouldn't be UNIX.

NEW THREAD:

I've been thinking about something that I first noticed several years ago
when restoring a multi-user VAX from a 0-level (i.e., full) dump tape plus
several incrementals, following a disk crash.

When you do such a restore, you get all the files that were there as of
the time of the last incremental, but you also get files -- a whole lot of
files, in my experience -- that users had deleted since the 0-level dump
was made.  That is, you don't really restore the file system;  you get
a lot of chaff in there along with all the wheat.  I personally found that
weeding the extraneous stuff out was a real chore.  And where disk space is 
tight, this process could actually overflow available storage.

Is this enough of a problem for people to consider a backup strategy
that eliminates it?  One way to do this would be:  each time you do an
incremental backup, write out a complete list of all files present, as well
as copies of those files that have changed "recently".  When restoring from
an incremental tape, have an option that deletes from disk any file that is
not on the list -- or, alternatively, moves it to a special place, such as
a duplicate file-tree built under, say, '/delete'.

It would be possible for users to emulate this functionality as follows.
Each time a backup is done, first write a "table of contents" of the file
system to some place on the disk, and make sure this table is included in
the backup.  If it becomes necessary to do a complete restore, a new table
of contents can be made after the final incremental restore.  A program can
then read the new table, check each entry for its presence in the old table,
and make a list of the entries in 'new' that are not in 'old'.  In fact, if
the same program makes both tables, then 'diff' will suffice for this.
Another program can then go through the 'diff' list and delete or move the
files on it.  'find / -print' could be used to make the tables.
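
Concretely, the cleanup pass after the final restore might look something
like this.  It's only a sketch:  the table pathnames are invented, both
tables must be built with the same 'find | sort' pipeline (comm(1) wants
sorted input, and is even more direct than diff here), and directories in
the list need a bit of extra care.

	#!/bin/sh
	OLD=/usr/adm/backup.toc       # written just before the last incremental
	NEW=/tmp/restore.toc          # written after the final restore

	find / -print | sort > $NEW

	# Lines in NEW but not in OLD are the restore chaff.
	comm -13 $OLD $NEW > /tmp/extras

	# Park the extras under /delete rather than removing them outright.
	# (Building the /delete directory skeleton first is left as an
	# exercise; 'rm -f $f' is the less forgiving alternative.)
	while read f
	do
		mv $f /delete$f
	done < /tmp/extras

Write the OLD table just before each incremental, make sure it lands on
the tape, and the whole scheme is maybe a dozen lines of sh.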

Well, now that I've said that, it seems so straightforward that I have no
suggestion to make to Dave.  But it is enough of a departure from the backup
strategies I'm aware of that I'd like to get people's opinions of it.  I
think it would also lengthen the practical interval between full backups.
It may be that some people do this already, but if so I'm unaware of it.

	-P.
************************f*u*cn*rd*ths*u*cn*gt*a*gd*jb**************************
Peter S. Shenkin, Department of Chemistry, Barnard College, New York, NY  10027
(212)854-1418  shenkin at cunixf.cc.columbia.edu(Internet)  shenkin at cunixf(Bitnet)
***"In scenic New York... where the third world is only a subway ride away."***


