Why is restore so slow?

The Grey Wolf greywolf at unisoft.UUCP
Fri Feb 8 09:03:56 AEST 1991


In article <15866.27b02da2 at levels.sait.edu.au> xtdn at levels.sait.edu.au writes:
>One such optimisation could be to write the raw disk to tape (actually you'd
>only dump those blocks that contain data that you want backed up, but the
>point is that you'd be reading from the raw disk).  This would be quite fast
>because you wouldn't be opening each file (which takes time), or reading the
>file sequentially - see how much disk head movement you avoid?  Now such a
>tape would consist of a number of chunks, each chunk detailing the file, the
>file offset, and the data to write at that offset.  The restore process then
>becomes a matter of reading the next chunk, opening and seeking the file, and
>then writing the data.  All that head movement, opening files, seeking to the
>right spot, and later, closing files, would certainly slow down the process.
>
>I already said that I don't know how dump/restore works, but I would almost
>be willing to bet that it's something like the scheme I just outlined.  Maybe
>someone who does know could tell us what really happens?

You're not terribly far off, with one exception: UNIX doesn't keep
a timestamp for individual blocks -- only the inode holds timestamps,
so there's no way to tell whether a particular block in a file has
been updated.  (Per-block timestamps would be terribly inefficient
anyway -- chances are that if you've blown away a file, having only
the changed blocks on tape would be useless.)
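To make that concrete: the inode timestamps are the only per-file
change information the filesystem gives you.  Here's a minimal sketch
of that test using the stat(2) interface (dump actually reads the
inode straight off the disk rather than calling stat(), but the
information available is the same; the function name is mine):

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <time.h>

    /*
     * Does this file need dumping?  The inode is the finest
     * granularity available: if *any* block changed since the
     * cutoff, the whole file gets dumped.
     */
    int
    needs_dump(const char *path, time_t last_dump)
    {
            struct stat sb;

            if (stat(path, &sb) < 0) {
                    perror(path);
                    return 0;
            }
            /*
             * st_mtime covers data changes; st_ctime covers inode
             * changes (mode, owner, link count).  There is no
             * per-block equivalent -- it's all or nothing.
             */
            return sb.st_mtime >= last_dump || sb.st_ctime >= last_dump;
    }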

Dump works by reading the disk partition directly -- it performs all
the directory/file mapping on its own by reading the on-disk inode
list for that partition.  It consults /etc/dumpdates to find when the
last lower-level dump was taken and, by looking at the inode
timestamps, builds an internal map of those inodes which have changed
since then (with a "level 0" dump, everything since the beginning of
time -- 4:00 pm, New Year's Eve, 1969 on the American West Coast ...
(-:).  It then maps in the directories, dumps the directory
information out, and finally dumps the contents of the files
themselves.  Wandering through the filesystem by oneself and
performing only the necessary operations is going to be much faster
than going through the kernel's filesystem overhead for every file.
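
Roughly, that first pass looks something like the sketch below.  The
structure and function names here are placeholders, not dump's actual
code; a real dump reads the inodes off the raw device:

    #include <string.h>
    #include <time.h>

    /* Hypothetical on-disk inode, pared down to the two fields
     * this pass cares about. */
    struct dinode {
            time_t  di_mtime;       /* last data modification */
            time_t  di_ctime;       /* last inode change */
    };

    #define MAXINO  65536
    static unsigned char dumpmap[MAXINO / 8];  /* one bit per inode */
    #define SETBIT(m, i)    ((m)[(i) / 8] |= 1 << ((i) % 8))

    /* Stand-in for pulling inode `ino' off the raw partition. */
    static int
    read_inode(int ino, struct dinode *dp)
    {
            dp->di_mtime = dp->di_ctime = (time_t)0;
            return 0;
    }

    /* Mark every inode changed since the dumpdate; a level 0
     * passes dumpdate == 0 and so marks everything. */
    void
    mark_changed(int maxino, time_t dumpdate)
    {
            struct dinode di;
            int ino;

            memset(dumpmap, 0, sizeof dumpmap);
            for (ino = 1; ino <= maxino && ino < MAXINO; ino++) {
                    if (read_inode(ino, &di) < 0)
                            continue;
                    if (di.di_mtime >= dumpdate ||
                        di.di_ctime >= dumpdate)
                            SETBIT(dumpmap, ino);
            }
    }

The directory pass then walks this map, so that every marked file's
name gets dumped along with its contents.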

[ Side note:  I *hate* operators who cannot think to keep track of the
  inode number of the file that is being dumped when they do multiple
  tape dumps!  Makes restores a *pain*. ]

Restore, on the other hand, is a dog.  Why?  It *has* to be.  When
files are being restored, one cannot simply re-write the raw disk; the
filesystem overhead cannot be avoided on anything less than a full
restore.  Even then, a reason to avoid just writing the raw image back
(via dd(1) (yes, I know that's not what dd stands for)) is that a full
dump-and-restore cycle serves to reduce disk fragmentation by putting
everything back more or less contiguously.
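
The unavoidable cost is per-file: every file on the tape comes back
through the normal open/write/close path, paying the directory lookups
and block allocation that dump skipped on the way out.  A sketch of
that inner loop (illustrative only, not restore's actual code; tapefd
is assumed to be an open descriptor on the tape):

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int
    restore_one(const char *path, int tapefd, long nbytes)
    {
            char buf[8192];
            long n, want;
            int fd;

            /* One open/creat per file: directory lookup, inode
             * allocation, block allocation -- all kernel work
             * that a raw-disk write would bypass. */
            if ((fd = open(path, O_WRONLY|O_CREAT|O_TRUNC, 0644)) < 0) {
                    perror(path);
                    return -1;
            }
            while (nbytes > 0) {
                    want = nbytes < (long)sizeof buf ? nbytes
                                                     : (long)sizeof buf;
                    if ((n = read(tapefd, buf, want)) <= 0)
                            break;
                    if (write(fd, buf, n) != n) {
                            perror(path);
                            break;
                    }
                    nbytes -= n;
            }
            return close(fd);       /* ... and one close per file. */
    }

Multiply that by a few tens of thousands of little files and you can
see where the time goes.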

(We used to have to do this periodically back at the lab because class
users had a tendency to produce lots and lots of little files.  The /users
file system would fragment ridiculously quickly over the semester.  I think
fragmentation reached about 5% (which is very high).)

It's also kind of convenient that if a normal user wishes to effect a
partial restore, he/she generally can, without having to be placed into a
special group or be given super-user privileges.
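
(For example -- the device name is site-dependent, so substitute your
own -- something like

    restore -i -f /dev/rst0

puts the user into restore's interactive mode, where the "add" and
"extract" commands pull back just the files he/she wants.)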

>
>
>David Newall, who no longer works       Phone:  +61 8 344 2008
>for SA Institute of Technology          E-mail: xtdn at lux.sait.edu.au
>                "Life is uncertain:  Eat dessert first"


-- 
thought:  I ain't so damb dumn!	| Your brand new kernel just dump core on you
war: Invalid argument		| And fsck can't find root inode 2
				| Don't worry -- be happy...
...!{ucbvax,acad,uunet,amdahl,pyramid}!unisoft!greywolf


