Some thoughts on enhancing cpio(1)

Sam Kendall sam at delftcc.UUCP
Thu Apr 3 06:29:50 AEST 1986


I've had some thoughts recently about features that cpio(1) needs.  Some
of these apply to tar(1) also.

(1) Optional error recovery.  If the header of just one file in a cpio
    archive is munged, cpio will issue the pitiful message "Out of
    phase--get help" and terminate.  This message is confusing to
    ordinary users, and it then takes a guru to recover the files in the
    archive past the garbled point.  This is a bit ridiculous.  There
    should be some optional error recovery, like the ability to retrieve
    the file following the garbled header (even if its name is unknown),
    and then to recognize the next file header in the garbled archive
    and proceed from there.  This might break down if another cpio
    archive were one of the files in the garbled archive, but no big
    deal.
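
The "recognize the next file header" step in (1) might be sketched as
below, for -c archives only.  This is an illustration, not how any
existing cpio works.  It assumes the usual -c header layout (76 octal
ASCII characters beginning with "070707", with the name size at offset
59 and the file size at offset 65); the names parse_header, resync and
make_member are made up for the example.

```python
HEADER_LEN = 76            # assumed length of a -c header, before the name
MAGIC = b"070707"
OCTAL = frozenset(b"01234567")


def parse_header(buf, pos):
    """Return (namesize, filesize) if a plausible -c header starts at pos."""
    hdr = buf[pos:pos + HEADER_LEN]
    if len(hdr) < HEADER_LEN or not hdr.startswith(MAGIC):
        return None
    if any(b not in OCTAL for b in hdr):
        return None                      # every header field is octal ASCII
    namesize = int(hdr[59:65], 8)
    filesize = int(hdr[65:76], 8)
    name_end = pos + HEADER_LEN + namesize
    # The name must be present and must end in a NUL.
    if namesize == 0 or name_end > len(buf) or buf[name_end - 1] != 0:
        return None
    return namesize, filesize


def resync(buf, start):
    """Return the offset of the next plausible header at or after start, or -1."""
    pos = buf.find(MAGIC, start)
    while pos != -1:
        if parse_header(buf, pos) is not None:
            return pos
        pos = buf.find(MAGIC, pos + 1)
    return -1


def make_member(name, data, mode=0o100644):
    """Hypothetical helper: build one -c archive member for experiments."""
    raw = name.encode("ascii") + b"\0"
    hdr = "%06o%06o%06o%06o%06o%06o%06o%06o%011o%06o%011o" % (
        0o070707, 0, 0, mode, 0, 0, 1, 0, 0, len(raw), len(data))
    return hdr.encode("ascii") + raw + data
```

Given an archive whose first header has been damaged, resync(buf, 0)
returns the offset of the first intact header after the damage.  As
noted in (1), a cpio archive stored inside the garbled member would
produce a false match with this approach.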
    
(2) Automatic recognition of -c vs. non-"-c" formats.  With -i (copy
    in), the -c option could be ignored; cpio should recognize which
    format the archive is in.  This is easy to implement.  It
    complicates error recovery, though, in the case that the beginning
    of the archive is munged.
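
A rough sketch of the check in (2), under the assumption that a -c
archive starts with the six ASCII characters "070707" and a non-"-c"
archive starts with the same magic number, octal 070707, stored as a
16-bit integer in the writing machine's byte order.  The function name
and the returned labels are made up for the example.

```python
import struct

MAGIC = 0o070707   # assumed magic number for both formats


def detect_format(prefix):
    """Guess the archive format from the first few bytes of the archive."""
    if prefix[:6] == b"070707":
        return "ascii"                    # written with -c
    if prefix[:2] == struct.pack("<H", MAGIC):
        return "binary-le"                # binary header, low byte first
    if prefix[:2] == struct.pack(">H", MAGIC):
        return "binary-be"                # binary header, high byte first
    return None                           # unrecognized or munged start
```

A None result is the awkward case mentioned above: if the start of the
archive is damaged, the format has to be guessed some other way, for
example by scanning forward for either form of the magic number.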
    
(3) Fix the bug that -m (restore file modification times) is ineffective
    on directories that are being copied.  This is vital for the next
    feature:
    
(4) Optional save and restore of directory contents, with file
    deletion.  The purpose of this feature is to correctly handle full
    and incremental backups with cpio; specifically, to correctly
    restore a directory in which files have been removed after the full
    backup was made, but before the incremental backup was made.
    
    Currently, when -o (copy out) gets the name of a directory, it
    outputs a header for that directory, but no contents.  My proposal
    is for an option "-D" which would work with both -o and -i.  With
    -o, a list of files in a directory is saved along with the
    directory.  With -i, when a directory is being restored and is
    "replacing" an already existing directory on disk, all files that
    are in the existing directory but NOT in the archived directory are
    REMOVED.
    
    Another way to look at it: with cpio -i, when an archived file
    replaces an already existing file, the archived contents replace
    the contents on disk.  But there is no corresponding action for
    directories.  -D adds such an action.
    N.B.: as with files, the archived directory will replace the
    existing directory only if it is newer or the -u option is given;
    this is why (3) above is necessary.
    
    -D would also work with -p (pass), of course.

    Example: a directory "d" contains files "a" and "b".  A full backup
    (using cpio) is made including "d" and its contents.  The file "b"
    is deleted.  Now an incremental backup of files that have changed
    since the full backup is made using cpio -D.  "d" is on the
    incremental backup, because it has changed since the full backup was
    made.  (It changed when "b" was deleted.)  Now suppose "d" is lost on
    disk, and we try to restore it to disk from backup.  We first
    restore the full backup; "d" contains "a" and "b" again.  We next
    restore the incremental backup.  On the incremental backup, "d"
    contains "a" but not "b".  So "b" is deleted from disk.  The restore
    has worked correctly.  With the current cpio, "b" would still exist,
    incorrectly, after the incremental backup was restored.
    
    This is extremely useful for backup purposes.  It sounds
    complicated, but it fits in beautifully.
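
The -i side of the proposed -D option could look roughly like this.
It is only a sketch of the rule described in (4): -D does not exist,
and the function names, the archived_names list and the archived_mtime
argument are invented for the example.  Subdirectories are left alone
here to keep the sketch short.

```python
import os


def stale_entries(existing_names, archived_names):
    """Names present on disk but absent from the archived directory list."""
    return sorted(set(existing_names) - set(archived_names))


def apply_directory_listing(path, archived_names, archived_mtime,
                            unconditional=False):
    """Remove files in path that the archived directory did not contain.

    As with files, the archived directory wins only if it is newer than
    the one on disk, or if unconditional (the -u option) is set.
    Returns the list of names that were removed.
    """
    if not unconditional and archived_mtime <= os.stat(path).st_mtime:
        return []
    removed = []
    for name in stale_entries(os.listdir(path), archived_names):
        victim = os.path.join(path, name)
        if os.path.isdir(victim) and not os.path.islink(victim):
            continue                      # subdirectories skipped in this sketch
        os.remove(victim)
        removed.append(name)
    return removed
```

In the example above, restoring the incremental backup would call
apply_directory_listing("d", ["a"], ...) and "b" would be removed.
The mtime test is also why (3) matters: without a correctly restored
directory modification time, the comparison is meaningless.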
    
(5) Preservation of printable ASCII + short lines.  It is too late for
    this, since the format is already frozen, but it would have been
    good.  The idea here is that an archive of mailable files should be
    itself mailable, except perhaps for its size.  A file that is
    mailable has only printable ASCII characters, and has no lines
    longer than some length, maybe 80 characters (I'm not sure).
    
    A cpio -c archive has headers which are about 80 characters plus the
    length of the pathname; this can get too long.  Also, the header
    includes a NUL character or two.  I wish someone had thought about
    this a little bit more before designing the format.  It is so close
    to preserving mailability!
    
    Of course, "shar" and the "arch" programs (Martin Minow's, I
    think; decvax!minow) do preserve mailability in almost all
    cases.
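
The notion of "mailable" used in (5) can be written down as a small
check.  This is a sketch under assumptions: the 80-character limit is
the guess made above, treating tab as acceptable is an extra
assumption, and is_mailable is a made-up name.

```python
def is_mailable(data, max_line=80):
    """True if data is printable ASCII with no line longer than max_line."""
    for line in data.split(b"\n"):
        if len(line) > max_line:
            return False
        # Printable ASCII is 0x20..0x7E; tab (0x09) is allowed by assumption.
        if any(not (0x20 <= b <= 0x7E or b == 0x09) for b in line):
            return False
    return True
```

A -c header followed by a NUL-terminated pathname fails this check on
the NUL alone, and also on length once the header plus pathname runs
past the limit without a newline.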
    
(6) cpio should be in the public domain.  This would avoid the annoying
    scenario where people get cpio archives but cannot unpack them.
    
I haven't recommended that checksums be introduced into cpio, because I
think this can be handled by some other filter.  (There are some tools
to package software for transmission, available through the AT&T
Toolchest, that probably do what I want here.)  One could argue that
mailability can also be handled by other filters; but I would rather
keep things simple for unpacking mailed archives.

Comments?

----
Sam Kendall			{ ihnp4 | seismo!cmcl2 }!delftcc!sam
Delft Consulting Corp.		ARPA: delftcc!sam at NYU.ARPA


