tar .vs. cpio

Greg Noel greg at sdcsvax.UUCP
Fri Aug 10 20:10:26 AEST 1984


In article <198 at ucbopal.CC.Berkeley.ARPA> tut at ucbopal.CC.Berkeley.ARPA writes:
>Could someone justify the existence of cpio?  What's wrong with tar?

Actually, it's easy to justify the existence of cpio.  There are at least
three reasons:

(1) History.  They grew up at different organizations.  Tar comes from
Berkeley and cpio from Bell (now AT&T).  (This may not be completely
accurate -- I first saw cpio in the PWB release and I first saw tar in
a Berkeley system; for all I know, tar may have come with Version 7.  The
point remains the same, even if the different organizations were within
Bell.  Tar has changed (grown?) (groan?) at Berkeley (witness the recent
complaints about the incompatible tar formats); they picked tar and ran
with it, while AT&T went with cpio.  I don't justify it, or point fingers
at either organization; I just report it.)

(2) Problem.  They grew up to solve different problems.  Tar is a "tape
archiver" and its major function is to produce backups of filesystems.
(This was in the days when a filesystem would fit on a single tape.)  (All
right, that's oversimplified.)  Its functionality is based upon a program
called tp from Version 6 (did tp make it into Version 7?).  On the other
hand, cpio was designed from the ground up to solve a very different
problem -- selectively copying lists of files (actually, filesystem elements).
Thus, it is useful for distributions, or for copying recently-changed files
for backup, or for copying a selected part of a directory tree somewhere
else, or .....  Tar takes its list of files from the command line, effectively
limiting the number of arguments, while cpio takes them from the standard
input, giving no such limitation (this is why tar copies directory trees --
otherwise you couldn't get enough on the tape to make it useful).  (I actually
consider it a flaw of tar that you MUST copy ALL of a directory tree; there
is no way to make the choice of files conditional.)
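
To make the standard-input point concrete, here is a minimal sketch in
Python (my choice purely for illustration).  It hands an arbitrarily long
list of names to a child program on its standard input, one per line,
which is the way cpio expects them.  The helper name feed_names is
hypothetical, and the cpio invocation in the note afterwards assumes a
cpio is installed.

```python
import subprocess


def feed_names(names, command):
    """Run `command`, writing one name per line to its standard input.

    Returns whatever the command wrote to its standard output, as bytes.
    Names that contain a newline cannot be expressed this way; that is a
    limitation of any name-per-line input convention.
    """
    # Build the whole list as one byte string; nothing goes through argv,
    # so the system's argument-list limit never comes into play.
    payload = "".join(name + "\n" for name in names).encode()
    result = subprocess.run(command, input=payload,
                            stdout=subprocess.PIPE, check=True)
    return result.stdout
```

With a cpio on the path, feed_names(names, ["cpio", "-o"]) should return
the archive as bytes.  The same list passed as command-line arguments
would be subject to whatever argument limit the system imposes.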

(3) Philosophy.  Cpio is more in keeping with the Unix (tm) philosophy,
since it separates the job of SELECTING the files from the job of COPYING
the files.  ANY algorithm can be used to select the files to be copied,
but cpio can still be used to copy them.  In fact, I have an application
that tries to keep two sets of files in sync on different computers -- it
does it by running a shell script that scans a set of files and determines
which files have changed since the last run and then passes the names to
cpio to be copied.  There are about six thousand files to select from; on
a given day, anywhere from a hundred to several thousand will be selected
for transfer.  I don't think tar could do that as well.
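
The selection half of such a job can be sketched in a few lines.  This is
not the script described above, only a minimal Python illustration of one
possible selection rule (modification time later than the previous run);
the function name changed_since and the idea of saving the last-run time
as seconds since the epoch are assumptions made for the example.

```python
import os


def changed_since(root, last_run):
    """Yield paths under `root` modified after `last_run`.

    `last_run` is a time in seconds since the epoch, for instance one
    saved at the end of the previous run.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a predictable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # lstat looks at the entry itself rather than following links
            if os.lstat(path).st_mtime > last_run:
                yield path
```

Printing each yielded name on its own line and piping the result into
cpio -o would do the copying.  Any other rule could replace the mtime
test without touching the copying side at all.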

In case you hadn't noticed, I prefer cpio.  There are times when tar is
better (if what you really want to do is copy all of a directory, it's
just fine, and the interface is simpler), but I find that if the job is
complicated enough to need a shell script, then cpio is usually
the program of choice.

Don't get me wrong -- cpio isn't perfect.  Internally, it's a nightmare,
and AT&T would be better off rewriting the whole thing.  But it works
just fine, and it does what I want it to.

BTW, the -c option of cpio does not cause it to write one character at
a time; it causes the headers for each file to be in ASCII characters
instead of binary.  The output is still blocked.  Now if the null at the
end of the header could be changed into a carriage return, we could use
cpio instead of shar format......
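
For the curious, here is a rough Python sketch of what such an ASCII
header looks like.  It assumes the -c header uses the layout later
documented as the portable ASCII cpio format: the magic string 070707
followed by fixed-width octal fields, then the file name and its
terminating null.  Treat the field order and widths as that assumption
rather than as something checked against any particular cpio; the
function name ascii_header and its default values are hypothetical.

```python
def ascii_header(name, filesize, mode=0o100644, mtime=0,
                 dev=0, ino=0, uid=0, gid=0, nlink=1, rdev=0):
    """Build one portable-ASCII-style cpio header plus the file name.

    Every numeric field is zero-padded octal text.  Values too large for
    their field are not checked here and would produce a bad header.
    """
    name_bytes = name.encode() + b"\0"    # the name carries a trailing null
    header = "070707"                     # magic number, 6 characters
    for value in (dev, ino, mode, uid, gid, nlink, rdev):
        header += "%06o" % value          # seven 6-digit octal fields
    header += "%011o" % mtime             # 11-digit modification time
    header += "%06o" % len(name_bytes)    # name length, null included
    header += "%011o" % filesize          # 11-digit file size
    return header.encode("ascii") + name_bytes
```

Under that assumption the fixed part is 76 printable characters, and the
only non-printing byte is the null that ends the name.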

(tm)  Unix is a footnote of AT&T Bell Laboratories
-- 
-- Greg Noel, NCR Torrey Pines       Greg at sdcsvax.UUCP or Greg at nosc.ARPA


