TAR DOES NOT SWAP BYTES

Guy Harris guy at sun.uucp
Tue Sep 24 16:06:40 AEST 1985


> All I know is the tar program swaps bytes when writing a tape so
> that a VAX running 4.2 must use dd to swab things before un-tar-ing them.

"tar" does no such thing.  The control information on a "tar" tape is in
printable ASCII form, so that it's independent of byte order (and, with any
luck, other greasy architectural details).  "tar" tapes written on purely
big-endian machines (3[67]0, M68K, etc.), purely little-endian machines
(VAX, etc.), and mixed-up machines (PDP-11), can be read on machines of any
other byte sex.  The data itself might not be directly usable on the target
machine unless the files in question are text files, but that's not just a
problem of byte order.
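
To make the point concrete, here is a minimal sketch of how numeric fields in
a "tar" header can be written and read as octal ASCII digits.  The field
offsets are those of the old-style 512-byte header; the helper names
(octal_field, make_header, read_header) are made up for illustration, and the
mode, owner, and checksum fields are left out to keep it short.

```python
NAME = slice(0, 100)     # file name, NUL-padded
SIZE = slice(124, 136)   # file size as octal ASCII digits
MTIME = slice(136, 148)  # modification time as octal ASCII digits


def octal_field(value, width):
    """Encode value as zero-padded octal ASCII digits plus a trailing space."""
    return ("%0*o " % (width - 1, value)).encode("ascii")


def make_header(name, size, mtime):
    """Build a simplified 512-byte header (mode, owner, checksum omitted)."""
    hdr = bytearray(512)
    hdr[0:len(name)] = name
    hdr[SIZE] = octal_field(size, 12)
    hdr[MTIME] = octal_field(mtime, 12)
    return bytes(hdr)


def parse_octal(field):
    """Decode an octal ASCII field, ignoring space and NUL padding."""
    return int(field.strip(b" \0").decode("ascii"), 8)


def read_header(hdr):
    """Return (name, size, mtime) without any binary integer decoding."""
    name = hdr[NAME].split(b"\0", 1)[0].decode("ascii")
    return name, parse_octal(hdr[SIZE]), parse_octal(hdr[MTIME])
```

Nothing in the reader ever interprets two or four bytes as a binary integer,
so the byte order of the machine that wrote the header never enters into it.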

"cpio" has a rather stupid byte-swapping option which swaps the data but
*not* the control information.  Since most data does not consist of a huge
uniform stream of "short"s or "long"s, an option to swap the data is
useless.  The control information, by default, consists of a bunch of
"short"s (yes, even the file size and modification/access time are stored as
pairs of "short"s), which should be swapped if the order of bytes in a
"short" is different on the source and target machines, and a bunch of
"char"s making up the file name which should not be swapped under any
circumstances.  This means, BTW, that

	dd if=/dev/rmt0 conv=swab bs=<whatever> | cpio -ib 

doesn't work, since it swaps the bytes in the names of all created files.
What they *should* have done was detect that the source and target machines
had different byte orders by checking whether the "magic number" was 070707
or a byte-swapped 070707, and automatically byte-swap the header "short"s
but not the path names or the data.
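
A rough sketch of that approach follows, assuming the usual binary header of
13 "short"s (magic, dev, ino, mode, uid, gid, nlink, rdev, two for mtime,
namesize, two for file size) followed by the name padded to an even length.
The function names are hypothetical, and the swab() helper only imitates what
"dd conv=swab" does to the stream.

```python
import struct

MAGIC = 0o70707
HEADER_BYTES = 26  # 13 shorts


def swab(data):
    """Swap each adjacent pair of bytes, like dd conv=swab (even length only)."""
    out = bytearray(data)
    out[0::2], out[1::2] = data[1::2], data[0::2]
    return bytes(out)


def pack_entry(order, name, size, mtime=0):
    """Build header plus name as a machine with byte order `order` would.

    order is "<" (little-endian shorts) or ">" (big-endian shorts).
    """
    name_z = name + b"\0"
    hdr = struct.pack(order + "13H", MAGIC, 0, 0, 0o100644, 0, 0, 1, 0,
                      mtime >> 16, mtime & 0xFFFF, len(name_z),
                      size >> 16, size & 0xFFFF)
    if len(name_z) % 2:
        name_z += b"\0"  # pad the name to an even length
    return hdr + name_z


def read_entry(buf):
    """Detect the writer's byte order from the magic number, then decode."""
    for order in "<>":
        if struct.unpack(order + "H", buf[:2])[0] == MAGIC:
            break
    else:
        raise ValueError("not a binary cpio header")
    fields = struct.unpack(order + "13H", buf[:HEADER_BYTES])
    namesize = fields[10]
    size = (fields[11] << 16) | fields[12]
    # The name is a run of chars: take it as-is, never swapped.
    name = buf[HEADER_BYTES:HEADER_BYTES + namesize].rstrip(b"\0")
    return name, size
```

With this, read_entry() returns the same name and size for an entry written
in either byte order.  Running the whole stream through swab() first fixes up
the "short"s but turns the name "README" into "ERDAEM", which is exactly the
problem with the "dd" pipeline above.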

However, there is a "-c" option to "cpio" which tells it to write the
control information in - you guessed it - printable ASCII!  I believe it had
bugs in its System III incarnation, but you can read "cpio -c" tapes made on
a machine with different byte order.  The S5 "find" has an undocumented
"-ncpio" option which works like the "-cpio" option, only it writes "cpio
-c" instead of "cpio" tapes.  If you must use "cpio", use "cpio -c";
however, "tar" is more universal - it's in V7, 4.xBSD, and Systems III and V.
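
For illustration, a sketch of reading and writing a "cpio -c" header.  It
assumes the layout later documented as the portable ASCII format: eleven
fixed-width octal fields (76 characters in all) followed by the
NUL-terminated name.  The function names are made up for this example, and
whether every historical "cpio -c" matched this layout exactly is an
assumption.

```python
# (field name, width in octal ASCII digits)
FIELDS = [("magic", 6), ("dev", 6), ("ino", 6), ("mode", 6), ("uid", 6),
          ("gid", 6), ("nlink", 6), ("rdev", 6), ("mtime", 11),
          ("namesize", 6), ("filesize", 11)]


def format_ascii_header(name, filesize, mode=0o100644, mtime=0):
    """Write header fields as fixed-width octal text, then the name."""
    name_z = name + b"\0"
    values = [0o70707, 0, 0, mode, 0, 0, 1, 0, mtime, len(name_z), filesize]
    text = "".join("%0*o" % (width, value)
                   for (_, width), value in zip(FIELDS, values))
    return text.encode("ascii") + name_z


def parse_ascii_header(buf):
    """Return (fields, name); every number is read from printable digits."""
    pos, fields = 0, {}
    for field, width in FIELDS:
        fields[field] = int(buf[pos:pos + width].decode("ascii"), 8)
        pos += width
    if fields["magic"] != 0o70707:
        raise ValueError("not a cpio -c header")
    name = buf[pos:pos + fields["namesize"]].rstrip(b"\0")
    return fields, name
```

As with "tar", the header is plain text, so there is nothing to swap.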

There are known cases of brain-damaged *hardware* swapping bytes.  The case
I know of is a big-endian Multibus machine with an extremely stupidly
designed tape controller.  If you write a tape on this machine, and want to
read it in on a sane machine, you have to stick "dd" in front of the "tar"
(or "cpio" or whatever).

The rule for correctness of byte order in a tape controller is simple.  If
you have the string "Now is the time for all good parties to come to the aid
of man" in memory, and tell the tape controller to write this to a tape, the
first byte in the block should be a capital "N", followed by a lower-case
"o", followed by a lower-case "w", followed by a blank, etc.  Violate this
and you'll force everybody who didn't violate this to swap bytes when
reading your tapes.
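
The rule can be written down as a check.  This is only a toy model: the two
"controller" functions are hypothetical stand-ins for hardware, one writing
bytes in memory order and one swapping each pair on the way to the tape.

```python
def sane_controller(memory):
    """Write bytes to tape in the order they sit in memory."""
    return bytes(memory)


def swapping_controller(memory):
    """Swap each adjacent pair of bytes; an odd trailing byte is left alone."""
    out = bytearray(memory)
    even = len(out) - len(out) % 2
    out[0:even:2], out[1:even:2] = memory[1:even:2], memory[0:even:2]
    return bytes(out)


def obeys_rule(controller, memory):
    """True if the block on tape matches memory byte for byte."""
    return controller(memory) == memory


TEXT = (b"Now is the time for all good parties to come to the aid "
        b"of man")
```

Here obeys_rule(sane_controller, TEXT) holds, while the swapping controller
puts "oN w" at the front of the block, and every reader has to undo that.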

	Guy Harris


