REPOST lharc102A Part 01/04 BSD Unix to Amiga archives

Kent Paul Dolan xanthian at zorch.SF-Bay.ORG
Fri Feb 1 20:19:05 AEST 1991


 bernie at metapro.DIALix.oz.au (Bernd Felsche) writes:

> De-flamed deliberately.

Spoilsport.

xanthian at zorch.SF-Bay.ORG (Kent Paul Dolan) writes:

>> Second, "compress,uuencode,recompress" is not the best use of
>> technology; I did a little test with the same files in just one big
>> shar, to simplify the reporting of the results:

> WHOAH THERE! Shouldn't you be using tar to generate the archive
> instead of shar? Its wrapper information is more compact and
> efficient.

It is more efficient yet because putting everything in one big file
lets compression proceed across file boundaries rather than start
fresh at each file, but filewise storage is nearly as efficient.
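Here is a toy illustration of the cross-file effect as a minimal Python sketch. zlib (deflate) stands in for compress and lharc, since neither ships with Python, and the twenty near-identical "source files" are hypothetical. Because the files repeat each other so heavily, the toy data exaggerates the gap. In the measurements below the difference was small: 56476 bytes for lha.tar.lzh against 56944 for lha.lzh.

```python
import zlib

def compressed_separately(files: list[bytes]) -> int:
    """Total size when each file is compressed on its own, as a per-file archiver does."""
    return sum(len(zlib.compress(data, 9)) for data in files)

def compressed_together(files: list[bytes]) -> int:
    """Size when the files are concatenated first (roughly what a tar or shar
    wrapper does), so the compressor can reuse matches across file boundaries."""
    return len(zlib.compress(b"".join(files), 9))

# Hypothetical stand-ins for small C sources that share a lot of vocabulary.
files = [
    (f"/* lhfile{i}.c */\n#include <stdio.h>\n#include \"lharc.h\"\n"
     "int main(int argc, char *argv[]) { return 0; }\n").encode()
    for i in range(20)
]

print(compressed_separately(files), compressed_together(files))
```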

> Then you compress the tar archive... and uuencode it. Please try this
> and publish the results for comparison.

You had to ask; well, I was sitting home grumpy because I was too sick to
make the party tonight, so why not:
-------------------------------------------------------------------------
original data:

  3091 Makefile
  3841 amiga_patch
  2885 generic_patch
 11521 lh.doc.japanese
  2800 lh.inst.japanese
  6783 lh.n.japanese
 13133 lhadd.c
 29556 lharc.c
  7568 lharc.doc.posted
 11220 lharc.doc.revised
  9279 lharc.h
  9588 lharc.l
  2010 lhdir.c
   886 lhdir.h
  6154 lhext.c
  6504 lhio.c
  1483 lhio.h
  6672 lhlist.c
 22476 lzhuf.c
  1229 read.me_1
   486 read.me_2
  1770 read.me_3

original data size: total of file sizes (from wc -c)

160935 lha

three files uuencoded because they contain control characters:

   15910 lh.doc.japanese.uu
    3895 lh.inst.japanese.uu
    9376 lh.n.japanese.uu
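The .uu sizes follow directly from the classic uuencode layout. Each body line carries 45 input bytes as a length character, 60 encoded characters and a newline. The output also has a shorter final line, a one-character terminator line, an "end" line and a "begin <mode> <name>" header. The Python sketch below assumes that layout, a three-digit mode such as 644, and a header that names the original file. Under those assumptions it reproduces the three sizes above, and also the .uu sizes listed in plans c to f.

```python
def uuencoded_size(nbytes: int, name: str, mode: str = "644") -> int:
    """Bytes of classic uuencode output for an nbytes-long input file.

    Assumed layout: 'begin <mode> <name>' header (mode assumed three digits,
    name assumed to be the original file name), 45 input bytes per body line,
    a final partial line padded to a multiple of 3 bytes, a one-character
    terminator line, and an 'end' line.
    """
    full_lines, rest = divmod(nbytes, 45)
    body = full_lines * 62                       # length char + 60 chars + newline
    if rest:
        body += 1 + 4 * ((rest + 2) // 3) + 1    # length char + padded groups + newline
    header = len(f"begin {mode} {name}\n")
    trailer = len("`\nend\n")                    # terminator line, then 'end'
    return header + body + trailer

for name, size in [("lh.doc.japanese", 11521), ("lh.inst.japanese", 2800),
                   ("lh.n.japanese", 6783)]:
    print(name, uuencoded_size(size, name))
# lh.doc.japanese 15910
# lh.inst.japanese 3895
# lh.n.japanese 9376
```

The same function gives 100810 for lha.tar.Z (73149 bytes) and 78484 for lha.lzh (56944 bytes). For large files, uuencoding costs about 62/45, or 1.38 times the input.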

original data size but with those three uuencodings instead:

169012 lha3uu
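The two totals can be rechecked from the listings. This short Python sketch uses only the byte counts given above.

```python
# Byte counts from the wc -c listing above.
sizes = {
    "Makefile": 3091, "amiga_patch": 3841, "generic_patch": 2885,
    "lh.doc.japanese": 11521, "lh.inst.japanese": 2800, "lh.n.japanese": 6783,
    "lhadd.c": 13133, "lharc.c": 29556, "lharc.doc.posted": 7568,
    "lharc.doc.revised": 11220, "lharc.h": 9279, "lharc.l": 9588,
    "lhdir.c": 2010, "lhdir.h": 886, "lhext.c": 6154, "lhio.c": 6504,
    "lhio.h": 1483, "lhlist.c": 6672, "lzhuf.c": 22476,
    "read.me_1": 1229, "read.me_2": 486, "read.me_3": 1770,
}
# Uuencoded replacements for the three files that contain control characters.
uu_sizes = {"lh.doc.japanese": 15910, "lh.inst.japanese": 3895, "lh.n.japanese": 9376}

lha = sum(sizes.values())
lha3uu = lha - sum(sizes[n] for n in uu_sizes) + sum(uu_sizes.values())
print(lha, lha3uu)   # 160935 169012
```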


Plan a: just sharing the original files is unworkable, because shars with
        control characters won't unpack reliably:

176274 lha.sh

Plan b: current net practice; shar, compress:

184153 lha3uu.sh              shar three files uuencoded, rest plain text;
 82885 lha3uu.sh.Z            its size as transmitted after compression

Plan c: other current net practice; tar, compress, uuencode, compress:

180224 lha.tar               original data tarred - not transmittable, so
 73149 lha.tar.Z             compress it and
100810 lha.tar.Z.uu          uuencode it for safety;
 91533 lha.tar.Z.uu.Z        its size as transmitted after compression

Plan d: improve plan b by replacing compress with lharc, uuencode, compress:

 63604 lha3uu.sh.lzh         lharc of shar file is binary
 87666 lha3uu.sh.lzh.uu      must be uuencoded to hide control characters;
 79863 lha3uu.sh.lzh.uu.Z    its size as transmitted after compression

Plan e: improve plan c by replacing first compress by lharc:

 56476 lha.tar.lzh           lharc of tar file is binary
 77844 lha.tar.lzh.uu        must be uuencoded to hide control characters;
 70839 lha.tar.lzh.uu.Z      its size as transmitted after compression

Plan f: improve plan c by replacing tar | compress by lharc:

 56944 lha.lzh               lharc of original files is binary
 78484 lha.lzh.uu            must be uuencoded to hide control characters;
 71211 lha.lzh.uu.Z          its size as transmitted after compression


Note: plan c is not the same as simple news transmission, where tar |
compress | transmit | uncompress | untar is the paradigm; that process
never has to create a news article as an intermediate product, whereas
plans b to f must and do.

Note: zoo could also have been used wherever lharc was, but lharc compresses
better, and so dominates the zoo data.

Results:

      Costs in bytes
 Data   Telecomm
storage  volume   Plan


184153   82885             b: partial uuencode, shar, compress
100810   91533             c: tar, compress, uuencode, compress
 87666   79863             d: partial uuencode, shar, lharc, uuencode, compress
 77844   70839             e: tar, lharc, uuencode, compress
 78484   71211             f: lharc, uuencode, compress

The absolute storage champion is plan e, but plan f is nearly as good, and
requires one fewer tool; neither of the current plans, nor plan d, has a lot
to recommend it.  The choice between e and f should be made mostly on economic
grounds.
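The comparison in the table can also be expressed as percentages. This Python sketch re-derives figures from the table's own numbers only. It measures each plan's telecomm volume against a chosen baseline plan.

```python
# (data storage, telecomm volume) in bytes, copied from the table above.
plans = {
    "b": (184153, 82885),
    "c": (100810, 91533),
    "d": (87666, 79863),
    "e": (77844, 70839),
    "f": (78484, 71211),
}

best_storage = min(plans, key=lambda p: plans[p][0])
best_telecomm = min(plans, key=lambda p: plans[p][1])

def telecomm_saving(plan: str, baseline: str) -> float:
    """Percent of the baseline's transmitted bytes that the plan saves."""
    return 100 * (1 - plans[plan][1] / plans[baseline][1])

print(best_storage, best_telecomm)                    # e e
print(f"e vs b: {telecomm_saving('e', 'b'):.1f}%")    # e vs b: 14.5%
print(f"e vs c: {telecomm_saving('e', 'c'):.1f}%")    # e vs c: 22.6%
```

By the same numbers, plan f gives up only 372 transmitted bytes to plan e.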
-------------------------------------------------------------------------

> Depending on software versions, you can do all this in a pipe (which
> you undoubtedly know) "tar cf - files | compress | uuencode bugs.tar.Z
> >bugs.tar.Z.uu"
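For readers without the Unix tools at hand, here is the same tar | compress | uuencode pipeline as an in-memory Python sketch. The output does not match the shell pipeline byte for byte. Python's standard library has no LZW, so zlib (deflate) stands in for compress. The two file names and their contents are hypothetical.

```python
import binascii
import io
import tarfile
import zlib

def tar_bytes(files: dict[str, bytes]) -> bytes:
    """Rough equivalent of `tar cf - files`, built in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def uuencode(data: bytes, name: str, mode: str = "644") -> bytes:
    """Classic uuencode framing around 45-byte body lines."""
    out = [f"begin {mode} {name}\n".encode()]
    for i in range(0, len(data), 45):
        out.append(binascii.b2a_uu(data[i:i + 45], backtick=True))
    out.append(b"`\nend\n")
    return b"".join(out)

# Hypothetical input files.
files = {"read.me_1": b"lharc for BSD Unix\n" * 40, "lhdir.h": b"#define LHDIR 1\n" * 20}

# zlib is a stand-in here: compress(1) uses LZW, which Python does not ship.
article = uuencode(zlib.compress(tar_bytes(files), 9), "bugs.tar.Z")
print(article.splitlines()[0])   # b'begin 644 bugs.tar.Z'
```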

> For transmission, it can be compressed again (it would be smarter to
> uudecode), though this _should_ be done by a network layer, even though
> it often isn't. Wouldn't it be nice if modem transfer protocols were
> smart enough to compress on the fly?

>> So in fact, for the files being sent, there is some modest _gain_ in
>> telecommunications efficiency by using the best compression
>> technology on text, and then uuencoding it and letting the standard
>> net node-to-node compression have its way with the files.

> Agreed. In fact, the more text, the better the gain.
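Recompressing a uuencoded file wins something back because uuencode output uses only 64 printable symbols, so each 8-bit character carries just 6 bits of information. How much comes back depends on the compressor. In the measurements above, compress trimmed each .uu file by roughly 9 percent, for example from 100810 to 91533 bytes. This minimal Python sketch assumes that seeded pseudorandom bytes are a fair stand-in for already-compressed data. It uses zlib in place of the compress run between news nodes and checks only that the recompressed file comes out smaller than the uuencoded one.

```python
import binascii
import random
import zlib

def uu_body(data: bytes) -> bytes:
    """uuencode body lines only (45 input bytes per line), without begin/end framing."""
    return b"".join(binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45))

# Assumption: seeded pseudorandom bytes stand in for already-compressed data.
raw = random.Random(1991).randbytes(45_000)
uu = uu_body(raw)                     # 1000 full lines of 62 bytes each
recompressed = zlib.compress(uu, 9)   # zlib standing in for the news-node compress

print(len(raw), len(uu), len(recompressed))   # the first two are 45000 and 62000
```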

>> I have yet to see a single argument for the present methods that
>> comes down, at the last, to anything but sheer laziness on the part
>> of those who don't want to change their habits. Compressed, uuencoded
>> transmission methods win on every reasonable criterion.

> Although one should be wary of zoo archives, which don't work well if
> they contain many small text files (i.e. typical source code).
> Compression can be as little as 10-15%, and uuencoding then expands the
> result past the original size.
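The arithmetic behind that warning fits in a small Python sketch. Full uuencode lines turn 45 input bytes into 62 output bytes, with the begin/end framing ignored here. A file therefore has to shrink by about 27 percent before uuencoding stops making it bigger than the original.

```python
UU_EXPANSION = 62 / 45   # output bytes per input byte on full uuencode lines, framing ignored

# Size after uuencoding, relative to the original, for weak 10-15% compression.
after_uu = {saved: (1 - saved) * UU_EXPANSION for saved in (0.10, 0.15)}
for saved, ratio in after_uu.items():
    print(f"{saved:.0%} compression -> {ratio:.2f}x the original after uuencoding")
# 10% compression -> 1.24x the original after uuencoding
# 15% compression -> 1.17x the original after uuencoding

break_even = 1 - 1 / UU_EXPANSION   # compression needed just to break even
print(f"break-even compression: {break_even:.1%}")   # break-even compression: 27.4%
```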

Yeah, lharc is _much_ better at compressing small files than is zoo, which
is why putting a shar or tar wrapper around them and then zooing them looks
better than zooing them separately.

>> By the way, it is _not_ a solution to replace compress with a filter
>> form of lharc as the typical file compressor for telecommunications;
>> lharc is _much_ too slow to use at every step along the way, so it
>> needs to be done just once at the originating site to accomplish
>> these savings.

> TANSTAAFL.

Kent, the man from xanth.
<xanthian at Zorch.SF-Bay.ORG> <xanthian at well.sf.ca.us>



More information about the Alt.sources.d mailing list