Unnecessary tar-compress-uuencodes

Sat Jul 14 00:12:48 AEST 1990

In article <3124 at psueea.UUCP> kirkenda at eecs.UUCP (Steve Kirkendall) writes:
> Here's an idea: Let's compromise!  Come up with a format that really works!

I've suggested this before... the software tools format.

> 1) The archive should be plain-text.  That is, each text file in the archive
>    should be easy to locate within the archive, and it should be readable
>    without the need to extract it.

Headers and trailers are marked by "-h-" and "-t-". Other sequences could
be added, like "-d-" for directories.

> 2) The format would only be used to combine several text files into a single
>    text file.  If you really must include a non-text file, then uuencode
>    that one file.

Exactly.

> 3) Archives should begin with a table of all printable ASCII characters,
>    so we can tell when transliteration has gone awry.

That's a nice enhancement.

> 4) The archive program should split long lines when the archive is created,
>    and rejoin them during extraction.

Not currently supported, but see below.

> 5) Tabs should be expanded to spaces.  The extraction program should convert
>    groups of spaces back into tabs.

No. Tabs should be converted to a unique escape sequence.

> 6) The program that creates the archive should give a warning message when
>    a file's whitespace is likely to be reformatted.  For example, spaces at
>    the end of a line are a no-no.

No, spaces at the end of a line should be marked.

> 7) The extraction program should be clever enough to ignore news headers and
>    other introductory text, just for the sake of convenience.

Anything not between "-h-" and "-t-" can be safely ignored.
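
As a rough sketch of that rule in Python: the exact header syntax is not
spelled out here, so this assumes a header line of the form "-h- name" and
a trailer line starting with "-t-". The function name extract_members is
made up for illustration. Body lines are returned as-is, with no unescaping.

```python
def extract_members(lines):
    """Collect the lines between each "-h- name" and "-t-" pair.

    Everything outside those pairs (news headers, chatter, signatures)
    is skipped. The "-h- name" layout is an assumption, not a spec.
    """
    members = {}
    name = None  # None means we are outside any archive member
    for line in lines:
        line = line.rstrip("\n")
        if name is None:
            if line.startswith("-h- "):
                name = line[4:].strip()
                members[name] = []
            # any other line out here is introductory text: ignore it
        elif line.startswith("-t-"):
            name = None
        else:
            members[name].append(line)
    return members


if __name__ == "__main__":
    post = ["Subject: sources", "", "-h- hello.c", "main() {}", "-t-", "bye"]
    print(extract_members(post))
```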

> 8) It should be possible to embed one archive inside another.  This ability
>    probably wouldn't see much use, but lack of the ability could sure be a
>    nasty surprise to somebody.  "What?  You mean it only works on *some*
>    text files?"

Leading dashes are escaped with another dash.
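
A minimal sketch of the dash rule, in Python. The function names are
hypothetical. Since every body line that starts with a dash gains one more,
an embedded "-h-" or "-t-" line can never be mistaken for a real marker.

```python
def escape_dash(line):
    """Prefix one extra dash to any line that already starts with a dash."""
    return "-" + line if line.startswith("-") else line


def unescape_dash(line):
    """Undo escape_dash: an escaped line always starts with two dashes."""
    return line[1:] if line.startswith("--") else line


if __name__ == "__main__":
    inner = ["-h- inner.txt", "some text", "-t-"]
    escaped = [escape_dash(l) for l in inner]
    print(escaped)
    print([unescape_dash(l) for l in escaped])
```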

> 9) Should we use trigraphs for some of the more troublesome ASCII characters?
>    The extraction utility could convert them back into real characters.

Yes, but not trigraphs. A two-character sequence should be enough... how
about "@x" for some value of x? @t would be tab, @! would be |, and so on.
Of course "@@" would be "@".
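
Something like the following Python sketch would do it. Only the three
codes named above are included; the table and function names are made up
for illustration, and a real tool would need a longer table.

```python
# Character -> two-character escape, as proposed above.
ESCAPES = {"\t": "@t", "|": "@!", "@": "@@"}
# Code letter after "@" -> original character.
DECODE = {seq[1]: ch for ch, seq in ESCAPES.items()}


def encode(text):
    """Replace each troublesome character with its "@x" sequence."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def decode(text):
    """Turn "@x" sequences back into the characters they stand for."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "@" and i + 1 < len(text):
            code = text[i + 1]
            if code not in DECODE:
                raise ValueError("unknown escape: @" + code)
            out.append(DECODE[code])
            i += 2
        else:
            # Ordinary character. A lone "@" at the very end is passed
            # through untouched; this sketch does not interpret it.
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(encode("a\tb|c@d"))
```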

Begin *all* lines between -h- and -t- with X, or C if it's a continuation
of the previous line. Trailing spaces would have a "@" appended. (Of course,
some other escape character could be used... Kernighan and Plauger use "@"
for other Software Tools tools, is all.)
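
Here is a sketch of just the X/C part in Python, splitting at a fixed
width. The width of 72 and the names wrap and unwrap are assumptions for
illustration. Trailing-space marking and the "@" escapes are left out, so
a piece that happens to end in a space is not protected by this version.

```python
def wrap(lines, width=72):
    """Prefix each line with X; split long lines into C continuations."""
    out = []
    for line in lines:
        # An empty line still needs one (empty) piece so it gets its "X".
        pieces = [line[i:i + width] for i in range(0, len(line), width)] or [""]
        out.append("X" + pieces[0])
        out.extend("C" + piece for piece in pieces[1:])
    return out


def unwrap(wrapped):
    """Strip the prefixes and rejoin C lines onto the preceding line."""
    lines = []
    for w in wrapped:
        if w.startswith("C") and lines:
            lines[-1] += w[1:]
        elif w.startswith("X"):
            lines.append(w[1:])
        else:
            raise ValueError("bad line prefix: %r" % w)
    return lines


if __name__ == "__main__":
    for w in wrap(["short", "a line that is too long"], width=10):
        print(w)
```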

Or how about this: begin each line with T for text, C for continued text,
and M for uuencoded lines?
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.
<peter at ficc.ferranti.com>