A readable, robust encoding for source postings

Kent Paul Dolan xanthian at zorch.SF-Bay.ORG
Tue Jan 1 10:26:24 AEST 1991


rhys at batserver.cs.uq.oz.au writes:
> darcy at druid.uucp (D'Arcy J.M. Cain) writes:

>>In article <1990Dec29.114801.5895 at Daisy.EE.UND.AC.ZA> Alan P. Barrett writes:
>>> [...]
>>>I think that the correct way to fix this is to use an encoding that is
>>>both readable and robust.  A version of shar that does stuff like
>>>encoding tabs as \t and wrapping lines in a reversible way would do it.

>>I posted my genfiles program, which I hoped would be a jumping off point
>>for such an effort.  Has anyone looked at it, and does anyone have
>>suggestions to enhance the protocols I suggested?

>I missed the original discussion, so I may be repeating things, but
>the central problem I think there will be in getting a new transmission
>standard off the ground is actually making it a standard :-).  unshar,
>uuencode and the like are very widespread, and trying to shake their
>ground may be very hard.  Maybe in the interim a cut-down "encoder"
>is needed that can be wrapped-up in a shar archive, and will be unpacked,
>compiled and run to unpack the rest.  e.g. the shar archive could look
>something like this:

>		... head information ...
>		sed ... >/tmp/decode.c <<EOF
>		... source code for decode.c ...
>		EOF
>		cc -o /tmp/decode /tmp/decode.c
>		sed ... | /tmp/decode >file <<EOF 
>		... file contents ...
>		EOF

>It should be possible to get a very compact decoding program that could
>be wrapped up with the shell archives.  Won't solve all the problems
>but may help, as well as its being reasonably compatible with the
>existing shar archiving system.  Well, that's my thoughts on the matter,
>what do you think?

Problem is, lots of shars are unpacked on systems where the C compiler
command isn't spelled "cc", and lots of shars don't contain C code and
may be unpacked on systems where, e.g., Modula-2 is the only compilable
language.  In fact, I unpack lots of shars on my Amiga, where "sed"
doesn't exist, and the "unshar" program fakes it by knowing the format
of ordinary shar file "sed" commands and doing what's right.

Probably, despite the calls here for clear text, a much more robust way
to transmit source files is the one used in, for example,
comp.binaries.ibm.pc, where the expected resources at a site are
"uudecode", which can be transmitted in clear text as a BASIC or C
program, and some widely available archiving program; the one of choice
now is zoo, but lharc is coming up fast due to a superior packing
algorithm.
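The point about uudecode being transmittable in clear text is that the
whole decoder is tiny.  As a rough sketch of how little code it takes
(in Python rather than the BASIC or C of the examples above, handling
only the classic single-file uuencode format, and with the function name
`uudecode` just an illustrative choice):

```python
def uudecode(text):
    """Decode classic uuencoded text; return (filename, bytes)."""
    lines = iter(text.splitlines())
    for line in lines:
        if line.startswith("begin "):
            name = line.split(" ", 2)[2]
            break
    else:
        raise ValueError("no begin line found")
    out = bytearray()
    for line in lines:
        if line == "end":
            break
        n = (ord(line[0]) - 32) & 63        # leading char gives byte count
        if n == 0:                          # "`" or " " terminator line
            continue
        data = line[1:]
        data += " " * (-len(data) % 4)      # pad to full 4-char groups
        chunk = bytearray()
        for i in range(0, len(data), 4):
            # each char carries 6 bits, offset from 0x20; '&63' also
            # maps the backquote some encoders substitute for space
            v = [(ord(c) - 32) & 63 for c in data[i:i + 4]]
            chunk.append((v[0] << 2) | (v[1] >> 4))
            chunk.append(((v[1] & 15) << 4) | (v[2] >> 2))
            chunk.append(((v[2] & 3) << 6) | v[3])
        out += chunk[:n]                    # drop padding bytes
    return name, bytes(out)
```

Retyped into any language a site does have, that loop is the whole
bootstrap problem.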

Add to that the "brik" CRC check, zoo's internal CRC checks, and
uuencode's short lines, limited character set, and line-by-line
checksums, and you have an extremely robust encoding that can transit
ASCII to EBCDIC to ASCII intact, and doesn't challenge developmentally
disabled news software, which we will always have with us.
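For concreteness, here is a sketch of how one uuencode line is built,
including a trailing per-line checksum character.  (The checksum
convention shown, the sum of the raw data bytes modulo 64, is one that
some uuencode variants used; implementations differed, so take it as an
assumption, not the one true format.)

```python
def uuencode_line(data):
    """Encode up to 45 bytes as one uuencode line, with a trailing
    checksum character: sum of the data bytes mod 64 (one variant's
    convention), offset into the printable range like every other char.
    """
    assert len(data) <= 45                    # classic line limit
    enc = lambda v: chr(32 + (v & 63))        # 6 bits -> printable char
    line = enc(len(data))                     # leading length character
    padded = data + b"\0" * (-len(data) % 3)  # pad to 3-byte groups
    for i in range(0, len(padded), 3):
        a, b, c = padded[i:i + 3]
        line += enc(a >> 2)                   # 3 bytes -> 4 chars
        line += enc(((a & 3) << 4) | (b >> 4))
        line += enc(((b & 15) << 2) | (c >> 6))
        line += enc(c & 63)
    line += enc(sum(data))                    # per-line checksum
    return line
```

Every character lands in the 0x20-0x5F range that survives the common
character set gateways, and a corrupted line announces itself instead of
silently trashing the file.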

The major requirement for this method is that there needs to be a very
explicit clear text explanation of the purpose and contents of the
archive to let the reader make a decision whether it is worth unpacking.

I'm not thrilled when I take the time to unpack, catenate, and uudecode
an archive with an interesting description from the PC-clone universe,
only to find that it doesn't contain the source code I was seeking, in
hopes of stealing some code and ideas for a port of the functionality.
A minimal description should state whether source is included, plus the
data types, platforms, compiler technology required, functionality, and
copyright status.

To another poster's comments that folks on EBCDIC systems have to solve
their own character set and newline encoding problems, that misses the
point.  Lots of ASCII to ASCII routings these days arrive with a BITNET
host as an intermediary, so even the ASCII destination sites have to
be concerned about the problem of an encoding that can survive the
transit.

I think the current pleas to keep the comp.sources.{unix,games,misc} and
alt.sources postings all clear text, while understandable, are
misdirected on today's net.

And, again to another posting, no, the world is not all becoming USENet,
to live under our way of doing things, just because the nets are being
gatewayed together and sharing code in a much larger universe. The
greater net is a community of peer networks, each with its own peculiar
needs and requirements, not a set of subordinates to the least organized
and most contentious member of the set, USENet.

Thus it behooves us to find methods that cause as few problems as
possible in getting code across this wider universe of communication,
and clear text transmission doesn't seem to be the appropriate technique
anymore.

That's only my opinion, but I pack and unpack a _lot_ of source: .6
gigabytes compressed, at last count, not bad for a personal archive.
That translates into several thousand archives of various sorts that
I've unpacked.

Kent, the man from xanth.
<xanthian at Zorch.SF-Bay.ORG> <xanthian at well.sf.ca.us>


