trigraphs (was Why are character arrays special)

Wed Feb 15 06:19:40 AEST 1989

In article <15941 at mimsy.UUCP> chris at mimsy.UUCP (Chris Torek) writes:
>	Do you want to have trigraphs available?
>If the user answers `yes', the next prompt is:
>	Why?

I'd be about the last person to defend trigraphs as a technical element
of the C language, as anyone who has attended X3J11 meetings could
confirm.  However, by now I've heard the official party line enough
times that I think I can answer questions about this "feature".

Trigraphs are intended as a means of transmitting maximally portable C
programs between systems whose character sets may differ.  Separate
preprocessors, data transmission protocols, etc. were outside the
charter of X3J11, but the Committee nevertheless wanted to ensure this
degree of source code portability, so it agreed that the minimal ISO
character set requirements could be taken as the basis for such source
code transfer.  Because C traditionally uses symbols that are not in the
ISO base character set, substitutes for those symbols, expressible
entirely within the ISO base set, had to be found.  The ??* form of
trigraphs (two question marks followed by one more character) was chosen
as the least problematic of all the alternatives suggested.
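For reference, the nine substitutes can be tabulated in C.  This is a
minimal sketch, not text from the Standard; the names plain_chars,
trigraphs, and trigraph_for are hypothetical.  The string literals are
spelled "?\?=" rather than "??=" because a conforming ANSI C compiler
performs trigraph replacement on its own input and would otherwise turn
the literal itself into "#"; the \? escape sequence lets a program write
two adjacent question marks without forming a trigraph.

```c
#include <stddef.h>
#include <string.h>

/* The nine C symbols missing from the ISO base set, paired
   index-for-index with the trigraph that stands for each.
   Written with the \? escape so the compiler does not translate
   the literals themselves. */
static const char plain_chars[] = "#[]\\^{}|~";
static const char *const trigraphs[] = {
    "?\?=", "?\?(", "?\?)", "?\?/", "?\?'", "?\?<", "?\?>", "?\?!", "?\?-"
};

/* Return the trigraph spelling of c, or NULL if c is in the
   base set and needs no substitute. */
const char *trigraph_for(char c)
{
    const char *p;

    if (c == '\0')              /* strchr would match the terminator */
        return NULL;
    p = strchr(plain_chars, c);
    return p != NULL ? trigraphs[p - plain_chars] : NULL;
}
```

For example, trigraph_for('\\') yields the three characters ??/ and
trigraph_for('a') yields NULL, since the letters are all in the base set.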

The important practical point is that C programmers are NOT expected to
use trigraphs when they type in their source code, and they should not
see trigraphs when source code is displayed on any device attached to a
common modern computing system.  Trigraphs are intended for program
interchange only.
(Quite honestly, I doubt that everyone in X3J11 originally had this
notion, but it appears to be the current party line.)

Note that trigraphs may best be dealt with by a separate translator,
ideally a distinct program that in practice need be run only the first
time code is imported from another site.  The translator could be
officially defined as part of one's Standard-conforming implementation,
but in practice be used only for validation testing and for translating
imported source code.  One can imagine circumstances in which some such
translation would always be necessary, for example in some existing
European character set environments.

An extra level of translation (having nothing to do with trigraphs) is
allowed in translation phase 1 to deal with such environments, which are
beyond the scope of X3J11 or indeed any programming language standards
group.  In fact the C source code character "x" need not look anything
like a Roman "X" as stored, displayed, or manipulated externally, and it
can occupy any number of bytes in external storage.  Therefore, even
in a character set lacking a representation for the letter "x", it is
possible to devise an encoding for C program source that might contain
instances of source code character "x".  Fortunately the ISO base set
includes all the traditional C alphanumerics; it lacks only some of C's
special symbols, such as "\".  Thus in some ISO environments, "\" and
other special C source symbols must be mapped into external encodings.
Trigraphs were an attempt to standardize this mapping for ISO-based
systems.  Looking back at the consequent noise and confusion, I think
many X3J11 members now wish we hadn't tried to "pioneer" in this area.
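The separate translator suggested above need not do much.  Here is a
minimal sketch of its core in C, assuming the source text is already in
memory as a NUL-terminated string (a real filter would read a file or
standard input and take care with trigraphs split across buffer
boundaries); the names untrigraph and trigraph_replacement are
hypothetical.  It scans left to right and replaces each complete
trigraph exactly once, never re-examining its own output, so ???/
becomes ?\ and a sequence such as ??x that is not one of the nine is
copied unchanged.

```c
#include <stddef.h>
#include <string.h>

/* Character denoted by a trigraph whose third character is c,
   or '\0' if c does not complete a trigraph. */
static char trigraph_replacement(char c)
{
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case ')':  return ']';
    case '/':  return '\\';
    case '\'': return '^';
    case '<':  return '{';
    case '>':  return '}';
    case '!':  return '|';
    case '-':  return '~';
    default:   return '\0';
    }
}

/* Copy src to dst, replacing every trigraph with the character it
   denotes.  The output is never longer than the input, so dst needs
   room for strlen(src) + 1 characters.  Returns dst. */
char *untrigraph(char *dst, const char *src)
{
    char *out = dst;
    char r;

    while (*src != '\0') {
        /* Short-circuit evaluation guarantees src[2] is read only
           when src[1] is a '?', i.e. not the terminator. */
        if (src[0] == '?' && src[1] == '?'
            && (r = trigraph_replacement(src[2])) != '\0') {
            *out++ = r;         /* emit the replacement symbol */
            src += 3;           /* and skip the whole trigraph */
        } else {
            *out++ = *src++;    /* ordinary character: copy it */
        }
    }
    *out = '\0';
    return dst;
}
```

As in the earlier table, any test input containing adjacent question
marks has to be written with the \? escape, or the compiler will
translate it before untrigraph ever sees it.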