C and national character sets

Martin Minow minow at decvax.UUCP
Thu Aug 30 11:41:27 AEST 1984


Keld J|rn Simonsen brings up an important point concerning C
and its standardization.  (By the way, the | stands for the
slashed o of Danish and Norwegian; the Swedish and German sets
put o-umlaut at the same position.)  He notes that several
characters used by C are reserved by ISO standards for
"national replacement characters" (NRCs).
The reserved characters are #@[\]^_`{|}~ -- most of which are
used in some way by C.  There isn't any really good solution --
it is highly unlikely that the C standardization committee will
remove these characters from the language.  While most of them
can be replaced by suitable #defines, several cannot, notably
backslash.  The only short-term solution would be for the
parties affected to write NRC-specific pre-processors.
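
To make the #define idea concrete, here is a minimal sketch.  The
macro names (BEGIN, END, BITOR, BITXOR, BITNOT) are my own invention
for illustration, not part of any standard.  The definitions would
have to be typed once on a terminal that has the ASCII graphics, and
the #define lines still need the # character itself.  Backslash
resists this treatment because macros are not expanded inside string
and character constants, which is where it is mostly needed.

```c
/* Hypothetical header contents, typed once on an ASCII terminal.
 * All macro names here are illustrative assumptions. */
#define BEGIN   {
#define END     }
#define BITOR   |
#define BITXOR  ^
#define BITNOT  ~

/* Code below uses none of the reserved graphics directly, so it
 * could be typed and read on an NRC terminal. */
int set_flag(int flags, int bit)
BEGIN
    return flags BITOR bit;
END

int toggle_flag(int flags, int bit)
BEGIN
    return flags BITXOR bit;
END

int clear_flag(int flags, int bit)
BEGIN
    return flags & BITNOT bit;
END
```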

In the long term, however, the problem will go away as people
move to an 8-bit character set such as DEC Multinational or
the pending ISO standard that is almost identical to it.
In this standard, the characters in the range 0-127 are identical
to the U.S. ASCII 7-bit standard.  Characters in the range
128-159 are used for additional controls, and 160-255 for
additional graphics.
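
The layout can be sketched as a small classifier.  This assumes 8-bit
bytes and takes ASCII as codes 0 through 127 (seven bits); the enum
and function names are mine, chosen only for illustration.

```c
/* Three regions of the 8-bit code space described above. */
enum code_class { ASCII_7BIT, EXTRA_CONTROL, EXTRA_GRAPHIC };

/* Classify one 8-bit code by the region it falls in. */
enum code_class classify_code(unsigned char c)
{
    if (c <= 127)
        return ASCII_7BIT;      /* same as U.S. ASCII */
    if (c <= 159)
        return EXTRA_CONTROL;   /* additional controls, 128-159 */
    return EXTRA_GRAPHIC;       /* additional graphics, 160-255 */
}
```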

It is actually possible -- though rather messy -- to intermix
NRCs and Multinational, allowing Standard C to be written from
a terminal that normally displays a non-English NRC set.
Unfortunately, this will require a pre-processor that understands
the character-set switching escape sequences.  This could
be done as a Unix filter, of course.
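
A rough sketch of the core of such a filter follows.  Everything
specific in it is an assumption on my part: that ESC ( B selects
ASCII and ESC ( H selects the Swedish set, that only six Swedish
letters need translating, and that their 8-bit codes are the ones
shown.  The function names are hypothetical.  The idea is just to
track which set is active, drop the switching sequences, and turn
national letters into 8-bit codes so the compiler only ever sees
real brackets and braces.

```c
#include <stddef.h>

/* Map one code typed in the (assumed) Swedish set to an 8-bit code.
 * Only six letters are handled; all other codes pass through. */
static unsigned char swedish_to_8bit(unsigned char c)
{
    switch (c) {
    case 0x5B: return 0xC4;   /* [ position: capital A-umlaut */
    case 0x5C: return 0xD6;   /* \ position: capital O-umlaut */
    case 0x5D: return 0xC5;   /* ] position: capital A-ring   */
    case 0x7B: return 0xE4;   /* { position: small a-umlaut   */
    case 0x7C: return 0xF6;   /* | position: small o-umlaut   */
    case 0x7D: return 0xE5;   /* } position: small a-ring     */
    default:   return c;
    }
}

/* Copy n bytes from in to out (out must hold at least n bytes),
 * removing the assumed switching sequences and translating codes
 * typed while the national set is active.  Returns bytes written. */
size_t nrc_filter(const unsigned char *in, size_t n, unsigned char *out)
{
    int national = 0;          /* start in ASCII */
    size_t i = 0, o = 0;

    while (i < n) {
        if (in[i] == 0x1B && i + 2 < n && in[i + 1] == '(' &&
            (in[i + 2] == 'B' || in[i + 2] == 'H')) {
            national = (in[i + 2] == 'H');
            i += 3;            /* swallow the escape sequence */
            continue;
        }
        out[o++] = national ? swedish_to_8bit(in[i]) : in[i];
        i++;
    }
    return o;
}
```

A real filter would wrap this in a read/write loop on standard input
and output and would need a table per national set.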

Hope this helps.  Hej s} l{nge.

Martin Minow
decvax!minow


