Programming and international character sets.

George Hart george at mnetor.UUCP
Thu Nov 3 02:21:25 AEST 1988


In article <532 at krafla.rhi.hi.is> kjartan at rhi.hi.is (Kjartan R. Gudmundsson) writes:
>
>How difficult is it to convert American/English programs so that they can
>be used to handle foreign text?

If you just need to handle full 8 bit characters, it is merely painful.
If you need to handle multibyte characters (e.g. Kanji) or a mix of
character sets, it is excruciating.

>In European countries other than England,
>the ASCII character set is also widely used, but with extensions.
>The character set is 8-bit, thus allowing 256 characters.
>The problem, however, is that the extension is not standard.

There is, of course, the ISO 8859 family of 8-bit character sets, each
of which contains ASCII as a proper subset.

> 	< excerpts of MicroEmacs code >
>
>Ugly isn't it?

Yes. vi and the Bourne shell were (and are) other offenders. I believe
recent releases of System V have cleaned up the naughty uses of the 8th bit.
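
To illustrate the kind of trick that causes trouble, here is a minimal
sketch of a program borrowing the 8th bit as a private flag. It is not
taken from vi or the shell; the QUOTED flag and the three helper names
are hypothetical, and the byte values assume ISO 8859-1 text.

```c
/* Hypothetical flag kept in the 8th bit of each character. */
#define QUOTED 0x80

/* Mark a 7-bit character as "quoted" by setting the 8th bit. */
int mark_quoted(int c)
{
    return (c | QUOTED) & 0xFF;
}

/* Nonzero if the 8th bit is set. */
int is_quoted(int c)
{
    return (c & QUOTED) != 0;
}

/* Remove the flag and recover the 7-bit character. */
int unmark(int c)
{
    return c & 0x7F;
}
```

With this scheme a quoted 'a' (0x61) is stored as 0xE1, which in
ISO 8859-1 is also the code for a-acute. A genuine 8-bit character in
the input is therefore taken for a quoted ASCII one, and unmark()
silently turns it into a plain 'a'.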

> < sample ctype.h invocations >
>
>This code is better (most of the is... things are macros that mask
>the argument and return a binary mask that is either zero or positive),
>has more style to it, and is easier to port to a different character set.

Unfortunately, the results of the macros are undefined unless isascii(c)
is nonzero, which rather defeats the spirit of what you intend.  Of course,
you could develop an 8-bit ctype.h compatible with a particular 8-bit
character set.
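
As a rough sketch of what such an 8-bit ctype replacement could look
like, here are two helpers that assume the text is ISO 8859-1. The names
l1_isalpha and l1_toupper are made up for illustration and are not part
of any vendor's ctype.h. The letter test is deliberately simplified and
ignores a few symbols in the 0xA0..0xBF range (0xAA, 0xB5, 0xBA).

```c
#include <ctype.h>

/* Hypothetical helpers; they assume ISO 8859-1 (Latin-1) text. */

/* Return 1 if c (0..255) is a letter in ISO 8859-1, else 0. */
int l1_isalpha(int c)
{
    if (c < 0 || c > 0xFF)
        return 0;                   /* EOF or out of range */
    if (c < 0x80)
        return isalpha(c) != 0;     /* ASCII half: defer to ctype.h */
    /* 0xC0..0xFF are letters, except 0xD7 (multiplication sign)
     * and 0xF7 (division sign). */
    return c >= 0xC0 && c != 0xD7 && c != 0xF7;
}

/* Return the uppercase form of c in ISO 8859-1, or c if it has none. */
int l1_toupper(int c)
{
    if (c < 0 || c > 0xFF)
        return c;                   /* leave EOF and junk untouched */
    if (c < 0x80)
        return toupper(c);          /* ASCII half: defer to ctype.h */
    /* 0xE0..0xFE map down to 0xC0..0xDE, except the division sign.
     * 0xDF (sharp s) and 0xFF (y-diaeresis) have no uppercase
     * form in this character set and are returned unchanged. */
    if (c >= 0xE0 && c <= 0xFE && c != 0xF7)
        return c - 0x20;
    return c;
}
```

When the character comes from a plain char, pass it through unsigned
char first, e.g. l1_isalpha((unsigned char)*p). On machines where char
is signed, 0xE9 would otherwise arrive as a negative value and be
rejected as out of range.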

>Another bad habit of American programmers is this:
>character_value = (character_value & 0x7F);

This has more to do with assumptions about the character sets the
system supports than with nationality.
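
For concreteness, a small sketch of what the mask does to 8-bit data,
assuming ISO 8859-1 input. The function names are hypothetical and only
serve to contrast the two approaches.

```c
/* The quoted idiom: discard the 8th bit. */
int strip_high_bit(int c)
{
    return c & 0x7F;
}

/* An 8-bit-clean alternative: keep all eight bits. Converting through
 * unsigned char also undoes sign extension when c came from a plain
 * char on a machine where char is signed. */
int keep_eight_bits(int c)
{
    return (unsigned char)c;
}
```

In ISO 8859-1, 0xE9 is e-acute. strip_high_bit(0xE9) yields 0x69, an
ordinary 'i', so the text is silently corrupted rather than rejected.
keep_eight_bits leaves the value at 0xE9.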

Historically, assuming an ASCII environment was not unreasonable.  While
this is no longer true, until vendors and standards bodies get off their
collective pots and develop practical character sets and conventions for
multilingual environments (including multibyte characters), things will
remain confused, fragmented, and incompatible.
-- 
Regards.....George Hart, Computer X Canada Ltd.

UUCP: {utzoo,uunet}!mnetor!george
BELL: (416)475-8980



More information about the Comp.lang.c mailing list