Programming and international character sets.

Mon Oct 31 14:06:25 AEST 1988

In article <532 at krafla.rhi.hi.is> kjartan at rhi.hi.is (Kjartan R. Gudmundsson) writes:
>The problem is however that the extension is not standard.
There is an international standard for 8-bit character sets: ISO 8859.
There are several versions of 8859, just as there were several national
versions of ISO 646 (of which ASCII was only one).  All versions include
ASCII as the bottom half.  ISO Latin 1 (8859/1) is pretty close to DEC's
Multinational Character Set, and is supposed to cover most West European
languages (including Icelandic).  There is a Cyrillic version (8859/5, I
believe) and others are under way.

>Another bad habit of American programmers is this:
>character_value = (character_value & 0x7F )
>don't do this!!  If you must, you can use 0xFF instead:
>character_value = (character_value & 0xFF )

The only time when I've wanted to do this is when stripping off a parity
bit, and using 0xFF would be totally wrong.  The toascii() macro *might*
be appropriate.  When you're dealing with a 7 data + 1 parity bit device,
there is no point in pretending that you're prepared to accept anything
other than 7 data bits.
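
To make the parity case concrete, here is a minimal sketch in C.  The
function names are my own, made up for illustration; nothing here is a
standard routine.  It assumes the byte arrives as 7 data bits with the
parity bit in the top position.

```c
/* Hypothetical helper: strip the parity bit from a byte read off a
   7 data + 1 parity line.  The result is always in the range 0..127. */
int strip_parity(int raw_byte)
{
    return raw_byte & 0x7F;
}

/* For contrast: masking with 0xFF leaves the parity bit in place,
   so it ends up being treated as if it were an eighth data bit. */
int keep_all_eight_bits(int raw_byte)
{
    return raw_byte & 0xFF;
}
```

With 'A' (0x41) sent with its parity bit set, the wire carries 0xC1.
strip_parity(0xC1) gives back 0x41, while keep_all_eight_bits(0xC1)
leaves 0xC1, which in ISO Latin 1 is a different letter altogether.
Where toascii() is available it should do the same masking as
strip_parity(), but it is not part of every C library.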

The real problem is trying to write portable code that uses character
classes which _aren't_ in <ctype.h>.  Consider isvowel()...
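
As a rough illustration of why this is hard, here is a sketch of one
possible isvowel().  The name isvowel_latin1 is hypothetical, and the
code bakes in two assumptions: that the input is ISO 8859/1, and that
"vowel" means a, e, i, o, u, their accented forms, and the ae and
o-slash letters.  Neither assumption holds in general (y is a vowel in
some languages, and other 8859 versions put other letters at these
codes), which is exactly the portability problem.

```c
#include <string.h>

/* Hypothetical helper: nonzero if c is a vowel.
   ASSUMES ISO 8859/1 codes and an a/e/i/o/u notion of "vowel". */
int isvowel_latin1(int c)
{
    c &= 0xFF;                      /* also copes with signed char input */

    if (c < 0x80)                   /* ASCII half */
        return c != 0 && strchr("AEIOUaeiou", c) != NULL;

    if (c < 0xC0 || c == 0xDF)      /* symbols, and sharp s (no capital) */
        return 0;

    c |= 0x20;                      /* fold 0xC0..0xDE onto 0xE0..0xFE */

    return (c >= 0xE0 && c <= 0xE6)     /* accented a, and ae */
        || (c >= 0xE8 && c <= 0xEF)     /* accented e and i */
        || (c >= 0xF2 && c <= 0xF6)     /* accented o */
        || (c >= 0xF8 && c <= 0xFC);    /* o-slash and accented u */
}
```

Under these assumptions isvowel_latin1(0xE9) (e-acute) is nonzero and
isvowel_latin1(0xF1) (n-tilde) is zero.  Change the character set or the
language and the ranges have to be rewritten.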