wchar_t values

Wed Apr 10 15:23:54 AEST 1991

Sorry, I'm a bit late with this reply. Just a few minor nits:

Al Harkcom writes:
> 'c' in all three of
> the popular multibyte encodings (EUC, JIS, SJIS) is 0x63 (same as
> ASCII). The most common wide character format (UJIS) has 'c' as
> 0x0063 (ASCII in 2 bytes).

EUC is the name of the general encoding scheme, while UJIS is the name
of the Japanese EUC. UJIS is a multibyte encoding, not a wchar_t encoding.

>  Keld Simonsen writes:
>  =}Thus the internal widechar representation of 'c' and the external
>  =}multibyte representation SHOULD not be the same for character sets
>  =}like ISO 10646, JIS X 0208, KS C 5601 and GB 2312.
>  =}At least this should hold for characters in the C character set.
> 
>    Huh? This doesn't follow... It doesn't even sound correct. A single
> byte wide character set using values above 0x80 in addition to the
> ASCII characters would become difficult...

You're probably referring to the European characters with the 8th bit
set. These are not relevant to this discussion, since the ANSI C wchar_t
spec refers explicitly to the basic character set, which does not
include these European characters.
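
For the record, the requirement as I read it is that each member of the
basic character set has a wchar_t code equal to its value as a lone
character in an integer character constant. A rough way to check a given
implementation is sketched below. The names BASIC_SAMPLE, WIDEN and
basic_mismatches() are made up for this illustration, and the sample
covers only part of the basic character set, so treat it as a sketch
rather than a conformance test.

```c
#include <stddef.h>   /* wchar_t, size_t */

/* Assumption: a representative sample of the basic character set,
   not the complete set. */
#define BASIC_SAMPLE "abcxyzABCXYZ0123456789 !#%&()*+,-./:;<=>[]_{}"

/* Turn a narrow string literal (after macro expansion) into a wide one. */
#define WIDEN2(s) L##s
#define WIDEN(s)  WIDEN2(s)

/* Hypothetical helper: counts the sampled characters whose wide
   character code differs from their plain char code. */
size_t basic_mismatches(void)
{
    static const char    narrow[] = BASIC_SAMPLE;
    static const wchar_t wide[]   = WIDEN(BASIC_SAMPLE);
    size_t i, bad = 0;

    for (i = 0; narrow[i] != '\0'; i++) {
        if ((wchar_t)narrow[i] != wide[i])
            bad++;
    }
    return bad;
}
```

A result of zero from basic_mismatches() means that, for the sampled
characters at least, L'c' and 'c' agree on that implementation. It says
nothing about characters outside the basic set, such as the 8-bit
European ones.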

>  =}The reason why the Japanese have not seen the problem before with
>  =}JIS X 0208, but first with 10646, is beyond my understanding.
>  =}Maybe some Japanese could enlighten us (me!) on this?
> 
>    What 'problem' do the 'Japanese' see with ISO 10646?

Keld is referring to the problem that I brought up in the first
article in this thread, namely that the 10646 code for 'c' does not
have the same numeric value as ASCII 'c'.
-- 
Erik M. van der Poel                                      erik at sra.co.jp
Software Research Associates, Inc., Tokyo, Japan     TEL +81-3-3234-2692