wchar_t values

Fri Apr 12 16:16:14 AEST 1991

I'm directing followups to comp.std.internat. I apologize to
comp.std.c readers for the current noise level, which I seem to have
started.

Al Harkcom writes:
>    Though the term EUC is used as the name of an encoding scheme, it is
> also the name used for the multibyte encoding of the JIS standard using
> SS2 and SS3 single shifts.

Yes, people often say "EUC" when they mean "Japanese EUC". That
doesn't mean that they are right. Think of it this way: EUC is the
generic international `class', while UJIS is a name for the particular
Japanese `instance'.
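To make the class/instance distinction concrete, here is a minimal C sketch. It assumes the usual EUC layout: bytes below 0x80 are codeset 0, SS2 (0x8E) and SS3 (0x8F) introduce codesets 2 and 3, and other lead bytes in 0xA1-0xFE are codeset 1. The generic `class' part is the lead-byte rule, which is the same for every EUC. The `instance' part is a per-locale table of character widths. The names `euc_codeset`, `struct euc_instance`, `ujis` and `euc_char_bytes` are hypothetical, and the 2-byte width assumed for UJIS codeset 3 is an assumption for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Generic EUC ("class"): the lead byte alone selects the codeset. */
#define EUC_SS2 0x8E  /* single shift 2: introduces codeset 2 */
#define EUC_SS3 0x8F  /* single shift 3: introduces codeset 3 */

/* Return the codeset (0-3) selected by a lead byte, or -1 if the byte
   cannot start a character. Independent of any particular locale. */
static int euc_codeset(unsigned char lead)
{
    if (lead < 0x80) return 0;
    if (lead == EUC_SS2) return 2;
    if (lead == EUC_SS3) return 3;
    if (lead >= 0xA1 && lead <= 0xFE) return 1;
    return -1;
}

/* Hypothetical per-locale ("instance") data: bytes per character in each
   codeset, not counting the SS2/SS3 byte itself. */
struct euc_instance {
    const char *name;
    size_t width[4];
};

/* Japanese EUC (UJIS): codeset 1 is JIS X 0208 (2 bytes), codeset 2 is the
   right half of JIS X 0201 (1 byte); codeset 3 is ASSUMED to be 2 bytes. */
static const struct euc_instance ujis = { "UJIS", { 1, 2, 1, 2 } };

/* Total bytes occupied by a character whose lead byte is `lead`,
   or 0 if the lead byte is invalid. */
static size_t euc_char_bytes(const struct euc_instance *e, unsigned char lead)
{
    int cs;
    assert(e != NULL);
    cs = euc_codeset(lead);
    if (cs < 0) return 0;
    return e->width[cs] + (cs >= 2 ? 1 : 0);  /* add the single-shift byte */
}
```

A different EUC instance would reuse `euc_codeset` unchanged and supply only its own `width` table.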

Also, you refer to "the JIS standard". This is rather misleading,
since several implementations use *two* JIS standards, namely JIS X
0208 (Kanji, etc.) and the right-hand part of JIS X 0201 (`half-sized'
Katakana, etc.).

> UJIS is the name used to refer to the 2 byte
> encoding of the EUC scheme JIS standard. The 2 byte (4 byte on HP) wide
> character encodings for Japanese are usually UJIS...

Perhaps we're getting confused because we are looking at different
documents. I got my information from a paper by Yasushi Nakahara,
"Nihongo Koodo No Genjo To Mondaiten", Jan. 1988. In this paper, he
says that UJIS was the name that the Sigma project gave to a Japanese
usage of EUC. He refers to codesets 1, 2 and 3 (i.e. not only the JIS X
0208 Kanji, etc.).

According to this paper, UJIS is not a 2-byte code. It is an encoding
in which characters require 1, 2 or 3 bytes each. In other words, it is a
multibyte (mb) code, definitely not a wide-character (wc) code.
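The 1/2/3-byte point can be checked with a small C sketch that walks a UJIS string the way an mblen-style function would. It assumes the common Japanese EUC layout (ASCII below 0x80, SS2 0x8E plus one byte, SS3 0x8F plus two bytes, otherwise two bytes in 0xA1-0xFE). The function names `ujis_mblen` and `ujis_count_chars` are hypothetical and are not the C library's `mblen`.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in the UJIS character starting at s: 0 at the terminating NUL,
   -1 for an invalid or truncated sequence (mirrors mblen's conventions). */
static int ujis_mblen(const unsigned char *s)
{
    int n, i;
    assert(s != NULL);
    if (s[0] == 0x00) return 0;
    if (s[0] < 0x80)       n = 1;               /* codeset 0 */
    else if (s[0] == 0x8E) n = 2;               /* SS2 + 1 byte: codeset 2 */
    else if (s[0] == 0x8F) n = 3;               /* SS3 + 2 bytes: codeset 3 */
    else if (s[0] >= 0xA1 && s[0] <= 0xFE) n = 2; /* codeset 1 */
    else return -1;
    /* Trailing bytes must lie in 0xA1-0xFE; a NUL here stops the scan
       before we read past the end of the string. */
    for (i = 1; i < n; i++)
        if (s[i] < 0xA1 || s[i] > 0xFE) return -1;
    return n;
}

/* Count characters in a NUL-terminated UJIS string; -1 if malformed. */
static long ujis_count_chars(const unsigned char *s)
{
    long count = 0;
    int n;
    while ((n = ujis_mblen(s)) > 0) {
        s += n;
        count++;
    }
    return n == 0 ? count : -1;
}
```

For example, the 8 bytes `'A', 0xB4 0xC1, 0x8E 0xB1, 0x8F 0xB0 0xA1` hold four characters of 1, 2, 2 and 3 bytes. A fixed-width wc representation would instead give every one of them the same size.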
-- 
Erik M. van der Poel                                      erik at sra.co.jp
Software Research Associates, Inc., Tokyo, Japan     TEL +81-3-3234-2692