wchar_t values

Tue Apr 23 10:55:32 AEST 1991

   I've been asked to clarify this and I'm tired of writing notes
saying "please see comp.std.internat" so I'm reposting my reply to
Mr. Van der Poel's post here. Sorry for wasting the time of those
of you to whom this is old news...

--------------------Repost Alert-----------------------------------------------
In article <1130 at sranha.sra.co.jp> erik at srava.sra.co.jp
   (Erik M. van der Poel) writes:

 =}Also, you refer to "the JIS standard". This is rather misleading,
 =}since several implementations use *two* JIS standards, namely JIS X
 =}0208 (Kanji, etc) and the right-hand part of JIS X 0201 (`half-sized'
 =}Katakana, etc).

   Actually, three popular codesets are JIS X 0201, JIS X 0208, and
JIS X 0212. JIS X 0212 is a set of additional kanzi.

 =}Perhaps we're getting confused because we are looking at different
 =}documents.
 =} [...]
 =}He refers to codesets 1, 2 and 3 (i.e. not only 0208
 =}Kanji, etc).

   Yes, I'm looking at the documentation from various software packages
which use the UJIS encoding. They refer to four code sets:
   G0:	ASCII
   G1:	KANZI	(JIS X 0208)
   G2:	HANKAKU	(JIS X 0201)
   G3:	GAIZI
All four code sets are 16 bits wide.

 =}According to this paper, UJIS is not a 2 byte code. It is an encoding
 =}in which characters require 1, 2 or 3 bytes each. I.e. it is an mb
 =}code, definitely not a wc code.

   I hate to disagree, but in all of the implementations I have seen,
those which use an mb encoding refer to the Japanese EUC as EUC, and those
which use a wc encoding refer to it as UJIS (except, of course, HP, which
refers to both as UJIS).
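
   For anyone wondering where the "1, 2 or 3 bytes" in the mb form comes
from, here is a minimal sketch in C. It assumes the usual EUC packing:
ASCII bytes stand alone, a code set 1 character is two bytes in the range
0xA1-0xFE, code set 2 is SS2 (0x8E) plus one byte, and code set 3 is SS3
(0x8F) plus two bytes. The names euc_char_len and euc_count_chars are made
up for this note and do not come from any of the packages mentioned above.

```c
#include <stddef.h>

/* Hypothetical helper: byte length of one EUC character, judged from its
   lead byte alone. Returns 0 for a byte that cannot start a character. */
size_t euc_char_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;                   /* code set 0: ASCII */
    if (lead == 0x8E)
        return 2;                   /* SS2 + 1 byte: code set 2 (HANKAKU) */
    if (lead == 0x8F)
        return 3;                   /* SS3 + 2 bytes: code set 3 */
    if (lead >= 0xA1 && lead <= 0xFE)
        return 2;                   /* code set 1: JIS X 0208 */
    return 0;                       /* not a valid lead byte */
}

/* Hypothetical helper: count the characters in n bytes of EUC text.
   Returns (size_t)-1 on a bad lead byte or a truncated character.
   Trail bytes are not validated in this sketch. */
size_t euc_count_chars(const unsigned char *s, size_t n)
{
    size_t i = 0, count = 0;

    while (i < n) {
        size_t len = euc_char_len(s[i]);
        if (len == 0 || len > n - i)
            return (size_t)-1;
        i += len;
        count++;
    }
    return count;
}
```

   This only shows that the mb form is variable width. How a given vendor
packs the same four code sets into a fixed-width wchar_t is a separate
matter and is not covered by the sketch.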

Al
-------------------------------------------------------------------------------