Multibyte characters

Mike Banahan mikeb at inset.UUCP
Tue Jul 3 23:47:58 AEST 1990


On the interesting subject of wide characters, multibyte characters and
so on, I haven't noticed a discussion in this group which touches on
the following.

Let's say that I do have a multibyte execution character set which supports,
for the sake of argument, English and Greek, with Greek using a shift-in
shift-out mechanism.
A string of the form "abc@d" is valid C (using @ to represent the Greek
character `alpha').
It will contain 8 bytes, counting the shift-in, shift-out and the null
at the end.
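
The byte count can be checked with a minimal sketch. The encoding here is an assumption made purely for illustration: the bytes octal 016 and 017 stand in for the two shift sequences, and the Greek alpha is taken to be a single byte in the shifted state. The names SHIFT_TO_GREEK, SHIFT_TO_INITIAL, GREEK_ALPHA and greek_sample are hypothetical and do not come from any real implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical one-byte shift sequences (assumed, not from any real
   character set) and an assumed one-byte code for alpha when shifted. */
#define SHIFT_TO_GREEK   "\016"
#define SHIFT_TO_INITIAL "\017"
#define GREEK_ALPHA      "a"

/* The string "abc@d" as it might be stored: a b c, shift, alpha,
   shift back, d, and the terminating null. */
static const char greek_sample[] =
    "abc" SHIFT_TO_GREEK GREEK_ALPHA SHIFT_TO_INITIAL "d";

/* Storage used by the array, including the terminating null. */
size_t greek_sample_size(void)
{
    return sizeof greek_sample;
}

/* Bytes before the terminating null; this counts bytes, not characters. */
size_t greek_sample_length(void)
{
    return strlen(greek_sample);
}
```

Under these assumptions greek_sample_size() gives 8 and greek_sample_length() gives 7, although the string holds only five characters as a reader would count them.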

Presumably the integral constant '@' is a three-byte constant, no matter
what it may look like? An alternative interpretation is that it violates
the constraint in 2.2.1.2 `a .. character constant .. shall begin
and end in the initial shift state', but presumably I can expect my
implementation to do the necessary good deeds and put a shift-out
in there too.
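
What "begin and end in the initial shift state" would mean for a byte sequence can be sketched as a small checker. This is only an illustration under assumed conditions: GREEK_SHIFT_BYTE and INITIAL_SHIFT_BYTE are hypothetical one-byte shift codes, and the function name is invented here.

```c
#include <stddef.h>

/* Hypothetical shift bytes, chosen for illustration only. */
#define GREEK_SHIFT_BYTE   '\016'
#define INITIAL_SHIFT_BYTE '\017'

/* Scans count bytes, assuming the sequence starts in the initial shift
   state, and returns 1 if it also ends in the initial state, else 0. */
int ends_in_initial_shift_state(const char *bytes, size_t count)
{
    int in_greek = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        if (bytes[i] == GREEK_SHIFT_BYTE)
            in_greek = 1;
        else if (bytes[i] == INITIAL_SHIFT_BYTE)
            in_greek = 0;
    }
    return !in_greek;
}
```

With these assumed codes, the three bytes shift, alpha, shift-back pass the check, while shift followed by alpha alone does not, which is the case the constraint appears to rule out.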


Since it is a three-byte constant (assuming I'm right), then can I be
sure that I do not get overflow when I assign it to a char variable?
3.1.3.4 says that the value of a multi-character character constant
is implementation-defined, and 3.2.1.2 says (to paraphrase) that
demoting an int to a char gives an implementation-defined result.
So to call it `overflow' is perhaps overstating the case, but I clearly
end up in implementation-defined territory twice over.
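
The two implementation-defined steps can be separated in a rough sketch. The packing rule below (earlier bytes in higher-order positions) is an assumption, one plausible choice among those an implementation may make, and all function names are hypothetical. The sketch uses unsigned long so that three bytes fit even where int is 16 bits.

```c
#include <limits.h>

/* Step 1, assumed packing rule: earlier bytes land in higher-order
   positions. The Standard leaves the real rule to the implementation. */
unsigned long pack_three_bytes(unsigned char b0, unsigned char b1,
                               unsigned char b2)
{
    unsigned long value = b0;

    value = (value << CHAR_BIT) | b1;
    value = (value << CHAR_BIT) | b2;
    return value;
}

/* Step 2a: narrowing to an unsigned type reduces the value modulo
   UCHAR_MAX + 1, so only the low-order byte survives. */
unsigned char narrow_to_unsigned_char(unsigned long value)
{
    return (unsigned char)value;
}

/* Step 2b: returns nonzero when the value fits in plain char, so an
   assignment would preserve it. When it does not fit and char is
   signed, the result of the assignment is implementation-defined. */
int fits_in_char(unsigned long value)
{
    return value <= (unsigned long)CHAR_MAX;
}
```

Assuming 8-bit bytes and the shift codes octal 016 and 017 around an alpha coded as 0x61, pack_three_bytes gives 0x0E610F, which does not fit in a char; narrowing it to unsigned char keeps only the trailing shift byte, so the alpha itself is lost under this packing rule.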

Sorry if this has been discussed before. If not, could someone enlighten
me as to the actual situation?

Thanks in advance,
Mike Banahan
-- 
Mike Banahan, Technical Director, The Instruction Set Ltd.
mcvax!ukc!inset!mikeb


