sizeof(char)

Guy Harris guy at sun.uucp
Wed Nov 5 07:24:05 AEST 1986


> X3J11 as it stands requires sizeof(char)==1.  I have proposed that
> this requirement be removed, to better support applications such as
> Asian character sets and bitmap display programming.  Along with
> this, I proposed a new data type such that sizeof(short char)==1.
> It turns out that the current draft proposed standard has to be
> changed very little to support this distinction between character
> objects (char) and smallest-addressable objects (short char).  This
> is much better, I think, than a proposal that introduced (long char)
> for text characters.

Why?  If this is the AT&T proposal, it did *not* "introduce (long char) for
text characters"; it introduced (long char) for *long* text characters.
"char" is still to be used when processing text that does not include long
(16-bit) characters.  I believe the theory here was that requiring *all*
programs that process text ("cat" doesn't count; it doesn't - or, at least,
shouldn't - process text) to process them in 16-bit blocks might cut their
performance to a degree that customers who would not use the ability to
handle Kanji would find unacceptable.  I have seen no data to confirm or
disprove this.

(Changing the meaning of "char" does not directly affect the support of
"bitmap display programming" at all.  It only affects applications that
display things like Asian character sets on bitmap displays, but it doesn't
affect them any differently than it affects applications that display them
on "conventional" terminals that support those character sets.)

> Unfortunately, much existing C code believes that "char" means "byte".
> My proposal would allow implementors the freedom to decide whether
> supporting this existing practice is more important than the benefits
> of making a distinction between the two concepts.

Both "short char"/"char" and "char"/"long char" make a distinction between
the two concepts; one may have aesthetic objections to the way the latter
scheme draws the distinction, but that's another matter.  (Is 16 bits enough
if you want to give every single character a code of its own?)

> It is possible to write code that doesn't depend on sizeof(char)==1,
> and some C programmers are already careful about this.

It is possible to write *some* code so that it doesn't depend on
sizeof(char)==1.  Absent a data type one byte long, other code is difficult
at best to write this way.
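
As a minimal sketch of the first kind of code, the helper below allocates and
copies a string by scaling the character count by sizeof(char) instead of
assuming that one char occupies one storage unit.  The name dup_string is
hypothetical, chosen only for illustration, and the sketch assumes malloc and
memcpy count in the same units that sizeof reports.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a string without assuming sizeof(char) == 1. */
char *dup_string(const char *s)
{
    size_t nchars = strlen(s) + 1;          /* number of char objects, including '\0' */
    size_t nunits = nchars * sizeof(char);  /* scale the count to sizeof units */
    char *copy = malloc(nunits);

    if (copy != NULL)
        memcpy(copy, s, nunits);            /* memcpy also counts in sizeof units */
    return copy;
}
```

While sizeof(char)==1 is required, the multiplication is a no-op; it would
matter only if the two were allowed to differ.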

> Transition to the more general scheme would occur gradually (if at all) for
> existing C implementations, with only implementors of systems for
> the Asian market and of bitmap display architectures initially taking
> advantage of the opportunity to make these types different sizes.

I think "if at all" is appropriate here.  There are a *lot* of interfaces
that assume that "char" is a one-byte data type; e.g., "read", "write", etc.
I see no evidence that converting existing code and data structures to use
"short char" would be anything other than highly disruptive.
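
To illustrate the kind of interface in question, here is a sketch assuming a
POSIX-style "read" whose count argument is in bytes.  The function names and
the LINE_CHARS constant are hypothetical.  The first function shows the common
habit of passing a char array's element count directly as the byte count; the
second shows what the same call would have to look like if sizeof(char) could
differ from 1.

```c
#include <stddef.h>
#include <unistd.h>

#define LINE_CHARS 128   /* hypothetical buffer size, in char elements */

/* Common existing practice: element count used directly as a byte count. */
ssize_t read_line_buffer(int fd, char *buf)
{
    return read(fd, buf, LINE_CHARS);
}

/* The same call with the count scaled explicitly to sizeof units. */
ssize_t read_line_buffer_scaled(int fd, char *buf)
{
    return read(fd, buf, LINE_CHARS * sizeof(char));
}
```

Both functions behave identically as long as sizeof(char)==1; every call
written in the first style is one that would need to be found and changed.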

Adding "long char" would permit new programs to be written to support long
characters, and permit existing programs to be rewritten to support them,
without breaking existing programs; this suggests to me that "long char" is
much more likely to be widely adopted and used than "short char".  I see no
reason why a proposal that would, quite likely, lead to two different
C-language environments existing in parallel for a long time to come is
superior to one that would let environments add on the ability to handle long
characters, making it easier, and thus more likely, for them to do so.
(This is especially
true when you consider that most of the programs in question would have to
be changed quite a bit to support Asian languages *anyway*; just widening
"char" to 16 bits, recompiling them, and linking them with a library with a
brand new standard I/O, etc. would barely begin to make them support those
languages.)
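
A rough sketch of the add-on approach follows.  "long char" is not something a
compiler accepts, so the sketch uses wchar_t, the wide character type that
standard C later provided, purely as a stand-in; that substitution and the
function names are assumptions made for illustration.  The point is that code
using plain "char" is left untouched while wide-character code is written
alongside it.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Existing code: unchanged, still works on ordinary char strings. */
size_t narrow_length(const char *s)
{
    return strlen(s);
}

/* New code: uses the wider type (wchar_t standing in for "long char"). */
size_t wide_length(const wchar_t *s)
{
    return wcslen(s);
}
```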
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy at sun.com (or guy at sun.arpa)


