sizeof(char)

Doug Gwyn gwyn@brl-smoke.ARPA
Sun Nov 9 00:02:04 AEST 1986


Guy is still missing my point about bitmap display programming;
I have NOT been arguing for a GUARANTEED PORTABLE way to handle
individual bits, but rather for the ability to do so directly
in real C on specific machines/implementations WITH THE FACILITY:
	typedef short char	Pixel;	/* one bit for B&W displays */
				/* fancy color frame buffers wouldn't
				   use (short char) for this, but an
				   inexpensive "home" model might */
	typedef struct
		{
		short	x, y;
		}	Point;
	typedef struct
		{
		Point	origin, corner;
		}	Rectangle;
	typedef struct
		{
		Pixel	*base;		/* NOT (Word *) */
		unsigned width;		/* in Bits, not Words */
		Rectangle rect;
		/* obscured-layer chain really goes here */
		}	Bitmap;	/* does this look familiar? */
Direct use of Pixel pointers/arrays tremendously simplifies coding for
such applications as "dmdp", where one typically has to pick up six
bits at a time from a rectangle for each printer byte being assembled
(sometimes none of the six bits fall in the same "word", no matter how
the bits may have been clumped into words by the architect).
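
To make this concrete, here is roughly what the inner step of such a
pickup could look like if Pixel really were a one-bit (short char) as
declared above.  This is a sketch in the PROPOSED syntax, so no current
compiler will accept it, and the function name and the left-to-right
gathering order are merely illustrative, not part of the proposal:
	/* SKETCH ONLY: depends on the proposed (short char), so it will not
	   compile today; names and gathering order are illustrative. */
	unsigned char
	printer_byte(Bitmap *bp, Point p)	/* gather six pixels starting at p */
		{
		Pixel	*row = bp->base + (long)p.y * bp->width;	/* bit addressing */
		unsigned char	byte = 0;
		int	i;

		for (i = 0; i < 6; ++i)
			byte = (byte << 1) | row[p.x + i];	/* no word/shift/mask fiddling */
		return byte;
		}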

Now, MC68000 and WE32000 architectures do not support this (except for
(short char)s that are multi-bit pixels).  But I definitely want the
next generation of desktop processors to support bit addressing.  I am
fully aware that programming at this level of detail is non-portable,
but portable graphics programming SUCKS, particularly at the interactive
human interface level.  Programmers who try that are doing their users
a disservice.  I say this from the perspective of one who is considered
almost obsessively concerned with software portability and who has been
the chief designer of spiffy commercial graphic systems (and who
currently programs DMDs and REAL frame buffers, not Suns).

I'm well aware of the use of packed-bit access macros, thank you.  That
is exactly what I want to get away from!  The BIT is the basic unit of
information, not the "byte", and there is nothing particularly sacred
about the number 8, either.  I agree that if you want to write PORTABLE
bit-accessing code, you'll have to use macros or functions, since SOME
machines/implementations will not directly support one-bit data objects.
That wasn't my concern.
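
For reference, the kind of macro I mean looks something like the
following; the names are my own and it assumes 8 bits per (char), which
is exactly the sort of assumption I would rather not have wired into
application code:
	/* Typical packed-bit access macros of the sort referred to above;
	   names are illustrative and 8 bits per (char) is assumed. */
	#define	BITSPERCHAR	8
	#define	GETBIT(map, n)	(((map)[(n) / BITSPERCHAR] >> ((n) % BITSPERCHAR)) & 01)
	#define	SETBIT(map, n)	((map)[(n) / BITSPERCHAR] |= 01 << ((n) % BITSPERCHAR))
	#define	CLRBIT(map, n)	((map)[(n) / BITSPERCHAR] &= ~(01 << ((n) % BITSPERCHAR)))

	#define	WIDTH	800			/* illustrative display dimensions */
	#define	HEIGHT	1024
	unsigned char	screen[(WIDTH * HEIGHT + BITSPERCHAR - 1) / BITSPERCHAR];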

Due to all the confusion, I'm recapitulating my proposal briefly:
	ESSENTIAL:
		(1) New type: (short char), signedness as for (char).
		(2) sizeof(short char) == 1.
		(3) sizeof(char) >= sizeof(short char).
		(4) Clean up wording slightly to improve the
		    byte (storage cell) vs. character distinction.
	RECOMMENDED:
		(5) Fix character \-escapes so that larger numeric
		    values are permitted in character/string constants
		    on implementations where that is needed.  The
		    current 9/12 bit limit is a botch anyway.
		(6) Text streams read/write/seek (char)s, and
		    binary streams read/write/seek (short char)s.
		    This requires addition of fgetsc() and fputsc(),
		    which are routines I think most system programmers
		    have already invented under names like get_byte()
		    (a brief sketch follows this list).
		(7) Add `b' size modifier for fscanf().
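
As a rough illustration of item (6), a binary-stream copy under the
proposal might read as follows.  fgetsc() and fputsc() are of course
the PROPOSED routines, not existing ones, and the EOF return convention
is an assumption made by analogy with fgetc():
	#include <stdio.h>

	/* SKETCH ONLY: fgetsc()/fputsc() are proposed, not yet real, and the
	   EOF-at-end-of-stream convention is assumed by analogy with fgetc(). */
	void
	copybinary(FILE *in, FILE *out)
		{
		int	sc;			/* holds one (short char) value or EOF */

		while ((sc = fgetsc(in)) != EOF)
			fputsc(sc, out);	/* one storage cell at a time */
		}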

I've previously pointed out that this has very little impact on most
existing code, although I do know of exceptions.  (Actually, until the
code is ported to a sizeof(short char) != sizeof(char) environment,
it wouldn't break in this regard.  That port is likely to be a painful
one in any case, since it would probably be to a multi-byte character
environment, and SOMEthing would have to be done anyway.  The changes
necessary to accommodate this are generally fewer and simpler under my
proposal than under a (long char)/lstrcpy() approach.)

As to whether I think that mapping to/from 16-bit (char) would be done
by the I/O support system rather than the application code, my answer
is:  Absolutely!  That's where it belongs.  (AT&T has said this too,
on at least one occasion, even going so far as to suggest that the
device driver should be doing this.  I assume they meant a STREAMS
module.)
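
By way of illustration only, such a layer might assemble each 16-bit
(char) from two (short char)s along the following lines; the byte order
and the function name are my own assumptions, not anything the proposal
specifies, and fgetsc() is again the proposed primitive:
	#include <stdio.h>

	/* ILLUSTRATION ONLY: one way an I/O layer might build a 16-bit (char)
	   value from two (short char) storage cells.  The name getchar16()
	   and the big-endian external order are assumptions, and fgetsc() is
	   the proposed routine, not an existing one. */
	int
	getchar16(FILE *fp)			/* next 16-bit (char) value, or EOF */
		{
		int	hi, lo;

		if ((hi = fgetsc(fp)) == EOF || (lo = fgetsc(fp)) == EOF)
			return EOF;
		return (hi << 8) | lo;		/* big-endian external form assumed */
		}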

I won't bother responding in detail to other points, such as the use of
reasonable default "DP shop" collating sequences analogous to ASCII
without having to pack/unpack multi-byte strings.  (Yes, it's true
that machine collating sequence isn't always appropriate -- but does
that mean that one never encounters computer output that IS ordered by
internal collating sequence?  Also note that strcoll() amounts to a
declaration that there IS a natural multibyte collating sequence for
any single environment.)  Instead I will simply assure you that I
have indeed thought about all those things (and more), have read the
literature, have talked with people working on internationalization,
and have even been in internationalization working groups.  I spent the
seven hours driving back from the Raleigh X3J11 meeting analyzing why
people were finding these issues so complex, and discovered that much
of it was due to the unquestioned assumption that "16-bit" text had to
be considered as made of individual 8-bit (char)s.  If one starts to
write out a BNF grammar for what text IS, it becomes obvious very
quickly that that is an unnatural constraint.  Before glibly dismissing
this as not well thought out, give it a genuine try and see what it is
like for actual programming; then try ANY alternative approach and see
how IT works in practice.

If you prefer, don't consider my proposal as a panacea for such issues,
but rather as a simple extension that permits some implementers to
choose comparatively straightforward solutions while leaving all others
no worse off than before (proof: if one were to decide to make
sizeof(char) == sizeof(short char), that is precisely where we are now).
What I DON'T want to see is a klutzy solution FORCED on all implementers,
which is what standardizing a bunch of simultaneous (long char) and (char)
string routines (lstrcpy(), etc.) would amount to.  If vendors think it
is necessary to take the (long char) approach, the door is still open
for them to do so under my proposal (without X3J11's blessing), but
vendors who really don't care about 16-bit chars (yes, there are vendors
like that!) are not forced to provide that extra baggage in their
libraries and documentation.

The fact that future CPU architectures may support tiny data types
directly in standard C, more than is possible at present, is an extra
benefit of my approach to the "multi-byte character" problem; it wasn't
my original
motivation, but I'm happy that it turned out that way.  (You can bet
that (short char) would be heavily used for Boolean arrays, for example,
if my proposal makes it into the standard; device-specific bitmap
display programming is by no means the only application that could
benefit from availability of a shorter type.  I've seen many people
#define TINY for nybble-sized quantities, usually having to use a
larger size (e.g., (char)) than they really wanted.)
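
For example, under the proposal a flag array could be declared directly
and the implementation left to pick the smallest cell it can address;
again this is the PROPOSED syntax, with made-up names:
	/* PROPOSED syntax, not current C: the implementation may make a
	   (short char) as small as one bit, which is all a flag needs. */
	#define	NUSERS	1024

	short char	logged_in[NUSERS];	/* one Boolean flag per user */

	void
	marklogin(int uid)
		{
		logged_in[uid] = 1;		/* no shifting or masking needed */
		}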

From the resistance he's been putting up, I doubt that I will convert
Guy to my point of view, and I'm fairly sure that many people who have
already settled on some strategy to address the multi-byte character
issue are not eager to back out the work they've already put into it.
However, since I've shown that a clean conceptual model for such text
IS workable, there's no excuse for continued claims that explicit
byte-packing and unpacking is the only way to go.


