Signed char - What Foolishness Is This!

guy at sun.UUCP
Sat Oct 18 05:37:49 AEST 1986


> 1)      Do other C compilers make 'char' a signed quantity by default?

Yes.  Lots and lots of them, including the very first C compiler ever
written (if there was an earlier one, Dennis, let me know...) - the PDP-11 C
compiler.

> 2)      What possible justification is there for this default?

1) When the PDP-11 C compiler was written, ASCII characters *were* 7-bit
characters, and there was no general use of 8-bit characters, and 2) the
PDP-11 treated bytes as signed, rather than unsigned, so treating ASCII
characters as unsigned rather than signed cost some time and bought you
nothing.  I suspect Microsoft did this to make less-than-portable code
written for PDP-11s and VAXes work on 8086-family machines without change.
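
To make the signed/unsigned difference concrete, here is a minimal sketch of
what each choice does when a byte is widened to "int".  The function names are
mine, purely for illustration, and the arithmetic assumes 8-bit bytes and a
two's-complement machine (true of the PDP-11, VAX and 8086).

```c
#include <limits.h>

/* Widen a byte the way a signed-"char" implementation does: the top bit
 * is treated as a sign bit (assumes 8-bit bytes, two's complement). */
int widen_signed(unsigned char byte)
{
    return (byte & 0x80) ? (int)byte - 256 : (int)byte;
}

/* Widen a byte the way an unsigned-"char" implementation does. */
int widen_unsigned(unsigned char byte)
{
    return (int)byte;
}

/* Nonzero if plain "char" is signed on the compiler at hand. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

For any 7-bit ASCII code the two functions agree, which is why the choice
cost nothing at the time; they differ only for bytes with the top bit set
(0xFC widens to -4 in one case and 252 in the other).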

> Is not 'char' primarily a logical (as opposed to mathematical) quantity?

Yes, but the people to complain to here are ultimately the designers of the
PDP-11 (although a lot of string manipulation on PDP-11s could be done using
unsigned characters without much penalty).

> I can understand the desirability of allowing 'signed char' for gonzo
> programmers who won't use 'short',

It's not a question of "gonzo programmers who won't use 'short'".  There are
times when you absolutely *must* have a one-byte number in a structure;
"short" just won't cut it here.  (Bit fields would, perhaps, except that you
can't take the address of a bit field.)  Structures representing device
registers, or representing fields in other externally-specified data, are an
example of this.  Also, if you have a *huge* array of integers in the range
-128 to 127, you may take a significant performance hit by using "short"
rather than "char" (remember, "short" takes twice the amount of memory that
"char" does on most implementations).
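
A rough sketch of both cases follows.  The structure layout, the field names
and the array size are hypothetical, made up for illustration and not taken
from any real device, and the code assumes a compiler that accepts
"signed char".

```c
#include <stddef.h>

/* Hypothetical register block: every field must be exactly one byte. */
struct dev_regs {
    unsigned char status;
    signed char   offset;   /* a small signed number, not a character */
    unsigned char data;
    unsigned char ctrl;
};

/* Unlike a bit field, a one-byte member has an address you can take. */
signed char *offset_field(struct dev_regs *regs)
{
    return &regs->offset;
}

#define N_SAMPLES 100000    /* illustrative size for a "huge" array */

/* Storage needed for N_SAMPLES small integers, stored each way. */
size_t bytes_as_char(void)
{
    return N_SAMPLES * sizeof(signed char);
}

size_t bytes_as_short(void)
{
    return N_SAMPLES * sizeof(short);
}
```

On an implementation with 8-bit bytes and 16-bit "short", the second array is
twice the size of the first.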

> or who want to risk future compatibility of their code on the bet that
> useful characters will always remain 7-bit entities.

They're risking nothing.  "signed char" is a gross way of saying "short
short int", not a way of saying "signed character" (which, as you say, is
meaningless).  Unfortunately, C originally didn't have "short" or "long",
and when they were added they did not cascade.

I presume, by the way, that "isupper(<u-umlaut>)" is intended to return 0
and "isupper(<U-umlaut>)" is intended to return 1.  If Microsoft didn't put
the extended character set into the "ctype" tables, the way that the
indexing is done is irrelevant.
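
For what it's worth, here is a sketch of how a caller can keep the "ctype"
lookup well-behaved whether "char" is signed or not: convert to
"unsigned char" before the call, so a byte with the top bit set never turns
into a negative index.  The function name is mine, and whether an accented
letter is classified as upper case still depends entirely on what the
implementation put in its tables.

```c
#include <ctype.h>

/* Count the upper-case characters in a string.  The cast keeps bytes
 * above 0x7F from being passed to isupper() as negative values when
 * plain "char" is signed. */
int count_upper(const char *s)
{
    int n = 0;

    for (; *s != '\0'; s++) {
        if (isupper((unsigned char)*s))
            n++;
    }
    return n;
}
```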
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy at sun.com (or guy at sun.arpa)


