Why unsigned chars not default?

Henry Spencer henry at utzoo.uucp
Sun Oct 23 09:10:50 AEST 1988


In article <9563 at pur-ee.UUCP> mendozag at ee.ecn.purdue.edu (Victor M Grado) writes:
>   He claims the compilers are at fault and that all the compilers
> should have 'unsigned char' as default for characters...

[Possibly we ought to have a "frequently-asked questions" posting in this
group.  Here, slightly modified, is something I posted two years ago,
when a debate raged on this issue.]

Would he still feel this way if all manipulations of unsigned char took
three times as long as those of signed char?  It can happen.

All potential participants in this debate please attend to the following.

- There exist machines (e.g. pdp11) on which unsigned chars are a lot less
	efficient than signed chars.

- There exist machines (e.g. ibm370) on which signed chars are a lot less
	efficient than unsigned chars.

- Many applications do not care whether the chars are signed or unsigned,
	so long as they can be twiddled efficiently.

- For this reason, char is intended to be the more efficient of the two.

- Many old programs assume that char is signed; this does not make it so.
	Those programs are wrong, and have been all along.  Alas, this is
	not a comfort if you have to run them.

- The Father, the Son, and the Holy Ghost (K&R1, H&S, and X3J11 resp.) all
	agree that characters in the "source character set" (roughly, those
	one uses to write C) must look positive.  Actually, the Father and
	the Son gave considerably broader guarantees, but the Holy Ghost
	had to water them down a bit.

- The "unsigned char" type exists (in most newer compilers) because there
	are a number of situations where sign extension is very awkward.
	For example, getchar() wants to do a non-sign-extended conversion
	from char to int.

- X3J11, in its semi-infinite wisdom, has decided that it would be nice to
	have a signed counterpart to "unsigned char", to wit "signed char".
	Therefore it is reasonable to expect that most new compilers, and
	old ones brought into conformance with the yet-to-be-issued standard,
	will give you the full choice:  signed char if you need signs,
	unsigned char if you need everything positive, and char if you don't
	care but want it to run fast.

- Given that many compilers have not yet been upgraded to match even the
	current X3J11 drafts, much less the final end product (which doesn't
	exist yet), any application which cares about signedness should use
	typedefs or macros for its char types, so that the definitions can
	be revised later.

- The only things you can safely put into a char variable, and depend on
	having them come out unchanged, are characters from the native
	character set and small *positive* integers (0 through 127 is safe
	whichever way char is signed).

- Dennis Ritchie is on record, as I recall, as saying that if he had to do
	it all over again, he would consider changing his mind about making
	chars signed on the pdp11 (which is how this mess got started).
	The pdp11 hardware strongly encouraged this, but it *has* caused a
	lot of trouble since.  It is, however, much too late to make such
	a change to C.
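
A minimal sketch of the typedef advice and the non-sign-extended
conversion mentioned above.  The names byte_t, tiny_t, fast_t,
plain_char_is_signed, char_to_int and store_small are hypothetical,
chosen only for illustration, and the code assumes a compiler that
already accepts "signed char" and supplies <limits.h>.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical typedef names (assumptions, not part of any standard).
 * Keeping the choice in one place lets it be revised later. */
typedef unsigned char byte_t;   /* everything must look positive */
typedef signed char   tiny_t;   /* signs needed; requires "signed char" */
typedef char          fast_t;   /* signedness irrelevant, speed wanted */

/* Nonzero if plain char is signed on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Convert a char to int with no sign extension, the kind of conversion
 * getchar() needs; the result is always in 0..UCHAR_MAX. */
int char_to_int(char c)
{
    return (int)(unsigned char)c;
}

/* Store a small non-negative integer in a plain char.  Values in
 * 0..SCHAR_MAX survive the round trip whichever way char is signed. */
fast_t store_small(int n)
{
    assert(n >= 0 && n <= SCHAR_MAX);
    return (fast_t)n;
}
```

With 8-bit chars, char_to_int('\351') yields 233 whether plain char is
signed or not, whereas a bare (int) conversion of the same char would
come out negative on a machine where plain char is signed.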
-- 
The meek can have the Earth;    |    Henry Spencer at U of Toronto Zoology
the rest of us have other plans.|uunet!attcan!utzoo!henry henry at zoo.toronto.edu


