is it really necessary for character values to be positive?

karl at haddock.UUCP
Mon Jan 19 07:49:16 AEST 1987


In article <598 at mcgill-vision.UUCP> mcgill-vision!mouse (der Mouse) writes:
>In article <289 at haddock.UUCP>, karl at haddock.UUCP (Karl Heuer) writes:
>> Suppose I am using such a system, and one of the characters -- call
>> it '@' -- has a negative value.  The following program will not work:
>>     main() { int c; ... c = getchar(); ... if (c == '@') ... }
>> ... Any printing character that I want to enclose in single quotes had
>> better be positive, or it becomes VERY awkward to use.
>
>Well.  Now, exactly what does it mean to say that @ is negative?
>Presumably it means that the test below will succeed:
>	char c = '@'; if (c < 0) ...

Actually what I meant was simply that "if ('@' < 0) ..." would succeed.  This
is not the same thing since '@' has type int.  Your test says only that char
is implemented as a signed datatype, and that '@' has the high bit set.
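
A minimal sketch of the distinction, assuming 8-bit bytes and the usual wraparound when an out-of-range value is stored in a signed char (that conversion is implementation-defined).  The function names are made up for illustration:

```c
#include <assert.h>
#include <limits.h>

/* 1 if plain char is a signed type on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* 1 if a character constant has type int, as it does in C. */
int char_constant_is_int(void)
{
    return sizeof('@') == sizeof(int);
}

/* der Mouse's test: store a byte value (0..255) in a plain char and
 * ask whether it reads back negative.  This can only be true when
 * plain char is signed AND the byte has its high bit set. */
int reads_negative_as_char(int byte)
{
    char c;

    assert(byte >= 0 && byte <= UCHAR_MAX);
    c = (char)byte;
    return c < 0;
}
```

On a signed-char machine reads_negative_as_char(0xE9) is true, yet that says nothing about whether the int-typed constant for the same character is negative; that depends on how the compiler assigns values to character constants.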

>Notice that you can't make '@' the same thing as what getchar() returns,
>because [char s[N]; if (s[0] == '@') ...] will fail.

That's the flip side of the problem, which I overlooked in my posting.  The
problem is independent of single-quotes; any machine on which characters are
signed will fail to handle the test (getchar() == s[0]).  The only reason it
"worked" so well on the pdp11 was that *in practice*, all the chars one has to
deal with (I'm assuming text characters, not one-byte integers) were 7-bit, so
it didn't matter whether they were sign-extended (as with s[0]) or unsigned
(as with getchar()).
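
To make the failure concrete, here is a small sketch that models the two sides of the comparison: an int in the range 0..UCHAR_MAX, as getchar() returns, and an array element of type signed char, as s[0] is on a signed-char machine.  It assumes 8-bit bytes; both function names are hypothetical:

```c
#include <assert.h>
#include <limits.h>

/* The naive test (getchar() == s[0]): the element is sign-extended
 * when promoted to int, so an 8-bit character never matches. */
int naive_equal(int from_getchar, signed char element)
{
    assert(from_getchar >= 0 && from_getchar <= UCHAR_MAX);
    return from_getchar == element;
}

/* One workaround: convert the element to unsigned char first, so
 * both sides are in the range 0..UCHAR_MAX before comparing. */
int safe_equal(int from_getchar, signed char element)
{
    assert(from_getchar >= 0 && from_getchar <= UCHAR_MAX);
    return from_getchar == (unsigned char)element;
}
```

For a 7-bit character such as 'A' both versions agree.  For the byte 0xE9, getchar() yields 233 while the signed element holds -23, so only safe_equal() reports a match.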

>About the neatest solution I see is to make 'x' have type unsigned char
>rather than int, at least when there's only one character between
>quotes.  Then we also have to arrange that char and unsigned char
>are not promoted to int in expressions not involving anything bigger
>than char.  This should make both of these work.

I dunno.  A simpler solution is to assert that plain char is unsigned char.
As I said before, I suspect the adopted solution will be that in an 8-bit
environment plain char will be unsigned char; the only default-signed-char
compilers will be on pdp11-like machines in 7-bit environments.

>(is there any code out there *using* multi-char character constants?)

If so, it's almost all nonportable.  The only portable use I've seen was one I
wrote for a program that dealt with the two-letter codes found in termcap,
troff, etc: "switch (s[0]*'\1\0' + s[1]*'\0\1') { case 'xy': ...; }".  I ended
up not using it anyway, since lint didn't like it.  (But it is independent of
byte size and byte ordering.)
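
For comparison, a sketch of a different technique that avoids multi-char constants altogether (whose values are implementation-defined) by building the key with a macro.  CODE2 and classify() are hypothetical names, and the return values are arbitrary; "cl" and "cm" are the usual termcap names for clear-screen and cursor-motion.  It assumes the packed value fits in an unsigned int, which holds for 8-bit bytes:

```c
#include <assert.h>
#include <limits.h>

/* Pack two characters into one unsigned key.  Going through
 * unsigned char keeps the result the same whether plain char is
 * signed or not. */
#define CODE2(a, b) \
    (((unsigned)(unsigned char)(a) << CHAR_BIT) | (unsigned char)(b))

/* Classify a two-letter capability name. */
int classify(const char *s)
{
    assert(s != 0 && s[0] != '\0');
    switch (CODE2(s[0], s[1])) {
    case CODE2('c', 'l'): return 1;   /* clear screen */
    case CODE2('c', 'm'): return 2;   /* cursor motion */
    default:              return 0;   /* unrecognized */
    }
}
```

The case labels are still integer constant expressions, so the switch reads much like the multi-char version.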

[From article <600 at mcgill-vision.UUCP>, same author, again quoting kwzh]
>> [Your suggestion] supports my contention that making getchar() an int
>> function was a mistake in the first place.**

I am now even more sure, btw, that making it (int)(unsigned char)c was wrong.
(Perhaps, as someone else suggested, (int)c would have been better, provided
EOF is defined as something out-of-band like 0x8000.)

>> **I do have what I think is a better idea, but I'm not going to
>> describe it in this posting.

(This was because I tend to do a lot of my posting in the wee hours of the
morning, and I didn't trust myself to give any details.)

>How about in another posting then?

Stay tuned.  I'll probably be posting it to comp.lang.misc (since "it isn't C
anymore") sometime in February (not sooner; I have a big project due).  Look
for "Error handling".

>What I normally do is something more like [char c; /*!*/ ... c = getchar();
>if (feof(stdin)) ...] ie, *ignore* the EOF return and check explicitly.

I think that's a better model in that it doesn't rely on the ability to cast
char into a larger type; the problem is that it's cumbersome.  The common
idiom "while ((c = getchar()) != EOF) ..." has to be written with a comma
("while (c = getchar(), !feof(stdin)) ...") or a test-in-the-middle loop
("for (;;) { c = getchar(); if (feof(stdin)) break; ... }").
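
The three loop shapes, sketched side by side as byte counters.  To make them usable on any stream they take a FILE * and call getc() rather than getchar(); the function names are made up, and the ferror() check is an addition of this sketch so that a read error cannot loop forever:

```c
#include <assert.h>
#include <stdio.h>

/* The common idiom: c must be an int so EOF stays distinguishable. */
long count_idiom(FILE *fp)
{
    int c;
    long n = 0;

    assert(fp != NULL);
    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}

/* Comma form: c can be a plain char because end of input is
 * detected by feof(), not by the value stored in c. */
long count_comma(FILE *fp)
{
    char c;
    long n = 0;

    assert(fp != NULL);
    while (c = (char)getc(fp), !feof(fp) && !ferror(fp))
        n++;
    (void)c;
    return n;
}

/* Test-in-the-middle form of the same loop. */
long count_midloop(FILE *fp)
{
    char c;
    long n = 0;

    assert(fp != NULL);
    for (;;) {
        c = (char)getc(fp);
        if (feof(fp) || ferror(fp))
            break;
        (void)c;
        n++;
    }
    return n;
}
```

All three count a 0xFF byte correctly.  Writing the first idiom with c declared as char is what goes wrong: on a signed-char machine 0xFF would compare equal to EOF and end the loop early.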

Karl W. Z. Heuer (ima!haddock!karl or karl at haddock.isc.com), The Walking Lint


