How to use toupper()

Ray Butterworth rbutterworth at watmath.waterloo.edu
Fri Jan 20 01:30:29 AEST 1989


getchar() presents yet another aspect of the problem.  Consider:
    switch (getchar()) {
        case EOF:
            ...
        case 'C':
            ...
    }
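If 'C' is any character that sign extends (one with the high bit set, on
a compiler where char is signed), the case label is negative while
getchar() returns the same character as a positive value, so the switch
never matches it.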
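
A minimal sketch of that mismatch, assuming 8-bit chars and using the byte 0xE9 purely as an illustration. The helper names are hypothetical. Since the signedness of plain char is implementation-defined, signed char stands in here for a compiler that sign extends:

```c
/* Value getchar() would return for this byte: always in 0..UCHAR_MAX. */
static int as_getchar_returns(unsigned char byte)
{
    return byte;
}

/* Value a character constant would have on a compiler where plain char
   is signed.  The conversion of an out-of-range value to signed char is
   implementation-defined; on the usual two's-complement machines it
   simply reinterprets the bits, giving a negative number. */
static int as_signed_char_constant(unsigned char byte)
{
    return (signed char)byte;
}

/* Nonzero if a case label for this byte would match what getchar()
   returns for it. */
static int case_label_matches(unsigned char byte)
{
    return as_getchar_returns(byte) == as_signed_char_constant(byte);
}
```

With these, case_label_matches('C') holds for the ASCII letter, but case_label_matches(0xE9) does not: one side is 233 and the other is negative.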

> karl at haddock.ima.isc.com (Karl Heuer)
> > msb at sq.com (Mark Brader)
> > The best you can do is to avoid "char" altogether and use "unsigned char".
> > You probably have to do it throughout the program, in fact.
> If the program has to be strictly conforming, you may be right.  (But then
> string literals, and functions that expect `char *' arguments, may screw
> things up; casting the pointers ought to be safe, though.)

i.e. you will have to say
    (unsigned char *)"string"
or
    (unsigned char)'C'
whenever you use any literal, and you'll have to cast all your (char*)
arguments to standard ANSI functions.  This is true for any application
that might be used in a locale with non-ASCII character sets and wants
to be portable to any conforming ANSI compiler that might have chosen to
treat chars as signed.
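
For code that keeps its data in plain char, the cast can instead go at each call to a <ctype.h> function. A short sketch, where upcase() is a hypothetical helper and not anything from the Standard:

```c
#include <ctype.h>

/* Upper-case a string in place.  Each char is converted to unsigned char
   before it reaches toupper(), so the argument is always in the range
   0..UCHAR_MAX regardless of whether plain char is signed. */
static void upcase(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```

Without the cast, a char with the high bit set could reach toupper() as a negative value other than EOF, which the function is not required to handle.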

In general though, if the compiler is expected to produce programs that
can work on a local character set containing characters with the high
bit set, it is almost certain that the compiler will have to treat
(char) as (unsigned char).  Anyone who really wants to use chars to
perform signed arithmetic can now explicitly ask for (signed char).

The Standard should have explicitly stated that (char) is identical
with (unsigned char), and mentioned that compilers may, as an extension,
treat chars as signed for backward compatibility.  At least, this
should have been listed as a deprecated feature that will probably
be eliminated in future versions of the Standard.

In practice I'm sure that is the way it will eventually turn out.
I can't imagine any European ANSI compiler having (char) signed.
It would provide far too little benefit and far too many complications.


Much of this was mentioned to the Committee.
e.g. Letter P04 to the Second Public Review contained:

+  4.3 Character Handling:
+      Most of these functions don't work for signed char values if
+  the upper bit is on.  Is it unreasonable to expect that with
+      char c[10];
+      int i;
+      c[0] = i = getchar();
+  the function calls
+      isxxx(*c)
+  and
+      isxxx(i)
+  should behave the same way if "i" is not EOF?  This is not difficult
+  to do, and there certainly can't be any existing code that depends on
+  the described behavior.  Why not state that if the argument is not
+  EOF, the result will be the same as if the argument were cast to
+  unsigned char.  This would also remove the need for an equivalent to
+  the "isascii" function.

Perhaps I overestimated their abilities when I said "is not difficult".
Their response was:

+  This was considered a request for information, not an issue.
Well, it certainly looks like an issue to me.

+  It was never intended that they do so.  If you pass a signed char
+  argument and the sign is extended, the resulting value will not fit
+  in an unsigned char, as required.
Exactly.  I'm saying that you don't need to require it.
Drop that requirement and say that they are only defined to work on
values that can be returned by getchar().

+  Your suggestion would require the <ctype.h> functions to cast their
+  argument to unsigned char if it is non-EOF.
No it wouldn't.

+  This would require macro versions to evaluate their argument more
+  than once (once to test for EOF and once to cast them), rendering
+  them unsafe.
No, it would not require that macros evaluate their argument more
than once.  At worst it would require defining EOF as some negative
value other than -1, something that is explicitly allowed by the Standard.
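
One way this might be done is a lookup table, sketched below. Everything named MY_* or my_* is hypothetical, the choice of SCHAR_MIN - 1 for the end-of-file value is an assumption, and marking 0xC9 as upper case merely stands in for some locale with an upper-case letter in the high half. The table has a slot for the end-of-file value and one for every value from SCHAR_MIN to UCHAR_MAX, with the negative slots mirroring their unsigned counterparts, so the macro needs only a single indexed load:

```c
#include <limits.h>

#define MY_EOF   (SCHAR_MIN - 1)   /* hypothetical EOF, below every char value */
#define MY_UPPER 0x01              /* hypothetical classification bit */

/* Slot 0 is MY_EOF; the rest cover SCHAR_MIN..UCHAR_MAX. */
static unsigned char my_table[UCHAR_MAX - MY_EOF + 1];

/* Evaluates its argument exactly once. */
#define MY_ISUPPER(c) (my_table[(c) - MY_EOF] & MY_UPPER)

static void my_table_init(void)
{
    const char *p;
    int n;

    /* Letters of the basic character set. */
    for (p = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; *p != '\0'; p++)
        my_table[*p - MY_EOF] |= MY_UPPER;

    /* Assumption for illustration: 0xC9 is an upper-case letter. */
    my_table[0xC9 - MY_EOF] |= MY_UPPER;

    /* Give each sign-extended value the same entry as the value
       getchar() would have returned for that character. */
    for (n = SCHAR_MIN; n < 0; n++)
        my_table[n - MY_EOF] = my_table[(unsigned char)n - MY_EOF];
}
```

After my_table_init(), MY_ISUPPER(0xC9) and MY_ISUPPER((signed char)-55) give the same answer, MY_ISUPPER(MY_EOF) is zero, and an argument such as *p++ is safe because it is evaluated only once.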



More information about the Comp.std.c mailing list