Must sizeof(int) exceed sizeof(char) in hosted environments?

Fri Sep 1 05:59:02 AEST 1989

In article <1989Aug29.204254.3307 at sq.sq.com> msb at sq.com (Mark Brader) writes:
>But in a *hosted* implementation, Section 4 applies as well.  And Doug
>Gwyn has just called attention in comp.lang.c to the fact that several
>library functions specified there, such as getchar(), are expected to
>convert an unsigned char value to type int.
>
>Considering an implementation where sizeof(int)==sizeof(char), Doug writes:
>> Since in such an implementation an int would be unable to represent
>> all possible values in the range of an unsigned char, as required by
>> the specification for some library routines, it would not be standard
>> conforming.
>
>Setting aside the fact that we're talking only about hosted environments
>here, this seems shaky to me.  I can see two ways out of it, which makes
>three possibilities in all.
>
>[1] The wording of footnote 16 defining a so-called pure binary numeration
>system is so broad that it may allow an unsigned type to simply ignore the
>high-order bit position, provided that the corresponding signed type is
>at least one bit wider than the minimum otherwise required.  Then int could
>be 16 bits, char could be 16 bits, and unsigned char could be 16 bits of
>which only the lower 15 are actually used.

I agree that the pure binary numeration definition paraphrased in a footnote
does allow this sort of implementation.

>[2] The wording of the requirements of the aforementioned functions could
>be taken as specifying only that such a conversion be attempted, not that
>it be possible for all possible values of the argument.  If int and char
>are both 16 bits, and getchar() reads the character 0xFEED from the input,
>then getchar() should be allowed to do whatever happens when you assign
>the positive value 0xFEED to an int variable, and anything else would be
>undefined behavior under the "invalid value" rule of 4.1.6.
>
>[3] The above argument is right and so sizeof(int)>sizeof(char) is required
>to be true, in a hosted environment only.

Mark's and Doug's articles got me thinking along the same lines.  I believe
that the library does not force sizeof(char) to be less than sizeof(int).
Mark's [2] is a valid argument for Doug's point, but other items in the
library section are also relevant:

1. 4.9.2, p126, l32-34:
	"A binary stream is an ordered sequence of characters that can
	transparently record internal data.  Data read in from a binary
	stream shall compare equal to the data that were written out to
	that stream, under the same implementation."

2. 4.9.3, p127, l9-11:
	"All input takes place as if characters were read by successive
	calls to the fgetc function; all output takes place as if by
	successive calls to the fputc function."

3. 4.9.7.1, p142, l7-8:
	"The fgetc function obtains the next character (if present) as
	an unsigned character converted to an int."

Since the size of every object is an exact multiple of the size of a
character, these three passages together mean that all the bits in a
character must be significant, so that an fwrite/fread of a negative int
value works.
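
To make the round trip concrete, here is a minimal sketch of what items
1 through 3 demand.  The function name roundtrip_int is hypothetical (it
is not from the pANS), and the sketch assumes that tmpfile() can create a
binary stream.  It writes an int one character at a time with fputc,
reads it back with fgetc, and compares the result.

```c
#include <stdio.h>

/*
 * Hypothetical helper: push the bytes of `value` through a binary
 * stream with fputc/fgetc and report whether the same value came back.
 * Returns 1 if the value survived, 0 if it did not, -1 on I/O failure.
 */
int roundtrip_int(int value)
{
    FILE *fp = tmpfile();          /* opened in binary update mode */
    unsigned char *p;
    int readback = 0;
    size_t i;
    int c;

    if (fp == NULL)
        return -1;

    /* Write each character of the object representation. */
    p = (unsigned char *)&value;
    for (i = 0; i < sizeof value; i++)
        fputc(p[i], fp);
    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }

    rewind(fp);

    /* Read the characters back; consult the indicators, not just EOF. */
    p = (unsigned char *)&readback;
    for (i = 0; i < sizeof readback; i++) {
        c = fgetc(fp);
        if (c == EOF && (feof(fp) || ferror(fp))) {
            fclose(fp);
            return -1;
        }
        p[i] = (unsigned char)c;
    }

    fclose(fp);
    return readback == value;
}
```

If any bit of a character were ignored on the way through the stream,
roundtrip_int(-1) could return 0, which item 1 forbids.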

Now, if EOF is required to be distinguishable from all unsigned char
values after conversion to int, then it follows that sizeof(char) must
be less than sizeof(int).  Although there are many strong indications that
EOF "should" be different, I cannot find anything that actually requires
such a distinction.  Two such indications:

4. 4.9.7.1, p142, l12-13:
	"If the stream is at end-of-file, the end-of-file indicator for
	the stream is set and [the] fgetc [function] returns EOF.  If a
	read error occurs, the error indicator for the stream is set
	and [the] fgetc [function] returns EOF."

5. 4.9.7.11, p145, l15-16:
	"If the value of c [the first parameter for ungetc] equals that
	of the macro EOF, the operation fails and the input stream is
	unchanged."
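
As a rough sketch of how a program could probe for this situation, the
two functions below use only <limits.h> and <stdio.h>.  Their names are
hypothetical.  The first rests on the assumption that a collision
between EOF and a character value can arise only when UCHAR_MAX exceeds
INT_MAX, which is what happens if sizeof(int)==sizeof(char) and every
bit of a character is significant.  The second simply observes the
behavior quoted in item 5.

```c
#include <limits.h>
#include <stdio.h>

/*
 * Hypothetical check: nonzero if some unsigned char value has no
 * distinct nonnegative int to represent it, so that a value returned
 * by fgetc might be indistinguishable from EOF.
 */
int eof_may_collide(void)
{
    return UCHAR_MAX > INT_MAX;
}

/*
 * Hypothetical check of item 5: ungetc must fail (returning EOF and
 * leaving the stream unchanged) when asked to push back EOF.
 */
int ungetc_rejects_eof(FILE *fp)
{
    return ungetc(EOF, fp) == EOF;
}
```

On the usual implementations, where int is wider than char,
eof_may_collide() returns 0 and the question never arises.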

Of course, virtually every program that reads input until EOF is not
portable, since it doesn't check feof when getchar returns EOF!  And one
cannot push back an arbitrary character, since ungetc must reject any
value equal to EOF.
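
For illustration, a copy loop that does not rely on EOF being distinct
might look like the sketch below.  The name copy_stream is hypothetical.
The loop treats a return of EOF as end of input only when feof or ferror
confirms it, and otherwise passes the value along as data.

```c
#include <stdio.h>

/*
 * Hypothetical "cat" core: copy every character from `in` to `out`.
 * A return of EOF from fgetc ends the loop only if the end-of-file or
 * error indicator is set; otherwise the value is treated as data.
 * Returns 0 on success, -1 on a read or write error.
 */
int copy_stream(FILE *in, FILE *out)
{
    int c;

    for (;;) {
        c = fgetc(in);
        if (c == EOF && (feof(in) || ferror(in)))
            break;
        fputc(c, out);       /* fputc converts c to unsigned char */
        if (ferror(out))
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

Called as copy_stream(stdin, stdout), this costs an extra test only when
fgetc actually returns EOF, so the common case is no slower than the
usual loop.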

>I seem to recall that the Committee explicitly decided not to require that
>sizeof(int)>sizeof(char) when it was requested for other reasons, to do
>with avoiding surprises with unsigned types in comparisons.  ("It was
>decided to allow implementers flexibility in this regard", or some such
>words.)  Are they now finding that they did require this all along?

Therefore (while discovering that even "cat" as most simply written is
not portable), the pANS still does not require that sizeof(char) must
be less than sizeof(int).

At this point, I'd be happier if there were a requirement that EOF be
distinct from all other values possible to return from fgetc!

Dave Prosser	...not an official X3J11 answer...