integer value of multi-char constants

Chris Torek chris at mimsy.umd.edu
Tue Oct 17 23:14:11 AEST 1989


>In article <29588 at gumby.mips.COM> lai at mips.COM (David Lai) writes:
>>on a mips, vax, and sun the following is true:
>>	'\001\377' == '\000\377';
>>however on the same machines:
>>	'\001\177' != '\000\177';
>>The question is: does the above behaviour conform to ANSI C?

In article <20205 at mimsy.umd.edu> I wrote:
>Certainly.  The more important question is `why would anyone expect
>otherwise?'

Oops, for whatever reason I read the first line as

	'\000\377' == '\000\377'

However, the results are still easily ( :-) ) explained.  Plain char is
signed on these machines, so '\377' is shorthand for -1, and the compiler
expands multicharacter constant values as follows (simplified: \
processing hidden):

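	case '\'':
		/* the first character becomes the initial value */
		if ((value = nextc()) == STOP)
			error("no characters in character constant");
		/* each further character shifts the old value up 8 bits */
		while ((c = nextc()) != STOP)
			value = (value << 8) | c;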

So '\001\377' computes as

		value = 1;	/* \001 */
		c = -1;		/* \377 */
		value = (1 << 8) | -1;
		c = STOP;	/* ' */
		/* value = -1 */

while '\000\377' computes as

		value = 0;	/* \000 */
		c = -1;		/* \377 */
		value = (0 << 8) | -1;
		c = STOP;	/* ' */
		/* value = -1 */

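Since -1 has every bit set, ORing it in wipes out whatever was shifted
up, so both constants come out as -1.  '\177' is positive (127), so
there the leading character survives and the two constants differ.

If the compiler added the values, rather than ORing them, the results
would be different (and very peculiar).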
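
For anyone who wants to poke at this without a compiler's source in hand,
here is a rough model of the folding step.  It is only a sketch: the names
sign_extend, fold_or and fold_add are made up for illustration, and it
assumes 8-bit characters and a two's complement int.

```c
/* Hypothetical helper: widen one character the way a signed-char
 * compiler would, so 0377 becomes -1 and 0177 stays 127. */
static int sign_extend(unsigned char byte)
{
    return byte >= 0x80 ? (int)byte - 256 : (int)byte;
}

/* Fold n characters by shifting and ORing, as in the loop above. */
static int fold_or(const unsigned char *bytes, int n)
{
    int value = sign_extend(bytes[0]);
    int i;

    for (i = 1; i < n; i++) {
        /* shift as unsigned so a negative value is not shifted directly */
        value = (int)((unsigned)value << 8) | sign_extend(bytes[i]);
    }
    return value;
}

/* The "peculiar" alternative: add the next character instead of ORing. */
static int fold_add(const unsigned char *bytes, int n)
{
    int value = sign_extend(bytes[0]);
    int i;

    for (i = 1; i < n; i++)
        value = value * 256 + sign_extend(bytes[i]);
    return value;
}
```

Under those assumptions fold_or gives -1 for both {0001, 0377} and
{0000, 0377}, but 383 and 127 for {0001, 0177} and {0000, 0177}.
fold_add gives 255 for {0001, 0377} and -1 for {0000, 0377}.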

Probably the compiler should not sign extend unless the character constant
contains only a single character.
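
One way that suggestion might look, again only as a sketch: fold_no_extend
is a hypothetical name, and 8-bit characters and a 32-bit two's complement
int are assumed.  A lone character is still sign extended; anything longer
is packed from unsigned bytes.

```c
/* Hypothetical variant: sign extend only a single-character constant. */
static int fold_no_extend(const unsigned char *bytes, int n)
{
    unsigned value = 0;
    int i;

    /* single character: keep the usual signed-char value */
    if (n == 1)
        return bytes[0] >= 0x80 ? (int)bytes[0] - 256 : (int)bytes[0];

    /* several characters: pack each one as an unsigned 8-bit quantity */
    for (i = 0; i < n; i++)
        value = (value << 8) | bytes[i];
    return (int)value;
}
```

With this, {0001, 0377} folds to 511 and {0000, 0377} to 255, so the two
constants no longer compare equal, while a lone 0377 is still -1.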
-- 
`They were supposed to be green.'
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris at cs.umd.edu	Path:	uunet!mimsy!chris


