char constant?

Chris Torek chris at trantor.umd.edu
Mon Apr 18 12:13:26 AEST 1988


>>In article <5206 at ihlpg.ATT.COM> tainter at ihlpg.ATT.COM (Tainter) asks:
>>-Is "ABCD"[0] legal ANSI C?

>In article <11072 at mimsy.UUCP> I answered:
>>Yes.  It is not, however, a constant expression (see section 3.4).

In article <5214 at ihlpg.ATT.COM> tainter at ihlpg.ATT.COM (Tainter) replies:
>Well then, fix the standard!  Quick before it gets cast in concrete!

>Every component is a constant, why isn't the result a constant?

To a large extent, I happen to agree.  Constants should (almost?)
always be reduced to their simplest form at compile time.  There
is one argument, a rather weak one, against converting "x"[0] to
'x' at compile time:  some C compilers, notably PCC variants, do
not carry strings about within the compiler.  Instead, they work
as follows:

When the lexical analyser sees a double-quote (`"') character, it
calls a special routine to collect strings.  If the compiler has
just begun the initialisation of an array of char, this routine
gathers each byte and drops it in the main initialised data space
(`.data' or `.data 0').  If not, it switches to the alternate
initialised data space (`.data 1') and generates a label, then
drops each byte in this space, then resumes the previous space
(instruction or main data).  This can be seen in the compiler output
for, e.g.,

	f() {
		static char str[] = "main data";
		char *p = "alternate data";
	}

	# saw function declaration, so generate prologue for f
	_f:
		.word	L12
	# (end-of-function code moved here by peephole optimiser)
		subl2	$4,sp
		.set	L12,0x0

	# saw `static char str[] = "': generate static local variable `str'
		.data		# main data space
	L16:			# str is L16
		.long	0x6e69616d	# "main"
		.long	0x74616420	# " dat"
		.long	0x61		# "a\0"
	# end of string

	# saw `char *p = "': generate anonymous string constant
		.data	1	# alternate data space
	L17:			# anonymous string is L17
		.ascii	"alternate data\0"
		.data		# resume previous space
		.text		# finish p = "..." initialisation:
		moval	L17,-4(fp) # p = L17

		ret		# end of f()
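
The string-collection step described above can be sketched in C
roughly as follows.  This is an illustration under stated assumptions,
not PCC source: the names collect_string, next_label and prev_space
are hypothetical, the output goes to a caller-supplied buffer rather
than to the assembler file, and the text is assumed to contain nothing
that needs escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical label counter; 16 is an arbitrary starting value. */
static int next_label = 16;

/*
 * Sketch of a PCC-style string collector (all names are assumptions).
 * Writes assembler text for the string `text' into `out'.
 *
 * in_char_array_init != 0: the bytes go into the main data space and
 *   no label is made here (the array's own label is assumed to have
 *   been emitted by the caller).  Returns 0.
 * in_char_array_init == 0: switch to the alternate data space, make
 *   a label, drop the bytes, then resume `prev_space' (for example
 *   ".text").  Returns the label number.
 */
int collect_string(char *out, size_t outsz, const char *text,
                   int in_char_array_init, const char *prev_space)
{
    int label;

    assert(out != NULL && text != NULL && prev_space != NULL);

    if (in_char_array_init) {
        snprintf(out, outsz, "\t.data\n\t.ascii\t\"%s\\0\"\n", text);
        return 0;
    }

    label = next_label++;
    snprintf(out, outsz, "\t.data\t1\nL%d:\n\t.ascii\t\"%s\\0\"\n\t%s\n",
             label, text, prev_space);
    return label;
}
```

Calling collect_string(buf, sizeof buf, "alternate data", 0, ".text")
fills buf with a `.data 1' block, a fresh label, the bytes, and a
final `.text'.  Note that nothing in this scheme remembers the string
once it has been written out, which is the root of the side effects
discussed next.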

This method of handling anonymous aggregates, while expedient (the
compiler never carries more than one `thing' in its `head'), has
several unpleasant side effects.  One is that "text"[0] generates
code like

		cvtbl	L17,r11

rather than simply

		movl	$'t',r11
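
At the source level the difference looks something like the sketch
below.  The function names first_char and is_t are made up for the
example.  The subscripted string has the right value at run time, but
where the language demands a constant expression (a case label, say)
one has to write the character constant instead.

```c
#include <assert.h>

/* Fetches the first byte of the anonymous string "text" at run time. */
int first_char(void)
{
    int c = "text"[0];

    assert(c == 't');   /* a run-time check is fine */
    return c;
}

/*
 * A case label must be an integer constant expression, so 't' is
 * written out here.  `case "text"[0]:' is not one, and a conforming
 * compiler may reject it.
 */
int is_t(int c)
{
    switch (c) {
    case 't':
        return 1;
    default:
        return 0;
    }
}
```

So is_t("text"[0]) works, since the argument is an ordinary
expression; only the label itself is restricted.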

Another is that

	char *p1 = "hello", *p2 = "hello";

generates two separate strings that have the same text, rather than
making only one `hello\0' and pointing both p1 and p2 at it.
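
A program can observe which choice its compiler made, though it should
not depend on the answer, since identical string literals need not be
distinct.  A minimal sketch (literals_shared is a name invented here):

```c
#include <assert.h>
#include <string.h>

/*
 * Returns 1 if the two identical literals share storage and 0 if the
 * compiler made two copies.  Either result is permitted.
 */
int literals_shared(void)
{
    const char *p1 = "hello", *p2 = "hello";

    assert(strcmp(p1, p2) == 0);    /* the contents always match */
    return p1 == p2;                /* the addresses may or may not */
}
```

A compiler built around the collector described above would be
expected to return 0 here, because each literal is written out as
soon as it is seen.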
Finally,

	f() { return (sizeof("hello")); }

compiles to the following mess:

	# some junk deleted
		.data	1
	L16:
		.ascii	"hello\0"
		.text
	_f:
		.word	0
		movl	$6,r0
		ret

Although the string itself is never used, it is still generated.
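
For reference, the value itself is easy to check from C.  The sketch
below (hello_size is a hypothetical name) only confirms the 6 that
shows up in the `movl' above; it says nothing about whether a given
compiler also emits the unused bytes.

```c
#include <assert.h>
#include <stddef.h>

/* sizeof yields the array size: five characters plus the NUL. */
size_t hello_size(void)
{
    size_t n = sizeof("hello");

    assert(n == 6);
    return n;
}
```

The operand of sizeof is not evaluated, so no run-time reference to
the string is needed to produce the result.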

The latter two problems can be cured without changing the basic
anonymous aggregate string builder; the first cannot.
-- 
In-Real-Life: Chris Torek, Univ of MD Computer Science, +1 301 454 7163
Domain: chris at mimsy.umd.edu		Path: ...!uunet!mimsy!chris


