Array bounds checking: what is legal

Chris Torek chris at mimsy.umd.edu
Sun Sep 2 06:37:35 AEST 1990


In article <26196 at mimsy.umd.edu> I wrote:
>`&arr[sizeof arr/sizeof *arr]' ... is Officially Legal.

(Those who would dispute this are advised to see ANSI Standard
X3.159-1989, otherwise known as `The ANSI C Standard', sections 3.2.2.1
(Lvalues and function designators), 3.3.3.4 (The sizeof operator), and
3.3.6 (Additive operators).)

This seems to be rather universally misunderstood.  To amplify a bit:

In article <29051 at nigel.ee.udel.edu> gdtltr at freezer.it.udel.edu (Gary Duzan)
writes:
>I don't believe accessing the element after is legal, but the pointer
>is still legal.

Correct.  Given `int a[4];', the following holds:

	int *p = a;			/* legal */
	a[0], a[1], a[2], a[3];		/* all legal */
	p[0], p[1], p[2], p[3];		/* all legal */
	p = &a[4];			/* legal */
	*p;				/* illegal (a[4] does not exist) */
	p--;				/* legal */
	p = a;				/* legal */
	p--;				/* illegal */
	p = &a[4];			/* legal */
	p[-4], p[-3], p[-2], p[-1];	/* all legal */

Note the last carefully: it is not the subscript itself that makes a
given x[i] legal or illegal, but rather whether x+i yields a legal address
and, if so, whether *(x+i) is also legal.
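
To make the rule concrete, here is a minimal sketch (not part of the
original post; the name sum_before is made up for illustration). It
assumes the caller passes a pointer one past the end of an array that
holds at least n ints. Every subscript it uses is negative, yet every
access is legal, because end+i always lands on a real element.

```c
#include <stddef.h>

/*
 * Hypothetical helper: add up the n ints that sit immediately before
 * `end'.  Assumes `end' points one past the last element of an array
 * holding at least n ints, so end[-n] through end[-1] all exist.
 */
int sum_before(const int *end, ptrdiff_t n)
{
	int total = 0;
	ptrdiff_t i;

	for (i = -n; i < 0; i++)
		total += end[i];	/* same as *(end + i) */
	return total;
}
```

Given `int a[4] = { 1, 2, 3, 4 };', the call sum_before(&a[4], 4) reads
a[0] through a[3] and never touches a[4] itself.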

Now, as to why &a[4] is legal when a[4] is not, consider:

	int i;
	for (i = 0; i < 4; i++)
		printf("%d\n", i);

When this code is run, i takes on five values, namely 0, 1, 2, 3, and 4.
Even if we alter the loop slightly to get rid of the `4', i still takes
on the value 4:

	for (i = 0; i <= 3; i++)
		...

Now what happens if we loop `p' over the various elements in `a'?

	for (p = &a[0]; p < &a[4]; p++)
		...

p must eventually take on the value &a[4].  There is no way around it;
even if we get rid of the `&a[4]' in the loop, p still winds up with
&a[4] as its final value:

	for (p = &a[0]; p <= &a[3]; p++)
		...
	/* now p == &a[4] */
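
The same loop can be written with the `&arr[sizeof arr/sizeof *arr]'
expression from the quoted article. What follows is only a sketch under
assumed names: NELEM, table, and sum_table do not appear in the original
post. The end pointer is computed and compared against, but never
dereferenced.

```c
#include <stddef.h>

/* Hypothetical macro: number of elements in a true array (not a pointer). */
#define NELEM(arr) (sizeof (arr) / sizeof *(arr))

static int table[] = { 10, 20, 30, 40 };

/* Add up every element of table by walking a pointer across it. */
int sum_table(void)
{
	int *end = &table[NELEM(table)];	/* one past the end: legal */
	int *p;
	int total = 0;

	for (p = table; p < end; p++)
		total += *p;
	/* here p == end; it has been compared but never dereferenced */
	return total;
}
```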

Since this sort of thing happens all the time in existing code, there was
no choice but to make it Officially Legal and require all C compilers to
support it.  This, on the other hand, is not legal:

	for (p = &a[3]; p >= &a[0]; p--)	/* illegal */
		...

This loop supposedly terminates when p takes on the value &a[-1]; but as
noted above, &a[-1] is not a legal address, and in fact this code fails
on some machines---for instance, on a 68000 where the C compiler starts
the data space at location 2, and `a' is a global array of 32-bit `int's
that happens to be the first object in the data segment.  The code turns
into, e.g.,

loop:
	...
	subql	#4,a2		# p--
	cmpl	#2,a2		# (unsigned long)p < 2?
	jcs	out		# if so, exit loop
	jra	loop		# otherwise continue

and when p==&a[0], p==2, so p-4 wraps around and puts 0xfffffffe into p.
Compared as an unsigned value, that is still greater than or equal to 2,
so the loop does not exit where it was supposed to.

This is the same old fencepost problem that occurs everywhere.
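
One legal way to walk an array backwards is to start at the
one-past-the-end address and decrement before each use, so that the
pointer never drops below &a[0]. This is a sketch of that idea, not
code from the original post; reverse_copy is a hypothetical name, and
dst is assumed to have room for n ints.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy the n ints of src into dst in reverse
 * order.  p starts one past the end of src and is decremented before
 * each dereference, so it never takes on the value src - 1.
 */
void reverse_copy(int *dst, const int *src, size_t n)
{
	const int *p = src + n;		/* one past the end: legal */

	while (p > src) {
		--p;			/* now p points at a real element */
		*dst++ = *p;
	}
}
```

The comparison happens while p is still within the array or one past
it, so the loop ends with p == src and never forms src - 1.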

Incidentally, there is a way to keep p from taking on &a[4]:

	for (p = a;; p++) {
		...
		if (p == &a[3])
			break;
	}

This is the same solution required for loops that purport to run to
MAXINT or MAXULONG or other such maxima, and it shares their drawback:
these are exceedingly ugly.
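
For comparison, here is the same shape applied to an integer maximum.
This is a sketch only: it uses unsigned char and UCHAR_MAX from
<limits.h> so that it finishes quickly, in place of the MAXINT or
MAXULONG cases mentioned above, and count_uchar_values is a made-up
name. A `c <= UCHAR_MAX' test could never stop the loop, since the
condition is always true for an unsigned char.

```c
#include <limits.h>

/* Hypothetical helper: visit every unsigned char value exactly once. */
long count_uchar_values(void)
{
	unsigned char c;
	long count = 0;

	for (c = 0;; c++) {
		count++;
		if (c == UCHAR_MAX)	/* test before the increment wraps */
			break;
	}
	return count;
}
```
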
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris at cs.umd.edu	Path:	uunet!mimsy!chris


