Array bounds checking with C????

Fri Sep 7 06:30:31 AEST 1990

In article <1589 at redsox.bsw.com> campbell at redsox.bsw.com (Larry Campbell) writes:
>Are there actually any current compilers out there that are so stupid that
>they generate substantially different code for the following two code fragments?
>
>    /* Fragment 1 */
>    for (p = array; p < &array[ARRAY_SIZE]; p++)
>	*p++ = '\0';
>
>    /* Fragment 2 */
>    for (i = 0; i < ARRAY_SIZE; i++)
>	array[i] = '\0';
>
>If the answer to my question is "no", then the only excuse for permitting
>array indices past the end of an array is for compiling old, crufty code
>from clay tablets.
>
>If the answer to my question is "yes", then I think we should flame the
>culpable compiler vendors to a crisp.

To answer the first question, "Does any compiler generate different
codes for the two fragments?" I would have to say definately yes.
I would expect the two fragments to generate different codes. The first
one uses a pointer to access the contents of the array, and the second one
uses an integer as an index to the array. The final result will still be the
same, in that the array is cleared to 0, but I don't think that the compiler
will "convert" or "optimize" the second to the first or vice versa. After all,
assuming it is trying to convert the second to the first, where is it going to
get a pointer from? Create one? How will i become ARRAYSIZE after exiting the
loop? There has to be SOME code difference. I don't think that a compiler
should have to be sophisticated enough to "guess" at the intent of your code.

(Another article stated that a MIPS compiler actually generated the SAME
code for the two fragments. I find that hard to believe unless index i was
never used anywhere else. Even so, that must have been a sophisticated
compiler.) 

As for the usefulness of "crufty code", I think that the first fragment
will run faster on most machines, since the address (array+i) doesn't have
to be computed at each step. Whether this savings of time is of any interest
to you is another matter.

To answer the second question, namely "Why don't compilers check array
bounds?", I would say that it is a very difficult task to check array
bounds. For simple cases, like:

int array[ARRAYSIZE];

for (i=1;i<=ARRAYSIZE;i++)   /* Error */
	array[i] = '\0';

perhaps a compiler could discover a possible error. But checking more
complicated examples would probably be as hard as checking correctness of
algorithms. For example:

for (i=0;i<ARRAYSIZE;i++)
	array[func(i,someparm)]=nextval();

How is the compiler going to know if the array bound will be exceeded?
It would need to check func and see if the return values fall within the
bounds (0..ARRAYSIZE-1). To further complicate things, func is dependent
on someparm, which perhaps if given the wrong values, can cause func to
return invalid indeces. Thus, the compiler would have to make sure that
someparm is in ITS valid range. As you can see, this problem can balloon
further, given more variables and dependencies.

I think that the best that can be done is to check for simple cases and to
give warnings where applicable. Other than that, it is quite unreasonable
to expect array bounds checking at compile time. I don't think other languages
have this, do they?  Run-time checking is more realistic, and I think I have
seen this in Pascal, but it does slow things down quite a bit. All I can
suggest to you is to find a compiler that has run-time checking that you
can turn on or off, so that once you're satisfied that you haven't made any
mistakes, you can turn the checking off and run your program at full speed.
--
----------------------------------------------------------------------
Jimmy Hu                                      jimmy at tybalt.caltech.edu
----------------------------------------------------------------------