C pet peeve

Thu Mar 24 00:22:50 AEST 1983

As various folks have mentioned, it is difficult to check C subscripts.
In fact, it is worse than has been mentioned: there may well be only two
rational design points for languages of the C/PASCAL/FORTRAN/ALGOL... level:

1) (like C) use a language that models typical machines directly,
with little extra overhead, and fairly unconstrained semantics, i.e.,
we all know pointers are addresses, and expect no protection.
OR
2) Design a language to be compile-time checkable from day one,
with a) highly-constrained pointer semantics, and b) either dope vectors or
descriptors for any objects (like arrays) passed by reference, or
array-size conformance required of functions (thus forbidding
variably-sized arguments).
In case 2, given an optimizing compiler that does serious dataflow
analysis (e.g., IBM FORTRAN IV(H)), it is possible to optimize away
many of the otherwise necessary subscript checks.
However, much care is needed in the design of the language semantics, or this
becomes excruciatingly difficult (excruciating because safety usually implies
numerous checks that are actually unnecessary).  For example, in PL/I:
DCL X(10);              DCL X(10);              DCL X(10);
DO I = 1 TO 10;         DO I = 1 TO 10;         CALL SUBR(I);
    X(I) = X(I) + 1;        CALL SUBR(I);       I = 1;
END;                        X(I) = X(I) + 1;    CALL SUBY;
                        END;                    X(I) = 1;
The left case needs no subscript checking; the middle case needs one subscript
check for the assignment statement, because SUBR may have modified I.  (It
probably didn't, but call-by-reference makes it very difficult to know what's
happening at the point of invocation -- here, C's call-by-value default
is a great help: at least when you see funct(&x) you expect that x
might be changed.)  Even worse, in the right case, the assignment to X(I) also
needs a check, because safety requires you to assume that once you give away
the address of anything (as in SUBR), it may be saved somewhere and
the value modified in any later subroutine call (here, SUBY).  The same issue
arises in some FORTRANs.
Solutions to the problem for typical languages require complex
interprocedural analysis, fancy linkers, or complex compilation/binding
systems.

What's the moral?  This is not an argument against checking
(subscripts in range, undefined variables, pointer usage), but an observation
that doing such checking well requires either considerable thought in the
language design or acceptance of considerable overhead in space and time.

I personally think that one should either a) stick with something whose
semantics are fairly straightforward, like C, or b) go to a much higher level
where subscript-checking mostly disappears into higher-level aggregate
operations, e.g., APL or SETL.
-mashey