problems/risks due to programming language, stories requested

Fri Mar 2 17:52:35 AEST 1990

In article <48f0d9c2.20b6d at apollo.HP.COM>, perry at apollo.HP.COM (Jim Perry)
offers a list of 8 mistakes he made.  Before commenting, I would remind
you that I am not a C bigot; I would love to have the chance to use Ada,
and I have fallen in love with Eiffel (which generates C...).

Comment 0:  Jim Perry didn't say what operating system and compiler he
was using.  This turns out to matter.

> 1. A function had [int *count; as a parameter] [and he wrote]
>     *count++;
> However, a better C compiler could have flagged the fact of
> the unused expression, i.e. that while "count++" was presumably an
> intended side effect, "*count" was unused.

Just so.  The compiler *could* have flagged it.  'lint' _would_
have, and I've used non-pcc compilers on Unix that would have.
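
For anyone who hasn't been bitten by this one, here is a minimal sketch of the difference. The function names are mine, not Jim Perry's. A compiler that warns about unused values should complain about the first function.

```c
#include <assert.h>

/* Buggy: *count++ parses as *(count++).  It advances the local copy of
   the pointer and throws away the value it read, so the caller's int
   is left unchanged. */
void tally_buggy(int *count)
{
    *count++;
}

/* Intended: increment the int that count points to. */
void tally_fixed(int *count)
{
    (*count)++;
}

/* Hypothetical self-check showing the two behaviours side by side. */
void show_difference(void)
{
    int n = 5;

    tally_buggy(&n);
    assert(n == 5);     /* nothing happened to n */
    tally_fixed(&n);
    assert(n == 6);     /* this is what was wanted */
}
```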

> 2. I wanted to fill in a record whose structure was something like:
>     struct {
>         struct a fixed_length_stuff;
>         struct b variable_length_array[fixed_length_stuff.size];
>         char     string[]; /* variable-length null-terminated */
>         struct c more_stuff;
>     } foo;

Not a C problem.  Instead of using variable length fields, which C
quite explicitly doesn't support, it would be better to use pointers.
I think this mistake should be counted as refusal to do things the C way.
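
A rough sketch of what I mean by "the C way", assuming the record only has to live in memory (if it is a file or wire format, you have to pack and unpack it by hand in any case). The contents of struct a, struct b and struct c, and the names foo_create and foo_destroy, are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct a { size_t size; };   /* hypothetical contents */
struct b { int value; };     /* hypothetical contents */
struct c { int flags; };     /* hypothetical contents */

struct foo {
    struct a  fixed_length_stuff;
    struct b *variable_length_array;  /* fixed_length_stuff.size elements */
    char     *string;                 /* separately allocated, null-terminated */
    struct c  more_stuff;
};

/* Allocate a record with room for n array elements and a copy of s.
   Returns NULL if any allocation fails. */
struct foo *foo_create(size_t n, const char *s)
{
    struct foo *f;

    assert(s != NULL);
    f = malloc(sizeof *f);
    if (f == NULL)
        return NULL;
    f->fixed_length_stuff.size = n;
    f->variable_length_array =
        calloc(n ? n : 1, sizeof *f->variable_length_array);
    f->string = malloc(strlen(s) + 1);
    if (f->variable_length_array == NULL || f->string == NULL) {
        free(f->variable_length_array);
        free(f->string);
        free(f);
        return NULL;
    }
    strcpy(f->string, s);
    f->more_stuff.flags = 0;
    return f;
}

void foo_destroy(struct foo *f)
{
    if (f == NULL)
        return;
    free(f->variable_length_array);
    free(f->string);
    free(f);
}
```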

> Other languages, however, would have allowed me to describe such a
> structure within the language (PL/I, for one).

It's worth pointing out that PL/I is not a strongly typed language
(*much* weaker than C), and although it has record variables it does
not have record types.

> 3. [an off-by-one error accessing an array]
> In a system
> with runtime array bounds checking, this would have been detected
> quickly and painlessly.  As C doesn't really have arrays it's very
> unlikely that a C runtime implementation could do this.  

Note that some systems with runtime bounds checking only check that
the result of a multidimensional array access is somewhere in the
array as a whole (read the VS Fortran 2 manual for instance) which
means that this can happen _within_ an array.  Note that there is not
one tiny little thing anywhere in the ANSI Pascal standard (which I
have read) that *requires* bounds checking.  PL/I used to be the same
way.  Ada (LRM 4.1.1, 11.1) _does_ require bounds checking (nice one).
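
To make the difference between the two kinds of check concrete, here is a small sketch in C. It uses row-major layout (Fortran is column-major, but the point is the same), and all three function names are my own invention.

```c
#include <assert.h>
#include <stddef.h>

/* Weak check: only asks whether the flattened offset lands somewhere
   inside the array as a whole. */
int in_whole_array(size_t rows, size_t cols, size_t i, size_t j)
{
    return i * cols + j < rows * cols;
}

/* Strict check: every subscript must be within its own dimension. */
int in_each_dimension(size_t rows, size_t cols, size_t i, size_t j)
{
    return i < rows && j < cols;
}

/* Flattened offset of element (i, j), with the strict check enforced. */
size_t checked_index(size_t rows, size_t cols, size_t i, size_t j)
{
    assert(in_each_dimension(rows, cols, i, j));
    return i * cols + j;
}
```

For a 3 by 4 array, the subscript pair (0, 5) passes the weak check, because offset 5 is inside the 12 elements, but fails the strict one.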

However, the claim that "C doesn't really have arrays" is not quite
true.  It is only legal to access the value of a pointer variable
itself in C when the variable points into (or just one past the end
of) an array, and it is only legal to *dereference* a pointer when
it points properly into an array.  A C implementation may, for
example, maintain pointer values as triples:  (Arr,Lim,Off) where
Arr is the address of the beginning of the array, Lim is the size
of the array, and Off is the offset of the pointer from the base
of the array, so that 0 <= Off <= Lim for the pointer to be valid
and 0 <= Off < Lim for dereferencing to be valid.  Not only is this
possible in principle, but Symbolics C does something similar, and
I believe that the Saber-C debugger also checks array bounds.
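
Here is a sketch of the triple representation written out as ordinary C, for pointers to int only. I am not claiming this is how Symbolics C or Saber-C actually does it; the struct and the cp_ function names are hypothetical, and overflow of the offset is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checked pointer: the (Arr, Lim, Off) triple. */
struct checked_ptr {
    int   *arr;   /* Arr: address of the beginning of the array */
    size_t lim;   /* Lim: number of elements in the array */
    size_t off;   /* Off: offset from the base, in elements */
};

/* 0 <= Off <= Lim: the pointer value itself is legal.
   (Off is unsigned, so the lower bound holds automatically.) */
int cp_valid(struct checked_ptr p)
{
    return p.off <= p.lim;
}

/* 0 <= Off < Lim: the pointer may be dereferenced. */
int cp_can_deref(struct checked_ptr p)
{
    return p.off < p.lim;
}

/* Pointer arithmetic just moves Off; a checking implementation
   would test cp_valid on the result. */
struct checked_ptr cp_add(struct checked_ptr p, size_t n)
{
    p.off += n;
    return p;
}

/* Checked load: aborts instead of reading outside the array. */
int cp_load(struct checked_ptr p)
{
    assert(cp_can_deref(p));
    return p.arr[p.off];
}
```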

> 4. [a field in a record wasn't set; not a C problem]
>    [a function had both return e; and return;]
> The absence of a return statement could
> and should have been caught by the compiler.

Another quality-of-implementation issue.  lint _would_ have caught it
and several C compilers I've used would have caught it.

> 5. [an oversight; not a C problem]

> 6. [output variables not set on exception; not a C problem]

> 7. I had one bug caused by omission of an item in an initializer list
> for a struct (a vector of function pointers).  The compiler could have
> caught that if the language didn't allow partial initializer lists.

Caught by a feature, I'm afraid.  There are several methods you can use
to help yourself catch this kind of mistake.  One is to do

	some_type the_table[ /*note size not specified*/ ] =
	    { ... ... };

	void check_the_table()
	    {
		assert(sizeof the_table == expected * sizeof the_table[0]);
		/* other tests */
	    }
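
A filled-in version of that check, for a vector of function pointers. The handler functions and the enum are invented for the example. The idea is that the enum gives you the expected count for free, so adding an operation without adding its table entry trips the assertion.

```c
#include <assert.h>

typedef int (*handler_fn)(int);

/* Hypothetical handlers, one per operation. */
int op_double(int x) { return 2 * x; }
int op_negate(int x) { return -x; }
int op_square(int x) { return x * x; }

/* OP_COUNT is always the number of operations listed before it. */
enum { OP_DOUBLE, OP_NEGATE, OP_SQUARE, OP_COUNT };

/* Note: size not specified, so it is taken from the initializer. */
handler_fn the_table[] = { op_double, op_negate, op_square };

#define TABLE_LEN (sizeof the_table / sizeof the_table[0])

/* Returns nonzero when the table has exactly one entry per operation. */
int table_is_complete(void)
{
    return TABLE_LEN == OP_COUNT;
}

/* Call once at startup; aborts if an entry was left out. */
void check_the_table(void)
{
    assert(sizeof the_table == OP_COUNT * sizeof the_table[0]);
}
```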

Another trick is to use the preprocessor to help you check your counting.
#define ten(A,B,C,D,E,F,G,H,I,J) A,B,C,D,E,F,G,H,I,J

	some_type the_table[] = {
	    ten(
		ten(x00, x01, ..., x09),
		...
		ten(x90, x91, ..., x99)
	    )};

Now if you miscount, the preprocessor will complain.
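
The same trick scaled down so that it fits on a screen, using a hypothetical three-argument macro and made-up values. Dropping or adding an argument in any of the calls to three() gets you a diagnostic about the wrong number of macro arguments.

```c
#include <assert.h>

/* Each use must supply exactly three arguments. */
#define three(A,B,C) A,B,C

int the_table[] = {
    three(
        three( 0,  1,  2),
        three(10, 11, 12),
        three(20, 21, 22)
    )
};

#define TABLE_LEN (sizeof the_table / sizeof the_table[0])

/* Belt and braces: the count is also checked at run time. */
void check_the_table(void)
{
    assert(TABLE_LEN == 9);
}
```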

> 8. [not a C problem]

> Overall, that's 8 bugs or classes of bugs. 5 of the 8 could have been
> avoided or detected by a smarter compiler or a different language. 

My totals are
4   Not a C problem
1   Can be avoided by exploiting the preprocessor
2   Would have been caught by lint or some existing compilers
1   Would have been caught by a C interpreter

That's *one* C-related problem where a V7 UNIX programmer would have
been left lamenting.

One of the nice things about Dijkstra's notation is that he can
distinguish between variables a program text is _allowed_ to change
and variables it is _obliged_ to update.  That means that a simple
flow analysis can catch procedure output parameters which are not
assigned.  This would have caught one of the "Not a C problem" mistakes.
C would have a hard time doing this because it doesn't know when you
intend a pointer parameter to be a pointer input and when it is the
address of an output.  On the other hand, Ada doesn't know the
difference between "MAY assign" and "SHOULD assign" either.

It is a fair criticism of many existing C *compilers* that they do not
issue enough warning messages, and we *should* tell compiler vendors
that warning messages are worth money to us.  If Bill Wolfe (say) had
done a survey of magazine reviews of C compilers and told us that
reviewers consistently rated speed of compilation as more important
than good warning messages, that would have been a legitimate criticism.
(I get that impression, but I _haven't_ done the survey and don't claim
it as anything more than an impression.)