My pointer stuff: C caught me again (?) but it has truths in it

Guy Harris guy at sun.uucp
Sun Jun 29 16:11:01 AEST 1986


> The code in question is two analogous sections:
> 
> -------- section 1 ---------
> 
> struct sfld (*__cursf)[] = (struct sfld (*)[]) 0;
> 
> if ((__cursf = (struct sfld (*)[]) calloc(n, sizeof (struct sfld)))
> 	== (struct sfld (*)[]) 0) ...
> 
> ----------------------------
> 
> This was intended to allocate an array and assign it to a variable of type
> ``pointer to array of (struct sfld)''.  I suspect the type is wrong but I'm
> not sure how to declare such a beastie; I suspect that it *does* *not*
> *exist* *at* *all* in C, now that I've played with it.

Wrongo.  "struct sfld (*cursf)[]" *is* a declaration of a pointer to an
array of "struct sfld".  However, it is not possible to generate a value
with that type by taking the address of an object which is an array of
"struct sfld".  You *can* generate a value of that type by using the name of
an array of arrays of "struct sfld"; such a name has the type of a pointer
to an element of that array, and hence the type "pointer to array of 'struct
sfld'".
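To make that last point concrete, here is a minimal sketch assuming an ANSI C compiler. The post never shows the definition of "struct sfld", so its members below are hypothetical, as are the names "table", "NCOLS", "row_stride" and "element". It also uses a sized row type, "struct sfld (*)[NCOLS]", rather than the unsized one in the quoted code, so that arithmetic on the row pointer is allowed.

```c
#include <stddef.h>

/* Hypothetical definition: the original post never shows struct sfld. */
struct sfld {
    int row, col;
};

#define NCOLS 4   /* hypothetical row width */

/* An array of 3 arrays of NCOLS "struct sfld". */
static struct sfld table[3][NCOLS];

/* In the initialization, "table" is converted to a pointer to its first
 * element; that element is itself an array, so the value already has
 * type "pointer to array of NCOLS struct sfld" and needs no cast. */
size_t row_stride(void)
{
    struct sfld (*rowp)[NCOLS] = table;

    /* Adding 1 to a pointer to an array skips the whole array (one row). */
    return (size_t)((char *)(rowp + 1) - (char *)rowp);
}

/* Element j of row i, reached through the pointer to array. */
struct sfld *element(int i, int j)
{
    struct sfld (*rowp)[NCOLS] = table;

    return &(*(rowp + i))[j];   /* same place as &table[i][j] */
}
```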

(By the way, the casts of "0" are not necessary; the compiler knows that the
LHS of the "=" operator in the declaration, and the "==" operator in the
"if", is a pointer, and thus knows that it must coerce the "0" into a null
pointer of the appropriate type.)

The "calloc" here *allocates* an *array* of "struct sfld"; however, it
*returns* a pointer to the first element of that array.
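A minimal sketch of the same allocation written the usual way, assuming an ANSI C compiler (where "calloc" returns "void *", so no cast is needed on its result either). The struct members and the function "alloc_fields" are hypothetical. The result is kept in a pointer to the first element, and a plain "0" serves as the null pointer in the comparison.

```c
#include <stdlib.h>

/* Hypothetical definition: the original post never shows struct sfld. */
struct sfld {
    int row, col;
};

/* Allocates an array of n zeroed "struct sfld" and returns a pointer to
 * its first element, or a null pointer if the allocation fails. */
struct sfld *alloc_fields(size_t n)
{
    struct sfld *cursf;
    size_t i;

    /* No cast is needed on 0: compared with a pointer, it becomes a
     * null pointer of the right type. */
    if ((cursf = calloc(n, sizeof (struct sfld))) == 0)
        return 0;

    /* Subscripting the element pointer walks the allocated array. */
    for (i = 0; i < n; i++)
        cursf[i].col = (int)i;
    return cursf;
}
```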

> This could easily have been done correctly:
> 
> int array[3];	-- should declare a pointer followed by 3 integers, with the
> 		   pointer initialized to the 3 integers
> int array[];	-- should declare a pointer.

No, NO, *NO*, ***N*O****,


	N     N   OOOOO   !
	NN    N  O     O  !
	N N   N  O     O  !
	N  N  N  O     O  !
	N   N N  O     O  !
	N    NN  O     O
	N     N   OOOOO   !

"int array[3]" does not, and should not, declare any sort of pointer.  It should
reserve storage for three "int"s - PERIOD!  "int array[]" should, if "array"
is initialized, declare an array with as many members as appear in the
initialization; if it's not initialized, it should either be an error or be
considered an "extern" declaration of an array whose size is specified (and
whose storage is reserved) in another module.  The only pointers involved
should be the *constant expression* "array", which has type "pointer to
'int'" when it appears in an expression.  NO storage should be reserved to
hold this "pointer", because no storage NEEDS to be reserved to hold this
pointer - any more than storage needs to be reserved (except, possibly, in
the instruction stream, or maybe in a literal pool) for the "3" in the
expression "x + 3".
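A small sketch of these rules, assuming an ANSI C compiler; the names "init_array", "count_of_array", "count_of_init_array" and "first_of_array" are hypothetical. "sizeof" reports the size of the whole array, which it could not do if "array" were stored as a pointer.

```c
#include <stddef.h>

/* In a real program this declaration would sit in another file or a
 * header; the size and the storage come from the definition below. */
extern int array[];

int array[3];                             /* storage for three ints, nothing more */
int init_array[] = { 10, 20, 30, 40 };    /* size taken from the initializer: 4 */

/* sizeof sees the whole array, not a pointer: no pointer object exists. */
size_t count_of_array(void)      { return sizeof array / sizeof array[0]; }
size_t count_of_init_array(void) { return sizeof init_array / sizeof init_array[0]; }

/* In an expression, "array" becomes the address of its first element:
 * an address constant, not a value loaded from a pointer variable. */
int *first_of_array(void) { return array; }
```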

> C should treat ``int array[]'' as a different type from ``int *ptr'',

It does.  That's what people have been trying to tell you!

> and while ``int array[3]'' and ``int array[]'' are the same type, the sized
> array's pointer should be treated as a constant.  (This may be arguable.)

Damn straight it's arguable.  NEITHER array has a "pointer" in the sense of
a location in memory which holds a pointer to that array.  The name "array"
is, when used in an expression, a *constant* pointer to the first member of
that array - in *both* cases.

> 	the malloc()'ed one is type (int *), to the C compiler (to me, int [])
> 	the declared one is type (int []), to the C compiler
> 		(which defines (int []) as (int *))

No, it doesn't.  You haven't been listening.  *Start* listening.  To the C
compiler, "int []" declares an array of "int"s, which is normally
implemented as a consecutive block of locations holding "int"s.  However, an
array can *not* be used as an object in an expression.  You can't do array
assignment, you can't add two arrays, you can't pass arrays to functions as
arguments, and you can't have a function which returns an array.  When the
name of an array is used in an expression, it is *reinterpreted* as a
*constant* pointer to the first element of that array.
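A sketch of what happens when an array appears as a function argument, assuming an ANSI C compiler; "sum" and "zero_first" are hypothetical helpers. A parameter declared as an array is really a pointer, which is why it can be incremented inside the function, something an array name can never be.

```c
#include <stddef.h>

/* A parameter written "int a[]" is adjusted to "int *a": the caller's
 * array is not copied; only a pointer to its first element is passed. */
int sum(int a[], size_t n)
{
    int total = 0;

    while (n-- > 0)
        total += *a++;   /* legal only because a is a pointer, not an array */
    return total;
}

/* Stores through that pointer land in the caller's array. */
void zero_first(int a[])
{
    a[0] = 0;
}
```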

The "malloc()'ed one" is type "int []"; however, "malloc" returns a pointer
to the first element of that array.  This is not much stranger than

	int *x;
	x = (int *) malloc(sizeof (int));

"malloc" can't very well return an "int" here; it can *only* return a
*pointer* to what it has allocated.  You *have* to declare a "pointer to
'int'" here, even though the object which "malloc" has allocated is an
"int", not a "pointer to 'int'".  The same is almost true of arrays, except
that you declare a pointer to an object of type <whatever>, rather than of
type "array of <whatever>", when "malloc"ing an array.

> and they are in fact identical in memory, so the C compiler treats them as
> identical period.

Bullshit.  A pointer to "int" and an array of "int" are in NO WAY identical
in memory.
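A sketch of the difference in storage, assuming an ANSI C compiler; "numbers", "numptr" and the helper functions are hypothetical names. The array is the block of "int"s itself, while the pointer is a separate object that merely holds an address.

```c
#include <stddef.h>

int numbers[4] = { 1, 2, 3, 4 };  /* four ints, stored right here */
int *numptr = numbers;            /* one separate object holding an address */

/* The array occupies four ints' worth of storage; the pointer occupies
 * only enough to hold an address, wherever the ints actually live. */
size_t array_bytes(void)   { return sizeof numbers; }
size_t pointer_bytes(void) { return sizeof numptr; }

/* The pointer variable has an address of its own, distinct from the array. */
int pointer_is_separate(void)
{
    return (void *)&numptr != (void *)&numbers;
}

/* The array has no hidden pointer: &numbers and numbers convert to the
 * same address, that of the first int. */
int array_has_no_hidden_pointer(void)
{
    return (void *)&numbers == (void *)numbers;
}
```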

> Come to think of it -- can malloc() or similar be typed right anyway?  I
> suspect this is why Pascal uses the ``new(pointer)'' construct, known to the
> compiler; it's type-able at compile time.  But catching the allocation of an
> (int []) (vs. an (int)) from malloc() and forcing the former to be assigned
> to a variable of type (int []) and the latter to an (int *) is nearly
> impossible even when the language considers (int []) and (int *) to be
> different.

No, no, no!  If you "malloc" an array, you don't assign the result of
"malloc" to a variable of type "int []".  What you want is to be able to
assign it to a variable of type "pointer to array of 'int'" and use that
pointer to refer to that array.  If you "malloc" an "int", you don't assign
the result to a variable of type "int", do you?

The problem here is that you don't deal with pointers to arrays in the
following fashion:

	int (*pointer_to_array)[];

	pointer_to_array =
	    (int (*)[]) malloc(number_of_array_elements * sizeof (int));
	third_element_of_malloced_array = (*pointer_to_array)[2];

If arrays had been first-class types in C, this would have been how you
would have done it.  Instead, you have to do:

	int *pointer_to_first_element_of_array;

	pointer_to_first_element_of_array =
	    (int *)malloc(number_of_array_elements * sizeof (int));
	third_element_of_malloced_array =
	    pointer_to_first_element_of_array[2];
	/* or *(pointer_to_first_element_of_array + 2) */
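Both fragments above can be turned into a runnable sketch, assuming an ANSI C compiler (where "malloc" returns "void *", which converts implicitly to either pointer type). "NELEM" and the two function names are hypothetical, and the second function uses a sized array type so that "sizeof *pa" is defined. Such a compiler accepts both styles; the first is the one C programmers actually write.

```c
#include <stdlib.h>

#define NELEM 5   /* hypothetical element count */

/* Idiomatic C: keep a pointer to the first element and subscript it. */
int third_via_element_pointer(void)
{
    int *first = malloc(NELEM * sizeof (int));
    int i, third;

    if (first == 0)
        return -1;
    for (i = 0; i < NELEM; i++)
        first[i] = i * 10;
    third = first[2];            /* same as *(first + 2) */
    free(first);
    return third;
}

/* The "pointer to array" style: pa designates the whole array, so it
 * must be dereferenced before subscripting. */
int third_via_array_pointer(void)
{
    int (*pa)[NELEM] = malloc(sizeof *pa);   /* sizeof *pa == NELEM * sizeof (int) */
    int i, third;

    if (pa == 0)
        return -1;
    for (i = 0; i < NELEM; i++)
        (*pa)[i] = i * 10;
    third = (*pa)[2];
    free(pa);
    return third;
}
```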

This is the source of infinite confusion for some C programmers, and I agree
with Wayne that it was, on balance, a mistake.  It *can't* be fixed now,
however fervently one might wish to do so.  It's *too late*.  C is *already
out there*, and changing it now would break too many programs.  If you
change it, you'll have to call the resulting language D (or P).
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy at sun.com (or guy at sun.arpa)


