pointers to arrays

Thu Nov 20 11:47:55 AEST 1986

Stuart D. Gathman (stuart at bms-at.UUCP) asks:
> I am still confused about several things concerning pointers to arrays.
> ... When are pointers to arrays appropriate?  

This has been discussed here before, but I think not recently, so let
me have a crack at a new way to explain it.

Let me set up a background by taking a couple of screens to review
pointers to elementary objects, e.g. pointers to ints.

There are basically just two reasons that these are used.  One reason is
to access a scalar variable of that elementary type that you can't
otherwise get at:

	swapint (x, y)
	int *x, *y;
	{
		int t;
		t = *x; *x = *y; *y = t;
	}
	/* Please, no followups about other ways to write this! */

The second reason is to point to the beginning of an array, or of part of
an array, which you are going to index into.  The array name itself is
such a pointer, but you might use a pointer anyway because the array is
dynamically allocated or declared outside your scope or because you want
to point to part of the array.

For instance:

	int *a;
	a = (int *) malloc (n * sizeof (int));	/* or (n * sizeof *a) */
	for (i = 0; i < n; ++i)
		a[i] = ...

In older environments the cast of the result of malloc could be omitted,
but this is becoming non-kosher.  Another example is:

	int *a;
	int b[50];
	...
	for (a = b;  a[0] > a[1];  ++a)
		;
	/* Now a points to the beginning of the first subarray -- part of
	   the array -- whose first two elements are NOT in descending order,
	   since the loop stops as soon as a[0] > a[1] fails */

An important special case of the second reason occurs because C does not
allow arrays to be passed to functions:

	/* Invocation */		/* Definition */
	int b[20];			foo (x)
	foo (b);			int *x;

The array name b, used as an expression (as in the function argument), is
a constant pointer which is used to initialize the pointer x in the function.
x can then be indexed off, e.g. x[i].  This special case is so important that,
IN A FUNCTION HEADER ONLY, the declaration "int *x;" can also be written as
"int x[];" or even "int x[20];", to the great confusion of novices who then
think that arrays and pointers are the same thing.
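
Here is a minimal sketch of that special case, written with ANSI-style
prototypes rather than the older style used above.  The names sum_ptr and
sum_arr are made up for illustration.  The two functions have exactly the
same type, which is why either can be assigned to the same
pointer-to-function without a cast.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the parameter is declared as a pointer. */
int sum_ptr(int *x, int n)
{
	int i, s = 0;

	assert(x != NULL);
	for (i = 0; i < n; ++i)
		s += x[i];	/* index off the pointer */
	return s;
}

/* Same thing with array syntax; x is still a pointer in here. */
int sum_arr(int x[], int n)
{
	int i, s = 0;

	assert(x != NULL);
	for (i = 0; i < n; ++i)
		s += x[i];
	return s;
}
```

Given "int b[20];", both sum_ptr (b, 20) and sum_arr (b + 5, 15) are fine;
in each call the argument is just a pointer into b.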

Now to answer the question.

Since C does not have any operations on entire arrays, the first of
the above two uses is not applicable to pointers to arrays.  But the
second use is entirely applicable.  So the answer is: you use a pointer-
to-array type when you want to index into AN ARRAY OF ARRAYS, otherwise
called a 2-dimensional array.  The (2-d) array name itself is such a pointer,
but you might need to declare another pointer because the space is
dynamically allocated or declared outside your scope or because you
want to point to a part of the array.

Let's modify the above examples to show this.

	char (*a)[20];	/* pointer-to-arrays-of-20-chars */
	a = (char (*)[20]) malloc (n * sizeof (char [20]));
				/* or, again, (n * sizeof *a) */
	for (i = 0; i < n; ++i)
		strcpy (a[i], ...);

or the last part might also be

	for (i = 0; i < n; ++i)
		for (j = 0; j < 20; ++j)
			a[i][j] = ...
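
Filled out into something that compiles, the allocation might look like the
sketch below.  The function name make_items and the "item%d" contents are
assumptions made up for the example; only the declaration of a and the
malloc call come from the fragment above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n arrays of 20 chars as one block
   and fill them with "item0", "item1", ...  Returns NULL on failure. */
char (*make_items(int n))[20]
{
	char (*a)[20];	/* pointer-to-arrays-of-20-chars */
	int i;

	assert(n > 0);
	a = (char (*)[20]) malloc(n * sizeof *a);
	if (a == NULL)
		return NULL;
	for (i = 0; i < n; ++i)
		sprintf(a[i], "item%d", i);	/* a[i] is the i-th array */
	return a;
}
```

The caller gets the whole block back with a single free (a).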

The next example might become something like this:

	char (*a)[20];
	char b[50][20];
	...
	for (a = b;  strcmp (a[0], a[1]) > 0;  ++a)
		;

or the for-header might instead be

	for (a = b;  a[0][0] > a[1][0];  ++a)

Again, the pointer might be used to pass an array to a function.
The third example becomes:

	/* Invocation */		/* Definition */
	int b[50][20];			foo (x)
	foo (b);			int (*x)[20];

and in this special case, the declaration "int (*x)[20];" could also
be written "int x[][20];" or even "int x[50][20];".
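
As a sketch of what foo might do with such a parameter (the names row_sum
and total are hypothetical, and the prototypes are ANSI-style):

```c
#include <assert.h>
#include <stddef.h>

/* x points to arrays of 20 ints each; x[r] is one such array. */
int row_sum(int (*x)[20], int r)
{
	int j, s = 0;

	assert(x != NULL);
	for (j = 0; j < 20; ++j)
		s += x[r][j];
	return s;
}

/* Same parameter type, written with array syntax. */
int total(int x[][20], int nrows)
{
	int i, s = 0;

	for (i = 0; i < nrows; ++i)
		s += row_sum(x, i);
	return s;
}
```

With "int b[50][20];", total (b, 50) covers the whole array and
total (b + 1, 49) skips the first row.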

All clear now?

Going on to Stuart's specific questions:

> 1) How does one get such an animal?  The only methods I can figure are
> 	a) a cast: ( type (*)[] ) array_of_type

Correct, but the [] shouldn't be empty.  A pointer-to-array-of-unspecified-
length makes no sense.  (Some compilers may accept it, but they shouldn't.)
See one of the examples above; this construction with malloc is about the
only time this cast should be seen.

> 	b) a function returning such a type (but the return must use a cast!)

No cast needed if the declarations are correct.  In this example,
getm is a function that allocates space for a 10 by 10 matrix of
ints, and returns a pointer to the space, the type of which, of course,
is pointer-to-array-of-10-ints.

	/* Invocation */		/* Definition */
	int (*m)[10];			int (*getm())[10]
	int (*getm())[10];		{
	...					return (int (*)[10]) malloc
	m = getm();					(100 * sizeof(int));
	for (i = 0; i < 10; i++)	}
	    for (j = 0; j < 10; j++)
		m[i][j] = 0;

The other, and commonest, way is to use the name of a 2-dimensional array
as a value.  Its type is pointer-to-array.  For instance, if getm() could
return a constant pointer, it could be written as:

					int (*getm())[10]
					{
						static int a[10][10];
						return a;
					}

Of course there are no parentheses around the *getm in the declaration;
those occur in pointers to functions.

A fourth way to get a pointer-to-array SHOULD be to precede an array name
with &, as in:

	int (*p)[10];
	int q[10];

	p = &q;

but in older compilers this does NOT WORK; the compiler, seeing &q,
thinks that you just meant q since that is already a pointer, and
compiles the statement accordingly.
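
On a compiler that does treat &q as a pointer to the whole array, which is
what ANSI C specifies, the result can be used as in this sketch.  The
function names clear10 and step10 are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: zero a 10-int array through a pointer to it. */
void clear10(int (*p)[10])
{
	int i;

	assert(p != NULL);
	for (i = 0; i < 10; ++i)
		(*p)[i] = 0;	/* *p is the array itself */
}

/* Bytes skipped by adding 1 to a pointer-to-array-of-10-ints. */
size_t step10(int (*p)[10])
{
	return (size_t) ((char *) (p + 1) - (char *) p);
}
```

Given "int q[10];", the call is clear10 (&q).  Note that &q and q hold the
same address; only the type, and so the step size, differs.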

If you start with a 3-dimensional array, an element of it, used as a value,
is a pointer to array.  For instance, in "int a[4][5][6];", a[0] has type
"int (*)[6]".  The name a itself, used as a value, is also a pointer-to-array,
but it is a pointer-to-array-of-array, specifically "int (*)[5][6]".
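
One way to check those two types, as a sketch: return each value from a
function declared with the matching type and see that no cast is needed.
The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

int a[4][5][6];

/* a[0], used as a value, has type "int (*)[6]". */
int (*first_row(void))[6]
{
	return a[0];
}

/* a itself, used as a value, has type "int (*)[5][6]". */
int (*whole(void))[5][6]
{
	return a;
}

/* Ints skipped when an "int (*)[6]" is incremented by one. */
size_t row_step(void)
{
	size_t n = sizeof *first_row() / sizeof (int);

	assert(n == 6);
	return n;
}

/* Ints skipped when an "int (*)[5][6]" is incremented by one. */
size_t plane_step(void)
{
	return sizeof *whole() / sizeof (int);
}
```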

> 2) The SysV semop(2) is defined as ...

I haven't used semop() myself, but from my reading of the man section,
both it and the lint library are wrong.  The argument is being used to
point to the beginning of an array of structs, so it should be a plain
pointer-to-struct.

> 3) It seems to me that any distinction between a pointer to an array
>    and a pointer to its first element is purely semantic.  (And given
>    the semantic difficult of obtaining a pointer to an array, why use
>    them?)  There is no pointer conversion that I can imagine involved.

Do you mean "purely syntactic"?  Certainly there is a semantic difference,
and it occurs when you increment or index off a pointer.  If m contains
a pointer to address 10000, and m is declared "char *m", then m[1] is at
address 10001; but if m is declared "char (*m)[50]", then m[1] is at
address 10050 (and has type pointer-to-char, so m[1][0] is the char at
10050, and m[1][1] is the char at 10051).

Since m[n] is equal to *(m+n) by definition whether m is an array or a
pointer, similar remarks apply to expressions like m+1 and ++m.
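
The arithmetic can be checked with a sketch like the following, which
measures the step in bytes rather than assuming any particular address.
The function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Distance in bytes from m to m+1 when m is a "char *". */
ptrdiff_t step_char(char *m)
{
	assert(m != NULL);
	return (char *) (m + 1) - (char *) m;
}

/* The same, when m is a "char (*)[50]". */
ptrdiff_t step_row(char (*m)[50])
{
	assert(m != NULL);
	return (char *) (m + 1) - (char *) m;
}

/* Offset in bytes of m[i][j] from m, for m a "char (*)[50]". */
ptrdiff_t offset_of(char (*m)[50], int i, int j)
{
	return &m[i][j] - (char *) m;
}
```

For "char buf[2][50];", step_char (buf[0]) is 1 and step_row (buf) is 50,
while offset_of (buf, 1, 1) is 51, matching the 10051 in the text above.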

A final remark...  it may be easier to grasp all this if you banish from
your head the standard phraseology, which I have been using in this
article, where "char *" is called pointer-to-char, and "int (*)[10]"
is called pointer-to-array-of-10-ints.

Instead, I suggest calling them "pointer-to-chars" and "pointer-to-arrays-
of-10-ints-each".

It is true that, at any particular time, the pointer points to only one
char or one array-of-10-ints, BUT that char or array-of-10-ints could be
the beginning of an array of such objects, which justifies the phrasing
that I suggest.  In the case of pointers to arrays, as I said, pointing
to arrays of the arrays is about the only reason they are used.

Mark Brader, utzoo!dciem!msb
#define	MSB(type)	(~(((unsigned type)-1)>>1))