sizeof and multi-dimensional arrays

Chris Torek chris at mimsy.umd.edu
Mon Jan 7 06:27:48 AEST 1991


First, the instant replays (I guess I saw too much football yesterday :-) );
then a tutorial essay....


In article <1991Jan5.050613.22303 at Neon.Stanford.EDU>
dkeisen at Gang-of-Four.Stanford.EDU (Dave Eisen) asks why, with his compiler,
>char x[2][3];
>  sizeof (*x)          gives 6
>  sizeof (x[0])        gives 3.
>What's the scoop?

(The correct answer is `There is a bug in that compiler.')

In article <fred.663069060 at prisma> fred at prisma.cv.ruu.nl (Fred Appelman)
writes:
>You are just confused. 
>'x' is a two dimensional array of 2*3 elements of type char. Makes a total of
>6. 'x[0]' and 'x[1]' are arrays with a length of 3 elements. So both arrays
>have a size of 3.

This is correct, but does not explain why the compiler produces 6 for
`sizeof (*x)'.  (Of course, no one without the source can explain the
particular bug in that compiler.)

In article <4596 at sactoh0.SAC.CA.US> jak at sactoh0.SAC.CA.US (Jay A. Konigsberg)
adds:
>Something is wrong here.

(True enough.)

>sizeof(x)    makes sense as it is returning the total size declared for
>	      the array.
>sizeof(x[0]) makes sense as it returns the total size of that dimension
>	      of the array.

Right.

>sizeof(*x)   DOES NOT make sense. The size of a pointer on this machine
>	      is 4 bytes. (Note: adding "char *y; sizeof(y)" does return 4).

Not right.

In article <10303 at hydra.Helsinki.FI> wirzeniu at cs.Helsinki.FI (Lars Wirzenius)
corrects Jay Konigsberg:
>But *x isn't a pointer, it's an array.  First the type of x decays
>from "array 5 of array 6 of char" into "pointer to array 6 of chars".
>(See for example: _Standard_C_, by P.J.Plauger and Jim Brodie, page 74,
>or K&R-2, Section A7.1, "Pointer Generation", page 200.)
>
>This pointer is dereferenced with '*', and the result is an array of
>type |char [6]|, which has the size 6.

This is exactly right.

Finally, in article <1991Jan5.232225.14909 at ccs.carleton.ca> a mystery
person (`Engineers' seems rather an unlikely surname!) given as
bull at ccs.carleton.ca (Bull Engineers) writes:
>Sorry, sizeof(*x) makes perfect sense.  Remember, the * operator
>means "evaluate what's at this address".  This means, that for
>two-dimensional arrays, *x and x[0] are identical by definition.  Try
>this with a three dimensional array z[2][3][4].  sizeof(z) = 24,
>sizeof(z[0]) = 12, and sizeof(*z) = 12 also.  Why?  Because *z
>dereferences the first (0th) dimension of z.

This is awfully informal, but is the right idea.

[begin tutorial]

	Key concepts:
		types
		objects
		values
		contexts (object and value)
		address-of operator `&' changes object to value
		indirect operator `*' changes value to object
		arrays in object contexts remain arrays
		arrays in value contexts become pointers to their first elements

C has five different `places' in which identifiers (and expressions
built from them with [] and `*') can appear:

 - declarations and definitions:
	int i, a[10], *p;	/* local, global, extern, whatever */
   These can be further divided into formal parameters and all others.

 - `left hand sides' (`to the left of an assignment'):
	i = 3;
	a[2] = 4;
   This includes the `modifying' operators `++' and `--', i.e., in the
   expression
	a[3] = ++i;
   the `i' being incremented is in a `miniature left hand side' of its
   own.

 - `right hand sides':
	p = a;
   Here `p' is in a `left side', or `left value', or `lvalue', context,
   and `a' is in a `right side', or `right value', or `rvalue', context.

 - sizeof:
	sizeof(a)
   An identifier that follows sizeof is treated as if it were in a `left
   value' context.  (More on this in a bit.)

 - address-of operator:
	&i
   An identifier that follows an address-of ampersand (`&') is also treated
   as if it were an `lvalue'.

Aside from declarations and definitions, then, there are really only
two contexts here, `lvalue' and `rvalue'.  Since an `lvalue' identifier
need not actually appear on the left---as is the case with `++i'
above---I prefer to call these `object' and `value' contexts.  Other
books may use `lvalue' and `rvalue' respectively.

In an object context, we are interested in the object itself.  Usually
the variable name corresponds to some `address' (whatever that is; the
C language does not pin down addresses all that exactly, so that
whatever the system uses for addresses will probably suffice).  `i',
`a', and `p' above each have some address%.  Each variable has a type,
and so each of these addresses also has a type corresponding to the
variable's type:

      name   is a/an		so its address is a
      ----			-------------------
	i     int		pointer to int
	a     array 10 of int	pointer to array 10 of int%%
	p     pointer to int	pointer to pointer to int

This address is what the `&' operator produces.  The result of the `&'
operator is itself a value, not an object; a value does not have an
address and it is therefore illegal to try to take it, so `&(&i)' is
illegal.  (Most C compilers correctly diagnose this error, although
many do not correctly diagnose `&(&*p)'.  This does not make &(&*p)
legal: even though it *could* be defined as &p, it happens that it is
not.  If you want &p, write &p.)
-----
% Note that `i', `a', and `p' need not be given addresses unless the
  code takes those addresses with `&'.  A smart compiler can, if the
  machine allows it, put objects into machine registers or other
  `special' places.  In a few cases, it can do this even when the
  object's address is taken.  (One example occurs on Pyramid computers,
  where the registers have addresses.) The `register' keyword acts as a
  promise, and sometimes as a recommendation: `I promise not to take
  the address of this variable, and suggest that the compiler might put
  it in a machine register.'  Most modern compilers completely ignore
  the advice, and some do not even hold you to the promise.

%% In `old C' as defined by K&R 1st edition, &a is illegal.  This
  is no longer the case; &a is the address of the array `a', and its
  type is `pointer to array 10 of int'.
-----

`sizeof' is not really interested in the object's address, but on the
other hand, it is not interested in the object's value either.  Objects
that appear in `sizeof' contexts are used only for their type.  The
size of that type, whatever it is, is `spliced in' as though it were an
integral constant.  (Note that this constant has type `size_t'.)  In
other words, given `char c;', writing `sizeof c' is essentially the
same as writing `(size_t)1'.

This leaves assignments and value contexts (and declarations and
definitions, which I am ignoring).  Here things start to get a bit
peculiar.  For sizeof we are interested only in the type (and thus the
size) of the object that follows, and for address-of only in its type
and location, but in assignments and
values, we need the value of the object as well---sometimes to fetch
it, sometimes to set it, sometimes both.  This is all well and good for
`simple' objects like `i', for pointers like `p', and (these days) even
for structure and union objects (with some restrictions).  But array
objects are different.  They get no respect.

An assignment to an array object is simply illegal.  (Note that the
initial value that may appear in a definition is not an assignment%:
it is an initializer.  That is why it is legal there.)  `i = 3;' is
fine, but `a = { 0,1,2,3,4,5,6,7,8,9 };' is not.  You might think,
then, that taking the value of an array would also be illegal.
-----
% Well, technically speaking, at least.  It looks and acts like an
  assignment, but the rules regarding what is and is not legal are
  different.
-----

Here is where things get very strange.

Instead of being outlawed, an attempt to take the `value' of an array
is treated as an attempt to take the address of the first element of
the array (the one with subscript 0).  So in
	p = a;
the compiler pretends you wrote instead
	p = &a[0];
a[0] is an object of type `int', therefore its address is a value of
type `pointer to int', so we have an assignment with a `pointer to int'
on the left (p) and a `pointer to int' on the right (&a[0]) and everything
is okay.

There is a subtlety here as well.  How did we name a[0] in the first place?

The expression
	a[0]
breaks down into four sub-expressions:
	a
	0
	add
	indirect
As above, the `a' turns into the address of a[0].  To this value we
add 0 (leaving it unchanged) and then indirect.  This changes the value
`pointer to a[0]' into the object `a[0]'.  In other words, we have to
know where a[0] is in order to find a[0]!  So it is a good thing we can
find a[0] by asking for `a'.

Formally, then, the rule is:

    In a value context, an object of type `array N of T' (where N is an
    integral constant and T is a legal type) becomes a value of type
    `pointer to T' whose value is the address of the first element---
    element number 0---of that array.

Remember also that the `&' address-of operator takes an object and
produces a value, and that the `*' indirect operator takes a value
and produces an object.  For `&' the value produced has type `pointer
to ...' while for `*' the value consumed must have type `pointer to ...'.
In each case the `...' represents the type of the object (whether
consumed or produced).

Rewinding to the original question, then:
>char x[2][3];
>  sizeof (*x)          gives 6
>  sizeof (x[0])        gives 3.
>What's the scoop?

We can see that this is a compiler bug by expanding the two arguments
to `sizeof'.  These are each in object context and we want their types.
First we have

	*x

This means that x appears in a value context (`*' takes a value and
produces an object).  It had better come out as a value of type `pointer
to ...'.  Well: `x' is an `array 2 of array 3 of char', but as noted
above, an array in a value context gets changed:

    In a value context, an object of type `array N of T' (where N is an
    integral constant and T is a legal type) becomes a value of type
    `pointer to T' whose value is the address of the first element---
    element number 0---of that array.

so we have an array with N=2 and T=`array 3 of char'.  This becomes a
value of type `pointer to T', or in this case, `pointer to array 3 of
char', pointing to the first element of x (x[0]).  So we can apply the
indirecting `*'.  The indirection changes this `pointer to array 3 of char'
into the object `array 3 of char'.  Thus we want the size of an object
that is an `array 3 of char'; by definition, this is the value `3'.

To check `sizeof x[0]', do the same thing.  Write down the expression:

	sizeof x[0]

Break down the subexpression x[0] by rewriting according to its definition:

	*( (x) + (0) )

Handle the subexpression x+0, noting the contexts:

	*( [value] ( [value] (x) + [value] (0) ) )

`x' is an array in a value context, so apply The Rule from above:

	[value] (x)						=
	[value] <object, array 2 of array 3 of char, x>		=
	[value] <value, pointer to array 3 of char, &x[0]>

Adding 0 leaves the pointer unchanged, so apply the `*':

	*( <value, pointer to array 3 of char, &x[0]> )		=
	<object, array 3 of char, x[0]>

Now we have an object in an object context (target of `sizeof') so
we just read its type---`array 3 of char'---and decide its size: 3.

Incidentally, sizeof can handle values as well as objects: `sizeof (3+4)'
produces the same constant as `sizeof(int)'.  (The parentheses matter:
sizeof binds more tightly than `+', so `sizeof 3+4' means `(sizeof 3)+4'.)
Sizeof is unique in this; other C operators that take objects refuse to
work on values.  Of course, sizeof can also take a type in parentheses,
which shows just how special it is.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris at cs.umd.edu	Path:	uunet!mimsy!chris


