A quick question...

Chris Torek torek at elf.ee.lbl.gov
Fri Mar 15 09:40:34 AEST 1991


In article <1991Mar13.174154.12537 at nntp-server.caltech.edu>
eychaner at suncub.bbso.caltech.edu writes:
>>	*((short *)arrayptr)++ = othervalue();
>... is what I meant.

But this does not mean anything.  A cast is defined semantically as
`assign the cast-expression value to an unnamed temporary variable
whose type is given by the cast'.  Thus, aside from the fact that a
cast produces a value, not an object, and therefore cannot be
incremented, this expression is otherwise semantically identical to:

	{ short *temp; temp = arrayptr; *temp++ = othervalue(); }

In other words, if the increment were legal, it would not alter arrayptr
at all, but rather some mysterious temporary variable.  Fortunately,
the increment is illegal.
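
A minimal sketch of the point, with the temporary written out by hand
so that it compiles.  The function name and the `buf' array are mine,
made up for illustration; the only claim is that the pointer being
bumped is `temp', never `arrayptr'.

```c
/* Sketch only: spells out the temporary that the cast implies.
 * buf is an array of short, so the char pointer derived from it is
 * suitably aligned for the conversion back to (short *). */
int pointer_unchanged_after_store(void)
{
	short buf[2] = { 0, 0 };
	char *arrayptr = (char *)buf;

	{
		short *temp = (short *)arrayptr;	/* the "unnamed temporary" */
		*temp++ = 42;			/* stores, then bumps temp only */
	}

	/* arrayptr still points at buf[0]; only buf[0] was written. */
	return arrayptr == (char *)buf && buf[0] == 42 && buf[1] == 0;
}
```

The function returns 1: the store happened, but arrayptr never moved.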

As someone else pointed out earlier (but it bears repeating [either that
or you might as well give up on comp.lang.c :-) ]), the code

	void f(char *arrayptr) {
		*(*(short **)&arrayptr)++ = 1;
		*(*(short **)&arrayptr)++ = 2;
		*(*(short **)&arrayptr)++ = 3;
	}

*is* legal, but is probably not what you meant anyway.  Disassembling
one of the above expressions into its components reveals why:

	arrayptr:
		<object, pointer to char>

	&arrayptr:
		<value, pointer to pointer to char>

	(short **)&arrayptr:
		short **temp; temp = (address of arrayptr, treated
			as if it were a pointer to pointer to short);
		<value, pointer to pointer to short, temp>

	*(short **)&arrayptr:
		<object, pointer to short, *temp>	[note 1]

	(*(short **)&arrayptr)++:
		<value, pointer to short, *temp>	[note 1] and also
		add 1 to *temp before next sequence point

	*(*(short **)&arrayptr)++
		<object, short, **temp>	[note 1] and also
		add 1 to *temp before next sequence point

To figure out what this mess meant, I shortened the last three
<>-bracketed triples to just use `*temp' and `**temp', without first
writing down what `*temp' is, so now we need to do that:

	[note 1] *temp is:
		The object found at the address given by `temp'.
		Temp is a <value, pointer to pointer to short,
			(address of <object, pointer to char, arrayptr>
			treated as if it were a pointer to pointer to short)>.
		But what *is* this value?  The answer is:  `We have no
		idea and we cannot find out without going to the
		compiler, or the compiler's documentation or author or
		whatever, and finding out what it does on this
		particular machine.'

We do not know, and cannot find out (without going into the guts of
the compiler), what you get when you treat an <object, pointer to char>
as if it were something else.

Just for fun, though, we can go ahead and dig into the guts of a compiler.
I will take a typical C compiler for a Data General MV series machine.

The Data General MV series has two kinds of pointers, `byte pointers'
and `word pointers'.  Both are 32 bits long, but one looks vaguely like
this:

	WWW...WWWI	[note: I am deliberately leaving out the ring stuff]
			(mostly because I cannot remember how it worked)

and the other like this:

	BWWW...WWW

where W is a word address, `I' is an indirection bit (normally 0), and B
is the index number of a byte within a two-byte word.  So if we have
`arrayptr' as an object in memory, it is a byte pointer and looks like
the second:

					BWWW...WWW  [arrayptr]

If we take its address, we get a word pointer that points to the above
byte pointer:

	  WWW...WWWI [&arrayptr] ----> BWWW...WWW [arrayptr]

Now we will treat the word pointer on the left as if it were a `pointer
to pointer to short'.  This means we will pretend that what it points
to (on the right) is a `pointer to short'---specifically, that it is
a word pointer:

  actual: WWW...WWWI [&arrayptr] ----> BWWW...WWW [arrayptr]
  pretend:WWW...WWWI [&arrayptr] ----> WWW...WWWI [arrayptr]

Next, we will fetch the thing our `pretend' pointer points to, i.e., 32
bits of `WWW...WWWI'.  The actual bits found at that location are
`BWWW...WWW'.  We will look at the top 31 bits of those 32 bits and
fetch a word from that location, i.e., the location (W/2 + (B<<30)).  If
arrayptr points to `byte 0 of word at 0x3004', this will be `word at
0x1802', while if arrayptr points to `byte 1 of word at 0x6480', this
will be `word at 0x40003240'.  Once we find that word (if it is in our
address space at all), we will look at the bottom bit, the `I' bit, and
if it is set we will fetch the word to which this word points.  So if
arrayptr happens to point to `byte 0 of word at 0x51379', we will first
look in location 0x289bc, see where that points (taken as if it were a
word pointer), and go warping off to wherever that is.

In other words, by closing our eyes and pretending that this byte
pointer is a word pointer, we are going to

	- cut the word address in the byte pointer in half;
	- if the byte pointer pointed to the odd-numbered byte, add 2^30;
	- if the word address in the byte pointer was odd, head off into
	  the ozone.

We are definitely NOT going to get two bytes from the place to which
`arrayptr' points.
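
The misreading can be modelled with plain 32-bit integers.  This is a
sketch under the simplified layouts drawn above (no ring bits), not a
description of real D/G hardware; the function names are invented for
the example.

```c
#include <stdint.h>

/* Byte pointer, per the diagram: B in bit 31, word address W below it. */
uint32_t make_byte_ptr(uint32_t word_addr, unsigned byte_index)
{
	return ((uint32_t)(byte_index & 1u) << 31) | (word_addr & 0x7fffffffu);
}

/* Word pointer, per the diagram: word address in bits 31..1 ... */
uint32_t word_ptr_addr(uint32_t bits)
{
	return bits >> 1;
}

/* ... and the indirection bit I in bit 0. */
unsigned word_ptr_indirect(uint32_t bits)
{
	return (unsigned)(bits & 1u);
}

/* What we fetch from if we pretend a byte pointer is a word pointer. */
uint32_t misread_addr(uint32_t word_addr, unsigned byte_index)
{
	return word_ptr_addr(make_byte_ptr(word_addr, byte_index));
}
```

Under this model misread_addr(0x3004, 0) is 0x1802 and
misread_addr(0x6480, 1) is 0x40003240, matching the examples above,
and any byte pointer with an odd word address comes out with the
indirection bit set.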

Just for more fun, we can follow what happens when we use instead a
proper expression:

	((short *)(arrayptr += sizeof(short)))[-1]

On the D/G, this means:

	- add one to `arrayptr', leaving the top bit alone (i.e.,
	  point to the next word);
	- treat the result as if it were a pointer to words, i.e.,
	  shift it left one bit and put a zero in the bottom (I) bit;
	- subtract two from the resulting pointer (i.e., point to
	  the previous word);
	- fetch the word from the resulting location.
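
The same four steps, again modelled on 32-bit integers.  As before,
this is only a sketch of the simplified layouts in this article, with
invented function names, and it assumes the byte pointer starts on a
word boundary (B = 0).

```c
#include <stdint.h>

/* Step 1: byte pointer += sizeof(short): bump W, leave the B bit alone. */
uint32_t byte_ptr_next_word(uint32_t bp)
{
	return (bp & 0x80000000u) | ((bp + 1u) & 0x7fffffffu);
}

/* Step 2: real byte-to-word conversion: shift left one bit, so the
 * B bit falls off the top and the I bit comes in as zero. */
uint32_t byte_to_word_ptr(uint32_t bp)
{
	return bp << 1;
}

/* Step 3: word pointer [-1]: subtract 2, since bit 0 is the I bit. */
uint32_t word_ptr_prev_word(uint32_t wp)
{
	return wp - 2u;
}

/* Step 4: the word address that is finally fetched from. */
uint32_t proper_fetch_addr(uint32_t bp)
{
	uint32_t wp = byte_to_word_ptr(byte_ptr_next_word(bp));

	return word_ptr_prev_word(wp) >> 1;
}
```

For a byte pointer to byte 0 of word 0x3004, proper_fetch_addr gives
0x3004 back: the word arrayptr pointed to before the increment.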

The trick is that arithmetic on a pointer depends on what *kind* of
pointer it is.  If it is a byte pointer, we add 1 to move forward one
word, and add 0x80000000 and then add the carry to move forward one
byte.  If it is a word pointer, we add 2 to move forward one word, and
there is no way at all to move forward one byte.  Conversion between
byte and word pointers is not just `bits as is'; it requires shift
instructions.  The compiler does this whenever you have `word pointer'
and `byte pointer' next to each other, but if you cheat (by casting
&foo to some other type) you are telling the compiler to throw away
that information, and skip the conversion.
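
For completeness, here is the proper expression at work in ordinary C,
storing a run of shorts through a char pointer.  The helper names are
hypothetical, and the sketch assumes the caller hands in a pointer
that is correctly aligned for short (here it comes from an array of
short, so it is).

```c
#include <stddef.h>

/* Store n shorts through a char pointer, advancing it as we go.
 * The cast is applied to a pointer VALUE, so the compiler gets the
 * chance to do whatever conversion the machine needs. */
char *put_shorts(char *arrayptr, const short *src, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		((short *)(arrayptr += sizeof(short)))[-1] = src[i];
	return arrayptr;
}

/* Returns 1 if all three values landed and the pointer advanced. */
int put_shorts_demo(void)
{
	short buf[3] = { 0, 0, 0 };
	const short src[3] = { 1, 2, 3 };
	char *end = put_shorts((char *)buf, src, 3);

	return end == (char *)(buf + 3) &&
	    buf[0] == 1 && buf[1] == 2 && buf[2] == 3;
}
```

Unlike the (short **)& version, this never asks the machine to read a
`pointer to char' object as though it were a `pointer to short'.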
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA		Domain:	torek at ee.lbl.gov


