offsets in structures.

Guy Harris guy at rlgvax.UUCP
Fri Oct 19 09:53:45 AEST 1984


Arithmetic expressions that produce pointers:

1) Purely integer expressions:

As discussed in my previous article, K&R indicates that no such expression,
except a constant 0, is to be interpreted as a null pointer.  The phrase
"the constant 0" appears in several places (in the discussion of the conditional
operator, as well as the places mentioned in my previous article); I do
not think that the modifier "the constant" appears by accident.  I believe
it was explicitly put there to indicate that an arbitrary integral result
of zero need not be converted into a null pointer; only an explicit zero
constant need be so converted.  If somebody has a statement to the
contrary, from either K or R, they should exhibit it.
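The distinction can be sketched in C as follows.  This is only an illustration: the function names are hypothetical, and "intptr_t" comes from the later <stdint.h> header, not from K&R.  The first function is guaranteed to yield a null pointer; what the second yields is up to the implementation.

```c
#include <stdint.h>

/* The literal constant 0, used where a pointer is expected, is a null pointer. */
int constant_zero_is_null(void)
{
    char *p = 0;        /* "the constant 0": guaranteed to be a null pointer */
    return p == 0;      /* comparing against the constant 0 tests for null */
}

/* Hypothetical helper: a zero computed at run time, then explicitly converted. */
char *pointer_from_computed_zero(int i)
{
    int zero = i - i;   /* the value is 0, but it is not "the constant 0" */

    /* Implementation-defined result; it need not be a null pointer. */
    return (char *)(intptr_t)zero;
}
```

On most current machines both results happen to compare equal, which is exactly why the difference is easy to overlook.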

2) Pointer plus or minus an integer expression:

The actual phrase in "7.4 Additive operators" reads

	A pointer *to an object in an array* and a value of any integral
	type may be added.  The latter is in all cases converted to an
	address offset by multiplying it by the length of the object
	to which the pointer points.  The result is a pointer of the
	same type as the original pointer, and which points to another
	object in the same array, appropriately offset from the
	original object.

A null pointer does not point to any object in an array.  If
you add an integer to a pointer, by the paragraph above the resulting
pointer points to an object in an array.  Therefore, it is not a
null pointer.
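The scaling described in the quoted paragraph can be seen in a small sketch.  The function name and the array are made up for illustration, and the result stays inside the array, which is the only case the paragraph covers.

```c
#include <stddef.h>

/*
 * Returns the number of bytes covered by adding n to a pointer to long.
 * n must be between 0 and 8 so the result stays within (or just past)
 * the array.
 */
size_t bytes_advanced(int n)
{
    static long table[8];
    long *p = &table[0];
    long *q = p + n;    /* n is scaled by sizeof(long), not added as bytes */

    return (size_t)((char *)q - (char *)p);
}
```

For example, bytes_advanced(3) is 3 * sizeof(long), whatever sizeof(long) is on the machine at hand.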

I am quite aware that if you have a pointer to an element in a character
array on a PDP-11, and the element has the address 0177777, adding one
to that pointer yields the result 0.  This is not an argument that you
can produce a null pointer by an arithmetic expression.  First of all, array
elements occupy increasing addresses, so there *is* no next element in that
array, as the element in question is at the end of your address space.  Second of all, if
you have a machine on which a null pointer does not have the value zero,
and you add 1 to a pointer whose value is such that adding 1 to it will
cause wrap-around, you have still not produced a null pointer.  You
may have produced a pointer that doesn't point where it "should", and which
may even point to a non-existent part of the address space, but that does not
mean it must be a null pointer.
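The wrap-around itself can be modelled with a 16-bit unsigned integer standing in for a PDP-11 address.  This is a model, not real pointer arithmetic (running a real pointer off the end of the address space is not something C defines), and the function name is hypothetical.

```c
#include <stdint.h>

/* Models a 16-bit byte address with an unsigned integer. */
uint16_t next_byte_address(uint16_t address)
{
    /* Unsigned arithmetic truncated to 16 bits wraps modulo 2^16. */
    return (uint16_t)(address + 1u);
}
```

Here next_byte_address(0177777) is 0.  That the bit pattern comes out as zero says nothing about whether the machine's null pointer is represented by that pattern.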

3) Other expressions:

Under 14.4, "Explicit pointer conversions", it says

	Certain conversions involving pointers are permitted *but have
	implementation-dependent aspects*....

	...An object of integral type may be explicitly converted to
	a pointer.  The mapping always carries an integer converted
	from a pointer back to the same pointer, but is otherwise
	machine dependent.

This implies that if you convert a null pointer to an integer, the
integer that results must convert back into a null pointer.  The most
natural and "unsurprising" conversion (see the previous paragraph in section
14.4 on conversions from pointer to integer) is just a bitwise copy.  If
converting a null pointer produces an integer with the value 0xff000000,
so be it.  If that's how a null pointer is represented internally, I'd
find conversion of a null pointer into a zero integer more surprising than
conversion of it into 0xff000000.  Given that, converting an integer back
into a pointer by a bitwise copy would be the natural way to do it; this
would convert an integer value of 0, other than a constant 0 (which is
*not* an integer converted from a pointer), into a pointer with the value
0, not a null pointer, and would convert an integer with the value 0xff000000
into a null pointer.
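The round-trip requirement can be sketched as below.  The sketch assumes the later "uintptr_t" type from <stdint.h> as the integral type large enough to hold a pointer; K&R has no such type and leaves the choice between "int" and "long" to the machine.  The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static int sample_object;   /* hypothetical object whose address is taken */

/* Returns the address of the sample object, for use in the check below. */
void *sample_address(void)
{
    return &sample_object;
}

/* Returns 1 if p survives conversion to an integer and back. */
int survives_round_trip(void *p)
{
    uintptr_t bits = (uintptr_t)p;  /* pointer to integer */
    void *back = (void *)bits;      /* integer back to pointer */

    return back == p;
}
```

The check holds for a null pointer as well as for sample_address(), and it holds whether or not the integer "bits" is zero in the null case.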

Yes, this implies that it's a pain to produce a pointer which points to
location 0.  It even implies that producing a pointer which points to
location 0 can't be done the same way you produce a pointer which points
to location 1; you'd have to say

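	something *p;
	int i = 0;	/* any value will do; (i - i) is zero but not a constant 0 */

	p = (something *)(i - i);	/* explicit integer-to-pointer conversion */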

Worse things have happened.  It may be a pain to produce such a pointer,
but it's not impossible, and it's not *that* common an operation.

So what sort of arithmetic expressions are left?

I do rescind my earlier statement that 16-bit "int"s and 32-bit pointers
are illegal.  The statement that "(the integer-to-pointer mapping)
always carries an integer converted from a pointer back into the same
pointer" does not imply that an "int" must be big enough to hold a pointer.
It merely implies that there must be an *integral type* big enough to
hold a pointer; "int" is not the largest integral type, just the most
"natural" type.  "Natural" is not a precise specification; it implies
that the choice of size of "int" is machine dependent.  Of course, what
is most "natural" given the data path width of the machine isn't necessarily
the most "natural" given the size of objects you can put on the machine;
try using "malloc" and "realloc" to grow a symbol table past 64K on
a machine with 16-bit "int"s but 32-bit pointers.  (It can't be done in
a straightforward fashion.  Believe me.  We have such a machine, and we've
*tried*.  The standard UNIX "nm" uses that technique, and if your symbol
table is bigger than 64K bytes, you lose.)  So if you have 16-bit "int"s,
you can't convert the pointer with the bit pattern 0x801234 into an "int"
and back and get the same value back, but you can convert it to the
integral type "long" and back; as it says in section 14.4, paragraph
2,

	A pointer may be converted to any of the integral types *large
	enough to hold it.  Whether an "int" or "long" is required
	is machine dependent.*  (italics mine)

However, it does state specifically that the difference between two pointers
is an "int", not just an integral value.  (We don't do that.  *Nostra
culpa* - not "*mea culpa*"; it wasn't my idea.  Our newer systems will
bite the bullet and have 32-bit "int"s, mainly for compatibility with
our 32-bit supermini, but also because they're 4.2BSD-based,
and there's probably several *months* of work changing 4.2BSD to use "long"
instead of "int" when it means "32-bit quantity".  I assume the AT&T
68000 C compiler gets this right, when built for 16-bit "int"s.)
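In later C the type of a pointer difference has its own name, "ptrdiff_t" from <stddef.h>, which sidesteps the question of whether it is an "int" or a "long".  The following sketch assumes that later type, with made-up function names; it is not what K&R specifies.

```c
#include <stddef.h>

/* Difference between two pointers into the same array, in elements. */
ptrdiff_t elements_between(void)
{
    static char buffer[16];
    char *low = &buffer[2];
    char *high = &buffer[12];

    return high - low;  /* both pointers point into the same array */
}

/* Returns 1 if an "int" is at least as wide as a pointer on this machine. */
int int_is_as_wide_as_pointer(void)
{
    return sizeof(int) >= sizeof(void *);
}
```

elements_between() is 10 everywhere; int_is_as_wide_as_pointer() is the machine-dependent part, and would be 0 on a machine with 16-bit "int"s and 32-bit pointers.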

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy


