Type punning in C

Lars Henrik Mathiesen thorinn at skinfaxe.diku.dk
Fri Oct 27 07:01:12 AEST 1989


bill at twwells.com (T. William Wells) writes:
<In article <14939 at haddock.ima.isc.com> karl at haddock.ima.isc.com
<(Karl Heuer) writes:
<: It may be that the undefined-behavior clause would permit an
<: implementation where punning doesn't work and differently-typed union
<: members don't overlap, provided that appropriate code is generated
<: when casting to/from a union type.  This may be the source of the
<: rumor you heard.

<Actually, the standard is quite unambiguous in asserting that the
<members of a union overlap. But it doesn't say beans about by how
<much.

If this is meant in the sense that inverting all the bits of one union
member is guaranteed to invert at least one bit of every other member,
then I think that the committee has overconstrained the language a
bit. As discussed before in the context of capability machines,
objects of pointer type may want to live in a different address space
from objects of basic type.

Another reason to make union members start at an offset would be to
allow objects (e.g., chars) which are smaller than a machine word to
be placed in the same position in a memory word as in a register; in a
big-endian architecture that position will not have the same byte
address as the containing word.
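
A minimal sketch of the byte-position point, assuming nothing beyond
standard C. The helper low_byte_offset() is a hypothetical name, not
something from the discussion; it reports which byte of an unsigned int
in memory holds the low-order bits, i.e. the bits a char would occupy in
a register. On a little-endian machine that is byte 0, the word's own
address; on a typical big-endian machine it is the last byte, so a char
kept in its register position would sit at a nonzero offset.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte offset, within an unsigned int stored in
   memory, of the byte that holds the low-order bits. */
size_t low_byte_offset(void)
{
    unsigned int word = 1;              /* only the low-order byte is nonzero */
    unsigned char bytes[sizeof word];
    size_t i;

    memcpy(bytes, &word, sizeof word);  /* inspect the object representation */
    for (i = 0; i < sizeof word; i++)
        if (bytes[i] != 0)
            return i;
    return sizeof word;                 /* not reached: word is nonzero */
}
```

The result depends on the machine, which is the point: the compiler
described above would want a char member at that offset rather than at
offset zero.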

A compiler optimizing for fast-but-space-consuming code might want to
do this for all ``small'' structure members; however, the standard
forbids holes at the start of structures. This means that given

	struct s { signed char c; int i; } s;
	union  u { signed char c; int i; } u;
	void *p;

these are guaranteed to hold:

(1)	&s.c == (signed char *)(void *)&s;
(1')	&((struct s *)p)->c == (signed char *)p;
(2)	&u.c == (signed char *)(union u *)&u.i;

whereas the compiler is free to put u.c at an offset so that

(3)	&u.c != (signed char *)(void *)&u.i;

The difference between (2) and (3) is that the standard demands that
the compiler can cast between pointer-to-union and pointer-to-member
(even if there's an offset). The difference between (1) and (3) is
that the standard forces s.c to be at offset zero.
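
Guarantees (1) and (2) can be checked with a sketch like the following.
The function names are hypothetical, and the declarations repeat the
ones above with the members separated by semicolons. It cannot
demonstrate (3): whether the intermediate void pointer makes a
difference depends on the implementation, and on the machines in common
use it does not.

```c
#include <stddef.h>

struct s { signed char c; int i; };
union  u { signed char c; int i; };

/* (1): the first member of a structure sits at offset zero, so the
   address survives a trip through void *. Returns 1 if that holds. */
int struct_first_member_at_zero(void)
{
    struct s s;
    return offsetof(struct s, c) == 0
        && &s.c == (signed char *)(void *)&s;
}

/* (2): a pointer to one union member, cast to the union type and then
   to another member's type, points at that other member. */
int union_members_reachable(void)
{
    union u u;
    return &u.c == (signed char *)(union u *)&u.i;
}
```

Both functions should return 1 on any implementation that honours the
quoted guarantees.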

Note that even if the no-initial-hole constraint were lifted,

(4)	&s.c == (signed char *)&s;
(4')	&((struct s *)p)->c == (signed char *)(struct s *)p;

could still be guaranteed (with some work from the compiler _if_
initial holes were used). The relevant sentences (in the May 88 draft)
go: ``A pointer to a structure object, suitably cast, points to its
initial member (or if that member is a bit-field, then to the unit in
which it resides), and vice versa.  There may _therefore_ be unnamed
holes within a structure, but not at its beginning, as necessary to
achieve the appropriate alignment.'' (My emphasis). It seems to me
that the no-initial-hole rule is a conclusion rather than a deliberate
decision.

Therefore I wonder which, if any, of the following it was that the
Committee intended:
	a) ``Suitably cast'' does not involve changing the byte
address of a pointer, only its type; in other words, an intermediate
void pointer can validly be used. [In this case, all union members
have to start at the same byte address too, as the language used for
unions is similar with regard to casting, although the no-hole rule
is not explicitly stated.]
	b) There is so much code out there which does not explicitly
cast structure pointers (when used as function arguments, for example)
that this guarantee must be given; however, that is not the case for
union pointers[?!].
	c) When a union type has several members with structure type,
and each of those structures has an initial member with the same
common type, it should be possible to use those initial members
interchangeably. [This would be very useful for discriminated unions;
however, the comment about unions from a) above must apply here too
for that to work.]
	d) RISCs are fast enough already, let's not allow them to
optimize too much [:-)].

In my opinion, only c) above is totally acceptable, and if that was
the reason, it might have been formulated more directly. Anyway, if I
ever get my hands on a current Draft (or even Standard), Rationale,
Response and Interpretations document, and they don't cover this
better, I may cook up a Request for Interpretation (or whatever the
term is).
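
The discriminated-union use that c) has in mind might look like the
sketch below. All of the names (kind, int_item, pair_item, item,
item_sum) are made up for illustration, and the code assumes that the
guarantee in c) holds, i.e. that the initial tag can be read through any
of the structure members no matter which one was stored.

```c
enum kind { KIND_INT, KIND_PAIR };

/* Both structures start with a member of the same type: the tag. */
struct int_item  { enum kind kind; int value; };
struct pair_item { enum kind kind; int first, second; };

union item {
    struct int_item  as_int;
    struct pair_item as_pair;
};

/* Reads the tag through as_int even when as_pair was the member last
   stored; this is the interchangeable use that c) asks for. */
int item_sum(const union item *it)
{
    switch (it->as_int.kind) {
    case KIND_INT:
        return it->as_int.value;
    case KIND_PAIR:
        return it->as_pair.first + it->as_pair.second;
    }
    return 0;   /* unknown tag */
}
```

If union members were allowed to start at different offsets, as in (3),
the tag read in item_sum() would not be the tag that was written, which
is why the remark about unions under a) matters here.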

--
Lars Mathiesen, DIKU, U of Copenhagen, Denmark      [uunet!]mcvax!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers.      thorinn at diku.dk


