Union type conversions

Chris Torek chris at mimsy.UUCP
Tue Aug 2 15:35:40 AEST 1988


In some article whose referent was deleted by deficient news software
(actually, by crossing a news/FIDO gateway), I wrote in re union offsets:
>>Write some correct code that produces a wrong answer if a union of a
>>set of elements were implemented as a structure containing all the
>>elements, and you will have a proof.

(Incidentally, you need to do this for all possible cases to make a
proof for more than just one case.  This was in an article in which
my point was `in a computer language, a difference that makes no
difference *is* no difference.')

In article <224.22F29448 at stjhmc.fidonet.org>
will.summers at p6.f18.n114.z1.fidonet.org (will summers) writes:
>How about:
>  union { int a; int b; } x;
> 
>  int offset_of_b = (int) (&x.b - &x);
>    ...
> 
>    if  (offset_of_b != 0)
>        printf("compiler is broke\n");

>So what's wrong with the above code?

`Let me count the ways ...' :-)

First: you have subtracted pointers of different types (`&x.b' is a
pointer to int, `&x' a pointer to the union); that operation is
undefined.  Well, we could fix that:

	int offset_of_b = &x.b - &x.a;

but it may still be wrong to assume that offset_of_b should then be 0.
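
(A side note on measuring: `&x.b - &x.a' counts in units of int, not
bytes.  The following is a minimal sketch of one way to ask the question
in bytes, by comparing char pointers into the same object, and again with
the offsetof macro from <stddef.h> on compilers that provide it.  The tag
`ab' and both function names are made up for illustration.  The sketch
only reports what a given compiler does; it does not settle what a
compiler is allowed to do.)

```c
#include <stddef.h>	/* size_t, offsetof */

/* Hypothetical tag; the union in the quoted article was anonymous. */
union ab { int a; int b; };

/*
 * Byte distance from the start of the union object to member b,
 * found by converting both addresses to char pointers first, so
 * that the two operands of the subtraction have the same type.
 */
size_t byte_offset_of_b(void)
{
	static union ab x;

	return (size_t)((char *)&x.b - (char *)&x);
}

/* The same quantity as reported by the offsetof macro. */
size_t offsetof_b(void)
{
	return offsetof(union ab, b);
}
```

Calling both functions and comparing the results against each other (and
against 0) shows what one particular implementation chose.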

>"A union may be thought of as a structure all of whose members begin at
>offset 0" (K&R A.8.5, pg 197).
> 
>If I may "think of" a union as an "all 0 offset structure", then I may write 
>code that fails when any aspect of that metaphor is violated.

You may indeed---but it might still be incorrect code.

The question here is really in regard to the original wording: why say
`may be thought of as' rather than `is'?  Given that this is in a
tutorial, rather than a formal language definition, there is one very
likely answer: perhaps `is' would be false.  After all, a float `may be
thought of as' a pair of integers separated by a decimal point.  That
metaphor works to a fair extent, but it breaks down when you really
push it.  Perhaps the same is true of a union.  Just possibly, the idea
was that a union would be a structure with zero offsets, except
wherever that happened not to be ideal, such as when embedded into
another structure:

	struct this_could_be_aligned_fancily {
		char name[7];	/* object name, max 7 letters */
		union {
			long l;	/* value if long */
			char c[5]; /* value if string */
		} u;		/* (type distinguished by name) */
	};

On a byte-addressable machine where `long's are four bytes and must be
aligned at a multiple of four bytes (e.g., SPARC), the obvious way to
pack this is:

	offset	object(s)
	------	---------
	 0	name[0]
	 1	name[1]
	...
	 6	name[6]
	 7	<hole>
	 8	u.l, u.c[0]
	 9	u.l, u.c[1]
	10	u.l, u.c[2]
	11	u.l, u.c[3]
	12	u.c[4]
	13	<hole>
	14	<hole>
	15	<hole>

A `better' way to pack it is this:

	offset	object(s)
	------	---------
	 0	name[0]
	...
	 6	name[6]
	 7	u.c[0]
	 8	u.l, u.c[1]
	 9	u.l, u.c[2]
	10	u.l, u.c[3]
	11	u.l, u.c[4]

This saves four bytes per structure object (12 bytes rather than 16), at
the price of putting u.l at offset 1, not 0, within the union.
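
(To see which packing a particular compiler actually uses, one can
simply ask it.  This is a minimal sketch, assuming only <stddef.h> for
size_t; the object `obj' and the four function names are invented for
the purpose.  Where `long' is wider than four bytes the numbers will
differ from the tables above, but the comparison between the offset of
u.l and the offset of u.c[0] is the same.)

```c
#include <stddef.h>	/* size_t */

struct this_could_be_aligned_fancily {
	char name[7];	/* object name, max 7 letters */
	union {
		long l;	/* value if long */
		char c[5]; /* value if string */
	} u;		/* (type distinguished by name) */
};

/* Hypothetical object used only to take member addresses. */
static struct this_could_be_aligned_fancily obj;

/* Byte offset of the union itself within the structure. */
size_t offset_of_u(void)
{
	return (size_t)((char *)&obj.u - (char *)&obj);
}

/* Byte offset of u.l within the structure. */
size_t offset_of_l(void)
{
	return (size_t)((char *)&obj.u.l - (char *)&obj);
}

/* Byte offset of u.c[0] within the structure. */
size_t offset_of_c0(void)
{
	return (size_t)((char *)&obj.u.c[0] - (char *)&obj);
}

/* Total size of one structure object, holes included. */
size_t size_of_object(void)
{
	return sizeof obj;
}
```

Under the `obvious' packing, offset_of_l() and offset_of_c0() return the
same value; under the `better' packing, offset_of_c0() would be one less
than offset_of_l().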

The question, then, is `is such a packing legal?'  K&R does not really
answer this.  I hope that the dpANS does, and does so with a `no'; but
I do not know whether this is in fact the case.

(I had hoped not to have to be this explicit, but at least this is
better than reruns of `pointers vs. arrays' or `defining NULL' :-) .)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris at mimsy.umd.edu	Path:	uunet!mimsy!chris