register unions

Richard A. O'Keefe ok at quintus.UUCP
Wed Feb 24 17:53:39 AEST 1988


Someone (I have lost the original posting) suggested that
	'register union { foo* x; baz *y; ... }'
would be a useful construct in C.

Well, it's already legal.  According to the Oct '86 dpANS,
"A declaration with storage-class specifier 'register' is an 'auto'
declaration, with a suggestion that the objects declared be stored
in fast-access machine registers if possible.  The types of objects
that are stored in such registers and the number of such
declarations in each block that are effective are implementation-
defined [footnote: The implementation may treat any 'register'
declaration simply as an 'auto' declaration.  However, ... the
unary & (address-of) operator may not be applied to an object
declared with storage-class specifier 'register', whether or not
a machine register is actually used.]"

The System V Programmer's Guide explicitly says
"excess or invalid 'register' declarations are ignored."
Similar statements appear in other C manuals.
Interestingly enough, DEC's VAX C manual explicitly says
"If the variable requires storage (for example, arrays or structures),
the object of the variable is not placed in a register."

It seems that any C compiler which rejects 'register union foo X'
as an error has always been broken:  this is legal K&R C.  But a
compiler has always been within its rights to ignore 'register' in
this or any other case, though better-quality compilers print a
warning message.
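
For concreteness, here is a minimal sketch of such a declaration in use.
The union tag and the function name are made up for illustration, and it
is written with ANSI prototypes rather than K&R style.  The only thing
'register' guarantees here is that & may not be applied to u; whether a
machine register is used is up to the compiler.

```c
#include <stddef.h>

union int_or_char_ptr {         /* hypothetical tag, for illustration only */
    int  *ip;
    char *cp;
};

/* Count the characters before the terminating NUL, keeping the
   cursor in a 'register' union. */
size_t count_chars(char *s)
{
    register union int_or_char_ptr u;   /* legal; the hint may be ignored */
    size_t n = 0;

    u.cp = s;                   /* store through the char* member ...   */
    while (*u.cp != '\0') {     /* ... and read back through the same one */
        u.cp++;
        n++;
    }
    /* Writing &u here would be rejected, whether or not a machine
       register is actually used. */
    return n;
}
```

Note that the sketch stores and fetches through the same member, so it
says nothing about how the different pointer types are represented.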

The construct thus already being legal, I assumed the original poster
to be urging that compilers SHOULD put unions in registers if it is
possible for them to do so, and to be alleging that this was particularly
important for pointers.

I posted a message pointing out that this simply doesn't make sense on
some machines (specifically including PR1MEs), and employed a familiar
humorous device to stress this.  I have received some flaming messages
from people who took exception to this commonplace observation.  Oddly
enough, no-one using a PR1ME has complained to me yet...

Here's why I feel strongly about issues like this:

(1) I did a couple of days consulting once for a company who had found that
    the only practical way of porting 4.2BSD to their machine was to change
    the microcode so that *(char*)0 == 0.

(2) I have had the unpleasant experience of porting a program which used
    pointers heavily to a machine where 'int' and 'char*' were not the
    same size.  Even changing the definition of NULL to 0L (which is not,
    strictly speaking, correct) didn't help.  I had to go through more
    lines of code than I care to remember changing 0 to (char*)NULL.

(3) I had to port a program which assumed that, given the declaration
	union two {int a; char *b;} jim;
    the calls
	harry(jim);
    and
	harry(jim.b);
    were identical.  Suffice it to say that they weren't.

(4) I ported a middle-sized program to a machine, and watched someone else
    port a much larger program to the same machine, where although character
    pointers and word pointers were both the same size as an integer, they
    had different representations.  So, for example,
	int data[50];
	fwrite(data, sizeof *data, (sizeof data)/(sizeof *data), output);
    wasn't just badly typed (it has always been that), it gave the wrong
    answers.  Again, one had to go through changing things like this to
	fwrite((char*)data, ....);

(5) I had to advise someone that a very large (and *very* useful) program
    of theirs would be too expensive to port to a PR1ME because they had
    assumed throughout that word pointers and character pointers were both
    the same size as an integer.  What was really tragic about this was
    that the program in question had very little real use for character
    pointers, but things had been converted to this "common currency".
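
The kind of change described in (2) can be sketched as follows.  This is
an illustration only, assuming an ANSI-style variadic function; the name
total_length is hypothetical.  Where no prototype tells the compiler that
an argument is a pointer, a bare 0 is passed as an int, so the sentinel
has to be written (char*)NULL.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: total length of a list of strings that is
   terminated by a null char* sentinel. */
size_t total_length(const char *first, ...)
{
    va_list ap;
    const char *s;
    size_t total = 0;

    va_start(ap, first);
    /* Each variable argument is fetched as a char*, so the caller
       must really have passed a char*, not an int 0. */
    for (s = first; s != NULL; s = va_arg(ap, char *))
        total += strlen(s);
    va_end(ap);
    return total;
}
```

A call such as total_length("ab", "cde", (char *)NULL) yields 5.  Writing
the last argument as a plain 0 happens to work where int and char* have
the same size and representation, and is exactly what breaks where they
do not.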

What is the point of something like this:
	union ptr { char *c; int *i; long *l; int (*f)(); };
	register union ptr fred;
Surely the point is to say
	fred.c = /* something */;
	... fred.i ...
and have it go fast.

But this is going to give you major porting headaches in the future.
Or more plausibly, it is going to give someone else major porting
headaches.  Too bad that it is already legal...

Is there something comparable which is less trouble for porting?  YES.
Use casts.  Do something like

	#if	....
	typedef	int UsualStorageUnit;
	#elif	...
	typedef short UsualStorageUnit;
	#elif	...
	typedef char UsualStorageUnit;
	#else
	/* if case not handled, syntax error in next declaration */
	#endif
	typedef UsualStorageUnit *UsualPointer;

(void*) is close, but not quite identical.  (void*) has to handle the
worst case, and is usually much the same as (char*).  UsualPointer is to
be the "native" pointer type, just as int is the "native" integer type.

	#define	AsUsualPtr(x) ((UsualPointer)(x))
	#define AsShortPtr(x) ((short*)(x))
	#define AsIfuncPtr(x) ((int (*)())(x))
and so on.

Then you can declare routines like
	void apply1(Fn, Arg)
	    register UsualPointer Fn, Arg;
	    {
		(*AsIfuncPtr(Fn))(*AsShortPtr(Arg));
	    }

What's the difference?  Well, apart from the fact that the compiler is
more likely to put UsualPointers into registers than unions (though
putting both, and putting neither, are both ALREADY legal), the compiler
can now spot each type change, and can tell you about the ones that
aren't going to work.

Look *very* carefully at **any** union in your programs which is not
#ifdeffed by machine or implementation; bugs breed in them like
mosquitos in a swamp.  Using different members of a union at different
times is fine, but putting something into one member of a union and
picking it up again from another member is bad practice in any programming
language.  (My first introduction to the problem was trying to port a
Pascal program from a CDC machine to a B6700.  The Pascal programmer had
assumed that 10 characters = 1 integer, and not only was that not the
case, but the bit pattern of the first N characters often wasn't valid
as an integer.)
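
One way to keep a union honest is to record which member is live and
check it on every fetch.  The following is a minimal sketch of that idea,
not something from the original discussion; all the names in it are made
up for illustration.

```c
#include <assert.h>

enum cell_kind { CELL_INT, CELL_TEXT };     /* which member is live */

struct cell {                               /* hypothetical tagged union */
    enum cell_kind kind;
    union {
        int         i;
        const char *text;
    } u;
};

struct cell make_int(int i)
{
    struct cell c;
    c.kind = CELL_INT;
    c.u.i = i;
    return c;
}

struct cell make_text(const char *text)
{
    struct cell c;
    c.kind = CELL_TEXT;
    c.u.text = text;
    return c;
}

/* Each accessor only reads the member that was last stored. */
int cell_int(const struct cell *c)
{
    assert(c->kind == CELL_INT);
    return c->u.i;
}

const char *cell_text(const struct cell *c)
{
    assert(c->kind == CELL_TEXT);
    return c->u.text;
}
```

This uses different members at different times, which is fine; what the
assertions catch is putting a value in through one member and picking it
up through another.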

Frankly, I am not impressed by people who say
    "don't be so condescending, this is a useful construct on MY machine."
I do not use a PR1ME (or any of the other machines I hinted at above)
myself.  I have done, and hope never to do so again.  Life is difficult
enough for these people without going out of our way to make things worse.

Oh yes, another porting problem:  sizeof *main.  On at least three machines
that I can think of, the function pointer that C programs pass around is
actually a pointer to a control block, *not* a pointer to the code...
Don't expect the bit value C has to be equal to what you see in a load map.
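
If function pointers have to be passed around generically, a sketch like
the following keeps them in a function pointer type instead of squeezing
them into a data pointer or an integer.  It relies on the ANSI/ISO rule
that a function pointer may be converted to another function pointer type
and back again; the names AnyFunc, twice, stash and call_int_func are
hypothetical.

```c
typedef void (*AnyFunc)(void);      /* hypothetical "generic" function pointer */

int twice(int x)
{
    return 2 * x;
}

/* Hand out the function as a generic function pointer. */
AnyFunc stash(void)
{
    return (AnyFunc)twice;
}

/* Convert back to the real type before calling; calling through
   the wrong type is not allowed. */
int call_int_func(AnyFunc f, int arg)
{
    return ((int (*)(int))f)(arg);
}
```

Nothing here looks at the bits of the pointer or assumes it is the same
size as a data pointer, so it does not care whether the value is a code
address or the address of a control block.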


