Byte order (retitled)

Gregory Smith greg at utcsri.UUCP
Sun Apr 20 09:31:17 AEST 1986


In article <1104 at psivax.UUCP> friesen at psivax.UUCP (Stanley Friesen) writes:
>	Could you expand on this! Do you mean that if you cast a
>pointer to a long to a pointer to a short and dereference you will get
>the *high* order portion on a big-endian machine and the *low* order
>portion on a little-endian? Clearly a portability problem, and the
>big-endian behavior is counter-intuitive. Or do you intend that the
>pointer always point to the low-order byte even on a big-endian
>machine? Then you have to index *backwards* to break the item up into
>bytes! Really, the only way to get the right semantics on a big-endian
>machine is to actually convert pointers when a cast is used.

I strongly disagree. If you have long *x, then (char)*x (as opposed to
*(char *)x) is the low-order byte of the pointed-to long, and is
portable. If you also have char c, then c = *x is also portable. One
would hope that a code generator on a big-endian machine would
recognize these, so that only the single byte would be read.
Besides, your solution would get really weird on machines like the
PDP-11, where low bytes come first in words but low words come second
in longs (admittedly a silly setup).
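A small sketch makes the distinction concrete. The function names are hypothetical, and it uses unsigned char rather than char to sidestep the implementation-defined result of converting an out-of-range value to a signed char. Converting the value keeps the low-order byte everywhere; reinterpreting the address reads whatever byte is stored first, which depends on byte order.

```c
/* Value conversion: yields the low-order byte on every machine. */
unsigned char low_byte_by_value(const long *x)
{
    return (unsigned char)*x;
}

/* Memory reinterpretation: yields the byte at the lowest address,
   which is the low-order byte only on little-endian layouts. */
unsigned char first_byte_in_memory(const long *x)
{
    return *(const unsigned char *)x;
}
```

For example, with long v = 0x11223344L on a machine with a 4-byte long and 8-bit chars, low_byte_by_value(&v) is 0x44 everywhere, while first_byte_in_memory(&v) would be 0x44 on a Vax, 0x11 on a 68K and 0x22 on a PDP-11.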

Pointer conversions between 'pointer-to-x' and 'pointer-to-y' should be
no-ops whenever possible, since things like (struct foo *)malloc(...)
are done so often. If the above scheme were used, then promoting, say,
the (char *) from malloc to a (long *) would require masking off the
lower 2 (or whatever) bits of the address. On a 68K, for example, the
low bit would have to be cleared when a (char *) was promoted, and the
instruction set provides no way to do this directly to an address
register, which is where a pointer would most likely be. Of course, on
some machines pointers have 'tag' bits that must be modified so the new
pointer can be dereferenced properly, so a no-op pointer conversion is
impossible there regardless of byte order.
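As a rough illustration of why the no-op matters, the sketch below performs the common malloc idiom and checks that converting the result to struct foo * and back to char * gives a pointer that compares equal to the original. The members of struct foo and the function name are hypothetical; the check relies only on malloc returning memory suitably aligned for any object type.

```c
#include <stdlib.h>

/* Hypothetical members; the article only names struct foo. */
struct foo {
    long count;
    char tag;
};

/* Returns 1 if the char * -> struct foo * -> char * round trip
   preserves the address, 0 if not, -1 if allocation fails. */
int cast_is_round_trip(void)
{
    char *raw = malloc(sizeof(struct foo));
    if (raw == NULL)
        return -1;

    struct foo *fp = (struct foo *)raw;   /* the common idiom */
    int same = ((char *)fp == raw);       /* malloc memory is suitably
                                             aligned for any type */
    free(raw);
    return same;
}
```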

The following portably strips a long into bytes:

int i;
unsigned char bytes[ sizeof( long )];		/* lo-byte first */
long input;					/* the value to split */
	for(i=0;i<sizeof(long); ++i)
		bytes[i] = input >> (i<<3);	/* assumes 8-bit chars */
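A self-contained version of the loop above can be sketched as a pair of functions, one splitting and one rebuilding. The names split_long and join_long are hypothetical, and the sketch uses unsigned long so that the right shift is well defined for every value; CHAR_BIT from <limits.h> replaces the hard-wired 8.

```c
#include <limits.h>
#include <stddef.h>

/* Split v into sizeof(long) bytes, low-order byte first, using only
   shifts, so the result does not depend on the machine's byte order. */
void split_long(unsigned long v, unsigned char bytes[sizeof(long)])
{
    size_t i;
    for (i = 0; i < sizeof(long); ++i)
        bytes[i] = (unsigned char)(v >> (i * CHAR_BIT));
}

/* Inverse of split_long: rebuild the value from low-order-first bytes,
   starting with the most significant byte. */
unsigned long join_long(const unsigned char bytes[sizeof(long)])
{
    unsigned long v = 0;
    size_t i;
    for (i = sizeof(long); i-- > 0; )
        v = (v << CHAR_BIT) | bytes[i];
    return v;
}
```

With 8-bit chars, split_long(0x11223344UL, b) leaves b[0] == 0x44 and b[3] == 0x11 on any byte order, and join_long(b) gives back 0x11223344UL.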

Nit-pickers can substitute i*CHAR_BIT (from <limits.h>) for i<<3. Sure,
it's slow, but it is portable, and this is a problem that tends to defy
portability more than most. Besides, it could easily be sped up at the
expense of clarity (that's what clarity is for :-) ). If you really want
speed, put the following in your 'config.h':

On 68K:			On Vax:			On PDP11:
union wombat{		union wombat{		union wombat{
    long longish;	    long longish;	    long longish;
    struct chs{		    struct chs{		    struct chs{
	char ch3,ch2;		char ch0,ch1;		char ch2,ch3;
	char ch1,ch0;		char ch2,ch3;		char ch0,ch1;
    }charish;		    }charish;		    }charish;
};			};			};

If you declare 'union wombat x', then x.longish is a long and
x.charish.ch3 is its high-order byte, and accessing it is fast. Given
long *lp, you could write ((struct chs *)lp)->ch2, which would portably
refer to the second-most-significant byte (given the matching config.h
and a 4-byte long, as on all three machines). Not as general, but it
*is* fast.
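One way to decide which union belongs in config.h is to probe the storage order once. The sketch below is a hypothetical helper, not part of the original scheme; it uses uint32_t as a stand-in for long, since long is 8 bytes on many current machines, and compares the stored bytes against the three layouts shown above.

```c
#include <stdint.h>
#include <string.h>

/* Report which wombat layout matches the host's 32-bit integers:
   ch3,ch2,ch1,ch0 (68K), ch0,ch1,ch2,ch3 (Vax), ch2,ch3,ch0,ch1 (PDP11).
   Returns NULL if none of the three matches. */
const char *wombat_layout(void)
{
    const uint32_t probe = 0x03020100UL;  /* byte n has value n */
    unsigned char mem[4];

    memcpy(mem, &probe, sizeof mem);      /* copy out the storage order */

    if (mem[0] == 3 && mem[1] == 2 && mem[2] == 1 && mem[3] == 0)
        return "68K";
    if (mem[0] == 0 && mem[1] == 1 && mem[2] == 2 && mem[3] == 3)
        return "Vax";
    if (mem[0] == 2 && mem[1] == 3 && mem[2] == 0 && mem[3] == 1)
        return "PDP11";
    return NULL;
}
```

A build script could run this once and select the matching union; on a little-endian host such as an x86 it reports "Vax".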

In summary, I don't think the portability problem you describe will
rear its nasty head very often, and it can be dealt with using the
tools provided, just as other portability problems can. The trouble you
have to go to depends on (1) how portable you want the code to be, (2)
how fast you want it to be, and (3) how many intrinsically hard-to-port
things you want to do. Stripping specific bytes out of a long by
addressing tricks *is* intrinsically non-portable, and is not a very
common thing to do anyway.
-- 
"If you aren't making any mistakes, you aren't doing anything".
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg