Is an object made up of bytes?

Doug Gwyn gwyn at brl-smoke.ARPA
Sat Jan 17 11:50:39 AEST 1987


In article <1987Jan15.215225.9688 at sq.uucp> msb at sq.UUCP (Mark Brader) writes:
>Richard Stallman says [in effect]:
>>  I am not sure whether the standard implies that, given "short in, out;",
>> {   char *inptr, *outptr; int i;
>>     inptr = (char *) &in; outptr = (char *) &out;
>>     for (i = 0; i < sizeof (short); i++) outptr[i] = inptr[i];   }
>>  is defined and equivalent to "out = in;".
>and Doug Gwyn replies:
>$ No, this can't be guaranteed. For example, there may be bits
>$ in the short that are not covered by its chars.
>I'm pretty sure this is wrong.  The draft proposed standard says:

Mark is, I think, correct in his assessment of the nature of bytes
in the X3J11 model of C objects.  However, I had something else in
mind, but because of interruptions while preparing my response I
didn't word it correctly.  (The extra bits I had in mind were tag
bits; see below for a corrected version.)  I'll try again.

The things that prevent RMS's approach from working portably are:

The semantics of "(char *) &object" aren't guaranteed to produce anything
that can be safely dereferenced to access a char.  The only guarantee is
that the opposite conversion can be made subsequently without losing
information.  This can be an issue for machines that don't support byte
addressing.  To keep pointer arithmetic simple, the high-order bits of a
pointer may indicate the size of its dereferenced type.  In such a case, if
the cast is merely a word transfer, without the bits being shifted and
otherwise rearranged, then (char *) does not produce a useful address.
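To make the word-addressing problem concrete, here is a toy model.  It
describes a hypothetical machine with 4 bytes per word and is simplified:
it ignores any size bits in the high-order part of a pointer and models
only word numbering versus byte numbering.  All names (WORD_BYTES,
word_to_byte_scaled, word_to_byte_naive, and so on) are invented for
illustration and describe no real compiler.

```c
/* Toy model (hypothetical): a word-addressed machine whose words
   each hold WORD_BYTES bytes.  Addresses are plain integers here. */
enum { WORD_BYTES = 4 };

typedef unsigned long word_addr;  /* counts words */
typedef unsigned long byte_addr;  /* counts bytes */

/* A conversion that rescales, so the result designates the first
   byte of the word.  This costs a multiply (or shift) at run time. */
byte_addr word_to_byte_scaled(word_addr w)
{
    return (byte_addr)w * WORD_BYTES;
}

/* A "mere word transfer": the bits are reused unchanged, so the
   word number is misread as a byte number. */
byte_addr word_to_byte_naive(word_addr w)
{
    return (byte_addr)w;
}

/* The naive reverse conversion, also a plain bit copy. */
word_addr byte_to_word_naive(byte_addr b)
{
    return (word_addr)b;
}

/* Which word a given byte address falls in. */
word_addr word_containing(byte_addr b)
{
    return (word_addr)(b / WORD_BYTES);
}
```

For word 5, the scaled conversion yields byte 20, which lies in word 5,
while the naive one yields byte 5, which lies in word 1.  Converting the
naive result back still gives word 5, so the round-trip guarantee is met
even though the char address is useless for dereferencing.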

Even if the resulting char pointer designates a char, it might not be the
char that one would guess.  On "little endian" machines it probably would
be, but there may be "big endian" byte-addressed architectures where the
numeric address of a word is not the lowest-valued address of the bytes
within the word; in this case the loop in the example would copy the wrong
collection of bytes (assuming again that the cast is implemented as a
simple word transfer without being rearranged specifically to make such
examples work, which would involve additional overhead).
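On a present-day hosted implementation, where the finished standard does
let a character pointer examine the bytes of any object, the following
sketch reports which byte of an unsigned short is stored at the lowest
address.  It assumes 8-bit chars and a 2-byte unsigned short, which hold
on common platforms but are not promised by C; low_address_byte is a
hypothetical helper name.

```c
#include <limits.h>

#if CHAR_BIT != 8
#error "this sketch assumes 8-bit chars"
#endif

/* Return the byte stored at the lowest address of v.
   Assumes sizeof (unsigned short) == 2. */
unsigned low_address_byte(unsigned short v)
{
    const unsigned char *p = (const unsigned char *)&v;
    return p[0];
}
```

For v == 0x0102, a little-endian machine returns 0x02 (low-order byte
first) and a big-endian machine returns 0x01.  A byte-by-byte copy loop
only reproduces the short correctly if source and destination agree on
this layout and the char pointers really address these bytes.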

In a tagged architecture, the pointed-at object may not be referenced as
the wrong type without causing a machine trap.
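A tagged architecture can be mimicked in software to show the effect.  In
this hypothetical model each memory cell records the type stored in it,
and a load naming a different type reports a trap instead of returning
bits.  The names (struct cell, store_short, load_char, load_short) are
illustrative only.

```c
/* Toy model (hypothetical): every memory cell carries a tag naming
   the type of the value stored in it. */
enum tag { TAG_SHORT, TAG_CHAR };

struct cell {
    enum tag tag;
    union { short s; char c; } u;
};

/* Store a short into a fresh cell, tagging it accordingly. */
struct cell store_short(short v)
{
    struct cell m;
    m.tag = TAG_SHORT;
    m.u.s = v;
    return m;
}

/* Load a char.  Returns 0 and sets *out on success, or -1 to stand
   in for the machine trap when the tag does not match. */
int load_char(const struct cell *m, char *out)
{
    if (m->tag != TAG_CHAR)
        return -1;  /* simulated trap: cell does not hold a char */
    *out = m->u.c;
    return 0;
}

/* Load a short, with the same tag check. */
int load_short(const struct cell *m, short *out)
{
    if (m->tag != TAG_SHORT)
        return -1;  /* simulated trap */
    *out = m->u.s;
    return 0;
}
```

Reading a cell written by store_short through load_short succeeds, while
reading it through load_char "traps", which is what RMS's char loop would
run into on such hardware.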

In general, I believe X3J11 intended to strongly discourage ANY reliance on
"type punning".
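In keeping with that advice, the portable way to perform RMS's copy is
plain assignment.  Where a byte-for-byte copy of an object really is
wanted, C99 and later explicitly allow copying an object's representation
with memcpy; note that this guarantee postdates the 1987 draft under
discussion.  The function names below are illustrative.

```c
#include <string.h>

/* The portable form of "out = in;": let the compiler copy the value. */
void copy_short_assign(short *out, const short *in)
{
    *out = *in;
}

/* Byte-level copy of the object representation (sizeof (short)
   bytes) using the library, rather than a hand-written loop over
   char pointers obtained by casts. */
void copy_short_bytes(short *out, const short *in)
{
    memcpy(out, in, sizeof *out);
}
```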

P.S.  Upon re-reading section 3.3.4 (Semantics), I see that RMS and I interpreted the
use of the word "may" differently.  Comparison with other sections of the
document now leads me to believe that RMS was probably correct in thinking
that pointer<->integer conversion via casts MUST be supported by a
conforming implementation, although enough is left "implementation-defined"
that an implementation could choose to make this a useless operation.  This
means that some restriction on the use of externs in initializers really
is necessary (so that linkers need not support full C arithmetic) if the
typical implementation is to give useful meaning to such conversions.
This deficiency in the draft standard needs to be fixed.
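For comparison, C99 later made the pointer-to-integer round trip explicit
through the optional type uintptr_t in <stdint.h>: a valid void * converted
to uintptr_t and back compares equal to the original pointer.  The integer
value itself remains implementation-defined, which echoes the point above
that such a conversion may carry little useful meaning.  A minimal sketch,
assuming the implementation provides uintptr_t; roundtrip is an
illustrative name.

```c
#include <stdint.h>

#ifndef UINTPTR_MAX
#error "this implementation does not provide uintptr_t"
#endif

/* Convert a pointer to an integer and back.  C99 guarantees only
   that the result compares equal to the original pointer; the
   integer's numeric value is implementation-defined. */
void *roundtrip(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits;
}
```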


