doubtful assumptions about pointers

Sat Jan 13 05:21:13 AEST 1990

In article <11922 at smoke.BRL.MIL>, gwyn at smoke.BRL.MIL (Doug Gwyn) writes:
> In article <1250.25ab3338 at csc.anu.oz> bdm659 at csc.anu.oz writes:
> >The following is a list of Doubtful Assumptions (DAs).  ...
> >I'd welcome proofs in either case.
>
> Well, I'll try to respond, but with explanations, not rationalistic
> "proofs".  I really have to take issue with people who insist on
> dissociating the purpose of the C Standard from reality, instead
> arguing excessively over formalism.

The question of whether or not a particular coding practice is strictly
conforming is of great importance to anyone seriously interested in
portable programming.

FA = false assumption
DA = doubtful assumption
RA = reasonable assumption
TA = true assumption

>                                      We expressed the Standard in
> technical English rather than a formal notation primarily in order
> to aid programmers (and to a lesser degree, implementors) to relate
> it to their daily activity.  It is not intended to form a system
> suitable for treating with formal symbolic logic and therefore
> should not be taken as such.

Don't put words into my mouth.

>                               Thus, a truly perverse implementation
> might actually comply with the letter of the Standard while
> exploiting unintended loopholes to produce a travesty quite at
> variance with the spirit of C.

See my comment about "perverse" at the end.

>                                 (We tried to document in the
> Rationale most of the intentional loopholes.)

I checked the Rationale concerning all my DAs.  What's a "loophole", anyway?
You seem to have taken the line that any DA which *you* feel should be a
TA represents such an unintentional loophole.  Justification?

> Another meta-comment here is that the DA examples indicate too
> much concern with representational aspects of entities within a
> C program and too little concern with dealing with data at the
> appropriate level of abstraction.  In the vast majority of
> applications, these questions should not even arise.

So asking questions about strict conformity is sinful?

> The answers I give will assume that implementations do not go out of
> their way to introduce unnecessary complications.  (Necessary ones,
> caused by architectural or environmental considerations, are okay;
> we deliberately allowed slack in the specifications to cover those.)

One motive for my posting was to explore the boundary between "unintentional
loopholes", as you call them, and deliberate non-specification. This is a
perfectly reasonable object of study, and one which is appropriate to this
newsgroup.  Your criticisms of everyone who attempts it are not helpful.

> >DA[0]:  int *pi; char *pc;
> >        Suppose pi is valid, and do  pc = (char*) pi.  Then *pc overlaps *pi
> >        in the sense that changing the value of *pc changes the value of *pi.
>
> TA (True Assumption).  The addresses of the bytes within a single object
> constitute a nice linear address space.  (However, there need not be one
> global linear address space within which all objects are located.)
>
> It is not specified which PART of *pi is accessed by *pc, but some part
> must be.  Big-endian and little-endian architectures will differ here.

Forgive me if my memory is wrong, but I seem to remember a posting of yours
in which you agreed that the members of unions might not physically overlap
in some implementations.  If you consider that pi might point to such a member,
there are difficulties in reconciling that posting with this one.
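
To make DA[0] concrete, here is a minimal sketch of the overlap test.  The
function name is hypothetical, and the code assumes that int has no padding
bits and that flipping one bit cannot produce a trap representation.  Both
hold on common two's complement machines, but the Standard does not promise
either.  A result from one compiler is evidence about that compiler only.

```c
/* Hypothetical helper: reports whether a write through a char pointer
   derived from pi changes the value of *pi (the claim made by DA[0]).
   Assumption: int has no padding bits and no trap representations. */
int byte_write_changes_int(int *pi)
{
    unsigned char *pc = (unsigned char *)pi;  /* the DA[0] conversion */
    int before = *pi;
    int changed;

    *pc ^= 1u;                    /* flip one bit of the addressed byte */
    changed = (*pi != before);    /* did the int's value move with it? */
    *pc ^= 1u;                    /* restore the original byte */
    return changed;
}
```

Which part of *pi the byte belongs to differs between big-endian and
little-endian machines, so the sketch only asks whether the value changed,
not how.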

> >DA[1]:  int *pi, *pj;  char *pc, *pd;
> >        Suppose pi and pj are valid,  and that  pi == pj .
> >        Now do  pc = (char*) pi; pd = (char*) pj .
> >        Then  pc == pd .
>
> TA.  Pointers to distinct objects (including bytes within other objects)
> compare unequal and vice-versa.  The only loophole an implementation
> could exploit here would be to randomly select a byte address within the
> int object when the conversion to char* occurs, knowing that alignment
> constraints applied during the inverse conversion would recover the same
> int*.  Even if such a loophole is logically permitted by the specification,
> I don't think it poses a serious practical threat, because I see no
> legitimate reason for introducing such run-time indeterminacy and
> therefore don't expect to see it in practical implementations.  ...

Yes, an argument on the basis of determinacy might be reasonable here.
It's a pity such meta-arguments are needed, though.  A fundamental difficulty
in analysing these problems is that pANS doesn't ever define the semantics
of pointer conversion.  We are only given some functional axioms, from which
some desirable properties, like this one, don't obviously follow.

> >DA[2]:  Just like DA[1], but using type void* instead of char*.
>
> TA.  A void* is really just a byte* (i.e., a char*) subject to additional
> programmer-safety compile-time constraints.  The run-time representation
> of void* and char* MUST be identical (3.1.2.5), and this implies that
> success for one equality comparison implies success for the other.

Why does equal representation imply equal semantics?
An argument based on the existence of functions like memcpy() might show
DA[2] is an RA, though.
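
DA[1] and DA[2] can be written down as a pair of small checks.  This is
only a sketch with hypothetical function names.  It shows what a given
implementation does, which is not the same as showing what the Standard
requires.

```c
/* DA[1]: given pi == pj, do the converted char pointers compare equal? */
int equal_after_char_cast(int *pi, int *pj)
{
    char *pc = (char *)pi;
    char *pd = (char *)pj;
    return pc == pd;
}

/* DA[2]: the same question with void* in place of char*. */
int equal_after_void_cast(int *pi, int *pj)
{
    void *pv = (void *)pi;
    void *pw = (void *)pj;
    return pv == pw;
}
```

Both functions are meant to be called with pi == pj already true.  An
implementation that picked an arbitrary byte within the int on each
conversion could return 0 here.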

> >DA[3]:  long *pi, *pj;
> >        Suppose that pi is valid, and do  pj = (long*)(int*) pi;
> >        Then  pi == pj .
> >        [comment: there's no rule that says an int can't have a more
> >         strict alignment requirement than a long.]
>
> TA.  If the conversion to int* does not violate the alignment constraint,
> then the test for equality must succeed.  I don't know of any architectures
> where it would be reasonable for the C implementation to impose stricter
> alignment constraints on int than on long, so this is in practice a TA.

An implementation in which short arithmetic is in hardware and long arithmetic
is in software might reasonably have this property.  I think this is a FA.
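
A hedged sketch of DA[3] follows.  It uses the _Alignof operator, which
was added in a much later revision of the language (C11) and is not part of
pANS.  The function name and the -1 return convention are hypothetical.
The alignment check guards the round trip, since converting to int* is
only safe when the pointer is suitably aligned for int.

```c
/* Returns 1 if the long* -> int* -> long* round trip compares equal,
   0 if it does not, and -1 if the round trip is skipped because int is
   more strictly aligned than long on this implementation.
   Assumption: a C11 (or later) compiler, for _Alignof. */
int long_roundtrip_check(long *pi)
{
    long *pj;

    if (_Alignof(int) > _Alignof(long))
        return -1;               /* conversion to int* may be invalid */

    pj = (long *)(int *)pi;      /* the DA[3] conversions */
    return pi == pj;
}
```

On the hypothetical machine described above, with int arithmetic in hardware
and long arithmetic in software, the first branch could be the one taken.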

> >DA[4]-DA[6]:

Your analyses agree with mine on these.  All are FAs.

> >DA[7]:  int i, *pi;
> >        Suppose i != 0, and do  pi = (int*) i .
> >        Then  pi != (int*)0 .
>
> FA.  (int*)0 is a null pointer of type (int*), whereas pi is the
> implementation-defined result of converting the integer value 0 to an
> int*.  0 in this source code context may be treated as a special case
> by the compiler.

Actually i is nonzero, though your argument still holds.  However, the case
with i==0 is a more instructive DA (indeed FA as you say).
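
The i==0 case can be sketched as below.  The function name is
hypothetical.  The extra cast through intptr_t (from <stdint.h>, a later
addition to the language and an optional type) is only there to avoid
integer-size warnings on machines where int and pointers differ in width.
The volatile qualifier keeps i from being treated as a constant.

```c
#include <stdint.h>

/* Converts a run-time zero to int* and compares it with the null pointer
   (int*)0.  The result of the conversion is implementation-defined, so
   either answer is possible in principle. */
int runtime_zero_compares_null(void)
{
    volatile int i = 0;               /* a value, not the constant 0 */
    int *pi = (int *)(intptr_t)i;     /* integer-to-pointer conversion */
    return pi == (int *)0;            /* null pointer constant, cast */
}
```

Most current implementations represent null pointers as all-bits-zero and
would return 1, but nothing in the sketch relies on that.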

> >DA[8]:  int *pi, *pj;
> >        Suppose pi is a valid pointer of kind P3, and do
> >        pj = (int*)(char*) pi .   Then  pi == pj .
> >        [comment: the rule in section 3.3.4 only applies to pointers
> >         to objects, which pi might not be.]
>
> [P3 means "one past the end".]  I think 3.3.4 meant for "type" to
> distribute over "object or incomplete", as it does explicitly later in
> the same sentence.  The intent is to distinguish these from function
> pointers.  Even if that interpretation is not upheld by X3J11, it would
> be most unlikely that an implementation would cause this example not to
> succeed, because it would take more work not to.  Thus, this is also TA.

We seem to be looking at different sentences.  I meant the sentence "It is
guaranteed ... original pointer." starting on the last line of page 46.
The word "type" doesn't appear in it at all.  Anyway, alignment is defined
as a requirement on objects; applying it to the values of pointers which may
not be pointers to objects seems doubtful.  As far as implementations are
concerned, consider one which treats pointers of kind P3 as a special case
in representation (perhaps to permit objects reaching to the end of memory
segments).  It may be that making (int*)(char*) a no-op in this case could
take more work, not less.  I think this is a DA, probably a FA, though this
could well be unintentional.
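
For reference, DA[8] in code form.  This is a minimal sketch with a
hypothetical function name and array size.  It forms a P3 pointer (one
past the end of an array), takes it through char* and back, and compares.

```c
/* Returns nonzero if a one-past-the-end pointer survives the
   int* -> char* -> int* round trip unchanged. */
int past_end_roundtrip_equal(void)
{
    int a[4];
    int *pi = a + 4;               /* kind P3: valid, not dereferenceable */
    int *pj = (int *)(char *)pi;   /* the DA[8] conversions */
    return pi == pj;
}
```

An implementation with a special representation for P3 pointers is exactly
the one where this might return 0.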

> >DA[9]-DA[10]:

I agree these are FAs.

> >References and some nit-picking.
>
> >3.1.2.5.  types and type terminology
> >          definitions of "object type" and "incomplete type"
> >   nit-pick:  This section contains several rules of the form "types X and Y have the
> >              same representation and alignment requirements".  Footnote 15
> >              tells us that this is intended to imply interchangeability as
> >              function arguments, function return values, and members of
> >              unions.  However, this does not follow from the rule.
> >              Interchangeability of two types as function arguments requires,
> >              in addition, equality of argument-passing mechanisms.  This is
> >              nowhere prescribed.
>
> I don't know what you mean by this; the footnote is EXPLAINING what we
> intended by these terms.  Don't you think that function arguments have
> to be somehow represented and aligned?

You missed my point.  Suppose the body of the text said "Gismos are pink."
and the footnote said "This is meant to imply that gismos are pink and
crinkly.".  What can we infer about gismos?  Well, the only reasonable
inference is that gismos are both pink and crinkly, even though the footnote
is not supposed to be part of the standard.  However, we could also reasonably
grumble about the extra information not appearing in the proper place.
If you doubt the analogy, consider the mythical XYZ compiler: values of type
void* and char* are both represented as 32-bit unsigned addresses, and have no
alignment requirements.  However, since void* was added later by a different
programmer, arguments of type void* are passed in registers whereas those of
type char* are passed on the stack.  Implementations quite often use
different argument passing mechanisms for different types, so I don't think
this is particularly perverse.  It shows that representation+alignment
equality does not imply argument interchangeability, i.e., the footnote adds
an entirely new restriction.  That is all I was nit-picking about.
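
Here is a sketch of code that depends on argument interchangeability and
not merely on equal representation.  The function names and the string are
hypothetical.  A void* is passed through a variable argument list and
fetched as a char*.  Later revisions of the C standard explicitly permit
this particular mismatch in va_arg.  On the mythical XYZ compiler it would
fetch the argument from the wrong place.

```c
#include <stdarg.h>

/* Fetches its first variable argument as a char* and returns the
   character it points at. */
char first_char(int count, ...)
{
    va_list ap;
    char *s;

    va_start(ap, count);
    s = va_arg(ap, char *);   /* caller actually passed a void* */
    va_end(ap);
    return s[0];
}

/* Passes a void* where first_char() expects a char*. */
char call_with_void_ptr(void)
{
    static char text[] = "gismo";
    void *pv = text;
    return first_char(1, pv);
}
```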

> >3.3.4.    more on conversion amongst pointer types
> >          conversions between integral types and pointers
> >   nit-pick:  The case of (obj*)0 should be excluded from these rules
> >              as it is specified differently in 3.2.2.3.

I agree with your response to this.

> >3.3.8.    relational operators
> >   nit-pick:  The phrase "or both are null pointers" is missing from the
> >              sentence in lines 8-10.  See the otherwise identical sentence
> >              in section 3.3.9.
>
> No, this omission was deliberate, since it is improper to provide a null
> pointer as an operand of a relational operator, which is what 3.3.8 is
> all about.  3.3.9 covers the equality operators, for which null pointers
> are permissible operands.

I will concede defeat on this too, but with some reluctance.  The sentence in
question is not describing the behaviour of a relational operator.  Unless,
perhaps, you interpret "equality" as the conjunction of "<=" and ">=".

> In summary:
>
> As I've said in the past and elaborated somewhat upon at the beginning
> of this article, one cannot understand what C is by applying formalistic
> arguments to the phraseology in the Standard.  I doubt that the Standard
> in itself suffices to completely specify what is essential about C to
> someone who has never encountered it (or, even more extreme, who knows
> nothing about computer programming); THAT IS NOT ITS PURPOSE.  It is

From the Foreword: "[pANS] addresses the problems of both the
program developer and the translator implementor by specifying the
C language precisely."   Are you saying it fails to meet this claim?

> merely intended to serve as a reference "treaty" by which both C
> programmers and C implementors agree to be bound, in order to facilitate
> the use of C as a practical tool in solving real-world problems, with
> particular emphasis on source-level application portability.
>
> Therefore, you should refer to the Standard to see what the terms of the
> treaty are, not to determine what is sane or insane.

I wish to test things for strict conformity by applying the definition that
the standard gives.  If it isn't "specified in this Standard", it isn't
strictly conforming.

>                                                       An unduly warped
> implementation does not facilitate the use of C; there is much more
> involved in determining the utility of an implementation than merely
> literal conformance to the letter of the Standard.  (X3J11 termed these
> "quality of implementation" issues.)  An implementor who provides a
> perverse implementation would undoubtedly incur the wrath of his
> customers, and deservedly so.

(1) I was addressing conformance, not utility.
(2) One person's "perverse" is another person's "reasonable".
(Your example of GNU's #pragma will do.)  A study of what things are strictly
conforming *because the standard actually says that they are* is a worthwhile
pursuit because it establishes a solid foundation on which to base further
discussion.  Your exhortations against that study are unjustified.

Brendan McKay
bdm at anucsd.oz  or  bdm659 at csc1.anu.oz