noalias comments to X3J11

dmr at alice.UUCP
Sun Mar 20 18:37:58 AEST 1988


Reproduced below is the long essay I sent as an official comment
to X3J11.  It is in two parts; the first points out some problems
in the current definition of `const,' and the second is a diatribe
about `noalias.'

By way of introduction, the important thing about `const' is that the
current wording says, in section 3.3.4, that a pointer to a
const-qualified object may be cast to a pointer to the plain object,
but "If an attempt is made to modify the pointed-to object by means of
the converted pointer, the behavior is undefined."  Because function
prototypes tend to convert your pointers to const-qualified pointers,
difficulties arise.

In discussion with various X3J11 members, I learned that this section
is now regarded as an inadvertent error, and no one thinks that
it will last in its current form.  Nevertheless, it seemed wisest
to keep my comments in their original strong form.  The intentions
of the committee are irrelevant; only their document matters.

The second part of the essay is about noalias as such.  It seems likely
that even the intentions of the committee on this subject are confused.

Here's the jeremiad.

				Dennis Ritchie
				research!dmr
				dmr at research.att.com
----------

This is an essay on why I do not like X3J11 type qualifiers.
It is my own opinion; I am not speaking for AT&T.

     Let me begin by saying that I'm not convinced that even
the pre-December qualifiers (`const' and `volatile') carry
their weight; I suspect that what they add to the cost of
learning and using the language is not repaid in greater
expressiveness.  `Volatile,' in particular, is a frill for
esoteric applications, and much better expressed by other
means.  Its chief virtue is that nearly everyone can forget
about it.  `Const' is simultaneously more useful and more
obtrusive; you can't avoid learning about it, because of its
presence in the library interface.  Nevertheless, I don't
argue for the extirpation of qualifiers, if only because it
is too late.

     The fundamental problem is that it is not possible to
write real programs using the X3J11 definition of C.  The
committee has created an unreal language that no one can or
will actually use.  While the problems of `const' may owe to
careless drafting of the specification, `noalias' is an
altogether mistaken notion, and must not survive.

1.  The qualifiers create an inconsistent language

     A substantial fraction of the library cannot be
expressed in the proposed language.

     One of the simplest routines,

        char *strchr(const noalias char *s, int c);

can return its first parameter.  This first parameter must
be declared with `const noalias;' otherwise, it would be
illegal (by the constraints on assignment, 3.3.16.1) to pass
the address of a const or noalias object.  That is, the type
qualifiers in the prototype are not merely an optional
pleasantry of the interface; they are required, if one is to
pass some kinds of data to this or most other library routines.

     Unfortunately, there is no way in X3J11's language for
strchr to return the value it promises to, because of the
semantics of return (3.6.6.4) and casts (3.3.4).  Whether
the stripping of the const and noalias qualifiers is done by
cast inside strchr, or implicitly by its return statement,
strchr returns a pointer that (because of `const') cannot be
stored through, and (because of `noalias') cannot even be
dereferenced; by the rules, it is useless.  (Incidentally, I
think this observation was made by Tom Plum several years
ago; it's disconcerting that the inconsistency remains.)

     Although the plain words of the Standard deny it, plastering
the appropriate `non-const' cast on an expression to
silence a compiler is sometimes safe, because the most probable
implementation of `const' objects will allow them to be
read through any access path, and will diagnose attempts to
change them by generating an access violation fault at run
time.  That is, in common implementations, adding or taking
away the `const' qualifier of a pointer can never create any
bugs not implicit in the rule `do not modify a genuine const
object through any access path.'

     Nevertheless, I must emphasize that this is NOT the
rule that X3J11 has written, and that its library is inconsistent
with its language.  Someone writing an interpreter
using X3J11/88-001 is perfectly at liberty to (indeed, is
advised to) carry with each pointer a `modifiable' bit, that
(following 3.3.4) remains off when a pointer to a const-
qualified object is cast into a plain pointer.  This implementation
will prevent many of the real uses of strchr, for
example.  I'm thinking of things like

        if (p = strchr(q, '/'))
                *p = ' ';

which are common and innocuous in C, but undefined by
X3J11's language.
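
     A minimal sketch of the checking implementation described
above, written as an ordinary C model rather than as a real
interpreter.  Every name here (chk_ptr, to_const, cast_to_plain,
chk_store, chk_strchr) is hypothetical, and the assumption that
the bit is cleared when a pointer is converted to a const-qualified
parameter type is an illustration, not wording from the draft.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an interpreter's pointer: an address plus
 * a 'modifiable' bit.  None of these names come from the draft. */
struct chk_ptr {
    char *addr;
    int modifiable;
};

/* Assumption: converting to a pointer-to-const clears the bit. */
struct chk_ptr to_const(struct chk_ptr p)
{
    p.modifiable = 0;
    return p;
}

/* Following 3.3.4, casting back to a plain pointer leaves the bit off. */
struct chk_ptr cast_to_plain(struct chk_ptr p)
{
    return p;
}

/* A store succeeds (returns 1) only when the bit is on. */
int chk_store(struct chk_ptr p, char value)
{
    if (!p.modifiable)
        return 0;       /* the interpreter would reject the store here */
    *p.addr = value;
    return 1;
}

/* strchr as such an interpreter would see it: the parameter is
 * const-qualified, so the returned pointer cannot be stored through. */
struct chk_ptr chk_strchr(struct chk_ptr s, int c)
{
    struct chk_ptr cs = to_const(s);
    struct chk_ptr none = { NULL, 0 };

    assert(s.addr != NULL);
    for (;; cs.addr++) {
        if (*cs.addr == (char)c)
            return cast_to_plain(cs);
        if (*cs.addr == '\0')
            return none;
    }
}
```

     In this model the result of chk_strchr on a perfectly
writable buffer still refuses the store, which is the fate of
the `*p = ' '' idiom under such an implementation.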

     A related observation is that string literals are not
of type `array of const char.'  Indeed, the Rationale (88-004
version) says, `However, string literals do not have
[this type], in order to avoid the problems of pointer type
checking, particularly with library functions....'  Should
this bald statement be considered anything other than an
admission that X3J11's rules are screwy?  It is ludicrous
that the committee introduces the `const' qualifier, and
also makes strings unwritable, yet is unable to connect the
two conceptions.

2.  Noalias is an abomination

     `Noalias' is much more dangerous; the committee is
planting timebombs that are sure to explode in people's
faces.  Assigning an ordinary pointer to a pointer to a
`noalias' object is a license for the compiler to undertake
aggressive optimizations that are completely legal by the
committee's rules, but make hash of apparently safe
programs.  Again, the problem is most visible in the
library; parameters declared `noalias type *' are especially
problematical.

     In order to write such a library routine using the new
parameter declarations, it is in practice necessary to
violate 3.3.4: `A pointer to a noalias-qualified type ...
may be converted to ... the non-noalias-qualified type.  If
the pointed to object is referred to by means of the converted
pointer, the behavior is undefined.'  Thus, the problem
that occurs with `const' is now much worse; there are no
interesting and legal uses of strchr.

     How do you code a routine whose prototype specifies a
noalias pointer?  If you fail to violate 3.3.4, but instead
try to rewrite the declarations of temporary variables to
make them agree in type with parameters, it becomes hard to
be sure that the routine works.  Consider the specification
of strtok:

        char *strtok(noalias char *s1, noalias const char *s2);

It retains a static pointer to its writable, `noalias' first
argument.  Can you be sure that this routine can be made
safe under the rules?  I have studied it, and the answer is
conditionally yes, provided one accepts certain parts of the
Standard as gospel (for example that `noalias' handles will
NOT be synchronized at certain times) while ignoring other
parts.  It is a very dodgy thing.  For other routines, it is
certain that complete rewriting is necessary: qsort, for
example, is full of pointers that rove the argument array
and change it here and there.  If these local pointers are
qualified with `noalias,' they may all be pointing to different
virtual copies of parts of the array; in any event,
the argument itself may have a virtual object that might be
completely untouched by the attempt to sort it.

     The `noalias' rules have the assignment and cast restrictions
backwards.  Assigning a plain pointer to a const-
qualified pointer (pc = p) is well-defined by the rules and
is safe, in that it restricts what you can do with pc. The
other way around (p = pc) is forbidden, presumably because
it creates a writable access path to an unwritable object.
With `noalias,' the rules are the same (pna = p is OK,
p = pna is forbidden), but the realistic safety requirements are
completely different.  Both of these assignments are equally
suspicious, in that both create two access paths to an
object, one manifestation of which might be virtual.
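
     The `const' half of this comparison can be sketched in C as
follows.  The names p and pc follow the text; the function name
and the variable x are illustrative assumptions.  `Noalias' is
left out, since no available compiler is assumed to accept it.

```c
#include <assert.h>

/* Hypothetical demonstration of the two assignment directions. */
int const_assignment_demo(void)
{
    int x = 1;
    int *p = &x;
    const int *pc;

    pc = p;             /* plain to const-qualified: allowed, and it only
                           restricts what can be done through pc */
    assert(pc == p);    /* both are now access paths to the same object */

    /* p = pc; */       /* rejected without a cast: it would create a
                           writable access path from a read-only one */

    p = (int *)pc;      /* an explicit cast strips the qualifier */
    *p = 2;             /* x itself is not const, so common implementations
                           accept this store; the 3.3.4 wording discussed
                           above would leave it undefined */
    return *pc;         /* reads the updated x through pc */
}
```

     With `const,' the forbidden direction is the only one that
can do harm.  With `noalias,' neither direction is harmless,
because either one yields a second access path.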

     Here is another way of observing the asymmetry: the
presence of `const type *' in a parameter list is a useful
piece of interface information, but `noalias type *' most
assuredly is not.  Given the declaration


        void *memcpy(noalias void *s1, const noalias void *s2, size_t n);

what information can one glean from it?  Some committee
members apparently believe that it conveys either to the
reader or to the compiler that the routine is safe, provided
that the strings do not overlap.  They are mistaken.
Perhaps the committee's intent is not reflected in the
current words of the Standard, but I can find nothing there
that justifies their belief.  The rules (page 65, lines 19-20)
specify `all objects accessible by these [noalias]
lvalues,' which is the entirety of both array arguments.

     More generally, suppose I see a prototype

        char *magicfunction(noalias char *, noalias char *);

Is there anything at all I can conclude about the requirements
of magicfunction? Is there anything at all I can conclude
about things it promises to do or not to do?  All I
learn from the Rationale (page 52) is that such a routine
enjoins me from letting the arguments overlap, but this is
at variance with the Standard, which gives a stronger
injunction.

     Within the function itself, things are equally bad.  A
`const type *' parameter, though it presents problems for
strchr and other routines, does usefully constrain the function:
it's not allowed to store through the pointer.  However,
within a function with a `noalias type *' parameter,
nothing is gained except bizarre restrictions: it can't cast
the parameter to a plain pointer, and it can't assign the
parameter to another noalias pointer without creating
unwanted handles and potential virtual objects.  The interface
MUST say noalias, or at any rate DOES say noalias, so
the author of the routine has all the grotesque inventions
of 3.5.3 (handles, virtual objects) rubbed in his face, like
it or not.

     The utter wrongness of `noalias' is that the information
it seeks to convey is not a property of an object at
all.  `Const,' for all its technical faults, is at least a
genuine property of objects; `noalias' is not, and the
committee's confused attempt to improve optimization by pinning
a new qualifier on objects spoils the language.
`Noalias' is a bogus invention that is not necessary, and
not in any case sufficient for its apparent purpose.

     Earlier languages flirted with gizmos intended to help
optimization, and generally abandoned them.  The original
Fortran, for example, had a FREQUENCY statement that didn't
help much, confused people, and was dropped.  PL/1 had
`normal/abnormal' and `uses/sets' attributes that suffered a
similar fate.  Today, these are generally looked on as
adolescent experiments.

     On the other hand, the insufficiency of `noalias' is
suggested by Cray's Fortran compiler, which has 20 separate
keywords that control various details of optimization.  They
are specified by an equivalent of #pragma, and thus, despite
their oddness, can be ignored when trying to understand the
meaning of a program.

     Perhaps there is some reason to provide a mechanism for
asserting, in a particular patch of code, that the compiler
is free to make optimistic assumptions about the kinds of
aliasing that can occur.  I don't know any acceptable way of
changing the language specification to express the possibility
of this kind of optimization, and I don't know how much
performance improvement is likely to result.  I would
encourage compiler-writers to experiment with extensions, by
#pragma or otherwise, to see what ideas and improvements
they can come up with, but I am certain that nothing resembling
the noalias proposal should be in the Standard.

3.  The cost of inconsistency

     K&R C has one important internal contradiction
(variadic functions are forbidden, yet printf exists) and
one important divergence between rule and reality (common
vs. ref/def external data definitions).  These contradictions
have been an embarrassment to me throughout the years,
and resolving them was high on X3J11's agenda.  X3J11 did
manage to come up with an adequate, if awkward, solution to
the first problem.  Their solution to the second was the
same as mine (make a rule, then issue a blanket license to
violate it).

     I'm aware that there are distinctions to be made
between `conforming' and `strictly conforming' programs.
Although the X3J11 rules for qualifiers are inconsistent,
and therefore most nominally X3J11 compilers will ignore, or
only warn about, casts and assignments that X3J11 says are
undefined, people will somehow survive.  C has, after all,
survived the vararg and the extern problems.

     Nevertheless, I advise strongly against sanctifying a
language specification that no one can possibly embody in a
useful compiler.  This advice is based on bitter experience.

4.  What to do?

     Noalias must go.  This is non-negotiable.

     It must not be reworded, reformulated or reinvented.
The draft's description is badly flawed, but that is not the
problem.  The concept is wrong from start to finish.  It
negates every brave promise X3J11 ever made about codifying
existing practices, preserving the existing body of code,
and keeping (dare I say it?) `the spirit of C.'

     Const has two virtues: putting things in read-only
memory, and expressing interface restrictions.  For example,
saying

        char *strchr(const char *s, int c);

is a reasonable way of expressing that the routine cannot
change the object referred to by its first argument.  I
think that minor changes in wording preserve the virtues,
yet eliminate the contradictions in the current scheme.

1)   Reword page 47, lines 3-5 of 3.3.4 (Cast operators), to
     remove the undefinedness of modifying pointed-to
     objects, or remove these lines altogether (since casting
     non-qualified to qualified isn't discussed explicitly
     either).

2)   Rewrite the constraint on page 54, lines 14-15, to say
     that pointers may be assigned without taking qualifiers
     into account.

3)   Preserve all current constraints against modifying
     non-modifiable lvalues, that is, things of manifestly
     const-qualified type.

4)   String literals have type `const char []'.

5)   Add a constraint (or discussion or example) to assignment
     that makes clear the illegality of assigning to an
     object whose actual type is const-qualified, no matter
     what access path is used.  There is a manifest constraint
     that is easy to check (left side is not const-
     qualified), but also a non-checkable constraint (left
     side is not secretly const-qualified).  The effect
     should be that converting between pointers to const-
     qualified and plain objects is legal and well-defined;
     avoiding assignment through pointers that derive ultimately
     from `const' objects is the programmer's responsibility.


     These rules give up a certain amount of checking, but
they save the consistency of the language.
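
     As a rough illustration of what these rules would permit, a
routine with the strchr interface might be written as below.
This is a sketch under the proposed wording, not the text of any
library; the name my_strchr is hypothetical, chosen to avoid
clashing with the real routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for strchr, written under the proposed rules. */
char *my_strchr(const char *s, int c)
{
    assert(s != NULL);
    for (;; s++) {
        if (*s == (char)c)
            return (char *)s;   /* qualifier stripped by cast; under
                                   rule 1 the result is fully usable */
        if (*s == '\0')
            return NULL;        /* reached the end without a match */
    }
}
```

     A caller holding a writable buffer may then store through
the result, as in `if (p = my_strchr(q, '/')) *p = ' ';'.  By
rule 5, a caller who passes a genuinely const object is
responsible for not storing through what comes back.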


