Pointers to Incomplete Types in Prototypes

Mon May 6 14:44:14 AEST 1991

People are having a hard time understanding how

	extern blort(struct piffle *);
	struct piffle { int cazart; };

creates two *different* struct piffle's, i.e. the struct
definition on the second line does *not* complete the incomplete
struct within the prototype.

For those who prefer concrete examples, I will describe how
structures are typically implemented within compilers.  (I hasten
to point out that such an excursion should not, strictly speaking,
be necessary -- we are supposed to be able to answer these
questions by reading the documentation, without peeking at the
source code.  In this case, the Standard is unambiguous, but
understanding how its requirements map to the compiler internals
may put a few minds at rest.)

Inside the compiler, we might have a structure which describes a
structure.  It might look something like this:

	struct structure
		{
		char *tag;
		int flags;
		struct symtabent *members;
		int nmembers;
		};

The tag field obviously records the structure's tag name, or is
NULL for unnamed structures.  The members and nmembers fields
record the number, names, and types of the structure's members,
but those details do not concern us here.

The important fact to realize is that if the compiler has two
structures lying around which it wants to test for compatibility,
it does *not* do so by comparing the tag names:

	struct structure *sp1, *sp2;
	...
	if(strcmp(sp1->tag, sp2->tag) != 0)		/* WRONG */
		error("incompatible structures");

Rather, with one exception [footnote 1], it does so by comparing
the pointers themselves for equality:

	if(sp1 != sp2)
		error("incompatible structures");
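
To make the difference concrete, here is a minimal sketch of the
two tests side by side.  The trimmed-down struct structure and the
names same_tag, same_struct_type, proto_piffle, and file_piffle
are my own inventions for illustration, not anything taken from a
real compiler.  The two descriptors stand for the struct piffle
created inside the prototype and the one created by the
file-scope definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Trimmed-down descriptor; member information omitted (assumption). */
struct structure {
	const char *tag;	/* NULL for an untagged structure */
	int flags;
};

/* The WRONG test: equal tag names say nothing about type identity. */
int same_tag(const struct structure *sp1, const struct structure *sp2)
{
	assert(sp1 != NULL && sp2 != NULL);
	return sp1->tag != NULL && sp2->tag != NULL &&
		strcmp(sp1->tag, sp2->tag) == 0;
}

/* The test described above: same descriptor, same type. */
int same_struct_type(const struct structure *sp1,
		const struct structure *sp2)
{
	return sp1 == sp2;
}

/* Two distinct descriptors that happen to share the tag "piffle". */
struct structure proto_piffle = { "piffle", 0 };
struct structure file_piffle = { "piffle", 0 };
```

Here same_tag(&proto_piffle, &file_piffle) is true while
same_struct_type(&proto_piffle, &file_piffle) is false, which is
exactly the situation in the opening example.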

I can't say exactly why the Standard requires compilers to behave
in this way; one reason is obviously that tag comparison can't
work for structures without tags.  (I confess that I can't find
explicit language in the Standard which requires behavior such
as I have described, but if you look at section 3.1.2.6 -- "Two
types have compatible type if their types are the same" -- and
section 3.5.2.1 -- "The presence of a struct-declaration-list in
a struct-or-union-specifier declares a new type, within a
translation unit" -- it's clear that tag comparison is not used.
Section 3.5.2.3 is devoted to tags, which we'll now explore.)

The second thing to understand is the way that scopes nest, and
the way that existing names are looked up in, and new names
inserted into, these nested scopes, particularly when the name is
a structure tag.  (Chris Torek has already described this process
in considerable detail; the informal treatment I present here
may be a bit easier to follow.)

When a compiler sees a structure tag without a struct-
declaration-list (the brace-enclosed list of the structure member
names and types), it looks through the current set of nested
scopes for a matching struct tag.  (Unlike compatible structure
testing, this search *is* made by string comparison of the tag
names.)  If it finds one, then this struct tag is a reference to
an already-declared struct, and that already-declared struct is
used as the type of whatever is being declared now (via the
standalone struct tag just encountered, at the beginning of this
paragraph).  In particular, part of the type of the thing being
declared now is the pointer to the struct structure of the struct
with the matching tag.  (Got that :-) ?)

If a matching struct tag is *not* found, the compiler has
encountered the declaration of a new, incomplete struct type.  It
allocates a new struct structure, with the given tag and no
members (and perhaps an explicit indication in the flags field
that this is an incomplete struct).  This incomplete structure must
now be entered (still in its incomplete form) into the scope
list.  At which level?  At the current one, just like any other
definition.  No other choice would be regular, or make much
sense.
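
The lookup-then-insert step might be sketched as follows.  This
is only an illustration under my own assumptions: struct scope,
the INCOMPLETE flag, lookup_tag, reference_tag, and the two demo
functions are hypothetical names, and a real compiler's symbol
table would certainly be more elaborate.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INCOMPLETE 1	/* hypothetical flag bit: no member list seen yet */

struct structure {
	const char *tag;	/* always non-NULL in this sketch */
	int flags;
	struct structure *next;	/* next tag entered at the same level */
};

struct scope {
	struct structure *tags;	/* tags entered at this level */
	struct scope *outer;	/* enclosing scope; NULL at file scope */
};

/* Search the current scope, then each enclosing one, by tag name. */
struct structure *lookup_tag(struct scope *cur, const char *tag)
{
	struct scope *s;
	struct structure *sp;

	for (s = cur; s != NULL; s = s->outer)
		for (sp = s->tags; sp != NULL; sp = sp->next)
			if (strcmp(sp->tag, tag) == 0)
				return sp;
	return NULL;
}

/* Handle "struct tag" with no member list. */
struct structure *reference_tag(struct scope *cur, const char *tag)
{
	struct structure *sp;

	assert(cur != NULL && tag != NULL);
	sp = lookup_tag(cur, tag);
	if (sp == NULL) {
		sp = malloc(sizeof *sp);	/* never freed in this sketch */
		if (sp == NULL)
			abort();
		sp->tag = tag;
		sp->flags = INCOMPLETE;
		sp->next = cur->tags;	/* enter at the current level */
		cur->tags = sp;
	}
	return sp;
}

/* Prototype seen first: its tag lands in the prototype's own scope. */
int prototype_first(void)
{
	struct scope file, proto;
	struct structure *in_proto, *at_file;

	file.tags = NULL; file.outer = NULL;
	proto.tags = NULL; proto.outer = &file;
	in_proto = reference_tag(&proto, "piffle");
	at_file = reference_tag(&file, "piffle");
	return in_proto != at_file;	/* two different structs */
}

/* "struct piffle;" seen first: the prototype finds the outer entry. */
int forward_first(void)
{
	struct scope file, proto;
	struct structure *in_proto, *at_file;

	file.tags = NULL; file.outer = NULL;
	proto.tags = NULL; proto.outer = &file;
	at_file = reference_tag(&file, "piffle");
	in_proto = reference_tag(&proto, "piffle");
	return in_proto == at_file;	/* one shared struct */
}
```

Both demo functions return 1.  The only difference between them
is the order in which the file scope and the inner scope first
see the tag.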

When a structure declaration with a struct-declaration-list is
encountered, whether it has a tag or not, it is the definition of
a new struct type.  (See section 3.5.2.1, page 61, lines 23-24.)
This new structure definitely gets defined at the current scope
level.  If this new structure has a tag, and if there is already
a structure with the same tag at this scope level, and if that
existing struct was incomplete, this declaration completes it.
(The incomplete definition's struct structure is used, so that
anything already declared using the incomplete type will remain
compatible, by the method of pointer comparison.)  If the new
structure has a tag, and if there is already a structure with the
same tag at this scope level, and if that existing struct is
*not* incomplete (already has members), it's an error (an
attempt is being made to redefine the structure) [footnote 2].
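
A rough sketch of that decision, again under my own assumptions
(define_struct, struct scope, the INCOMPLETE flag, and
completion_demo are hypothetical names, and the member list
itself is left out), might look like this.  Note that only the
current scope level is consulted, never the enclosing ones.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INCOMPLETE 1	/* hypothetical flag bit: no member list seen yet */

struct structure {
	const char *tag;	/* NULL for an untagged structure */
	int flags;
	struct structure *next;
};

struct scope {
	struct structure *tags;	/* tags entered at this level */
	struct scope *outer;	/* deliberately never consulted here */
};

/*
 * Handle "struct tag { ... }".  Returns the descriptor for the
 * type, or NULL if this is an attempt at redefinition.
 */
struct structure *define_struct(struct scope *cur, const char *tag)
{
	struct structure *sp;

	assert(cur != NULL);
	if (tag != NULL) {
		for (sp = cur->tags; sp != NULL; sp = sp->next) {
			if (sp->tag == NULL || strcmp(sp->tag, tag) != 0)
				continue;
			if (!(sp->flags & INCOMPLETE))
				return NULL;	/* already has members */
			sp->flags &= ~INCOMPLETE;	/* complete in place */
			return sp;
		}
	}
	sp = malloc(sizeof *sp);	/* never freed in this sketch */
	if (sp == NULL)
		abort();
	sp->tag = tag;
	sp->flags = 0;
	sp->next = cur->tags;
	cur->tags = sp;
	return sp;
}

/* Completing reuses the descriptor; a second definition is rejected. */
int completion_demo(void)
{
	struct structure fwd = { "piffle", INCOMPLETE, NULL };
	struct scope file;

	file.tags = &fwd;
	file.outer = NULL;
	return define_struct(&file, "piffle") == &fwd
		&& !(fwd.flags & INCOMPLETE)
		&& define_struct(&file, "piffle") == NULL;
}
```

Because the completed struct keeps the descriptor of the
incomplete one, anything declared earlier in terms of it still
passes the pointer comparison.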

Finally, given that there is a new, microscopic (but still
nested) scope active within a function prototype that is not part
of a function definition, we can see why

	extern blort(struct piffle *);
	struct piffle { int cazart; } x;
	blort(&x);

defines two different struct piffles, such that the call to blort
in the third line is in error, while 

	struct piffle;
	extern blort(struct piffle *);
	struct piffle { int cazart; } x;
	blort(&x);

works as intended.  The empty struct piffle declaration on the
first line serves only to enter an incomplete struct piffle at
file scope.  The incomplete struct piffle in the prototype on
line 2 then refers to it rather than creating a new one, the
definition on line 3 completes that same struct, and the call on
line 4 therefore passes a properly compatible type.
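
For anyone who wants to feed the working sequence to a compiler,
here is a compilable variant.  The explicit int return type, the
body of blort, and the helper call_blort are my additions for the
sake of a complete example; they are not part of the fragment
above.

```c
#include <assert.h>
#include <stddef.h>

struct piffle;				/* incomplete type, entered at file scope */
extern int blort(struct piffle *);	/* refers to the file-scope struct piffle */
struct piffle { int cazart; };		/* completes that same type */

/* Hypothetical body: simply hands back the member. */
int blort(struct piffle *p)
{
	assert(p != NULL);
	return p->cazart;
}

/* Hypothetical helper: &x has exactly the type the prototype expects. */
int call_blort(int value)
{
	struct piffle x;

	x.cazart = value;
	return blort(&x);
}
```

With the first line removed, a conforming compiler should
diagnose the call (and, in this sketch, the definition of blort
as well), though the wording of the message varies.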

                                            Steve Summit
                                            scs at adam.mit.edu

Footnote 1.  The exception is when structures must be compatible
across translation units.  Obviously, if they're compiled
separately, the compiler can't compare pointers to its run-time
data structures.  In fact, the compiler isn't going to check
compatibility at all; nor, for that matter, does the linker
usually do so.  Section 3.1.2.6 describes when structures are
compatible across translation units; presumably a utility like
lint might make use of it.  (Obviously, the programmer must also
be aware of this information, if the programs are to work,
although the common and recommended practice of putting structure
definitions in header files of course ensures compatibility.)
Curiously, section 3.1.2.6 requires that the members have the
same types and be in the same order (obviously) and also that
they have the same names, but *not* that the structures have the
same tags.  Presumably this means that two structure types with
different tags (or without tags) but with identical descriptions
would be strictly compatible across translation units.
(Obviously, the code would work correctly, under any conceivable
architecture, in any case, and nothing would be likely to go
wrong if the member names didn't match, either.)

Footnote 2.  A few months ago, there was a long discussion about
incomplete types and the precise interpretation of the term "an
enclosing scope."  I don't remember if the discussion concerned
structure tags (it might have been about incomplete array types),
but section 3.5.2.3 states explicitly that when a tag is
declared, "Subsequent declarations [with the same tag] shall omit
the bracketed list."