Pointers to Incomplete Types in Prototypes

Sun May 5 09:16:35 AEST 1991

(This is worth a try, anyway....)

The ANSI scoping rules are actually quite simple, but they do produce
surprising results sometimes.  Here is one way (I believe correct :-) )
to work out what happens.

Item: file scope is called `level 0'.  File scope ends only at the end
of a `translation unit' (the source file together with everything it
#includes).

Item: braces (the characters `{' and `}') delimit scopes.  An open
brace introduces a new scope; this scope ends at the corresponding
close brace.  In effect, `{' increments the current scope and `}'
decrements the current scope.
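
Here is a small sketch of the brace rule.  The function name `nested'
and the trick of packing one decimal digit per read are mine, purely
for illustration.  Each `{' opens a scope in which a fresh `k' hides
the outer one, and each `}' brings the outer `k' back.

```c
int k = 1;                      /* level 0 (file scope) */

/* Records which k is visible at each point, one decimal digit per
 * read, so the caller can see the order in which they appear. */
int nested(void)
{
	int r = k;              /* sees the file-scope k */
	{
		int k = 2;      /* new scope, new k */
		r = r * 10 + k;
		{
			int k = 3;      /* one level deeper */
			r = r * 10 + k;
		}               /* the k that was 3 is gone */
		r = r * 10 + k; /* back to the k that is 2 */
	}                       /* the k that was 2 is gone */
	return r * 10 + k;      /* file-scope k again */
}
```

Calling nested() should give 12321: the digits are the values of `k'
seen going in and coming back out.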

Item: function parameters in all forms of declaration and definition
appear at scope level 1.  Variables (and `goto' labels) that appear
inside the function are at level 2 or higher.  This is necessary, among
other reasons, because

	f(p) char *p; { int p = 3; ... }

is legal (if a bit peculiar).

Item: `extern' declarations are inserted at the current scope level
(this differs from pcc, in which extern declarations are inserted in
scope 0, regardless of the current scope).  Goto labels are inserted in
scope 2 (so that you can jump across braces).
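
A minimal sketch of the `extern' item follows (the names `counter' and
`read_both' are made up for the example).  The block-scope `extern'
declaration is entered at the inner level only, so once that block
closes, the name `counter' means the local variable again.

```c
int counter = 7;                /* level 0, external linkage */

int read_both(void)
{
	int counter = 0;        /* local; hides the file-scope counter */
	int seen;

	{
		extern int counter;     /* inserted at THIS scope level;
					 * names the file-scope object */
		seen = counter;
	}                       /* the extern declaration ends here */

	return seen * 10 + counter;     /* file-scope value, then local */
}
```

Here read_both() should return 70: the 7 comes from the file-scope
object seen through the inner `extern', the 0 from the local.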

Now, inner scope declarations (higher numbers) of some name may give it
a type that differs from an outer (lower number) scope declaration
---for instance, in the f(p) example above, the `int' p is not at all
the same as the `char *' p.  To disambiguate these, you should keep a
mental `stack of paper' by your left, on which there is one sheet per
scope, and one very large sheet on your right.  (You could share the
level 0 page for this, but it is easier to imagine a separate sheet.)
Here is how you work them.  (The following ignores name space
separation: variables, structure tags, and goto labels all get their
own pairs of left-pile, right-page.  It is good enough for illustration.)

Whenever you come across a *declaration* for a name, you search for
that name on the top page on your left (the highest numbered scope).
If it appears, you probably have a redeclaration error (e.g., `int k;
int k;' inside a block) (but see below).  If not, however, you:

	1. Write the name on the right sheet of paper.  Append a
	   number.  The number can be any that you have not written
	   on the right before, but it is easiest to start at 0 or
	   1 and increase.  Thus, given `int k;' at scope 0 with
	   both pages blank, you would write:

		k<0>

	   on the right page.

	2. Write the name and the same number on the topmost sheet on
	   the left---here, k<0>.  This counts as the declaration.
	   It may be `incomplete' if this is a struct or union, in
	   which case we need one more declaration to `complete' it.
	   (This is what `struct foo;', with no structure contents and
	   no variable names, is for.  It is a special case.)
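
To show why the `struct foo;' special case exists, here is a sketch in
which an inner scope wants its own `foo' even though one is already
visible from level 0.  The names `bar', `link', `text' and `inner_foo'
are invented for the example.

```c
struct foo { int outer; };              /* foo at level 0 */

int inner_foo(void)
{
	struct foo;                     /* special case: a brand-new,
					 * incomplete foo on this sheet */
	struct bar { struct foo *link; };       /* points at the new foo */
	struct foo { char text[4]; };   /* completes the new foo */

	struct foo f = { "abc" };
	struct bar b;

	b.link = &f;
	return b.link->text[1];         /* only the inner foo has `text' */
}
```

Without the bare `struct foo;' line, the reference inside `struct bar'
would find the level-0 foo instead, and `b.link = &f' would mix two
different structure types.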

Whenever you come across a *reference* to a name, you search for that
name in *all* the pages on your left, starting at the top (the highest
numbered scope).  If it appears, you take its number; this is the
`real' name for that identifier.  If it does *not* appear, this may be
an error (e.g., `return foo;' where `foo' is undefined) or it may count
as an `incomplete' declaration (e.g., `struct glorp *' where `struct
glorp' is undefined).  If it is an incomplete declaration, it works just
as described above.

A declaration fills out (`completes') an incomplete type only when the
incomplete entry is on the same piece of paper on the left, that is,
on the current top sheet.

Whenever you open a new scope, you add a blank sheet of paper on the
left, on top of the pile.  Whenever you close it, you throw away the
top sheet.  The sheet on the right remains `active' for the whole file.

(There is a special case for `typedef': when you see `typedef foo bar;'
rather than looking for bar, not finding it, and giving it a *new*
number, you look for bar, do not find it, and give it the *same* number
you found for foo.  typedefs for `base' types [int, char, etc] can be
written as `bar<int>', if you like.  But never mind that.)
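
A tiny sketch of the typedef case, with made-up names (`pt', `point',
`through_typedef'): since the typedef name carries the same number as
the structure it names, the two can be mixed freely.

```c
struct pt { int x; };           /* say this is pt<0> */
typedef struct pt point;        /* point gets the same number */

int through_typedef(void)
{
	struct pt s = { 5 };
	point *q = &s;          /* no cast needed: same type */

	return q->x;
}
```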

Two `struct' types are the same ONLY IF THEIR NUMBERS MATCH.

Okay, so now what happens with incomplete structure types that appear in
various scope levels?

Suppose we have

	void f(p) struct a *p; {
		struct a { int a; };
		...

Working only the `struct' declarations, we start with two blank
sheets.  Between `f(p)' and the first `{' we add a new scope (a new
blank sheet) on the left, and we look for `struct a' on both
left-hand pages (because this pointer refers to struct a, i.e., this is
a reference to, not a definition of, `struct a').  It is not there, so
we add an incomplete declaration, writing

	struct a<0>

on the right and copying it to the left.  (We still do not know what
`struct a' is.)  Then we take the open brace, which adds a new scope,
so we put a third sheet of paper on the left.  Now we come across a
definition for a `struct a'---but there is no `struct a' on the top
page on the left, so we add a new one on the right:

	struct a<1>

and copy that to the left.

The result is that `p' points to a `struct a<0>' but the only `struct
a' we know about is a `struct a<1>'.  p is thus largely useless: it
cannot be dereferenced, since struct a<0> is never completed, and it is
not compatible with a pointer to struct a<1>.  When
we reach the final `}' closing function f, we throw away the top two
left-hand sheets, going back to our blank one, and so if we declare
another `struct a' it is a `struct a<2>'.
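
The paper shuffling above can be mimicked with a few lines of C.  This
is only a sketch of the pencil-and-paper procedure as described here,
not of any real compiler: it handles structure tags alone, does no
bounds checking, and does not diagnose redefinitions.  All the names
(`left', `refer_tag', `define_tag', `trace_example', and so on) are
invented for the illustration.

```c
#include <string.h>

#define MAXSCOPE 8
#define MAXNAMES 8

struct entry { const char *name; int number; };
struct sheet { struct entry e[MAXNAMES]; int n; };

static struct sheet left[MAXSCOPE];     /* the pile on the left */
static int top;                         /* index of the top sheet */
static int next_number;                 /* next unused number (right page) */

static void reset(void) { top = 0; left[0].n = 0; next_number = 0; }
static void open_scope(void) { left[++top].n = 0; }     /* blank sheet */
static void close_scope(void) { top--; }                /* throw it away */

/* Search one sheet; return the number, or -1 if the name is absent. */
static int on_sheet(const struct sheet *s, const char *name)
{
	int i;

	for (i = 0; i < s->n; i++)
		if (strcmp(s->e[i].name, name) == 0)
			return s->e[i].number;
	return -1;
}

/* Take a fresh number from the right and copy it to the top sheet. */
static int add_new(const char *name)
{
	struct sheet *s = &left[top];

	s->e[s->n].name = name;
	s->e[s->n].number = next_number;
	s->n++;
	return next_number++;
}

/* `struct name { ... }': look at the top sheet only. */
static int define_tag(const char *name)
{
	int n = on_sheet(&left[top], name);

	return n >= 0 ? n : add_new(name);      /* complete it, or start anew */
}

/* `struct name *': look at every sheet, top first. */
static int refer_tag(const char *name)
{
	int i, n;

	for (i = top; i >= 0; i--)
		if ((n = on_sheet(&left[i], name)) >= 0)
			return n;
	return add_new(name);                   /* incomplete declaration */
}

/* void f(p) struct a *p; { struct a { int a; }; ... }  then a later
 * struct a at level 0.  The three numbers assigned go into out[]. */
void trace_example(int out[3])
{
	reset();
	open_scope();                   /* sheet for the parameters */
	out[0] = refer_tag("a");        /* struct a *p */
	open_scope();                   /* `{' */
	out[1] = define_tag("a");       /* struct a { int a; } */
	close_scope();                  /* `}' */
	close_scope();                  /* parameter sheet goes too */
	out[2] = define_tag("a");       /* another struct a, level 0 */
}
```

After trace_example(out), out[] should hold 0, 1 and 2, matching the
`struct a<0>', `struct a<1>' and `struct a<2>' worked out by hand.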

This works the same whether f is written as

	void f(p) struct a *p; {

or

	void f(struct a *p) {

or even

	void f(struct a *p);

Now consider what happens if we have:

	struct a;
	void f(struct a *p);

This time we write `struct a<0>' on the right and copy to the left
before we add any more sheets of paper.  This gives us an incomplete
declaration of the structure `a'.  Next, we put down a new blank
sheet.  We then see a reference to `struct a'.  We search through both
sheets on the left and, voila!, find `struct a<0>'.  p thus points to a
`struct a<0>'.  The level-1 scope (top page) disappears after the
second semicolon, and if we encounter a

	struct a { ... };

definition, this `fleshes out' the struct a<0>.
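
As a concrete sketch of this last case (the member `value' and the
functions `get' and `demo' are invented for the example), the bare
`struct a;' at the top is what ties the prototype, the later
definition of the structure, and the function body to one and the same
`struct a<0>'.

```c
struct a;                       /* struct a<0>, incomplete, level 0 */
int get(struct a *p);           /* finds a<0>; p points to it */

struct a { int value; };        /* fleshes out a<0> */

int get(struct a *p)            /* same a<0>, so this agrees with
				 * the prototype above */
{
	return p->value;
}

int demo(void)
{
	struct a x;

	x.value = 42;
	return get(&x);
}
```

Take away the first line and the `struct a' in the prototype lives and
dies inside the prototype; a compiler will usually complain there, and
again when the definition of get() fails to match.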

[Now that I have done all this, it occurs to me that it might be
simpler to tag each declaration with its scope level, rather than a
global unique number, at least for discussion.  The BSD debugging
symbol table format uses global unique numbers, which is why I did it
this way.  The initial number is not 0, however; the first few numbers
are assigned to the `base types'.  This is what the strings in a
`.stabs' directive are all about.]
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek at ee.lbl.gov