Standardization questions (cpp mostl

jim at ism780b.UUCP jim at ism780b.UUCP
Mon Oct 8 14:20:01 AEST 1984


>   1.   undefined token has the value zero (in #if's).  Cpp should
>        print a warning -- or error if the evaluator is intelligent
>        about statements like:
>                #if defined (foo) && foo == 0
>        My cpp prints a warning, as it erroneously evaluates the
>        entire statement.

I disagree.  I've seen plenty of code which does

	#if foo
or
	#if !foo

and I don't think a "-Dfoo=0" should be required.
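
For concreteness, here is a small sketch of that usage, assuming a
preprocessor that treats an undefined name in an #if as zero.  FOO_UNSET
and the helper functions are made-up names for illustration; nothing
defines FOO_UNSET.

```c
/* FOO_UNSET is a hypothetical name that is deliberately never defined. */
#if FOO_UNSET
#define FOO_BRANCH 1
#else
#define FOO_BRANCH 0
#endif

#if !FOO_UNSET
#define NOT_FOO_BRANCH 1
#else
#define NOT_FOO_BRANCH 0
#endif

/* The guarded form from the quoted question: false when the name is unset. */
#if defined(FOO_UNSET) && FOO_UNSET == 0
#define GUARDED_BRANCH 1
#else
#define GUARDED_BRANCH 0
#endif

int foo_branch(void)     { return FOO_BRANCH; }
int not_foo_branch(void) { return NOT_FOO_BRANCH; }
int guarded_branch(void) { return GUARDED_BRANCH; }
```

Under that assumption the first and third tests fail and the second
succeeds, with no -D needed on the command line.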

>   2.   <backslash><newline> is "invisible" to all processing in
>        the standard.  I regard it as "whitespace" outside of
>        strings, and hence a token delimiter.  This greatly
>        simplifies accurate error message generation.

If <backslash><newline> is allowed everywhere, it should behave the same
everywhere, so it can be handled at the lowest possible level.
And since it must be ignored in strings, it should be ignored everywhere.
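
A sketch of what "ignored everywhere" permits, assuming the splice is
removed before the text is tokenized.  The names are hypothetical.  Note
that each continuation line starts in column one, since leading blanks
would become part of the spliced text.

```c
/* Splice inside a string literal: the two physical lines form "abcdef". */
static const char spliced_string[] = "abc\
def";

/* Splice inside an identifier: this spells the single name splice_total. */
static int splice_\
total = 42;

const char *splice_text(void) { return spliced_string; }
int splice_value(void)        { return splice_total; }
```

If the splice were instead a token delimiter outside strings, the second
declaration would be two tokens and would not compile.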

>   3.   The standard isn't clear about <form-feed> and <vertical-tab> --
>        are they everywhere identical to <space>?  I.e. may they appear
>        between the start of a line and the # that introduces a control
>        statement?  The standard is also unclear about the action to be
>        taken at the end of an include file:  is the <eof> a token delimiter?
>        Does it terminate a line?

The System V cpp allows FF and VT before the #, but it does not allow other
space there.  FF and VT should be allowed at least wherever other whitespace
is allowed.  Allowing arbitrary space before the # breaks the use of cpp
as a general preprocessor, but that is probably not a concern of the
committee.

>   4.   Just how invisible are comments?  For example, are the following
>        correct?
>
>                /* foo */ #ifdef foo
>                # /* foo */ endif

Reiser's cpp is poorly written, and breaks on comments in any surprising
place.  However, it is hard to think of a syntax which does not make that
a bug (except for the case where the # must be the first character on the
line).

>   5.   cpp should accept "# <number>" as a synonym for "#line <number>"
>        so that it accepts its own output format.

Absolutely.  The fact that the committee has not allowed for this reveals
that they have not spent much time looking at existing cpp implementations
or usage.
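
To illustrate what the line-number directive does, here is a minimal
sketch using only the #line spelling, on the assumption that not every
compiler accepts the bare "# <number>" form in source text.  The function
name is made up.

```c
/* The directive renumbers the source line that follows it to 100. */
#line 100
int line_after_directive(void) { return __LINE__; }
```

A cpp that also read "# 100" the same way could be run over its own
output without losing the original line numbers.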

>   6.   Some people write
>
>                #ifdef foobar
>                #endif foobar
>
>        This should be provided for in the syntax -- or explicitly
>        rejected.

I agree; it should be allowed.  Also for #else.  Arbitrary text should
be allowed to the right of the required tokens.
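
Until the syntax is settled either way, a spelling that should work
whether or not trailing text is accepted is to put the label in a comment.
A sketch, with FOOBAR_OPTION and HAVE_FOOBAR as hypothetical names:

```c
/* FOOBAR_OPTION is never defined here, so the #else arm is taken. */
#ifdef FOOBAR_OPTION
#define HAVE_FOOBAR 1
#else  /* FOOBAR_OPTION */
#define HAVE_FOOBAR 0
#endif /* FOOBAR_OPTION */

int have_foobar(void) { return HAVE_FOOBAR; }
```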

>   7.   I added __DATE__ to the preprocessor predefineds.  It's
>        useful for embedding debugging status (but not essential).

The form __FOO__ should be reserved for preprocessor built-ins,
with __FILE__ and __LINE__ required and all others implementation-defined.
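
As a sketch of why the two required names earn their place, here is a
hypothetical helper that records where it was invoked.  FORMAT_HERE and
format_here_example are invented for this example, and snprintf is assumed
to be available in the library.

```c
#include <stdio.h>

/* Hypothetical helper: write "file:line" of the call site into buf. */
#define FORMAT_HERE(buf, size) \
    snprintf((buf), (size), "%s:%d", __FILE__, __LINE__)

int format_here_example(char *buf, size_t size)
{
    /* __FILE__ and __LINE__ expand here, at the point of use. */
    return FORMAT_HERE(buf, size);
}
```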

>   8.   I claim that nested comments /* ... /* ... */ warrant
>        a warning message -- that is a very common source of
>        error in the programs I see (and impossible to detect
>        without a warning message).

I think the standard allows this warning but does not demand it.
This is probably good policy for all warnings, and will encourage
implementors to provide the warnings to be competitive.
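
A sketch of the kind of error the quoted text has in mind, assuming
comments do not nest.  The function name is hypothetical.  The author of
this code meant to disable the assignment, but the first */ closes the
comment, so the assignment is still compiled.

```c
int nested_comment_value(void)
{
    int x = 1;
    /* disabled for now: /* set x to two */
    x = 2;
    /* end of disabled region */
    return x;
}
```

Nothing here is a syntax error, which is why only a warning can catch it.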

>Those are all the problems I have (today).  Here are some questions
>about the new concatenation operation:
>
>1.  May it appear anywhere, or only on a #define line?

If it appears anywhere, the # introducing a control line becomes
syntactically ambiguous.  However, that is probably ok.
I can't imagine a proper implementation that wouldn't have to do more
work to disallow it outside of #defines than to allow it.

>2.  What are the semantics of, say,
>
>        #define foo abc # def
>
>    Is it (1) "foo";  (2) read "abc"; (3) read '#' and realize we're
>    expanding a token, so (4) read "def" and glue them together?
>    If so, what happens when "abc" or "def" are macro's:
>
>        #define unique here # __LINE__

Is white space allowed around the #?  That makes the syntax a bit messy;
the concatenation operator then becomes "an intermixed sequence of zero or
more whitespace characters and one or more #'s".  How much whitespace do you
have to scan in order to find the #?  If the # is allowed in running text,
that whitespace would normally be copied, but obviously not if it is
followed by a #.

I don't see why macros are a problem.  The # is just like whitespace in that
it delimits a token, but it is not copied to the output.
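
For comparison, here is a sketch using the spelling that ANSI C later
adopted for concatenation, "##" inside a macro body, rather than the bare
# under discussion here.  Under those later rules an operand of ## is not
macro-expanded before pasting, so an extra level of macro is needed to
paste the value of a name.  All the names below are hypothetical.

```c
#define PASTE_RAW(a, b) a ## b           /* pastes the operands as written */
#define PASTE(a, b)     PASTE_RAW(a, b)  /* expands the arguments first    */

#define COUNTER_START 3

int PASTE(var, COUNTER_START) = 30;      /* declares var3             */
int PASTE_RAW(var, COUNTER_START) = 99;  /* declares varCOUNTER_START */

int pasted_expanded(void) { return var3; }
int pasted_raw(void)      { return varCOUNTER_START; }
```

So whether macros are "a problem" depends entirely on whether the operands
are expanded before or after the glue is applied, which is the point the
semantics need to pin down.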

> 3.  May the #define token be concatenated:
>
>         #define unique # counter __LINE__

That would require expanding names delimited by #, even when they appear in
a position not normally expanded.  No big deal, but it doesn't get you much;
see below.

> 4.  If I should write:
>
>        #define unique_var      var # counter
>        #define counter         (counter + 1)
>        #define another_var     var # counter
>
>    will cpp "do what I mean?"

Of course not; the preprocessor does not do arithmetic, and counter
is not expanded at the time of the define.  But I agree that the semantics
must be fully specified so the behavior of such cases is well-defined.
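
A sketch of the "not expanded at the time of the define" point, with
hypothetical names.  The body of UNIQUE_VALUE is stored unexpanded, so it
picks up whatever COUNTER_VALUE means at each point of use.

```c
#define UNIQUE_VALUE COUNTER_VALUE  /* COUNTER_VALUE is not defined yet */

#define COUNTER_VALUE 5
int first_use(void)  { return UNIQUE_VALUE; }  /* sees COUNTER_VALUE as 5 */

#undef COUNTER_VALUE
#define COUNTER_VALUE 6
int second_use(void) { return UNIQUE_VALUE; }  /* sees COUNTER_VALUE as 6 */
```

Any counting has to be done by hand with #undef and #define, as above;
the preprocessor itself never adds one to anything.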

>I just added stringization to Decus cpp and discovered something
>interesting:
>
>        #define print(format, value) printf("Result " "format", value)
>            print("%d", 123);
>
>My first attempt expanded to
>
>            printf("Result " ""%d"", 123);
>
>I've added a hack to strip one level of quotes, but aren't too
>happy with it.  Note that you just can't omit the argument
>quotes as you may want to pass ',' through.

Why not just define it as

       #define print(format, value) printf("Result " format, value)

Certainly the string concatenation should not happen until format is
evaluated.
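
A sketch of that definition in use, assuming the compiler concatenates
adjacent string literals after macro expansion.  It formats into a buffer
with snprintf instead of printing so the result can be inspected;
FORMAT_RESULT and format_result are made-up names.

```c
#include <stdio.h>

/* The format argument arrives already quoted and is simply placed next to
   "Result "; joining the two literals is left to the compiler. */
#define FORMAT_RESULT(buf, size, format, value) \
    snprintf((buf), (size), "Result " format, (value))

int format_result(char *buf, size_t size, int value)
{
    /* The comma inside the quoted argument passes through untouched. */
    return FORMAT_RESULT(buf, size, "%d, done", value);
}
```

Called with the value 123, this leaves "Result 123, done" in the buffer,
and no quote stripping is needed anywhere.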

>Also, is this ok:
>
>            print('%d', 123);
>
>In that case, I generate
>
>            printf("Result " "'%d'", 123);
>
>without comment.

I would think that you want

	print("'%d'", 123);

You should consistently require double quotes.
If you don't want to require quotes, then you can't allow commas,
right parens, /*, etc. in the argument.
Trying to have your cake and eat it too by stripping quotes just doesn't
cut it.  Remember that this isn't m4, where the quotes are balanced (`').
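
For comparison, a sketch of the explicit stringizing operator that ANSI C
later provided ("#" applied to a macro parameter), which is not the
in-string substitution being discussed here.  It keeps whatever quotes the
argument carries instead of stripping a level.  The names are
hypothetical.

```c
#define STRINGIZE(x) #x

/* A double-quoted argument keeps its quotes, escaped inside the result. */
const char *stringized_double(void) { return STRINGIZE("%d"); }

/* A single-quoted argument is copied through with its quotes intact. */
const char *stringized_single(void) { return STRINGIZE('%d'); }
```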

> The committee  might consider specifying the core run-time library
> (str..., is..., the math routines, and a few others) such that the
> compiler may generate in-line code or non-standard calling sequences.
> There should be a way to override some or all of this, of course.
> This was done for Fortran with no evil effects.

I agree.  It would be nice if there were a way to specify in a header file
that a routine is possibly builtin, so lint could complain if you take its
address.

-- Jim Balter, INTERACTIVE Systems (ima!jim)


