ANSI C token set (including $ and @)

Thu Jan 5 22:21:07 AEST 1989

The following comments are based on the X3J11/88-090 (May 1988) version of
the dpANS report. In a couple of days I'll get the latest version, but for
now this one will do.

In article <11343 at haddock.ima.isc.com> karl at haddock.ima.isc.com writes:

> Let's see if I've got this straight yet.
>
>o `$' is required to scan as a separate pp-token, despite existing practice
>   making it an optional identifier-character.

Yes. The syntax of an identifier is (par. 3.1.2):

	identifier:	nondigit | identifier nondigit | identifier digit ;
	nondigit:	[_a-zA-Z]
	digit:		[0-9]

Whether the '$' should be scanned as a separate pp-token depends on the source
character set.
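
As a minimal sketch of the grammar above (the names is_nondigit, is_digit
and is_identifier are mine, not from the dpANS), a checker for the
identifier production could look like this. It assumes only the letters
listed in the grammar count as nondigits:

```c
#include <stddef.h>
#include <string.h>

/* nondigit: underscore or an unaccented Latin letter (hypothetical helper) */
static int is_nondigit(char c)
{
    /* Guard against '\0': strchr would match the string terminator. */
    return c != '\0' && strchr("_abcdefghijklmnopqrstuvwxyz"
                               "ABCDEFGHIJKLMNOPQRSTUVWXYZ", c) != NULL;
}

/* digit: 0-9, which C guarantees to be contiguous */
static int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Returns 1 if s matches the identifier production, else 0. */
int is_identifier(const char *s)
{
    if (s == NULL || !is_nondigit(s[0]))
        return 0;               /* must start with a nondigit */
    for (s++; *s != '\0'; s++)
        if (!is_nondigit(*s) && !is_digit(*s))
            return 0;           /* e.g. '$' is neither */
    return 1;
}
```

With this sketch, "foo" and "_x9" are accepted, while "foo$bar" and "9abc"
are rejected.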

>o  When converting pp-tokens to tokens, an implementation is free to merge
>   {foo}{$}{bar} into a single token {foo$bar}.  (I'm guessing on this one.)

No. In this conversion the '$' is a garbage character, so what you get is
{foo} <ERROR> {bar}: the $ character is not part of the nonterminal
identifier (see above).
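
To illustrate the split, here is a rough sketch of a scanner that only
knows identifiers and single "other" characters. It is not a full pp-token
scanner (no pp-numbers, strings, or operators), and the names pp_kind,
next_pp_token and count_pp_tokens are made up for this example:

```c
#include <stddef.h>
#include <string.h>

enum pp_kind { PP_END, PP_IDENTIFIER, PP_OTHER };

static int ident_start(char c)
{
    return c != '\0' && strchr("_abcdefghijklmnopqrstuvwxyz"
                               "ABCDEFGHIJKLMNOPQRSTUVWXYZ", c) != NULL;
}

static int ident_char(char c)
{
    return ident_start(c) || (c >= '0' && c <= '9');
}

/* Skips white space, leaves *p at the token start and stores its length.
   The caller advances *p by *len before asking for the next token. */
enum pp_kind next_pp_token(const char **p, size_t *len)
{
    const char *s = *p;

    while (*s == ' ' || *s == '\t' || *s == '\n')
        s++;
    *p = s;
    if (*s == '\0') {
        *len = 0;
        return PP_END;
    }
    if (ident_start(*s)) {
        size_t n = 1;
        while (ident_char(s[n]))
            n++;
        *len = n;
        return PP_IDENTIFIER;
    }
    *len = 1;                   /* any other character stands alone */
    return PP_OTHER;
}

/* Counts the tokens this simplified scanner finds in s. */
size_t count_pp_tokens(const char *s)
{
    size_t len, n = 0;

    while (next_pp_token(&s, &len) != PP_END) {
        s += len;
        n++;
    }
    return n;
}
```

Fed "foo$bar", this sketch yields three tokens: the identifier foo, the
lone character $, and the identifier bar.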

>o  But, since macro expansion happens first, it is {foo}, and not {foo$bar},
>   that is subject to macro replacement, even if the above is true.

{foo$bar} can never be subject to any macro replacement, since it's not an
identifier (see 3.8.3).

>o  Hence, certain features of DEC and APOLLO implementations cannot be
>   conforming.

I don't know about DEC or APOLLO, but if they allow things like those
described above, their implementations are not strictly conforming (perhaps
there is a flag like -pedantic, as with the GNU C compiler?).

>o  DEC and APOLLO, through their representatives on X3J11, are aware of the
>   above and accept it.  Their ANSI C implementations, if any, will not use
>   `$' in identifiers.

Depends on their policy. They are free to add features. Perhaps they will
add a flag (if $ is the only nonconforming aspect).

>o  Non-English letters, which are clearly not usable in a strictly conforming
>   program, are in fact not usable in *any* conforming program, for the same
>   reasons that apply to `$'.  

The basic source character set, in which source files are written, does not
contain $, umlauts, grave accents, etc. Strings, however, may contain these
characters (depending on the size of the character representation, you could
use single-byte or multibyte character strings).
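
A small sketch of the string case, assuming an ASCII-based implementation
whose source character set happens to include '$' and '@' (the names note
and has_char are invented for the example):

```c
#include <string.h>

/* Assumption: '$' and '@' exist in this implementation's character set. */
static const char note[] = "total: $5 @ 10%";

/* Returns 1 if the string s holds the character c, else 0. */
int has_char(const char *s, char c)
{
    return c != '\0' && strchr(s, c) != NULL;
}
```

Here has_char(note, '$') and has_char(note, '@') both return 1: the
characters are fine inside the string even though they cannot appear in an
identifier.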

>o  The international community is aware of this and accepts it.

Yep, why not ?

BTW: The best wishes for 1989. "Hope it's a good one"
-- 
Leendert P. van Doorn 			   		 <leendert at cs.vu.nl>
Vrije Universiteit / Dept. of Maths. & Comp. Sc.
De Boelelaan 1081
1081 HV Amsterdam / The Netherlands			tel. +31 20 548 5302