ANSI C token set (including $ and @)
Leendert van Doorn
leendert at cs.vu.nl
Thu Jan 5 22:21:07 AEST 1989
The following comments are based on the X3J11/88-090 (May 1988) version of
the dpANS report. In a couple of days I'll get the latest version, but for
now it will do.
In article <11343 at haddock.ima.isc.com> karl at haddock.ima.isc.com writes:
> Let's see if I've got this straight yet.
>
>o `$' is required to scan as a separate pp-token, despite existing practice
> making it an optional identifier-character.
Yes. The syntax of an identifier is (par. 3.1.2):
identifier: nondigit | identifier nondigit | identifier digit ;
nondigit: one of _ a-z A-Z
digit: one of 0-9
Whether the '$' should be scanned as a separate pp-token depends on the source
character set.
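The grammar above can be checked mechanically. Below is a minimal sketch of a recognizer for it. The function name `is_identifier` is hypothetical, and keywords are not excluded. The letters are spelled out rather than tested with `isalpha()` or an `'a'..'z'` range, because the former can depend on the locale and the latter is only contiguous in ASCII-like encodings. Note that `'$'` is simply not accepted.

```c
#include <stdbool.h>
#include <string.h>

/* The nondigit characters of the 3.1.2 grammar, spelled out explicitly. */
static const char nondigits[] =
    "_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* strchr() also matches the terminating '\0', so reject it explicitly. */
static bool is_nondigit(char c)
{
    return c != '\0' && strchr(nondigits, c) != NULL;
}

/* The digits '0'..'9' are guaranteed to be contiguous in C. */
static bool is_digit(char c) { return c >= '0' && c <= '9'; }

/* identifier: nondigit | identifier nondigit | identifier digit */
bool is_identifier(const char *s)
{
    if (!is_nondigit(*s))
        return false;            /* must start with a nondigit; also rejects "" */
    for (s++; *s != '\0'; s++)
        if (!is_nondigit(*s) && !is_digit(*s))
            return false;        /* any other character, e.g. '$', ends up here */
    return true;
}
```

With this sketch, `is_identifier("foo_bar1")` is true, while `is_identifier("foo$bar")` and `is_identifier("9lives")` are false.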
>o When converting pp-tokens to tokens, an implementation is free to merge
> {foo}{$}{bar} into a single token {foo$bar}. (I'm guessing on this one.)
No. In this conversion the '$' is a garbage character, so what you get is
{foo} <ERROR> {bar}. (The $ character is not part of the nonterminal
identifier; see above.)
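The two steps can be illustrated with a deliberately simplified scanner. In this sketch, only identifiers are multi-character pp-tokens. Every other non-white-space character becomes a one-character pp-token, and that is where `'$'` lands. The conversion check then accepts identifiers and a few listed punctuators and rejects everything else. The names `scan_pp_tokens`, `converts_to_token` and `MAX_TOK` are hypothetical. pp-numbers, string literals and multi-character punctuators are not handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOK 32 /* hypothetical per-token buffer size for this sketch */

static bool is_nondigit(char c)
{
    return c != '\0' &&
           strchr("_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", c) != NULL;
}

static bool is_digit(char c) { return c >= '0' && c <= '9'; }

/* Scanning (simplified): split src into pp-tokens and store them in toks.
   Identifiers are collected whole; any other non-white-space character
   becomes a pp-token of its own. Returns the number of pp-tokens stored. */
size_t scan_pp_tokens(const char *src, char toks[][MAX_TOK], size_t max)
{
    size_t n = 0;
    while (*src != '\0' && n < max) {
        if (*src == ' ' || *src == '\t' || *src == '\n') {
            src++;
            continue;
        }
        size_t len = 1;
        if (is_nondigit(*src))
            while (is_nondigit(src[len]) || is_digit(src[len]))
                len++;
        size_t copy = len < MAX_TOK ? len : MAX_TOK - 1; /* truncate long names */
        memcpy(toks[n], src, copy);
        toks[n][copy] = '\0';
        src += len;
        n++;
    }
    return n;
}

/* Conversion (simplified): can this pp-token become a token? Identifiers
   can. Among one-character pp-tokens, only the few punctuators listed here
   are accepted. Anything else, such as '$', has no token to become. */
bool converts_to_token(const char *pp)
{
    if (is_nondigit(pp[0]))
        return true;
    return pp[0] != '\0' && pp[1] == '\0' && strchr("+-*/=;,(){}[]", pp[0]) != NULL;
}
```

Applied to `foo$bar`, the scanner yields {foo}{$}{bar}. `converts_to_token` accepts `foo` and `bar` but rejects `$`, which matches the {foo} <ERROR> {bar} outcome described above.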
>o But, since macro expansion happens first, it is {foo}, and not {foo$bar},
> that is subject to macro replacement, even if the above is true.
{foo$bar} can never be subject to any macro replacement, since it's not an
identifier (see 3.8.3).
>o Hence, certain features of DEC and APOLLO implementations cannot be
> conforming.
I don't know about DEC or APOLLO, but if they allow things like those described
above, their implementations are not strictly conforming (perhaps there is
a flag like -pedantic, as with the GNU C compiler?).
>o DEC and APOLLO, through their representatives on X3J11, are aware of the
> above and accept it. Their ANSI C implementations, if any, will not use
> `$' in identifiers.
That depends on their policy. They are free to add features. Perhaps they will
provide a flag (if $ is the only nonconforming aspect).
>o Non-English letters, which are clearly not usable in a strictly conforming
> program, are in fact not usable in *any* conforming program, for the same
> reasons that apply to `$'.
The basic source character set, the set in which source files are written,
does not contain $, umlauts, grave accents, etc. String literals, however, may
contain these characters (depending on the size of the character
representation, you could use single-byte or multibyte character strings).
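One way to get such characters into a string literal without writing the characters themselves in the source file is a hexadecimal escape sequence. This is a minimal sketch. The array names are hypothetical. The code values 0x24 ('$' in ASCII) and 0xFC ('ü' in ISO 8859-1) are assumptions about the execution character set and are not guaranteed by the draft.

```c
/* '$' via a hex escape, so this source line uses only basic source
   characters. 0x24 assumes an ASCII-based execution character set.
   The literal is split in two because "\x245" would be read as the
   single escape \x245 rather than \x24 followed by '5'. */
static const char price[] = "cost: \x24" "5";

/* u-umlaut as a single byte, assuming ISO 8859-1. In a multibyte
   encoding the same character could take several bytes. */
static const char u_umlaut[] = "\xfc";
```

Splitting the literal relies on adjacent string literals being concatenated. That is standard behavior, so `price` holds the eight characters `cost: ` followed by the escaped byte and `5`, plus the terminating null.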
>o The international community is aware of this and accepts it.
Yep, why not?
BTW: Best wishes for 1989. "Hope it's a good one."
--
Leendert P. van Doorn <leendert at cs.vu.nl>
Vrije Universiteit / Dept. of Maths. & Comp. Sc.
De Boelelaan 1081
1081 HV Amsterdam / The Netherlands tel. +31 20 548 5302