ANSI C token set (including $ and @)

Karl Heuer karl at haddock.ima.isc.com
Wed Jan 11 05:18:15 AEST 1989


In article <1858 at zell.cs.vu.nl> leendert at cs.vu.nl () writes:
>In article <11343 at haddock.ima.isc.com> karl at haddock.ima.isc.com writes:
>> Let's see if I've got this straight yet.
>>
>>o `$' is required to scan as a separate pp-token, despite existing practice
>>   making it an optional identifier-character.
>
>Yes. The syntax of an identifier is [the pattern /[_a-zA-Z][_a-zA-Z0-9]*/].
>
>Whether the '$' should be scanned as a separate pp-token depends on the source
>character set.

In the environment I'm thinking of, `$' should be legal in strings (where it
represents the same symbol in the execution character set), hence it must be a
member of the source character set, and by 3.1 it scans as a pp-token.
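
To make the scanning rule concrete, here is a rough sketch in C of a toy
pp-token scanner.  It models only identifiers and the catch-all category of
3.1 (any other non-white-space character is a pp-token by itself).  The names
pp_kind, scan_pp_token and count_pp_tokens are made up for illustration and
are not taken from any real compiler.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical classification, for illustration only. */
enum pp_kind { PP_END, PP_IDENT, PP_OTHER };

/* Scan one pp-token starting at s and store its length in *len.
   Only identifiers and single "other" characters are modelled;
   white space, numbers, strings and operators are not handled. */
enum pp_kind scan_pp_token(const char *s, size_t *len)
{
    size_t n;

    if (s[0] == '\0') {
        *len = 0;
        return PP_END;
    }
    /* identifier: [_a-zA-Z][_a-zA-Z0-9]* (assuming the "C" locale) */
    if (s[0] == '_' || isalpha((unsigned char)s[0])) {
        n = 1;
        while (s[n] == '_' || isalnum((unsigned char)s[n]))
            n++;
        *len = n;
        return PP_IDENT;
    }
    /* anything else, including '$', is a one-character pp-token */
    *len = 1;
    return PP_OTHER;
}

/* Count the pp-tokens in s under the rules above. */
size_t count_pp_tokens(const char *s)
{
    size_t count = 0, len;

    while (scan_pp_token(s, &len) != PP_END) {
        s += len;
        count++;
    }
    return count;
}
```

Under these assumptions `a$b` scans as three pp-tokens (`a`, `$', `b'),
whereas an implementation that treats `$' as an identifier character would
see only one.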

>>o  Hence, certain features of DEC and APOLLO implementations cannot be
>>   conforming.
>
>I don't know about DEC or APOLLO, but if they allow things like those
>described above their implementations are not strictly conforming (perhaps
>there is a flag -pedantic as with the GNU C compiler ?).

`Strictly conforming' is an attribute of programs, not implementations.  An
implementation is either ANSI C, or it isn't.  According to the rules,
accepting `$' in an identifier seems to yield a non-ANSI implementation.

>>o  DEC and APOLLO, through their representatives on X3J11, are aware of the
>>   above and accept it.  Their ANSI C implementations, if any, will not use
>>   `$' in identifiers.
>
>Depends on their policy. They are free to add features. Perhaps they will
>make a flag (if $ is the only nonconforming aspect).

Hmm, assuming they do, I wonder if they'll follow Doug's suggestion of turning
off __STDC__ whenever `$' is enabled.
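
For reference, a program would typically detect such a mode with a
preprocessor test along these lines.  This is only a sketch: the function name
claims_ansi_c is hypothetical, and what value, if any, a given compiler gives
__STDC__ in its `$'-enabled mode is exactly the open question.

```c
/* Returns 1 if the compiler claims ANSI conformance by defining
   __STDC__ as 1, and 0 otherwise (hypothetical helper). */
int claims_ansi_c(void)
{
#if defined(__STDC__) && __STDC__ == 1
    return 1;
#else
    return 0;
#endif
}
```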

>>o  Non-English letters, which are clearly not usable in a strictly conforming
>>   program, are in fact not usable in *any* conforming program, for the same
>>   reasons that apply to `$'.  
>
>The basic source set, the set in which source files are written, does not
>contain $, umlaut, accent grave, etc. The strings, however, may contain these
>characters (depending on the size of the character representation you could
>use single or multibyte character strings).

The source character set is used both inside and outside of string literals;
those within string literals (or character constants) are mapped to the
execution character set as they are tokenized.  For the purposes of this
discussion, I'm assuming that the source and execution character sets are
identical, and that they contain `$' and/or non-English letters in addition to
the minimal character set of 2.2.1.

>>o  The international community is aware of this and accepts it.
>
>Yep, why not ?

Because the users can't use their native languages to name their variables.
Doesn't it bother you that you can't have a variable named `IJspret' with a
proper ligature instead of separate letters?  It bothers me, and I don't even
have any plans to use such a feature.

(Actually, the problem occurs even in English; I once had a set of constants
named DONT_xxx to selectively suppress individual features of a large system.
I didn't worry about the lack of an apostrophe, because (a) there's nothing to
be done about it, since the symbol is already in use, and (b) the meaning was
clear without it.  The correct use of the apostrophe seems to be declining in
American English anyway.  But that's a topic for a different group.)

Karl W. Z. Heuer (ima!haddock!karl or karl at haddock.isc.com), The Walking Lint


