C source character set

Tue Oct 3 07:53:29 AEST 1989

In article <1302 at gmdzi.UUCP> wittig at gmdzi.UUCP (Georg Wittig) writes:
>		/* in the following lines let @ be the character '\0' */
>		int x;
>		x = 1 +	/* foo @ bar */
>		    2	/* */
>		    ;

The character you're representing by "@" is not in the standard C source
character set, so such a program is not strictly conforming.  Some
implementations may be able to deal with that source code but others
will not.  If an implementation does deal with it, it is up to that
implementation how to interpret this non-standard extension.
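
As a rough illustration, a checker along the following lines would flag
such a character.  This is only a sketch: the names basic_set and
first_nonbasic are mine, and the table lists just the basic source
character set (letters, digits, the 29 graphic characters, and the usual
white-space characters), ignoring any extended characters a particular
implementation might accept.

```c
#include <stddef.h>
#include <string.h>

/* Basic C source character set: letters, digits, the 29 graphic
   characters, plus space, horizontal tab, vertical tab, form feed.
   '\n' stands in here for the end-of-line indicator. */
static const char basic_set[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~"
    " \t\v\f\n";

/* Hypothetical helper: return the index of the first byte in buf that
   is outside the basic set, or len if every byte is a member. */
size_t first_nonbasic(const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        /* strchr() would "find" '\0' at the table's terminator, so the
           null character has to be rejected explicitly. */
        if (buf[i] == '\0' || strchr(basic_set, buf[i]) == NULL)
            return i;
    }
    return len;
}
```

Applied to the quoted fragment with a real null character in the comment,
this reports the position of that character.  Note that it would also
reject '@' itself, which is not in the basic set either.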

>[2] Furthermore, there are (non-UNIX) operating systems that encode the end of
>    a source line by the number of bytes of that line ...

There is a misunderstanding here.  The specification of the C source
character set does not constrain how C source code files are represented
in a particular implementation, nor how text editors present C source
code visually, nor myriad other similar issues.  C source code
characters must be seen as distinct units by the conforming C translator;
what mapping is done from physical source character encoding before that
point lies beyond the scope of the C standard.  Presumably it will be
similar to that done for "text" files in the hosted C library text-stream
support, but it need not be.
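
To make the idea concrete, here is a sketch of the kind of mapping an
implementation on such a system might perform before the translator
proper sees any characters.  The record layout (one length byte followed
by that many characters, no stored terminator) and the function name
records_to_lines are assumptions of mine for illustration, not any real
system's format.

```c
#include <stddef.h>
#include <string.h>

/* Convert length-prefixed records into new-line terminated text.
   Hypothetical layout: each record is one length byte followed by that
   many characters.  Returns the number of characters written (not
   counting the terminating '\0'), or (size_t)-1 if the input is
   truncated or the output buffer is too small. */
size_t records_to_lines(const unsigned char *in, size_t in_len,
                        char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    if (out_cap == 0)
        return (size_t)-1;

    while (i < in_len) {
        size_t n = in[i++];             /* record length byte */

        /* Need n bytes of input, and room for n chars + '\n' + '\0'. */
        if (n > in_len - i || o + n + 1 >= out_cap)
            return (size_t)-1;

        memcpy(out + o, in + i, n);
        o += n;
        i += n;
        out[o++] = '\n';                /* end of record becomes new-line */
    }
    out[o] = '\0';
    return o;
}
```

The translator then works on the new-line terminated form and never
needs to know how the line ends were stored on disk.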

>[3] Line continuation by `\': Does it only apply to #define contexts and string
>    constant contexts, or is it a general rule?

It's a general rule.  Translation phase 1 maps the physical source
characters to the C source character set and replaces trigraphs; phase 2
then splices each line ending in \ newline onto the following line,
wherever it occurs.  Tokenization and preprocessing occur after that.
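
A small sketch showing that splicing is not tied to #define or string
contexts follows.  The function names are made up for the example.  Note
that the continuation lines start in column one, since any leading white
space would become part of the spliced line.

```c
#include <string.h>

/* Backslash-newline in an ordinary expression, outside any #define. */
int spliced_sum(void)
{
    return 1 + \
           2;
}

/* Splicing happens before tokenization, so it can even fall in the
   middle of an identifier: the declaration below names "count". */
int spliced_identifier(void)
{
    int cou\
nt = 40;
    return count + 2;
}

/* Inside a string literal the two halves are joined with nothing
   between them. */
const char *spliced_string(void)
{
    return "abc\
def";
}
```

Here spliced_sum() yields 3, spliced_identifier() yields 42, and
spliced_string() yields "abcdef".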