Comment recognition in Lex, again

Chris Miller chris at hwcs.UUCP
Fri May 18 20:13:45 AEST 1984


The following is a fully general comment recogniser for /* ... */
comments in 'lex' - I have used definitions to make it a little more
readable (I just can't cope with things like ("*"[^*]*)!).

It should be pointed out that I don't believe that this is the RIGHT
way to handle comments unless it is essential to retain their text;
comments can be very long, and trying to match them with 'lex' can
easily overflow its fixed-size token buffer.  I prefer solutions which
match the opening /* and then throw away the rest of the comment in the
action routine, using a bit of ad hoccery.
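
For illustration, the "throw it away in the action" approach might look
something like the sketch below.  It is not taken from any particular
scanner: the function name discard_comment is made up, and it reads from a
stdio stream with getc() so that it can be compiled on its own.  Inside a
real 'lex' action one would presumably call lex's input() routine instead.
It remembers only whether the previous character was an asterisk, so no
comment text is ever buffered.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: discard input up to and including the closing
 * star-slash.  Assumes the opening slash-star has already been consumed.
 * Returns 1 if the comment was closed, 0 if end of input came first.
 */
int discard_comment(FILE *in)
{
    int c;
    int prev_star = 0;      /* was the previous character an asterisk? */

    assert(in != NULL);
    while ((c = getc(in)) != EOF) {
        if (c == '/' && prev_star)
            return 1;       /* a slash directly after an asterisk ends it */
        prev_star = (c == '*');
    }
    return 0;               /* unterminated comment */
}
```

Because only one flag is kept, a comment of any length costs constant
space, which is the point of handling it outside the pattern.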
____________________________________________________________________
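	%{
	#include <stdio.h>
	%}
	STAR	"*"
	SLASH	"/"
	CSTART	({SLASH}{STAR})
	CUNIT	([^*]|{STAR}+[^/*])
	CBODY	({CUNIT}*)
	CEND	({STAR}+{SLASH})
	COMMENT	({CSTART}{CBODY}{CEND})
	%%
	{COMMENT}	printf("COMMENT '%s'\n", yytext);
	%%
	/* No further input files: tell yylex() to stop at end of input. */
	int yywrap(void)
	{
		return 1;
	}

	int main(void)
	{
		yylex();	/* returns only at end of input */
		return 0;
	}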
____________________________________________________________________
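
To see what the definitions are doing, here is a rough hand-coded
equivalent of the COMMENT pattern in plain C.  This is only a sketch: the
function name match_comment and the choice to work on a NUL-terminated
string are assumptions made for the example, not something 'lex' itself
generates.  Each step is labelled with the definition it corresponds to.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the length of the comment that starts at s,
 * or 0 if s does not begin with a complete comment.
 */
size_t match_comment(const char *s)
{
    const char *p = s;

    assert(s != NULL);

    /* CSTART: a slash followed by an asterisk */
    if (p[0] != '/' || p[1] != '*')
        return 0;
    p += 2;

    for (;;) {
        /* CUNIT, first alternative: any character other than an asterisk */
        while (*p != '\0' && *p != '*')
            p++;
        if (*p == '\0')
            return 0;                       /* unterminated comment */

        /* STAR+: a run of one or more asterisks */
        while (*p == '*')
            p++;

        if (*p == '/')
            return (size_t)(p + 1 - s);     /* CEND: the run ended in a slash */
        if (*p == '\0')
            return 0;                       /* unterminated comment */

        p++;    /* CUNIT, second alternative: run ended in some other character */
    }
}
```

Note that a run of asterisks is always consumed as a whole before the
following character is examined, so it makes no difference whether the
run is of odd or even length.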
One problem with the original non-working version is that it fails for
comments terminated by an EVEN number of asterisks and a /.  This seems
to be a common bug in distributed compilers, etc., even when they don't use
'lex' for token generation.  I have encountered this bug in several C
compilers and their corresponding lints (of course, since lint usually uses
cpp), and also in the original distribution of CProlog.  You may find it
entertaining to try out
	/** This is a legal comment **/
on any language systems which OUGHT to accept it.  The fix is almost always
trivial: the problem comes from reading the character following an asterisk
without subsequently putting it back in the input if it happens to be
another asterisk.
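
The bug and its fix can be sketched in C as follows.  Both functions are
illustrative reconstructions, not code from any of the compilers
mentioned; the names skip_comment_buggy and skip_comment_fixed are made
up, and both assume the opening /* has already been read.  The only
difference is the ungetc() call that puts a second asterisk back.

```c
#include <assert.h>
#include <stdio.h>

/* Buggy version: the character after an asterisk is read and then lost. */
int skip_comment_buggy(FILE *in)
{
    int c;

    assert(in != NULL);
    while ((c = getc(in)) != EOF) {
        if (c == '*') {
            c = getc(in);           /* look at the character after the asterisk */
            if (c == '/')
                return 1;           /* end of comment */
            /* BUG: if c is another asterisk it has been swallowed, so a
               slash that follows it is never recognised as a terminator */
        }
    }
    return 0;                       /* hit end of input inside the comment */
}

/* Fixed version: an asterisk read as lookahead is put back. */
int skip_comment_fixed(FILE *in)
{
    int c;

    assert(in != NULL);
    while ((c = getc(in)) != EOF) {
        if (c == '*') {
            c = getc(in);
            if (c == '/')
                return 1;
            if (c == '*')
                ungetc(c, in);      /* let the next iteration see it again */
        }
    }
    return 0;
}
```

Fed the text that follows the opening /* of the test comment above, the
buggy version pairs up the two closing asterisks, misses the slash and
runs off the end of the input; the fixed version stops just after the
slash.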

						Chris Miller
						Heriot-Watt Computer Science
						...ukc!edcaad!hwcs


