Comment recognition in Lex, again

Mark Plotnick mp at whuxle.UUCP
Sun May 6 03:54:38 AEST 1984


The problem with
	"/*"([^*]|("*"/[^/]))*"*/"
is that lex's handling of right context inside a nested regular
expression is a little nonintuitive.  After lex recognizes the complete
expression, it backs up one character because of the ``/[^/]''
expression, so the last character it matched goes back onto the input.

In case you still don't see the problem, run this lex program:

	a(b/c)c { printf("I saw this: %s\n", yytext); }
	.	{ printf("char: '%c'\n", yytext[0]); }

The first rule will NOT match ``abc'', but it will match ``abcc'',
sort of.  It prints out ``I saw this: ab''.

To be safe, only use right context at the very end of your regular
expression.
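
For comparison, what right context is supposed to mean at the end of a
rule, r/s, is easy to state: match r only if s follows, and leave s in
the input.  Here is a rough C sketch of that behavior for two literal
strings.  match_trailing is a made-up helper for illustration only; it
is not part of lex and says nothing about how lex implements this
internally.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of lex: mimic the rule  r/s  for two
 * literal strings.  Returns the number of characters that would end up
 * in yytext (the length of r) when the input starts with r followed by
 * s, or 0 when the rule does not match.  The characters of s are never
 * counted, so the scanner would see them again on the next match.
 */
size_t match_trailing(const char *input, const char *r, const char *s)
{
	size_t rlen = strlen(r);
	size_t slen = strlen(s);

	if (strncmp(input, r, rlen) != 0)
		return 0;		/* r itself is not there */
	if (strncmp(input + rlen, s, slen) != 0)
		return 0;		/* r is there, but s does not follow */
	return rlen;			/* match r, leave s in the input */
}
```

With r = ``ab'', s = ``c'' and the input ``abc'', this returns 2: yytext
would hold ``ab'' and the ``c'' is still there for the next rule.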

Yet Another Way To Recognize Comments:

I really don't enjoy beating my head against a wall playing
with regular expressions and start conditions.  When we
had to write a compiler a couple of years ago (any other
AM295 survivors out there?), we did something like:
"/*" {
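	/* input() returns 0 at end of file in this lex */
#define LEXEOF 0
	int c, last_c = '\0';

	/* the opening delimiter is already consumed; read up to the closing one */
	while ((c = input()) != LEXEOF) {
		if (last_c == '*' && c == '/')
			break;
		else
			last_c = c;
	}
	printf("comment seen\n");
	if (c == LEXEOF)
		printf("EOF within comment\n");
	}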

Moving some of the effort into the action routine allows you to easily
add more context-dependent features, such as printing a warning message
if there's a ';' within the comment, supporting nested comments, etc.
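
As a rough illustration of the nested-comment case, here is a
standalone C sketch of the same kind of loop with a depth counter.  It
reads from a string instead of calling input() so it can be tried
outside lex.  skip_nested_comment and its argument are names made up
for this example, and it assumes the opening ``/*'' has already been
consumed, as in the action above.

```c
#include <stddef.h>

/*
 * Hypothetical helper for illustration.  s points just past an opening
 * comment delimiter.  Returns the offset just past the delimiter that
 * closes it, counting nested comments, or -1 if the string ends first.
 */
long skip_nested_comment(const char *s)
{
	int depth = 1;		/* one comment is already open */
	size_t i = 0;

	while (s[i] != '\0') {
		if (s[i] == '/' && s[i + 1] == '*') {
			depth++;		/* a nested comment opens */
			i += 2;
		} else if (s[i] == '*' && s[i + 1] == '/') {
			i += 2;
			if (--depth == 0)
				return (long)i;	/* outermost comment closed */
		} else {
			i++;
		}
	}
	return -1;		/* EOF within comment */
}
```

In a lex action the same logic would pull characters from input() and
keep a one-character lookbehind, as last_c does above; the depth
counter is the only new state.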
	Mark Plotnick


