lex grammar for C comments

Michael Gwilliam michael at nyit.UUCP
Tue Apr 5 02:49:45 AEST 1988



NOTE: Sorry this reply took so long, but our phone line was out for a long
time.

-----

Well, the information is back and I've summarized the replies.  In case
you forgot the question, it was: "Can C comments be filtered out with
LEX as regular expressions?"

The answer is, "Yes, but it may not be a good idea."

The reasons are...

o	The regular expression is nearly impossible to read.

o	A long comment could overflow lex's token buffer (yytext).


The correct way of doing this seems to be:


You could use states, something like this (I might have the syntax
a bit wrong, and COMMENT has to be declared with %START):
	"/*"		{ BEGIN COMMENT; }
	<COMMENT>"*/"	{ BEGIN 0; }
	<COMMENT>.	|
	<COMMENT>\n	;
The problem is that this requires you to set up states for everything,
which is a pain.

Here's what I did -- built my own little automaton inside the action
for the "/*" pattern.  This is stripped out of working code.

"/*"	        {
		    /* Comment. */
		    register enum { S_STAR, S_NORMAL, S_END } S;

		    for (S = S_NORMAL; S != S_END; )
			switch (input()) {
			    case '\0':
				/* Complain about premature EOF? */
				S = S_END;
				break;
			    case '*':
				S = S_STAR;
				break;
			    case '/':
				if (S == S_STAR) {
				    S = S_END;
				    break;
				}
				/* FALLTHROUGH */
			    default:
				S = S_NORMAL;
				break;
			}
		}
(credit goes to rsalz)
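
For anyone who wants to try the automaton outside of lex, here is a
minimal sketch of the same three-state idea in plain C, working on a
string instead of calling input().  The names skip_comment and
strip_comments are made up for this example and are not from rsalz's
code.  It also assumes the caller's output buffer is at least as large
as the input, and it knows nothing about string literals.

```c
#include <stddef.h>

/* Same three states as the lex action above. */
enum state { S_NORMAL, S_STAR, S_END };

/* Hypothetical helper: p points just past an opening slash-star.
   Returns a pointer just past the closing star-slash, or a pointer to
   the terminating '\0' if the comment is never closed. */
static const char *skip_comment(const char *p)
{
    enum state s = S_NORMAL;

    while (s != S_END) {
        switch (*p) {
        case '\0':
            return p;               /* premature end of input */
        case '*':
            s = S_STAR;             /* a star may start the closer */
            break;
        case '/':
            s = (s == S_STAR) ? S_END : S_NORMAL;
            break;
        default:
            s = S_NORMAL;
            break;
        }
        p++;
    }
    return p;
}

/* Hypothetical helper: copy src to dst with comments removed.
   dst must have room for strlen(src) + 1 bytes.  Returns the number
   of characters written, not counting the terminator. */
size_t strip_comments(const char *src, char *dst)
{
    size_t n = 0;

    while (*src != '\0') {
        if (src[0] == '/' && src[1] == '*')
            src = skip_comment(src + 2);
        else
            dst[n++] = *src++;
    }
    dst[n] = '\0';
    return n;
}
```

Calling strip_comments("a /* x */ b", out) should leave "a  b" in out.
An unterminated comment simply swallows the rest of the input, which is
where the lex version would complain about premature EOF.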

Another method uses states.  Note that this version echoes only the
comments and discards everything else.


%START Normal Comment
%%
					{ BEGIN Normal; }
<Normal>"/*"				{ ECHO; BEGIN Comment; }
<Comment>"*/"				{ ECHO; printf("\n"); BEGIN Normal; }
<Comment>\				|
<Comment>[^ \t\n*]+			|
<Comment>"*"/[^/]			|
<Comment>.				|
<Comment>\n				{ ECHO; }
<Normal>.				|
<Normal>\n				{ }

(credit goes to Tony Hansen)

If you're hard set on doing this, a good reference seems to be...

_Introduction_to_Compiler_Construction_with_Unix_, by Axel T. Schreiner and
H. George Friedman, Jr., Prentice-Hall, 1985, on page 25 gives:

	"/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
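
To poke at that expression without running lex, one option is to
transcribe it into a POSIX extended regular expression and hand it to
regcomp().  The sketch below does that.  The transcription is mine and
is an assumption (lex quoting turned into backslash escapes), and
first_comment is a hypothetical name.  It is only meant to show that
the pattern stops at the first closing delimiter instead of running on
to a later one.

```c
#include <regex.h>
#include <stddef.h>

/* Assumed POSIX ERE transcription of the lex pattern above. */
static const char COMMENT_ERE[] =
    "/\\*/*([^*/]|[^*]/|\\*[^/])*\\**\\*/";

/* Hypothetical helper: find the first comment in text.
   On success stores the start offset in *start and returns the match
   length; returns -1 if nothing matches or the pattern fails to
   compile. */
long first_comment(const char *text, long *start)
{
    regex_t re;
    regmatch_t m;
    long len = -1;

    if (regcomp(&re, COMMENT_ERE, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0) {
        *start = (long)m.rm_so;
        len = (long)(m.rm_eo - m.rm_so);
    }
    regfree(&re);
    return len;
}
```

Given "/* a */ x /* b */", the match should cover only the first seven
characters, because no alternative inside the parentheses is able to
consume a star that is followed by a slash.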

The reason that the expression I used was accepting nested comments
is that lex tries to match the longest possible string.

Nested comments are not a regular language, so they are hopeless without
writing a little C code.  I never really wanted to do them anyway, I guess
I just didn't make myself clear.  (Besides, I'm told they're not ANSI.)
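
For the curious, the "little C code" for nested comments could be as
small as a depth counter.  This is only a sketch under my own
assumptions: strip_nested is a made-up name, the output buffer must be
at least as large as the input, and string literals are not handled.

```c
#include <stddef.h>

/* Hypothetical helper: copy src to dst, dropping comments that may
   nest.  depth counts how many comments are currently open.
   dst must have room for strlen(src) + 1 bytes.  Returns the number
   of characters written, not counting the terminator. */
size_t strip_nested(const char *src, char *dst)
{
    size_t n = 0;
    int depth = 0;

    while (*src != '\0') {
        if (src[0] == '/' && src[1] == '*') {
            depth++;                    /* one more comment opened */
            src += 2;
        } else if (depth > 0 && src[0] == '*' && src[1] == '/') {
            depth--;                    /* innermost comment closed */
            src += 2;
        } else {
            if (depth == 0)
                dst[n++] = *src;        /* keep text outside comments */
            src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```

The counter is exactly the part a regular expression cannot express,
since it has to remember an unbounded number of open comments.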


Thanks for all the help from...

Erik Baalbergen <mcvax!cs.vu.nl!erikb at uunet>
Kjell Post <cmcl2!ida.liu.se!kpo>
MH Cox <rutgers!garage.nj.att.com!mhc at gatech>
R. Nigel Horspool <rutgers!uw-beaver!uvicctr!nigelh at gatech>
cmcl2!gondor!psuvax1!gondor!schmidt at uiucdcs (David E. Schmidt)
cmcl2!harvard!pineapple.bbn.com!rsalz
harvard!gsg!gsgpyr!lew at linus (Paul Lew)
harvard!ll-xn!ames!sdcsvax!sdcc6.UCSD.EDU!ix426 at linus (Tom Stockfisch)
sbcs!mmintl!franka at pwa-b
sbcs!pegasus!hansen at cbosgd

and I hope to goodness I gave proper credit to everyone.

michael


