Comment recognition in Lex, again

merlyn at sequent.UUCP merlyn at sequent.UUCP
Sat May 5 00:30:16 AEST 1984


> From: anderson at uwvax.UUCP
> Subject: Comment recognition in Lex, again
> Message-ID: <245 at uwvax.ARPA>
> 
> I have received several replies to my request for a lex expression
> to recognize /* ... */ comments.  The only one that works (sent in
> by Jim Hogue) is
> 
> "/*"([^*]*"*"*"*"[^/*])*[^*]*"*"*"*/"
> 
> which I can't claim to fully understand.  Nor do I understand why my
> original,  "/*"([^*]|("*"/[^/]))*"*/", doesn't work.  The idea is that
> each character in the string between /* and */ can either be something
> other than *, or * followed by something other than /.

I looked at this expression for a while (translated it into railroad tracks
so I could study it as an FSM).  It's sound, but utterly complex.
That is to say, it will match everything that is considered a C-comment,
and nothing else.
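
If you'd rather not draw the railroad tracks, one rough way to poke at the
expression is to transliterate it into another regex dialect and feed it test
strings.  The sketch below uses Python's re module (my choice, not anything
from the original exchange); each quoted "*" in lex becomes \*, and fullmatch
stands in for "the whole string is one comment token".  The name is_comment
and the sample strings are made up for illustration.

```python
import re

# Jim Hogue's lex expression, transliterated into Python regex syntax:
#   "/*"([^*]*"*"*"*"[^/*])*[^*]*"*"*"*/"
# The lex fragment "*"*"*" (zero or more stars, then a star) becomes \**\*.
HOGUE = re.compile(r'/\*([^*]*\**\*[^/*])*[^*]*\**\*/')

def is_comment(text):
    """Return True if the whole of text matches as a single comment."""
    return HOGUE.fullmatch(text) is not None

if __name__ == "__main__":
    samples = ["/**/", "/***/", "/* a * b */", "/* a / b **/",
               "/* a */ b */", "/* open", "/*/"]
    for s in samples:
        print(repr(s), is_comment(s))
```

The first four samples are accepted and the last three are rejected, which is
the behaviour you want from a C-comment recognizer.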

My previous suggestion (deleting the "/" before the "[^/]") fails for
cases of /***/, because the second * and the third * are matched in the
middle of the parenthesized expression, leaving no * to use with the trailing /.
Actually, all you have to do is document this :-).
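
The /***/ failure is easy to reproduce outside lex.  Here is a minimal sketch
that transliterates the shortened expression into Python's re syntax (Python
and the helper name matches_previous are my additions, purely for
illustration) and tries it on a few strings:

```python
import re

# The shortened expression, transliterated from lex:
#   "/*"([^*]|("*"[^/]))*"*/"
PREVIOUS = re.compile(r'/\*([^*]|\*[^/])*\*/')

def matches_previous(text):
    """Return True if the whole of text matches the shortened expression."""
    return PREVIOUS.fullmatch(text) is not None

if __name__ == "__main__":
    for s in ["/**/", "/***/", "/****/", "/* a **/"]:
        print(repr(s), matches_previous(s))
```

/**/ and /****/ are accepted, but /***/ and /* a **/ are rejected: whenever an
odd number of stars sits in front of the closing slash, the stars pair off
inside the parentheses and the last one is never free to team up with the /.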

If you want simplicity, here's the way I do it (with start states!):

####	%s INCOMMENT
####	
####	<INITIAL>"/*" 	{
####		BEGIN INCOMMENT;
####	}
####	<INCOMMENT>(.|\n)	{
####		/* ignore */;
####	}
####	<INCOMMENT>"*/"	{
####		BEGIN INITIAL;
####	}

Works just fine, and is VERY clear.  "INITIAL" is an undocumented state that
represents the start state that you are in to begin with.  If you have any
other single-character matchers in your lex script, make sure they come AFTER
the middle pattern above, or are prefixed with another start state (even
INITIAL, if you don't need any other start states).  Rules with no start-state
prefix stay active inside INCOMMENT, which is why the ordering matters.
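
For anyone who wants to see the two-state idea without running lex, here is a
rough hand-written equivalent in Python.  It is only a sketch of the same
state machine, not what lex generates; the function name strip_comments is
invented, and the final branch is an assumption standing in for lex's default
rule of echoing unmatched input.

```python
INITIAL, INCOMMENT = "INITIAL", "INCOMMENT"

def strip_comments(text):
    """Drop /* ... */ comments and pass everything else through unchanged."""
    state = INITIAL
    out = []
    i = 0
    while i < len(text):
        two = text[i:i + 2]
        if state == INITIAL and two == "/*":
            state = INCOMMENT       # like: BEGIN INCOMMENT;
            i += 2
        elif state == INCOMMENT and two == "*/":
            state = INITIAL         # like: BEGIN INITIAL;
            i += 2
        elif state == INCOMMENT:
            i += 1                  # like (.|\n): swallow one character
        else:
            out.append(text[i])     # assumed stand-in for lex's default echo
            i += 1
    return "".join(out)

if __name__ == "__main__":
    print(strip_comments("a/***/b"))
    print(strip_comments("x /* y\n * z */ w"))
```

In lex the "*/" rule beats (.|\n) inside a comment because it is the longer
match; the sketch gets the same effect by testing for "*/" first.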

Randal L. ("(null)") Schwartz, esq. (merlyn at sequent.UUCP)
	(Official legendary sorcerer of the 1984 Summer Olympics)
Sequent Computer Systems, Inc. (503)626-5700 (sequent = 1/quosine)
UUCP: ...!XXX!sequent!merlyn where XXX is one of:
	decwrl nsc ogcvax pur-ee rocks34 shell teneron unisoft vax135 verdix


