Comment recognition in Lex, again

hogue at hsi.UUCP
Fri May 4 23:24:06 AEST 1984


OK, I will attempt to explain the lex script.  First, a little history.
The script was derived while attempting to write a C compiler for a
class at the University of Arizona.  Dr. P. J. Downey and I derived
the regular expression from the finite automaton and from the knowledge
that lex always tries to find the longest possible match.

 > I have received several replies to my request for a lex expression
 > to recognize /* ... */ comments.  The only one that works (sent in
 > by Jim Hogue) is

 > "/*"([^*]*"*"*"*"[^/*])*[^*]*"*"*"*/"

1. "/*" 
	matches the initial /* of a comment.

2. ([^*]*"*"*"*"[^/*])*
	matches the stuff in the comment, including any *'s or /'s.  Each
	run of *'s it consumes must be followed by a character that is
	neither / nor *, so the * just prior to a / is never matched here.
	As such it does not match */

3. [^*]*"*"*"*/"
	matches any remaining non-* characters, then one or more *'s, and
	this time forces the match of the final */

Problems:
	First the /**/ /***/ ... or /*foo**/ style comments.  These are 
	taken care of by 3.
	Next the /**foo*/ and /*foo*baz*/ style comments.  These are taken
	care of by 2.
	The final trick is that the match ends at the first */, so input
	of the form /*foo*/a = b;/*baz*/ is handled in the same style as
	PCC, i.e. /*foo*/ and /*baz*/ are matched and a = b; is not.
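
The cases above can be checked outside lex.  What follows is only a
sketch under stated assumptions: it swaps lex for the POSIX <regex.h>
routines, and it rewrites the expression in POSIX extended syntax, where
"*"*"*" becomes \*+ (zero or more *'s followed by one * is the same as
one or more *'s).  The function name comment_match_length is made up
for this illustration.

```c
#include <regex.h>
#include <stddef.h>

// Hypothetical helper, not part of the original script.
// Returns the length of the comment matched at the very start of text,
// or -1 if text does not begin with a complete comment.
long comment_match_length(const char *text)
{
    // POSIX ERE form of the lex expression; the leading ^ anchors the
    // match at the start of text, as a lex rule is tried at the current
    // input position.
    static const char pattern[] = "^/\\*([^*]*\\*+[^/*])*[^*]*\\*+/";
    regex_t re;
    regmatch_t m;
    long len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;                    // pattern failed to compile
    if (regexec(&re, text, 1, &m, 0) == 0)
        len = (long)(m.rm_eo - m.rm_so);
    regfree(&re);
    return len;
}
```

For example, comment_match_length("/*foo*/a = b;/*baz*/") returns 7, the
length of /*foo*/ alone.  Part 2 can never step over a */, so asking for
the longest match does not run on to the second comment.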

Now for the real way to do it.  Simply recognize the /* and then call a
three-line C program that "eats" up the comment!  (The C preprocessor does
the comment removal for the C compiler.)
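
A minimal sketch of such a comment eater, under some assumptions: the
name eat_comment is invented here, and the function walks a string
rather than the scanner's input.  Inside a real lex action one would
presumably fetch characters with lex's input() routine instead.

```c
#include <stddef.h>

// Hypothetical helper: p points just past an opening slash-star.
// Returns a pointer just past the closing star-slash, or NULL if the
// comment is never closed.
const char *eat_comment(const char *p)
{
    int prev = 0;                 // previous character inside the comment

    for (; *p != '\0'; p++) {
        if (prev == '*' && *p == '/')
            return p + 1;         // first star-slash ends the comment
        prev = *p;
    }
    return NULL;                  // ran off the end: unterminated comment
}
```

Because prev starts out as 0, a / immediately after the opening /* does
not close the comment, so /*/ is correctly treated as still open.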

The real reason this looks so complicated is that it was derived, not
thought out!
-- 
		Jim Hogue
		{kpno, ihnp4}!hsi!hogue
		Health Systems International
		New Haven, CT  06511


