LEX

Mike Coffin mike at arizona.edu
Thu Feb 4 04:21:46 AEST 1988


In article <260 at nyit.UUCP>, michael at nyit.UUCP (Michael Gwilliam) writes:

> I'm writing a C-like language to describe data structures.  While I
> was writing the tokenizer using LEX, I got intrigued by a little
> problem.  Is it possible to write a regular expression that will
> transform a /* comment */ into nothing?

I tried to mail this, but the mailer couldn't find you:

You can probably write a single regular expression to recognize C
comments, but it would be a bad idea.  In general, comments can be
long.  Lex, being a tokenizer, is not designed to recognize anything
bigger than its internal buffer, which is only a few thousand
characters.  When presented with a token that long, lex dumps core.
Two possible solutions:

1) Upon recognizing "/*", call a C routine to eat the rest of the
comment.  Inside the routine, use the Lex macro "input()" to get
characters.
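
A sketch of such a routine (untested; it assumes classic lex, where
input() returns 0 at end of file, and a "lineno" counter like the one
in the rules below).  In the rules section:

"/*"			skip_comment();

and among the user subroutines:

skip_comment()
{
	int c, prev;

	prev = 0;
	while ((c = input()) != 0) {	/* 0 means end of file */
		if (prev == '*' && c == '/')
			return;		/* found the closing star-slash */
		if (c == '\n')
			lineno++;
		prev = c;
	}
	/* end of file inside a comment; complain however you like */
}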

2) Use Lex "start conditions".  These allow you to specify several
different tokenizers and switch between them explicitly.  Untested
code:

<N>"/*"			{BEGIN BC;}
<BC>[^*\n]*		;
<BC>"*"			;
<BC>"\n"		{lineno++;}
<BC>"*/"		{BEGIN N;}

Start condition <N> is the "normal" start condition, while <BC> is the
"block-comment" condition.  This is much safer than recognizing an
entire comment as one token: now a single line of the comment would
have to be longer than the buffer to overflow it.
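
For completeness, the pieces that go around those rules would look
roughly like this (again untested).  The two conditions have to be
declared with %Start, and since lex starts out in condition 0,
something has to do a BEGIN N before the <N> rules become active; here
main() does it.  Built as a stand-alone filter (lex file.l; cc
lex.yy.c -ll), this just strips comments and echoes everything else:

%{
int lineno = 1;
%}
%Start N BC
%%
<N>"/*"			{BEGIN BC;}
<BC>[^*\n]*		;
<BC>"*"			;
<BC>"\n"		{lineno++;}
<BC>"*/"		{BEGIN N;}
<N>"\n"			{lineno++; ECHO;}
<N>.			ECHO;
%%
main()
{
	BEGIN N;	/* lex starts in condition 0, not N */
	yylex();
	return 0;
}

In a real tokenizer the last two <N> rules would be replaced by the
rules that recognize and return your tokens.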


-- 

Mike Coffin				mike at arizona.edu
Univ. of Ariz. Dept. of Comp. Sci.	{allegra,cmcl2,ihnp4}!arizona!mike
Tucson, AZ  85721			(602)621-4252


