not the way ... (was Re: Want a way to strip comments from a)

Tom Stockfisch tps at chem.ucsd.edu
Thu Mar 23 16:12:38 AEST 1989


In article <4221 at omepd.UUCP> merlyn at intelob.intel.com (Randal L. Schwartz @ Stonehenge) writes:
>| >Does anyone have a sed or awk script which we
>| > can use to preprocess the C source and get rid of all the comments
>|   The following works in vi: :%s/\/\*.*\*\///g
>Nope.  Just try it on the line:
>  foo; bar;  /* comment1 */  bletch; /* comment2 */
>'bletch;' disappears with the comments.
>The regexp that matches comments looks like (in egrep/lex notation):
>  [/][*]([*]*[^*/])*[*]+[/]
>Didn't we just go through this about nine months ago? :-)
>(And didn't I give the wrong answer at least twice? :-) :-)

You still don't have it right, I'm afraid.

The body of this pattern, ([*]*[^*/])*, can never match a '/', so it won't
work on

	/ /* / */

It is unbelievable how hard this task is in regular expressions, when it is
trivial to code by hand.
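For comparison, here is a rough sketch of the hand-coded approach in C.  It is
only an illustration: the names strip_comments() and self_test() are made up
for this example, it works on an in-memory string rather than a stream, and it
assumes old-style comments only, so "//" is treated as two ordinary slashes.
A comment is deleted outright, not replaced by a blank.

```c
#include <assert.h>
#include <string.h>

/* States of a small hand-written scanner (hypothetical names). */
enum state { CODE, SLASH, COMMENT, STAR, STRING, CHARLIT };

/* Copy src to dst with comments removed.
   dst must have room for strlen(src) + 1 bytes. */
void strip_comments(const char *src, char *dst)
{
    enum state st = CODE;
    char c;

    while ((c = *src++) != '\0') {
        switch (st) {
        case CODE:
            if (c == '/') {
                st = SLASH;             /* hold the slash back for now */
            } else {
                if (c == '"')
                    st = STRING;
                else if (c == '\'')
                    st = CHARLIT;
                *dst++ = c;
            }
            break;
        case SLASH:
            if (c == '*') {
                st = COMMENT;           /* the held slash opened a comment */
            } else if (c == '/') {
                *dst++ = '/';           /* emit old slash, hold the new one */
            } else {
                *dst++ = '/';
                *dst++ = c;
                st = (c == '"') ? STRING : (c == '\'') ? CHARLIT : CODE;
            }
            break;
        case COMMENT:
            if (c == '*')
                st = STAR;
            break;
        case STAR:                      /* just saw '*' inside a comment */
            if (c == '/')
                st = CODE;
            else if (c != '*')
                st = COMMENT;
            break;
        case STRING:
        case CHARLIT:
            *dst++ = c;
            if (c == '\\' && *src != '\0')
                *dst++ = *src++;        /* copy the escaped character as-is */
            else if (c == (st == STRING ? '"' : '\''))
                st = CODE;
            break;
        }
    }
    if (st == SLASH)
        *dst++ = '/';                   /* input ended on a held slash */
    *dst = '\0';
}

/* Runs a few inputs of the kind that trip up single-pattern attempts. */
void self_test(void)
{
    char out[128];

    strip_comments("/ /* / */", out);
    assert(strcmp(out, "/ ") == 0);
    strip_comments("/*****//hello world */", out);
    assert(strcmp(out, "/hello world */") == 0);
    strip_comments("/* */ hello */", out);
    assert(strcmp(out, " hello */") == 0);
}
```

Each state is one line of reasoning ("I have just seen a slash", "I am inside
a string"), which is why this is so much easier to get right than a single
pattern.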

To convince yourself that a pattern is correct, I think you have to show
two things:
	1.  That the body between the "/*" and "*/" cannot possibly contain
	    a "*/",
	2.  That the body can contain any other sequence of characters.
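
The two conditions can be written down directly as a reference check, which
is handy for comparing against whatever a candidate pattern accepts.  This is
only a sketch; is_single_comment() and oracle_self_test() are hypothetical
names.  A string is exactly one comment if it starts with the opening
delimiter and the first closing delimiter after it is also the end of the
string.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if s is exactly one comment: opening delimiter, a body with no
   closing delimiter in it, then the closing delimiter and nothing more. */
bool is_single_comment(const char *s)
{
    size_t n = strlen(s);
    const char *close;

    if (n < 4 || strncmp(s, "/*", 2) != 0)
        return false;
    /* Search from offset 2 so the opening star cannot double as the
       closing one (this rejects the three-character string slash-star-slash). */
    close = strstr(s + 2, "*/");
    return close != NULL && close == s + n - 2;
}

/* A few spot checks of the oracle itself. */
void oracle_self_test(void)
{
    assert(is_single_comment("/**/"));
    assert(is_single_comment("/* / */"));
    assert(!is_single_comment("/*/"));
    assert(!is_single_comment("/* */ hello */"));
}
```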

Various other patterns which have been posted (including ones by famous
net gurus) have failed to match the following correctly:

1.
	/*****//hello world */

2.
	/* hello /* /* world */

3.
	/* */ hello /* */

4.
	/**// /* this input should produce "/ \n" for output */

5.
	/* */ hello */


So what works?  I haven't been able to break the program below, which also
correctly ignores comment delimiters inside strings and character constants.

If you want a practical program, use start states and don't match an entire
comment with one pattern -- you won't be in danger of overflowing yytext[].
If you want to see how it's done with regular expressions, study the
following.


	/* lex program that strips comments */

okslash	([^*/]"/"+)

%%
"/*""/"*([^/]|{okslash})*"*/"	;

\"((\\(.|\n))|[^\\"])*\"	ECHO;

\'((\\(.|\n))|[^\\'])*\'	ECHO;

.|\n	ECHO;
-- 

|| Tom Stockfisch, UCSD Chemistry	tps at chem.ucsd.edu


