Want a way to strip comments from a C file

Michael Condict mnc at m10ux.UUCP
Fri Mar 24 03:17:00 AEST 1989


In article <4060 at ttidca.TTI.COM>, hollombe at ttidca.TTI.COM (The Polymath) writes:
> In article <880 at m10ux.UUCP> mnc at m10ux.UUCP (Michael Condict) writes:
> }I recently posted to this group a shell script that
> }    [ deletes comments from C source, among other things ]
>     . . .
> If I understood the original posting correctly, it will also fail if it
> encounters a /* or */ within a quoted string constant.  E.g.:
>     . . .

Oops, you are absolutely right.  After some analysis of this limitation in
my sed script, it is obvious that the regular expressions of sed (or awk or
vi/ex/ed) are too limited to handle the job in any reasonable fashion.
Besides, the lex script that does the job is trivial.  Someone announced that
they were posting a six-line lex script to comp.sources.unix.  This doesn't
seem like the best way to present the solution, since the article announcing
the posting was itself longer than six lines.  I'll throw out the following
3-line lex script, which has been tested on all the devious ways of forming
comments and quotes that I can think of.  In particular, it handles comment
delimiters within quotes and quotes within comment delimiters:

----------- Lex script to delete comments from C source code ----------------
%%
\"([^\\"]*\\(.|\n))*[^\\"]*\"	ECHO;
"/*"([^*]|\*+[^*/])*\*+"/"	;
.				ECHO;
-----------------------------------------------------------------------------
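For comparison, here is a minimal sketch of the same job in plain C, written as an explicit state machine rather than as regular expressions. It assumes C89-style /* */ comments only (no //), and it ignores trigraphs and backslash-newline line splices. Like the lex script, it deletes each comment outright instead of replacing it with the single space the C standard specifies, so a/**/b comes out as ab. It also tracks character constants, so a constant such as '"' is not taken as the start of a string. The function name strip_comments and the state names are this example's own, not anything from the post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scanner states; the names are this sketch's own, not from the post. */
enum scan_state { IN_CODE, IN_STRING, IN_CHAR, IN_COMMENT };

/*
 * Copy the C source text in src to dst with every comment deleted, leaving
 * comment delimiters inside string literals and character constants alone.
 * dst must have room for strlen(src) + 1 bytes (the output is never longer).
 * Returns the length of the result.
 */
size_t strip_comments(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);

    size_t len = strlen(src);
    size_t n = 0;
    enum scan_state st = IN_CODE;

    for (size_t i = 0; i < len; i++) {
        char c = src[i];
        char next = (i + 1 < len) ? src[i + 1] : '\0';

        switch (st) {
        case IN_CODE:
            if (c == '/' && next == '*') {
                st = IN_COMMENT;
                i++;                      /* consume the '*' of the opening delimiter */
            } else {
                if (c == '"')
                    st = IN_STRING;
                else if (c == '\'')
                    st = IN_CHAR;
                dst[n++] = c;
            }
            break;

        case IN_STRING:
        case IN_CHAR:
            dst[n++] = c;
            if (c == '\\' && next != '\0') {
                dst[n++] = next;          /* escaped char, e.g. \" or \\, copied as-is */
                i++;
            } else if ((st == IN_STRING && c == '"') ||
                       (st == IN_CHAR && c == '\'')) {
                st = IN_CODE;             /* closing quote ends the literal */
            }
            break;

        case IN_COMMENT:
            if (c == '*' && next == '/') {
                st = IN_CODE;
                i++;                      /* consume the '/' of the closing delimiter */
            }
            break;                        /* comment text itself is dropped */
        }
    }
    dst[n] = '\0';
    return n;
}
```

To use it as a stdin-to-stdout filter like the lex program, a small driver only has to read the whole input into a buffer, call strip_comments, and print the result. The explicit states make it easy to see why a comment delimiter inside a string, or a quote inside a comment, is left alone: each is only recognized in the state where it has meaning.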

Can anyone find anything wrong with this one (he asks stupidly)?  Can anyone
find a shorter solution?  Boy, this is almost as much fun as computing factorial
in the minimum-sized C program.
-- 
Michael Condict		{att|allegra}!m10ux!mnc
AT&T Bell Labs		(201)582-5911    MH 3B-416
Murray Hill, NJ


