Stripping C comments: what about quotes??

Jeff Erickson krazy at claris.com
Sat Mar 18 18:40:25 AEST 1989


From article <4221 at omepd.UUCP>, by merlyn at intelob.intel.com (Randal L. Schwartz @ Stonehenge):
> The regexp that matches comments looks like (in egrep/lex notation):
> 
>   [/][*]([*]*[^*/])*[*]+[/]
> 
> (I use [X] here instead of \X because I hate backslashes...).
>
The problem with that expression is that it doesn't account for quotes.
For example:

	printf(foo ? "/*" : "*/");

gets turned into:

	printf(foo ? "");

if you aren't careful.  That expression is only safe if you assume that no
comment delimiters appear inside quotes.  I'm not sure, but I don't think
you can find REAL C comments (no parts in quotes) with a regular expression
search, or a series of them.

A decommenter has to handle the following cases:

	printf(foo ? "/*" : "*/");
	printf("/*");   /*/ hi! /*/
	char foo[] = /*/"bar /*/"baz /*/";
	#define MYNAME "/*/"/*/"/*/"Jeff/*/"/*/"Ernie/*/"/*/"

All of these are legal under ANSI C.  Only the last is questionable under
classic C, because it relies on "x""y" being turned into "xy".

The last one should translate into:

	#define MYNAME "/*/" "Jeff/*/" "/*/"
or
	#define MYNAME "/*/Jeff/*//*/"
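
For what it's worth, the cases above can be handled by a small hand-written
scanner instead of a regular expression search.  What follows is only a
sketch under some assumptions, not a definitive decommenter: the name
strip_comments and its two-buffer interface are made up for illustration,
each comment is replaced by a single space, and trigraphs, backslash-newline
splicing and #include <...> file names are ignored.

```c
/* Copy `in` to `out`, replacing each comment with one space.
 * `out` must have room for strlen(in) + 1 bytes; the output is never
 * longer than the input.  Hypothetical helper, for illustration only. */
void strip_comments(const char *in, char *out)
{
    enum { CODE, COMMENT, STRING, CHARLIT } state = CODE;

    while (*in != '\0') {
        char c = *in;

        switch (state) {
        case CODE:
            if (c == '/' && in[1] == '*') {
                state = COMMENT;
                *out++ = ' ';       /* a comment counts as white space */
                in += 2;            /* skip both characters of the opener */
                continue;
            }
            if (c == '"')
                state = STRING;
            else if (c == '\'')
                state = CHARLIT;
            *out++ = c;
            break;

        case COMMENT:
            if (c == '*' && in[1] == '/') {
                state = CODE;
                in += 2;            /* skip both characters of the closer */
                continue;
            }
            break;                  /* drop everything inside the comment */

        case STRING:
        case CHARLIT:
            *out++ = c;
            if (c == '\\' && in[1] != '\0')
                *out++ = *++in;     /* copy the escaped character verbatim */
            else if (c == (state == STRING ? '"' : '\''))
                state = CODE;       /* closing quote ends the literal */
            break;
        }
        in++;
    }
    *out = '\0';
}
```

Because the opener is consumed as a unit before the scanner starts looking
for the closer, "/*/" opens a comment without also closing it, and anything
seen while inside a string or character literal is copied through untouched.
An unterminated comment simply runs to the end of the input.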

Happy decommenting!!!

-- 
Jeff Erickson     \  Internet: krazy at claris.com          AppleLink: Erickson4
Claris Corporation \      UUCP: {ames,apple,portal,sun,voder}!claris!krazy
415/960-2693        \________________________________________________________
____________________/        "I'm so heppy I'm mizzabil!" -- Krazy Kat


