Want a way to strip comments from a

Michael Condict mnc at m10ux.UUCP
Thu Mar 23 04:38:30 AEST 1989


In <9900010 at bradley>, brian at bradley.UUCP writes:

>> /* Written  9:58 am  Mar  9, 1989 by jrv at siemens.UUCP */
>>                          Does anyone have a sed or awk script which we
>> can use to preprocess the C source and get rid of all the comments before
>> sending it to the compiler?
>
>  The following works in vi: :%s/\/\*.*\*\///g
>
>  I don't know if it will work in sed, but it should...

Lest anyone actually be tempted to use such a naive method, be aware that it
DOESN'T WORK, except in the simplest case of one comment per line and no
multi-line comments.  The .* is greedy, so with two comments on a line it
deletes the code between them as well, and since the substitution is applied
one line at a time, it never matches a comment that spans lines.  A correct
sed command, which I may have posted before (forgive me), is shown below.  To
use it on System V-derived seds, you first have to delete all the comments
from the sed script itself (ironically enough!).
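
Both failure modes of the quoted substitution are easy to reproduce.  The
following is only a sketch: naive_strip is a made-up name, and the pattern is
the quoted vi command carried over to sed unchanged.

```shell
# The vi pattern from the quoted post, used as a sed substitution.
naive_strip() {
    sed 's/\/\*.*\*\///g'
}

# Two comments on one line: the greedy .* runs from the first /* to the
# last */, so the code between the two comments is deleted as well.
printf 'a = 1; /* one */ b = 2; /* two */\n' | naive_strip

# A comment spanning several lines: no single line contains both /* and */,
# so the pattern never matches and the comment comes through untouched.
printf '/* this is\n * a comment\n */\n' | naive_strip
```

The first command prints only "a = 1; " (the assignment to b is gone), and
the second prints its three input lines unchanged.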

To see all of the reasons why the simple method doesn't work, try this:
take the test C file appended after the sed script below and run it through
the sed script into a file.  Now run diff on the original C file and the one
with comments removed.  The diff output shows all of the various ways that
comments, and things that look almost like comments, can be intertwined in C
source files.
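
The comparison described above can be wrapped in a small shell function.
This is only a sketch: show_stripped_diff is a made-up name, the file names
delcom.sed and test.c in the usage line are assumptions (the script and test
program saved from this post), and mktemp is assumed to be available.

```shell
# Hypothetical helper: run a sed script over a C file and diff the result
# against the original.  Arguments: sed script file, C source file.
show_stripped_diff() {
    ssd_out=$(mktemp) || return 2
    sed -f "$1" "$2" > "$ssd_out"
    diff "$2" "$ssd_out"
    ssd_rc=$?            # diff exits 0 if identical, 1 if differences found
    rm -f "$ssd_out"
    return "$ssd_rc"
}

# Assumed file names; uncomment once both files exist:
# show_stripped_diff delcom.sed test.c
```

Lines marked "<" in the diff output come from the original file and lines
marked ">" from the stripped one, so each hunk shows one comment removal.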

Michael Condict		{att|allegra}!m10ux!mnc
AT&T Bell Labs		(201)582-5911    MH 3B-416
Murray Hill, NJ

-------------------- Sed script to delete C comments -------------------------
# Delete comments from C source files:
: delcom
/\/\*/{
	# Change first comment delim to @ (after eliminating existing @'s):
	s/@/<Used#to%be+an-At>/g
	s:/\*:@:

	# Read until we have the end comment:
	: morecm
	/\*\//!{
		# Just to cut down on max buffer length:
		s/@.*/@/
		N
		b morecm
	}

	# Get rid of any $'s:
	s/\$/<Used#to%be+a-Dollar>/g

	# First occurrence of */ is guaranteed to be the corresponding end
	# comment, because it is otherwise not legal C, so:
	s:\*/:$:
	s/@[^$]*\$/ /

	# Restore $'s and @'s:
	s/<Used#to%be+a-Dollar>/$/g
	s/<Used#to%be+an-At>/@/g

	b delcom
}
------------------------ The test C program ----------------------------------
#define APAP\
		37
# /*hi*/ define GOO(x) y

char *abc = "hi \"Joe\"";
/* this is
 * a comment
 */
struct A_S {
	int wopper /**** a *** b *** c *//*again*/ ;
}; int
f
(x, /* a * in a comment */
	yoohoo)  /**/    /* a /* b */ char *yoohoo;
{
	int a, b, c = '\'';
	char * quote="h#w \
#bo{ut @hat?";
	a = b /*oops*/*c;	/****************/
} enum goober {a,b};
	struct A_S *george(x) struct {int x;
				      float y;} x; { return 0; }

typedef int bar;
struct A_S * * george2(moo, x, glop, foo) struct {
					     int q[13]; float y;} x[];
	bar moo ,	*foo[];
	struct A_S *glop;
/*a*/{
		return 0;
}

/* Try various combinations of register arg decls:*/
flop(a_1, b) register a_1; { return 0; }
struct BB {int f,g;} floop(a_1, b_1) register char *a_1; float register*b_1;
{ struct BB j; return j;}

/* Test arg names that are substrings of one another: */
char sub1(abc, abcdef) int* abcdef; float abc; { return 0; }
-----------------------------------------------------------------------------
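
The heart of the sed script above is the delimiter trick: once the first /*
and the first */ have each been turned into a single character, the comment
body can be matched with [^$]*, which cannot run past the end of the comment
the way .* does.  Below is a cut-down sketch of just that step, handling only
comments that begin and end on the same line.  The name
strip_inline_comments is made up for illustration; the placeholder strings
are the ones used in the full script.

```shell
# Sketch: remove comments that start and end on one line, using the same
# single-character delimiter trick as the full script.
strip_inline_comments() {
    sed -e ':top' \
        -e '/\/\*.*\*\//!b' \
        -e 's/@/<Used#to%be+an-At>/g' \
        -e 's/\$/<Used#to%be+a-Dollar>/g' \
        -e 's:/\*:@:' \
        -e 's:\*/:$:' \
        -e 's/@[^$]*\$/ /' \
        -e 's/<Used#to%be+a-Dollar>/$/g' \
        -e 's/<Used#to%be+an-At>/@/g' \
        -e 'b top'
}

# The naive pattern would delete "*c;" here; this keeps it.
printf 'a = b /*oops*/*c; /* two */\n' | strip_inline_comments
```

Each comment is replaced by one space, so the line comes out as
"a = b  *c;  " with the multiplication intact.  Multi-line comments still
need the N loop from the full script.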


