A way to significantly speed up compiles?

Flint Pellett flint at gistdev.gist.com
Sat Dec 15 11:14:45 AEST 1990


This is a proposal on how to make C compiles a lot faster, but it may
not be a useful enough case to be worth it for typical C programs.
I thought I'd throw the idea up for discussion though.  Consider the
following sample of code:

#define	STARTLOC	0
 <..10 things..>
#define ENTRY1		STARTLOC + 10
 <..5 things..>
#define ENTRY2		ENTRY1 + 5
 <..7 things..>
#define ENTRY3		ENTRY2 + 7
...
int pointers[] = {STARTLOC,ENTRY1,ENTRY2,ENTRY3...

I expected the output of cpp to look like this:

int pointers[] = {0 , 10 , 15 , 22 , ...

cpp, however, is only doing simple text substitution here, and produced:

int pointers[] = {0 , 0+10 , 0+10+5 , 0+10+5+7 , ...

If you're trying to use MSC5.1 it doesn't even work: after you get
somewhere around ENTRY60, it quits, telling you you've tried to nest
your constants too deeply.  The questions is, why not have cpp combine
those numbers at pre-processor time?  Wouldn't it make a lot more
sense to have cpp add 0+10+5+7 together once when ENTRY3 is #defined
in cpp rather than making the compiler add those numbers together 100
times in the 100 places that ENTRY3 is used? 

I can toss some real-world experience in to help answer that question,
since my company has it's own specialized language that we develop and
market: I made a similar improvement to our pre-processing so that it
combined everything it could when it was defined rather than when it
was used, and the result was that compile times were reduced to as
little as 50% of previous times in some cases, and generally to 80% to
90% in all others.  However, that language is one in which you tend
to want to build up definitions of one thing based on a previous one,
and I haven't seen that same tendancy nearly so much in C.

Consider these factors:

1.  What is worst case?  The constant is defined and never used, so
you end up doing the evaluation for nothing.  How often do you
realistically expect to see that happen?  The second worst case: the
constant is defined and then only used once, in which case you have
merely moved the job of evaluating it to cpp from the compiler, and
the slow-down of having cpp look at is going to be relatively minor. 
(cpp only needs to evaluate it if it finds an operator in it
somewhere.  The cpp evaluation can quit immediately upon finding
anything like a variable that it can't handle at compile time, and
fall back to simple text replacement.) 

2.  What is the best case?  If you have an app that is chaining
constants like those above, the difference snowballs, because cpp can
even take advantage of it's own previous results.  Assume 100
constants, each defined in terms of addition of a modifier to the one
before.  (The reason I'm bringing this up now is because I just
bumped into a real-world case where this was done.)  Those 100
constants could be computed with 100 additions by cpp.  Done the
way they are done now, the compiler has to perform 5,050 additions,
not to mention parsing a lot more text, assuming it is even willing
to nest things 100 levels deep.  (It was trying to port it to MSC
that I found out that not a compilers will.)

3.  By combining the values, cpp ought to be able to significantly
reduce the memory it consumes: it won't have to save all those long
strings, it can save the reduced results. 

The obvious question is, how much will this really save in a typical C
program?  I don't know the answer to that one: if real programs don't
use #define to define anything except simple integers in the first
place, then not much.  However, if you're build up defined masks by
combining other defined masks, or are are building a table of pointers
where each one is relative to the one before, potentially quite a bit.

Of course, I've got one other question to the ANSI experts: I've
got a program on my hands that isn't portable, because the place where
it was written let you nest constants > 150 levels deep, and MSC won't.
I don't remember ever seeing anything about stuff like that in a
portability guide.  Does ANSI conformance put any requirements on
what cpp has to be able to handle on things like this?
-- 
Flint Pellett, Global Information Systems Technology, Inc.
1800 Woodfield Drive, Savoy, IL  61874     (217) 352-1165
uunet!gistdev!flint or flint at gistdev.gist.com



More information about the Comp.lang.c mailing list