Execution time bottleneck: How to speed up execution?

Paul Tomblin pt at geovision.gvc.com
Thu Feb 14 01:21:11 AEST 1991


barmar at think.com (Barry Margolin) writes:

>In article <21658 at yunexus.YorkU.CA> racine at yunexus.yorku.ca (Jeff Racine) writes:
>>   for (j=1; j <= n; j++) {
>>      sum = 0.0;
>>      for (i=1; i<= n; i++) {
>>         sum += exp( con * (x[i]-x[j]) * (x[i]-x[j]) );
>>      }
>>      a[j] = k*sum;
>>   }

>First a question about the correctness: do you really want to loop from 1
>to n?  In C, if the array has n elements, the indices run from 0 to n-1,
>not from 1 to n.

>>This part of the algorithmn is a bottleneck. I want to make this as
>>quick as possible. What are my options? I have heard that assembler
>>would be faster. How much faster would assembler be? Would it be worth
>>the effort? Are there other alternatives that you can see?
[Some good suggestions from Barry deleted]
>Register Allocation.
>Loop Rule 1 -- Code Motion Out of Loops.
>Expression Rule 3 -- Common Subexpression Elimination.

putting all the optimization I can think of off the top of my head, and
assuming that he *really* meant 0 to n-1, I come up with:

	register	x_ptr_t ptri, ptrj;
	register	a_ptr_t ptra;
	register	double	sum, sub_expression;

	ptrj = x[0];
	ptra = a[0];
	ptr_to_end = x[n];
	while (ptrj < ptr_to_end) {
		sum = 0.0;
		ptri = x[0];
		while (ptri < ptr_to_end) {
			sub_expression = ptri * ptrj;
			sum += exp( con * sub_expression * sub_expression);
			ptri++;
		}
		ptra++ = k*sum;
		ptri++;
	}

This combines MOST of the techniques that I learnt from an article in
Microsoft System Journal talking about the optimizer in MS-C.  In other
words, if you use MS-C, you don't need to do most of these things.  If 
you use pcc (the bundled compiler on Suns, Ultrix, on lots of other unix
boxen), you DO need to do all this in critical loops.

The techniques I used:

Pointer unraveling? (I think that's what it's called): Converting array
references to pointers eliminates a lot of calculations of array offsets
on lousy optimizers

Register allocation: Tell the compiler that the variables are going to
be referenced a lot.  If you have enough registers, it will help (again,
only if your compiler isn't smart enough to decide on its own)

Common subexpressions: Another easy one, but I don't think pcc does it.

Anybody think of anything else?  Did I make any major boo-boos?
-- 
Paul Tomblin, Department of Redundancy Department.       ! My employer does 
The Romanian Orphans Support Group needs your help,      ! not stand by my
Ask me for details.                                      ! opinions.... 
pt at geovision.gvc.com or {cognos,uunet}!geovision!pt      ! Me neither.



More information about the Comp.lang.c mailing list