Execution time bottleneck: How to speed up execution?

Bob Martin rmartin at clear.com
Fri Feb 15 04:04:00 AEST 1991


>Execution time query:
>
>   for (j=1; j <= n; j++) {
>      sum = 0.0;
>      for (i=1; i<= n; i++) {
>         sum += exp( con * (x[i]-x[j]) * (x[i]-x[j]) );
>      }
>      a[j] = k*sum;
>   }
>
>This part of the algorithm is a bottleneck. I want to make this as
>quick as possible. What are my options? I have heard that assembler
>would be faster. How much faster would assembler be? Would it be worth
>the effort? Are there other alternatives that you can see?  Any
>improvement would be welcome.

I am not a math whiz, and I couldn't see any way to get the exp() out of
the inner loop.  However, there are a few things that can make the program
run significantly faster...

Assume that x is indexed from 0 to n-1 instead of 1 to n.

	double x[MAXSIZE], a[MAXSIZE];	/* MAXSIZE as defined for your problem */
	register double *ip, *jp, *ap;
	double diff, sum;
	double con, k;			/* set elsewhere, as in your code */
	register int j, i;
	int n = MAXSIZE;		/* or whatever size you decide on */

	for (j = n, jp = x, ap = a; j; j--, jp++, ap++) {
		sum = 0.0;
		for (i = n, ip = x; i; i--, ip++) {
			diff = *ip - *jp;	/* do the subtraction only once */
			sum += exp(con * diff * diff);	/* exp() needs <math.h> */
		}
		*ap = k * sum;		/* keep the k factor from your original */
	}

Array subscripting is often slower than direct pointer dereferences.
Also, the subtraction x[i]-x[j] need only be done once per iteration.
Putting the pointers into registers may also help quite a bit.

The changes I have made here are similar to the trade-offs that an assembly
language programmer would make.  Translating further into assembly language
may increase your performance yet again, but I would guess it would not
be dramatically more effective than the above.

-----

Here is another thought which nearly halves the work: it cuts the
number of exp() calls from n^2 down to (n^2-n)/2.  (For n == 1000 that
is 499,500 calls instead of 1,000,000.)  It will make the code a bit
more complex, though.

The matrix S[i][j] = exp(con * (x[i]-x[j])^2) is symmetric about the
i==j diagonal, since (x[i]-x[j])^2 == (x[j]-x[i])^2, so S[i][j] == S[j][i].
Also, every diagonal entry is exp(0) == 1.  This means that you need to
calculate fewer than half of the S values.

	double s;			/* one off-diagonal term */

	for (j = 0; j < n; j++) {
		a[j] = 1.0;		/* the diagonal term, exp(0) == 1 */
	}

	for (j = 0; j < (n-1); j++) {	/* don't need to do the last row */
		for (i = j+1; i < n; i++) {	/* this excludes the i==j diagonal */
			diff = x[i] - x[j];
			s = exp(con * diff * diff);
			a[j] += s;	/* each term counts toward both */
			a[i] += s;	/* row sums, by symmetry */
		}
	}

	for (j = 0; j < n; j++) {
		a[j] *= k;		/* apply the k factor from your original */
	}

Now if you "pointerize" this algorithm, it will be much faster than
your original.
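
Here is a minimal sketch of what that pointerization might look like.  It
is untested, and it assumes the same x, a, con, k, and n as in the
fragments above; adjust the declarations to match your own code.

	register double *jp, *ip, *aj, *ai;
	double diff, s;

	for (aj = a, jp = x; jp < x + n; jp++, aj++)
		*aj = 1.0;			/* diagonal term, exp(0) == 1 */

	for (jp = x, aj = a; jp < x + n - 1; jp++, aj++) {
		for (ip = jp + 1, ai = aj + 1; ip < x + n; ip++, ai++) {
			diff = *ip - *jp;	/* x[i] - x[j], computed once */
			s = exp(con * diff * diff);
			*aj += s;		/* add each term to both row sums */
			*ai += s;
		}
	}

	for (aj = a; aj < a + n; aj++)
		*aj *= k;			/* apply the k factor */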

Caveat:  This is all theory, I haven't tested it at all.

Note:  I am posting this since other netpeople more knowledgeable than
	   I may offer more help.

-- 
+-Robert C. Martin-----+:RRR:::CCC:M:::::M:| Nobody is responsible for |
| rmartin at clear.com    |:R::R:C::::M:M:M:M:| my words but me.  I want  |
| uunet!clrcom!rmartin |:RRR::C::::M::M::M:| all the credit, and all   |
+----------------------+:R::R::CCC:M:::::M:| the blame.  So there.     |


