Execution time bottleneck: How to speed up execution?

Barry Margolin barmar at think.com
Wed Feb 13 10:35:22 AEST 1991


In article <21658 at yunexus.YorkU.CA> racine at yunexus.yorku.ca (Jeff Racine) writes:
>   for (j=1; j <= n; j++) {
>      sum = 0.0;
>      for (i=1; i<= n; i++) {
>         sum += exp( con * (x[i]-x[j]) * (x[i]-x[j]) );
>      }
>      a[j] = k*sum;
>   }

First a question about the correctness: do you really want to loop from 1
to n?  In C, if the array has n elements, the indices run from 0 to n-1,
not from 1 to n.

>This part of the algorithmn is a bottleneck. I want to make this as
>quick as possible. What are my options? I have heard that assembler
>would be faster. How much faster would assembler be? Would it be worth
>the effort? Are there other alternatives that you can see?

Assembler probably won't help much, unless you're already a pretty good
assembler programmer.  However, it would probably be helpful if you looked
at the assembler code produced by your C compiler, to see whether there are
ways you could recode the C so that it would produce more optimal compiled
code.

You might want to look at the book "Writing Efficient Programs" by Jon
Louis Bentley (he's the same person who went on to write the "Programming
Pearls" column in CACM (these columns have since been collected into a
couple of books).  It describes many general techniques for determining why
a program is slow and improving it.

As for your code, there are a couple of minor changes that may help a
little (but small improvements inside inner loops can have noticeable
effects).  They assume that your C compiler doesn't have a very good
optimizer (generally a pretty safe assumption).  I'll reference the
efficiency rules that Bentley describes.

Register Allocation.
  You reference i and j very frequently, so they should be declared
  "register".  This may not be necessary, though; I expect many compilers
  default to putting loop index variables into registers.

Loop Rule 1 -- Code Motion Out of Loops.
  Your inner loop refers to x[j] each time through the loop, although j
  doesn't change during that loop.  It would probably be better to do
  "register x_sub_j_temp = x[j];" before the inner loop, and then access
  x_sub_j_temp in the inner loop instead of x[j].

  You've already applied this rule in your assignments to a[j], by the way.
  A slower, but equivalent, way to have written your code would have been
  for the inner loop to contain:

	a[j] += k * exp (...);

Expression Rule 3 -- Common Subexpression Elimination.
  You perform the subtraction of the two elements twice when squaring.
  Instead, store the result of the subtraction into a temporary, and square
  that:

	temp_difference = x[i] - x[j];
	sum += exp (con * temp_difference * temp_difference);

  Many compilers are able to do this optimization themselves (it's one of
  the oldest and most well known), so it may not improve efficiency (but
  you might need to use the "register" declaration to get exactly the same
  code); look at the compiler's assembler output to see.  However, I
  personally find this good style even if it doesn't change run time; the
  originally code forced me to examine carefully to be sure that both
  subtractions were the same (it could have been (x[i]-x[j])*(x[j]-x[i]),
  which looks almost the same); by simplifying the expression you make it
  easier for the human reader to discern the structure of the formula (it
  would also help to use a more descriptive name than "temp_difference",
  especially if this difference has a name in the application domain).
--
Barry Margolin, Thinking Machines Corp.

barmar at think.com
{uunet,harvard}!think!barmar



More information about the Comp.lang.c mailing list