Array indexing vs. pointers...

Sun Oct 2 00:49:00 AEST 1988

>    I've been programming for YEARS, and if it's not a professional trade
>secret, I'd love it if you would summarize your "coding practices" and
>email them or post them.  If you can get half the added efficiency
>you're talking about, the programming practices be would of interest to the
>whole computing world.  Like you I spend a great deal of my time
>optimizing code for other people, and I have yet to see more than a
>20-25% improvment (even over BAD code.)  
>                *** THIS IS NOT A FLAME OF ANY KIND ***
>I would sincerely like to see these practices.  They might give me a new
>outlook on coding.

I think that 25% or even 60% improvement is not a bad guess. This is
only an average, though, and includes, in my experience, some cases
of truly great gains , that is 30000% improvement (300 times faster).

I'll be happy to share my experiences (I have my flameproof suit on,
wear yours).

1.
      Don't use a bad algorithm. I have seen programs with slow Fourier
transforms in them, a factor of 10 to 50 too slow. The NCAR graphics
package included a 3d surface contour plotter that drew with a pen
plotter in such a way that it picked up the pen after EVERY line
segment, even if the next one started where the last one left off
(also present in Mathcad)(factor of 8). The latest version (3.9p)
of MicroEmacs searches 30 times faster than the previous one by
an improved algorithm (though the old one suffered from another
problem also.)

2.     
       Don't use a general purpose routine to do a very specific,
simple thing. The classic case in C is writing 
     char ss[80];
     ss[0] = 'a';
     ss[1] = '\0';
      printf("%s",ss);

instead of putchar('a');  ... note that I'm not talking about the
first one happening occasionally in a loop.

-2.    Don't reinvent a slow wheel if a fast one is lying around for
the taking. (Illustrated by the recent thread on Duff's device, where
several folks discovered memcpy to be faster. Save hacks like that
for where they are NEEDED - like writing to IO space, where memcpy
might not work.) 

3.     Don't use a complicated canned system to push square pegs
in round holes. This is where I have found my factors of 300. Graphics
systems like GKS and Phigs are the classic case, as are the graphics
routines in Microsoft Windows (and, I'll bet a cup of coffee, OS/2),
but not, for some odd reason , the Mac. Those things go through so
many layers of code from your call to the screen that they are
guaranteed to be slow. If you have a simple task (as most of mine are)
like drawing a bunch of circles and ellipses and moving them around
the screen fast, use a simple tool. Write your own tool to draw ellipses
direct to the screen - direct to the hardware. My routines beat
Microsoft WIndows by a factor of 10 and IBM's GKS by 100 to 1000 times
(that's right, 1000 times faster). Save the big clumsy packages for
big clumsy tasks - if your calculations take 10,000,000 lines,
they are probably just fine.

4.     Don't use nested subroutines to do lots of simple things
inside lots of simple things. Write the damn stuff out, even if it makes
more code. The classic case here is IBM's PC video bios.

5.     Okay, we admit the rest is tweaking. 25% is a good goal.
To get the fastest code, the first thing to do is convince yourself
(and maybe your boss - luckily I don't have one) that you WANT to get
it faster. Then understand the hardware and instruction set of the
target computer (portable? what's that? We have already decided to go 
go the gold medal). Look at the compiler's .asm output. Try changing
things like x*65 to x<<6+x. It might be faster, might not. IF
your code does indexing like      for(i = 0; i < 100; i++)a[i] = i*6;
change it to                 b = a;
                             j = 0;
                             for(i = 0; i < 100; i++, j += 6)*b++ = j;
It might (or might not) make a difference.

Honestly now - aren't all these things obvious? I would love to hear
about non-obvious ones!. Nevertheless, I have seen them missed more
often than not.

6.     When all else fails, cheat. Use assembly language. Look at the
compiler output. Maybe the optimizer isn't perfect. Remove stores of
never used values. Play with instruction order. Maybe your processor
can overlap floating point computations with fixed point ones -
take advantage of this. Maybe you can assign all your variables
in a loop to registers and get a big gain there (I got a 15% gain
in one graphics routine by storing data in argument pointer and
segment registers on the 8086). There are some cases where certain
compilers can do a better job than you or me, mainly RISC machines
with microscheduling compilers. I'd still like to see what the
guy who designed the scheduler could eke out if he tried hard!

Doug McDonald

D
D
                             for(i = 0; i < 100; i++, j+=6)