low level optimization

Mon Apr 15 22:52:06 AEST 1991

In <1991Apr9.213601.12309 at agate.berkeley.edu> rkowen at violet.berkeley.edu (Dario Bressanini) writes:

>>>   Consider for a moment (yes, these are not equivalent):
>>>
>>>           x = ++c;       vs       x == c++;
>>> These can be "compiled" as:
>>>
>>>                                   temp_001 <-- c;
>>>           c <-- c + 1;            c <-- c + 1;
>>>           x <-- c;                x <-- temp_001;
>>>
>>> (now throw away the "x=" part (last "instruction").
>>>
>>>So, "++c" is ``cleaner'' in some pedantic sense[1], and I suppose a
>>>sufficiently lacking compiler might actually produce slower code
>>>for "c++;" than for "++c;".                  ^^^^^^^^^^^^^^^^^^^^

>I always read this group with interest, i have learned a lot following
>many discussions, but sometimes the "war"  "This statement is faster
>than the other" comes up, like in this case. I don't want to discuss
>this particular case, but to make some general personal observations.
>Based on my experience, I hardly believe that  in a "real world code"
>one can improve the performances of a program using this kind
>of "optimization".
>When i REALLY HAVE to optimize a program, first of all i use 
>a profiler to see where is the bottleneck, and THEN i try to optimize it;
>probably I am biased since i mostly write programs (90% in FORTRAN)
>for scientific computation (yes, I use floating points :-) ) where usually
>you have a single routine, or even a single loop, that takes 95% of 
>the CPU time.
>In most cases the best way to gain speed was to change completely
>the algorithm, and not to make some very low level optimization.
>Following the latest "this is faster than that" wars I had 
>the impression that they were pure void theoric discussions, without any
>connection with the "real world", at least to my world.

>Just in case.... I don't want to start the usual and useless war
>C vs FORTRAN etc..,i would like to use C for my programs, but in most
>cases i tried, the Code produced by the C compiler was 2 or 3 times
>slower that the fortran code.

To me it seems you are contradicting yourself. First you tell us that you
only find algorithmic improvement worthwhile, then you complains about slow
code produces by C compared to Fortran. 
  As you are probably well aware, the speed difference you noticed is very
dependent of computer and compiler, but a few observations may be of interest:
You cannot do much low-level optimization yourself in Fortran, right?
So, Fortran compilers tend to have very clever optimizers, especially when
the program does number crunching which often is quite easy to handle for an
optimizer. 
  C, on the other hand, allows the programmer a lot more freedom to tune his or
her program himself, using register variables, pointer arithmetic and other
interesting stuff. This freedom gives optimizers a hard time (especially 
pointers is a big problem, since a optimizer cannot assume that certain
values can be kept in registers if a indirect assignment can change them..)
  Fortran may therefore have an edge on number crunching, but I doubt it holds
in general (not that many successful operating systems, compilers, word processors
or whatever are written in Fortran (at least not that I know of, no need to flame))
  Further, were the C versions of the programs you compared tuned by somebody that
is used to doing the tuning ? Were they tuned at all ?  If you want maximum speed
you have to do pointer arithmetic rather than array indexing on many machines/compilers.
(Since you can do this yourself in C, most optimizers don't bother with it, while
Fortran optimizers virtually allways makes this for you). When converting from Fortran
you also have to consider the column-major and row-major storage used for two
dimensional arrays and so forth.
  Most of the time improvement of the algorithm is preferable, but when you have
the best known algorithm and your program is still slobbish what do you do then ?
Turn to assembly language ?  I have found low-level optimizations of the discussed
kind most helpful in tight loops, sometimes speeding it up with a factor of two or so.
The only problem is that many such optimizations are machine dependent (sometimes
even compiler dependent) for example: the MC68000-series has indirect addressing modes
with post-increment and pre-decrement, corresponding to C expressions like *(p++) and
*(--p) respectively. There are no pre-increment or post-decrement addressing modes,
however. Knowing this may speed up many loops when running on a 68k, but may not
have any effect whatsoever on another machine (I bet there is some machine out there 
that has only pre-increment and post-decrement on which it will run slower...)  

-Anders Andersson      (andand at cia.docs.uu.se)