micro-optimizing loops (was Help with casts)

Mon Feb 25 19:19:07 AEST 1991

I have little to had to Chris's summary except to note that some machines
have the equivalent of the PDP-11's SOB instruction, which does both a
decrement (increment) and a branch on condition code in one instruction,
e.g the 68xxx series.  For these machines you have a one instruction overhead
in a loop.

Also the natural thing to do is to jump to the end of the loop immediately,
e.g.

		goto 2$
	1$:	loop body
		iteration code (i++, whatever)
	2$:	goto 1$ if test is satisfied
	3$:

Replicating the test really doesn't save you much, i.e.

		goto 3$ if test is not satisfied
	1$:	loop body
		iteration code
	2$:	goto 1$ if test is satisfied
	3$:

doesn't win much because the cost of a naked goto (as distinct from the
cost of a naked civil servant) is very cheap [for CISC machines].

For reasons that are not clear to me many optimizing compilers will not
collapse the two machine instructions

		dec r1
		bge 1$

into the available single instruction to do the same thing.  Perhaps
some of our compiler writers can explain this to us.

Finally, the inimitable Dan B. points out that the do-while construct
gives the compiler its best shot at optimizing code.  Dan is right,
albeit a little sloppy.  [Hint 1 -- the do-while loop executes at least
once].

While I am at it there is an important point that has not been made.
Generally speaking, when you are altering code (presumably for optimiz-
ation since clarity is irrelevant in C :-)) it is desirable that a
transformation preserves semantics.  Or, to be more precise, it should
be clear what is being preserved by the transformation and what is not.

For example, the construct 'for (i=0;i<E;i++)' has the properties that
the loop body will be executed for i=<0,...E-1>, that i assumes these
values sequentially increasing, that i and E may be signed or unsigned,
and that the loop body will not be executed if E is negative.  Now consider
the three posted alternatives for a countdown loop:

(A)	for(i=E;--i>=0;) {body}
(B)	for(i=E-1;i;--i) {body}
(C)	i=E-1; do {body} while(i--);

All three do not satisfy the condition i assumes the values 0 to E-1
sequentially increasing.  Alternative B fails if i is unsigned.
Alternative C fails if E<=0 initially.  What this means is that B and
C are weaker and less valuable transformations than A.  On the other hand
a corrected version of (C)

(D)	i=E; if (i--) {do {body} while(i--);}

is equivalent to (A), and, as Dan observes, stands the best chance of
being completely optimized.

Some of you may think I am being fussy (and you would be right).  However
part of the trade of being a programmer is taking programs which are
(supposedly) correct and changing them into other faster/smaller programs
which are equivalent.  Ones chances of doing this are much better if one
relies on proven transformations with known semantics.
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.