C "optimization" (3 of 8)

Wed Feb 15 05:21:54 AEST 1984

This is a continuation of my diatribe on "C doesn't optimize, it neatens".
In this and other articles, I compare a true optimizing compiler (Bliss-32
running under VMS) to a code neatener (C running under BSD 4.1c).  Any and
all counterexamples are welcome.  However, this is NOT a comparison of the
languages.  Both C and Bliss have their good and bad points.  This is simply
a comparison of the code they generate.  As in all examples, the source
code and uncensored assembly code are presented.  In all examples, the C source
and Bliss source are as nearly identical as language differences permit.  I
have not taken advantage of any "tricks" to get either language to perform
better or worse than the other.  The optimizer was enabled for both languages.

		-Dan Klein, Mellon Institute, Pittsburgh	(412)578-3382
=============================================================================

In this example, I demonstrate the ability (and lack thereof) of the compilers
to extract loop invariant code from the body of a loop.  This is a technique
we were all taught in Programming-1.  The Bliss compiler knows about this
technique, and does it for you when you forget (or when it is less elegant to
create a temporary variable to hold the invariant value).  What I do here is
loop on "i" from 0 to "(5+a)/2", and do *nothing* in the body of the loop.
The invariant expression is "(5+a)/2".  It doesn't take a genius to see that
that value will never change in the loop, especially since the loop does
nothing at all (let alone reference "a").  This is a very simple example of
loop invariant code.  Bliss can recognize more complex examples than this.
Neither compiler eliminates the loop altogether.  This is a religious issue:
"is the loop needed at all if you know you aren't going to do anything in
it?"  There is no "right" answer to that question, since it is really very
application dependent (e.g. do you really want to ignore software timing
delays?).  So, on to the comparison:

	1) Bliss recognizes the loop invariant section of the loop, and
evaluates it once (before the loop is executed).  Thereafter, it does not
need to reevaluate the expression.  The C compiler, on the other hand,
evaluates the limit expression before each pass of the loop.  This is not
only computationally redundant, but also slow.
	2) Bliss uses the AOBLEQ (Add One and Branch if Less or Equal)
instruction to effect the loop.  The C compiler (after having recalculated
the limit expression), uses an "incl" / "cmpl" / "jleq" combination.  This
is less efficient in both speed and space.
	3) In the calculation of the limit expression, both compilers need
a temporary variable to place the result.  The Bliss compiler chooses R1,
while the C compiler allocates a stack location.  This is a poor choice on
the part of C, since stack accesses take far longer than register accesses
and require more bytes of assembly code.  The register "r1" is available
(and does not need to be preserved on routine entry), so C should use it.
	4) The C compiler allocates a single variable on the stack in the
wrong way.  It emits "subl2 $4,sp" / "clrl -4(fp)" when it could much more
efficiently do "clrl -(sp)".  Thereafter it refers to the variable as "-4(fp)"
when it should use "(sp)".  The latter takes 1 byte versus 2.  However, as
mentioned in 3) above, using "r1" is better all around.
	5) The Bliss compiler sets the loop index variable to be 1 less than
the starting value it needs, and immediately increments it (i.e. the loop
increment is at the top of the loop).  The loop increment in C is also at the
top of the loop, but C sets the variable to be what it wants to start at,
skips the loop increment the first time, and hits it each time afterward.
For loop increments that are complex (e.g. involve pointer dereferencing),
this is a reasonable approach.  However, for simple increments (like "i++"),
the code is wasteful.
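
The cost described in 1) can be made visible in C itself.  The following is
only a minimal sketch, not part of the original test: the names "limit",
"limit_evals", "test_naive" and "test_hoisted" are hypothetical, and the limit
expression is wrapped in a counting function purely so the number of
evaluations can be observed.  The first routine has the same shape as the C
source in the listing; the second hoists the invariant by hand into a
temporary, which is what Bliss does on its own.

```c
/* Illustrative sketch only: all names here are hypothetical. */

static long limit_evals;            /* times the limit expression was evaluated */

/* The loop-invariant limit expression, instrumented with a counter. */
static int limit(int a)
{
    limit_evals++;
    return (5 + a) / 2;
}

/* Same shape as the C source in the listing: the limit is re-evaluated
   before every pass.  Returns the number of evaluations. */
long test_naive(int a)
{
    int i;

    limit_evals = 0;
    for (i = 0; i <= limit(a); i++)
        ;
    return limit_evals;
}

/* Hand-hoisted version: the limit is evaluated once, before the loop. */
long test_hoisted(int a)
{
    int i, lim;

    limit_evals = 0;
    lim = limit(a);
    for (i = 0; i <= lim; i++)
        ;
    return limit_evals;
}
```

With a = 9 the limit is 7, so test_naive(9) evaluates the expression 9 times
(8 passing tests plus the failing one) while test_hoisted(9) evaluates it once.
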
----------------------------------------+-------------------------------------
routine test(a) : novalue =		|	test(a)
begin					|	int	a;
					|	{
    incr i from 0 to (5+.a)/2 do ;	|	    int i;
					|
end;					|	    for (i=0; i<=(5+a)/2; i++) ;
					|	}
					|
	.TITLE  FOO			|		.data
					|		.text
	.PSECT  $CODE$,NOWRT,2		|	LL0:	.align  1
					|		.globl  _test
TEST:	.WORD	^M<>			|		.set	L12,0x0
	ADDL3	#5, 4(AP), R1		|		.data
	DIVL2	#2, R1			|		.text
	MNEGL	#1, R0			|	_test:  .word	L12
1$:	AOBLEQ  R1, R0, 1$		|		subl2	$4,sp
	RET				|		clrl	-4(fp)
					|		jbr	L18
					|	L200001:incl	-4(fp)
					|	L18:	addl3	$5,4(ap),r0
					|		divl2	$2,r0
					|		cmpl	-4(fp),r0
					|		jleq	L200001
					|		ret