Loop unfolding

Earl Killian earl at mips.COM
Fri Sep 2 15:11:56 AEST 1988


In article <5612 at june.cs.washington.edu> rik at june.cs.washington.edu (Rik Littlefield) writes:
   ...
   As I said, unrolling was a very effective way of removing virtually
   all the overhead from these loops.  Can anyone suggest other solutions
   analogous to the alternatives mentioned above, or for that matter,
   any better solution other than assembly language?

I think unrolling is the right answer, but I would suggest just
writing straight-forward C and using a good compiler instead of doing
it in the source.  I did that for your examples.  Here are the
inner loops:

  for (i = 0; i < n; i += 1) s += p[i];
	lw	t2,0(a1)
	lw	t3,4(a1)
	lw	t4,8(a1)
	addu	v1,v1,t2
	lw	t5,12(a1)
	addu	v1,v1,t3
	addiu	a1,a1,16
	addu	v1,v1,t4
	bne	a1,a2,0x68
	 addu	v1,v1,t5

  for (i = 0; i < n; i += 1) p[i] = *q[i];
	lw	t7,0(a2)
	lw	t9,4(a2)
	lw	t8,0(t7)
	lw	t0,8(a2)
	sw	t8,0(a1)
	lw	t1,0(t9)
	lw	t3,12(a2)
	sw	t1,4(a1)
	lw	t2,0(t0)
	addiu	a2,a2,16
	sw	t2,8(a1)
	lw	t4,0(t3)
	addiu	a1,a1,16
	bne	a2,a3,0x118
	 sw	 t4,-4(a1)

  volatile int *r;
  for (i = 0; i < n; i += 1) *r = p[i];
	lw	t0,0(a1)
	addiu	a1,a1,16
	sw	t0,0(a2)
	lw	t2,-12(a1)
	nop
	sw	t2,0(a2)
	lw	t3,-8(a1)
	nop
	sw	t3,0(a2)
	lw	t4,-4(a1)
	bne	a1,a3,0x1c4
	 sw	t4,0(a2)

For the last an assembly language coder could do better because he can
know that r and p aren't aliased and get rid of the nops.  For the
others, the compiler's code is hard to beat.
-- 
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086



More information about the Comp.lang.c mailing list