Loop unfolding
Earl Killian
earl at mips.COM
Fri Sep 2 15:11:56 AEST 1988
In article <5612 at june.cs.washington.edu> rik at june.cs.washington.edu (Rik Littlefield) writes:
...
As I said, unrolling was a very effective way of removing virtually
all the overhead from these loops. Can anyone suggest other solutions
analogous to the alternatives mentioned above, or for that matter,
any better solution other than assembly language?
I think unrolling is the right answer, but I would suggest just
writing straight-forward C and using a good compiler instead of doing
it in the source. I did that for your examples. Here are the
inner loops:
for (i = 0; i < n; i += 1) s += p[i];
lw t2,0(a1)
lw t3,4(a1)
lw t4,8(a1)
addu v1,v1,t2
lw t5,12(a1)
addu v1,v1,t3
addiu a1,a1,16
addu v1,v1,t4
bne a1,a2,0x68
addu v1,v1,t5
for (i = 0; i < n; i += 1) p[i] = *q[i];
lw t7,0(a2)
lw t9,4(a2)
lw t8,0(t7)
lw t0,8(a2)
sw t8,0(a1)
lw t1,0(t9)
lw t3,12(a2)
sw t1,4(a1)
lw t2,0(t0)
addiu a2,a2,16
sw t2,8(a1)
lw t4,0(t3)
addiu a1,a1,16
bne a2,a3,0x118
sw t4,-4(a1)
volatile int *r;
for (i = 0; i < n; i += 1) *r = p[i];
lw t0,0(a1)
addiu a1,a1,16
sw t0,0(a2)
lw t2,-12(a1)
nop
sw t2,0(a2)
lw t3,-8(a1)
nop
sw t3,0(a2)
lw t4,-4(a1)
bne a1,a3,0x1c4
sw t4,0(a2)
For the last an assembly language coder could do better because he can
know that r and p aren't aliased and get rid of the nops. For the
others, the compiler's code is hard to beat.
--
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086
More information about the Comp.lang.c
mailing list