Unrolling string copy loop

Radford Neal radford at calgary.UUCP
Tue Apr 2 07:33:46 AEST 1985


> 	sym.1:
> 		movb	(r2)+,(r1)+
> 		bneq	sym.1

> By the way, Colonel, this loop is not improved by unrolling.

WRONG! I timed the following two routines:

# String copy with ordinary loop.

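_sc1:	.word	0		# entry mask: save no registers
	movl	4(ap),r1	# r1 = first argument (source)
	movl	8(ap),r2	# r2 = second argument (destination)

1:	movb	(r1)+,(r2)+	# copy one byte; Z is set if it was the NUL
	bneq	1b		# taken for every byte except the NUL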

	ret

# String copy with unrolled loop.

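_sc2:	.word	0		# entry mask: save no registers
	movl	4(ap),r1	# r1 = first argument (source)
	movl	8(ap),r2	# r2 = second argument (destination)

1:	movb	(r1)+,(r2)+	# copy one byte; Z is set if it was the NUL
	beql	2f		# taken only when the NUL has been copied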
	movb	(r1)+,(r2)+
	beql	2f
	movb	(r1)+,(r2)+
	beql	2f
	movb	(r1)+,(r2)+
	beql	2f
	movb	(r1)+,(r2)+
	beql	2f
	movb	(r1)+,(r2)+
	beql	2f
	movb	(r1)+,(r2)+
	beql	2f
	movb	(r1)+,(r2)+
	beql	2f
	movb	(r1)+,(r2)+
	beql	2f
	movb	(r1)+,(r2)+	# tenth copy of the pass
	bneq	1b		# branch back once per ten bytes

2:	ret

The first takes 120 microseconds to copy a thirty-character string. The
second takes only 100 microseconds.

It seems that a conditional branch that is not taken is faster than one
that is taken.
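
To make the branch argument concrete, here is a rough C model of the two
routines that copies the string and tallies how many conditional branches
are taken and not taken. It is only a sketch: the names sc1_model,
sc2_model and struct branch_count are made up for illustration, and the
code models branch outcomes only, not VAX timing.

```c
#include <assert.h>
#include <stddef.h>

/* Tally of conditional branches, split by outcome (hypothetical helper). */
struct branch_count {
    int taken;
    int not_taken;
};

/* Models _sc1: one movb followed by one bneq per byte. */
static struct branch_count sc1_model(const char *src, char *dst)
{
    struct branch_count n = {0, 0};
    assert(src != NULL && dst != NULL);
    for (;;) {
        char c = *src++;
        *dst++ = c;
        if (c != 0) {
            n.taken++;          /* bneq 1b taken: loop again */
        } else {
            n.not_taken++;      /* bneq falls through to ret */
            return n;
        }
    }
}

/* Models _sc2: nine "movb; beql 2f" pairs, then "movb; bneq 1b". */
static struct branch_count sc2_model(const char *src, char *dst)
{
    struct branch_count n = {0, 0};
    assert(src != NULL && dst != NULL);
    for (;;) {
        int i;
        char c;
        for (i = 0; i < 9; i++) {
            c = *src++;
            *dst++ = c;
            if (c == 0) {
                n.taken++;      /* beql 2f taken: exit */
                return n;
            }
            n.not_taken++;      /* beql falls through */
        }
        c = *src++;
        *dst++ = c;
        if (c == 0) {
            n.not_taken++;      /* bneq falls through to ret */
            return n;
        }
        n.taken++;              /* bneq 1b taken: next pass of ten */
    }
}
```

For a thirty-character string (31 bytes with the terminating NUL) the
model gives 30 taken and 1 not taken for the ordinary loop, against
4 taken and 27 not taken for the unrolled one. Both execute 31 moves and
31 branches, so the 20 microsecond difference would come from the 26
branches that changed from taken to not taken, assuming nothing else
differs between the two routines.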

    Radford Neal
    The University of Calgary


