A study in code optimization in C

Richard Harter rh at smds.UUCP
Thu Jul 26 16:07:23 AEST 1990


The macro shown below is an optimized memory to memory copy macro.
It is probably faster than memcopy on your machine -- I have checked
it on several machines and have always found it to be faster.

Some may find it instructive as an example of code optimization.
Techniques used are:  loop unrolling, localized use of register
variables, count down loops, and shift and modulo operators instead
of arithmetic operators.

Some general notes:

There are two major ways to unroll a loop with a variable remainder.
The method used here is to use two loops, one for the remainder and
one for the bulk.  The other method is to jump into a loop inside a
switch (legal but bizarre.)  The second method has the advantage of
not needing the "short" loop for the remainder.  However timing studies
show no real difference between the two methods.  The reason is that
the overhead of setting up the switch balances off with the higher cost
of the short loop.

Given two loops, one which is unrolled by a constant, and a short loop
handling the remainder, a loop unrolling factor of 8 is optimal under
most circumstances.

A count down loop saves several instructions in a CISC machine,
provided that (a) the loop variable is a register variable, and (b)
the optimation option is selected.  The savings are due to (1) the
comparison is against 0 rather than a constant, (2) CISC machines
typically have a branch on result of comparison, (3) optimization
and register declaration avoids a store of the loop variable.  For
some compilers there is also a savings by placing the calculation
of the limit in the first clause of the loop when the limit is a
complex expression.  It should be noted that the savings are only
signifigant in tight loops.

A count down loop should have one of the following two forms:

	for (index=(expression)+1;--index>0;) {body}
	for (index=(expression);--index>=0;) {body}

Plum Hall uses the first form; I use and recommend the second as
being clearer.  The second form will fail (infinite loop) if the
index is unsigned.  However the savings will be lost in any case
if the index is unsigned.  These forms should be used as is; the
loop optimization is lost if you don't predecrement in the test.

The macros do not include or exploit an alignment test on the grounds
that (a) the resulting code is not portable, (b) the overhead of the
tests removes the savings in short and medium length copies, (c)
many aligned copies can be caught by passing the proper type, and
(d) I didn't feel like coding it.

The copyright notice is included solely on the grounds that I feel
that if you use somebody elses code you should acknowlege where you
got it from.

------------------------ CUT HERE -----------------------------------
/*
	Copyright (c) 1990, Software Maintenance and Development
	Systems Inc.  Permission to copy, to modify, and to distribute,
	whether for sale or not is granted, provided that a copy of
	this notice is included and is not altered.

	Macro name:  gencopy
	Function:    In memory copy
	Author:	     Richard Harter
	Arguments:
		dest	Pointer to destination
		src	Pointer to source
		nb	Number of items to copy
		type	Type of items
	Synopsis:

	This macro is an optimized in line memory to memory copy
	routine.  The following restrictions apply: (a) if dest and
	src overlap, dest must have a lower address than src; (b)
	the item type must be one for which assignment is valid.

	Bugs:

	No check is made for negative length.  Results may be
	surprising.
*/

#define gencopy(dest,src,nb,type ) {register type *cpy_a;\
register type *cpy_b;\
register int cpy_i,cpy_j;cpy_a=(dest);cpy_b=(src);cpy_j=(nb);\
for(cpy_i=(cpy_j & 7); --cpy_i>=0;) *cpy_a++ = *cpy_b++;\
for (cpy_i=(cpy_j>>3); --cpy_i>=0;) { *cpy_a++ = *cpy_b++;*cpy_a++ = *cpy_b++;\
*cpy_a++ = *cpy_b++; *cpy_a++ = *cpy_b++; *cpy_a++ = *cpy_b++;\
*cpy_a++ = *cpy_b++; *cpy_a++ = *cpy_b++; *cpy_a++ = *cpy_b++; }}

#define copy(dest,src,nb) gencopy(dest,src,nb,char)
------------------------ CUT HERE -----------------------------------
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.



More information about the Comp.lang.c mailing list