faster bcopy using duffs device (source)
Chris Torek
chris at mimsy.UUCP
Sat Sep 9 15:02:12 AEST 1989
>In article <19473 at mimsy.UUCP> I suggest that
>>bcopy() should be written in assembly (on most processors), put in
>>a library, and forgotten about ....
In article <5603 at thor.acc.stolaf.edu> mike at thor.acc.stolaf.edu (Mike Haertel)
writes:
>I just tried the obvious bcopy "while (n--) *s++ = *d++;"
>on a 68010 using gcc. It produced a dbra loop that beat
>the sh*t out of the supposedly carefully handcoded one
>in the C library. (Which is a Duffish sort of thing that
>tries to copy fullwords at a time. Not Duff's device,
>but structurally similar.)
Alas, gcc is rather the exception, although this is changing (more
vendors are discovering gcc!), and it is disappointing how many
supposedly carefully handcoded simple-but-time-consuming routines
(bcopy, bzero, bcmp; or on SysV, memmove, memset, memcmp) were actually
thrown together by someone who had no idea what was optimal.
Just for fun, here is my bcopy-in-C for a 68010, assuming you have a
compiler that is as smart as gcc. (In fact, it has been tested with
gcc, and, at least with the gcc on our Suns, compilers very nicely,
aside from one #ifdef. Curiously, the gcc-specific hack is necessary
only for one of the two loops.)
/*
* Copyright (c) 1989 by the University of Maryland Computer Science
* Department. All rights reserved. Permission to copy for any
* purpose is hereby granted so long as the copyright notice remains
* intact.
*/
#ifdef test
typedef unsigned int size_t;
#else
#include <stddef.h>
#endif
/*
* Block copy from src to dst, len bytes.
* Allows misaligned copies (with attendant performance penalty)
* and allows overlapping regions.
*
* We make the asumption that <read,write,read,write> cycles are
* as fast as <read,read,write,write> cycles so that we can move
* shortwords at a time. If this is not true (e.g., if writes are
* done `instantly' through a write buffer), this is suboptimal,
* and the loop should be changed to move longwords at a time.
*/
void bcopy(register char *src, register char *dst, register size_t len) {
register size_t l2;
if (len == 0)
return;
/* check for potential overlap */
if ((unsigned long)src + len > (unsigned long)dst) {
/* copy backwards */
src += len;
dst += len;
/* get operands word-aligned */
if ((long)src & 1)
*--dst = *--src;
if ((long)dst & 1) {
/* cannot word-align both operands */
while (--len != (size_t)-1)
*--dst = *--src;
return;
}
/* copy words until no words left */
l2 = len >> 1;
while (--l2 != (size_t)-1) {
#ifndef __GNUC__
/* for some reason, gcc does not optimise this */
*(short *)dst = *(short *)src;
dst -= sizeof (short);
src -= sizeof (short);
#else
/* so we use a gcc extension */
*--(short *)dst = *--(short *)src;
#endif
}
/* copy last byte, if any */
if (len & 1)
*--dst = *--src;
} else {
/* copy forwards, otherwise identical */
if ((long)src & 1)
*dst++ = *src++;
if ((long)dst & 1) {
while (--len != (size_t)-1)
*dst++ = *src++;
return;
}
l2 = len >> 1;
while (--l2 != (size_t)-1) {
*(short *)dst = *(short *)src;
dst += sizeof (short);
src += sizeof (short);
}
if (len & 1)
*dst = *src;
}
}
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris at mimsy.umd.edu Path: uunet!mimsy!chris
More information about the Comp.lang.c
mailing list