Explanation, please!

Kenneth Goodwin klg at njsmu.UUCP
Fri Sep 2 03:19:39 AEST 1988


In article <9064 at pur-ee.UUCP>, hankd at pur-ee.UUCP (Hank Dietz) writes:
> In article <189 at bales.UUCP>, nat at bales.UUCP (Nathaniel Stitt) writes:
> > Here is my own personal version of the "Portable Optimized Copy" routine.
> 2.	If the number of items/bytes is not known, then build a binary tree of
> 	such structs and copy half, then half of what remains, etc.  This is
> 	struct t512 { int t[512]; };
> 	struct t256 { int t[256]; };
> 	struct t128 { int t[128]; };
	.... etc .....
> 	if (n & 512) {
> 		*((struct t512 *) q) = *((struct t512 *) p); q+=512; p+=512;
> 	}
> 	if (n & 256) {
> 		*((struct t256 *) q) = *((struct t256 *) p); q+=256; p+=256;
> 	}
	...  etc ...
> 	Incidentally, this ran about 8x faster (on a VAX 11/780) than using
> 	the usual copy loop.  Unfortunately, the above code should have been
> 	written as:
> 
> 	if (n & 512) {
> 		*(((struct t512 *) q)++) = *(((struct t512 *) p)++);
> 	}
> 	...

	But this is where unions come in handy. I used a similar, although
	briefer, technique for a faster version of a bmov() (byte move)
	subroutine on our PDP11-70 a while ago, and subsequently ported
	it to memcpy() when we upgraded from V6 to System V.
	The basic idea is to create a union of long, int, (short), and
	char pointers, use the character pointer to reach the needed
	alignment, and then use the largest available pointer to do the
	copy. There is no reason why a structure copy could not be used,
	although I suspect that on non-VAX systems it may actually be
	detrimental in some cases.
	The PDP11 C compiler used to push registers onto the stack,
	build a 16-bit word copy loop to do structure copies using the
	freed registers, and restore them when it was done.
	So a structure copy would be the same as a word copy on that style
	of system (i.e., ones without block move instructions).

	So in the case of your example, a modified, abbreviated version
	of it would be:

		union ptr_types {
			struct t512 { int t512[512]; } *t512;
			....
			struct t32 { int t32[32]; } *t32;
			long	*t_long;
			int	*t_int;
			short	*t_short;
			char	*t_char;
		} ;

		(probably could dispense with long and short pointers
		and related tests)

	memcpy(a, b, len)
	char *a, *b;
	int len;
	{
		register union ptr_types a_ptr, b_ptr;

		a_ptr.t_char = a;
		b_ptr.t_char = b;

		/* not on a word boundary and chars left */
		while(len > 0 && ((long)a_ptr.t_char & (sizeof(int) - 1)) != 0) {
			*a_ptr.t_char++ = *b_ptr.t_char++;
			len--;
		}
		if(len >= sizeof(*a_ptr.t512)) {
			/* if we can use a 512 int structure copy */
			*a_ptr.t512++ = *b_ptr.t512++;
			len -= sizeof(*a_ptr.t512);
		}
		/* The biggest win is that the pointers increment correctly.
		   len -= sizeof(*element pointer) is the preferred form over
		   N_INTS * sizeof(int). */

		.......
		I guess the rest is obvious; some glue may be needed
		that has not been shown.... :-)
		Alignment should be checked on both the source and destination
		addresses to avoid memory faults, since you may be handed
		source and destination addresses with incompatible alignment,
		which forces a full char-by-char copy. The first test loop
		sort of does this, but all the other copies should also
		check for proper address alignment before proceeding.
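
	For what it is worth, here is one way the missing glue might look.
	It is only a sketch under stated assumptions, not the routine that
	ran on the PDP11: the names blk32, copy_ptr, INT_ALIGNED and
	union_copy are invented here, it uses a single 32-int block size
	instead of the full t512...t32 ladder, it is written in ANSI C
	(with uintptr_t from <stdint.h>) instead of K&R style, and it
	assumes sizeof(int) is a power of two and that all object pointers
	share one representation. The biggest block is copied in a while
	loop because len is not bounded here. Buffers must not overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block type: 32 ints moved by one structure assignment. */
struct blk32 { int w[32]; };

/* Hypothetical union of pointer views onto the same address. */
union copy_ptr {
	struct blk32 *p_blk;
	int          *p_int;
	char         *p_char;
};

/* True when p sits on an int boundary (sizeof(int) assumed power of 2). */
#define INT_ALIGNED(p) ((((uintptr_t)(p)) & (sizeof(int) - 1)) == 0)

void *union_copy(void *dst, const void *src, size_t len)
{
	union copy_ptr d, s;

	d.p_char = (char *)dst;
	s.p_char = (char *)src;	/* const dropped; source is only read */

	/* Wide copies are only safe when both addresses have the same
	   offset within a word; otherwise fall through to the byte loop. */
	if ((((uintptr_t)d.p_char ^ (uintptr_t)s.p_char)
	     & (sizeof(int) - 1)) == 0) {
		/* byte copy up to the next int boundary */
		while (len > 0 && !INT_ALIGNED(d.p_char)) {
			*d.p_char++ = *s.p_char++;
			len--;
		}
		/* structure copies while a whole block remains */
		while (len >= sizeof(*d.p_blk)) {
			*d.p_blk++ = *s.p_blk++;
			len -= sizeof(*d.p_blk);
		}
		/* then single ints */
		while (len >= sizeof(*d.p_int)) {
			*d.p_int++ = *s.p_int++;
			len -= sizeof(*d.p_int);
		}
	}
	/* leftover tail, or the whole copy if alignments are incompatible */
	while (len > 0) {
		*d.p_char++ = *s.p_char++;
		len--;
	}
	return dst;
}
```

	Note that a current optimizing compiler may object to this kind of
	pointer punning under its aliasing rules, so treat the sketch as an
	illustration of the technique, not as a replacement for the
	library memcpy().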
Ken Goodwin
NJSMU.


