Explanation, please!

Wed Sep 7 05:03:38 AEST 1988

In article <5654 at june.cs.washington.edu>, pardo at june.cs.washington.edu (David Keppel) writes:
> 
> I can imagine that on some machines it is faster to copy words into
> register and repack the words in the registers rather than do a byte
> copy, since you could be taking advantage of some hardware gak.
> 

On the old CDC 6000-series machines (early RISCs...) that was the *only*
practical way to do it, as well as being blazingly fast.  We had copies
that would handle arbitrary *bit* alignments at a cost of around 6 instructions
and 2 memory references per 60-bit word, in the middle of the string.  
The sequence was basically fetch, shift, mask, mask, OR, and store, 
appropriately rearranged to minimize memory delay and functional unit 
conflicts, of course.  I vaguely remember that this thing could even
be unrolled a couple of times and still fit in the instruction cache
("stack", in those days) for machines expensive enough to have one.
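
A minimal C sketch of the fetch/shift/mask/mask/OR/store loop described above. It assumes 64-bit words instead of the CDC's 60-bit ones, least-significant-bit-first numbering, a word-aligned destination, and a whole number of output words. `copy_bits` and `rotr64` are hypothetical names, and the code is not the original CDC routine. A rotate is used so that a single shift followed by two masks splits each fetched word into the piece that finishes the current output word and the piece that starts the next one:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate w right by r bits (0 <= r < 64); r == 0 is special-cased
   because shifting a 64-bit value by 64 is undefined in C. */
static uint64_t rotr64(uint64_t w, unsigned r)
{
    return r == 0 ? w : (w >> r) | (w << (64 - r));
}

/* Copy nwords words' worth of bits into the word-aligned buffer dst.
   The source bit string starts at bit `off` of src[0]; bit i of a
   string is bit (i % 64) of word i / 64 (least significant bit first).
   Reads src[0] .. src[nwords], i.e. nwords + 1 source words. */
static void copy_bits(uint64_t *dst, const uint64_t *src,
                      size_t nwords, unsigned off)
{
    assert(off < 64);
    /* After rotating right by off, the low 64 - off bits of a source
       word belong to the output word at the same index, and the high
       off bits belong to the previous output word. */
    const uint64_t lo_mask = ~UINT64_C(0) >> off;
    uint64_t carry = rotr64(src[0], off) & lo_mask;

    for (size_t k = 0; k < nwords; k++) {
        uint64_t r = rotr64(src[k + 1], off);   /* fetch, shift */
        uint64_t hi = r & ~lo_mask;             /* mask         */
        dst[k] = carry | hi;                    /* OR, store    */
        carry = r & lo_mask;                    /* mask         */
    }
}
```

Each source word is loaded once and each destination word stored once, so the inner loop makes two memory references per word, the same count quoted above. Handling an arbitrary *destination* alignment as well would additionally need the first and last output words merged with the bits already in `dst`.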

VAXen I don't know about for sure, but I'd be real surprised if their
microcode didn't do the same thing.

--Rik