Integer square root routine needed.

Wed Aug 2 10:02:20 AEST 1989

In article <7415 at ecsvax.UUCP>, utoddl at ecsvax.UUCP (Todd M. Lewis) writes:
> This may be the wrong place to post this, but I need C source
> to a reasonably fast integer square root routine.  (Actually
> a clear description of an efficient algorithm would do--I can
> write the code myself.) 

After seeing a Newton's method, I had to go write up this one again.
This basic algorithm was used in a floating point package I wrote
a long long time ago when I was a wee lad.  (For those who are curious, 
it used a 16-bit exponent and a 16-bit mantissa.  This means you can 
represent numbers in the range 10^+-9800 to three significant digits.
Very useful for games, where accuracy takes a back seat to speed.
End of digression.)

Okay, so here's the code.  I wrote it from memory in a few minutes,
so it must've been pretty obvious :-)  If you like it, dislike it,
or find something wrong with it, let me know.
--

#define BITSPERLONG 32

#define TOP2BITS(x) ((x & (3 << (BITSPERLONG-2))) >> (BITSPERLONG-2))

/* usqrt:
    ENTRY x: unsigned long
    EXIT  returns floor(sqrt(x) * pow(2, BITSPERLONG/2))

    Since the square root never uses more than half the bits
    of the input, we use the other half of the bits to contain
    extra bits of precision after the binary point.

    EXAMPLE
        suppose BITSPERLONG = 32
        then    usqrt(144) = 786432 = 12 * 65536
                usqrt(32) = 370727 = 5.66 * 65536

    NOTES
        (1) change BITSPERLONG to BITSPERLONG/2 if you do not want
            the answer scaled.  Indeed, if you want n bits of
            precision after the binary point, use BITSPERLONG/2+n.
	    The code assumes that BITSPERLONG is even.
        (2) This is really better off being written in assembly.
            The line marked below is really a "arithmetic shift left"
            on the double-long value with r in the upper half
            and x in the lower half.  This operation is typically
            expressible in only one or two assembly instructions.
        (3) Unrolling this loop is probably not a bad idea.

    ALGORITHM
        The calculations are the base-two analogue of the square
        root algorithm we all learned in grammar school.  Since we're
        in base 2, there is only one nontrivial trial multiplier.

        Notice that absolutely no multiplications or divisions are performed.
        This means it'll be fast on a wide range of processors.
*/

unsigned long usqrt(unsigned long x)
{
    unsigned long a = 0;                    /* accumulator */
    unsigned long r = 0;                    /* remainder */
    unsigned long e = 0;                    /* trial product */

    int i;

    for (i = 0; i < BITSPERLONG; i++) {      /* NOTE 1 */
        r = (r << 2) + TOP2BITS(x); x <<= 2; /* NOTE 2 */
        a <<= 1;
        e = (a << 1) + 1;
        if (r >= e)
            r -= e, a++;
    }

return a;
}
--
Raymond Chen, mathematician by training			...!microsoft!t-rayc