Wanted: fast, low-precision trig functions for C

Doug Gwyn gwyn at smoke.BRL.MIL
Tue Feb 28 13:40:40 AEST 1989


In article <1256 at dukeac.UUCP> tcamp at dukeac.UUCP (Ted A. Campbell) writes:
>I am writing a graphics-based program that has to calculate 
>several thousand map coordinates.  ...
>I wondered if there were any fast, low-precision approaches 
>to the standard trig functions, ...

The cost of the "extra precision" over a less precise but
otherwise similar approximation is typically small.

However, there may be hope -- I know of two circumstances
under which one can substantially speed up such functions:

(1)  If all coordinates will be taken from a grid, as when
dealing with graphic display devices, then several methods
are available for quickly returning answers that are as
precise as the grid resolution allows.  Unfortunately the
ones I know of are implemented as proprietary code.  If
you have access to sources for graphic devices, e.g. DMD,
check there for fast approximate trig functions.

(2)  For more or less arbitrary floating-point coordinates,
if there are only a manageable number of distinct function
arguments, then you may be able to "cache" the results,
either precomputed or computed the first time a value is
needed, stashing each value for immediate use the next time
the same argument is supplied.  For this to work well, there
must be a fast method of mapping an argument onto the
correct cache entry.

	(a)  If the arguments are all regularly spaced,
	for example an integral number of degrees, then
	the cache can be an array of return values indexed
	by the integer number of spacing intervals, e.g.
		double cosval[360] = { 1.0, 0.999848, ... };

	(b)  If the arguments are arbitrary within some
	domain, but the function is nicely behaved, then
	you can obtain good enough accuracy by converting
	the argument to a number of spacing intervals,
	rounding to the nearest integer, and proceeding
	as in (a).  For example, the slope of cos() is
	nowhere steeper than +-1, so for 0.1% accuracy it
	would suffice to divide the range [0,Pi/2] into on
	the order of a thousand bins.  (A short sketch of
	this scheme follows.)
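
To make (b) concrete, here is a minimal sketch; the names
NBINS and fastcos are mine, not anything standard, and the
argument is assumed to have already been reduced to the
range [0,Pi/2]:

	#include <math.h>

	#define NBINS	1024		/* ~0.1% worst-case error */
	#define HALFPI	1.57079632679489661923

	static double costab[NBINS + 1];	/* cos() at each bin boundary */
	static int filled = 0;			/* has the table been built? */

	double
	fastcos(double x)		/* x assumed to lie in [0,Pi/2] */
	{
		register int i;

		if (!filled) {		/* build the cache on first call */
			for (i = 0; i <= NBINS; i++)
				costab[i] = cos(i * (HALFPI / NBINS));
			filled = 1;
		}
		/* round x to the nearest bin boundary and look it up */
		i = (int)(x * (NBINS / HALFPI) + 0.5);
		return costab[i];
	}

For arguments known to fall exactly on the spacing, as in (a),
the rounding step drops out and the lookup is a plain array
index.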

Once you start thinking along these lines, you can probably
spot several areas for code speed improvement.  The key is
that the cost to access the cache must be less than the cost
of whatever approximation method the standard library is
using.  If your machine has a "polynomial evaluate" instruction,
it might be hard to beat the standard library.
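
Whether the cache actually wins is easy enough to measure;
something along these lines (using fastcos() from the sketch
above, with an arbitrary loop count) gives a rough comparison
on your machine:

	#include <stdio.h>
	#include <time.h>
	#include <math.h>

	extern double fastcos(double);	/* the sketch above */

	int
	main(void)
	{
		clock_t t0, t1, t2;
		double sum = 0.0;
		long i;

		t0 = clock();
		for (i = 0; i < 1000000L; i++)
			sum += cos((i % 1000) * 0.001);		/* library */
		t1 = clock();
		for (i = 0; i < 1000000L; i++)
			sum += fastcos((i % 1000) * 0.001);	/* cache */
		t2 = clock();
		printf("cos: %ld ticks, fastcos: %ld ticks (checksum %g)\n",
			(long)(t1 - t0), (long)(t2 - t1), sum);
		return 0;
	}

Printing the checksum keeps a clever compiler from discarding
the loops altogether.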


