Multi-Processor Performance Problem

Mike Muuss mike at BRL.MIL
Thu Jan 11 20:39:57 AEST 1990


I have been running RT, BRL's parallel-processing ray-tracing code,
on our 4D/240 and 4D/280 machines.  I have noticed that there seems
to be an unusual amount of time recorded by gr_osview (and regular
osview) in the "system" category.  When I am lucky, about 10% of
all processor time is consumed this way;  when I am unlucky, about 60%
of all processor time is consumed this way.

Thanks to the superb DBX that SGI provides, I was able to isolate this
activity to the library routine _hsetlock() calling the system call
sginap(0).  Very odd.  I fussed around for a while, and eventually
determined that the routine _hsetlock() only tries to acquire
the hardware interlock 20 times (in a *very* tight loop) before
giving up, and calling sginap(0).

This constant of 20 appears to come from the _USDEFSPIN constant in <ulocks.h>:

#define _USDEFSPIN      20      /* default spin for lock */

Suspecting the worst, I wrapped my calls to the library locking
routines with my own spin-lock checking first, and got an ENORMOUS
speedup -- virtually all the system time went away.

I would therefore request that in the next IRIX release, either (a)
the built-in constant be chosen so that the system call isn't performed
until at least 1 microsecond of looping has passed, or (b) that this
constant be user-settable, perhaps via the usconfig() call.

I suppose that this should be sent to the hotline, but I'm working nights
this week, so you get E-mail instead.  Somebody at SGI please forward this
to the right folk(s).

	Best,
	 -Mike

-----------

PS:  For the curious, here is a chunk of the code I'm using in order to
handle the locks on the SGI:

#ifdef SGI_4D
# include <sys/types.h>
# include <sys/prctl.h>
# include <ulocks.h>
static char		lockfile[] = "/usr/tmp/rtmplockXXXXXX";	/* writable: mktemp() edits it in place */
static usptr_t		*lockstuff = 0;

void
RES_INIT(p)
register int	*p;
{
	register int i = p - (&rt_g.res_syscall);
	ulock_t	ltp;

	if( !rt_g.rtg_parallel )  return;
	if (lockstuff == 0) {
		(void)mktemp(lockfile);
		if( rt_g.debug & DEBUG_PARALLEL )  {
			if( usconfig( CONF_LOCKTYPE, _USDEBUGPLUS ) == -1 )
				perror("usconfig CONF_LOCKTYPE");
		}
		lockstuff = usinit(lockfile);
		if (lockstuff == 0) {
			fprintf(stderr, "RES_INIT: usinit(%s) failed, unable to allocate lock space\n", lockfile);
			exit(2);
		}
	}
	ltp = usnewlock(lockstuff);
	if (ltp == 0) {
		fprintf(stderr, "RES_INIT: usnewlock() failed, unable to allocate another lock\n");
		exit(2);
	}
	*p = (int) ltp;
	lock_usage[i] = 0;
}

void
RES_ACQUIRE(ptr)
register int	*ptr;
{
	register int i = ptr - (&rt_g.res_syscall);

	if( !rt_g.rtg_parallel )  return;

	/* Attempt to reduce frequency of library calling sginap() */
	if( lock_busy[i] )  {
		lock_spins[i]++;	/* non-interlocked */
		while( lock_busy[i] )  lock_waitloops[i]++;
	}
	ussetlock((ulock_t) *(ptr));
	lock_busy[i] = 1;
	lock_usage[i]++;		/* interlocked */
}

void
RES_RELEASE( ptr )
register int	*ptr;
{
	register int i = ptr - (&rt_g.res_syscall);
	if( !rt_g.rtg_parallel )  return;
	lock_busy[i] = 0;		/* interlocked */
	usunsetlock((ulock_t) *(ptr));
}

#endif /* SGI_4D */

PPS:  The 4D/280 is **fast**!


