Multi-processor problems

dixons%phvax.dnet at SMITHKLINE.COM
Fri Jan 12 12:57:57 AEST 1990


I have been working on getting a FORTRAN program running in parallel.  I
seem to have gotten it running with reasonable load balance, etc., but
have observed a curious phenomenon which depends on the system load.
Here's what happens:
When I run on a system with no other users, I see a speedup which
depends on the number of processors used in a sensible way.  The
final speedup with 4 processors is about 1.75x.  But if I run the
same job on the system when one other compute-bound (single-processor,
non-mp) job is running, here are the running times as a function of
the number of processors used in the parallel job:

1 proc		2 proc		3 proc		4 proc
7:14		5:17		4:32		about 22 min

I say about 22 minutes because time reports these rather strange results:
real    30:35.19
user  1:06:58.08
sys         6.41
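
As a rough cross-check on these figures, the ratio (user + sys) / real
gives the average number of CPUs the job kept busy.  The sketch below
assumes that time sums CPU time over all processes of the parallel job;
the helper to_seconds is a hypothetical name introduced only for this
illustration.

```python
def to_seconds(stamp):
    """Convert 'h:mm:ss.ff', 'mm:ss.ff' or 'ss.ff' to seconds."""
    seconds = 0.0
    for field in stamp.split(":"):
        seconds = seconds * 60 + float(field)
    return seconds

# Values copied from the time output quoted in the post.
real = to_seconds("30:35.19")
user = to_seconds("1:06:58.08")
sys_time = to_seconds("6.41")

# Assumption: user and sys are summed over every process of the job,
# so this ratio is the average number of CPUs in use over the run.
busy_cpus = (user + sys_time) / real
print(f"average CPUs busy: {busy_cpus:.2f}")
```

A ratio well below 4 would mean the four processes were not all running
for the whole of the elapsed time.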

A ps on the 4 proc job just before it finishes shows the following:

  5451 ?       22:03 pdg
  5439 ?       22:54 pdg
  5452 ?       22:01 pdg
  5453 ?       21:58 pdg

In other words, using four processors suddenly takes 3 times longer than
1 processor.  This seems to be repeatable.  Also, if two other
compute-bound jobs are each using a processor, then the problem starts
when three processors are used for the mp job.
Four single-processor versions of the same job, all running against the
same other compute-bound job, all finish in about 7:20 each.
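
The running times in the table above can be turned into speedup and
parallel efficiency figures with a few lines of Python.  This is only a
sketch: the 4-processor entry uses the approximate 22 minutes quoted
above rather than a measured wall-clock time, and the names times,
speedup and to_seconds are introduced here for illustration.

```python
def to_seconds(stamp):
    """Convert a 'mm:ss' string to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)

# Running times from the table; "22:00" stands in for "about 22 min".
times = {1: "7:14", 2: "5:17", 3: "4:32", 4: "22:00"}

baseline = to_seconds(times[1])
# Speedup is the 1-processor time divided by the n-processor time.
speedup = {n: baseline / to_seconds(t) for n, t in times.items()}

for n in sorted(speedup):
    efficiency = speedup[n] / n
    print(f"{n} proc: speedup {speedup[n]:.2f}, efficiency {efficiency:.2f}")
```

A speedup below 1 for the 4-processor run corresponds to the slowdown
described above.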

Someone else with a 240 has mentioned to me that he has seen similar
behaviour.  Have others of you observed the same?  Is there a fix for
this?  It seems to me to be a rather serious problem which would
effectively prevent multi-processor SGI boxes from being used in
parallel mode unless they were dedicated to a single compute job.

The system in question is a 4D240 with 32 MB of memory running Irix 3.2.
The programs are CPU bound, do little I/O and are swapping almost not at
all.  CPU utilization is high (>90%) in user mode on all CPUs.
I believe that similar behaviour occurred with earlier releases of
Irix as well, but I haven't gotten around to looking systematically
until now.
Scott Dixon (dixons at smithklin.com)


