Microtasking in CAL
Marco Zagha
marcoz at enquirer.scandal.cs.cmu.edu
Sat Feb 23 08:29:03 AEST 1991
I have a few questions about how multiprocessing works in CAL. My
main concern is figuring out the semantics of the $MDO/$ENDMDO
microtasked loops. (I am using an 8-processor Y-MP.)
$MDO is a macro that runs multiple iterations of a loop in
parallel. The construct
$MDO S1=0,S6,TRIPCNT=S7
[loop body]
$ENDMDO
will execute a loop for S1 from 0 to S6 where iterations of the
loop may be run in parallel.
I have a few questions about how this works:
1) Which registers are "cloned" from the single-threaded code (before
the $MDO) to the multi-threaded code? From the examples in the macros
manual (Macros and Opdefs Reference Manual SR-0012D), it appears that
at least the S and A registers get cloned. Are the V, T, and B
registers also cloned?
2) What happens if you side-effect a register on an iteration of a
loop and use that register on a later iteration? Are you always
guaranteed to get the values from the single-threaded code, or do you
get whatever was left behind from the most recent iteration executed
by that processor? For example:
S3 = 0
$MDO S1=0,100
S3 = S3 + 1
$ENDMDO
On some iteration, say S1=50, do you get
a) S3 = 0
b) S3 = some number between 0 and 50
c) S3 = some number between 0 and 100
d) S3 = garbage
From my experiments, it seems that (b) is correct and that the
registers are not re-cloned from the single-threaded code ---
side-effects can be seen in later iterations that use the same
processor. Is the answer the same for all types of registers?
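To make hypothesis (b) concrete, here is a small Python model of it. This is only a sketch of the behaviour I observed, not documented $MDO semantics. It assumes each processor clones S3 once on entry to the loop and then keeps its own private copy, that the loop bounds are inclusive, and that iterations are handed out in increasing order. The names simulate_mdo and schedule are made up for illustration, and the round-robin schedule is arbitrary; the real assignment of iterations to processors is presumably dynamic.

```python
def simulate_mdo(num_iterations, num_cpus, initial_s3, schedule):
    """Return the S3 value each iteration sees on entry, under hypothesis (b).

    schedule(i) is a hypothetical function giving the CPU that runs iteration i.
    """
    s3 = [initial_s3] * num_cpus   # one private S3 per CPU, cloned once at $MDO
    seen = {}                      # iteration index -> S3 value on entry
    for i in range(num_iterations):
        cpu = schedule(i)
        seen[i] = s3[cpu]          # value left behind by this CPU's last iteration
        s3[cpu] += 1               # the loop body: S3 = S3 + 1
    return seen


# 101 iterations (S1 = 0..100, assuming inclusive bounds) on 8 CPUs, round-robin.
seen = simulate_mdo(101, 8, 0, lambda i: i % 8)
print(seen[50])
```

Under this model the value seen at S1=50 is simply the number of earlier iterations that happened to run on the same processor, so it always lies between 0 and 50 and depends on the schedule.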
Unfortunately, the Cray Y-MP and Cray X-MP Multitasking Programmer's
Manual SR-0222 mostly describes Fortran and doesn't address my
questions about registers. Does anyone know of any documentation or
sample code that I might find helpful?
In case you want to see a full example of $MDO, I've included one from
the macros manual at the end of my message. (In this example it is clear
that S5 from the single-threaded code is available in all the parallel
loop iterations, but I can't get any more information out of it than
that.)
I also have a question about allocating processors from C. In
Fortran, the line "CMIC$ GETCPUS n" will ask for n processors. How
can the equivalent be done in C? (I've been calling my C from Fortran
to get around this problem.)
Thanks,
== Marco Zagha
School of Computer Science
Carnegie Mellon University
Internet: marcoz at cs.cmu.edu Uucp: ...!seismo!cs.cmu.edu!marcoz
Bitnet: marcoz%cs.cmu.edu at cmuccvma CSnet: marcoz%cs.cmu.edu at relay.cs.net
The following example adds two 2-dimensional arrays, element by element,
and places the output in a third array. The addition is vectorized on
the inner loop and microtasked on the outer loop. This example also
shows the nesting of a $VDO/$ENDVDO macro pair inside a scalar
multitasked macro.
____________________________________________________
|Location|Result_____|Operand________|Comment________
|1_______|10_________|20_____________|35_____________
| | | |
| |S6 | D'20 |Set ending index for outer loop
| |S5 | D'300 |Set ending index for inner loop
| |$MDO | S1=0,S6,TRIPCNT=S7
| | A2 | S1 |Move index to A register
| | A3 | D'500 |Get first dimension of arrays
| | A2 | A2*A3 |Compute offset into arrays
| | A3 | X |Get base address of X array
| | A3 | A3+A2 |Compute starting offset into X
| | A4 | Y |Get base address of Y array
| | A4 | A4+A2 |Compute starting offset into Y
| | A5 | Z |Get base address of Z array
| | A5 | A5+A2 |Compute starting offset into Z
| | $VDO | S2=0,S5,TRIPCNT=S3,SEGLEN=A1
| | A0 | A3 |
| | V0 | ,A0,1 |Load segment of X array
| | A0 | A4 |
| | V1 | ,A0,1 |Load segment of Y array
| | V2 | V0+FV1 |Add segments of X and Y arrays
| | A0 | A5 |
| | ,A0,1 | V2 |Store sum in Z array
| | A3 | A3+A1 |Increment pointer into X array
| | A4 | A4+A1 |Increment pointer into Y array
| | A5 | A5+A1 |Increment pointer into Z array
| | $ENDVDO | |
| |$ENDMDO | |
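
For readers who do not read CAL, the loop nest above can be sketched in Python roughly as follows. This is a reading of the listing, not a statement of what the macros actually expand to. It assumes the loop bounds are inclusive, that the arrays are stored flat with a row stride of 500 words (the D'500 in the listing), and that a vector segment holds at most 64 elements. The function name add_arrays and the constants are introduced here for illustration only.

```python
VECTOR_LENGTH = 64   # assumed maximum segment length (SEGLEN) per vector operation
ROW_STRIDE = 500     # first dimension of the arrays, D'500 in the listing


def add_arrays(x, y, z, outer_end, inner_end):
    """Z = X + Y over flat strided arrays, mirroring the $MDO/$VDO nest."""
    for s1 in range(outer_end + 1):            # $MDO: iterations may run in parallel
        offset = s1 * ROW_STRIDE               # A2 = S1 * 500
        start = 0
        while start <= inner_end:              # $VDO: one pass per vector segment
            seglen = min(VECTOR_LENGTH, inner_end + 1 - start)
            for k in range(seglen):            # stands in for one vector load/add/store
                j = offset + start + k
                z[j] = x[j] + y[j]
            start += seglen                    # A3, A4, A5 advance by SEGLEN


size = 21 * ROW_STRIDE
x = [float(i) for i in range(size)]
y = [1.0] * size
z = [0.0] * size
add_arrays(x, y, z, outer_end=20, inner_end=300)
```

Each outer iteration touches only its own row of Z, which is why the outer loop can be microtasked without the iterations interfering with one another.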