Fortran vs. C for numerical work (SUMMARY)

Fri Nov 30 17:15:38 AEST 1990

Several of you have been missing the crucial point.

Say there's a 300 to 1 ratio of steps through a matrix to random jumps.
On a Convex or Cray or similar vector computer, those 300 steps will run
20 times faster. Suddenly it's just a 15 to 1 ratio, and a slow instruction
outside the loop begins to compete in total runtime with a fast
floating-point multiplication inside the loop.
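
To put a number on that, here is a rough sketch of the arithmetic. It
measures time in units of one unvectorized step. The function name and
the jump_cost parameter are mine, and the default of one step-time per
jump is purely an assumption for illustration; a real jump with a slow
multiplication in it may well cost more.

```python
def jump_share(steps_per_jump, vector_speedup=1.0, jump_cost=1.0):
    """Fraction of total runtime spent on the random jumps.

    Time is counted in units of one scalar (unvectorized) step.
    jump_cost is a hypothetical cost of one jump in those units.
    """
    # Vectorization shrinks only the time spent stepping, not the jump.
    step_time = steps_per_jump / vector_speedup
    return jump_cost / (jump_cost + step_time)


scalar = jump_share(300)                       # 300 to 1, no vectorization
vector = jump_share(300, vector_speedup=20)    # same loop, 20 times faster

print(f"effective ratio after vectorization: {300 / 20:g} to 1")
print(f"jump share, scalar:     {scalar:.2%}")
print(f"jump share, vectorized: {vector:.2%}")
```

Under those assumptions the jump goes from about 0.3% of the runtime to
6.25%, and the share grows with whatever a jump really costs.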

Anyone who doesn't think shaving a day or two off a two-week computation
is worthwhile shouldn't be talking about efficiency.

In article <7339 at lanl.gov> ttw at lanl.gov (Tony Warnock) writes:
>       Model        Multiplication Time     Memory Latency
>       YMP          5  clock periods         18 clock periods
>       XMP          4  clock periods         14 clock periods
>       CRAY-1       6  clock periods         11 clock periods

Um, I don't believe those numbers. Floating-point multiplications and
24-bit multiplications might run that fast, but 32-bit multiplications?
Do all your matrices really fit in 16MB?

>       Compaq       25 clock periods         4  clock periods

Well, that is a little extreme; I was talking about real computers.

> For an LU
>     decomposition with partial pivoting, one does roughly N/3 constant
>     stride memory accesses for each "random" access. For small N, say
>     100 by 100 size matrices or so, one would do about 30
>     strength-reduced operations for each memory access. For medium
>     (1000 by 1000) problems, the ratio is about 300 and for large
>     (10000 by 10000) it is about 3000.

And divide those ratios by 20 for vectorization. 1.5, 15, and 150. Hmmm.
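
The same division, sketched for the three matrix sizes. This takes the
quoted N/3 rule at face value and uses the factor of 20 from above; the
function names are just for illustration. Note that N/3 for N = 10000 is
about 3000, which is where the 150 comes from.

```python
VECTOR_SPEEDUP = 20  # the vectorization factor assumed in this post


def lu_ratio(n):
    """Approximate constant-stride accesses per "random" access: N/3."""
    return n / 3


def effective_ratio(n, speedup=VECTOR_SPEEDUP):
    """The same ratio once the constant-stride part runs vectorized."""
    return lu_ratio(n) / speedup


for n in (100, 1000, 10000):
    print(f"N={n:>5}: scalar ratio ~{lu_ratio(n):.0f}, "
          f"vectorized ~{effective_ratio(n):.1f}")
```

Unrounded, N/3 gives slightly higher figures than 1.5, 15, and 150,
which come from rounding the ratios down to 30, 300, and 3000 first. The
order of magnitude is the point.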

---Dan