function calls

Preston Briggs preston at titan.rice.edu
Mon Mar 26 16:55:46 AEST 1990


In article <PCG.90Mar25230051 at rupert.cs.aber.ac.uk> pcg at rupert.cs.aber.ac.uk (Piercarlo Grandi) writes:

>This is an old fallacy: the number of useful registers is usually quite
>low; the Wall paper and others say that for most codes, even floating
>point intensive ones, 4-8-16 registers make do. 

Experience papers that say "n registers is enough" should be read
with the caveat "for my optimizer and my benchmarks."  More aggressive
optimization will usually make profitable use of more registers.

>The problem that Giles
>does not seem to consider is that caching values in registers is only
>useful if the values are going to be used repeatedly, like all forms of
>caching. 

On many (most?) machines, two uses are enough to justify keeping a value
in a register.
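
To make that concrete, here's a made-up fragment (mine, not from the
thread); the cost figures are illustrative, not from any particular machine:

    /*
     * If a load costs, say, 2 cycles and a register reference costs
     * nothing extra, then keeping *p in a register pays for itself
     * the second time it is used.
     */
    int f(int *p)
    {
        int t = *p;          /* one load */
        return t * t + t;    /* three uses, all out of the register */
    }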

>It is not difficult to produce examples of fairly common pieces
>of code where on many machine register caching worsens performance.

Of course, we can produce plenty of examples where many registers are helpful.
More to the point, could you post one of those examples?  I spend a lot
of time thinking about this stuff, and hard cases would help.

>Many registers are useful when:
>1) Your so called 'optimizer' does not select values to cache on
>expected dynamic frequency of use but on static frequency of use.

Surely bad optimizers/register-allocators waste registers.
(Naturally, your reply will be "there's no such thing as a good optimizer.")

>2) You have extremely high latency to memory
>3) You have extremely high latency to memory, and you can prefetch
>blocks of operands while other blocks of operands are being processed,
>4) You have multiple functionals units

But how many chips does this describe?  Today, many; next year, many more.
The i860 has a fast cache, but it's small, and a miss costs about 25 cycles.
Many chips have load pipelines, where it's profitable to issue the fetch several
cycles in advance.  And how many chip sets have asynchronous FP processors?

CPUs outrun memory.  They have for years, and memory isn't catching up.
Hence the development of caches, multi-level caches,
wide data busses, and large register sets.

>1) means that your compiler is stupid, 
Right, but I don't like stupid compilers either.

>2) that you are missing a proper dynamic cache
Cache isn't a cure-all.  It's finite and fairly simplistic.
Long line sizes and limited set associativity tend to restrict
its ability to replace registers.
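
Here's the sort of thing I mean (a contrived sketch; the 1024 and the
conflict behavior are assumptions about a small direct-mapped cache, not
measurements from any real machine):

    void col_sum(double a[][1024], double s[], int n)
    {
        int i, j;
        for (j = 0; j < 1024; j++) {
            double t = 0.0;          /* a fine candidate for a register */
            for (i = 0; i < n; i++)
                t += a[i][j];        /* stride of a full row; on a small
                                        direct-mapped cache these loads can
                                        keep hitting the same set */
            s[j] = t;
        }
    }

If t gets spilled to memory, its line is just one more victim of the
thrashing; held in a register, it rides out the whole inner loop untouched.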

>and 3) and 4) that you have actually multiple threads of control.
Or perhaps the compiler might find enough low-level parallelism
to keep your chip busy.
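
For example, a dot product unrolled with several accumulators (my sketch
of what such a compiler might do; the factor of four is arbitrary) trades
registers for overlap between independent multiply-adds:

    double dot4(double *a, double *b, int n)
    {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        int i;

        for (i = 0; i + 3 < n; i += 4) {   /* four independent chains */
            s0 += a[i]   * b[i];
            s1 += a[i+1] * b[i+1];
            s2 += a[i+2] * b[i+2];
            s3 += a[i+3] * b[i+3];
        }
        for (; i < n; i++)                 /* leftovers */
            s0 += a[i] * b[i];

        return (s0 + s1) + (s2 + s3);
    }

One accumulator serializes the adds; four let a pipelined adder (or
multiple functional units) stay busy, at the price of three extra
registers plus whatever the loads need.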

>My aversion to large register caches
You like "regular" caches but not register caches...
Aren't registers just the top of the memory hierarchy?
They are rather more subject to control from software, but I'd
think that was a plus.  I'd rather see the systems extended so that
the other layers of the hierarchy were also under software control
(prefetches to cache, fetches around cache, prefetching pages of
virtual memory, and so forth).
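
As a sketch of what "prefetches to cache" under software control might
look like: the intrinsic here is GCC's __builtin_prefetch, which is an
assumption on my part (it isn't something the machines above offered),
and the distance of 8 elements is a guess that would have to be tuned
to the actual miss latency.

    double sum_prefetch(double *a, int n)
    {
        double s = 0.0;
        int i;

        for (i = 0; i < n; i++) {
            if (i + 8 < n)
                __builtin_prefetch(&a[i + 8]);   /* request the line a few
                                                    iterations before we use it */
            s += a[i];
        }
        return s;
    }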

--
Preston Briggs				looking for the great leap forward
preston at titan.rice.edu


