function calls

Piercarlo Grandi pcg at rupert.cs.aber.ac.uk
Mon Mar 26 09:00:51 AEST 1990


In article <14281 at lambda.UUCP> jlg at lambda.UUCP (Jim Giles) writes:

   From article <29551 at amdcad.AMD.COM>, by tim at nucleus.amd.com (Tim Olson):
   > [...]
   > It might be true that scientific routines written in FORTRAN may have
   > this many live, non-overlapping variables to keep in registers, but I
   > don't believe this is true in general.  Statistics from a large
   > collection of programs and library routines (a mix of general and
   > scientific applications written in C) show that of 782 functions (620
   > of which were non-leaf functions), an average of 6.5 registers per
   > function were live across function calls.

   This statistic can only be interpreted in one way: the C compiler in
   question didn't allocate registers very well.  Especially in scientific
   packages, there are _HUGE_ numbers of 'live' _VALUES_ to deal with during
   execution of even simple routines.  Vectors, arrays, lists, strings, etc,
   are all being either produced or consumed.

This is an old fallacy: the number of useful registers is usually quite
low; the Wall paper and others say that for most codes, even floating
point intensive ones, 4, 8, or 16 registers are enough. The problem that
Giles does not seem to consider is that caching values in registers,
like all forms of caching, is only useful if the values are going to be
used repeatedly. It is not difficult to produce examples of fairly
common pieces of code where, on many machines, register caching worsens
performance.
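
To make the reuse argument concrete, here is a toy cost model in C. It
is only a sketch: the function names are made up for illustration, and
the costs (one memory access per use when the value stays in memory;
one save, one load and one restore when it is cached in a callee-saved
register) are assumptions, not measurements of any real machine.

```c
/* Toy model: memory accesses spent on ONE value inside ONE function.
   All costs below are illustrative assumptions. */

/* Value left in memory: every use costs one memory access. */
unsigned accesses_in_memory(unsigned uses)
{
    return uses;
}

/* Value cached in a callee-saved register: the register's old
   contents are saved on entry, the value is loaded once, and the
   old contents are restored on exit. */
unsigned accesses_in_register(unsigned uses)
{
    if (uses == 0)
        return 0;   /* never used, so never allocated */
    return 3;       /* save + load + restore */
}

/* Nonzero only when the register copy costs fewer memory accesses. */
int register_caching_pays(unsigned uses)
{
    return accesses_in_register(uses) < accesses_in_memory(uses);
}
```

Under these assumptions a value used one, two or three times is cheaper
left in memory; the register only starts to pay from the fourth use
onwards.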

Many registers are useful when:

1) Your so-called 'optimizer' does not select values to cache by
expected dynamic frequency of use but by static frequency of use. Since
the two are poorly correlated, your so-called 'optimizer' wants to cache
everything in sight.

2) You have extremely high latency to memory, and you want to use a
large register file as a large cache, because even infrequently reused
values are insufferably expensive to refetch.

3) You have extremely high latency to memory, and you can prefetch
blocks of operands while other blocks of operands are being processed,
because you know which operands are going to be needed next, like with
vector machines.

4) You have multiple functional units, and each of them can then make
use of its own set of registers.

Note that none of these really means that you need lots of registers:
1) means that your compiler is stupid, 2) that you are missing a proper
dynamic cache, and 3) and 4) that you actually have multiple threads of
control.
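
Point 1) can be illustrated with a small C routine. This is a sketch
with hypothetical names: `use_counts` merely stands in for a profiler,
and the input is assumed to be mostly non-negative. In the source text
`penalty` is referenced in three statements and `sum` in only one, so a
selection based on static frequency would favour `penalty`, yet at run
time it is `sum` that is touched on almost every iteration.

```c
#include <stddef.h>

/* Hypothetical counters standing in for a profiler. */
struct use_counts {
    unsigned long hot;   /* statements executed that touch `sum`     */
    unsigned long cold;  /* statements executed that touch `penalty` */
};

long sum_checked(const int *v, size_t n, struct use_counts *c)
{
    long sum = 0;        /* one statement in the loop refers to it    */
    long penalty = 0;    /* three statements in the loop refer to it  */

    for (size_t i = 0; i < n; i++) {
        if (v[i] < 0) {              /* assumed to be the rare path */
            penalty = penalty + 1;
            penalty = penalty * 2;
            penalty = penalty - 1;
            c->cold += 3;
            continue;
        }
        sum += v[i];                 /* the common path */
        c->hot += 1;
    }
    return sum - penalty;
}
```

With eight elements of which one is negative, the counters come out as
7 for `sum` against 3 for `penalty`, and the gap widens as the input
grows while the static counts stay the same.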

My aversion to large register caches and so-called clever optimizers
should be well known by now. It stems from my opinion that stupid
compilers are to be avoided, that heavily optimizing compilers are
accident prone and easily made unnecessary by careful coding where it
matters, and from the fact that I am only interested in general purpose
architectures. It is always possible to design a specific architecture
that is not general purpose...

In particular there are two uses for registers: one is as inputs to
functional units, and the other is as a cache. There are machines,
especially stack-oriented ones with caches, that have only specialized
registers, i.e. each register is there only as an input to a functional
unit. The Manchester mainframes were specialized register machines.
Crisp is in some sense such a machine as well.
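
As a rough illustration of registers used purely as inputs to a
functional unit, here is a minimal stack evaluator in C. It is a sketch
only and does not model the Manchester machines or Crisp; the opcode
set, the `insn` layout and the stack depth of 16 are all assumptions
made for the example. The only register-like values are `a` and `b`,
which exist just long enough to feed the adder or multiplier.

```c
#include <assert.h>
#include <stddef.h>

enum op { PUSH, ADD, MUL };          /* hypothetical opcode set */

struct insn {
    enum op op;
    long operand;                    /* used by PUSH only */
};

long run(const struct insn *prog, size_t n)
{
    long stack[16];                  /* operand stack, kept in memory */
    size_t sp = 0;

    for (size_t i = 0; i < n; i++) {
        if (prog[i].op == PUSH) {
            assert(sp < 16);
            stack[sp++] = prog[i].operand;
        } else {
            assert(sp >= 2);
            long b = stack[--sp];    /* second input to the unit */
            long a = stack[--sp];    /* first input to the unit  */
            stack[sp++] = (prog[i].op == ADD) ? a + b : a * b;
        }
    }
    assert(sp == 1);                 /* exactly one result remains */
    return stack[0];
}
```

For instance, the program PUSH 2, PUSH 3, ADD, PUSH 4, MUL evaluates
(2 + 3) * 4 without ever naming a general purpose register.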
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk at nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg at cs.aber.ac.uk


