A note for those not consumed by efficiency worries (was: function calls)

Jim Giles jlg at lambda.UUCP
Tue Mar 20 09:02:46 AEST 1990


From article <1990Mar17.190858.13930 at athena.mit.edu>, by scs at athena.mit.edu (Steve Summit):
> It is unfortunate that perpetual concerns about microoptimization
> tend to suggest that function call overhead is unacceptable and
> should be avoided.  For most programs, readability and
> maintainability are far more important than efficiency, and good
> modularity (accompanied by lots of function calls) is of course
> an important property of a well-written, readable and
> maintainable program.

I quite agree.  No one in this thread of discussion has suggested otherwise,
certainly not me.  In fact, I have argued this very point on this newsgroup
in the recent past (against someone who thought the programmer should do
his own common expression eliminations, etc.).  If your most critical code
issue is readability, maintainability, or just_get_it_out_the_door_before_
the_deadline, then concentration on optimization issues is not likely to
pay off.

HOWEVER, there _ARE_ some contexts where speed is everything.  Procedure
call overhead is one area where the compiler/environment cannot yet be
expected to do the optimizations for you (in fact, the compiler _CAN'T_
do it if separate compilation is allowed).

> [...]
> (Remember, too, that function calls on any machine are by no
> means "slow."  This discussion is splitting hairs between
> "extremely fast" and "ridiculously fast;" the concerns about
> having to save state in a stack frame merely indicate that it is
> nontrivial to make things ridiculously fast.)

This is simply not true.  Procedure calls are the slowest 'single operations'
that high-level languages provide.  On the Cray, for example, call/return
overhead varies from a low of several dozen to a high of a few hundred
clocks!  In fact, for 'baselevel' or 'leaf' procedures, the overhead
often exceeds the time spent in the procedure itself by multiple factors.
If you claim that procedure calls are fast, let's see your benchmark.

Note: method for benchmarking call/return overhead.  Take a production
      code.  Fetch the real-time clock before and after a 'leaf' procedure
      call.  'Inline' the procedure and time the 'inlined' code.  Compare
      the times.  No fair using 'dummy' drivers to make the call - you
      need an active 'caller' which has 'live' values to be saved/restored. 

> In article <14273 at lambda.UUCP> jlg at lambda.UUCP (Jim Giles) writes:
>> [... 24 registers is a small ceiling ... ]
>
> I'd dispute this, and I'm not even religious about keeping all of
> my subroutines smaller than a page or anything.  Few of my
> routines even have 24 variables, let alone 24 variables that have
> good reason to be put in registers.  [...]

The smallest routine I've ever written (this was for a class, years ago)
was a matrix multiply on fixed-sized matrices (they were always guaranteed
to be 3 by 3).  Say the array arguments are A, B, and C - I want to compute
the value of A=B*C.  Each of the 18 elements of B and C is used 3 times
during the calculation - each of the 9 elements of A is a sum of 3 terms.
For efficiency, you want to load each element of the arguments only once.
Similarly, you want to store each element of A only once.  The rest of the
time, you want the data in registers.  If speed were paramount, I would
unroll the loops of the matrix multiply and do the whole thing explicitly.
I might be able to find an order in which to do the calculation which only
needed 24 registers, but the obvious method would be to put each of
the array elements in a separate register - that's 27 'live' values!

Note that the above was the _smallest_ procedure I've written!  Normally
I would not even consider doing matrix multiply as a procedure call
(the idiom of three nested loops, etc., is simple to read and widely
recognized and understood).  In a language like C, this problem would
be a good candidate for a MACRO.

To be sure, there are certainly procedures which don't need as many as
24 registers.  But, if speed were paramount, I suspect that nearly all
of these would be immediate candidates for 'inlining'.  The only exceptions
to this would be assembly routines which used hardware/instructions which
were not accessible from the high-level language.

> [...]                                              Good code,
> particularly object-oriented code, has lots and lots of small
> routines, but they don't all need inlining.

I disagree.  Good code is neither monolithic nor fragmented.  There is
a compromise between these two tendencies which is better than either.
A lot of the OOP that I've seen leans way too far toward fragmentation.
I've seen a matrix multiply method implemented as a sequence of calls
to a vector dot product method, which in turn does a vector elemental
multiply method and a vector summation method.  The problem is that
matrix multiply is a _single_ operation - it's actually _easier_ to
read and maintain if it's all done in one place.

> [...]                                         (A medium-sized --
> i.e. small but not tiny -- routine that's called a lot shouldn't
> necessarily be inlined.  Code size still matters.)

I agree, with a proviso: code should _always_ be 'inlined' if the user
explicitly asks for it.  Otherwise, the rules for automatic 'inlining'
should be:

   1) 'Inline' if the procedure/method is only called once in the
      entire call tree.
   2) 'Inline' if the procedure/method body is smaller than some multiple
      of the call/return overhead code (including register save/restores).
   Otherwise, don't 'inline'.

Of course, this requires a language environment which allows 'inlining'
at all.  I am hopeful that this ability may soon (within this decade) be
commonplace.  But, who knows?

J. Giles


