function calls

Henry Spencer henry at utzoo.uucp
Fri Mar 16 03:34:08 AEST 1990


In article <14268 at lambda.UUCP> jlg at lambda.UUCP (Jim Giles) writes:
>> ... On the Mips machines, for example,
>> a C function call to a "leaf" function (one which calls no others) is
>> one instruction for the call and one for the return, total elapsed time
>> a small fraction of a microsecond.  Non-leaf functions have to do a little
>> bit more work at times.
>
>Most machines implement call/return with single instructions.  However, this
>tends to be the tip of the iceberg for procedure call overhead.  The interface
>_MUST_ do the following:
>
>1) Save/restore all register values that are still 'live' values for the
>   caller.  ...  The problem is: _MOST_ of the procedure call
>overhead is concentrated in number (1)!  And, this overhead applies to all
>procedures - including 'leaf' routines.  Basically, 'leafness' has very
>little to do with procedure call overhead...

Um, do you want to go back and re-think that statement, Jim?  Leafness has
a lot to do with it, because a leaf procedure doesn't have to keep things
neat in case it calls somebody.  In particular, if you assign some of your
registers to be scratch registers, normally unused at call time, then a
leaf procedure just uses them for all its internal work and has to save
and restore *nothing* except in very unusual cases.  This is exactly what
the Mips compilers do.  When I said that a call to a leaf function on a
Mips machine was one instruction each way, that's *one* instruction,
total, end to end, no further overhead involved.  And those are RISC
instructions, normally one cycle apiece.

The correct statement is that the callee must save/restore all register
values that are still live for the caller *and that the callee is going
to overwrite*.  Careful register-allocation conventions (with an adequate
supply of registers) can minimize that number, usually to zero in the case
of leaf functions.

Most of the other overheads you mention can likewise be minimized or
eliminated with some thought.  Stack-overflow probing is needed only on
systems without MMUs, or with defective MMUs; even the pdp11, the original
C machine, did without it.  Traceback overhead for debuggers can be
reduced considerably with better cooperation between compiler and debugger,
and a modern compiler at a high level of optimization generally produces
stuff that a debugger can't handle anyway, so it can jettison that entirely.
Interprocedural optimization in the presence of separate compilation is
not merely possible, it's commercially available:  -O3 on a Mips compiler.

Much of the "common wisdom" on the subject of call overhead simply isn't
true if you are willing to work hard at falsifying it.  (Actually, the
same is true of many other performance "constraints"...)
-- 
MSDOS, abbrev:  Maybe SomeDay |     Henry Spencer at U of Toronto Zoology
an Operating System.          | uunet!attcan!utzoo!henry henry at zoo.toronto.edu



More information about the Comp.lang.c mailing list