Efficient coding considered harmful?

T. William Wells bill at twwells.uucp
Thu Oct 27 17:33:10 AEST 1988


Sorry I wasn't able to participate much in the discussion on efficient
coding, but between setting up my new computer, a wedding
anniversary, computer problems at work, thinking about comp.archives,
and lots of other things, I have had little or no time to write, and
certainly not about anything complex. Anyway, here goes...

It's the damnedest thing, but people whose opinions I otherwise
respect seem to have this thing about coding efficiently.  They don't
like it. Worse, they discourage it.

Well, phooey.

---

I advocate training oneself to be one who codes efficient programs. I
do not mean writing code and then making it more efficient, I mean
writing it that way in the first place. One should make it a practice
to consider which coding methods are best on each architecture, so
that it is second nature to code efficiently. Efficient coding should
take its place right along with easily understood coding as something
every coder should do, as a routine part of the job.

I have heard three main counterarguments to this:

1) "The compiler ought to handle that, so you shouldn't bother with
   it." What nonsense! We have to code in the real world, not the
   world of make-believe and ought-to.

2) "What is more efficient on one machine may well be less efficient
   on another." Well, this is a good argument for knowing a lot of
   different machines and adapting one's coding style to the relevant
   machines, but this is certainly not a good argument for avoiding
   efficient coding. What it boils down to is an argument that a
   program which is coded in ignorance of efficiency may well be
   better (efficiency-wise) than one which is coded to be efficient,
   albeit only for some set of architectures. Well, could be, but I
   wouldn't bet on it.

3) "Efficient coding makes obscurer programs." Well, since most
   efficient coding involves minor changes from the inefficient way
   of doing things, changes which are mostly a matter of style rather
   than of organization, this argument should really be read: "*I*
   don't like to or find it hard to read that kind of program, so it
   is unclear." Well, speak for yourself!

---

[I wrote about improving other people's code. I mentioned that by
following certain coding practices and eliminating "fat" throughout
a program, I was often able to improve the execution time by 25% or
more.]

From: rwberry at hubcap.UUCP (Robert W Berry)
Message-ID: <3105 at hubcap.UUCP>

:     I've been programming for YEARS, and if it's not a professional trade
: secret, I'd love it if you would summarize your "coding practices" and
: email them or post them.

They are no real secret but I haven't really thought hard about
organizing them for presentation, so I can't just tell you about
them. However, some of the suggestions that I have seen posted have
been similar to what I do. The next time I go over someone else's
code, I'll see if I can remember to make notes.  Following are some
suggestions right off the top of my head; take them with the
requisite dollop of thinking for oneself.

The first thing: understand the kinds of architectures where
efficiency matters.  If you are coding a game, you can be reasonably
sure that efficiency is only of concern on relatively conventional
architectures, so make your assumptions accordingly.  If you are
coding some kinds of numerical stuff, where the greatest efficiency
ought to be obtained on an array processor, keep in mind the
characteristics of array processors when optimizing code.  The more
you know about the processors that your code could have to run on,
and the better your analysis of which ones efficiency matters on, the
better will be the decisions you make on which coding practices to
adopt.

Read compiler design books. Whatever they tell you is good for the
compiler is also good for you the coder.  They talk about common
subexpression elimination: when you write code, look for common
subexpressions and eliminate them. (But watch out: not all common
subexpressions should be eliminated; this too is a matter of
judgment and experience.) When they talk about strength reduction,
look for places where strength reduction is possible.  Note that
*you* can do a better job of this than the compiler can. *You* are
capable of inductions that the compiler is not. *You* can find
optimizations that are not possible to discover mechanically. This is
one reason why the argument "leave it to the compiler" is often wrong.
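
For example (the names here are invented for the occasion), here is
the same loop before and after doing the compiler's job by hand:

	/* Before: y*width + x is recomputed on every pass, with a
	 * multiply the compiler may or may not clean up. */
	long row_sum(int *grid, int width, int y, int n)
	{
		int x;
		long sum = 0;

		for (x = 0; x < n; ++x)
			sum += grid[y * width + x];
		return sum;
	}

	/* After: the common subexpression y*width is computed once,
	 * and the multiply in the subscript is strength-reduced to
	 * a pointer increment. */
	long row_sum2(int *grid, int width, int y, int n)
	{
		int *p = grid + y * width;	/* computed once */
		int *plast = p + n;
		long sum = 0;

		for (; p < plast; ++p)
			sum += *p;
		return sum;
	}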

Along this line, look for redundant calculations: sometimes it is
cost effective to store the result of a previous calculation instead
of recomputing it. In particular avoid recalculation in the control
part of iterative statements. Many's the time that I have got
significant improvements just by turning:

	for (i = 0; i < a[j]; ++i) ...
into
	for (i = 0, ilast = a[j]; i < ilast; ++i) ...

or some equivalent.

Use register variables. And take a little pain to get them right.  A
technique that works enough of the time is this: break up a routine
into groups that are executed with about the same frequency.
Guessing is OK. Starting with the most frequent, count the number of
references to each automatic. If the numbers are sufficiently
different, order them by those numbers. Where they are not, use the
next most frequent group to decide the order. Repeat till you run out
of groups or variables. This is only a heuristic, but it can be done
on the fly, or with simple programming tools, and is certainly better
than not specifying register variables at all.
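
A sketch of where this ends up, with made-up names: in the loop, p
is referenced most often and i next, while fill is touched hardly at
all, so the register declarations go in that order. (The order can
matter on compilers that honor register requests first come, first
served.)

	void scrub(char *s, int n, int mode)
	{
		register char *p = s;	/* most referenced: goes first */
		register int i = n;	/* next most referenced */
		int fill;		/* rarely referenced: no register */

		fill = (mode == 0) ? '_' : '.';
		while (--i >= 0) {
			if (*p == ' ')
				*p = fill;
			++p;
		}
	}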

Avoid excess function calls. A lot of people have been brainwashed by
some modularity gurus into believing that every identifiable "whole"
should have its own function. What I have found is that this makes
the program LESS maintainable than otherwise.  And, of course,
programs written this way do a lot of function calls. My own recipe
divides functions into two classes: those that control the operation
of other functions and those that do the dirty work. The control
functions should generally restrict themselves to if's and function
calls and maybe some initialization; the other functions can do
whatever is needed. If the grunge functions start to get too
complicated, start from the outside when splitting up the function:
this keeps down the number of function calls while getting the most
decomposition for your effort.
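
A skeleton of the recipe, with hypothetical names:

	#include <stdio.h>

	/* Grunge functions: the dirty work goes here, in straight
	 * code. */
	static void do_directive(char *line) { /* ... */ }
	static void do_text(char *line) { fputs(line, stdout); }

	/* Control function: restricts itself to a loop, an if, and
	 * function calls. */
	void process_file(FILE *fp)
	{
		char line[256];

		while (fgets(line, sizeof line, fp) != NULL) {
			if (line[0] == '#')
				do_directive(line);
			else
				do_text(line);
		}
	}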

Don't use goto. Yes, oddly enough, this can make programs slower.
Many optimizers that I know of give up on any function that contains
a goto. So adding a goto may very well slow down your code.

---

From: g-rh at XAIT.XEROX.COM (Richard Harter)
Message-ID: <34112 at XAIT.XEROX.COM>

:         I would say that we all know what Bill is talking about
: here, except that "we" all obviously don't.  Basically this is
: using hand optimization as a default in coding style.  One can take
: the view that a good optimizer will do many of these things, e.g.
: use of temporaries, moving static expressions outside loops, placing
: the appropriate variables in registers, etc.  One can also take the
: view that the capabilities and realities of optimizing compilers are
: less than claimed.  Hand optimizing is safer across a variety of
: environments.
:
:         In my experience hand optimizing in original code is less
: efficient, in the sense of code written per unit time, than writing
: code initially in a "clear" style first, and then hand optimizing
: afterwards.  In short, write it in the simplest way first, get it
: working, and then improve performance.

This is not quite what I am advocating. Rather than hand-optimizing
as you code (though one should do a certain amount of this as a
matter of course), I am advocating developing a mindset
where this kind of "optimizing" becomes your natural coding style.
This does mean developing more than one "natural" coding style, but
that is no more a problem than our habit of using different speech
patterns in different social situations.

As an example of what I mean, when I am coding a program that I
expect to be run on one of the "normal" machines, I don't write my
code as

	for (i = 0; i < ALAST; ++i) {
		... A[i];
	}

and then change it to:

	for (ap = A; ap < &A[ALAST]; ++ap) {
		... *ap;
	}

I write it this second way, as a matter of course, because I thought
about it for a while and decided that this was the appropriate way to
do it.  I then trained myself into the habit of doing it this way.
Now, I don't spend *any* extra time getting the advantages of the
latter formulation. As a necessary and important side effect I also
became comfortable with such things and now treat them as no more
abnormal or incomprehensible than *ptr++.

---

From: bright at Data-IO.COM (Walter Bright)
Message-ID: <1700 at dataio.Data-IO.COM>

: Here's some of the things I do:
:     o   Organize complicated if statements so the result of the calculation
:         is known by evaluating it as little as possible. For instance,
:                 if (a && b && c)
:                         ...
:         organize such that if a is easy to evaluate and usually FALSE, put
:         it first.

This also generalizes to sequences of if-else statements.
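
Something like this, say (a made-up example): in a chain of tests,
put the most common case first, so that most executions get through
after a single test.

	void count(int ch, long *nblank, long *nline, long *nother)
	{
		if (ch == ' ')		/* blanks dominate ordinary text */
			++*nblank;
		else if (ch == '\n')
			++*nline;
		else
			++*nother;
	}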

:     o   Replace stuff like,
:                 if (a)
:                         x = b;
:                 else
:                         x = c;
:         with,
:                 x = a ? b : c;
:         Many compilers generate better code for the latter.

I have to disagree with this.  About the only time that this helps is
when combining the statements gives greater scope to the optimizer.
This happens because many optimizers won't optimize across
statements. Unfortunately, this particular change, when done with
complicated expressions, can make the code much harder to read, so it
must be done with care.

:     o   Use the ^ operator because many times,
:                 a = !a;
:         should really be,
:                 a ^= 1;

However, some machines can't do ^ very well, so take care.

:     o   Avoid things like,
:                 a %= 4;
:         Replace with,
:                 a &= 3;

Generally true, but watch out for negative numbers. Also, dividing
positive numbers by powers of two should be done with a shift.
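
The trap, in miniature (the exact result of % on a negative operand
is implementation-defined, but the two forms commonly disagree):

	#include <stdio.h>

	int main()
	{
		int a = -5, b = -5, c = 40;

		a %= 4;		/* commonly -1 on twos-complement machines */
		b &= 3;		/* always 3: not the same answer! */
		c >>= 3;	/* same as c /= 8, since c is nonnegative */
		printf("%d %d %d\n", a, b, c);
		return 0;
	}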

:     o   Organize control flow such that the most common cases go straight
:         through, this is most important for pipelined machines.

While this is true, this tends to make the code *much* less readable,
even when you can get the optimizer to cooperate. So I don't do this
one.

:     o   I gag when I see things like,
:                 a = strlen("asdf") + 1;
:         instead of,
:                 a = sizeof("asdf");
:         The strlen case above was even printed in a magazine recently to
:         'prove' that assembler was better than C!

Gag. This is a specific example of a general case: never do at run
time that which you can get the compiler to do at compile time.
However, beware of getting the compiler to do floating point for you:
some won't, and others will do it differently in the compiler than
they would have at run time.

:     o   Try to improve the 'locality' of operations, i.e. move calculations
:         as close as possible to the point where they are used. This helps
:         most compilers to utilize registers better.

Good point. I hadn't thought of doing this. And it should often make
the reader's job easier, bringing related things closer.
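
Something like this (names made up): rather than computing a length
at the top of the function and carrying it along, compute it right
before the allocation that needs it:

	#include <stdlib.h>
	#include <string.h>

	char *save(char *name)
	{
		char *buf;
		size_t len;

		/* ... unrelated work would go here ... */

		len = strlen(name);	/* computed right where it is used */
		buf = (char *)malloc(len + 1);
		if (buf != NULL)
			strcpy(buf, name);
		return buf;
	}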

:     o   Replace int variables with unsigned where possible. This tells the
:         optimizer that the variable can never be negative, making certain
:         optimizations possible.

Unfortunately, this one is likely to make the optimizer not do certain
optimizations that the compiler writer didn't think were useful for
unsigned. Sigh. Not only that, but I have discovered more bugs dealing
with unsigned things...

:     o   Put the most frequently accessed member of a struct first, so the
:         offset is 0.

Yes. And put scalar arguments before aggregate arguments in your
functions.

:     o   Use char arrays instead of int arrays where possible.

Accessing characters on some systems causes a fair amount of code to
be generated. For example, on a 68000, accessing an unsigned character
to be used as an integer adds an instruction to each read.  Accessing
a signed character might involve two extra instructions.

:     o                                                         Adjust other
:         arrays so that the size of the array elements are a power of 2.
:         (Making the address calculation a simple shift.)

Be careful if you are making the arrays larger, you may make the
program slower. Demand paging systems and swapping systems both can
do you dirty when the program data is bigger.

:     o   Avoid bit fields, most especially signed ones.

Try: don't use them at all; they tend to be nonportable. Of course,
if you have a good compiler on a VAX...

:     o   Use realloc instead of malloc/free pairs. Use calloc instead of
:         malloc followed by zeroing each member.

Efficient memory allocation is *much* more difficult than this; what
you really need to do is to cut down on the number of calls to malloc
and free. Of course, that usually means writing some kind of
allocation stuff yourself, never an easy task.

Realloc has to copy data to preserve the old contents, so when you
don't need those contents it is less efficient than just doing
another malloc & free.

Also, calloc is nearly useless for clearing structures that contain
pointers.  It is just not portable to assume that a bit pattern of all
zeros is equivalent to a null pointer.
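
So for a structure containing pointers, I spell the initialization
out instead of relying on calloc; a sketch, with invented names:

	#include <stdlib.h>

	struct node {
		long count;
		struct node *next;
	};

	struct node *new_node()
	{
		struct node *p = (struct node *)malloc(sizeof(struct node));

		if (p != NULL) {
			p->count = 0;
			p->next = NULL;	/* all-zero bits need not be a
					 * valid null pointer */
		}
		return p;
	}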

---

From: gwyn at smoke.ARPA (Doug Gwyn)
Message-ID: <8630 at smoke.ARPA>

: >    o  Avoid things like,
: >               a %= 4;
: >       Replace with,
: >               a &= 3;
:
: You're assuming a is nonnegative.  Any decent compiler will
: perform such instruction replacements for you.  Write whatever
: is clearest in context.  To get a remainder, % is clearer.

You are assuming that the compiler can figure out that a is
nonnegative. Unless declared so, it is unlikely that a compiler will
figure this out. However, the human can. And, again, clarity is often
merely in the eye of the beholder. *I* don't have any problem
interpreting the & as a remainder operation.

: In the case that the "4" might change, parameterizing the
: first case will give correct code after it changes, while
: the second will break unless another power of two is used
: for the replacement for "4".  Thus the first is SAFER.

Unless, of course, the programmer remembered to COMMENT.

: >    o  Combine printfs,
: >               printf("aksdf aksjdhf kahdf jhdsfhj\n");
: >               printf(" asdkljfhkajshdf djfh kjahsdfkja h\n");
: >       Convert to,
: >               printf("aksdf aksjdhf kahdf jhdsfhj\n\
: >                asdkljfhkajshdf djfh kjahsdfkja h\n");
:
: Thereby introducing a bug (exercise for the student).

The spaces: the indentation at the start of the continuation line
becomes part of the string. Two black spots for K&R string
continuation.
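
In miniature, the safe way (the continuation must start in column
one):

	#include <stdio.h>

	int main()
	{
		/* Had "second" been indented, the blanks would be
		 * inside the string. */
		printf("first line\n\
second line\n");
		return 0;
	}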

: The difference in time between these is negligible,
: but if you're really tweaking for efficiency puts()
: might have been better, depending on the implementation.

But I'd think of this as a *space* optimization, not a speed
optimization. And I know of several programs where just this one
optimization could save 5-10% in the program size.

: ...
: A related point is to declare local variable in blocks
: rather than all at the beginning of the function body.

I've found that this can actually make the code harder to read, by
making it difficult to find the declaration of a variable. My own
practice is to declare things in just four places: in an include file,
at the top of a source file, at the top of a function, and within a
block, but only if the whole of the block will fit in a 20 line
window.

: >    o  Put the most frequently accessed member of a struct first, so the
: >       offset is 0.
:
: Not all architectures can access the 0 offset faster than
: the others.  I knew of one that was actually slower.

But, unless one really cares about this oddball situation, one should
do it anyway.

: >    o  Avoid struct function parameters and return values.
:
: This is a matter of interface design.  Small structs such
: as one might use to represent a complex number are not a
: problem; large structs are more quickly (though less conveniently)
: passed by reference, so long as not too many references to
: the members are made inside the function.  Forcing the
: caller to allocate the structure may not be convenient.

I avoid structure passing and returning. There are still enough
compilers out there that don't handle them, or that handle them
wrong, that portability considerations keep me from using them.

---

From: bright at Data-IO.COM (Walter Bright)
Message-ID: <1704 at dataio.Data-IO.COM>

: Micro-efficiency gets regularly denounced. Here are some counterarguments:
: o       If your commercial product is slower than the competition, you
:         won't get a chance to take advantage of the maintainability/
:         portability advantages because you'll be out of business.

Or if it is larger. This discussion has focused on making programs
faster, but making them smaller is also relevant. I do know that we
have lost customers because someone else offered a smaller program to
do the same job.

: o       The sum of all the tweaks can make the difference (on the PC)
:         between using a small memory model and the large. This gives
:         a lot of leverage to seemingly minor changes.

The sum of all tweaks can make a difference on lots of machines.
Consider the Mac 32K segments, the PDP-11 64K+64K address space, the
user forced to wait for 10 seconds instead of 4 because of the 5%
slower routine which caused the scheduler to lower his priority....

: o       I derive a lot of personal satisfaction from creating 'lean
:         and mean' code.

This is often disregarded: good programmers get that way because they
*care*, and if they care, they will insist on doing the best they
can. And that should include these "tweaks".

---

Phew... I finally got it all answered.  On to the next subject...

---
Bill
{uunet|novavax}!proxftl!twwells!bill


