Array indexing vs. pointers...

Wed Oct 5 14:55:33 AEST 1988

In article <1700 at dataio.Data-IO.COM> bright at dataio.Data-IO.COM (Walter Bright) writes:
>Here's some of the things I do:

I have to take exception to most of these micro-efficiency tweaks.
Most of them have no effect on the code generated by even a moderately
good compiler, and nearly all of them make the code more obscure.
It is much more important for the source code to be patently correct
than for every surplus nanosecond to be squeezed out.  Fast, buggy
code does nobody a service, and bugs are hard to spot when the source
code is obscure.  Besides, most efficiency losses are from poor
choice of data structures or algorithms, not from fatty source code.

>    o	Replace stuff like,
>		if (a)
>			x = b;
>		else
>			x = c;
>	with,
>		x = a ? b : c;

I don't know of any compiler that wouldn't produce the same code
for these two cases.  Write whichever is more readable in context.

>    o	Replace stuff like,
>		x = (a == 0) ? 1 : 0;
>	with,
>		x = (a == 0);

Only if x is conceptually Boolean in the first place.

>    o	Use the ^ operator because many times,
>		a = !a;
>	should really be,
>		a ^= 1;

If a is conceptually Boolean, the first is clearer.

>    o	Avoid things like,
>		a %= 4;
>	Replace with,
>		a &= 3;

You're assuming a is nonnegative.  Any decent compiler will
perform such instruction replacements for you.  Write whatever
is clearest in context.  To get a remainder, % is clearer.

If the "4" might change, parameterizing the first form will
still give correct code after the change, while the second
will break unless the replacement for "4" is another power
of two.  Thus the first is SAFER.
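
A minimal sketch of both points, assuming C99 semantics
(integer division truncates toward zero) and a two's
complement machine.  The function names are made up for
illustration:

```c
/* Remainder written the obvious way; stays correct for any n != 0. */
int rem_percent(int a, int n)
{
    return a % n;
}

/* The "optimized" form; it only matches a % n when a >= 0 and
   n is a power of two. */
int rem_mask(int a, int n)
{
    return a & (n - 1);
}
```

For a = 7, n = 4 both return 3.  For a = -5, n = 4 the first
returns -1 and the second returns 3 under the assumptions
above.  For a = 7, n = 6 the first returns 1 and the second
returns 5.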

>    o	In fact, avoid !, / and % like the plague.

It's pretty hard to divide without /.  You should be sure
that the divisor cannot be zero before flow reaches this
point in the algorithm.  (This does NOT mean to "stick in
a test for division by 0".)

Because % is not a true modulo operator, its use is
limited; however, it does have its uses.
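
When a true modulo is wanted, one common approach is to
correct the sign of the remainder.  A sketch, assuming the
modulus n is positive; true_mod is a hypothetical helper,
not a library function:

```c
/* Hypothetical helper: mathematical modulo for n > 0.
   The result is always in the range 0 .. n-1. */
int true_mod(int a, int n)
{
    int r = a % n;      /* may be negative when a is negative */

    if (r < 0)
        r += n;
    return r;
}
```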

It's hard to imagine avoiding ! in readable code.

>    o	I gag when I see things like,
>		a = strlen("asdf") + 1;
>	instead of,
>		a = sizeof("asdf");

The general principle is to compute at compile time whatever
CAN be computed at compile time rather than at run time.
But sometimes it is clearer to initialize things at run time
(not in this example, though).
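
To make the distinction concrete, here is a small sketch
(the function names are illustrative only).  Note that
sizeof only helps when the operand is an array or a string
literal; applied to a pointer it yields the pointer's size.

```c
#include <stddef.h>
#include <string.h>

/* Evaluated by the compiler: 4 characters plus the terminating '\0'. */
size_t size_at_compile_time(void)
{
    return sizeof("asdf");
}

/* Same value, but nominally computed by a call at run time. */
size_t size_at_run_time(void)
{
    return strlen("asdf") + 1;
}

/* With only a pointer in hand, sizeof s would give the size of the
   pointer, so the length has to be computed at run time. */
size_t size_via_pointer(const char *s)
{
    return strlen(s) + 1;
}
```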

>    o	Combine printfs,
>		printf("aksdf aksjdhf kahdf jhdsfhj\n");
>		printf(" asdkljfhkajshdf djfh kjahsdfkja h\n");
>	Convert to,
>		printf("aksdf aksjdhf kahdf jhdsfhj\n\
>		 asdkljfhkajshdf djfh kjahsdfkja h\n");

Thereby introducing a bug (exercise for the student).
The difference in time between these is negligible,
but if you're really tweaking for efficiency puts()
might have been better, depending on the implementation.
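
One plausible reading of the exercise, offered as an
assumption rather than as the intended answer: after a
backslash-newline, the leading white space of the
continuation line becomes part of the string.  A sketch
with made-up strings, comparing this with adjacent string
literals, which an ANSI C compiler concatenates:

```c
#include <string.h>

/* Backslash-newline splices the two source lines together, so the
   indentation of the continuation line ends up inside the string. */
static const char spliced[] = "first line\n\
    second line\n";

/* Adjacent string literals are concatenated by an ANSI C compiler,
   with nothing inserted between them. */
static const char adjacent[] = "first line\n"
                               "second line\n";

/* Number of unintended characters in the spliced version. */
int extra_chars(void)
{
    return (int)strlen(spliced) - (int)strlen(adjacent);
}
```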

>    o	Try to improve the 'locality' of operations, i.e. move calculations
>	as close as possible to the point where they are used. This helps
>	most compilers to utilize registers better.

Just so long as clarity of the algorithm is maintained.

A related point is to declare local variables in blocks
rather than all at the beginning of the function body.
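
For instance, in this hypothetical function the temporary
sq is declared in the innermost block that uses it, so
both the reader and the compiler can see how short its
lifetime is:

```c
/* Hypothetical example: sum of squares of the positive elements. */
long sum_positive_squares(const int *v, int n)
{
    long sum = 0;
    int i;

    for (i = 0; i < n; i++) {
        if (v[i] > 0) {
            long sq = (long)v[i] * v[i];   /* lives only in this block */

            sum += sq;
        }
    }
    return sum;
}
```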

>    o	Replace int variables with unsigned where possible. This tells the
>	optimizer that the variable can never be negative, making certain
>	optimizations possible.

It can also produce slower code; this depends on the
implementation.

>    o	Put the most frequently accessed member of a struct first, so the
>	offset is 0.

Not all architectures can access the 0 offset faster than
the others.  I knew of one that was actually slower.

>    o	Use char arrays instead of int arrays where possible.

Char access on a word-oriented machine is likely to be
noticeably slower.

>    o	Avoid struct function parameters and return values.

This is a matter of interface design.  Small structs such
as one might use to represent a complex number are not a
problem; large structs are more quickly (though less conveniently)
passed by reference, so long as not too many references to
the members are made inside the function.  Forcing the
caller to allocate the structure may not be convenient.
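
A sketch of the two cases.  The type and function names
are invented for illustration, and the array size is
arbitrary:

```c
/* Small struct: passing and returning by value is cheap and convenient. */
struct complex_num {
    double re, im;
};

struct complex_num cadd(struct complex_num a, struct complex_num b)
{
    struct complex_num sum;

    sum.re = a.re + b.re;
    sum.im = a.im + b.im;
    return sum;
}

/* Large struct: pass a pointer so the samples are not copied per call. */
struct big_record {
    double samples[1024];
};

double first_sample(const struct big_record *r)
{
    return r->samples[0];
}
```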

>    o	Avoid bit fields, most especially signed ones.

I would say rather, use bit fields where they seem to be
the natural mechanism, as in device register definitions.
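
As an illustration, here is a sketch of a made-up status
register; the field names and widths are assumptions, not
any real device.  Keep in mind that the order and packing
of bit fields are implementation-defined.

```c
/* Hypothetical 8-bit device status register.  The order and packing of
   bit fields are implementation-defined, so a real driver must check
   the layout against the compiler and hardware documentation. */
struct status_reg {
    unsigned ready   : 1;
    unsigned error   : 1;
    unsigned channel : 3;
    unsigned         : 3;   /* unused padding bits */
};

/* Reads like the hardware manual, with no shift-and-mask arithmetic. */
int channel_is_usable(struct status_reg s, unsigned wanted)
{
    return s.ready && !s.error && s.channel == wanted;
}
```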

>    o	Replace sequences of if..else if... with a switch.

These are not equivalent.  Switch wins when there are
numerous cases for different integer values of a variable,
but not in the more general case of alternative conditions.
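
A sketch of the difference, using invented functions: the
first selects on integer values of a single variable, the
second tests unrelated conditions that a switch cannot
express.

```c
/* Distinct integer values of one variable: a natural fit for switch. */
const char *digit_name(int d)
{
    switch (d) {
    case 0:  return "zero";
    case 1:  return "one";
    case 2:  return "two";
    default: return "many";
    }
}

/* Alternative conditions on different variables: this has to stay
   an if..else if chain. */
const char *classify(double temp, int raining)
{
    if (raining)
        return "wet";
    else if (temp < 0.0)
        return "freezing";
    else if (temp > 30.0)
        return "hot";
    else
        return "mild";
}
```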

>    o	Use realloc instead of malloc/free pairs. Use calloc instead of
>	malloc followed by zeroing each member.

realloc() is not usually faster, and it makes the application
bookkeeping harder.

calloc() is almost worthless since it stores zero BYTES
into the storage area.  That may not be appropriate for
data types other than the integral types, in particular
pointers and floating-point data.
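
A portable alternative is to allocate with malloc() and
assign each member a zero of its own type.  A minimal
sketch; struct node and alloc_nodes are hypothetical:

```c
#include <stdlib.h>

struct node {
    struct node *next;
    double       weight;
};

/* Allocate n nodes and give each member a value of its own type,
   rather than relying on all-bits-zero being a null pointer or 0.0. */
struct node *alloc_nodes(size_t n)
{
    struct node *p = malloc(n * sizeof *p);
    size_t i;

    if (p == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        p[i].next = NULL;
        p[i].weight = 0.0;
    }
    return p;
}
```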

>    o	Think about replacing trivial functions with a macro.

Functions are usually better for debugging.  If a function
is a bottleneck, consider making it a macro, but be aware
that macros are not as flexible or as safe as functions.
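
One well-known safety problem is that a macro may evaluate
an argument more than once.  The sketch below uses a
counter to make that visible; all names are invented for
the example.

```c
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

int max_func(int a, int b)
{
    return a > b ? a : b;
}

static int calls;           /* counts how often next_value() runs */

int next_value(void)
{
    return ++calls;
}

/* The macro expands its first argument twice, so next_value() runs
   twice whenever it yields the larger value. */
int via_macro(void)
{
    calls = 0;
    return MAX_MACRO(next_value(), 0);
}

/* A function evaluates each argument exactly once. */
int via_func(void)
{
    calls = 0;
    return max_func(next_value(), 0);
}
```

Here via_macro() returns 2 after two calls to next_value(),
while via_func() returns 1 after a single call.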

>    o	Replace floats with doubles (to avoid the conversion).

Use the size that best fits the problem, keeping in mind
that the standard math library deals with doubles.  Float
can be faster; ANSI C does not require all floats to be
promoted to doubles in expressions.  Large arrays almost
certainly should be float[] not double[].

>    o	Try very hard to replace divides with other operations, as in:
>		x / 10
>	with:
>		x * .1

But this is less accurate, and not necessarily noticeably
faster (it depends on the f.p. hardware).

Some optimizers will do this for you.  Use division when
it is the natural, more readable way to express the
computation; many robust algorithms naturally avoid
division anyway, since they keep numbers in a range of,
say, -2 to 2, where division by something near 0 could be
problematic.
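
The loss of accuracy is easy to demonstrate.  A sketch,
assuming IEEE 754 double arithmetic (the function names
are illustrative):

```c
/* Division by 10: the result is the correctly rounded quotient. */
double by_division(double x)
{
    return x / 10.0;
}

/* Multiplication by 0.1: the constant is already rounded, so a second
   rounding error can appear in the product. */
double by_multiplication(double x)
{
    return x * 0.1;
}
```

Under that assumption the two functions return different
values for x = 3.0, differing in the last bit.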

>    o	Use functions like ldexp and frexp as much as possible.

No, use them sparingly.  Their main value is to pick apart
a floating point number into integers which are easier to
interchange among heterogeneous processors, for example in
network connections.  In normal application algorithms there
is no value in using these functions.
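
For the interchange case, a rough sketch of how such a
conversion might look.  The struct, the function names and
the 30-bit scaling are assumptions made for this example,
not an established wire format:

```c
#include <math.h>

/* Hypothetical interchange form: an integer mantissa plus a binary
   exponent.  Values needing more than 30 mantissa bits lose precision. */
struct portable_real {
    long mantissa;   /* fraction scaled by 2^30 */
    int  exponent;   /* power of two reported by frexp() */
};

struct portable_real encode_real(double x)
{
    struct portable_real p;
    double frac = frexp(x, &p.exponent);    /* 0.5 <= |frac| < 1, or 0 */

    p.mantissa = (long)ldexp(frac, 30);     /* truncates to an integer */
    return p;
}

double decode_real(struct portable_real p)
{
    return ldexp((double)p.mantissa, p.exponent - 30);
}
```

Values such as 0.75 or -6.5 survive the round trip exactly;
values with longer mantissas are truncated.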

>    o	Use temporaries to eliminate all common subexpressions.

That can actually interfere with a good optimizer.  Use
temporaries when you suspect that many optimizers will not
take care of the common subexpression for you; otherwise
if it is not in a bottleneck section of code, leave it alone.