Efficient coding considered harmful?
Patricia Shanahan
ps at celerity.UUCP
Fri Nov 4 11:03:25 AEST 1988
In article <8819 at smoke.BRL.MIL> gwyn at brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <137 at twwells.uucp> bill at twwells.UUCP (T. William Wells) writes:
>>#define HASHSIZE 4 /* A power of two, or else! */
>
>To continue my complaint against using bitwise operators where arithmetic
>is called for, let me point out that most reasonable hashing schemes work
>best of the hash table size is NOT a power of two. Now, if you had used
>arithmetic instead of bit-diddling throughout your hashing code, most
>likely a programmer could change HASHSIZE to some useful number like 113
>and that would be all it would take to improve the hashing scheme. On
>the other hand, your code would force him to either use 128 (and obtain
>suboptimal performance), or else go through the code and try to put it
>back in the shape it should have had from the outset.
>...
Although I agree with most of what you are saying, it is not safe to assume
that hashing schemes necessarily work better for HASHSIZE 113 than for 128.
I did some experiments to select a hashing function for use in an assembler.
I extracted all identifiers from several assembly language programs and
measured the symbol table look up time and other statistics for each of the
hash schemes that I was considering.
Hashing with a prime size reduced the number of collisions. Hashing with a
power of two size reduced the total time to do the symbol table management.
The reason was that the extra time to do a prime division compared to a bit
masking outweighed the small savings due to reduced search length. This is
an empirical result, applicable only to the machine for which I was coding,
and the type of mix of input strings that resulted from extracting the
identifiers.
I agree that micro-efficiency should not be considered during initial coding
because of this type of experience. To do a good job of performance
implementation for a particular machine requires a lot more work than
appears on the surface. Many trade-offs require careful analysis or
measurement for the actual machine. It is usually not worth doing for most
functions.
ps
(Patricia Shanahan)
uucp : ucsd!celerity!ps
arpa : ucsd!celerity!ps at nosc
phone: (619) 271-9940
More information about the Comp.lang.c
mailing list