Efficient coding considered harmful?

Patricia Shanahan ps at celerity.UUCP
Fri Nov 4 11:03:25 AEST 1988


In article <8819 at smoke.BRL.MIL> gwyn at brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <137 at twwells.uucp> bill at twwells.UUCP (T. William Wells) writes:
>>#define HASHSIZE 4              /* A power of two, or else! */
>
>To continue my complaint against using bitwise operators where arithmetic
>is called for, let me point out that most reasonable hashing schemes work
>best of the hash table size is NOT a power of two.  Now, if you had used
>arithmetic instead of bit-diddling throughout your hashing code, most
>likely a programmer could change HASHSIZE to some useful number like 113
>and that would be all it would take to improve the hashing scheme.  On
>the other hand, your code would force him to either use 128 (and obtain
>suboptimal performance), or else go through the code and try to put it
>back in the shape it should have had from the outset.
>...


Although I agree with most of what you are saying, it is not safe to assume
that hashing schemes necessarily work better for HASHSIZE 113 than for 128.
I did some experiments to select a hashing function for use in an assembler.
I extracted all identifiers from several assembly language programs and
measured the symbol table look up time and other statistics for each of the
hash schemes that I was considering.

Hashing with a prime size reduced the number of collisions.  Hashing with a
power of two size reduced the total time to do the symbol table management.

The reason was that the extra time to do a prime division compared to a bit
masking outweighed the small savings due to reduced search length.  This is
an empirical result, applicable only to the machine for which I was coding,
and the type of mix of input strings that resulted from extracting the
identifiers. 

I agree that micro-efficiency should not be considered during initial coding
because of this type of experience. To do a good job of performance
implementation for a particular machine requires a lot more work than
appears on the surface. Many trade-offs require careful analysis or
measurement for the actual machine. It is usually not worth doing for most
functions.

	ps
	(Patricia Shanahan)
	uucp : ucsd!celerity!ps
	arpa : ucsd!celerity!ps at nosc
	phone: (619) 271-9940



More information about the Comp.lang.c mailing list