hash function

Phong Vo[drew] kpv at ulysses.att.com
Mon Jun 24 05:03:19 AEST 1991


Somewhere in this thread, Chris Torek proposed as a hash function:
> >>	hash = 0;
> >>	for (each character)
> >>		hash = (hash * 33) + (the character);
> >>	hash_index = hash & ((some power of two) - 1);
> 
Chris then asked,
> ...  What *does* that 33 do?
> 
> Maybe there are some theory people out there who can explain it.
> Probably some staring at Knuth vol. 3 would help.

A good property for a hash function is if you keep feeding it
a constant character, it performs as if it's a random number generator.
The hash function f(h,c) = 33*h + c is derived from a class of random
number generators called linear congruential. At the start of Knuth Vol 2,
there is a good discussion of these generators. Let's apply some of the
theorems in that section and see what we get. Keep in mind that we are working
with random numbers mod 2^x where x is usually 16 or 32. Assuming that c is some
fixed odd value, Theorem A guarantees that 33 is good because (33-1)%4 == 0.
In the same reasoning, 31 is not good since (31-1)%4 == 2 != 0.
Here good means that if you keep feeding c, you'll cycle thru the entire
set of values [0,2^x). On the other hand, 33 is not a primitive element for 2^x
when x > 3 (Theorem C), so it isn't a good multiplicator if you keep feeding in 0's.
For example, 37 is a better value from that point of view. To summarize,
a good multiplicator is any number with 101 as their low order 3 bits.

At this point, we depart from theory. We typically hash byte strings and we want
the bits to mix in the 16-bit or 32-bit registers relatively fast. This suggests
that some value larger than 2^8 (but not too large) may do better as a multiplicator.
After a little experimentation, here's a pretty decent hash function:
	hash = 0;
	for(each byte c)
		hash = (hash<<8) + (hash<<2) + hash + c;
This corresponds to the multiplicator 261 whose bit pattern is 100000101.
Try it, you'll like it.



More information about the Comp.lang.c mailing list