hash function
Phong Vo[drew]
kpv at ulysses.att.com
Mon Jun 24 05:03:19 AEST 1991
Somewhere in this thread, Chris Torek proposed as a hash function:
> >> hash = 0;
> >> for (each character)
> >> hash = (hash * 33) + (the character);
> >> hash_index = hash & ((some power of two) - 1);
>
Chris then asked,
> ... What *does* that 33 do?
>
> Maybe there are some theory people out there who can explain it.
> Probably some staring at Knuth vol. 3 would help.
A good property for a hash function is if you keep feeding it
a constant character, it performs as if it's a random number generator.
The hash function f(h,c) = 33*h + c is derived from a class of random
number generators called linear congruential. At the start of Knuth Vol 2,
there is a good discussion of these generators. Let's apply some of the
theorems in that section and see what we get. Keep in mind that we are working
with random numbers mod 2^x where x is usually 16 or 32. Assuming that c is some
fixed odd value, Theorem A guarantees that 33 is good because (33-1)%4 == 0.
In the same reasoning, 31 is not good since (31-1)%4 == 2 != 0.
Here good means that if you keep feeding c, you'll cycle thru the entire
set of values [0,2^x). On the other hand, 33 is not a primitive element for 2^x
when x > 3 (Theorem C), so it isn't a good multiplicator if you keep feeding in 0's.
For example, 37 is a better value from that point of view. To summarize,
a good multiplicator is any number with 101 as their low order 3 bits.
At this point, we depart from theory. We typically hash byte strings and we want
the bits to mix in the 16-bit or 32-bit registers relatively fast. This suggests
that some value larger than 2^8 (but not too large) may do better as a multiplicator.
After a little experimentation, here's a pretty decent hash function:
hash = 0;
for(each byte c)
hash = (hash<<8) + (hash<<2) + hash + c;
This corresponds to the multiplicator 261 whose bit pattern is 100000101.
Try it, you'll like it.
More information about the Comp.lang.c
mailing list