hash function

Phong Vo[drew] kpv at ulysses.att.com
Wed Jun 26 07:27:14 AEST 1991


> Again, practically any good multiplier works. I think you're worrying
> about the fact that 31c + d doesn't cover any reasonable range of hash
> values if c and d are between 0 and 255. That's why, when I discovered
> the 33 hash function and started using it in my compressors, I started
> with a hash value of 5381. I think you'll find that this does just as
> well as a 261 multiplier.
> 
> ---Dan

261 is just a number that was picked out of a hat that satisfies
the conditions outlined in the theorems in Knuth. These conditions are
pretty minimal so yes there are lots and lots of numbers that would do
just as well. For compressors, if you work with binary files, it isn't
infrequent to have strings of 0's. In such a case, you'll want the multiplier
to have the nice properties of a RNG. For this reason, 261 type of numbers
may be a little better than 33 (in either case, don`t forget to start the
hash sum at some odd value!).

For everyone's entertainment, I did a simple (and admittedly contrived) test
of hashing 1000000 numbers starting at 1000000 into 1000000 buckets.
This is done by treating each number which is stored in a word as a
string of 4 bytes. With 261, 927300 of these numbers are
in their own buckets, the rest are in 36350 buckets each with 2 elements.
This is almost as good as a perfect hash function. The result for 33 is
much worse. The largest bucket has 64 elements in it and there are 4050
such buckets. The key difference is the 8th bit in 261 which causes the
mixing to go a little faster.



More information about the Comp.lang.c mailing list