hash function

Dan Bernstein brnstnd at kramden.acf.nyu.edu
Wed Jun 26 18:17:50 AEST 1991


In article <15047 at ulysses.att.com> kpv at ulysses.att.com (Phong Vo[drew]) writes:
> For everyone's entertainment, I did a simple (and admittedly contrived) test
> of hashing 1000000 numbers starting at 1000000 into 1000000 buckets.
> This is done by treating each number which is stored in a word as a
> string of 4 bytes.
  [ results supposedly showing 261 much better than 33 ]

Sorry, but the mod has to be a power of 2. Also, your test is remarkably
contrived: we're talking about string hash functions, and typical
strings (a) are concentrated on a few character values, not spread
evenly over the entire character set; (b) do not all have the same high
byte; (c) do not all have the same length.

Surely you noticed that a multiplier of 256 will produce perfect hashing
in this case. Does that mean you're recommending a multiplier of 256?

The profiling statistics I've saved from various compressors form a much
more realistic sample. They show 33 as slightly better than random
hashing on typical files, 37 as slightly worse. I've never tried 261 or
31, but I'll bet 261 does worse than 33 on text files.

---Dan



More information about the Comp.lang.c mailing list