hash algorithm

rlb rlb at polari.UUCP
Sat Aug 20 04:20:46 AEST 1988


In article <723 at goofy.megatest.UUCP>, djones at megatest.UUCP (Dave Jones) writes:
> From article <2550078 at hpisod2.HP.COM>, by decot at hpisod2.HP.COM (Dave Decot):
> >> could anyone recommend a hashing algorithm to store a list of 10
> >> digit phone numbers (area code plus 7 digits).
> >        ...
> > Treat the ten digits as an integer, n, and compute the index
> > as (n % HASH_SIZE), where HASH_SIZE is the size of your hash table.
> > 
At this point, no one has pointed out that modulo is a better hash
function if the table size is a prime number.
> 
> ...
> 
> I have devised a method for hashing which has not been published previously.
> It is very very fast. Here is the WORLD PREMIERE!!
> 
> First let's settle the business of how to hash integers into the
> range 0..M-1.  I suggest that you use hash(I) = (I*32821) % M
> 
> My hashing technique uses a table-size M which is always a power of two.
> This makes the modulo-function (%) very fast on a two's compliment machine,

Well, yes, but you're not counting the cost of multiplying by 32821 elsewhere,
so, in reality, your way seems to be a wee bit slower than (key % prime).

>     ...   long discussion/demonstration of rehashing technique ...
>
To summarize:  You've created a hashing scheme.  Forming the hash itself
isn't faster than the simple (key % prime) algorithm.  Re-hashing and
deletions require more code than the simple "put it in the next empty slot"
method.  It requires more storage than the simple algorithm (I'm assuming
it "breaks" if you let the table get more than half full, I didn't actually
figure it out).

The principal advantage seems to be that access time is flatter (not, flat,
but flatter, and probably not much flatter) than the simple "put it in the
next empty slot" method.  If you're not going to let the table get more than
half full, then (assuming a fairly successful hash) there's only about a 50%
chance that you'll have to rehash.  So, even with a bit of clumping, doesn't
it seem likely that the simplest rehashing scheme compares quite favorably
with the one you've created?
Agree?  Disagree?



More information about the Comp.lang.c mailing list