Hash functions in C

Dave Jones djones at megatest.UUCP
Thu Nov 8 14:47:41 AEST 1990


I just now tested some hash-functions for an application which is
probably more typical, namely a compiler. The simplest function is
still a big winner on all counts.

I ran tests on a standard Pascal header-file containing 6290 identifiers,
including many duplicates (such as "procedure"). I got 370 collisions
using my hash-function, for a rate of only 0.06 collisions per lookup!
That was with a real big table. Lots of empty slots. When I reduced the
table size so that only about half the slots were empty, I got a little
under one collision per lookup, which is theoretical (1/2 + 1/2*1/2 + ...).
Under these circs, anything around one should generally be considered pretty
good. Surprisingly the CRC-16 hash-function was not too bad in this test,
giving a collision-rate of 1.24. Why it did so very poorly on the
dictionary, I don't know. And of course the relative slowness of
calculating it is also important, so I don't think it has anything at
all going for it as far as hashing char-strings is concerned. I'm wondering
why it was even suggested.



More information about the Comp.lang.c mailing list