hash algorithm

Ray Dunn ray@micomvax.UUCP
Thu Sep 1 03:22:13 AEST 1988


In article <2392@rtech.rtech.com> daveb@llama.UUCP (Dave Brower) writes:
 > ...
 > If the collision chains generated by the naive function are short, it
 > may be cheaper to search the chain than use a smarter hash.
 > ...

Definitely!

 > Moral:  Try several ways and *measure*.  The results may be surprising.

Better Moral: Whenever possible, analyze the properties and distributions of
the data you want to handle and *then* design the hashing algorithm.  At
that point it is reasonable to measure and tinker.
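
As a concrete sketch of "analyze, then measure" (illustrative only: the
hash function and table size below are placeholders, not recommendations),
you can pipe a representative token stream through a candidate hash and
look at how the buckets fill up:

#include <stdio.h>

#define TABSIZE 211                     /* placeholder: any smallish prime */

/* placeholder hash - substitute each candidate you want to compare */
static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % TABSIZE;
}

int main(void)
{
    static unsigned count[TABSIZE];     /* static, so zero-initialized */
    char tok[256];
    unsigned i;

    /* read whitespace-separated tokens from stdin and bucket them */
    while (scanf("%255s", tok) == 1)
        count[hash(tok)]++;

    /* the shape of this listing *is* your chain-length distribution */
    for (i = 0; i < TABSIZE; i++)
        if (count[i])
            printf("bucket %3u: %u\n", i, count[i]);
    return 0;
}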

This discussion is getting very like those on sorting - please realize there
is no *general* best method; there are only best methods in specific
application environments.

The goal should not normally be to minimize the search time for an
individual item, but to minimize the total time spent searching over a given
run, or series of runs, of the application - a fact often ignored in
discussions of minimizing chain lengths etc.

If the tokens "(", ")", ";" and "," each occur (i.e. need to be looked up)
approximately n times more frequently than the tokens "while", "for",
"switch" and "do", then wouldn't it be reasonable to design your algorithm
so that the former items are located approximately n times faster than the
latter?

This can be achieved in a variety of ways, including, let us not forget, ad
hoc tests for specific high frequency items prior to applying a generalized
routine.
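
One possible shape for such an ad hoc fast path in C (a sketch only: the
token codes and lookup_hashed() are hypothetical stand-ins for whatever
your lexer already uses) is to dispatch the four very common one-character
tokens with a switch and pay for the general hash lookup only on
everything else:

#include <stdio.h>

enum token { T_LPAREN, T_RPAREN, T_SEMI, T_COMMA, T_OTHER };

/* stand-in for your generalized hashed lookup */
static enum token lookup_hashed(const char *tok)
{
    (void)tok;
    return T_OTHER;
}

static enum token classify(const char *tok)
{
    if (tok[1] == '\0') {               /* one-character token? */
        switch (tok[0]) {
        case '(': return T_LPAREN;      /* high-frequency items cost */
        case ')': return T_RPAREN;      /* one switch, no hash at all */
        case ';': return T_SEMI;
        case ',': return T_COMMA;
        }
    }
    return lookup_hashed(tok);          /* the general routine */
}

int main(void)
{
    printf("%d\n", classify(","));      /* hits the fast path */
    printf("%d\n", classify("while"));  /* falls through to the hash */
    return 0;
}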

In the case of a fixed data-base, for example the reserved words in a
compiler, it does not take much effort to analyze a filesystem or three for
token frequencies and then structure and *order* the dictionary accordingly.
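
For instance (the keyword list and its ordering below are illustrative
guesses, not measured frequencies), a linear scan of a frequency-ordered
table makes the average probe count track the token distribution rather
than the alphabet:

#include <stdio.h>
#include <string.h>

/* reserved words ordered most-frequent first - substitute an order
 * measured from your own source base */
static const char *keywords[] = {
    "if", "int", "return", "for", "char", "while",
    "else", "do", "switch", "goto",
};
#define NKEYS (sizeof keywords / sizeof keywords[0])

static int keyword_index(const char *tok)
{
    size_t i;
    for (i = 0; i < NKEYS; i++)
        if (strcmp(tok, keywords[i]) == 0)
            return (int)i;
    return -1;                          /* not a reserved word */
}

int main(void)
{
    printf("%d\n", keyword_index("if"));     /* found on the first probe */
    printf("%d\n", keyword_index("switch")); /* more probes, but rare */
    return 0;
}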

-- 
Ray Dunn.                      |   UUCP: ..!philabs!micomvax!ray
Philips Electronics Ltd.       |   TEL : (514) 744-8200   Ext: 2347
600 Dr Frederik Philips Blvd   |   FAX : (514) 744-6455
St Laurent. Quebec.  H4M 2S9   |   TLX : 05-824090


