hash function

Bruce.Evans evans at syd.dit.CSIRO.AU
Wed Jun 26 21:27:24 AEST 1991


In article <14583 at dog.ee.lbl.gov> torek at elf.ee.lbl.gov (Chris Torek) writes:
>>>	hash = 0;
>>>	for (each character)
>>>		hash = (hash * 33) + (the character);
>>>	hash_index = hash & ((some power of two) - 1);
>
>In article <14491.28619071 at stjhmc.fidonet.org>
>Dave.Harris at f14.n15.z1.fidonet.org (Dave Harris) writes:
>>Is the *33 use to keep short sequences has key more random or does it
>>really help in the distribution of large string too?
>
>The third question is the hardest.  What *does* that 33 do?  I have no
>idea.  All I know is that I have experimented with `easy' functions

I don't think there are any theoretical mysteries behind the 33. 33 = 32 + 1.
The 32 is to avoid mixing bits with previous hash bits. 256 would avoid all
mixing (with 8-bit chars) but it would throw away too many bits. The 1 is to
avoid throwing away bits. Rotating the bits instead of shifting left would
probably be a better way to avoid this.

The following hash function worked well for me in an assembler. It implements
the ideas that hashing all the chars in the string is unnecessary (and maybe
even bad if too many bits get shifted out) and that the middle char is more
random than leading or trailing chars.

In the program this was extracted from, the string length was already known.
Avoiding looking at all the chars in the string is not such a good idea when
strlen has to do it anyway. Also, this probably does best with a lot of
short strings.
---
extern unsigned strlen();	/* #include <string.h> */

#define SPTSIZE	1024

extern struct sym_s *spt[SPTSIZE];

#define hconv(ch) ((unsigned char) (ch) - 0x41)	/* better form for hashing */

struct sym_s **gethashptr(str)
register char *str;
{
    register unsigned hashval;
    register unsigned length;

    /* Hash function is a weighted xor of 1 to 4 chars in the string.
     * This seems to work better than looking at all the chars.
     * It is important that the function be fast.
     * The multiplication by MULTIPLIER should compile as a shift.
     */

#define MULTIPLIER (SPTSIZE / (1 << USEFUL_BITS_IN_ASCII))
#define USEFUL_BITS_IN_ASCII 6

    length = strlen(str);
    if (length <= 3)
    {
	if (length <= 2)
	    hashval  = hconv(str[-1]) * MULTIPLIER;
	else
	    hashval  = hconv(str[-2]) * MULTIPLIER,
	    hashval ^= hconv(str[-1]);
    }
    else
	hashval  = hconv(str[-(length / 2)]) * MULTIPLIER,
	hashval ^= hconv(str[-2]) << 2,
	hashval ^= hconv(str[-1]);
    return spt + (hashval ^ (hconv(str[0]) << 1)) % SPTSIZE;
}
---
-- 
Bruce Evans		evans at syd.dit.csiro.au



More information about the Comp.lang.c mailing list