What Data Structure?

der Mouse mouse at thunder.mcrcim.mcgill.edu
Fri Jun 28 23:41:49 AEST 1991


In article <1991Jun24.110300.3740 at ecsvax.uncecs.edu>, burgin at ecsvax.uncecs.edu (Robert Burgin) writes:

> I am writing a program that includes a dictionary look-up of 24,500
> English words.  Each entry in the dictionary will contain the word
> itself and a string representing valid syntactic tags for that word
> (noun, verb, etc.).

> Would a hash table be best for the implementation?  Or a trie?  Why?

Yes.  :-)

Seriously though, there is no single data structure that can definitely
be said to be best, given no more information than you have given.

For example, how dear is memory?  If you can afford to dedicate several
megabytes to the dictionary data structures, want a reasonably fast
lookup, no inserts or deletes at run-time, and can afford to spend some
CPU time preprocessing the dictionary, there's one reasonably simple
answer.  If you're tight on memory, need to insert and delete, can
afford moderately slow lookups, and can't afford to do much
preprocessing, something else would be better.

> I worry about collisions in the hash table and wonder what the ideal
> number of buckets would be for this case.

That depends on the hash function.  For example, if you can afford the
preprocessing time to compute a perfect hash function, more than 24500
buckets would be a waste, and collisions can be ignored.

> Not to mention having to scare up a proper hash function for strings
> of variable length.

Experiment with various functions, like the (h*33)+c function posted
here recently, or a simple sum-of-characters, or a CRC...find out what
works well for your case.

					der Mouse

			old: mcgill-vision!mouse
			new: mouse at larry.mcrcim.mcgill.edu



More information about the Comp.lang.c mailing list