Storing substrings: some real C code in comp.lang.c!

David Hutchens hutch%citron.cs.clemson.edu at hubcap.clemson.edu
Thu Oct 26 01:55:49 AEST 1989


>From article <1989Oct25.091519.10718 at twwells.com>, by bill at twwells.com (T. William Wells):
> I have been beating my head against the wall, looking for speed,
> *speed*, and MORE SPEED in this program I'm working on. At only
> 800 WPM (on my 16MHz '386) it is not exactly a speed demon.
> Especially since it has to be run on large lists of words, none of
> which are smaller than 80,000 words.
>
> Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
> bill at twwells.com

First, forget the tree, and the hash.  What you need is a Finite State
Automaton.  The basic idea for this can be found in the May 1988
issue of CACM in an article on an implementation of SCRABBLE(tm).

I've implemented a version of this and I can build the FSA in under
a minute (for about 50000 words) on a sun-3 and under 5 minutes on
a 80286.  If possible, you should build the FSA for the words in
a separate processing step since it does take a bit of memory.
You'll need about a Meg.

The FSA, by its very nature allows you to get at prefix strings
very easily (that is quite useful in SCRABBLE).

	David Hutchens
	hutch at hubcap.clemson.edu

PS:  Since you are doing something confidential, I assume it is
for *business*.  I'll be glad to discuss selling you the right
to incorporate my program.



More information about the Comp.lang.c mailing list