Unix Dictionaries

Beth Katz beth at brillig
Thu Feb 5 05:51:35 AEST 1987


I am not a Unix expert, but I have looked at 'spell' and how it
accepts garbage.  I haven't read the papers mentioned previously.

One reason why 'spell' accepts so much garbage is that it uses
a hashed list of acceptable words.  On many systems I have seen,
this list is 50000 bytes.  Given all the garbage that can be
generated by random combinations of letters, a table that small
fills up quickly, and some of the garbage is bound to hash to an
entry that is already taken and so be accepted.  'spell' was
designed to catch misspelled words rather than to filter out
absolute garbage.  The stop lists catch words that could be
created through transformations but that are misspelled
nonetheless.
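
To make the hashed-list point concrete, here is a minimal Python
sketch.  It is not the actual 'spell' code, which I have not
reproduced.  The hash function (MD5), the bit-array layout, the
made-up vocabulary, and the names slot, build_table, accepts, and
false_accept_rate are all assumptions for illustration; only the
50000-byte size comes from the figure above.

```python
import hashlib
import random
import string

TABLE_BYTES = 50000  # table size quoted in the post


def slot(word, bits=TABLE_BYTES * 8):
    """Map a word to one bit position (MD5 is an arbitrary, assumed choice)."""
    digest = hashlib.md5(word.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % bits


def build_table(words, bits=TABLE_BYTES * 8):
    """Record only each word's hash slot; the words themselves are not kept."""
    table = bytearray((bits + 7) // 8)
    for word in words:
        i = slot(word, bits)
        table[i // 8] |= 1 << (i % 8)
    return table


def accepts(table, word, bits=TABLE_BYTES * 8):
    """True if the word's slot is set.  The word is never compared directly."""
    i = slot(word, bits)
    return bool(table[i // 8] & (1 << (i % 8)))


def false_accept_rate(table, trials=20000, seed=0):
    """Fraction of random 8-letter strings that the default-size table accepts."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        garbage = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
        if accepts(table, garbage):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Stand-in vocabulary (hypothetical).  The digits guarantee that no
    # all-letter garbage string is a genuine member of the list.
    vocabulary = ["w%d" % i for i in range(25000)]
    table = build_table(vocabulary)
    print("every listed word accepted:",
          all(accepts(table, w) for w in vocabulary))
    print("fraction of random garbage accepted:", false_accept_rate(table))
```

With a reasonably uniform hash, the fraction of garbage accepted
should sit near the fraction of bits that are set, so the fuller
the table gets, the more garbage slips through.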

You can do some extra transformations to clean up the lists if
you've fed 'spell' real garbage, but in most situations it
doesn't matter all that much.
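
As for the stop lists mentioned earlier, the idea can be sketched
in a few lines of Python.  This is only an illustration under
assumptions, not the real 'spell' rules: the two-word list, the
single "y becomes ier" transformation, and the names derivable and
check are all made up here.

```python
WORDS = {"thy", "happy"}   # hypothetical list of acceptable words
STOP = {"thier"}           # hypothetical stop list of known bad derivations


def derivable(word):
    """True if the word is listed or can be built from a listed word.

    Only one transformation is modelled: a final "y" replaced by "ier".
    """
    if word in WORDS:
        return True
    if word.endswith("ier") and word[:-3] + "y" in WORDS:
        return True
    return False


def check(word):
    """Accept a word only if it is derivable and not on the stop list."""
    return word not in STOP and derivable(word)


if __name__ == "__main__":
    for w in ("happier", "thier", "zzzier"):
        print(w, "derivable:", derivable(w), "accepted:", check(w))
```

Here "thier" can be built from "thy" by the same rule that turns
"happy" into "happier", so only the stop list keeps it out.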

				Beth Katz



More information about the Comp.unix.questions mailing list