Help w/2 things ...

Walter Murray walter at hpclwjm.HP.COM
Tue Aug 15 03:29:17 AEST 1989


Randy Tidd writes:

> The first is a certain algorithm that i'm looking for, it's called
> fuzzy string-matching.  Apparently you give it two strings and a
> percentage, and through various and sundry means it determines if
> the two strings are that percentage alike.

> Does anyone know what i'm talking about, and does anyone have any
> code like this?

The Levenstein metric could probably be used quite nicely to provide
what you want.  I once worked on a name lookup system for a hospital.
The object was to provide a patient's chart number given a
name which was likely to be misspelled.  Given a name, we extracted
from the data base all names with the same Soundex code, then ranked
those names according to how similar they were to the requested name.

Levenstein's algorithm is rather simple.  It tells you how many
characters have to be deleted, added, or changed to transform one
character sequence to another.  You can assign different weighting
factors to the probabilities of a character being deleted, added,
or changed.

I can't provide code, unless you will take COBOL.

I can provide bibliographic references, but they would be rather dated.

I suspect someone else will be able to provide the exact code you want.
If not, contact me and I'll see what I can dig up.  I wrote a paper on
name searching for an Information Theory course I took a few years ago.

Walter Murray
walter at hpda.HP.COM
(408) 447-6129
--------------



More information about the Comp.lang.c mailing list