Efficient STRing CoMPares?

Tue Mar 19 13:48:02 AEST 1991

In article <15486 at smoke.brl.mil>, gwyn at smoke.brl.mil (Doug Gwyn) writes:
> In article <1991Mar15.142821 at mars.mpr.ca> stone at mars.mpr.ca (Darren Stone) writes:
> >Does anyone know if there's a better (!?) way to compare
> >strings than with strcmp()?
> >I'm looking for any speed advantage I can get!
> 
> #define	StrEq( a, b )	(*(a) == *(b) && strcmp( a, b ) == 0)	/* UNSAFE */

It is doubtful that there is one *general* way of optimizing strcmp() for
all machines. Most strcmp() implementations are written in machine language
and have already gone through some serious optimization analysis. Also,
since comparing NULL-terminated character strings is such a common operation,
many (non-RISC) machines today have some direct support for comparing strings
in their instruction sets. This means that doing a strcmp() can be done
in only few instructions on many machines, rather than a character-by-character
comparison loop.

Although there is probably no *general* way of optimizing strcmp(),
you might be able to do optimizations specific to your application by
taking advantage of the type of data and expected frequency of data used
in your application. The above macro, StrEq(), MIGHT be faster if most
of the strings being compared tend to differ in their first character. 
When two strings differ in their initial character, StrEQ(), will save
having to do a strcmp() call. Note, however, when a pair of strings
do NOT differ in their first character, a strcmp() must be done along with
the added overhead, (albeit small), of the extra character comparison
and the logical AND operation.

STrEq() might actually be SLOWER than a simple strcmp(),
depending on your application's data and your machine, (the overhead 
of doing the extra operations and the execution time for strcmp()).
IMHO, I doubt that StrEq() will acheive any significant optimization for
MOST cases - it may be either slower or be faster by an amount that will
be too small to measure.

A more fruitful approach might be to examine your data more closely to
see if some of the strcmp()'s could be avoided. For example, if many
of the string's being compared are actually part a restricted set of
values rather than arbitrary strings, you might be able to represent them
internally as enumerated values, (where integer comparisons could be done),
rather than as strings. Also, for strings that have fixed lengths or
known lengths, you could do a memcmp() rather strcmp() - although any
speedup here would certainly be machine-dependent.

Jim Barton