strcmp

D. Richard Hipp drh at duke.cs.duke.edu
Thu Jun 20 23:26:21 AEST 1991


A recent thread on this newsgroup has devoted a lot of bandwidth to
discussions of how to implement a fast "strcmp".  I suppose doing
strcmp quickly is necessary on occasions (such as running benchmarks :-)),
but more often than a quick strcmp, I need a *sensible* strcmp.  That
is to say, I need a strcmp that used in conjuction with sort will put
strings in what I would call "correct" alphabetical order.  Strcmp
fails at this task for two principle reasons:  1)  It places "BBB" before
"aaa".  2) It places "x10" before "x9".  Both of these decisions are
wrong, IMHO.

I have, on various occasions, implemented my own string comparison
routines which attempt to address the above deficiencies in strcmp.
(One such implementation, strpbcmp -- string compare in PhoneBook order,
is attached.)  I am not especially pleased with any of these routines,
however.  Somehow, to my mind, they like that special sense of elegance
and grace that a truly great string comparison routine should have.
I therefore request option from the net on what others think is the
one right, true, and proper way to compare strings.  Side issues that
you may wish to comment upon are 1) what do you call this great new
string comparison function?  ("strcmp" is already taken) and 2) how
might you implement it efficiently?

Responses by email to drh at cs.duke.edu will be summarized to this newsgroup
after an appropriate time interval.


Appendix:  a candidate string comparison function.
------------------------------ cut here ----------------------------------
#include <ctype.h>
/*
** Compare strings in phonebook order
*/
int strpbcmp(a,b)
char *a,*b;
{
  register int ca, cb;
  int ai, bi, cnt = 0;
  int bias = 0;

#ifdef TRACE
  printf("Comparing \"%s\" to \"%s\" yields ",a,b);
#endif
  ca = *(a++);
  cb = *(b++);
  while( ca && cb ){
    if( bias==0 ){
      if( isupper(ca) ){ ca = tolower(ca); bias--; }
      if( isupper(cb) ){ cb = tolower(cb); bias++; }
    }else{
      if( isupper(ca) ){ ca = tolower(ca); }
      if( isupper(cb) ){ cb = tolower(cb); }
    }
    if( isdigit(ca) ){
      if( cnt-->0 ){
        if( cb!=ca ) break;
      }else{
        if( !isdigit(cb) ) break;
        for(ai=0; isdigit(a[ai]); ai++);
        for(bi=0; isdigit(b[bi]); bi++);
        if( ai<bi ){ ca=0; break; }
        if( bi<ai ){ cb=0; break; }
        if( ca!=cb ) break;
        cnt = ai;
      }
    }else if( ca!=cb ){
      break;
    }
    ca = *(a++);
    cb = *(b++);
  }
  if( ca==cb ) ca += bias;
#ifdef TRACE
  printf("%d\n",ca-cb);
#endif
  return ca-cb;
}


#ifdef TEST

#define LINESIZE 40
#define LINECNT  2000
#include <stdio.h>

void main(void){
  char x[LINECNT][LINESIZE];
  int i,j;

  for(i=0; i<LINECNT && fgets(x[i],LINESIZE,stdin); i++);
  qsort(x,i,sizeof(char[LINESIZE]),strpbcmp);
  for(j=0; j<i; j++) fputs(x[j],stdout);
}
#endif



More information about the Comp.lang.c mailing list