Efficient STRing CoMPares?

Lars Wirzenius wirzenius at cc.helsinki.fi
Sun Mar 17 07:11:54 AEST 1991


In article <1991Mar15.142821 at mars.mpr.ca>, stone at mars.mpr.ca (Darren Stone) writes:
> Does anyone know if there's a better (!?) way to compare
> strings than with strcmp()?
> I'm looking for any speed advantage I can get!

For non-portability, the fastest way would probably be to write inline
assembly code using the machine's string compare instruction.  Of
course, this instruction doesn't exist in all machines, but if it does,
it's probably quite fast. 

Another way (hopefully somewhat more portable) to go is to be clever,
and to avoid string comparisons by modifying the algorithm so they
aren't needed (or are needed less frequently).  

After that you may be able to improve on the comparison method. One way
is to avoid calling strcmp if the first characters differ:

	#define STRCMP(s,t) (*(s) == *(t) ? strcmp((s), (t)) : *(s) - *(t))

(or something along those lines, this is untested). If your strings are
mostly going to differ in the first few bytes, you could pre-compute a
special value based on those bytes, and compare it first:

	#define BITS_IN_CHAR 8
	
	struct string {
		char *str;
		long value;
	};
	
	void compute_value(struct string *s) {
		int i;
	
		s->value = 0;
		for (i = 0; s->str[i] != '\0' && i < sizeof(long); ++i) {
			s->value <<= BITS_IN_CHAR;
			s->value |= s->str[i];
		}
		if (i < sizeof(long))
			s->value <<= BITS_IN_CHAR * (sizeof(long) - i);
	}

(The above could be optimized, but you get the idea.) 

If the pre-computed values differ, you know the strings differ in the
first sizeof(long) bytes, so there's no need to call strcmp. 
Furthermore, if you are careful to arrange the first character as the
most significant byte of the value (and so on), you can even find out
which string (according to the byte values, no locale collation sequence
here) is greater.  Of course, if you need the locale collation sequence,
you have to use strcmp.
-- 
Lars Wirzenius    wirzenius at cc.helsinki.fi



More information about the Comp.lang.c mailing list