Efficient STRing CoMPares?

Chris Torek torek at elf.ee.lbl.gov
Fri Mar 22 10:45:12 AEST 1991


[This is a correction to a previous article, which should have been
stamped out before you read it, because of the Supersedes header, but I
suspect there are news sites out there that do not implement
Supersedes.  The correction is just an `='-for-`-' typo caused by
runaway fingers.  Thanks to David Barnett for catching it early.]

In article <1151 at gistdev.gist.com> flint at gistdev.gist.com
(Flint Pellett) writes:
>... I'm curious if anyone has a method that actually speeds up strcmp()
>in the situtation where you have to get the lexicographic ordering
>information back.

For a particular application (my own personal version of Emacs, actually),
I found that comparing the first two characters `inline' gave the best
results.

/*
 * Find an entry in a table.  Create should be true iff we should create
 * a new entry if the name is not in the table.  The names of any new
 * entries are saved with savestr().
 */
struct tentry *
FindEntry(t, name, create)
	struct table *t;
	register char *name;
	int create;
{
	register struct tentry **ents, *te;
	register int i, h, m, l;

	if (name == NULL)
		return (NULL);
	if (t->t_sorted <= 0)
		SortEntries(t);
	ents = t->t_ents;
	/* binary search */
	h = t->t_size - 1;
	l = 0;
	while (h >= l) {
		te = ents[m = (h + l) >> 1];
		/* first two characters done inline for speed */
		if ((i = te->te_name[0] - name[0]) == 0 && name[0] != 0) {
			if ((i = te->te_name[1] - name[1]) == 0)
				i = strcmp(te->te_name + 1, name + 1);
		}
		if (i > 0)
			h = m - 1;
		else if (i < 0)
			l = m + 1;
		else		/* found */
			return (te);
	}
	if (create) {
		if ((name = savestr(name)) != NULL &&
		    (te = insert(t, name, &ents[l])) != NULL)
			return (te);
		if (name != NULL)
			free(name);
		error("out of memory allocating entry in %s", t->t_name);
	}
	return (NULL);
}

SortEntries uses a clever trick: it does an insertion sort if the
table is `almost' sorted, otherwise it does a quicksort.

Much of this speed-hacking is due to *measured* results showing that my
Emacs spent most of its `startup' time building tables and reading the
current directory.
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek at ee.lbl.gov



More information about the Comp.lang.c mailing list