Efficient STRing CoMPares?

Richard Harter rh at smds.UUCP
Thu Mar 21 18:11:03 AEST 1991


In article <11111 at dog.ee.lbl.gov>, torek at elf.ee.lbl.gov (Chris Torek) writes:
> In article <15510 at smoke.brl.mil> gwyn at smoke.brl.mil (Doug Gwyn) writes:
> >... in the vast majority of applications the strings being
> >compared, even in cases where they match, are contained in different
> >storage locations.  E.g.
> >	if ( StrEq( buffer, "EOF" ) ) ...

> Of course, you can build yourself a `string pool' system, in which each
> distinct string appears exactly once, and then two strings match iff their
> pointers match . . . but this merely offloads the `compare for truly
> equal' into the string pool system.

Just so.  In one of our programs where there was a great deal of manipulation
of strings we did a string "object" package (C not C++).  For each unique
string under management there is a struct which looks something like

	struct string_object {
		char	*c;		/* The string itself	*/
		int	 n;		/* Its length		*/
		int	 rc;		/* Reference count	*/
		int	 sz;		/* Allocated space for string */
		struct string_object *link;	/* Data access	*/
		};

The pool used a hash table with short buckets (the link field).  In the
case in question there are many more references to existing strings (and
multiple instances) then there are references to new strings so using
pointers into the pool is a big win.  The only object primitives are
add_string and delete_string.  Delete_string decrements a reference
count and does the appropriate when the count goes to zero.  Add_string
has to do the compare, which is more expensive than a simple compare
because it has to compute the hash index (which only needs to be very
quick and dirty).  If the string exists the reference count is upped;
otherwise a new object is created (actually taken off a free list.)
The size field is there because the strings are copied into the struct.
This is more overhead, but it is worth it in this case, because the
strings in question are often substrings.  The upshot is that there is
very little allocation/deallocation which is much more expensive than
string compares.

It would be interesting to hear how other people have handled this
sort of problem.
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.



More information about the Comp.lang.c mailing list