qsort(), bsearch(), and rude comparison functions

Tue Oct 9 16:24:55 AEST 1990

Paul Eggert (eggert at ata.twinsun.com) writes:
> Suppose I pass to qsort() a comparison function that isn't a total order.
> May qsort() dump core?

I think this is a Yes.  4.10.5.2 says ...

#  ... a comparison function pointed to by compar, which is called with
#  two arguments that point to the objects being compared.  The function
#  shall return an integer less than, equal to, or greater than zero if
#  the first argument is considered to be respectively less than, equal
#  to, or greater than the second.

As there is no special language to say otherwise, I think that the
terms "less than", "equal to", and "greater than" must be taken
to have their ordinary meanings, including the property that "less
than" is transitive.

The general rule in 4.1.6 then applies:

#  If an argument to a function has an invalid value ... the behavior
#  is undefined.

A pointer to a function that doesn't have the required property is
clearly an invalid value, so the behavior is undefined and it may
indeed dump core.

> While we're on the subject, what happens if the comparison function has side
> effects, benign or otherwise?

Nothing prohibits this.

> How are the sequence points defined here?  For
> example, suppose I qsort() an N-item array using a comparison function that
> increments a global counter starting at zero; must the counter be at least
> N-1 when qsort() returns?

The general rules for sequence points apply.  From 3.3.2.2:

#  ... there is a sequence point before the actual call [to a function].

And from 3.6:

#  A full expression is an expression that is not part of another
#  expression.  ...  The end of a full expression is a sequence point.

Thus, if the comparison function does "++count;" or "bar = foo * count++;"
then there is a sequence point before the call to the comparison function
and another at the end of the "++count;" or "bar = foo * count++;".  This
means that each of the incrementings of "count" is properly separated, so
no problem there.

On the other hand, there is no guarantee that the library function qsort()
actually calls the comparison function.  The following, while no doubt
silly, is a permissible implementation as far as I can tell.  (No, I haven't
tested the code, and there may be minor errors.  This is just to give an
idea of how a sort function might not call its comparison function.)

	{
		static int (*prev_compar) (const void *,const void *);
		static void *prev_result;
		static size_t prev_nmemb, prev_size;

		if (prev_result && compar == prev_compar
			&& (compar == strcmp || compar == memcmp)
			&& nmemb == prev_nmemb && size == prev_size
			&& memcmp (base, prev_result, size * nmemb) == 0)

			return;		/* it's already sorted */
		prev_compar = compar;
		prev_nmemb  = nmemb;
		prev_size   = size;
		free (prev_result);	/* ANSI C, no NULL check needed */
		prev_result = malloc (size * nmemb);
			...	/* the actual sort goes here */

		if (prev_result) memcpy (prev_result, base, size * nmemb);
	}

The reason for the check that compar is either memcmp() or strcmp() is
that if it was a user-written function then qsort() couldn't know whether
its behavior was conditional on some global variable.

But this does seem to mean that the count is not guaranteed to be at
least N-1.

> Similar questions apply to bsearch().

Similar answers apply, with one important exception.  The comparison
function is no longer required to induce a full ordering on the data,
but only a partial ordering.  4.10.5.1 says:

#  The comparison function pointed to by compar is called with two
#  arguments that point to the key object and to an array element,
#  in that order.  The function shall return an integer less than,
#  equal to, or greater than zero if the key object is considered,
#  respectively, to be less than, to match, or to be greater than
#  the array element.  The array shall consist of: all the elements
#  that compare less than, all the elements that compare equal to,
#  and all the elements that compare greater than the key object,
#  in that order.

Thus if K is the key, and A < K and K < B, and A, K, and B occur in
the array in that order, then it is irrelevant if the comparison function
would indicate that B < A.  The Standard imposes requirements on
"(*compar) (K, A)" and "(*compar) (K, B)" but not on "(*compar) (A, B)".

Footnote 129 is attached to the above text and says that "In practice,
the entire array is sorted according to the comparison function."
Footnotes are not, of course, part of the actual standard.

-- 
Mark Brader		"Actually, $150, to an educational institution,
Toronto			 turns out to be about the same as a lower amount."
utzoo!sq!msb, msb at sq.com				-- Mark Horton

This article is in the public domain.