MAX FILES PER PROCESS PROBLEM - (nf)

preece at uicsl.UUCP
Thu Dec 8 19:34:13 AEST 1983


#R:ccieng5:-19600:uicsl:12500018:000:1720
uicsl!preece    Dec  5 08:48:00 1983

[The following response to my distribution sort example was received
by mail]
	But your example helps make the point that many-file algorithms almost
	certainly need to be replaced with better approaches.  As soon as your
	client for the "distribution sort of alphabetic records" notices that
	the records filed under each letter are not sorted, he is likely to
	ask for that obvious improvement.  Then what do you do, insist on
	using 26^N (where N ~ 10) open files?  That is a disgusting sorting
	scheme!
----------
The respondent has apparently not read Volume 3 of Knuth.  After applying the
distribution, the individual sets can be sorted by whatever means and
concatenated (not merged), since the sets are already ordered with respect to
one another (that is, each item in set s is greater than every item in set
s-1 and less than every item in set s+1).
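As an in-memory sketch (in Python, rather than the one-file-per-bucket form under discussion, and with a hypothetical function name): distribute on the first letter, sort each bucket by any method, and concatenate.

```python
def distribution_sort(words):
    # Distribute records into per-letter buckets keyed on the first
    # character.  Every word in bucket 'b' is greater than every word
    # in bucket 'a', so the buckets never need to be merged.
    buckets = {}
    for w in words:
        buckets.setdefault(w[0], []).append(w)
    result = []
    for letter in sorted(buckets):
        # Sort each bucket independently (any method works here),
        # then simply concatenate in bucket order.
        result.extend(sorted(buckets[letter]))
    return result
```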
The idea of gaining sort speed by subdividing is well known; Quicksort is
an example of the method.  Distribution sorting is convenient when the key
lends itself to distribution.  It is also possible to sort entirely by
distribution: distribute on the least significant positions first,
concatenate, move on to the next most significant position, and so on.
This may be familiar to you if you've ever sorted on a card sorting machine
(most of the net is probably too young to have had that particular
experience).
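The least-significant-position-first scheme sketched above, for fixed-width keys (again a hypothetical Python rendering, not the file-based version):

```python
def lsd_radix_sort(words, width):
    # Assumes every key is exactly `width` characters long.
    for pos in range(width - 1, -1, -1):   # least significant position first
        buckets = [[] for _ in range(256)]
        for w in words:
            buckets[ord(w[pos])].append(w)  # stable distribution pass
        # Concatenate the buckets in order, like the card sorter's
        # output pockets, and go on to the next position.
        words = [w for b in buckets for w in b]
    return words
```

Each pass is stable, so ties from earlier (less significant) passes are preserved, which is what makes the method correct.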
With variable-length keys it may be convenient to distribute first by
length, so that short keys are skipped until the passes reach the key
positions they actually include (my particular application is sorting text
words for file inversion).
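One way to sketch the variable-length variant (a hypothetical Python version; it assumes a short key should sort before a longer key sharing its prefix, e.g. "ab" before "abc"): group the keys by length first, then bring each length group into play only once the passes reach positions that group possesses.

```python
def lsd_radix_sort_varlen(words):
    # Distribute first by length.
    maxlen = max(len(w) for w in words)
    by_len = [[] for _ in range(maxlen + 1)]
    for w in words:
        by_len[len(w)].append(w)
    result = []
    for pos in range(maxlen - 1, -1, -1):
        # Words of length pos+1 join at this pass; placing them at the
        # front makes a short key sort ahead of an equal-prefix longer one.
        result = by_len[pos + 1] + result
        buckets = [[] for _ in range(256)]
        for w in result:
            buckets[ord(w[pos])].append(w)  # stable distribution pass
        result = [w for b in buckets for w in b]
    return result
```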

At any rate, distribution sorting is well known and effective; the respondent
is apparently ignorant as well as abrasive.

scott preece
ihnp4!uiucdcs!uicsl!preece


