Sorting Algorithm

T. Tim Hsu ajz at mentor.cc.purdue.edu
Thu Aug 23 17:47:17 AEST 1990


Having recently taken a course where a 3-week project was based upon sorting,
I thought I'd pass along my notes...

Here's the background: the sort was performed on items of four 9-digit
integers each, where a separate file told which of those 4 integers was the
most significant key and which was the least.  The total time spent in the
program had to be less than 4 seconds for a 4000-item list (ours took about
1 second, and although .15 seconds could have been shaved off it, e.g. by
removing all the error-checking routines, I saw one that did it in .5
seconds).

The first thing you should know is that straight textbook routines such as
quicksort, mergesort, and even radix sort all failed to sort a random
4000-item list in under 4 seconds on our Gould NP-1.

This is what my team learned...
1) Traversing a linked list to locate an item is very time consuming.
2) Dumb O(n^2) routines have very little overhead, making them MUCH FASTER
   than O(n log n) routines (which have high overheads) for small values of
   n (< 15).
3) An insertion sort [O(n^2)] is one of the fastest sorts for nearly sorted
   data, especially when the unsorted items are close to each other (see the
   C sketch after this list).
4) Recursive routines took more time than non-recursive ones.
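
For reference, here's what I mean by a dumb insertion sort, sketched in C on
plain longs.  Take it as an illustration only; the real records carried four
9-digit keys each, not a single key.

    /* Straight insertion sort.  O(n^2) in the worst case, but nearly
     * O(n) on almost-sorted input, since each element only slides a
     * few places.  Also the low-overhead winner for small n (< 15). */
    void insertion_sort(long a[], int n)
    {
        int i, j;
        long v;

        for (i = 1; i < n; i++) {
            v = a[i];
            /* slide larger elements right until v's slot appears */
            for (j = i; j > 0 && a[j-1] > v; j--)
                a[j] = a[j-1];
            a[j] = v;
        }
    }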

The best routine I saw chain-hashed the values into a 33333-item array,
rebuilt a linked list from that array, and finally ran an insertion sort on
the list.  For random data, there should be very few cases where more than 5%
of the items in the rebuilt linked list are out of place.  These out-of-place
items should be uniformly distributed within the list, so they rarely need to
be moved more than 3 places to restore them to their proper positions.
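
Here's a sketch of how I imagine that routine looked.  I never saw its
source, so the bucket function is my own guess, and I rebuild into an array
rather than a linked list so the final pass can reuse insertion_sort() from
above (which also sidesteps lesson 1).

    #include <stdlib.h>

    #define TSIZE  33333
    #define KEYMAX 999999999L

    struct node { long key; struct node *next; };

    static struct node *table[TSIZE];

    /* Order-preserving bucket index: bigger keys land in equal-or-later
     * slots, so sweeping the table front to back yields nearly sorted
     * output.  KEYMAX / (KEYMAX/TSIZE + 1) == 33332 < TSIZE. */
    static int slot(long key)
    {
        return (int)(key / (KEYMAX / TSIZE + 1));
    }

    /* Phase 1: chain every item into the table. */
    static void chain_hash(struct node items[], int n)
    {
        int i, s;

        for (i = 0; i < TSIZE; i++)
            table[i] = NULL;
        for (i = 0; i < n; i++) {
            s = slot(items[i].key);
            items[i].next = table[s];    /* push onto the chain */
            table[s] = &items[i];
        }
    }

    /* Phase 2: sweep the table in slot order.  Only items that shared
     * a bucket can still be out of order, and each is within a chain's
     * length of its final position. */
    static void rebuild(long out[], int n)
    {
        struct node *p;
        int i, k = 0;

        for (i = 0; i < TSIZE && k < n; i++)
            for (p = table[i]; p != NULL; p = p->next)
                out[k++] = p->key;
    }

    /* Phase 3: insertion_sort(out, n) finishes the nearly sorted array. */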

Since a 4000-item list fills the hash table to only ~12%, there should be on
average only about 1.1 probes needed per item (with separate chaining at a
load factor of 4000/33333 = .12, a successful search expects roughly
1 + .12/2 = 1.06 probes).  Using O() loosely as an operation count, it should
therefore take O(1.1n + n + c), where the second "n" comes from the insertion
sort having to traverse the entire list at least once and the "c" is a
constant depending upon how many items were out of place and how far they had
to be moved on average (shouldn't be more than 4000 * 5% * 3 = 600, or
n * 5% * 3 = .15n).  It should therefore average O(2.25n), with a best case
of O(2n) and a worst case of O(2n^2).

The routine my team wrote insertion sorted every group of 12 integers, placed
these groups into a larger array, and then merge sorted that array.  This
gave an average time of O((n / 12) log (n / 12) + (12^2 / 4) * (n / 12)), a
best case of O((n / 12) log (n / 12) + n), and a worst case of
O((n / 12) log (n / 12) + (12^2 / 2) * (n / 12)).
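
A sketch of our approach in the same spirit, again on single longs (the real
code compared four keys per record) and reusing insertion_sort() from above:

    #include <stdlib.h>

    #define RUN 12

    /* Merge the sorted halves a[lo..mid) and a[mid..hi) through tmp. */
    static void merge(long a[], long tmp[], int lo, int mid, int hi)
    {
        int i = lo, j = mid, k = lo;

        while (i < mid && j < hi)
            tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
        while (i < mid)
            tmp[k++] = a[i++];
        while (j < hi)
            tmp[k++] = a[j++];
        for (k = lo; k < hi; k++)
            a[k] = tmp[k];
    }

    /* Insertion sort each 12-item run, then merge the runs bottom-up,
     * doubling the run width each pass.  Non-recursive, per lesson 4. */
    void hybrid_sort(long a[], int n)
    {
        long *tmp = (long *) malloc(n * sizeof(long));
        int lo, hi, w;

        for (lo = 0; lo < n; lo += RUN) {
            hi = (lo + RUN < n) ? lo + RUN : n;
            insertion_sort(a + lo, hi - lo);
        }
        for (w = RUN; w < n; w *= 2)
            for (lo = 0; lo + w < n; lo += 2 * w) {
                hi = (lo + 2 * w < n) ? lo + 2 * w : n;
                merge(a, tmp, lo, lo + w, hi);
            }
        free(tmp);
    }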

For 4000 items (plugging n = 4000 into the formulas above)...
                First routine    Second routine
average time    O(9000)          O(14793.6)
best time       O(8000)          O(6793.6)
worst time      O(3.2 * 10^7)    O(26793.6)

I should stress that the worst case of the second routine is pretty
straightforward to derive, but determining the worst case of the first
routine would be quite a task.  In practice, chances are the second routine
will run into unfavorable conditions far more often than the first.

-- 

T. Tim Hsu                           UUCP    ...pur-ee!ajz at mentor.cc.purdue.edu
                                     ARPA    ajz at mentor.cc.purdue.edu
FAX  1 317 494 0566                  BITNET  xajz at PURCCVM


