Sorting Algorithm

Wayne Diener wayned at wddami.spoami.com
Tue Aug 21 23:45:02 AEST 1990


>In article <15890 at oolong.la.locus.com> geoff at ITcorp.com (Geoff Kuenning) writes:
>In article <wayned.0092 at wddami.spoami.com> wayned at wddami.spoami.com
>(Wayne Diener) writes:
>
>> I have been unable to find any references to the sorting algorithm
>> used in this program.  If anyone has seen it elsewhere, I would 
>> appreciate hearing about it.  None of my "guru" programmer friends,
>> my professors (I'm a hardware type who went back to college to learn
>> a little software) or a literature search have turned up an equivalent
>
>Did your professors point out that you are basically just building a
>binary tree in an expensive fashion?
>
>> does the sort "in-place" (requires very little additional memory,
>> other than a few more variables).  I'm not really terribly proficient
>> at "Big O" analysis, but it LOOKS like it might be O(N*log(N))
>> instead of O(N^2).  Anyone want to analyse it?  
>
>Most people would consider list links to be additional memory.  As to
>complexity, it depends tremendously on the cost of comparisons.  If
>comparisons are costly, then the complexity is indeed O(N log N).  On
>the other hand, if comparisons are cheap compared to the cost of
>following pointers (e.g., you are sorting 32-bit integers), then the
>complexity becomes at least O(N**2) because you will spend all of your
>time chasing up and down the list to do your "binary search".  (This
>shows up one of the many deficiencies in "Big O" analysis).
>
>> The sorting is accomplished using a binary search of a linked list
>to find the proper insertion point (just running up and down
>> the pointers and dividing the list in half each time) and then 
>> moving the pointers to change an item's location.
>
>This is very similar to building a binary tree, or possibly a balanced
>binary tree.  Inserting N items into a binary tree is O(N log N) on
>average, though O(N^2) in the worst case (e.g. already-sorted input);
>balanced trees such as AVL trees guarantee O(N log N).  The way your
>code does the binary search is potentially very expensive compared to
>the cost of keeping the tree balanced, especially with very large
>databases.
>
>Having said all the above, let me add that I do think your algorithm is
>interesting, and it was thoughtful of you to post it.  In some situations,
>your algorithm will be the superior choice, and it may also inspire some
>interesting derivatives.
>
>	Geoff Kuenning	geoff at la.locus.com	geoff at ITcorp.com

     Thanks for your response...  Yes, I know that the efficiency of the
     sort depends a great deal on the cost of the comparisons vs. the
     cost of the pointer manipulations; I managed to get that far in
     the analysis myself.  I wrote the thing as a homework assignment
     for a first course in data structures (we hadn't gotten to B-trees,
     general trees, etc. yet), but it _did_ seem like an interesting
     approach, so when I ran across it the other day I thought I'd
     just throw it out to the community at large and see what happened.
     I'm sure it can be improved substantially.

     This is the body of the first response I've received suggesting
     a possible method of improving the efficiency.

     *********************************

|In-Reply-To: wayned at wddami.spoami.com's message of 16 Aug 90 23:00:00 GMT
|From: Victor Kan <kan at DG-RTP.DG.COM>
|To: isc-br!hawk!wddami!wayned at uunet.uu.net
|Subject: Sorting Algorithm
|
|
|[ I tried sending to wayned at wddami.spoami.com but my nameserver can't
|  find it, so I tried this other path.
|]
|
|I can't say that I've ever seen it done this way.  If even your
|professors can't recognize it after serious analysis, maybe you should
|send mail to Knuth at Stanford.
|
|I have a problem with the code though (probably why I've never seen it
|done in text books).
|
|Your "binary search of a linked list" isn't log(N) as the usual
|bsearch of an array would be since you must do that non-trivial
|"boogie through the list" (as my algorithms professor called it :-) to
|get to the middle.  Although you don't do comparisons in that loop, it
|might still add some complexity (I never took a course in formal
|algorithmic analysis, only a fundamental algorithms course).
|
|For each insertion, you must do more than M/2 pointer traversals
|(where M is the size of the sorted-so-far list; the full count is
|sum{i=1..log(M)} of M/2^i, which is roughly M) in get-to-the-middle
|operations, right?
|
|That's at least sum(1..N)/2 = ((N^2+N)/2)/2 = (N^2+N)/4 = O(N^2)
|
|to sort the entire set.  And that must be added to the N log(N) actual
|comparisons done.  It may have run faster than the other O(N^2) sorts
|because they actually do about N^2 comparisons, whereas you do N log(N)
|comparisons and O(N^2) cheap traversals.  But in terms of complexity, I
|think those O(N^2) traversals must be included.
|
|So in my very humble analysis, your sort is O(N log(N) + N^2), where
|the N^2 term should be multiplied by some constant factor in your
|time analysis to account for the cheaper (i.e. non-comparison)
|operations.
|
|Below, I offer a possible speed-up and complexity-elimination step
|which might make the sort O(N log(N) + some other log expression).
|
|However, since I thought of it on company time and mailed it with
|company resources, the suggestion may be considered copyrighted by my
|company, Data General Corporation.
|
|If you use it and become famous for inventing a new O(N Log(N)) sort,
|please credit me and my company :-).
|
|There may be some ways to eliminate the O(N^2) cheap traversals to
|the middle by maintaining an array of "middle" pointers to point to
|the various "middles" in your linked list.  However, maintaining this
|array is complicated.  Optimally, you must "know" the unsorted list
|ahead of time so you can assign the middle pointers correctly in the
|array so you don't have to shuffle them around.  But if you don't know
|the properties of the list, you could actually do this shuffling
|pretty cheaply, since there are only log(N) pointers to shuffle, and
|worst-case shuffling would be similar to quicksort complexity on that
|short log(N)-sized list (i.e. P log(P) average, P^2 worst, where P =
|log(N)).
|
|
| Victor Kan               | I speak only for myself.               |  ***
| Data General Corporation | Edo emacibus, ergo sum.                | ****
| 62 T.W. Alexander Drive  | Columbia Lions Win, 9 October 1988 for | **** %%%%
| RTP, NC  27709           | a record of 1-44.  Way to go, Lions!   |  *** %%%

     *******************************************


--
Wayne Diener


