
Sorting algorithms which accept a random comparator

Problem Detail:

Generic sorting algorithms generally take a set of data to sort and a comparator function which can compare two individual elements. If the comparator is an order relation¹, then the output of the algorithm is a sorted list/array.

I am wondering, though, which sort algorithms would actually work with a comparator that is not an order relation (in particular, one which returns a random result on each comparison). By "work" I mean here that they still return a permutation of their input and run at their typically quoted time complexity (as opposed to always degrading to the worst-case scenario, going into an infinite loop, or dropping elements). The ordering of the results would be undefined, however. Even better would be if the resulting ordering were a uniform distribution when the comparator is a coin flip.

From my rough mental calculation, it appears that merge sort would be fine with this: it would maintain the same runtime cost and produce a fair random ordering. I think that something like quicksort, however, would degenerate, possibly not finish, and not be fair.

What other sorting algorithms (other than merge sort) would work as described with a random comparator?


  1. For reference, a comparator is an order relation if it is a proper function (deterministic) and satisfies the axioms of an order relation:

    • it is deterministic: compare(a,b) for a particular a and b always returns the same result.
    • it is transitive: compare(a,b) and compare(b,c) implies compare(a,c)
    • it is antisymmetric: compare(a,b) and compare(b,a) implies a == b

(Assume that all input elements are distinct, so reflexivity is not an issue.)

A random comparator violates all of these rules. There are, however, comparators that are not order relations yet are not random (for example, they might violate only one rule, and only for particular elements in the set).
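To make that concrete, here is a small sketch (names are illustrative) showing a comparator that is a proper order relation next to a deterministic comparator that breaks only transitivity, the kind of "violates one rule" comparator mentioned above:

    using System;

    static class ComparatorExamples
    {
        // A proper order relation: deterministic, transitive, antisymmetric.
        static int OrderedCompare(int a, int b) => a.CompareTo(b);

        // Deterministic and antisymmetric, but NOT transitive:
        // rock < paper and paper < scissors, yet scissors < rock.
        static readonly string[] Cycle = { "rock", "paper", "scissors" };
        static int RpsCompare(string a, string b)
        {
            if (a == b) return 0;
            int ia = Array.IndexOf(Cycle, a), ib = Array.IndexOf(Cycle, b);
            return (ia + 1) % 3 == ib ? -1 : 1; // each item loses to the next one in the cycle
        }

        static void Main()
        {
            Console.WriteLine(OrderedCompare(1, 2));            // negative: 1 < 2
            Console.WriteLine(RpsCompare("rock", "paper"));     // -1
            Console.WriteLine(RpsCompare("paper", "scissors")); // -1
            Console.WriteLine(RpsCompare("rock", "scissors"));  //  1 (transitivity broken)
        }
    }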

Asked By : edA-qa mort-ora-y

Answered By : KeithS

So basically, you want to know if there is any sorting algorithm which wouldn't degrade from its average case if given a compare function similar to:

int Compare(object a, object b) { return Random.Next(-1, 2); }

... where Random.Next() is some method that produces a randomly-generated integer between a specified inclusive lower bound and exclusive upper bound, so this comparator returns -1, 0, or 1 with equal probability.
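As a minimal, self-contained sketch (type and member names are illustrative), such a comparator and a quick sanity check might look like this; the per-algorithm sketches further down assume a comparator of the same shape, uniform over {-1, 0, 1}:

    using System;

    static class RandomComparatorSketch
    {
        static readonly Random Rng = new Random();

        // Ignores its arguments and returns -1, 0, or 1 uniformly at random.
        // Random.Next's upper bound is exclusive, hence Next(-1, 2).
        public static int Compare(object a, object b) => Rng.Next(-1, 2);

        static void Main()
        {
            for (int i = 0; i < 5; i++)
                Console.WriteLine(Compare("a", "b")); // prints a mix of -1, 0, and 1
        }
    }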

The answer is actually that most basic sorting algorithms will perform according to their average case, because they obey at least one of the following two conditions:

  1. A comparison between two unique elements is never made twice in the sort, and/or
  2. In each iteration of the sort, the correct position of at least one element is determined and so that element is never compared again.

For instance, SelectionSort iterates through the sub-list of unsorted elements, finds the "least" and/or "greatest" element (by comparing each one to the best candidate so far), places it in its correct position, and repeats. As a result, even with a non-deterministic comparator, at the end of every iteration the algorithm will have found a value it thinks is least or greatest, swapped it into the position it is trying to fill, and it never considers that element again; thus it obeys Condition 2. However, a given A and B can be compared multiple times during this process (as the most extreme example, consider several passes of SelectionSort over an array sorted in reverse order), so it violates Condition 1.
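As a sketch of why the runtime still holds, here is a minimal SelectionSort driven by such a random comparator (names are illustrative). Whatever the comparator returns, each outer pass still fixes exactly one position, so the algorithm always terminates with a permutation of its input after the usual O(N²) comparisons:

    using System;

    static class SelectionSortDemo
    {
        static readonly Random Rng = new Random();
        static int RandomCompare(int a, int b) => Rng.Next(-1, 2); // -1, 0, or 1

        static void SelectionSort(int[] items)
        {
            // Each outer iteration fixes position i for good (Condition 2),
            // regardless of what the comparator says.
            for (int i = 0; i < items.Length - 1; i++)
            {
                int least = i;
                for (int j = i + 1; j < items.Length; j++)
                    if (RandomCompare(items[j], items[least]) < 0)
                        least = j;

                int tmp = items[i]; items[i] = items[least]; items[least] = tmp;
            }
        }

        static void Main()
        {
            int[] data = { 5, 3, 8, 1, 9, 2 };
            SelectionSort(data);
            Console.WriteLine(string.Join(", ", data)); // some permutation of the input
        }
    }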

MergeSort obeys Condition 1 but not 2; as sub-arrays are merged, elements in the same sub-array (on the left or right side) are not compared to each other because it has already been determined that the elements on that side of the array are in order among themselves; the algorithm only compares the least unmerged element of each subarray to the other to determine which is lesser and should go next in the merged list. This means that any two unique objects A and B will be compared to each other a maximum of one time, but any given element's "final" index in the full collection is not known until the algorithm is complete.
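A minimal top-down MergeSort sketch under the same assumed random comparator (names are illustrative): each merge step compares the current heads of the two runs exactly once and then consumes one of them, so no pair is ever compared twice and the usual O(N log N) comparison count is preserved:

    using System;

    static class MergeSortDemo
    {
        static readonly Random Rng = new Random();
        static int RandomCompare(int a, int b) => Rng.Next(-1, 2); // -1, 0, or 1

        static int[] MergeSort(int[] items)
        {
            if (items.Length <= 1) return items;

            int mid = items.Length / 2;
            int[] left = new int[mid];
            int[] right = new int[items.Length - mid];
            Array.Copy(items, 0, left, 0, mid);
            Array.Copy(items, mid, right, 0, right.Length);
            left = MergeSort(left);
            right = MergeSort(right);

            // Merge: the heads of the two runs are compared once, then one of
            // them is consumed, so any pair is compared at most once (Condition 1).
            int[] merged = new int[items.Length];
            int l = 0, r = 0, m = 0;
            while (l < left.Length && r < right.Length)
                merged[m++] = RandomCompare(left[l], right[r]) <= 0 ? left[l++] : right[r++];
            while (l < left.Length) merged[m++] = left[l++];
            while (r < right.Length) merged[m++] = right[r++];
            return merged;
        }

        static void Main()
        {
            int[] result = MergeSort(new[] { 5, 3, 8, 1, 9, 2 });
            Console.WriteLine(string.Join(", ", result)); // some permutation of the input
        }
    }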

InsertionSort also obeys only Condition 1, even though its overall strategy and complexity look more like SelectionSort's. Each unsorted element is compared with the sorted elements, greatest-first, until one is found that is less than the element under inspection. The element is inserted at that point, and then the next element is considered. The result is that the relative order of any A and B is determined by one comparison, and further comparisons between that A and B are never performed, but the final position of any element cannot be known until all elements have been considered.
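A minimal InsertionSort sketch with the same assumed comparator (names are illustrative): each insertion scans the sorted prefix at most once, so the pass structure and the O(N²) bound survive even though the comparator is random:

    using System;

    static class InsertionSortDemo
    {
        static readonly Random Rng = new Random();
        static int RandomCompare(int a, int b) => Rng.Next(-1, 2); // -1, 0, or 1

        static void InsertionSort(int[] items)
        {
            for (int i = 1; i < items.Length; i++)
            {
                int current = items[i];
                int j = i - 1;
                // Scan the "sorted" prefix right-to-left; each prefix element is
                // compared against current at most once (Condition 1), and the
                // scan stops either at a "lesser" element or at the front.
                while (j >= 0 && RandomCompare(items[j], current) > 0)
                {
                    items[j + 1] = items[j];
                    j--;
                }
                items[j + 1] = current;
            }
        }

        static void Main()
        {
            int[] data = { 5, 3, 8, 1, 9, 2 };
            InsertionSort(data);
            Console.WriteLine(string.Join(", ", data)); // some permutation of the input
        }
    }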

QuickSort obeys both Conditions. At each level, a pivot is chosen and arranged such that the "left" side contains elements less than the pivot and the "right" side contains elements greater than the pivot. The result of that level is QuickSort(left) + pivot + QuickSort(right) which basically means the position of the pivot element is known (one index greater than the length of the left side), the pivot is never compared to any other element after it has been chosen as a pivot (it may have been compared with previous pivot elements, but those elements are also known and are not included in any subarrays), AND any A and B that end up on opposite sides of the pivot are never compared. In most implementations of pure QuickSort, the base case is one element, at which point its current index is its final index and no further comparisons are made.
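A minimal QuickSort sketch (Lomuto partition with the last element as pivot; names are illustrative): the partition step fixes the pivot's final index no matter what the comparator says, and both recursive calls operate on strictly smaller ranges, so the recursion always terminates:

    using System;

    static class QuickSortDemo
    {
        static readonly Random Rng = new Random();
        static int RandomCompare(int a, int b) => Rng.Next(-1, 2); // -1, 0, or 1

        static void Swap(int[] a, int i, int j)
        {
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }

        static void QuickSort(int[] items, int lo, int hi)
        {
            if (lo >= hi) return; // zero or one element: its index is final

            // Lomuto partition around the last element. The pivot's final index
            // is fixed here (Condition 2), and elements that land on opposite
            // sides of it are never compared with each other (Condition 1).
            int pivot = items[hi];
            int store = lo;
            for (int j = lo; j < hi; j++)
                if (RandomCompare(items[j], pivot) < 0)
                    Swap(items, j, store++);
            Swap(items, store, hi);

            QuickSort(items, lo, store - 1);
            QuickSort(items, store + 1, hi);
        }

        static void Main()
        {
            int[] data = { 5, 3, 8, 1, 9, 2 };
            QuickSort(data, 0, data.Length - 1);
            Console.WriteLine(string.Join(", ", data)); // some permutation of the input
        }
    }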

The only comparative sort I can think of that wouldn't obey either condition is a non-optimized BubbleSort. If the sort does not accept that the X greatest elements are in their proper place after running X passes, and/or uses a "double-check" pass to verify the list is sorted, then the sort is only considered "done" when the random comparator has returned -1 or 0 for every pair of adjacent elements during a pass, so that no swaps were performed. If the comparator is truly random, that event occurs with probability $(2/3)^{N-1}$: for a relatively small list of 25 elements that is roughly a one-in-17,000 chance, while for 100 elements the probability is about $3.7 \times 10^{-18}$. As the maximum absolute value of the comparator's result increases, the probability of any one comparison returning negative or zero decreases toward 0.5, making the algorithm that much less likely to terminate (the chance of 99 coin flips all landing heads, which is basically what this boils down to, is about 1 in $6.3 \times 10^{29}$).
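A sketch of that failure mode (names are illustrative, and a pass cap is added purely so the demo itself cannot hang): the non-optimized BubbleSort below declares itself done only after a swap-free pass, which a random comparator grants with probability $(2/3)^{N-1}$:

    using System;

    static class BubbleSortDemo
    {
        static readonly Random Rng = new Random();
        static int RandomCompare(int a, int b) => Rng.Next(-1, 2); // -1, 0, or 1

        // "Non-optimized" bubble sort: it only stops after a full pass in which
        // the comparator reported no adjacent pair out of order. With a random
        // comparator that happens with probability (2/3)^(N-1) per pass, so for
        // large N the loop is effectively unbounded; maxPasses is a safety cap.
        static int BubbleSort(int[] items, int maxPasses)
        {
            for (int pass = 1; pass <= maxPasses; pass++)
            {
                bool swapped = false;
                for (int i = 0; i < items.Length - 1; i++)
                {
                    if (RandomCompare(items[i], items[i + 1]) > 0)
                    {
                        int tmp = items[i]; items[i] = items[i + 1]; items[i + 1] = tmp;
                        swapped = true;
                    }
                }
                if (!swapped) return pass; // "clean" pass: the sort declares itself done
            }
            return -1; // gave up at the cap
        }

        static void Main()
        {
            int[] small = { 5, 3, 8, 1, 9, 2 };
            Console.WriteLine("6 elements finished after pass " + BubbleSort(small, 100000));

            int[] large = new int[100];
            for (int i = 0; i < large.Length; i++) large[i] = i;
            // Expect -1 here: a clean pass has probability (2/3)^99, about 3.7e-18.
            Console.WriteLine("100 elements finished after pass " + BubbleSort(large, 100000));
        }
    }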

EDIT, A LONG TIME LATER: There are a few "sorts" designed specifically as examples of what not to do, which incorporate randomness; perhaps the most famous is BogoSort: "given a list, if the list is not in order, shuffle the list and check again." Theoretically it will eventually hit on the right permutation of values, just like the "non-optimized BubbleSort" above, but the average case is factorial time (on the order of N!/2 shuffles), and because of the birthday problem (after enough random permutations you become more likely to encounter duplicate permutations than new ones) there is a nonzero possibility of the algorithm never completing, so officially the algorithm is time-unbounded.
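For completeness, a minimal BogoSort sketch (names are illustrative; an ordinary deterministic integer comparison is used for the "is it in order" check, and a Fisher-Yates shuffle for the reshuffling). The expected number of shuffles grows factorially, which is why the input is kept tiny:

    using System;

    static class BogoSortDemo
    {
        static readonly Random Rng = new Random();

        static bool IsSorted(int[] items)
        {
            for (int i = 0; i < items.Length - 1; i++)
                if (items[i] > items[i + 1]) return false;
            return true;
        }

        // Fisher-Yates shuffle: every permutation is equally likely.
        static void Shuffle(int[] items)
        {
            for (int i = items.Length - 1; i > 0; i--)
            {
                int j = Rng.Next(i + 1);
                int tmp = items[i]; items[i] = items[j]; items[j] = tmp;
            }
        }

        static void Main()
        {
            int[] data = { 5, 3, 8, 1, 9, 2 };
            long shuffles = 0;
            // "If the list is not in order, shuffle the list and check again."
            while (!IsSorted(data))
            {
                Shuffle(data);
                shuffles++;
            }
            Console.WriteLine("Sorted after " + shuffles + " shuffles: " + string.Join(", ", data));
        }
    }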


Question Source : http://cs.stackexchange.com/questions/2336
