World's most popular travel blog for travel bloggers.

Dual-pivot Quicksort reference implementation?

, , No Comments
Problem Detail: 

Has some sort of canonical - or reference - implementation of Dual-pivot Quicksort been posted anywhere?

I would like to include that algorithm in a comparison among sorting algorithms for a specialized need that I have, but the Java versions I've seen appear to have various kinds of tweaks applied to them, like using Insertion Sort for small (sub-) arrays, which makes it harder to compare the fundamentals.

Asked By : 500 - Internal Server Error

Answered By : Sebastian

I tried to do exactly such a comparison in my master thesis, which thus naturally includes pseudo-code of "basic" versions of several dual-pivot Quicksorts (there is a list of them on page 9).

Here is my basic implementation of Yaroslavskiy's algorithm (the dual-pivot scheme that is used in Java 7):

void sort(int[] A, int left, int right) {     if (right > left) {         // Choose outermost elements as pivots         if (A[left] > A[right]) swap(A, left, right);         int p = A[left], q = A[right];          // Partition A according to invariant below         int l = left + 1, g = right - 1, k = l;         while (k <= g) {             if (A[k] < p) {                 swap(A, k, l);                 ++l;             } else if (A[k] >= q) {                 while (A[g] > q && k < g) --g;                 swap(A, k, g);                 --g;                 if (A[k] < p) {                     swap(A, k, l);                     ++l;                 }             }             ++k;         }         --l; ++g;          // Swap pivots to final place         swap(A, left, l); swap(A, right, g);          // Recursively sort partitions         sort(A, left, l - 1);         sort(A, l + 1, g - 1);         sort(A, g + 1, right);     } }  void swap(int[] A, int i, int j) {     final int tmp = A[i]; A[i] = A[j]; A[j] = tmp; } 

The partitioning loop maintains the following invariant: partitioning invariant of Yaroslavskiy's algorithm (Initially, the "?"-area is the whole array, at the end it is gone; the indices are moved in the direction indicated by the arrays, so $\ell$ and $k$ start at the left, $g$ at the right end of $A$.)

I did a mathematical average case analysis for these algorithms to determine the expected number of key comparisons, element swaps and also instruction counts for some specific low-level implementations (in Java Bytecode and MMIX, an RISC assembly language).

The main result is that Yaroslavskiy's algorithm uses $1.9n \ln n + O(n)$ comparison, which is 5% less than the $2n \ln n + O(n)$ comparisons of classic single-pivot Quicksort (both average case on random permutations). However, it needs $0.6 n\ln n + O(n)$ swaps for that, where classic Quicksort only needs $\frac13 n \ln n + O(n)$ swaps; similarly, my instruction counts also indicate that classic Quicksort is more efficient. What exactly makes Java 7's dual-pivot algorithm faster remains an open question. (In case you are interested in more details on that, I could elaborate further.)

Tweaks of the algorithm like switching to Insertionsort for subproblems smaller than some (constant) threshold or optimizing code outside the partitioning loop (while(k <= g) { ... }) will only change the numbers hidden by $O(n)$. This means that for large lists, they are not important. For practical input sizes, they still help a lot, so having them in library code is a must.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/24092

0 comments:

Post a Comment

Let us know your responses and feedback