Article:
CC-Radix: a Cache Conscious Sorting Based on Radix sort
(IEEE 2003)
I'm trying to figure out what the author means by this section:
Explanation of CC-Radix For clarity reasons, we explain the recursive version of CC-Radix sort as shown in Figure 3. However, we use the iterative implementation of CC-Radix for the evaluations in this paper because it is more efficient. The parameters of CC-Radix are bucket, which is the data set to be sorted, and b, which is the number of bits of the key that still have to be sorted. Constant b is explained below.
CC-Radix(bucket, b) if fits in cache (bucket) then Radix sort(bucket, b ) else sub-buckets = Reverse sorting(bucket, ) for each sub-bucket in sub-buckets CC-Radix(sub-bucket, ) endfor endif end
Figure 3. Pseudocode of CC-Radix The algorithm starts by checking whether the data struc- tures to sort bucket fit in cache level Li. Those data struc- tures are vectors S and D and the counter vectors C, one for each of the digits of the key. If so, CC-Radix calls Radix sort. If the data structures do not fit in cache level , the algorithm partitions that bucket into sub-buckets by sorting the bucket by the most significant digit using the counting algorithm (explained above). Here, we call this process of partitioning, Reverse Sorting, and it takes into account the computer architecture of the machine. For each sub-bucket, the CC-Radix sort is called again. Note that the first call to the routine processes the com- plete data set. Further calls to the routine process sub- buckets. Note also that certain subsets of the data may need more Reverse sorting calls than others depending on the data skew.
Now, we explain the details of Reverse sorting and the sizing of the parameters of Radix sort. Reverse sorting. After each call of Reverse sorting, the number of digits that remain to be sorted for a specific sub- bucket is decremented by one.
I don't understand the "Reverse Sorting" part of this - how is that supposed to work?
I know this is a long shot, but I'm really stuck on what they mean. Any help is appreciated!
Asked By : ctote
Answered By : Joe
The naming in the pseudo-code seems to be making things more confusing than they need to be, and it also seems to be glossing over some detail, but here's what I think is going on.
Let's just jump right in on a small example. Suppose we have a data set, the integers 0-7 in binary: $000, 001, 010, 011, 100, 101, 110, 111$. But obviously they're not necessarily given to you in sorted order. Let's say the input is actually: $110, 101, 000, 010, 100, 111, 001, 011$.
If all this data fits in the cache, then you just run normal radix sort. That's the first branch of the if
statement. For illustration, lets say that the cache can only hold two elements. Therefore, we end up in the else
branch. The "reverse sorting" step splits the input into two sub-buckets based on the most significant bit. So we get two buckets:
$1: (10, 01, 00, 11)$ and $0: (00, 10, 01, 11)$. We recursively call our function on each of these smaller buckets. Since they are smaller, they might now fit inside the cache, and we could run normal radix sort.
So now we're in the recursive call on the first sub-bucket. The input is $1:(10, 01, 00, 11)$. However, our cache size is only 2. Therefore we move into the else branch again. We split the input into $11:(0,1)$ and $10:(1, 0)$. and call the function recursively. Now finally on each of these recursive calls, the input is small enough to fit in the cache, and we run normal radix sort.
The other recursive calls function similarly. Their pseudo-code seems to be missing how the partitions are pieced back together in a cache conscious way. Maybe this is obvious or covered elsewhere in the paper.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/9547
0 comments:
Post a Comment
Let us know your responses and feedback