I'm trying to understand a proof involving radix sort, but to no avail.
I'll first write a brief summary of the proof and then ask some questions which I hope can be clarified.
Suppose you have an array of integers in the range $\{0,1,2,\ldots, n \log n\}$. Give and explain an algorithm, efficient in the worst case, that determines whether two elements have the same value. Note: you may use extra memory of order $O(n)$.
I don't really care what the algorithm looks like. The idea of the proof is to radix-sort the array in $O(n)$ and then, since the array is sorted, scan it for two adjacent elements with the same value.
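For concreteness, here is a minimal sketch of that idea (my own illustration, not from the proof): an LSD radix sort in base $n$, so two counting-sort passes cover every value below $n^2$, followed by an adjacent-pair scan. The function name `has_duplicate` and its structure are my assumptions.

```python
def has_duplicate(arr):
    """Detect duplicates in O(n): LSD radix sort in base n (two digit
    passes cover values up to n^2 - 1), then scan adjacent pairs."""
    n = len(arr)
    if n < 2:
        return False
    base = n  # 2^a buckets with a = log2 n, i.e. base n

    def counting_pass(a, digit):
        """Stable counting sort on one base-n digit."""
        count = [0] * base
        for x in a:
            count[(x // base**digit) % base] += 1
        # prefix sums -> starting index of each bucket
        total = 0
        for i in range(base):
            count[i], total = total, total + count[i]
        out = [0] * len(a)
        for x in a:
            d = (x // base**digit) % base
            out[count[d]] = x
            count[d] += 1
        return out

    for digit in range(2):  # values < n^2 need two base-n digits
        arr = counting_pass(arr, digit)
    return any(arr[i] == arr[i + 1] for i in range(len(arr) - 1))
```

Each pass is a stable counting sort over $n$ buckets, so the whole thing is $O(n)$ time and $O(n)$ extra memory, matching the constraint in the question.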
The proof outline:
Suppose we work over the larger domain $\{0,1,2,\ldots, n^2 - 1\}$. Now we treat each number according to its binary representation and radix-sort it bit by bit.
Right after this comes the part I can't understand at all.
As we know, radix sort runs in $\Theta(d(n+k))$, so all we have to decide is which $a$ to choose so that the running time is $O(n)$, using the formula $\frac{2\log n}{a} (n + 2^a)$.
After this step, you just choose $a = \log n$ and you are done.
My questions are:
How did the writer arrive at the formula $\frac{2\log n}{a} (n + 2^a)$?
Also, why does the writer prefer to work over the domain $\{0,1,2,\ldots, n^2 - 1\}$ instead of the one given in the question?
Asked By : SyndicatorBBB
Answered By : William Macrae
In the formula $\Theta(d(n+k))$, $d$ is the number of digits (passes of the sort) and $k$ is the number of buckets (the base you are working in). So, if you were sorting the numbers 0 through 999 with 10 buckets, there would be $d=3$ passes (since the numbers have length 3 in base 10) and we would have $k=10$, which is the base.
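As a quick sanity check on that relationship (a helper of my own, not from the answer), $d$ is just the number of base-$k$ digits of the largest value:

```python
def num_digits(max_val, k):
    """Number of radix-sort passes d needed for values up to max_val
    in base k, i.e. the number of base-k digits of max_val."""
    d = 1
    while max_val >= k:
        max_val //= k
        d += 1
    return d

print(num_digits(999, 10))  # 3 passes with 10 buckets, as in the example
```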
If we leave the base $2^a$ to be determined later (with a power-of-two base we can just read $a$ bits at a time and let them determine the bucket), there are $\log_{2^a}(n^2) = \frac{\log_2 n^2}{\log_2(2^a)} = \frac{2 \log n}{a}$ passes, and $k = 2^a$ is just the base. So $d(n+k) = \frac{2 \log n}{a}(n + 2^a)$.
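Plugging in the choice $a = \log n$ mentioned in the question (a worked step I'm adding for completeness) makes the bound explicit:

$$\frac{2\log n}{\log n}\left(n + 2^{\log n}\right) = 2(n + n) = 4n = O(n).$$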
The expanded range is to make the math prettier. Above we took $\log n^2$, which worked out nicely to give a constant factor of 2 in our asymptotic. If we instead had $\log (n \log n)$, things would not be as nice. It should still work (possibly for a different choice of $a$), and if you're comfortable with big-O work you can do it as an exercise, but the proof is much cleaner when you don't have to argue that $O(\log n + \log \log n) = O(\log n)$.
Question Source : http://cs.stackexchange.com/questions/7051