A source provides a stream of items $x_1, x_2,\dots$ . At each step $n$ we want to save a random sample $S_n \subseteq \{ (x_i, i)|1 \le i \le n\}$ of size $k$, i.e. $S_n$ should be a uniformly chosen sample from all $\tbinom{n}{k}$ possible samples consisting of seen items. So at each step $n \ge k$ we must decide whether to add the next item to $S$ or not. If so we must also decide which of the current items to remove from $S$ .
State an algorithm for the problem. Prove its correctness.
Asked By : user1374864
Answered By : Raphael
Due to the dubious nature of the question, I only provide hints.
Have you tried the obvious? With probability $\frac{1}{n}$, add the new element to the sample. If it is added, choose one of the elements already in the sample uniformly at random and drop it. Sounds about fair, does it not?
For a proof, you will have to proceed inductively. In the step, you assume that $S_{n-1}$ is indeed a uniform sample. From this and the way you choose whether to include $x_n$ and which element to drop, you have to get that $S_n$ is a uniform sample, too.
Try if you can prove above idea correct. If it is not, find out where the problem is and fix it. See this answer to a similar question for a detailed application of this technique.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/1923
0 comments:
Post a Comment
Let us know your responses and feedback