World's most popular travel blog for travel bloggers.

[Solved]: Complexity of keeping track of $K$ smallest integers in a stream

, , No Comments
Problem Detail: 

I need to analyze the time complexity of an online algorithm to keep track of minimum $K$ numbers from a stream of $R$ numbers. The algorithm is

  • Suppose the $i$th number in the stream is $S_i$.
  • Keep a max heap of size $K$.
  • If the heap contains fewer than $K$ elements, add $S_i$ to the heap.
  • Otherwise: if $S_i$ is smaller than the maximum element in the heap (i.e., the root of the heap), replace the root of the heap with $S_i$ and apply Max-Heapify to the heap; if $S_i$ is greater than or equal to the max element, do nothing.

The problem now is to find the expected number of times the Max Heapify operation will be called, when the stream of integers is of length $R$ and each element of the stream is (iid) uniformly distributed on $[1,N]$.

If the stream were guaranteed to contain only distinct elements, then the answer is easy:

$$E[X] = E[X_1] + E[X_2] + \dots + E[X_R],$$

where $X_i$ is the random indicator variable for occurrence of the Max Heapify operation at the $i$th number in the stream. Now

$$E[X_i] = \Pr[\text{$S_i$ is ranked $\le K$ among first $i$ elements}] = \min(K/i, 1).$$

Hence,

$$E[X] = K + K/(K+1) + \cdots + K/R.$$

That case is relatively easy. But how do we handle the case where the elements are iid uniformly distributed?

[This was actually a Google interview question.]

Asked By : Piyush

Answered By : FrankW

If the value of the $i$-th element is $x$, it causes a heapify, if and only if $j < K$ of the previous $i-1$ elements are smaller than $x$. This gives

$$E[X_i] = \frac{1}{N^i} \sum_{x=1}^N \sum_{j=0}^{K-1} c_{j,x},$$

where $c_{j,x}$ is the number of sequences of $i-1$ elements, of which exactly $j$ are smaller than $x$.

Now, to determine $c_{j,x}$, note that there are $\binom{i-1}{j}$ possibilities for the positions of the smaller elements in the sequence. For each of the smaller elements there are $x-1$ possibilities for its value, while for the larger or equal elements there are $N-x+1$ possibilities. This yields
$c_{j,x} = \binom{i-1}{j} (x-1)^j (N-x+1)^{i-1-j}$, and thus overall (after shifting $i$ and $x$ by $1$)

$$E[X] = \sum_{i=0}^{R-1}\frac{1}{N^{i+1}} \sum_{x=0}^{N-1} \sum_{j=0}^{K-1} \binom{i}{j} x^j (N-x)^{i-j}.$$

Unfortunately, all my attempts at bringing this formula into a simpler/closed form have been unsuccessful.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/14661

0 comments:

Post a Comment

Let us know your responses and feedback