Suppose that we have a random generator that outputs numbers in the range $[0..R-1]$ with uniform distribution and we need to generate random numbers in the range $[0..N-1]$ with uniform distribution.
Suppose that $N < R$ and $N$ does not evenly divide $R$; in order to get a truly uniform distribution we can use the rejection sampling method:
- if $k$ is the greatest integer such that $k N < R$
- pick a random number $r$ in $[0..R-1]$
- if $r < k N$ then output $r \mod N$, otherwise keep trying with other random numbers r', r", ... until the condition is met
Is rejection sampling the only way to get a truly uniform discrete distribution?
If the answer is yes, why?
Note: if $N > R$ the idea is the same: generate a random number $r'$ in $[0..R^m-1], R^m >= N$, for example $r' = R(...R(R r_1 + r_2)...)+r_m$ where $r_i$ is a random number in the range $[0..R-1]$
Asked By : Vor
Answered By : Gilles
Yes and no, depending on what you mean by "the only way". Yes, in that there is no method that is guaranteed to terminate, the best you can do (for generic values of $N$ and $R$) is an algorithm that terminates with probability 1. No, in that you can make the "waste" as small as you like.
Why guaranteed termination is impossible in general
Suppose that you have a deterministic computation engine (a Turing machine or whatever floats your boat), plus an oracle that generates random elements of the $R$-element set $[0..R-1]$. Your goal is to generate an element of the $N$-element set $[0,N-1]$. The output of your engine depends only on the sequence of values returned by the oracle; it is a function $f$ of that potentially infinite sequence $(r_0, r_1, r_2, \ldots)$.
Suppose that your engine calls the oracle at most $m$ times. There might be traces for which the oracle is called less than $m$ times; if so, calling the oracle extra times so that it is always called exactly $m$ times does not change the output. So without loss of generality, we assume that the oracle is called exactly $m$ times. Then the probability of the outcome $x$ is the number of sequences $(r_0, \ldots, r_{m-1})$ such that $f(r_0, \ldots, r_{m-1}) = x$. Since the oracle is a uniform random generator, each sequence is equiprobable and has the probability $1/R^m$. Hence the probability of each outcome is of the form $A/R^m$ where $A$ is an integer between $0$ and $R^m$.
If $N$ divides $R^m$ for some $m$, then you can generate a uniform distribution over $N$ elements by calling the random generator $m$ times (this is left as an exercise to the reader). Otherwise, this is impossible: there is no way to obtain an outcome with probability $1/N$. Note that the condition is equivalent to saying that all of $N$'s prime factors are also factors of $R$ (this is more permissive than what you wrote in your question; for example you can pick a random element among 4 with a 6-sided fair die, even though 4 does not divide 6).
Reducing the waste
In your strategy, when $r \ge k\,N$, you don't have to re-draw immediately. Intuitively, there's a bit of entropy left in $[k\,N .. R-1]$ which you can keep in the mix.
Assume for a moment that you will in fact keep generating random numbers below $N$ forever, and you generate $u$ of them at a time by making $d$ draws. If you do a straightforward rejection sampling on this grouped generation, the waste over $d$ draws is $\dfrac{R^d - k\,N^u}{d}$, i.e. the remainder $R^d \mathbin{\mathrm{mod}} N^u$ divided by the number of draws. This can be as little as $\gcd(R,N)$. When $R$ and $N$ are coprime, you can make the waste arbitrarily small by picking sufficiently large values of $d$. For general values of $R$ and $N$, the calculation is more complicated because you need to take into account the generation of $\gcd(R,N)$ and $N/\gcd(R,N)$ separately, but again you can make the waste arbitrarily small with large enough groups.
In practice, even with relatively inefficient random numbers (e.g. in cryptography), it is rarely worth doing anything but simple rejection sampling, unless $N$ is small. For example, in cryptography, where $R$ is typically a power of 2 and $N$ is typically hundreds or thousands of bits, uniform random number generation usually proceeds by straight rejection sampling in the desired range.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/2605
0 comments:
Post a Comment
Let us know your responses and feedback