World's most popular travel blog for travel bloggers.

Analyze Speed Up in A Dispatcher-Woker Model

, ,
Problem Detail:

In a Dispatcher-Worker model of parallel computation, we have $N$ worker machines simultaneously working on the task (for example, computing the checksum of network packets). There is no synchronization between these workers. A dispatcher distributes total amount of work $S$ to these workers.

Now consider the problem of analyzing the speed up of this model. To simplify the situation, assume the dispatcher evenly distributes work $\frac{S}{N}$ to each worker. And each worker finishes it within $X_i \sim \mathcal{N}(\frac{S}{N}, \sigma^2)$ time, where $\mathcal(\mu, \sigma^2)$ is the Gaussian distribution. Then I think the mathematical formula for the total amount of time needed is $$Y = \max(X_1, X_2, ..., X_N)$$

The question is how to compute the distribution of $Y$, so that analytically we could understand the parallel time needed, in terms of the number of workers $N$.

Here is how you can compute its distribution. Let $F(t) = \Pr[X_i \le t]$ (the cumulative distribution function of $X_i$), and $G(t) = \Pr[Y \le t]$ (the cumulative distribution function of $Y$). Then

$$G(t) = \Pr[Y \le t] = \Pr[X_1 \le t, \dots, X_N \le t] = \Pr[X_1 \le t] \times \dots \times \Pr[X_N \le t] = F(t)^N.$$

Now the probability density function of $Y$, $g(t)$, is the derivative of the cumulative distribution function, so

$$g(t) = G'(t) = N F(t)^{N-1} f(t),$$

where $f(t)$ is the probability density function of $X_i$. This directly gives you the distribution of $Y$: namely, $g(\cdot)$ is the distribution of $Y$.

This calculation also gives you a good way to compute tail bounds. For instance, if you want to compute the probability that the whole process needs more than $t$ time, i.e., that $Y > t$, you can compute this as $1-G(t) = 1 - F(t)^N$.

See also http://math.stackexchange.com/q/89030/14578 for other useful facts about the distribution of $Y$.