
# [Solved]: Parallel algorithms on a fixed number of processors

Problem Detail:

Typical parallel algorithms require a number of processors that grows with the size of the input, usually $O(\log n)$ or $O(n)$.

This is fine from a theoretical point of view, but strange from a practical one, where the number of processors we can use is bounded by a constant. A corollary of Brent's theorem allows us to simulate a parallel machine with fewer processors, but it does not guarantee that the algorithm remains optimal.

Is there a branch of parallel algorithms that studies parallel algorithms that run on a number of processors bounded by a constant?

As commented below:

If the number of processors is bounded by a constant, then any sequential algorithm will be asymptotically optimal.

I think this fact implies that asymptotic optimality is not the best way to measure the performance of real-world parallel algorithms.

Is there another measure, such as speedup or cost (defined as execution time multiplied by the number of processors), that can tell us how much faster an algorithm running on a constant number of processors is?
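To make these measures concrete, here is a minimal sketch of speedup, cost, and the derived efficiency measure; the timings and processor count are made-up illustrative numbers, not results from any real benchmark:

```python
def speedup(t_serial, t_parallel):
    """Speedup: how many times faster the parallel run is."""
    return t_serial / t_parallel

def cost(t_parallel, p):
    """Cost: execution time multiplied by the number of processors."""
    return t_parallel * p

def efficiency(t_serial, t_parallel, p):
    """Efficiency: speedup per processor; 1.0 would be ideal."""
    return speedup(t_serial, t_parallel) / p

# Hypothetical timings for a fixed input on P = 4 processors.
t1, t4, p = 100.0, 30.0, 4
print(speedup(t1, t4))        # ~3.33, short of the ideal 4x
print(cost(t4, p))            # 120.0, vs. a serial cost of 100.0
print(efficiency(t1, t4, p))  # ~0.83
```

Note that cost exceeding the serial running time is exactly the gap that asymptotic analysis with constant $P$ cannot see.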

A related question is: How to scale down parallel complexity results to constantly many cores?

#### Answered By : Wandering Logic

There are several models, widely used in practice, that might be of interest to you. Almost all of them can be viewed as generalizations of Amdahl's law. They take two parameters, $P$ (the number of processing elements) and $N$ (the size of the input). Two conferences in which you are likely to find these types of analyses are the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA) and the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP).

The goal of these models is generally to understand, for a specific algorithm, the tradeoff between the increased parallelism available from a larger $P$ and the increased synchronization and communication costs that come along with it.

The first one that got major attention was

Valiant, Leslie G: A bridging model for parallel computation, Communications of the ACM, 33(8):103-111, 1990.

Valiant assumes that your algorithm alternates between phases of completely parallel work and barrier-synchronization phases during which the processors communicate. Each computation phase is assumed to take time inversely proportional to $P$. For the synchronization phases you usually want to take into account load imbalance, and the latency of the communication phases depends both on the structure of your communication network and on how good a job you can do at minimizing the distance between communicating processors.
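The standard way to write this down is the BSP superstep cost $\max_i w_i + h \cdot g + l$, where $w_i$ is processor $i$'s compute time, $h$ is the maximum number of words any processor sends or receives, $g$ is the per-word communication gap, and $l$ is the barrier latency. A tiny sketch (all parameter values are illustrative, not measured):

```python
def superstep_cost(work_per_proc, h, g, l):
    """BSP cost of one superstep: the slowest processor's compute time
    (load imbalance shows up as the max), plus h*g for communication,
    plus the barrier-synchronization latency l."""
    return max(work_per_proc) + h * g + l

# Illustrative: 4 processors with slightly imbalanced work, each
# exchanging at most h = 10 words; g and l are machine parameters.
print(superstep_cost([25, 28, 26, 30], h=10, g=2.0, l=15.0))  # 65.0
```

The total predicted running time is just the sum of this cost over all supersteps, which is what makes the model convenient for paper-and-pencil analysis.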

The paper

Culler, D; Karp, R; Patterson, D; Sahay, A; Schauser, KE; Santos, E; Subramonian, R; von Eicken, T: $LogP$: Towards a Realistic Model of Parallel Computation, ACM Symp Principles and Practice of Parallel Programming, (PPOPP-4):1-12, 1993.

built on Valiant's model by carefully taking into account the $L$atency between nodes, the $o$verhead of sending a message, and the $g$ap required between send operations (a measure of bandwidth), in addition to $P$. This is too detailed for some uses, but can be useful in cases where it is important for the algorithm designer to choose between many small messages and fewer long messages.
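For instance, under LogP a burst of $k$ small messages from one sender costs roughly $(k-1)g + L + 2o$, while the LogGP extension (Alexandrov et al.) adds a per-byte gap $G$ for long messages. A sketch of that comparison, with purely illustrative parameter values:

```python
def time_k_messages(k, L, o, g):
    """LogP estimate for k small back-to-back messages from one sender:
    successive sends are spaced by the gap g; the last message then
    takes L to cross the network, plus overhead o at each end."""
    return (k - 1) * g + L + 2 * o

def time_bulk(nbytes, L, o, G):
    """LogGP-style estimate for one long message of nbytes, with a
    per-byte gap G (parameter values illustrative)."""
    return L + 2 * o + (nbytes - 1) * G

# Ten 1-word messages vs. one 10-word message (made-up machine params).
print(time_k_messages(10, L=5.0, o=1.0, g=2.0))   # 25.0
print(time_bulk(10, L=5.0, o=1.0, G=0.5))         # 11.5
```

This is exactly the kind of small-messages-versus-bulk-transfer decision the model is meant to inform.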

Finally I'll mention

Blumofe, Robert D; Leiserson, Charles E: Scheduling Multithreaded Computations by Work Stealing, IEEE FOCS, 1994.

which gives a small generalization of Amdahl's law: execution time is bounded below by $T_P \ge \max(T_1/P, T_\infty)$, and a work-stealing scheduler achieves $T_P = O(T_1/P + T_\infty)$ in expectation, where $T_1$ is the minimum serial execution time and $T_\infty$ is the minimum execution time with an infinite (i.e., arbitrarily large) number of processors and no communication delays. (Another way to look at $T_\infty$ is that it is the critical path through the algorithm.)
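The work/span bounds are easy to evaluate numerically; here is a sketch with made-up values of $T_1$ and $T_\infty$ showing how the binding constraint shifts as $P$ grows:

```python
def lower_bound(t1, t_inf, p):
    """No schedule can beat the work law T_P >= T_1/P or the
    span (critical-path) law T_P >= T_inf."""
    return max(t1 / p, t_inf)

def greedy_bound(t1, t_inf, p):
    """Upper bound achieved (to within constant factors) by greedy
    and work-stealing schedulers: T_P <= T_1/P + T_inf."""
    return t1 / p + t_inf

# Illustrative: T_1 = 1000, T_inf = 50. At P = 4 the work term
# dominates; at P = 100 the critical path does, so adding more
# processors beyond T_1/T_inf = 20 buys almost nothing.
for p in (4, 100):
    print(p, lower_bound(1000.0, 50.0, p), greedy_bound(1000.0, 50.0, p))
```

The ratio $T_1/T_\infty$ is the algorithm's parallelism; once $P$ exceeds it, the span term dominates and a constant number of processors is, in effect, all the algorithm can use.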