For distributed processing, the (numerous) equations are split into approximately equal groups allocated to different processors. The iterative process repeatedly executes the following three consecutive steps:
(1) the current approximation to the solution is distributed to all
processors,
(2) parts of the new approximation to the solution are determined
(concurrently) by the processors,
(3) the new approximation is collected from all processors and the
convergence is checked.
The first step typically uses a broadcast (or multicast) operation, and it can be assumed that its execution time, Tb, does not depend upon the number of processors (in fact, it usually depends on this number, but this dependence is ignored here as it is rather inessential). The last step requires a transfer from all processors to a single processor which performs the convergence check, so the transfers are performed sequentially. It is assumed that the total execution time of this step is equal to N*Tr, where Tr is the communication time of a single processor. Although Tr depends upon N, the dependence is not very strong, and is neglected here.
Let Ts denote the total (sequential) computation time of one iteration. The (approximate) time of a distributed execution of a single iteration is then:
For simplicity, it can also be assumed that Tr = Tb, which is an oversimplification, but not very significant, as it appears. With this additional assumption, the speedup is:
Let r denote the ratio Ts/Tr, i.e., the ratio of total (sequential) computation time (per iteration) to the communication associated with a single processor (such a ratio makes sense only for programs with cyclic behavior). Then the speedup becomes a function of two variables, N and r:
Fig.1 shows the values of S(N) for N = 2,...,50 and for R = 10,...,100.
![]() |
| Fig.1. Speedup of a distributed iterative solver. |
Reasonable speedups (around 5) can be obtained only when the value of r is sufficiently large. The best speedup is obtained for a rather small number of processors (5 to 10); for a larger number of processors, the execution time actually increases as it is dominated by the communication time.
For this application, when the number of processors is large, the communication time becomes the dominating component of the execution time. In order to reduce the communication time, the results can be collected in a hierarchical way, in which first the results of computations are collected in groups of, say, K processors, and then the results of groups are combined together. It can be shown that the number of groups that minimizes the total communication time is equal to sqrt(N), and then the speedup becomes:
Fig.2 shows the values of this modified S(N) for N = 2,...,50 and for r = 10,...., 100.
![]() |
| Fig.2. Speedup of a modified iterative solver. |
The relation between the speedup and the number of processors is similar in character, but differs in the details; better speedup values can be obtained then previously, and the speedup is less sensitive to the number of processors. For very large values of N, further improvement can be obtained by additional levels of the hierarchical collection of results.
Research topics include:
Up to Project Page
|
Next Page
|
Copyright by W.M. Zuberek. All rights reserved.
Revised: 2005.11.15 :
Notice: Use of undefined constant COUNTNAME - assumed 'COUNTNAME' in /users/cs/faculty/wlodek/.www/research/proj-distr-comp-a.php on line 130
Notice: Use of undefined constant COUNTER - assumed 'COUNTER' in /users/cs/faculty/wlodek/.www/research/counter.php on line 21
1361