The execution of transaction-processing-like jobs on a distributed-memory system can be represented by (possibly cycles of) a processing phase followed by a communication phase (to access information in a distributed database or to exchange information with other jobs). For simplicity, it is assumed that the information is distributed uniformly over the nodes of the system, so all nodes are accessed with the same probability. It is also assumed that the job processing times are exponentially distributed, so they can be characterized by a single parameter, the average processing time (this assumption is not very important and is made primarily to simplify job descriptions). Similarly, it is assumed that the execution times of remote operations requested by a job are also exponentially distributed, with another parameter describing their average durations. It appears that the specific values of the average processing time and the average operation time are not as important as their sum which specifies the "service demand" for processors.
The utilization of processors, in a multiprocessor system, is analyzed as a function of the number of available jobs and the ratio of computation to communication time. Processor utilization is directly related to the throughput of each of the processors as well as the throughput of the whole system, so it is a convenient performance measure.
Each node of a multiprocessor system has a processor, its local memory, and two network interfaces, for incoming and outgoing messages. The delays of messages forwarded in the interconnecting network are usually associated with the switches that control the transfer of messages from one node to another. It is assumed that these delays are constant since they are rather insensitive to the length of messages, at least for reasonably short messages.
The utilizations of components in complex systems are directly related to service demands for these components. If the service demands for all components of a system are equal, the system is called "balanced" as the utilizations of all components are also equal. The service demand for each component is the product of the rate of requests (sometimes also called the "visit rate") and the (average) service time of this component.
Two (similar) systems are equivalent (in the strong sense) with respect to performance (or strongly performance equivalent) if the service demands for all their corresponding components are the same. A straightforward consequence of this definition is that the utilizations of corresponding components in strongly performance-equivalent systems are the same.
Two (similar) systems are equivalent (in the weak sense) with respect to performance (or weakly performance equivalent) if the maximum service demands for the corresponding components of the two systems are the same. Since the maximum service demands identify performance bottlenecks of the system, and the bottlenecks determine the performances of the entire systems, the overall performances of weakly equivalent systems are still the same, but the utilizations of some corresponding components can be different.
The concept of performance-equivalent systems can be used to simplify simulation-based performance analysis of multiprocessor systems (as well as other systems which have a "modular" structure). More specifically, since the simulation time required for simulation-based analysis of multiprocessor systems depends (superlinearly) upon the number of processors, instead of simulating a system containing N processors, a much simpler performance-equivalent system can be used, significantly reducing the required simulation time, and providing reasonably accurate results. For performance analysis of a 16-processor system, a 4-processor system can be used with the same (average) job execution times, but with the switch delay ts adjusted to the value which compensates the differences in the (average) message transfer times in the 16-processor and 4-processor systems, i.e., such that:
Similarly, a 64-processor system with a hypercube interconnecting network is performance equivalent to a 16-processor system with a two-dimensional torus-like interconnecting network (Fig.1) in which the switch delays are 1.5 greater than those in the 64-processor system:
A 64-processor system with the hypercube interconnection network is performance-equivalent to a 64-processor system with a three-dimensional torus-like interconnecting network with the same switch delays, and so on.
Fig.1 shows the utilization of the processors, in a 16-processor system, as a function of the number of available jobs.
|
| Fig.1. Processor utilization -- 16 processors (2D torus). |
The utilization of the processors in a performance-equivalent 4-processor system is shown in Fig.2.
|
| Fig.2. Processor utilization -- 4 processors (2D torus). |
The results shown in Fig.2 are quite similar to Fig.1, with differences not exceeding a few percent, while the reduction of the simulation time is more than an order of magnitude.
Fig.3 shows the utilization of processors in a 32-processor system with a hypercube interconnecting network which is performance-equivalent to a 16-processor system connected by a two-dimensional torus-like network.
|
| Fig.3. Processor utilization -- 32 processors (hypercube). |
The results shown in Fig.3 are practically the same as in Fig.1.
Research topics include:
Prev Page
|
Up to Project Page
|
Copyright by W.M. Zuberek. All rights reserved.
Revised: 2004.07.25 :
Notice: Use of undefined constant COUNTNAME - assumed 'COUNTNAME' in /users/cs/faculty/wlodek/.www/research/proj-multi-arch-b.php on line 165
Notice: Use of undefined constant COUNTER - assumed 'COUNTER' in /users/cs/faculty/wlodek/.www/research/counter.php on line 21
1118