Balancing the performance of block multithreaded distributed-memory
systems
Zuberek, W.M.
Simulation Modeling Practice and Theory, vol.19, pp.1318-1329, 2011
(ISSN 1569-190X, DOI 10.1016/j.simpat.2011.01.008)
Abstract:
The performance of modern computer systems is increasingly often limited by
long latencies of accesses to the memory subsystems. Instruction-level
multithreading is an architectural approach to tolerating such long latencies
by switching instruction threads rather than waiting for the completion of
memory operations. The paper studies performance limitations in
distributed-memory block multithreaded systems and determines conditions
for such systems to be balanced. Event-driven simulation of a timed Petri
net model of a simple distributed-memory system confirms the derived
performance results.
Keywords:
Block multithreading, distributed-memory systems, performance analysis,
balanced systems, timed Petri nets.