High Performance Computing
Programming and Applications
John Levesque; Gene Wagenbreth
- Our price: 1435,- (Hardcover)
Free shipping!
Delivery time: Ships within 21 days
High Performance Computing: Programming and Applications presents techniques that address new performance issues in the programming of high performance computing (HPC) applications. Omitting tedious details, the book discusses hardware architecture concepts and programming techniques that are the most pertinent to application developers for achieving high performance. Even though the text concentrates on C and Fortran, the techniques described can be applied to other languages, such as C++ and Java.
Drawing on their experience with chips from AMD and with systems, interconnects, and software from Cray Inc., the authors explore the problems that create bottlenecks in attaining good performance. They cover techniques for each of the three levels of parallelism (see the sketch after this list):
- Message passing between the nodes
- Shared memory parallelism on the nodes or the multiple instruction, multiple data (MIMD) units on the accelerator
- Vectorization on the inner level
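The sketch below (a hypothetical illustration, not code from the book or its companion website) shows how the three levels compose in one C program: MPI passes messages between nodes, an OpenMP directive spreads the loop across the cores of a node, and the stride-1 inner loop body is one a vectorizing compiler can map onto SSE instructions. The array size N and the sum reduction are illustrative assumptions.

/* Hypothetical hybrid example: MPI between nodes, OpenMP across the
   cores of a node, and a vectorizable stride-1 inner loop.
   Typical build (varies by system): mpicc -fopenmp -O3 hybrid.c -o hybrid */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000000             /* per-rank problem size (assumed) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);   /* level 1: message passing between nodes */
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    for (int i = 0; i < N; i++) {
        b[i] = (double)i;
        c[i] = 2.0 * (double)i;
    }

    double local = 0.0, global = 0.0;
    /* level 2: shared-memory parallelism across the cores of the node */
#pragma omp parallel for reduction(+:local)
    for (int i = 0; i < N; i++) {
        a[i] = b[i] + c[i];   /* level 3: stride-1 body the compiler can vectorize */
        local += a[i];
    }

    /* combine the per-rank partial sums across all nodes */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum = %f\n", global);

    free(a); free(b); free(c);
    MPI_Finalize();
    return 0;
}

On a Cray-style MPP such a code would typically be launched with one MPI rank per node and OMP_NUM_THREADS set to the node's core count; the exact launcher (aprun, mpirun, etc.) and flags vary by system.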
After discussing architectural and software challenges, the book outlines a strategy for porting and optimizing an existing application to a large massively parallel processor (MPP) system. With a look toward the future, it also introduces the use of general purpose graphics processing units (GPGPUs) for carrying out HPC computations. A companion website at www.hybridmulticoreoptimization.com contains all the examples from the book, along with updated timing results on the latest released processors.
- FACTS
Published: 2010
Publisher: Chapman & Hall/CRC
Binding: Hardcover
Language: English
Pages: 244
ISBN: 9781420077056
Format: 23 x 16 cm
- CONTENTS
Multicore Architectures
MEMORY ARCHITECTURE
SSE INSTRUCTIONS
HARDWARE DESCRIBED IN THIS BOOK
The MPP: A Combination of Hardware and Software
TOPOLOGY OF THE INTERCONNECT
INTERCONNECT CHARACTERISTICS
THE NETWORK INTERFACE COMPUTER
MEMORY MANAGEMENT FOR MESSAGES
HOW MULTICORES IMPACT THE PERFORMANCE OF THE INTERCONNECT
How Compilers Optimize Programs
MEMORY ALLOCATION
MEMORY ALIGNMENT
VECTORIZATION
PREFETCHING OPERANDS
LOOP UNROLLING
INTERPROCEDURAL ANALYSIS
COMPILER SWITCHES
FORTRAN 2003 AND ITS INEFFICIENCIES
SCALAR OPTIMIZATIONS PERFORMED BY THE COMPILER
Parallel Programming Paradigms
HOW CORES COMMUNICATE WITH EACH OTHER
MESSAGE PASSING INTERFACE
USING OPENMP
POSIX THREADS
PARTITIONED GLOBAL ADDRESS SPACE LANGUAGES (PGAS)
COMPILERS FOR PGAS LANGUAGES
THE ROLE OF THE INTERCONNECT
A Strategy for Porting an Application to a Large MPP System
GATHERING STATISTICS FOR A LARGE PARALLEL PROGRAM
Single Core Optimization
MEMORY ACCESSING
VECTORIZATION
SUMMARY
Parallelism across the Nodes
APPLICATIONS INVESTIGATED
LESLIE3D
PARALLEL OCEAN MODEL (POP)
SWIM
S3D
LOAD IMBALANCE
COMMUNICATION BOTTLENECKS
OPTIMIZATION OF INPUT AND OUTPUT (I/O)
Node Performance
APPLICATIONS INVESTIGATED
WUPWISE
SWIM
MGRID
APPLU
GALGEL
APSI
EQUAKE
FMA-3D
ART
AMMP
SUMMARY
Accelerators and Conclusion
ACCELERATORS
CONCLUSION
Appendix A: Common Compiler Directives
Appendix B: Sample MPI Environment Variables
References
Index
Exercises appear at the end of each chapter.
John Levesque works in the Chief Technology Office at Cray Inc., where he is responsible for application performance on Cray's HPC systems. He is also director of Cray's Supercomputing Center of Excellence at Oak Ridge National Laboratory (ORNL). ORNL was the first site to install a petaflop Cray XT5 system, Jaguar; as of June 2010, it is the fastest computer in the world according to the TOP500 list.
For the past 40 years, Mr. Levesque has optimized scientific application programs for successful HPC systems. He is an expert in application tuning and compiler analysis of scientific applications.
Gene Wagenbreth is a senior system programmer in the Information Sciences Institute at the University of Southern California, where he is applying GPGPU technology in sparse matrix solvers, image tomography, and real-time computational fluid dynamics. He also presents courses on the use and programming of GPUs.
Since the 1970s, Mr. Wagenbreth has worked with most of the highest performance computers, including Cray models, other vector processors, hypercubes, and clusters. He has worked with shared and distributed memory computers using MPI, OpenMP, pthreads, and other techniques. He has also applied parallel processing in numerous fields, including seismic analysis, reservoir simulation, weather forecasting, and battlefield simulations.