Book information
· Category: Foreign Books > Computers > Computer Engineering
· ISBN: 9781439815694
· Pages: 400
· Publication date: 2010-11-23
Table of contents
Introduction, David H. Bailey
Background
"Twelve Ways to Fool the Masses"
Examples from Other Scientific Fields
Guidelines for Reporting High Performance
Modern Performance Science
Parallel Computer Architecture, Samuel W. Williams and David H. Bailey
Introduction
Parallel Architectures
Processor (Core) Architecture
Memory Architecture
Network Architecture
Heterogeneous Architectures
Software Interfaces to Hardware Counters, Shirley V. Moore, Daniel K. Terpstra, and Vincent M. Weaver
Introduction
Processor Counters
Off-Core and Shared Counter Resources
Platform Examples
Operating System Interfaces
PAPI in Detail
Counter Usage Modes
Uses of Hardware Counters
Caveats of Hardware Counters
Measurement and Analysis of Parallel Program Performance using TAU and HPCToolkit, Allen D. Malony, John Mellor-Crummey, and Sameer S. Shende
Introduction
Terminology
Measurement Approaches
HPCToolkit Performance Tools
TAU Performance System
Trace-Based Tools, Jesus Labarta
Introduction
Tracing and Its Motivation
Challenges
Data Acquisition
Techniques to Identify Structure
Models
Interoperability
The Future
Large-Scale Numerical Simulations on High-End Computational Platforms, Leonid Oliker, Jonathan Carter, Vincent Beckner, John Bell, Harvey Wasserman, Mark Adams, Stephane Ethier, and Erik Schnetter
Introduction
HPC Platforms and Evaluated Applications
GTC: Turbulent Transport in Magnetic Fusion
GTC Performance
OLYMPUS: Unstructured FEM in Solid Mechanics
Carpet: Higher-Order AMR in Relativistic Astrophysics
CASTRO: Compressible Astrophysics
MILC: Quantum Chromodynamics
Performance Modeling: The Convolution Approach, David H. Bailey, Allan Snavely, and Laura Carrington
Introduction
Applications of Performance Modeling
Basic Methodology
Performance Sensitivity Studies
Analytic Modeling for Memory Access Patterns Based on Apex-MAP, Erich Strohmaier, Hongzhang Shan, and Khaled Ibrahim
Introduction
Memory Access Characterization
Apex-MAP Model to Characterize Memory Access Patterns
Using Apex-MAP to Assess Processor Performance
Apex-MAP Extension for Parallel Architectures
Apex-MAP as an Application Proxy
Limitations of Memory Access Modeling
The Roofline Model, Samuel W. Williams
Introduction
The Roofline
Bandwidth Ceilings
In-Core Ceilings
Arithmetic Intensity Walls
Alternate Roofline Models
End-to-End Auto-Tuning with Active Harmony, Jeffrey K. Hollingsworth and Ananta Tiwari
Introduction
Overview
Sources of Tunable Data
Search
Auto-Tuning Experience with Active Harmony
Languages and Compilers for Auto-Tuning, Mary Hall and Jacqueline Chame
Language and Compiler Technology
Interaction between Programmers and Compiler
Triage
Code Transformation
Higher-Level Capabilities
Empirical Performance Tuning of Dense Linear Algebra Software, Jack Dongarra and Shirley Moore
Background and Motivation
ATLAS
Auto-Tuning for Multicore
Auto-Tuning for GPUs
Auto-Tuning Memory-Intensive Kernels for Multicore, Samuel W. Williams, Kaushik Datta, Leonid Oliker, Jonathan Carter, John Shalf, and Katherine Yelick
Introduction
Experimental Setup
Computational Kernels
Optimizing Performance
Automatic Performance Tuning
Results
Flexible Tools Supporting a Scalable First-Principles MD Code, Bronis R. de Supinski, Martin Schulz, and Erik W. Draeger
Introduction
Qbox: A Scalable Approach to First-Principles Molecular Dynamics
Experimental Setup and Baselines
Optimizing Qbox: Step by Step
Customizing Tool Chains with PnMPI
The Community Climate System Model, Patrick H. Worley
Introduction
CCSM Overview
Parallel Computing and the CCSM
Case Study: Optimizing Interprocess Communication Performance in the Spectral Transform Method
Performance Portability: Supporting Options and Delaying Decisions
Case Study: Engineering Performance Portability into the Community Atmosphere Model
Case Study: Porting the Parallel Ocean Program to the Cray X1
Monitoring Performance Evolution
Performance at Scale
Tuning an Electronic Structure Code, David H. Bailey, Lin-Wang Wang, Hongzhang Shan, Zhengji Zhao, Juan Meza, Erich Strohmaier, and Byounghak Lee
Introduction
LS3DF Algorithm Description
LS3DF Code Optimizations
Test Systems
Performance Results and Analysis
Science Results
Bibliography
Index