Multicore Processors and GPUs: Programming Models and Compiler Optimizations

Half-day Tutorial at

20th International Conference on Parallel Architectures and Compilation Techniques (PACT)
October 14, 2011
Galveston Island, TX, USA

J. (Ram) Ramanujam
Department of Electrical and Computer Engineering
and Center for Computation and Technology
Louisiana State University
Baton Rouge, LA 70803, USA


P. (Saday) Sadayappan
Department of Computer Science and Engineering
The Ohio State University
Columbus, OH 43210, USA



 
Audience

This tutorial is targeted primarily at application developers, computer/computational scientists, and graduate students interested in performance optimization issues and/or compilers for high-performance computing on general-purpose multicores and GPUs. Knowledge of C programming and a basic understanding of processor architectures will be assumed; no prior parallel programming experience or familiarity with source-to-source transformations is required.

 
Brief Description

On-chip parallelism with multiple cores is now ubiquitous. Because of power and cooling constraints, recent performance improvements in both general-purpose and special-purpose processors have come primarily from increased on-chip parallelism rather than increased clock rates. Parallelism is therefore of considerable interest to a much broader group than developers of parallel applications for high-end supercomputers. Several programming environments have recently emerged in response to the need to develop applications for GPUs and multi-core processors. There has been recent interest in two complementary approaches: programming models for expressing parallelism on multicore processors and GPUs, and compiler optimizations for generating efficient code for these architectures.

This tutorial will provide an introductory survey covering both of these aspects.

 
Lecture Outline


  1. Introduction
    • Multicore architectures and accelerators
    • Issues in performance and energy
    • GPUs
    • Explicitly managed memories
    • Programming models:
      • Thread based models
      • Data parallel models
      • Stream programming

  2. Compiler optimizations and tuning for multicore
    • Brief review of data dependences, transformations
    • Polyhedral models, tiling, parametric tiling
    • Locality and parallelism optimizations
    • Reducing synchronization overheads, scheduling
    • SIMDization
    • Compiler-driven tuning

  3. GPU architectures and programming
    • GPU architectures
    • General-purpose computation on GPUs
    • Programming models and idioms
    • GPU programming models/environments:
      • OpenCL
      • CUDA
      • CAL
      • PGI Accelerator
      • CAPS/HMPP
    • Code examples on GPUs
    • Examples of CPU vs. GPU performance

  4. Compiler optimizations and tuning for GPUs
    • Performance characterization
    • Optimizing memory accesses
    • Multi-level parallelism exploitation
    • Performance models and empirical search
    • Compiler-driven tuning
    • Examples of application optimization
    • Software managed memory hierarchies

 
Tutorial Speakers

J. (Ram) Ramanujam received the B.Tech. degree in Electrical Engineering from the Indian Institute of Technology, Madras, India in 1983, and his M.S. and Ph.D. degrees in Computer Science from The Ohio State University in 1987 and 1990, respectively. He is currently the John E. and Beatrice L. Ritter Distinguished Professor in the Department of Electrical and Computer Engineering at Louisiana State University (LSU). In addition, he holds a joint faculty appointment with the LSU Center for Computation and Technology. His research interests are in compilers and runtime systems for high-performance computing, domain-specific languages and compilers for parallel computing, embedded systems, and high-level hardware synthesis. He has participated in several NSF-funded projects, including the Tensor Contraction Engine and the Pluto project for automatic parallelization. Additional details can be found at http://www.ece.lsu.edu/jxr/.

P. (Saday) Sadayappan received the B.Tech. degree from the Indian Institute of Technology, Madras, India, and the M.S. and Ph.D. degrees from the State University of New York at Stony Brook, all in Electrical Engineering. He is currently a Professor in the Department of Computer Science and Engineering at The Ohio State University. His research interests include compiler/runtime optimization for parallel computing and domain-specific languages for high-performance scientific computing. He has led several NSF-funded projects, including the Tensor Contraction Engine and the Pluto project for automatic parallelization. Additional details can be found at http://www.cse.ohio-state.edu/~saday/.