CGO 2014 Tutorial: Programming Models and Compiler Optimizations for GPUs and Multicores

Programming Models and Compiler Optimizations for GPUs and Multicores

Full-day Tutorial at

2014 CGO Conference
February 15, 2014
Orlando, FL, USA

J. (Ram) Ramanujam
Division of Electrical and Computer Engineering
School of Electrical Engineering and Computer Science
and Center for Computation and Technology
Louisiana State University
Baton Rouge, LA 70803, USA

P. (Saday) Sadayappan
Department of Computer Science and Engineering
The Ohio State University
Columbus, OH 43210, USA

Audience

This tutorial is targeted primarily at application developers, computer/computational scientists, and graduate students interested in performance optimization issues and/or compilers for high-performance computing on general-purpose multicores and GPUs. Knowledge of C programming and basic knowledge of processor architectures will be assumed; no prior parallel programming experience or familiarity with source-to-source transformations is required.

Brief Description

On-chip parallelism with multiple cores is now ubiquitous. Because of power and cooling constraints, recent performance improvements in both general-purpose and special-purpose processors have come primarily from increased on-chip parallelism from multiple cores rather than increased clock rates. Parallelism is therefore of considerable interest to a much broader group than developers of parallel applications for high-end supercomputers. Several programming environments have recently emerged in response to the need to develop applications for graphics processing units (GPUs) and multicore processors. This tutorial will address the following topics:

What are the currently available programming models and APIs for explicit parallel programming of multicore CPUs and GPUs?

What are the fundamental issues in achieving a significant fraction of peak performance with multicore CPUs and GPUs?

What are some of the current efforts at providing more convenient high-level frameworks for programming GPUs? What are the compiler optimization challenges that these frameworks address?

This tutorial will provide an introductory survey covering these aspects.

Lecture Outline

Introduction

Multicore architectures and accelerators
Issues in performance and energy
GPUs
Explicitly managed memories
Programming Models:

Multi-threaded SIMT
Short-vector SIMD
Higher level models

GPU architectures and programming

GPU architectures
General-purpose computation on GPUs
Programming models and idioms
GPU programming models/environments:

OpenCL
CUDA
Directive-based programming: OpenACC

Code examples on GPUs
Examples of CPU vs. GPU performance

Compiler optimizations and tuning for multicore

Brief review of data dependences, transformations
Polyhedral models, tiling, parametric tiling
Discussion of Pluto compiler for multicores
Locality and parallelism optimizations
SIMDization
Compiler-driven tuning

Compiler optimizations and tuning for GPUs

Performance characterization
Optimizing memory accesses
Multi-level parallelism exploitation
Tiling strategies for GPUs
Compiler-driven tuning
Optimization strategies for GPUs in:

R-Stream
Par4All
PPCG
PGI Accelerator
CAPS HMPP

Examples of application optimization

Tutorial Speakers

J. (Ram) Ramanujam received the B.Tech. degree in Electrical Engineering from the Indian Institute of Technology, Madras, India in 1983, and his M.S. and Ph.D. degrees in Computer Science from The Ohio State University, USA in 1987 and 1990, respectively. He is currently the John E. and Beatrice L. Ritter Distinguished Professor in the School of Electrical Engineering and Computer Science (ECE Division) at Louisiana State University (LSU), USA. In addition, he holds a joint faculty appointment with the LSU Center for Computation and Technology, where he leads the Systems Science and Engineering Focus area. His research interests are in compilers and runtime systems for high-performance computing, domain-specific languages and compilers for parallel computing, embedded systems, and high-level hardware synthesis. He has participated in several NSF-funded projects including the Tensor Contraction Engine and the Pluto project for automatic parallelization. Additional details can be found at http://www.ece.lsu.edu/jxr/.

P. (Saday) Sadayappan received the B.Tech. degree from the Indian Institute of Technology, Madras, India, and an M.S. and Ph.D. from the State University of New York at Stony Brook (now Stony Brook University), USA, all in Electrical Engineering. He is currently a Professor in the Department of Computer Science and Engineering at The Ohio State University, USA. His research interests include compiler/runtime optimization for parallel computing, and domain-specific languages for high-performance scientific computing. He has led several NSF-funded projects including the Tensor Contraction Engine and the Pluto project for automatic parallelization. Additional details can be found at http://www.cse.ohio-state.edu/~saday/.
