References for material covered in lectures and needed to complete the assignments. This page will be updated as the semester progresses.
When available, links are provided to full-text versions of the material. If a dialog pops up asking for a username and password, use “ee4720” as the username and use the password given in class. This does not apply to the ACM Digital Library material. Some of the material is copyrighted and requires a subscription (e.g., ACM Digital Library) or one-time payment for access. LSU has a site subscription to the ACM DL (as of this writing).
Graphics Processor APIs
CUDA-Related Documentation and Non-Graphical Programming
NVIDIA CUDA Programming Guide
CUDA is a system for using NVIDIA GPUs for computation. The computation
might be part of a scientific or engineering simulation (the most
common application).
This programming guide describes CUDA itself but also provides
details of some NVIDIA GPUs.
PTX, NVIDIA's GPU Pseudo Assembly Language
PTX is an intermediate language (or pseudo assembly language) generated
by the CUDA compiler front end.
Though it resembles assembly language, it
should not be used for very tight optimization because PTX and native machine
instructions can differ significantly and because a substantial
amount of optimization is performed by the later compilation stages.
That said, PTX is useful when it is important that code use
particular machine instructions.
PTX is converted to the true machine language of the GPU,
informally called SASS, by other NVIDIA build tools.
Disassembly Tools and GPU Instruction Set Description
PTX is a pseudo assembly language (or intermediate language) generated
by the CUDA compiler. The languages described in this reference
are the true machine languages. They should be used in place of
PTX for understanding the GPU microarchitecture and for fine-tuning
code. The reference describes the languages themselves and the tools for
generating assembly listings from CUDA binaries (disassembling).
OpenGL
The OpenGL 4.6 Compatibility Profile Specification
A description of the OpenGL API for using GPU hardware. The phrase
“compatibility profile” in the name refers to the version of the
specification that includes API features that are not needed for
modern GPUs but are preserved for software compatibility and perhaps
as a learning aid.
NVIDIA OpenGL Extension Specifications
OpenGL extensions allow a GPU manufacturer to provide OpenGL access
to new features without having to wait for a new version of the
OpenGL language to be approved. The extensions in this
reference are quite old; most have since been incorporated into OpenGL.
Consider this a historical reference.
Shader Language APIs
OpenGL Shading Language 4.60.5
A C-like language used to program GPUs. OpenGL Shading Language (GLSL) code
is managed using the OpenGL API. GLSL code can implement parts
of the rendering pipeline (such as a vertex shader) or can
run independently of the rendering pipeline (as a compute shader).
Survey of Shader Languages
A good survey of shader language APIs, though a bit outdated.
Also available from the
XEngine project website.
GPU Algorithms
Algorithm Components and Comparisons
GPU v. CPU Comparison of Many Common Algorithms
A comparison of the performance of many common scientific, media,
and other algorithms on an NVIDIA CC 1.3 GPU and an Intel i7 quad-core
processor. Factors responsible for CPU or GPU advantage
are explored. Though the title uses the attention-getting word
"debunking," the paper concentrates on understanding how these codes
execute.
GPU Algorithm Building-Blocks (Patterns)
An examination of GPU program building blocks (such as organizing
data to avoid underutilized requests) and their effectiveness
over several GPU generations.
Algorithms
Matrix Multiplication, Factoring
Description of dense matrix multiplication and other algorithms
implemented on CUDA for the NVIDIA G80 series. The paper starts with
detailed benchmark results and uses these to determine the performance
potential of the various matrix operations.
The CC 1.x (Tesla-generation) GPU that they target is obsolete,
but the methodology used in the paper, machine-code level analysis
to maximize instruction density, is still valuable, as are the
overall methods for data staging.
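As a rough illustration of what data staging means here, the following is a minimal CPU-side Python sketch of a blocked (tiled) matrix multiplication. It is not the paper's CUDA code, and the name `blocked_matmul` and the `tile` parameter are hypothetical. On a GPU each tile would typically be staged in on-chip storage (shared memory or registers) so that every element fetched from off-chip memory is reused many times.

```python
def blocked_matmul(a, b, tile=2):
    """Multiply a (n x k) by b (k x m), both lists of lists, one tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # "Stage" one tile of each operand. In this sketch the copy
                # stands in for a load into fast on-chip memory.
                a_tile = [row[k0:k0 + tile] for row in a[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in b[k0:k0 + tile]]
                # Every staged element is reused across the whole tile.
                for i, a_row in enumerate(a_tile):
                    for j in range(len(b_tile[0])):
                        acc = 0
                        for kk, a_val in enumerate(a_row):
                            acc += a_val * b_tile[kk][j]
                        c[i0 + i][j0 + j] += acc
    return c
```

For example, `blocked_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` gives the same product as an untiled triple loop; only the order in which operands are touched changes.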
Sorting (Radix and Merge)
The paper describes the implementation of two sorting algorithms
on NVIDIA GPUs, a radix sort and a merge sort. The implementations
are carefully tuned to make the most efficient use of the
GPU. (In a perfect world this could be said about every published
algorithm.) The source code is part of the CUDA SDK and is
presently available on the ECE computers in directory
/home/classes/ee7700/com/nvidia-gpu-sdk/C/src/radixSort.
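For readers unfamiliar with radix sort, here is a minimal sequential Python sketch of the textbook least-significant-digit formulation. It is not taken from the paper or from the SDK source; the function name and the `bits_per_pass` and `key_bits` parameters are hypothetical. Each pass consists of a histogram, an exclusive prefix sum, and a stable scatter, which are the steps that a parallel implementation would need to distribute across threads.

```python
def radix_sort(keys, bits_per_pass=4, key_bits=32):
    """Sort non-negative integers below 2**key_bits, least significant digit first."""
    radix = 1 << bits_per_pass
    mask = radix - 1
    for shift in range(0, key_bits, bits_per_pass):
        # Histogram: count how many keys have each digit value.
        counts = [0] * radix
        for key in keys:
            counts[(key >> shift) & mask] += 1
        # Exclusive prefix sum: first output slot for each digit value.
        offsets = [0] * radix
        for d in range(1, radix):
            offsets[d] = offsets[d - 1] + counts[d - 1]
        # Stable scatter: keys with equal digits keep their relative order.
        out = [0] * len(keys)
        for key in keys:
            d = (key >> shift) & mask
            out[offsets[d]] = key
            offsets[d] += 1
        keys = out
    return keys
```

The sketch returns a new list and leaves its argument unchanged. It assumes non-negative integer keys that fit in `key_bits` bits.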
Single-Source Shortest Path [between graph nodes]
Implementations of SSSP (single-source shortest path) algorithms on
Kepler-generation GPUs. Irregular work (because the number of
edges incident to a vertex can vary greatly) and scattered data
(there's no way to lay out a graph so that consecutive threads
will always access consecutive items) are the challenges faced
in GPU implementation.
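The two challenges can be seen even in a small sequential sketch. The Python code below is a frontier-based Bellman-Ford-style relaxation over a graph in compressed sparse row (CSR) form. It is an illustration only, not one of the algorithms from the paper; the function name and the array names `row_start`, `dest`, and `weight` are hypothetical, and edge weights are assumed to be non-negative.

```python
import math

def sssp_relax(row_start, dest, weight, source):
    """Return shortest distances from source; unreachable vertices get math.inf."""
    n = len(row_start) - 1
    dist = [math.inf] * n
    dist[source] = 0
    frontier = [source]
    while frontier:
        next_frontier = set()
        for u in frontier:
            # Irregular work: the trip count of this loop is the out-degree
            # of u, which can differ greatly from vertex to vertex.
            for e in range(row_start[u], row_start[u + 1]):
                v = dest[e]  # Scattered data: v can be anywhere in dist.
                cand = dist[u] + weight[e]
                if cand < dist[v]:
                    dist[v] = cand
                    next_frontier.add(v)
        frontier = sorted(next_frontier)
    return dist
```

If each vertex in the frontier were assigned to its own thread, threads handling high-degree vertices would run far longer than their neighbors, and the accesses to `dist[v]` would rarely fall on consecutive addresses.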
Computer Architecture Topics Not Specific to Accelerators
Prefetching
Effectiveness of Software and Hardware Prefetching
A detailed analysis of how effective various software and hardware
data prefetching schemes are, alone and in combination,
on SPECcpu2006 benchmarks.
Prefetch Survey Paper
A survey of software and hardware data prefetch techniques.
The material is well explained but is slightly dated (as of 2015).
David M. Koppelman - koppel@ece.lsu.edu | Modified 8 Jan 2020 17:53 (2353 UTC)