You will need to use a debugger. Some people are in denial about that, and it costs them dearly.
These instructions are for debugging CPU (host) code and CUDA GPU (device) code. Device code is debugged using cuda-gdb, a version of the GNU Debugger, gdb, extended for NVIDIA GPU debugging. Both gdb and cuda-gdb can be used for debugging CPU code.
The makefiles provided with course assignments will build up to three versions of each program. Suppose we are building a typical Homework 3 assignment. The source code will be in file hw03.cu and the executable names will be hw03, hw03-debug, and hw03-cuda-debug. (In some cases one or both debug versions will be missing.) All versions are compiled with CPU debugging turned on, since this has no performance penalty when building with the GNU toolchain. File hw03-debug has optimization turned off, and hw03-cuda-debug has optimization turned off, event (performance) counter data turned off, and CUDA debugging turned on.
File hw03 is best for collecting timing data, though if timing is for activities on the GPU then hw03-debug should be about as fast. Use hw03-debug for debugging CPU code with gdb and hw03-cuda-debug for debugging device code with cuda-gdb.
After reading through these procedures for debugging you will no doubt be eager to learn more about the debugger commands shown here and to learn about other debug commands. Most people will find it convenient to consult the online gdb documentation and the online cuda-gdb documentation. Note that the cuda-gdb documentation covers only the CUDA-specific commands, so for help with other gdb commands consult the gdb documentation. The online gdb documentation is for the latest version, but the gdb version installed will be older. The online cuda-gdb documentation is also for the latest version, but it's much more likely that the latest version of CUDA is installed.
A good reference for C++ is https://en.cppreference.com/w/. And we can't leave out the online CUDA documentation; look for the C++ Programming Guide among the many documents there.
In the example above the debug executable was hw01-cuda-debug. For other assignments run make and look at the files that were generated. There might be separate CUDA and CPU debug files, for example, hw22-cuda-debug and hw22-debug. Either can be used for CPU code debugging, but hw22-debug will run faster. Typical makefiles create just one version of an executable, say our-cool-program or hw22, and that executable may or may not be best for debugging (though it can still be debugged unless something extreme like symbol stripping was done). To make an executable suitable for debugging, a debug flag (-g in gcc) is added to the compile commands and optimization flags, such as -O3, are omitted. The -g flag tells the compiler to put variable names and the like into the executable where the debugger can find them. With -O3 omitted the compiler generates code as you wrote it; for example, it won't rearrange or delete statements. That makes debugging less confusing (and it makes the program run more slowly).
So, if the makefile does not prepare a special debug version then the easiest thing to do is to edit Makefile and make sure that the -g flag is present (it is in course makefiles) and that -O3 is not present. In the course makefiles the optimization flag is assigned to variable OPT_FLAG. Remember, homework makefiles will create separate debug versions, so only edit other makefiles.
In the transcript above the program was re-run with arguments 0 0. The program ran until the breakpoint was reached. When run within Emacs there will be a little black triangle to the left of the line at which execution stopped.
The transcript below shows how one can use the command print or the abbreviation p to show the value of variables and of expressions using variables.
Suppose that you suspect (wrongly) that s can't be less than 5. In the transcript below a breakpoint is set within the loop, and execution is continued (from where it left off in the previous step) to the loop. The values of s and array elements are printed, and then a conditional breakpoint is set.
The commands to be used are continue (or its abbreviation c), info break to list current breakpoints (we need to know their numbers), and cond N EXPR to make breakpoint N (which must be a breakpoint number) stop only when the expression EXPR is true. Other useful commands are disable N to temporarily turn off breakpoint N and, as you'd expect, enable N to turn it back on.
These steps show how to debug GPU code in a homework assignment. They have been written for 2020 Homework 1, so adapt them to your current assignment or debug task.
GPU code debugging is done using an NVIDIA-provided version of gdb called cuda-gdb. It accepts most gdb commands (probably all) and can be used for debugging both GPU and CPU code. For GPU code CUDA-specific commands are provided, for example, commands to change the context to a particular block (cuda block 12) or thread (cuda thread 22). Variables can be examined and changed, but not as reliably as in CPU code; that is, there are often situations in which cuda-gdb will not be able to show a variable's value. Memory in the device, shared, local, and constant address spaces can be examined and changed.
These procedures demonstrate a few basic commands. Please consult the debugger documentation to gain further proficiency.
To set breakpoints, examine variables, and the like in CUDA code, cuda-gdb must be used and the code must be compiled with the -G flag.
David M. Koppelman - koppel@ece.lsu.edu | Modified 8 Mar 2020 19:14 (014 UTC) |