Expert skill for GPU debugging using CUDA-GDB and NVIDIA Compute Sanitizer. Detect memory errors, race conditions, uninitialized memory access, validate atomic operations, analyze kernel synchronization issues, and generate debugging reports with recommendations.
Debugs CUDA GPU code using NVIDIA Compute Sanitizer and CUDA-GDB to detect memory errors and race conditions.
npx claudepluginhub a5c-ai/babysitterThis skill is limited to using the following tools:
README.mdYou are cuda-debugging - a specialized skill for GPU debugging and error detection using NVIDIA's Compute Sanitizer and CUDA-GDB tools. This skill provides expert capabilities for identifying and resolving correctness issues in CUDA programs.
This skill enables AI-powered GPU debugging operations including:
Detect memory access errors and leaks:
# Basic memory check
compute-sanitizer --tool memcheck ./cuda_program
# With detailed error reporting
compute-sanitizer --tool memcheck --report-api-errors all ./cuda_program
# Log errors to file
compute-sanitizer --tool memcheck --log-file memcheck.log ./cuda_program
# Check for memory leaks
compute-sanitizer --tool memcheck --leak-check full ./cuda_program
# Track allocations
compute-sanitizer --tool memcheck --track-alloc-dealloc yes ./cuda_program
Common memory errors detected:
Detect shared memory data access hazards:
# Basic race check
compute-sanitizer --tool racecheck ./cuda_program
# With detailed analysis
compute-sanitizer --tool racecheck --racecheck-report all ./cuda_program
# Save analysis to file
compute-sanitizer --tool racecheck --save racecheck.nvsanreport ./cuda_program
# Analyze previous run
compute-sanitizer --tool racecheck --import racecheck.nvsanreport --print-analysis ./cuda_program
Race condition types detected:
Detect uninitialized global memory access:
# Basic initcheck
compute-sanitizer --tool initcheck ./cuda_program
# Track all memory accesses
compute-sanitizer --tool initcheck --track-unused-memory yes ./cuda_program
# With error details
compute-sanitizer --tool initcheck --show-backtrace yes ./cuda_program
Detect illegal synchronization in CUDA code:
# Basic synccheck
compute-sanitizer --tool synccheck ./cuda_program
# With detailed reporting
compute-sanitizer --tool synccheck --show-backtrace all ./cuda_program
Synchronization issues detected:
__syncthreads() callsInteractive debugging with CUDA-GDB:
# Launch CUDA-GDB
cuda-gdb ./cuda_program
# Common debugging commands
(cuda-gdb) set cuda memcheck on # Enable memory checking
(cuda-gdb) set cuda break_on_launch # Break at kernel launch
(cuda-gdb) break kernel_name # Set breakpoint at kernel
(cuda-gdb) run # Start execution
# Thread navigation
(cuda-gdb) info cuda threads # List all GPU threads
(cuda-gdb) cuda thread (0,0,0) (0,0,0) # Switch to specific thread
(cuda-gdb) cuda block # Show current block
(cuda-gdb) cuda kernel # Show current kernel
# Memory inspection
(cuda-gdb) print *d_array@10 # Print device array
(cuda-gdb) print __shared_memory__ # Inspect shared memory
(cuda-gdb) info cuda devices # List CUDA devices
# Stepping through code
(cuda-gdb) cuda step # Step one warp instruction
(cuda-gdb) cuda next # Step over function calls
(cuda-gdb) continue # Continue execution
// Add bounds checking to kernel
__global__ void safeKernel(float* data, int n) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
// Bounds check
if (idx >= n) return;
// Safe access
data[idx] = data[idx] * 2.0f;
}
__global__ void reductionKernel(float* input, float* output, int n) {
__shared__ float sdata[256];
int tid = threadIdx.x;
int idx = blockIdx.x * blockDim.x + threadIdx.x;
// Load to shared memory
sdata[tid] = (idx < n) ? input[idx] : 0.0f;
__syncthreads(); // Required before reading shared memory
// Reduction in shared memory
for (int s = blockDim.x / 2; s > 0; s >>= 1) {
if (tid < s) {
sdata[tid] += sdata[tid + s];
}
__syncthreads(); // Required after each reduction step
}
if (tid == 0) {
output[blockIdx.x] = sdata[0];
}
}
// Validate atomic operations
__global__ void atomicTest(int* counter, int n) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < n) {
// Use atomicAdd for thread-safe increment
atomicAdd(counter, 1);
}
}
// Verify result on host
int h_counter;
cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
assert(h_counter == n); // Should equal number of threads
Comprehensive CUDA error checking:
// Error checking macro
#define CUDA_CHECK(call) \
do { \
cudaError_t err = call; \
if (err != cudaSuccess) { \
fprintf(stderr, "CUDA Error at %s:%d: %s\n", \
__FILE__, __LINE__, cudaGetErrorString(err)); \
exit(EXIT_FAILURE); \
} \
} while(0)
// Usage
CUDA_CHECK(cudaMalloc(&d_data, size));
CUDA_CHECK(cudaMemcpy(d_data, h_data, size, cudaMemcpyHostToDevice));
// Check for kernel errors
myKernel<<<blocks, threads>>>(d_data, n);
CUDA_CHECK(cudaGetLastError()); // Check launch errors
CUDA_CHECK(cudaDeviceSynchronize()); // Check execution errors
Generate comprehensive debugging reports:
# Full debugging session
compute-sanitizer --tool memcheck \
--report-api-errors all \
--show-backtrace yes \
--log-file debug_report.txt \
./cuda_program 2>&1 | tee debug_output.log
# Summary report generation
echo "=== CUDA Debugging Report ===" > debug_summary.md
echo "Date: $(date)" >> debug_summary.md
echo "" >> debug_summary.md
echo "## Memory Check Results" >> debug_summary.md
compute-sanitizer --tool memcheck ./cuda_program 2>&1 >> debug_summary.md
echo "" >> debug_summary.md
echo "## Race Check Results" >> debug_summary.md
compute-sanitizer --tool racecheck ./cuda_program 2>&1 >> debug_summary.md
This skill can leverage the following MCP servers:
| Server | Description | Installation |
|---|---|---|
| claude-debugs-for-you | Interactive debugging via Claude | GitHub |
# Debug build flags
DEBUG_FLAGS = -G -lineinfo -Xcompiler -rdynamic -O0
# Release build with symbols
RELEASE_FLAGS = -O3 -lineinfo
# Compile for debugging
nvcc $(DEBUG_FLAGS) -o program_debug program.cu
# Compile for profiling (with symbols)
nvcc $(RELEASE_FLAGS) -o program_release program.cu
| Issue | Symptom | Solution |
|---|---|---|
| Uncoalesced access | Memory errors at specific offsets | Align data to 128 bytes |
| Missing sync | Intermittent wrong results | Add __syncthreads() |
| Out of bounds | Access violation errors | Add bounds checking |
| Uninitialized shared memory | Random values | Initialize before use |
This skill integrates with the following processes:
gpu-debugging-techniques.js - Comprehensive debugging workflowsgpu-performance-regression-testing.js - Correctness verificationatomic-operations-synchronization.js - Synchronization validationWhen executing operations, provide structured output:
{
"operation": "memory-check",
"status": "errors_found",
"tool": "compute-sanitizer",
"summary": {
"total_errors": 3,
"memory_errors": 2,
"leak_errors": 1
},
"errors": [
{
"type": "Invalid __global__ read",
"size": 4,
"address": "0x7f1234567890",
"location": {
"file": "kernel.cu",
"line": 42,
"function": "processData"
},
"thread": "(128, 0, 0)",
"block": "(3, 0, 0)"
}
],
"recommendations": [
"Add bounds check at line 42",
"Verify array size matches grid dimensions"
],
"artifacts": ["debug_report.txt", "memcheck.log"]
}
| Error | Cause | Resolution |
|---|---|---|
Invalid __global__ read | Out-of-bounds access | Add bounds checking |
Potential WAW hazard | Missing synchronization | Add __syncthreads() |
Memory leak | Missing cudaFree | Free all allocations |
Uninitialized __global__ read | Reading before write | Initialize memory |
Activates when the user asks about AI prompts, needs prompt templates, wants to search for prompts, or mentions prompts.chat. Use for discovering, retrieving, and improving prompts.
Search, retrieve, and install Agent Skills from the prompts.chat registry using MCP tools. Use when the user asks to find skills, browse skill catalogs, install a skill for Claude, or extend Claude's capabilities with reusable AI agent components.
This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.