🤖 AI Summary
Existing fuzzing tools cannot effectively detect memory safety vulnerabilities in CUDA programs because they lack native support for GPU architectures and GPUs lack built-in error detection. This work proposes CuFuzz, a compiler-runtime co-design that, to the authors' knowledge, is the first to bring fuzzing support to CUDA. It transforms GPU code into CPU-executable form at the compiler IR level and relies on CPU memory error detectors such as AddressSanitizer to surface memory safety bugs. Two optimizations, Partial Representative Execution (PREX) and Access-Index Preserving Pruning (AXIPrune), raise fuzzing throughput: PREX gives an average 32× improvement, AXIPrune adds a further 33% on top of PREX, and the two combined yield up to a 224.31× speedup. In the reported fuzzing campaigns, CuFuzz uncovered 122 security vulnerabilities in widely used benchmarks.
📝 Abstract
GPUs have gained significant popularity over the past decade, extending well beyond their original role in graphics rendering. This shift has made GPU security and reliability a pressing concern. Prior research has shown that CUDA's lack of memory safety can lead to serious vulnerabilities. While fuzzing is effective for finding such bugs on CPUs, equivalent tools for GPUs are missing because of architectural differences and the absence of built-in error detection. In this paper, we propose CuFuzz, a novel compiler-runtime co-design solution that extends state-of-the-art CPU fuzzing tools to GPU programs. CuFuzz transforms GPU programs into CPU programs using compiler IR-level transformations to enable effective fuzz testing. To the best of our knowledge, CuFuzz is the first mechanism to bring fuzzing support to CUDA, addressing a critical gap in GPU security research. By leveraging CPU memory error detectors such as AddressSanitizer, CuFuzz aims to uncover memory safety bugs and related correctness vulnerabilities in CUDA code, enhancing the security and reliability of GPU-accelerated applications. To ensure high fuzzing throughput, we introduce two compiler-runtime co-optimizations tailored for GPU code: Partial Representative Execution (PREX) and Access-Index Preserving Pruning (AXIPrune). PREX improves throughput by 32x on average, and AXIPrune adds a further 33% on top of PREX-optimized code. Together, these optimizations can yield up to a 224.31x speedup. In our fuzzing campaigns, CuFuzz uncovered 122 security vulnerabilities in widely used benchmarks.
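To make the GPU-to-CPU idea concrete, here is a minimal C++ sketch of running a CUDA-style kernel body on the CPU by looping over block and thread indices. It is an illustration under stated assumptions, not CuFuzz's actual transformation, which the abstract describes only as operating at the compiler IR level. The names `ThreadCtx`, `CheckedBuffer`, `scale_kernel`, and `launch_on_cpu` are hypothetical. `CheckedBuffer` stands in for a sanitizer: it counts out-of-bounds stores instead of performing them, so the sketch stays free of undefined behavior.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for CUDA's blockIdx, blockDim, and threadIdx (1-D only).
struct ThreadCtx {
    unsigned blockIdx;
    unsigned blockDim;
    unsigned threadIdx;
};

// Hypothetical stand-in for a sanitizer-instrumented buffer. It counts
// out-of-bounds stores instead of performing them.
struct CheckedBuffer {
    std::vector<float> data;
    std::size_t violations = 0;

    void store(std::size_t i, float v) {
        if (i < data.size()) {
            data[i] = v;
        } else {
            ++violations;  // a raw store here is what a memory error detector would flag
        }
    }
};

// Body of a CUDA-style kernel, written as a plain function for one thread.
// The usual `if (i < n)` guard is deliberately missing.
void scale_kernel(const ThreadCtx& t, CheckedBuffer& out, float value) {
    std::size_t i = static_cast<std::size_t>(t.blockIdx) * t.blockDim + t.threadIdx;
    out.store(i, value);
}

// CPU "launch": run every thread of every block sequentially over a buffer
// of n elements and return the number of out-of-bounds stores observed.
std::size_t launch_on_cpu(unsigned gridDim, unsigned blockDim, std::size_t n) {
    CheckedBuffer out;
    out.data.assign(n, 0.0f);
    for (unsigned b = 0; b < gridDim; ++b) {
        for (unsigned t = 0; t < blockDim; ++t) {
            scale_kernel(ThreadCtx{b, blockDim, t}, out, 2.0f);
        }
    }
    return out.violations;
}
```

Launching 2 blocks of 4 threads over a 6-element buffer makes the last two threads write past the end, which is the kind of unguarded access a fuzzer could reach by mutating the input size. In a real build the store would be a raw pointer write and the program would be compiled with a memory error detector enabled.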