🤖 AI Summary
To address the difficulty that end-to-end fuzzing has in reaching the LLVM compiler backend, this paper introduces IRFuzzer, a fuzzing framework specialized for the LLVM backend. Its mutator uses constrained mutations that guarantee input validity, generating LLVM IR with structured control flow, vector types, and function definitions. IRFuzzer also instruments coding patterns in the compiler to monitor instruction selection, yielding a fine-grained feedback mechanism that combines matcher table coverage with guidance toward architecture-specific intrinsics. Evaluated on 29 mature LLVM backend targets, IRFuzzer discovered 78 new, confirmed bugs that no existing fuzzer could find; developers have fixed 57 of them and back-ported five fixes to LLVM 15. The work makes the case that specialized fuzzing improves the reliability of LLVM backend code generation.
📝 Abstract
Modern compilers, such as LLVM, are complex. Due to their complexity, manual testing is unlikely to suffice, yet formal verification is difficult to scale. End-to-end fuzzing can be used, but it has difficulty discovering LLVM backend problems for two reasons. First, frontend preprocessing and middle-end optimization shield the backend from seeing diverse inputs. Second, branch coverage cannot provide effective feedback because the LLVM backend contains much reusable code. In this paper, we implement IRFuzzer to investigate the need for specialized fuzzing of the LLVM compiler backend. We focus on two approaches to improve the fuzzer: guaranteed input validity using constrained mutations to improve input diversity, and new metrics to improve feedback quality. The mutator in IRFuzzer can generate a wide range of LLVM IR inputs, including structured control flow, vector types, and function definitions. The system instruments coding patterns in the compiler to monitor the execution status of instruction selection. The instrumentation not only provides new coverage feedback on the matcher table but also guides the mutator toward architecture-specific intrinsics. We ran IRFuzzer on 29 mature LLVM backend targets. IRFuzzer discovered 78 new, confirmed bugs in LLVM upstream, none of which existing fuzzers could discover. This demonstrates that IRFuzzer is far more effective than existing fuzzers. Upon receiving our bug reports, the developers fixed 57 bugs and back-ported five fixes to LLVM 15, which shows that specialized fuzzing provides actionable insights to LLVM developers.
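The abstract does not say how matcher table coverage is recorded, so the following is only a minimal sketch of the general idea, not IRFuzzer's implementation. It assumes that each matcher table entry visited during instruction selection can be reported as an integer index, and that an input is worth keeping when it reaches an entry no earlier input reached. The names `MatcherTableCoverage` and `select_interesting` are hypothetical.

```python
from typing import Iterable, List, Sequence


class MatcherTableCoverage:
    """Tracks which entries of a (hypothetical) matcher table were visited."""

    def __init__(self, table_size: int) -> None:
        if table_size <= 0:
            raise ValueError("table_size must be positive")
        self.table_size = table_size
        self.covered = bytearray(table_size)  # one flag per table entry

    def update(self, visited_indices: Iterable[int]) -> int:
        """Record visited entries and return how many were newly covered."""
        new_entries = 0
        for idx in visited_indices:
            if not 0 <= idx < self.table_size:
                raise IndexError(f"matcher table index out of range: {idx}")
            if not self.covered[idx]:
                self.covered[idx] = 1
                new_entries += 1
        return new_entries

    def ratio(self) -> float:
        """Fraction of table entries covered so far."""
        return sum(self.covered) / self.table_size


def select_interesting(
    traces: Sequence[Iterable[int]], coverage: MatcherTableCoverage
) -> List[int]:
    """Return positions of inputs whose trace reached a previously unseen entry."""
    kept = []
    for position, trace in enumerate(traces):
        if coverage.update(trace) > 0:
            kept.append(position)
    return kept
```

With an 8-entry table and the traces `[0, 1, 2]`, `[1, 2]`, and `[2, 5]`, this sketch keeps the first and third inputs and ends with 4 of 8 entries covered. The second input is dropped because it visits only entries that were already seen. Unlike branch coverage, the signal here depends on the table entry visited, not on which lines of the shared matching code were executed.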