IRFuzzer: Specialized Fuzzing for LLVM Backend Code Generation

📅 2024-02-07
🏛️ International Conference on Software Engineering
📈 Citations: 1
Influential: 0
🤖 AI Summary
End-to-end fuzzing struggles to reach the LLVM compiler backend, so this paper introduces IRFuzzer, a fuzzer specialized for LLVM backend code generation. It uses constrained mutations to guarantee that the generated LLVM IR is valid, producing inputs with structured control flow, vector types, and function definitions. It also instruments coding patterns in the compiler to monitor instruction selection, which yields a feedback mechanism that combines matcher table coverage with guidance toward architecture-specific intrinsics. Run on 29 mature LLVM backend targets, IRFuzzer found 78 new, confirmed bugs in upstream LLVM, none of which existing fuzzers could discover. Developers fixed 57 of them and back-ported five fixes to LLVM 15, which suggests that specialized fuzzing gives LLVM developers actionable results.

📝 Abstract
Modern compilers, such as LLVM, are complex. Due to their complexity, manual testing is unlikely to suffice, yet formal verification is difficult to scale. End-to-end fuzzing can be used, but it has difficulty discovering LLVM backend problems for two reasons. First, frontend preprocessing and middle-end optimization shield the backend from seeing diverse inputs. Second, branch coverage cannot provide effective feedback because the LLVM backend contains much reusable code. In this paper, we implement IRFuzzer to investigate the need for specialized fuzzing of the LLVM compiler backend. We focus on two approaches to improve the fuzzer: guaranteed input validity using constrained mutations to improve input diversity, and new metrics to improve feedback quality. The mutator in IRFuzzer can generate a wide range of LLVM IR inputs, including structured control flow, vector types, and function definitions. The system instruments coding patterns in the compiler to monitor the execution status of instruction selection. The instrumentation not only provides new coverage feedback on the matcher table but also guides the mutator on architecture-specific intrinsics. We ran IRFuzzer on 29 mature LLVM backend targets. IRFuzzer discovered 78 new, confirmed bugs in LLVM upstream, none of which existing fuzzers could discover. This demonstrates that IRFuzzer is far more effective than existing fuzzers. After receiving our bug reports, the developers fixed 57 bugs and back-ported five fixes to LLVM 15, which shows that specialized fuzzing provides actionable insights to LLVM developers.
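The idea of a constrained mutation can be illustrated with a small sketch. This is not IRFuzzer's implementation and does not use LLVM's API: the toy straight-line IR, the `Inst` record, the `BINOPS` table, and the `insert_binop` and `is_valid` helpers are all hypothetical names chosen for illustration. The mutation inserts one binary instruction and only picks operands that are already defined at the insertion point and share the instruction's type, so a valid program stays valid after every mutation.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical toy IR (not LLVM's API): opcodes allowed for each scalar type.
BINOPS: Dict[str, List[str]] = {"i32": ["add", "sub", "mul"], "float": ["fadd", "fmul"]}

Value = Tuple[str, str]  # (SSA name, type)


@dataclass(frozen=True)
class Inst:
    name: str                  # SSA value defined by this instruction
    op: str                    # opcode, e.g. "add"
    ty: str                    # result type; operands must have the same type
    operands: Tuple[str, str]  # names of the two operand values


def insert_binop(args: List[Value], body: List[Inst], rng: random.Random) -> List[Inst]:
    """Return a copy of body with one extra, well-typed binary instruction."""
    pos = rng.randint(0, len(body))
    # Constraint 1: only values defined before the insertion point are usable.
    visible = list(args) + [(i.name, i.ty) for i in body[:pos]]
    by_type: Dict[str, List[str]] = {}
    for name, ty in visible:
        if ty in BINOPS:
            by_type.setdefault(ty, []).append(name)
    if not by_type:
        return list(body)  # nothing usable in scope; leave the program unchanged
    # Constraint 2: both operands and the opcode agree on a single type.
    ty = rng.choice(sorted(by_type))
    lhs, rhs = rng.choice(by_type[ty]), rng.choice(by_type[ty])
    # Constraint 3: the new SSA name must not clash with an existing one.
    taken = {n for n, _ in args} | {i.name for i in body}
    k = 0
    while f"m{k}" in taken:
        k += 1
    new = Inst(f"m{k}", rng.choice(BINOPS[ty]), ty, (lhs, rhs))
    return body[:pos] + [new] + body[pos:]


def is_valid(args: List[Value], body: List[Inst]) -> bool:
    """Check single definition, definition before use, and type agreement."""
    types = dict(args)
    for inst in body:
        if inst.name in types or inst.op not in BINOPS.get(inst.ty, []):
            return False
        if any(types.get(o) != inst.ty for o in inst.operands):
            return False
        types[inst.name] = inst.ty
    return True
```

Starting from `args = [("a", "i32"), ("b", "float")]` and an empty body, repeatedly calling `insert_binop` grows the program while `is_valid` keeps returning `True`. An unconstrained mutator would instead produce many inputs that are rejected before they reach the code under test.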
Problem

Research questions and friction points this paper is trying to address.

Improving fuzzing coverage in LLVM backend code generation
Ensuring input validity with constrained mutation techniques
Enhancing feedback quality via architecture-specific guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constrained mutations ensure input validity
Matcher table coverage improves feedback quality
Architecture-specific guidance enhances mutator effectiveness
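Matcher table coverage as a feedback signal can be sketched as follows. This is a simplified model, not IRFuzzer's instrumentation: the `MatcherTableCoverage` class and the `keep_interesting` function are hypothetical, and the sketch assumes each execution reports the set of matcher table indices it visited. An input is kept in the corpus only if it reaches a table entry that no earlier input reached.

```python
from typing import Iterable, List, Set


class MatcherTableCoverage:
    """Tracks which matcher table indices have been reached (hypothetical helper)."""

    def __init__(self, table_size: int) -> None:
        self.table_size = table_size
        self.seen: Set[int] = set()

    def record(self, visited: Iterable[int]) -> int:
        """Merge one execution's visited indices; return how many were new."""
        fresh = {i for i in visited if 0 <= i < self.table_size} - self.seen
        self.seen |= fresh
        return len(fresh)

    def ratio(self) -> float:
        """Fraction of the table reached so far."""
        return len(self.seen) / self.table_size


def keep_interesting(cov: MatcherTableCoverage, corpus: List[str],
                     candidate: str, visited: Iterable[int]) -> bool:
    """Add candidate to the corpus only if it reached a new table entry."""
    if cov.record(visited) > 0:
        corpus.append(candidate)
        return True
    return False
```

The design choice illustrated here is that the signal is measured over table entries rather than code branches: two inputs that run the same shared interpreter loop can still be told apart by the entries they touch.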
Authors
Yuyang Rong
Advanced Micro Devices, Inc. and UC Davis, USA
Zhanghan Yu
UC Davis, USA
Zhenkai Weng
UC Davis, USA
Stephen Neuendorffer
Xilinx
Hao Chen
UC Davis, USA