🤖 AI Summary
Existing safety testing methods for autonomous driving typically evaluate isolated diversity dimensions, such as input stimuli, ego-vehicle actions, or system violations. They fail to capture the causal interdependencies among these dimensions, which leaves gaps in test coverage. To address this, we propose a causality-aware fuzz testing framework that constructs a causal graph linking input scenarios, ego-vehicle actions, and system violations. The approach combines causality-sensitive analysis with feedback-driven mutation strategies to guide test generation toward comprehensive causal path coverage. Integrated with high-fidelity Apollo simulation, the framework significantly improves violation-type diversity (+32.7%), causal path coverage (+41.5%), and efficiency in discovering critical scenarios (2.8× speedup in first-failure detection), all without increasing test scale, making verification more thorough and effective.
📝 Abstract
Simulation-based testing is essential for evaluating the safety of Autonomous Driving Systems (ADSs). Comprehensive evaluation requires testing across diverse scenarios that can trigger various types of violations under different conditions. Existing methods typically focus on individual diversity metrics, such as input scenarios, ADS-generated motion commands, or system violations, and often fail to capture the complex interrelationships among these elements. This oversight leads to gaps in testing coverage, potentially missing critical issues in the ADS under evaluation. Quantifying these interrelationships, however, is a significant challenge. In this paper, we propose a novel causality-aware fuzzing technique, Causal-Fuzzer, to enable efficient and comprehensive testing of ADSs by exploring causally diverse scenarios. At its core, Causal-Fuzzer constructs a causal graph that models the interrelationships among the diversities of input scenarios, ADS motion commands, and system violations. The causal graph then guides the generation of critical scenarios. Specifically, Causal-Fuzzer introduces (1) a causality-based feedback mechanism that quantifies the combined diversity of test scenarios by assessing whether they activate new causal relationships, and (2) a causality-driven mutation strategy that prioritizes mutations on input scenario elements with higher causal impact on ego action changes and violation occurrence, rather than treating all elements equally. We evaluated Causal-Fuzzer on an industry-grade ADS, Apollo, in high-fidelity simulation. Our empirical results demonstrate that Causal-Fuzzer significantly outperforms existing methods in (1) identifying a greater diversity of violations, (2) providing enhanced testing sufficiency with improved coverage of causal relationships, and (3) achieving greater efficiency in detecting the first critical scenarios.
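The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the Causal-Fuzzer implementation: the class `CausalCoverage`, the function `pick_element`, the edge labels, and the impact scores are all hypothetical, and the sketch assumes that a causal relationship can be represented as a (cause, effect) pair and that causal impact is available as a non-negative score per scenario element.

```python
import random
from dataclasses import dataclass, field

# Hypothetical representation: one causal relationship as a (cause, effect) pair.
CausalEdge = tuple[str, str]


@dataclass
class CausalCoverage:
    """Tracks which causal edges earlier test scenarios have activated."""
    seen: set[CausalEdge] = field(default_factory=set)

    def feedback(self, activated: set[CausalEdge]) -> int:
        """Return how many activated edges are new, and record them."""
        new_edges = activated - self.seen
        self.seen |= new_edges
        return len(new_edges)


def pick_element(impact: dict[str, float], rng: random.Random) -> str:
    """Sample a scenario element to mutate, weighted by its causal impact score."""
    elements = list(impact)
    weights = [impact[e] for e in elements]
    return rng.choices(elements, weights=weights, k=1)[0]


# Illustrative usage with made-up edges and scores.
coverage = CausalCoverage()
first = coverage.feedback({("npc_cut_in", "hard_brake"), ("hard_brake", "rear_collision")})
second = coverage.feedback({("npc_cut_in", "hard_brake")})  # nothing new this time

impact = {"npc_speed": 0.7, "weather": 0.1, "pedestrian_position": 0.2}
element_to_mutate = pick_element(impact, random.Random(0))
```

In this sketch a scenario is worth keeping as a seed only when `feedback` returns a positive count, and elements with larger scores are mutated more often instead of being chosen uniformly. How the causal graph and the impact scores are actually derived is not specified in the abstract and is left out here.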