🤖 AI Summary
Safety testing of autonomous driving systems relies heavily on reproducing real-world accidents, while the vast volume of non-accident daily traffic video goes largely unused because extracting high-value, challenging scenarios from it is difficult. To address this, we propose a novel automated method for mining safety-critical scenarios. Our approach features: (1) a test generation mechanism that amplifies behavioral discrepancies between autonomous and human drivers under semantic equivalence constraints; and (2) a multimodal, few-shot Chain-of-Thought-enhanced large model framework that constructs abstract scenarios from traffic videos and generates concrete scenarios from them. Built upon a large multimodal model (LMM), the method integrates semantic consistency optimization and is validated in closed-loop simulation on Apollo, an industrial-grade L4 ADS. Experiments demonstrate that our method achieves a 3.2× improvement in safety defect detection rate over accident-driven baselines, confirming that non-accident traffic videos contain rich, exploitable testing signals.
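To make the multimodal few-shot CoT idea more concrete, the sketch below shows one way such a prompt could be assembled: each worked example pairs keyframes from a traffic video with a hand-written reasoning chain and the abstract scenario it leads to, the query video's frames come last, and a final cue asks the model to reason before answering. Everything here is an assumption for illustration: the `Exemplar` type, `build_fewshot_cot_prompt`, the dict-based part format, and the file paths are hypothetical and are not LEADE's actual prompt format or any particular LMM client's API.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """One hypothetical worked example for few-shot prompting."""
    frame_paths: list[str]    # keyframes sampled from a traffic video (hypothetical paths)
    reasoning: str            # step-by-step chain of thought written by a human
    abstract_scenario: str    # target output: a textual abstract scenario description


def build_fewshot_cot_prompt(exemplars, query_frames, instruction):
    """Assemble a multimodal few-shot CoT prompt as an ordered list of parts.

    Each part is a dict with type "text" or "image". Mapping these parts onto a
    real LMM client's message format is left to the caller (an assumption).
    """
    parts = [{"type": "text", "text": instruction}]
    for i, ex in enumerate(exemplars, start=1):
        parts.append({"type": "text", "text": f"Example {i}:"})
        parts.extend({"type": "image", "path": p} for p in ex.frame_paths)
        parts.append({"type": "text", "text": f"Reasoning: {ex.reasoning}"})
        parts.append({"type": "text", "text": f"Abstract scenario: {ex.abstract_scenario}"})
    parts.append({"type": "text", "text": "Query:"})
    parts.extend({"type": "image", "path": p} for p in query_frames)
    # End with the reasoning cue so the model writes its chain of thought first.
    parts.append({"type": "text", "text": "Reasoning:"})
    return parts
```

Keeping the output as plain dicts leaves the mapping onto a concrete LMM client to the caller, since multimodal message formats differ between providers.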
📝 Abstract
Safety testing serves as the fundamental pillar for the development of autonomous driving systems (ADSs). To ensure the safety of ADSs, it is paramount to generate a diverse range of safety-critical test scenarios. While ADS practitioners primarily focus on reproducing real-world traffic accidents in simulation environments to create test scenarios, many of these accidents do not directly result in safety violations for ADSs, because human driving and autonomous driving differ. More importantly, we observe that some accident-free real-world scenarios can not only lead to misbehaviors in ADSs but also be leveraged to generate ADS violations during simulation testing. It is therefore important to discover safety violations of ADSs from routine traffic scenarios (i.e., non-crash scenarios). We introduce LEADE, a novel methodology to achieve this goal. It automatically generates abstract and concrete scenarios from real-traffic videos, then optimizes these scenarios to search for safety violations of the ADS in semantically consistent scenarios in which human drivers operated safely. Specifically, LEADE uses multimodal few-shot Chain of Thought (CoT) to enhance the ability of Large Multimodal Models (LMMs) to accurately construct abstract scenarios from traffic videos and to generate concrete scenarios. Based on these scenarios, LEADE assesses and increases the behavior differences between the ego vehicle and human driving in semantically equivalent scenarios (here, semantic equivalence means that each participant in a test scenario exhibits the same behaviors as those observed in the original real traffic scenario). We implement and evaluate LEADE on Apollo, an industrial-grade Level-4 ADS.
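The semantic-equivalence constraint can be illustrated with a small search loop. The sketch below assumes each participant is reduced to a (speed, acceleration) pair and that its observed behavior is a coarse label derived from those values; a mutated scenario is considered only if every participant keeps its original label, and among those candidates the loop keeps the one whose simulated ego trajectory deviates most from the human driver's trajectory. The abstract does not specify LEADE's optimizer, so the random-mutation hill climbing used here is a stand-in, and `behavior_label`, `search`, the label thresholds, and the `simulate_ego` callable (a placeholder for running the ADS in simulation) are all hypothetical.

```python
import math
import random


def behavior_label(speed, accel):
    """Hypothetical coarse semantics of one participant (thresholds are assumptions)."""
    if speed < 0.5:
        return "stopped"
    if accel < -1.0:
        return "braking"
    if accel > 1.0:
        return "accelerating"
    return "cruising"


def trajectory_distance(a, b):
    """Mean Euclidean distance between two equally sampled 2D trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def search(original, simulate_ego, human_traj, iterations=200, step=0.5, seed=0):
    """Random-mutation hill climbing under a semantic-equivalence constraint.

    original: list of (speed, accel) per participant, recovered from the video.
    simulate_ego: callable(params) -> ego trajectory (hypothetical stand-in for
        running the ADS in a simulator).
    human_traj: trajectory the human-driven ego followed in the real video.
    Returns the best parameters found and their deviation score.
    """
    rng = random.Random(seed)
    labels = [behavior_label(*p) for p in original]
    best = list(original)
    best_score = trajectory_distance(simulate_ego(best), human_traj)
    for _ in range(iterations):
        cand = [(s + rng.uniform(-step, step), a + rng.uniform(-step, step)) for s, a in best]
        # Reject mutations that change any participant's observed behavior.
        if [behavior_label(*p) for p in cand] != labels:
            continue
        # Keep the candidate if the ego deviates more from the human driver.
        score = trajectory_distance(simulate_ego(cand), human_traj)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score
```

In a real setup, `simulate_ego` would execute the scenario against the ADS in simulation, and a larger deviation would be a cue to inspect that run for safety violations rather than a violation in itself.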