🤖 AI Summary
To address the sparsity of safety-critical scenarios, the difficulty of generating them, and their weak interactivity and poor real-time responsiveness to the vehicle under test (VUT) in autonomous driving system (ADS) testing, this paper proposes a hierarchical generative framework that integrates vision-language models (VLMs) with guided diffusion models. It introduces, for the first time, a VLM as a strategic policy generator within a three-tier "strategic–tactical–operational" architecture, enabling semantic-driven risk reasoning, goal-conditioned scene specification, and adaptive guidance of the diffusion process. A VLM-mediated closed-loop feedback mechanism further supports dynamic scenario refinement and fine-grained control of background vehicles. Experiments demonstrate that the method efficiently generates high-fidelity, diverse, and highly interactive safety-critical test scenarios, significantly outperforming state-of-the-art approaches in both criticality and real-time responsiveness to the VUT.
📝 Abstract
The safe deployment of autonomous driving systems (ADSs) relies on comprehensive testing and evaluation. However, safety-critical scenarios that can effectively expose system vulnerabilities are extremely sparse in the real world. Existing scenario generation methods struggle to construct long-tail scenarios efficiently while ensuring fidelity, criticality, and interactivity, and in particular lack the ability to respond dynamically, in real time, to the vehicle under test (VUT). To address these challenges, this paper proposes a safety-critical testing scenario generation framework that integrates the high-level semantic understanding capabilities of Vision-Language Models (VLMs) with the fine-grained generation capabilities of adaptive guided diffusion models. The framework establishes a three-layer hierarchical architecture: a strategic layer in which the VLM determines scenario-generation objectives, a tactical layer that formulates guidance functions, and an operational layer that executes guided diffusion. We first train a high-quality base diffusion model that learns the data distribution of real driving scenarios. Next, we design an adaptive guided diffusion method that enables real-time, precise control of background vehicles (BVs) in closed-loop simulation. The VLM is then incorporated to autonomously generate scenario objectives and guidance functions through deep scenario understanding and risk reasoning, ultimately steering the diffusion model to achieve VLM-directed scenario generation. Experimental results demonstrate that the proposed method can efficiently generate realistic, diverse, and highly interactive safety-critical testing scenarios. Furthermore, case studies validate the adaptability and VLM-directed generation performance of the proposed method.
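The abstract describes steering a pre-trained diffusion model with a guidance function at sampling time rather than retraining it. A minimal sketch of that general idea (classifier-guidance style: shift each reverse-diffusion step's predicted mean by the gradient of a guidance objective) is shown below. Everything here is illustrative, not the paper's implementation: the guidance function, its target point, the finite-difference gradient, and `denoiser_mean` are all hypothetical stand-ins.

```python
import numpy as np

def guidance(x):
    # Hypothetical guidance objective: reward a background vehicle whose
    # planned position x[:2] approaches a target point near the VUT.
    target = np.array([5.0, 0.0])  # assumed target, for illustration only
    return -np.sum((x[:2] - target) ** 2)

def guidance_grad(x, eps=1e-4):
    # Finite-difference gradient of the guidance objective; a real system
    # would differentiate analytically or with autodiff.
    g = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (guidance(x + d) - guidance(x - d)) / (2 * eps)
    return g

def guided_denoise_step(x_t, denoiser_mean, sigma_t, scale=0.5, rng=None):
    # One reverse-diffusion step, nudged toward the guidance objective by
    # adding scale * sigma_t^2 * grad(guidance) to the predicted mean.
    rng = rng if rng is not None else np.random.default_rng(0)
    mean = denoiser_mean(x_t) + scale * sigma_t ** 2 * guidance_grad(x_t)
    return mean + sigma_t * rng.standard_normal(x_t.shape)
```

In the framework described above, the guidance function itself would be produced by the VLM at the tactical layer; here it is a fixed quadratic purely to make the sampling-time steering mechanism concrete.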