VLM as Strategist: Adaptive Generation of Safety-critical Testing Scenarios via Guided Diffusion

📅 2025-12-02
🤖 AI Summary
Safety-critical scenarios for autonomous driving system (ADS) testing are sparse in real-world data, and existing generators struggle to produce interactive scenarios that respond in real time to the vehicle under test (VUT). To address this, the paper proposes a hierarchical generative framework that combines vision-language models (VLMs) with guided diffusion models. The VLM serves as a strategic policy generator within a three-tier “strategic–tactical–operational” architecture, enabling semantics-driven risk reasoning, goal-conditioned scenario specification, and adaptive guidance of the diffusion process. A VLM-mediated closed-loop feedback mechanism further supports dynamic scenario refinement and fine-grained control of background vehicles (BVs). Experiments indicate that the method efficiently generates realistic, diverse, and highly interactive safety-critical test scenarios, and case studies validate its adaptability and VLM-directed generation performance.

📝 Abstract
The safe deployment of autonomous driving systems (ADSs) relies on comprehensive testing and evaluation. However, safety-critical scenarios that can effectively expose system vulnerabilities are extremely sparse in the real world. Existing scenario generation methods face challenges in efficiently constructing long-tail scenarios that ensure fidelity, criticality, and interactivity, while particularly lacking real-time dynamic response capabilities to the vehicle under test (VUT). To address these challenges, this paper proposes a safety-critical testing scenario generation framework that integrates the high-level semantic understanding capabilities of Vision Language Models (VLMs) with the fine-grained generation capabilities of adaptive guided diffusion models. The framework establishes a three-layer hierarchical architecture comprising a strategic layer for VLM-directed scenario generation objective determination, a tactical layer for guidance function formulation, and an operational layer for guided diffusion execution. We first establish a high-quality fundamental diffusion model that learns the data distribution of real driving scenarios. Next, we design an adaptive guided diffusion method that enables real-time, precise control of background vehicles (BVs) in closed-loop simulation. The VLM is then incorporated to autonomously generate scenario generation objectives and guidance functions through deep scenario understanding and risk reasoning, ultimately guiding the diffusion model to achieve VLM-directed scenario generation. Experimental results demonstrate that the proposed method can efficiently generate realistic, diverse, and highly interactive safety-critical testing scenarios. Furthermore, case studies validate the adaptability and VLM-directed generation performance of the proposed method.
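The adaptive guided diffusion step described in the abstract can be illustrated with a toy sampler. This is a minimal sketch under stated assumptions, not the paper's model: `denoise_mean` is a hypothetical stand-in for the learned diffusion model (it simply pulls a trajectory toward a nominal path), and `proximity_grad` is an assumed guidance function whose cost is the squared distance between a background vehicle (BV) and the VUT. All names and parameter values are illustrative.

```python
import random

def denoise_mean(traj, nominal, strength=0.5):
    """Hypothetical stand-in for a learned denoiser: pull waypoints toward a nominal path."""
    return [(x + strength * (nx - x), y + strength * (ny - y))
            for (x, y), (nx, ny) in zip(traj, nominal)]

def proximity_grad(traj, vut_traj):
    """Gradient of the assumed guidance cost: sum of squared BV-to-VUT waypoint distances."""
    return [(2 * (x - vx), 2 * (y - vy))
            for (x, y), (vx, vy) in zip(traj, vut_traj)]

def guided_sample(nominal, vut_traj, steps=50, guidance_scale=0.1, seed=0):
    """Iteratively denoise a BV trajectory, shifting each step's mean down the guidance gradient."""
    rng = random.Random(seed)
    # Start from pure noise, one (x, y) waypoint per time step.
    traj = [(rng.gauss(0, 5), rng.gauss(0, 5)) for _ in nominal]
    for k in range(steps):
        sigma = 1.0 - (k + 1) / steps  # noise level decays to 0 at the final step
        mean = denoise_mean(traj, nominal)
        grad = proximity_grad(mean, vut_traj)
        traj = [(mx - guidance_scale * gx + sigma * rng.gauss(0, 1),
                 my - guidance_scale * gy + sigma * rng.gauss(0, 1))
                for (mx, my), (gx, gy) in zip(mean, grad)]
    return traj

def mean_gap(a, b):
    """Average Euclidean distance between matching waypoints of two trajectories."""
    return sum(((x - u) ** 2 + (y - v) ** 2) ** 0.5
               for (x, y), (u, v) in zip(a, b)) / len(a)

if __name__ == "__main__":
    nominal = [(float(t), 3.5) for t in range(10)]  # BV cruising in the adjacent lane
    vut = [(float(t), 0.0) for t in range(10)]      # VUT driving straight in its own lane
    free = guided_sample(nominal, vut, guidance_scale=0.0)
    pulled = guided_sample(nominal, vut, guidance_scale=0.1)
    print("unguided gap:", round(mean_gap(free, vut), 2))
    print("guided gap:  ", round(mean_gap(pulled, vut), 2))
```

With `guidance_scale=0.0` the sample settles on the nominal lane, while a positive scale draws the BV toward the VUT, trading realism against criticality. In a closed-loop setting, `vut_traj` would be refreshed from the simulator before each sampling call.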
Problem

Research questions and friction points this paper is trying to address.

Generates safety-critical scenarios for autonomous driving system testing
Addresses sparsity of real-world scenarios exposing system vulnerabilities
Enables real-time dynamic responses to vehicle under test
Innovation

Methods, ideas, or system contributions that make the work stand out.

VLM-directed scenario generation objectives
Adaptive guided diffusion for real-time control
Hierarchical architecture integrating VLM and diffusion
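The division of labor among the three layers can be sketched as a small closed loop. This is an illustrative toy under stated assumptions, not the authors' implementation: a rule-based function stands in for the VLM, the objective names (`cut_in`, `hard_brake`), the scene keys, and the thresholds are all hypothetical, and a finite-difference gradient step stands in for guided diffusion.

```python
from typing import Callable, Dict, List, Tuple

Scene = Dict[str, float]  # hypothetical scene summary seen by every layer
CONTROLLED = ("lateral_gap_m", "bv_speed_mps")  # quantities the BV can change

def strategic_layer(scene: Scene) -> str:
    """Rule-based stand-in for the VLM: choose a scenario objective from the scene."""
    return "cut_in" if scene["lateral_gap_m"] > 2.0 else "hard_brake"

def tactical_layer(objective: str) -> Callable[[Scene], float]:
    """Turn an objective into a scalar guidance cost (lower means closer to the objective)."""
    if objective == "cut_in":
        return lambda s: s["lateral_gap_m"] ** 2
    if objective == "hard_brake":
        return lambda s: s["bv_speed_mps"] ** 2
    raise ValueError(f"unknown objective: {objective}")

def operational_layer(scene: Scene, cost: Callable[[Scene], float],
                      step: float = 0.1, eps: float = 1e-4) -> Scene:
    """Stand-in for one guided update: move controlled variables down the cost gradient."""
    updated = dict(scene)
    for key in CONTROLLED:
        bumped = dict(scene)
        bumped[key] += eps
        grad = (cost(bumped) - cost(scene)) / eps  # forward finite difference
        updated[key] = scene[key] - step * grad
    return updated

def run_closed_loop(scene: Scene, steps: int) -> Tuple[Scene, List[str]]:
    """Re-query the strategic layer every step so the objective tracks the evolving scene."""
    objectives = []
    for _ in range(steps):
        objective = strategic_layer(scene)
        objectives.append(objective)
        scene = operational_layer(scene, tactical_layer(objective))
    return scene, objectives

if __name__ == "__main__":
    start = {"lateral_gap_m": 3.5, "bv_speed_mps": 15.0}
    final, objectives = run_closed_loop(start, steps=6)
    print(objectives)
    print(final)
```

Re-querying the strategic layer inside the loop is what lets the objective change as the scene evolves: once the lateral gap closes below the assumed threshold, the toy strategist switches from cutting in to braking.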
Xinzheng Wu
School of Automotive Studies, Tongji University, No. 4800 Cao'an Road, Shanghai, 201804, China
Junyi Chen
Shanghai Jiao Tong University
Generative AI, Multimodal Learning
Naiting Zhong
School of Automotive Studies, Tongji University, No. 4800 Cao'an Road, Shanghai, 201804, China
Yong Shen
School of Automotive Studies, Tongji University, No. 4800 Cao'an Road, Shanghai, 201804, China