Risk-Controllable Multi-View Diffusion for Driving Scenario Generation

📅 2026-03-12
🤖 AI Summary
Existing methods struggle to generate geometrically consistent long-tail, high-risk driving scenarios and typically treat risk as a post-hoc label, lacking controllable synthesis of multi-view dynamic scenes. This work proposes RiskMV-DPO, the first framework enabling risk-controllable multi-view driving scene generation. It leverages physics-driven risk modeling with high-risk trajectories as geometric anchors and integrates diffusion models to produce spatiotemporally coherent and geometrically precise video sequences. The approach introduces a novel geometry-appearance alignment module and a region-aware direct preference optimization (RA-DPO) strategy, shifting world models from passive prediction toward active risk synthesis. Evaluated on nuScenes, the method significantly improves 3D detection mAP (from 18.17 to 30.50) and reduces FID to 15.70, successfully generating diverse, high-fidelity long-tail risk scenarios.

📝 Abstract
Generating safety-critical driving scenarios is crucial for evaluating and improving autonomous driving systems, but long-tail risky situations are rarely observed in real-world data and difficult to specify through manual scenario design. Existing generative approaches typically treat risk as an after-the-fact label and struggle to maintain geometric consistency in multi-view driving scenes. We present RiskMV-DPO, a general and systematic pipeline for physically-informed, risk-controllable multi-view scenario generation. By integrating target risk levels with physically-grounded risk modeling, we autonomously synthesize diverse and high-stakes dynamic trajectories that serve as explicit geometric anchors for a diffusion-based video generator. To ensure spatial-temporal coherence and geometric fidelity, we introduce a geometry-appearance alignment module and a region-aware direct preference optimization (RA-DPO) strategy with motion-aware masking to focus learning on localized dynamic regions. Experiments on the nuScenes dataset show that RiskMV-DPO can freely generate a wide spectrum of diverse long-tail scenarios while maintaining state-of-the-art visual quality, improving 3D detection mAP from 18.17 to 30.50 and reducing FID to 15.70. Our work shifts the role of world models from passive environment prediction to proactive, risk-controllable synthesis, providing a scalable toolchain for the safety-oriented development of embodied intelligence.
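
The abstract names RA-DPO with motion-aware masking but gives no formula, so the following is only a minimal sketch of how such an objective could look, not the authors' formulation. It assumes a generic Diffusion-DPO-style preference loss in which the denoising errors of a preferred and a rejected sample are averaged over a motion mask rather than the whole frame. The frame-difference mask, the threshold, the function names, and the beta parameter are all hypothetical choices made for illustration.

```python
import math

def motion_mask(prev_frame, next_frame, threshold=0.1):
    """Hypothetical motion mask: 1.0 where the absolute pixel change
    between two frames exceeds `threshold`, else 0.0."""
    return [[1.0 if abs(b - a) > threshold else 0.0
             for a, b in zip(row_prev, row_next)]
            for row_prev, row_next in zip(prev_frame, next_frame)]

def masked_mse(pred, target, mask, eps=1e-8):
    """Squared error averaged over masked pixels only, so static
    background does not dilute the signal from dynamic regions."""
    num, den = 0.0, 0.0
    for row_p, row_t, row_m in zip(pred, target, mask):
        for p, t, m in zip(row_p, row_t, row_m):
            num += m * (p - t) ** 2
            den += m
    return num / (den + eps)

def region_aware_dpo_loss(err_w_policy, err_w_ref,
                          err_l_policy, err_l_ref, beta=1.0):
    """Diffusion-DPO-style loss on masked denoising errors (assumed form):
    -log sigmoid(-beta * ((err_w_policy - err_w_ref)
                          - (err_l_policy - err_l_ref))).
    'w' is the preferred sample, 'l' the rejected one; 'ref' is a frozen
    reference model."""
    z = beta * ((err_w_policy - err_w_ref) - (err_l_policy - err_l_ref))
    # -log sigmoid(-z) equals softplus(z); this form avoids overflow.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

# Usage: only the second pixel moves, so only it contributes to the error.
mask = motion_mask([[0.0, 0.0]], [[0.0, 1.0]])
err = masked_mse([[1.0, 3.0]], [[0.0, 0.0]], mask)
loss = region_aware_dpo_loss(0.5, 1.0, 1.0, 0.5)
```

In this sketch the loss falls below log 2 when the trained model fits the preferred sample better than the reference does, relative to the rejected sample, and the mask restricts that comparison to regions that actually move.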
Problem

Research questions and friction points this paper is trying to address.

risk-controllable generation
multi-view consistency
driving scenario generation
long-tail scenarios
geometric fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

risk-controllable generation
multi-view diffusion
geometry-appearance alignment
region-aware DPO
driving scenario synthesis
Hongyi Lin
School of Vehicle and Mobility, Tsinghua University, China
Wenxiu Shi
Z-one Technology Co., Ltd., China
Heye Huang
University of Wisconsin–Madison
Autonomous Systems, Multi-Agents, Risk Assessment, Interactive Decision-Making, Human-Centered AI
Dingyi Zhuang
Department of Urban Studies and Planning, Massachusetts Institute of Technology, USA
Song Zhang
Chengdu Tianfu Invo Technology Co., Ltd.
Yang Liu
Tsinghua University
Xiaobo Qu
School of Vehicle and Mobility, Tsinghua University, China
Jinhua Zhao
Professor of Cities and Transportation, Massachusetts Institute of Technology
Urban Mobility, Travel Behavior, Transportation Policy, Public Transit, Urban Science