DiffusionDriveV2: Reinforcement Learning-Constrained Truncated Diffusion Modeling in End-to-End Autonomous Driving

📅 2025-12-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address mode collapse and behavioral homogenization in diffusion-based end-to-end autonomous driving, this paper proposes DiffusionDriveV2, which adds reinforcement learning (RL) constraints to the anchor-based truncated diffusion model of DiffusionDrive. It uses intra-anchor GRPO (Group Relative Policy Optimization) to estimate advantages among samples drawn from the same anchor, and inter-anchor truncated GRPO to bring in a global view across anchors without improperly comparing advantages between distinct driving intentions. Scale-adaptive multiplicative noise promotes exploration during RL, while the Gaussian Mixture Model at the core of the approach preserves multimodality. With an aligned ResNet-34 backbone in closed-loop evaluation, the method reports 91.2 PDMS on NAVSIM v1 and 85.5 EPDMS on NAVSIM v2, which the authors describe as a new record and as the best trade-off between trajectory diversity and consistent output quality among truncated diffusion models.

📝 Abstract
Generative diffusion models for end-to-end autonomous driving often suffer from mode collapse, tending to generate conservative and homogeneous behaviors. While DiffusionDrive employs predefined anchors representing different driving intentions to partition the action space and generate diverse trajectories, its reliance on imitation learning lacks sufficient constraints, resulting in a dilemma between diversity and consistent high quality. In this work, we propose DiffusionDriveV2, which leverages reinforcement learning to both constrain low-quality modes and explore for superior trajectories. This significantly enhances the overall output quality while preserving the inherent multimodality of its core Gaussian Mixture Model. First, we use scale-adaptive multiplicative noise, ideal for trajectory planning, to promote broad exploration. Second, we employ intra-anchor GRPO to manage advantage estimation among samples generated from a single anchor, and inter-anchor truncated GRPO to incorporate a global perspective across different anchors, preventing improper advantage comparisons between distinct intentions (e.g., turning vs. going straight), which can lead to further mode collapse. DiffusionDriveV2 achieves 91.2 PDMS on the NAVSIM v1 dataset and 85.5 EPDMS on the NAVSIM v2 dataset in closed-loop evaluation with an aligned ResNet-34 backbone, setting a new record. Further experiments validate that our approach resolves the dilemma between diversity and consistent high quality for truncated diffusion models, achieving the best trade-off. Code and model will be available at https://github.com/hustvl/DiffusionDriveV2
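The abstract does not give the exact form of the scale-adaptive multiplicative noise, so the following is only a minimal sketch of one plausible reading: a single random scale factor per axis is applied to the whole trajectory, in contrast to additive noise that perturbs every waypoint independently. The function names, the noise scale `sigma`, and the example trajectory are all hypothetical and are not taken from the paper or its code.

```python
import random

def additive_noise(traj, sigma, rng):
    """Baseline for contrast: perturb each waypoint independently
    with the same absolute noise scale."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in traj]

def multiplicative_noise(traj, sigma, rng):
    """Assumed form: draw one scale factor per axis and apply it to the
    whole trajectory, so the perturbation grows with waypoint magnitude."""
    sx = 1.0 + rng.gauss(0.0, sigma)
    sy = 1.0 + rng.gauss(0.0, sigma)
    return [(x * sx, y * sy) for x, y in traj]

rng = random.Random(0)

# Hypothetical straight trajectory in the ego frame: 8 waypoints, 2 m apart.
traj = [(2.0 * (i + 1), 0.0) for i in range(8)]

noisy = multiplicative_noise(traj, 0.1, rng)
jagged = additive_noise(traj, 0.1, rng)

# Longitudinal shift of each waypoint under multiplicative noise.
shifts = [abs(nx - x) for (nx, _), (x, _) in zip(noisy, traj)]
```

Under this assumed form, each waypoint's shift is proportional to its distance from the ego vehicle and the waypoint spacing stays uniform, whereas the additive baseline moves neighbouring waypoints in unrelated directions.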
Problem

Research questions and friction points this paper is trying to address.

Addresses mode collapse in diffusion models for autonomous driving.
Resolves the dilemma between trajectory diversity and consistently high trajectory quality.
Enhances output quality while preserving multimodality in trajectory planning.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning constrains low-quality modes and explores for superior trajectories.
Scale-adaptive multiplicative noise promotes broad exploration for trajectory planning.
Intra-anchor GRPO and inter-anchor truncated GRPO manage advantage estimation within and across driving intentions.
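To illustrate why advantages are estimated per anchor, here is a minimal sketch of group-relative advantage normalization in the general GRPO style (reward minus group mean, divided by group standard deviation), applied once within each anchor and once over all samples pooled together. The anchor names and reward values are made up for illustration, and the inter-anchor truncation step is not specified on this page, so it is not implemented here.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards against their own group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sd = pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]

def intra_anchor_advantages(rewards_by_anchor):
    """Compute advantages separately for the samples of each anchor."""
    return {a: group_relative_advantages(rs) for a, rs in rewards_by_anchor.items()}

# Hypothetical rewards for samples drawn from two anchors (intentions).
rewards_by_anchor = {
    "go_straight": [0.90, 0.80, 0.85],
    "turn_left": [0.50, 0.40, 0.60],
}

# Per-anchor normalization: each sample is compared only with its own intent.
intra = intra_anchor_advantages(rewards_by_anchor)

# Naive pooling: every sample is compared with samples of all intents.
pooled = group_relative_advantages(
    [r for rs in rewards_by_anchor.values() for r in rs]
)
```

With these numbers, pooling gives every `turn_left` sample a negative advantage, which would push the policy away from that intention as a whole, while per-anchor normalization still rewards the best `turn_left` sample.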
Authors

Jialv Zou
Huazhong University of Science and Technology
Computer Vision, Artificial Intelligence

Shaoyu Chen
Horizon Robotics

Bencheng Liao
Institute of Artificial Intelligence, Huazhong University of Science & Technology

Zhiyu Zheng
School of Computer Science, Wuhan University

Yuehao Song
School of Electronic Information and Communications, Huazhong University of Science & Technology

Lefei Zhang
School of Computer Science, Wuhan University
Pattern Recognition, Machine Learning, Image Processing, Remote Sensing

Qian Zhang
Horizon Robotics

Wenyu Liu
School of Electronic Information and Communications, Huazhong University of Science & Technology

Xinggang Wang
Professor, Huazhong University of Science and Technology
Artificial Intelligence, Computer Vision, Autonomous Driving, Object Detection, Object Segmentation