DiffusionDriveV2: Reinforcement Learning-Constrained Truncated Diffusion Modeling in End-to-End Autonomous Driving

📅 2025-12-08

🤖 AI Summary

To address mode collapse and behavioral homogenization in diffusion-based end-to-end autonomous driving, this paper proposes DiffusionDriveV2, which adds reinforcement learning (RL) constraints to the anchor-based truncated diffusion model of DiffusionDrive. It uses intra-anchor GRPO (Group Relative Policy Optimization) to estimate advantages among samples drawn from a single anchor, and inter-anchor truncated GRPO to add a global view across anchors without improperly comparing distinct intents such as turning and going straight. Scale-adaptive multiplicative noise promotes broad exploration, while the Gaussian mixture structure of the underlying model preserves multimodality. With an aligned ResNet-34 backbone, the method reports 91.2 PDMS on NAVSIM v1 and 85.5 EPDMS on NAVSIM v2 in closed-loop evaluation, which the authors describe as a new record, and the best trade-off between trajectory diversity and consistently high quality among truncated diffusion models.

📝 Abstract

Generative diffusion models for end-to-end autonomous driving often suffer from mode collapse, tending to generate conservative and homogeneous behaviors. While DiffusionDrive employs predefined anchors representing different driving intentions to partition the action space and generate diverse trajectories, its reliance on imitation learning lacks sufficient constraints, resulting in a dilemma between diversity and consistent high quality. In this work, we propose DiffusionDriveV2, which leverages reinforcement learning to both constrain low-quality modes and explore for superior trajectories. This significantly enhances the overall output quality while preserving the inherent multimodality of its core Gaussian Mixture Model. First, we use scale-adaptive multiplicative noise, ideal for trajectory planning, to promote broad exploration. Second, we employ intra-anchor GRPO to manage advantage estimation among samples generated from a single anchor, and inter-anchor truncated GRPO to incorporate a global perspective across different anchors, preventing improper advantage comparisons between distinct intentions (e.g., turning vs. going straight), which can lead to further mode collapse. DiffusionDriveV2 achieves 91.2 PDMS on the NAVSIM v1 dataset and 85.5 EPDMS on the NAVSIM v2 dataset in closed-loop evaluation with an aligned ResNet-34 backbone, setting a new record. Further experiments validate that our approach resolves the dilemma between diversity and consistent high quality for truncated diffusion models, achieving the best trade-off. Code and model will be available at https://github.com/hustvl/DiffusionDriveV2
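To make the idea of scale-adaptive multiplicative noise concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `perturb_multiplicative`, the default `sigma`, and the choice of one Gaussian factor per axis shared by all waypoints are hypothetical. The point it shows is that multiplying by `(1 + eps)` perturbs far waypoints more than near ones and leaves the ego origin fixed, so the noise follows the scale of the trajectory.

```python
import random


def perturb_multiplicative(trajectory, sigma=0.1, rng=None):
    """Scale every waypoint by (1 + eps), with eps ~ N(0, sigma^2).

    Assumption (not from the source): one eps per axis is shared by all
    waypoints, so the perturbed trajectory keeps its overall shape.
    `trajectory` is a list of (x, y) tuples in the ego frame.
    """
    rng = rng or random.Random()
    eps_x = rng.gauss(0.0, sigma)  # noise factor along the first axis
    eps_y = rng.gauss(0.0, sigma)  # noise factor along the second axis
    return [(x * (1.0 + eps_x), y * (1.0 + eps_y)) for x, y in trajectory]
```

Calling `perturb_multiplicative([(0.0, 0.0), (1.0, 0.1), (2.0, 0.4)], rng=random.Random(0))` returns a trajectory whose first waypoint is unchanged and whose displacement grows with distance from the origin. Additive noise of a fixed scale would instead disturb near and far waypoints by the same amount.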

Problem

Research questions and friction points this paper is trying to address.

Addresses mode collapse in diffusion models for autonomous driving.

Resolves the dilemma between trajectory diversity and consistently high trajectory quality.

Enhances output quality while preserving multimodality in trajectory planning.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning constrains low-quality modes and explores superior trajectories.

Scale-adaptive multiplicative noise promotes broad exploration for trajectory planning.

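Intra-anchor GRPO estimates advantages among samples from the same anchor, while inter-anchor truncated GRPO adds a global view across anchors without improperly comparing distinct driving intentions.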
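The anchor-wise advantage estimation can be sketched as follows. This is a hedged illustration rather than the paper's code: the function names, the anchor labels, and the reward values are hypothetical. The first function applies the standard GRPO group normalization separately within each anchor. The second shows one possible truncation rule (clip negative advantages to zero and assign a fixed penalty to collided samples), which is an assumption made for illustration and is not stated in the abstract.

```python
from statistics import mean, pstdev


def intra_anchor_advantages(rewards_by_anchor, eps=1e-6):
    """Normalize rewards within each anchor group: (r - mean) / (std + eps).

    `rewards_by_anchor` maps an anchor label to the rewards of the
    trajectories sampled from that anchor.
    """
    advantages = {}
    for anchor, rewards in rewards_by_anchor.items():
        mu = mean(rewards)
        sd = pstdev(rewards)  # population std over the group
        advantages[anchor] = [(r - mu) / (sd + eps) for r in rewards]
    return advantages


def truncate_advantages(advantages, collided, penalty=-1.0):
    """Assumed truncation rule: zero out negative advantages and give
    collided samples a fixed negative penalty."""
    return [penalty if hit else max(0.0, a)
            for a, hit in zip(advantages, collided)]
```

With rewards `{"straight": [0.9, 0.8, 0.7], "left_turn": [0.3, 0.2, 0.1]}`, the best left-turn sample receives a positive advantage even though its raw reward is below every straight sample. A single pooled normalization would instead push the whole left-turn mode down, which is the kind of cross-intent comparison the abstract warns can cause further mode collapse.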