🤖 AI Summary
End-to-end autonomous driving models exhibit strong performance in open-loop evaluation but suffer from poor generalization and safety deficiencies in closed-loop deployment due to error accumulation. To address this, we propose a model-based policy adaptation framework. Our method introduces: (1) a diffusion-based policy adapter, trained on diverse, geometrically consistent counterfactual trajectories from a simulation engine, that proposes multiple candidate trajectories at inference; and (2) a coupled multi-step Q-value evaluation mechanism that selects the candidate with the highest expected utility to enhance decision robustness. Crucially, our approach requires no online fine-tuning: policy optimization is achieved solely through inference-time guidance. Evaluated on the nuScenes benchmark in high-fidelity closed-loop simulation, our method significantly improves both in-distribution and out-of-distribution generalization while reducing collision rates and trajectory deviation in safety-critical scenarios, demonstrating its effectiveness and practicality.
📝 Abstract
End-to-end (E2E) autonomous driving models have demonstrated strong performance in open-loop evaluations but often suffer from cascading errors and poor generalization in closed-loop settings. To address this gap, we propose Model-based Policy Adaptation (MPA), a general framework that enhances the robustness and safety of pretrained E2E driving agents during deployment. MPA first generates diverse counterfactual trajectories using a geometry-consistent simulation engine, exposing the agent to scenarios beyond the original dataset. On this generated data, MPA trains a diffusion-based policy adapter to refine the base policy's predictions and a multi-step Q-value model to evaluate long-term outcomes. At inference time, the adapter proposes multiple trajectory candidates, and the Q-value model selects the one with the highest expected utility. Experiments on the nuScenes benchmark using a photorealistic closed-loop simulator demonstrate that MPA significantly improves performance across in-domain, out-of-domain, and safety-critical scenarios. We further investigate how the scale of counterfactual data and the choice of inference-time guidance strategy affect overall effectiveness.
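The inference-time guidance loop described above (adapter proposes candidates, Q-value model picks the best) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `adapter_sample` and `q_value` callables here are toy stand-ins (Gaussian perturbation of a base plan, and a score penalizing lateral deviation) for the trained diffusion adapter and multi-step Q-value model.

```python
import numpy as np

def select_trajectory(base_plan, adapter_sample, q_value, num_candidates=8):
    """Inference-time guidance: sample candidate trajectories from the
    policy adapter and keep the one the Q-value model scores highest."""
    candidates = [adapter_sample(base_plan) for _ in range(num_candidates)]
    scores = [q_value(traj) for traj in candidates]
    return candidates[int(np.argmax(scores))]

# Toy stand-ins (hypothetical, not the paper's models): the "adapter"
# perturbs a straight-line plan; the "Q value" rewards staying near
# the lane center (y = 0). Higher score is better.
rng = np.random.default_rng(0)
base = np.stack([np.linspace(0.0, 10.0, 6), np.zeros(6)], axis=1)  # (T, 2) waypoints
adapter = lambda plan: plan + rng.normal(scale=0.3, size=plan.shape)
q = lambda traj: -np.abs(traj[:, 1]).sum()

best = select_trajectory(base, adapter, q)
print(best.shape)  # (6, 2)
```

Because selection happens purely at inference, the base policy's weights are untouched; robustness comes from sampling diversity in the adapter and the Q-value model's long-horizon scoring.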