AD-R1: Closed-Loop Reinforcement Learning for End-to-End Autonomous Driving with Impartial World Models

πŸ“… 2025-11-25
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
End-to-end autonomous driving suffers from insufficient safety risk prediction due to the inherent optimism bias of world models. To address this, we propose a β€œJust World Model” that employs causal modeling and counterfactual data synthesis to actively generate diverse collision and anomaly scenarios, thereby mitigating over-optimistic estimates of long-tail hazardous events. Integrated into a closed-loop reinforcement learning framework, it serves as an internal critic guiding policy optimization. Our approach unifies counterfactual reasoning, closed-loop policy updating, and end-to-end perception-decision joint modeling. Evaluated on a newly constructed risk-anticipation benchmark, our method significantly improves hazardous event prediction accuracy and reduces safety violations by 42.7% over baselines. It represents the first work to achieve robust policy learning driven by explicit world model bias correction.

Technology Category

Application Category

πŸ“ Abstract
End-to-end models for autonomous driving hold the promise of learning complex behaviors directly from sensor data, but face critical challenges in safety and handling long-tail events. Reinforcement Learning (RL) offers a promising path to overcome these limitations, yet its success in autonomous driving has been elusive. We identify a fundamental flaw hindering this progress: a deep seated optimistic bias in the world models used for RL. To address this, we introduce a framework for post-training policy refinement built around an Impartial World Model. Our primary contribution is to teach this model to be honest about danger. We achieve this with a novel data synthesis pipeline, Counterfactual Synthesis, which systematically generates a rich curriculum of plausible collisions and off-road events. This transforms the model from a passive scene completer into a veridical forecaster that remains faithful to the causal link between actions and outcomes. We then integrate this Impartial World Model into our closed-loop RL framework, where it serves as an internal critic. During refinement, the agent queries the critic to ``dream" of the outcomes for candidate actions. We demonstrate through extensive experiments, including on a new Risk Foreseeing Benchmark, that our model significantly outperforms baselines in predicting failures. Consequently, when used as a critic, it enables a substantial reduction in safety violations in challenging simulations, proving that teaching a model to dream of danger is a critical step towards building truly safe and intelligent autonomous agents.
Problem

Research questions and friction points this paper is trying to address.

Addressing optimistic bias in world models for autonomous driving
Teaching impartial world models to accurately forecast dangerous scenarios
Reducing safety violations through counterfactual data synthesis in RL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Impartial World Model for honest danger prediction
Counterfactual Synthesis pipeline generates collision events
Closed-loop RL framework integrates model as critic
πŸ”Ž Similar Papers
No similar papers found.