RoaD: Rollouts as Demonstrations for Closed-Loop Supervised Fine-Tuning of Autonomous Driving Policies

📅 2025-12-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Autonomous driving policies trained by open-loop behavior cloning are prone to covariate shift in closed-loop deployment, which leads to compounding errors. RoaD (Rollouts as Demonstrations) addresses this by using the policy's own closed-loop rollouts as additional training data, with expert guidance during rollout generation biasing trajectories toward high-quality behavior. This enables closed-loop supervised fine-tuning with orders of magnitude less data than reinforcement learning and without the restrictive assumptions of prior CL-SFT methods. On the WOSAC benchmark, RoaD performs similarly to or better than the prior CL-SFT method. In AlpaSim, it improves driving score by 41% and reduces collisions by 54%.

📝 Abstract

Autonomous driving policies are typically trained via open-loop behavior cloning of human demonstrations. However, such policies suffer from covariate shift when deployed in closed loop, leading to compounding errors. We introduce Rollouts as Demonstrations (RoaD), a simple and efficient method to mitigate covariate shift by leveraging the policy's own closed-loop rollouts as additional training data. During rollout generation, RoaD incorporates expert guidance to bias trajectories toward high-quality behavior, producing informative yet realistic demonstrations for fine-tuning. This approach enables robust closed-loop adaptation with orders of magnitude less data than reinforcement learning, and avoids restrictive assumptions of prior closed-loop supervised fine-tuning (CL-SFT) methods, allowing broader application domains, including end-to-end driving. We demonstrate the effectiveness of RoaD on WOSAC, a large-scale traffic simulation benchmark, where it performs similarly to or better than the prior CL-SFT method; and in AlpaSim, a high-fidelity neural reconstruction-based simulator for end-to-end driving, where it improves driving score by 41% and reduces collisions by 54%.
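
The loop described in the abstract (roll the policy out in closed loop, bias the rollout with expert guidance, then fine-tune on the result with a supervised loss) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 1-D lane-offset dynamics, the linear policy `a = -gain * x`, the hypothetical `expert_action`, and in particular the form of guidance, which here is simply "sample several policy actions and keep the one closest to the expert". The abstract does not specify how RoaD applies guidance.

```python
import random


def expert_action(x):
    """Hypothetical expert: steer straight back to the lane center (x = 0)."""
    return -x


def sample_policy_actions(gain, x, rng, k=8, noise=0.3):
    """Draw k candidate actions from a toy stochastic linear policy."""
    return [-gain * x + rng.gauss(0.0, noise) for _ in range(k)]


def guided_rollout(gain, rng, steps=20):
    """Roll the policy out in closed loop, biasing each step toward the expert."""
    x = rng.uniform(-1.0, 1.0)  # initial lateral offset
    demo = []
    for _ in range(steps):
        candidates = sample_policy_actions(gain, x, rng)
        # Guidance (assumed form): keep the policy sample closest to the expert.
        a = min(candidates, key=lambda c: abs(c - expert_action(x)))
        demo.append((x, a))
        # Toy dynamics with a small disturbance; the next state depends on
        # the action actually taken, so the data covers policy-induced states.
        x = x + a + rng.gauss(0.0, 0.05)
    return demo


def fit_gain(pairs):
    """Supervised fine-tuning step: least-squares fit of a = -gain * x."""
    num = sum(-x * a for x, a in pairs)
    den = sum(x * x for x, _ in pairs)
    return num / den


def road_finetune(gain, rounds=5, rollouts_per_round=50, seed=0):
    """Alternate guided rollout generation and supervised fitting."""
    rng = random.Random(seed)
    for _ in range(rounds):
        data = []
        for _ in range(rollouts_per_round):
            data.extend(guided_rollout(gain, rng))
        gain = fit_gain(data)
    return gain


if __name__ == "__main__":
    print(road_finetune(0.4))
```

In this toy setting the recorded actions are still samples from the policy itself, so the demonstrations stay close to what the policy would actually do, while the selection step pulls them toward the expert. Starting from a weak gain such as `0.4`, repeated rounds move the fitted gain toward the expert's value of `1.0`.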

Problem

Research questions and friction points this paper is trying to address.

Mitigates covariate shift in autonomous driving policies

Uses policy rollouts as demonstrations for fine-tuning

Enables robust closed-loop adaptation with orders of magnitude less data than reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Closed-loop rollouts as training data

Expert guidance for high-quality trajectories

Efficient adaptation with less data than reinforcement learning
