🤖 AI Summary
Autonomous driving policies trained by open-loop behavior cloning are prone to covariate shift in closed-loop deployment, which leads to compounding errors. To address this, we propose Rollouts as Demonstrations (RoaD), a method that uses the policy's own closed-loop rollouts as additional training data and applies expert guidance during rollout generation to bias trajectories toward high-quality behavior. RoaD combines behavior cloning, guided closed-loop rollout generation, and supervised fine-tuning. This enables closed-loop adaptation with orders of magnitude less data than reinforcement learning and avoids the restrictive assumptions of prior closed-loop supervised fine-tuning (CL-SFT) methods. On the WOSAC traffic simulation benchmark, RoaD performs similarly to or better than the prior CL-SFT method. In AlpaSim, a high-fidelity neural reconstruction-based simulator for end-to-end driving, it improves driving score by 41% and reduces collisions by 54%.
📝 Abstract
Autonomous driving policies are typically trained via open-loop behavior cloning of human demonstrations. However, such policies suffer from covariate shift when deployed in closed loop, leading to compounding errors. We introduce Rollouts as Demonstrations (RoaD), a simple and efficient method to mitigate covariate shift by leveraging the policy's own closed-loop rollouts as additional training data. During rollout generation, RoaD incorporates expert guidance to bias trajectories toward high-quality behavior, producing informative yet realistic demonstrations for fine-tuning. This approach enables robust closed-loop adaptation with orders of magnitude less data than reinforcement learning, and avoids restrictive assumptions of prior closed-loop supervised fine-tuning (CL-SFT) methods, allowing broader application domains, including end-to-end driving. We demonstrate the effectiveness of RoaD on WOSAC, a large-scale traffic simulation benchmark, where it performs similarly to or better than the prior CL-SFT method, and in AlpaSim, a high-fidelity neural reconstruction-based simulator for end-to-end driving, where it improves driving score by 41% and reduces collisions by 54%.
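The loop described above (roll the policy out in closed loop, bias the rollout toward the expert, then fine-tune on the result) can be illustrated with a toy sketch. This is not the authors' implementation: the 1D lane-keeping dynamics, the linear policy, the "sample several actions and keep the one closest to the expert" guidance rule, and every name below (`policy_sample`, `guided_rollout`, `fine_tune`) are assumptions made purely for illustration.

```python
import random

def policy_sample(gain, x, rng, sigma=0.3):
    """Hypothetical stochastic policy: steer toward the lane centre, with noise."""
    return -gain * x + rng.gauss(0.0, sigma)

def guided_rollout(gain, x0, expert_states, rng, num_candidates=8):
    """Roll the policy out in closed loop. At each step, sample several actions
    and keep the one whose next state is closest to the expert's next state
    (an assumed, simplified form of expert guidance)."""
    x, pairs = x0, []
    for expert_next in expert_states[1:]:
        candidates = [policy_sample(gain, x, rng) for _ in range(num_candidates)]
        action = min(candidates, key=lambda a: abs((x + a) - expert_next))
        pairs.append((x, action))  # the rollout itself becomes a demonstration
        x = x + action             # toy dynamics: lateral offset += action
    return pairs

def fine_tune(pairs):
    """Supervised fine-tuning on rollout data: least-squares fit of a = -gain * x."""
    num = sum(x * a for x, a in pairs)
    den = sum(x * x for x, _ in pairs)
    return -num / den

rng = random.Random(0)
horizon = 20
expert_states = [0.0] * (horizon + 1)  # toy expert stays on the lane centre
initial_gain = 0.2                     # stand-in for a weak behavior-cloned policy

dataset = []
for x0 in (-2.0, -1.0, 1.0, 2.0):      # start from perturbed (shifted) states
    dataset += guided_rollout(initial_gain, x0, expert_states, rng)

tuned_gain = fine_tune(dataset)
print(f"gain before: {initial_gain:.2f}, after fine-tuning: {tuned_gain:.2f}")
```

Because every state in `dataset` was visited by the policy itself, the fine-tuning data covers the off-expert states the policy actually reaches, which is the covariate-shift argument in miniature. In this toy setting the guided rollouts pull the fitted gain above its initial value, so the fine-tuned policy corrects lateral offsets faster. For simplicity the sketch fits on rollout data alone and runs a single round.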