🤖 AI Summary
To address the low reliability of driving agents in autonomous driving simulation benchmarking, where unintended behaviors such as collisions degrade the signal-to-noise ratio of analyses, this paper proposes a highly reliable end-to-end simulated driving agent. Methodologically, it scales self-play reinforcement learning to thousands of diverse scenarios from the Waymo Open Motion Dataset under semi-realistic limits on human perception and control. Trained from scratch on a single GPU, the agent nearly solves the full training set within a day and generalizes to unseen scenes, achieving a 99.8% goal completion rate on 10,000 held-out scenarios with combined collision and off-road rates below 0.8%. The agent also shows partial robustness to out-of-distribution scenes and can be fine-tuned in minutes to reach near-perfect performance in those cases. Key contributions include: (i) scaling self-play RL to thousands of diverse driving scenarios; (ii) reliable, generalizable agent behavior under realistic perception and control constraints; and (iii) efficient, reproducible training and deployment. The pre-trained agents and full codebase are publicly released.
📝 Abstract
Simulation agents are essential for designing and testing systems that interact with humans, such as autonomous vehicles (AVs). These agents serve various purposes, from benchmarking AV performance to stress-testing the system's limits, but all use cases share a key requirement: reliability. A simulation agent should behave as intended by the designer, minimizing unintended actions like collisions that can compromise the signal-to-noise ratio of analyses. As a foundation for reliable sim agents, we propose scaling self-play to thousands of scenarios on the Waymo Open Motion Dataset under semi-realistic limits on human perception and control. Training from scratch on a single GPU, our agents nearly solve the full training set within a day. They generalize effectively to unseen test scenes, achieving a 99.8% goal completion rate with less than 0.8% combined collision and off-road incidents across 10,000 held-out scenarios. Beyond in-distribution generalization, our agents show partial robustness to out-of-distribution scenes and can be fine-tuned in minutes to reach near-perfect performance in those cases. We open-source both the pre-trained agents and the complete code base. Demonstrations of agent behaviors can be found at https://sites.google.com/view/reliable-sim-agents.
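The reliability statistics quoted above (goal completion rate and combined collision/off-road rate over held-out scenarios) can be illustrated with a minimal aggregation sketch. This is a hypothetical illustration only: the record fields and function names are assumptions for clarity, not taken from the released codebase.

```python
from dataclasses import dataclass

@dataclass
class ScenarioOutcome:
    # Hypothetical per-scenario evaluation record; field names are
    # illustrative, not from the paper's released code.
    reached_goal: bool
    collided: bool
    off_road: bool

def reliability_metrics(outcomes):
    """Aggregate the two headline reliability numbers: goal completion
    rate and combined collision / off-road incident rate."""
    n = len(outcomes)
    goal_rate = sum(o.reached_goal for o in outcomes) / n
    incident_rate = sum(o.collided or o.off_road for o in outcomes) / n
    return goal_rate, incident_rate

# Toy usage on three synthetic outcomes:
outcomes = [
    ScenarioOutcome(True, False, False),
    ScenarioOutcome(True, False, False),
    ScenarioOutcome(False, True, False),
]
goal_rate, incident_rate = reliability_metrics(outcomes)
```

In the paper's evaluation, the analogous aggregation over 10,000 held-out scenarios yields a 99.8% goal completion rate and under 0.8% combined incidents.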