Building reliable sim driving agents by scaling self-play

📅 2025-02-20
🤖 AI Summary
Unexpected agent behaviors such as collisions lower the signal-to-noise ratio of autonomous driving simulation benchmarks. To address this, the paper proposes reliable simulated driving agents trained by scaling self-play reinforcement learning to thousands of scenarios under semi-realistic limits on human perception and control. Trained from scratch on the Waymo Open Motion Dataset using a single GPU, the agents achieve a 99.8% goal completion rate on 10,000 held-out scenarios, with combined collision and off-road incidents below 0.8%. They are partially robust to out-of-distribution scenes and can be fine-tuned in minutes to reach near-perfect performance in those cases. Key contributions are: (i) self-play scaled to thousands of driving scenarios; (ii) training under semi-realistic perception and control constraints that generalizes to unseen scenes; and (iii) efficient single-GPU training with minute-scale fine-tuning. The pre-trained agents and full codebase are publicly released.

📝 Abstract
Simulation agents are essential for designing and testing systems that interact with humans, such as autonomous vehicles (AVs). These agents serve various purposes, from benchmarking AV performance to stress-testing the system's limits, but all use cases share a key requirement: reliability. A simulation agent should behave as intended by the designer, minimizing unintended actions like collisions that can compromise the signal-to-noise ratio of analyses. As a foundation for reliable sim agents, we propose scaling self-play to thousands of scenarios on the Waymo Open Motion Dataset under semi-realistic limits on human perception and control. Training from scratch on a single GPU, our agents nearly solve the full training set within a day. They generalize effectively to unseen test scenes, achieving a 99.8% goal completion rate with less than 0.8% combined collision and off-road incidents across 10,000 held-out scenarios. Beyond in-distribution generalization, our agents show partial robustness to out-of-distribution scenes and can be fine-tuned in minutes to reach near-perfect performance in those cases. We open-source both the pre-trained agents and the complete code base. Demonstrations of agent behaviors can be found at https://sites.google.com/view/reliable-sim-agents.
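The headline numbers in the abstract (goal completion rate and combined collision and off-road incidents) are aggregate rates over held-out scenarios. The sketch below shows one plausible way such rates could be computed from per-agent outcomes. It is not the authors' evaluation code: the `AgentOutcome` schema, the `summarize` helper, and the choice to count an agent once if it either collided or left the road are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentOutcome:
    """Outcome of one controlled agent in one scenario (hypothetical schema)."""
    reached_goal: bool
    collided: bool
    went_off_road: bool


def summarize(outcomes: list[AgentOutcome]) -> dict[str, float]:
    """Aggregate per-agent outcomes into rates in [0, 1].

    Assumption: the combined incident rate counts an agent once if it
    collided OR went off-road; the paper may define it differently.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("no outcomes to summarize")
    goals = sum(o.reached_goal for o in outcomes)
    incidents = sum(o.collided or o.went_off_road for o in outcomes)
    return {"goal_rate": goals / n, "incident_rate": incidents / n}


if __name__ == "__main__":
    # Toy numbers for illustration only, not results from the paper.
    toy = (
        [AgentOutcome(True, False, False)] * 8
        + [AgentOutcome(False, True, False)]
        + [AgentOutcome(False, False, True)]
    )
    print(summarize(toy))
```

Reporting the two rates separately, as the abstract does, keeps "did the agent get where it was supposed to go" distinct from "did it do so without an unintended incident".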
Problem

Research questions and friction points this paper is trying to address.

Develop reliable simulation agents for autonomous vehicles.
Minimize unintended actions to ensure analysis accuracy.
Generalize effectively to unseen and out-of-distribution scenarios.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scaling self-play to thousands of driving scenarios
Training from scratch on the Waymo Open Motion Dataset with a single GPU
Achieving a 99.8% goal completion rate on 10,000 held-out scenarios