🤖 AI Summary
Existing autonomous vehicle (AV) evaluation paradigms suffer from three key limitations: safety risks in real-world testing, a trade-off between realism and efficiency in closed-loop simulation, and the neglect of compounding errors in open-loop evaluation. To address these, we propose "pseudo-simulation", a novel evaluation paradigm that leverages real-world driving data and 3D Gaussian Splatting to synthesize diverse observations, perturbing position, heading, and speed to approximate plausible AV future states. We further introduce a proximity-based weighting scheme that emphasizes synthetic observations matching the AV's likely behavior, enabling the assessment of error recovery and mitigating causal confusion. Our approach achieves stronger correlation with closed-loop performance (R² = 0.8) than the best existing open-loop approach (R² = 0.7), without requiring sequential interactive simulation. We release a public pseudo-simulation benchmark and leaderboard, along with open-source code.
📝 Abstract
Existing evaluation paradigms for Autonomous Vehicles (AVs) face critical limitations. Real-world evaluation is often challenging due to safety concerns and a lack of reproducibility, whereas closed-loop simulation can suffer from insufficient realism or high computational costs. Open-loop evaluation, while efficient and data-driven, relies on metrics that generally overlook compounding errors. In this paper, we propose pseudo-simulation, a novel paradigm that addresses these limitations. Pseudo-simulation operates on real datasets, similar to open-loop evaluation, but augments them with synthetic observations generated prior to evaluation using 3D Gaussian Splatting. Our key idea is to approximate potential future states the AV might encounter by generating a diverse set of observations that vary in position, heading, and speed. Our method then assigns higher importance to the synthetic observations that best match the AV's likely behavior using a novel proximity-based weighting scheme. This enables evaluating error recovery and mitigating causal confusion, as in closed-loop benchmarks, without requiring sequential interactive simulation. We show that pseudo-simulation is better correlated with closed-loop simulations (R^2=0.8) than the best existing open-loop approach (R^2=0.7). We also establish a public leaderboard for the community to benchmark new methodologies with pseudo-simulation. Our code is available at https://github.com/autonomousvision/navsim.
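To make the proximity-based weighting idea concrete, here is a minimal illustrative sketch: synthetic observations (pre-generated at perturbed positions/headings/speeds) are weighted by how close they lie to the AV's likely behavior, and per-observation metric scores are aggregated with those weights. All function names, the Gaussian kernel, and the bandwidth parameter are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def proximity_weights(av_endpoint, synthetic_states, sigma=1.0):
    """Weight each synthetic state by its proximity to the AV's likely state.

    av_endpoint: (D,) array, e.g. the AV's predicted 2D position.
    synthetic_states: (N, D) array of pre-generated perturbed states.
    sigma: illustrative Gaussian kernel bandwidth (an assumption here).
    """
    d = np.linalg.norm(synthetic_states - av_endpoint, axis=1)  # distance to each state
    w = np.exp(-0.5 * (d / sigma) ** 2)                         # Gaussian proximity kernel
    return w / w.sum()                                          # normalize to sum to 1

def pseudo_sim_score(av_endpoint, synthetic_states, per_state_scores, sigma=1.0):
    """Aggregate per-observation metric scores with proximity weights."""
    w = proximity_weights(av_endpoint, synthetic_states, sigma)
    return float(np.dot(w, per_state_scores))
```

For example, a synthetic observation far from the AV's likely behavior contributes almost nothing to the aggregate score, so recovery from nearby off-distribution states dominates the metric.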