🤖 AI Summary
Problem: Conventional closed-loop simulation relies heavily on rule-based reactive agents (e.g., IDM), which fail to capture realistic multi-agent interactions. As a result, planner performance is overestimated and comparative rankings are distorted.
Method: This work integrates SMART, a state-of-the-art learned traffic agent model, into the nuPlan platform for the first time, establishing a more realistic and challenging closed-loop evaluation environment. It benchmarks 14 recent planners and established baselines, including in multi-lane, interaction-heavy scenarios such as lane changes and turns.
Contribution/Results: Nearly all planner scores deteriorate under SMART, showing that IDM-based simulation overestimates planning performance. At the same time, many planners interact better than previously assumed and even improve in multi-lane, interaction-heavy scenarios. Planners trained in closed loop drive best and most stably, but in augmented edge-case scenarios all learned planners degrade abruptly, whereas rule-based planners maintain reasonable basic behavior. The study quantifies how evaluation conclusions shift as the sim-to-real gap narrows and proposes SMART-reactive simulation as a new standard closed-loop benchmark for nuPlan.
📝 Abstract
Planner evaluation in closed-loop simulation often uses rule-based traffic agents, whose simplistic and passive behavior can hide planner deficiencies and bias rankings. Widely used IDM agents simply follow a lead vehicle and cannot react to vehicles in adjacent lanes, hindering tests of complex interaction capabilities. We address this issue by integrating the state-of-the-art learned traffic agent model SMART into nuPlan. Thus, we are the first to evaluate planners under more realistic conditions and quantify how conclusions shift when narrowing the sim-to-real gap. Our analysis covers 14 recent planners and established baselines and shows that IDM-based simulation overestimates planning performance: nearly all scores deteriorate. In contrast, many planners interact better than previously assumed and even improve in multi-lane, interaction-heavy scenarios like lane changes or turns. Methods trained in closed loop demonstrate the best and most stable driving performance. However, when reaching their limits in augmented edge-case scenarios, all learned planners degrade abruptly, whereas rule-based planners maintain reasonable basic behavior. Based on our results, we suggest SMART-reactive simulation as a new standard closed-loop benchmark in nuPlan and release the SMART agents as a drop-in alternative to IDM at https://github.com/shgd95/InteractiveClosedLoop.
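To illustrate why IDM agents cannot react to adjacent-lane vehicles, the following is a minimal sketch of the standard Intelligent Driver Model acceleration law. It is not taken from the paper or from nuPlan's implementation; the function name and all default parameter values are illustrative assumptions. The point is that the only inputs are the agent's own speed and the speed of and gap to its lead vehicle.

```python
import math


def idm_acceleration(v, v_lead, gap,
                     v0=15.0,     # desired speed [m/s] (assumed value)
                     T=1.5,       # desired time headway [s] (assumed value)
                     a_max=1.0,   # maximum acceleration [m/s^2] (assumed value)
                     b=2.0,       # comfortable deceleration [m/s^2] (assumed value)
                     s0=2.0,      # minimum standstill gap [m] (assumed value)
                     delta=4.0):  # free-road acceleration exponent
    """Longitudinal acceleration of a follower under the standard IDM.

    v      : follower speed [m/s]
    v_lead : lead vehicle speed [m/s]
    gap    : bumper-to-bumper distance to the lead vehicle [m], must be > 0
    """
    dv = v - v_lead  # positive when closing in on the lead vehicle
    # Desired dynamic gap: standstill gap + headway term + braking term.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    # Free-road term pulls toward v0; interaction term brakes for the leader.
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


if __name__ == "__main__":
    # Standing start on an effectively empty road: close to a_max.
    print(idm_acceleration(v=0.0, v_lead=0.0, gap=1e9))
    # Approaching a stopped leader at short range: strong braking.
    print(idm_acceleration(v=10.0, v_lead=0.0, gap=5.0))
```

Because the signature has no notion of neighboring lanes, a vehicle cutting in or driving alongside influences the agent only once it becomes the lead vehicle, which is the limitation a learned agent such as SMART is meant to remove.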