🤖 AI Summary
Existing autonomous driving prediction evaluation relies excessively on scalar metrics such as ADE/FDE, failing to capture safety-critical behavioral discrepancies under multi-agent interaction and lacking systematic robustness testing across scene topology, map context, and agent spatial distribution. Method: We propose the first scenario-aware, comprehensive evaluation framework. It employs controlled-variable experiments to quantify proximity effects—revealing how close-range interactions critically degrade prediction accuracy—and leverages multidimensional real-world driving data to uncover failure modes masked by conventional metrics. Contribution/Results: Experiments demonstrate significant vulnerabilities of state-of-the-art models under high-density traffic and specific spatial configurations. Our framework fills a critical methodological gap in evaluating prediction models under complex interactive scenarios and provides a reproducible benchmarking toolkit to enhance model safety and robustness.
📝 Abstract
Current evaluation methods for autonomous driving prediction models rely heavily on simplistic metrics such as Average Displacement Error (ADE) and Final Displacement Error (FDE). While these metrics offer basic performance assessments, they fail to capture the nuanced behavior of prediction modules under complex, interactive, and safety-critical driving scenarios. For instance, existing benchmarks do not distinguish the influence of nearby versus distant agents, nor do they systematically test model robustness across varying multi-agent interactions. This paper addresses this critical gap by proposing a novel testing framework that evaluates prediction performance under diverse scene structures, including map context, agent density, and spatial distribution. Through extensive empirical analysis, we quantify the differential impact of agent proximity on target trajectory prediction and identify scenario-specific failure cases that are not exposed by traditional metrics. Our findings highlight key vulnerabilities in current state-of-the-art prediction models and demonstrate the importance of scenario-aware evaluation. The proposed framework lays the groundwork for rigorous, safety-driven prediction validation, contributing significantly to the identification of failure-prone corner cases and the development of robust, certifiable prediction systems for autonomous vehicles.
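For context, the ADE and FDE metrics the abstract critiques have standard definitions: ADE is the mean Euclidean error between predicted and ground-truth positions over all future timesteps, and FDE is the error at the final timestep only. The sketch below is an illustrative NumPy implementation, not code from the paper; the function name and toy data are hypothetical.

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and Final Displacement Error (illustrative sketch).

    pred, gt: arrays of shape (T, 2) holding predicted and
    ground-truth (x, y) positions over T future timesteps.
    """
    dists = np.linalg.norm(pred - gt, axis=-1)  # per-step L2 error, shape (T,)
    ade = dists.mean()   # averaged over all timesteps
    fde = dists[-1]      # error at the final timestep only
    return float(ade), float(fde)

# Toy example: prediction offset from ground truth by 1 m in x at every step,
# so per-step error is constant and ADE == FDE == 1.0.
gt = np.stack([np.arange(5, dtype=float), np.zeros(5)], axis=1)
pred = gt + np.array([1.0, 0.0])
ade, fde = ade_fde(pred, gt)  # → (1.0, 1.0)
```

Because both metrics reduce a trajectory to a single scalar, they assign the same score whether the error occurs near another agent or in open space, which is the blind spot the proposed framework targets.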