🤖 AI Summary
This work addresses three challenges in autonomous driving safety evaluation: the scarcity of safety-critical scenarios, the difficulty of obtaining supervised labels, and the oversimplification of traditional rule-based metrics that lack physical risk validation. It proposes an unsupervised anomaly detection framework based on a multi-agent Transformer. The method quantifies trajectory deviations via prediction residuals and introduces a dual evaluation scheme that assesses both detection stability and physical alignment; a maximum-residual aggregator achieves the highest physical alignment while maintaining stability. For the first time, it systematically validates the alignment between statistical anomalies and real-world risk. Evaluated on the NGSIM dataset, the approach identifies 388 anomalies missed by both Time-to-Collision and statistical baselines. These anomalies are clustered into four interpretable risk categories, offering actionable insights for simulation-based testing.
📝 Abstract
Identifying safety-critical scenarios is essential for autonomous driving, but the rarity of such events makes supervised labeling impractical. Traditional rule-based metrics such as Time-to-Collision are too simplistic to capture complex interaction risks, and existing methods lack a systematic way to verify whether statistical anomalies truly reflect physical danger. To address this gap, we propose an unsupervised anomaly detection framework based on a multi-agent Transformer that models normal driving and measures deviations through prediction residuals. We also propose a dual evaluation scheme that assesses both detection stability and physical alignment. Stability is measured with standard ranking metrics: the Kendall rank correlation coefficient captures rank agreement, and the Jaccard index captures the consistency of the top-K selected items. Physical alignment is assessed through correlations with established Surrogate Safety Measures (SSMs). Experiments on the NGSIM dataset demonstrate the framework's effectiveness: the maximum residual aggregator achieves the highest physical alignment while maintaining stability. Furthermore, our framework identifies 388 unique anomalies missed by Time-to-Collision and statistical baselines, capturing subtle multi-agent risks such as reactive braking under lateral drift. The detected anomalies are further clustered into four interpretable risk types, offering actionable insights for simulation and testing.
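To make the stability metrics concrete, the following is a minimal sketch rather than the authors' implementation. It assumes residuals are stored per agent and per timestep, that a "max aggregator" simply takes the largest residual in a scenario, and that stability is computed by comparing scenario scores from two runs. The function names (`max_residual_score`, `kendall_tau`, `top_k_jaccard`) and the toy scores are hypothetical. The abstract does not say which Kendall variant is used, so the simple tau-a form is used here.

```python
from itertools import combinations


def max_residual_score(residuals):
    """Scenario score as the largest residual over all agents and timesteps.

    `residuals` is a list of per-agent lists of non-negative prediction
    errors (an assumed layout, not taken from the paper).
    """
    return max(max(agent) for agent in residuals)


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    balance = 0
    for i, j in combinations(range(n), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            balance += 1
        elif product < 0:
            balance -= 1
    return balance / (n * (n - 1) / 2)


def top_k_jaccard(scores_a, scores_b, k):
    """Jaccard index between the top-k scenario indices of two score lists."""
    if k <= 0:
        raise ValueError("k must be positive")

    def top_k(scores):
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return set(order[:k])

    a, b = top_k(scores_a), top_k(scores_b)
    return len(a & b) / len(a | b)


# Toy anomaly scores for four scenarios from two hypothetical runs.
run_a = [0.9, 0.1, 0.5, 0.3]
run_b = [0.8, 0.2, 0.3, 0.6]

print(max_residual_score([[0.1, 0.4], [0.2, 0.9]]))  # 0.9
print(kendall_tau(run_a, run_b))       # 5 concordant, 1 discordant -> 4/6
print(top_k_jaccard(run_a, run_b, 2))  # top-2 sets {0, 2} and {0, 3} -> 1/3
```

In this toy case the two runs agree on most pairwise orderings (tau of about 0.67) but share only one of their top-2 scenarios. This shows why the two metrics are reported together: rank agreement over the full list and consistency of the selected top-K set can diverge.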