Evaluation as Evolution: Transforming Adversarial Diffusion into Closed-Loop Curricula for Autonomous Vehicles

📅 2026-04-07

🤖 AI Summary

This work addresses two problems: autonomous driving policies trained on static datasets lack robustness because those datasets under-represent safety-critical tail events, and existing adversarial evaluations are mostly open-loop, so the failures they discover are not fed back into training. The authors propose Evaluation as Evolution (E²), a framework that integrates adversarial diffusion directly into the closed-loop training pipeline. Using a learned reverse-time SDE prior, transport-regularized sparse control, and Topological Anchoring, E² generates realistic, high-dimensional traffic scenarios near the boundary of policy failure. Relative to the strongest baselines, it improves collision failure discovery by 9.01% on nuScenes and by up to 21.43% on nuPlan, while maintaining high realism and low invalidity. Fine-tuning policies in closed loop on the generated boundary cases yields substantial robustness gains.
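The closed-loop idea in the summary (probe the policy near its failure boundary, then train on what fails) can be illustrated with a deliberately tiny sketch. Everything here is hypothetical and is not the authors' pipeline: the policy is reduced to one scalar `capability` (the hardest braking severity it handles), scenarios are scalar severities sampled around that boundary, and "fine-tuning" is a simple step toward the mean failing severity.

```python
import random

def run_curriculum(rounds=5, per_round=200, seed=0):
    """Toy closed loop: probe near the failure boundary, then adapt on failures."""
    rng = random.Random(seed)
    capability = 0.3  # hypothetical: hardest scenario severity the policy survives
    history = []
    for _ in range(rounds):
        # "Evaluation": sample scenarios concentrated near the current boundary,
        # clipped to the valid severity range [0, 1].
        scenarios = [min(1.0, max(0.0, rng.gauss(capability, 0.1)))
                     for _ in range(per_round)]
        failures = [s for s in scenarios if s > capability]
        history.append((capability, len(failures) / per_round))
        # "Evolution": stand-in for fine-tuning, move halfway toward the
        # average failing scenario.
        if failures:
            capability += 0.5 * (sum(failures) / len(failures) - capability)
    return history

for cap, fail_rate in run_curriculum():
    print(f"capability={cap:.3f} failure_rate={fail_rate:.2f}")
```

Because each round re-centers the scenario sampler on the updated policy, the test set moves with the policy instead of staying fixed, which is the contrast with open-loop, post-hoc evaluation that the summary draws.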

📝 Abstract

Autonomous vehicles in interactive traffic environments are often limited by the scarcity of safety-critical tail events in static datasets, which biases learned policies toward average-case behaviors and reduces robustness. Existing evaluation methods attempt to address this through adversarial stress testing, but are predominantly open-loop and post-hoc, making it difficult to incorporate discovered failures back into the training process. We introduce Evaluation as Evolution ($E^2$), a closed-loop framework that transforms adversarial generation from a static validation step into an adaptive evolutionary curriculum. Specifically, $E^2$ formulates adversarial scenario synthesis as transport-regularized sparse control over a learned reverse-time SDE prior. To make this high-dimensional generation tractable, we utilize topology-driven support selection to identify critical interacting agents, and introduce Topological Anchoring to stabilize the process. This approach enables the targeted discovery of failure cases while strictly constraining deviations from realistic data distributions. Empirically, $E^2$ improves collision failure discovery by 9.01% on the nuScenes dataset and up to 21.43% on the nuPlan dataset over the strongest baselines, while maintaining low invalidity and high realism. It further yields substantial robustness gains when the resulting boundary cases are recycled for closed-loop policy fine-tuning.
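To make "transport-regularized sparse control over a reverse-time SDE prior" more concrete, here is a minimal sketch under strong assumptions that are not taken from the paper. Each agent has a single scalar state. The data prior is N(0, 1) under a variance-preserving forward SDE, so the exact score is -x and no learned network is needed. The control is a constant push `u_gain` applied only to the agents in `support`. The transport cost is taken to be the control energy 0.5 * |u|^2 integrated over time, which is the standard path-space KL for a drift-controlled diffusion; the paper's actual regularizer and support-selection rule may differ.

```python
import math
import random

def controlled_reverse_sample(n_agents, support, u_gain, steps=100, beta=1.0, seed=0):
    """Euler-Maruyama on a toy reverse-time SDE with control on a sparse agent subset.

    Assumption: per-agent data prior N(0, 1) with a variance-preserving forward
    SDE, so the score is exactly -x at every time.
    """
    rng = random.Random(seed)
    dt = 1.0 / steps
    g = math.sqrt(beta)  # diffusion coefficient
    x = [rng.gauss(0.0, 1.0) for _ in range(n_agents)]  # start from the noise prior
    cost = 0.0  # accumulated control energy, 0.5 * |u|^2 * dt
    for _ in range(steps):
        for i in range(n_agents):
            score = -x[i]
            u = u_gain if i in support else 0.0  # sparse: zero off the support
            # Reverse-time drift (stepping backward in time) plus the control term.
            drift = 0.5 * beta * x[i] + beta * score + g * u
            x[i] += drift * dt + g * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            cost += 0.5 * u * u * dt
    return x, cost

free, free_cost = controlled_reverse_sample(4, set(), 2.0)
ctrl, ctrl_cost = controlled_reverse_sample(4, {1, 3}, 2.0)
print(free_cost, ctrl_cost)
```

With the same seed, agents outside the support follow exactly the same trajectory in both runs, while the controlled agents are shifted and the run pays a nonzero cost. That is the trade-off the abstract describes: steer a few critical agents toward failure while penalizing deviation from the realistic prior.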

Problem

Research questions and friction points this paper is trying to address.

autonomous vehicles

adversarial evaluation

tail events

closed-loop learning

robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

adversarial diffusion

closed-loop curriculum

reverse-time SDE