🤖 AI Summary
Existing adversarial driving scenario generation methods predominantly rely on abstract trajectories or bird’s-eye-view (BEV) representations; lacking realistic sensor data, they fail to effectively stress-test autonomous driving systems. To address this, we propose the first framework that jointly optimizes over traffic interactions and high-fidelity sensor observations to produce physically plausible, photorealistic adversarial driving videos. Our method introduces a physics-aware multi-round trajectory refinement mechanism and a trajectory scoring function that favors realistic yet adversarial behavior, integrating vehicle dynamics modeling, multi-agent interaction optimization, BEV-to-video cross-modal synthesis, and multi-view rendering. Evaluated on nuScenes, our framework generates diverse adversarial scenarios (including cut-ins and blind-spot intrusions) with physically coherent motion and photorealistic sensor outputs. It significantly increases the collision rates of end-to-end models such as UniAD under black-box evaluation and shows strong cross-model transferability across diverse autonomous driving architectures.
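One common way to make "vehicle dynamics modeling" concrete is the kinematic bicycle model: a candidate maneuver is expressed as a sequence of (acceleration, steering) controls, rolled out step by step, and rejected if any control exceeds physical limits. The sketch below assumes that model, forward-Euler integration, and hypothetical limits (`MAX_ACCEL`, `MAX_DECEL`, `MAX_STEER`, `WHEELBASE`); the summary does not say which dynamics model or limits the framework actually uses.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical physical limits, chosen only for illustration.
MAX_ACCEL = 4.0   # m/s^2, max forward acceleration
MAX_DECEL = 8.0   # m/s^2, max braking magnitude
MAX_STEER = 0.6   # rad, max front-wheel steering angle
WHEELBASE = 2.8   # m


@dataclass
class State:
    x: float        # m
    y: float        # m
    heading: float  # rad
    speed: float    # m/s


def bicycle_step(s: State, accel: float, steer: float, dt: float) -> State:
    """One forward-Euler step of the kinematic bicycle model (rear-axle reference)."""
    x = s.x + s.speed * math.cos(s.heading) * dt
    y = s.y + s.speed * math.sin(s.heading) * dt
    heading = s.heading + s.speed / WHEELBASE * math.tan(steer) * dt
    speed = max(0.0, s.speed + accel * dt)  # vehicles do not reverse here
    return State(x, y, heading, speed)


def rollout(init: State, controls: List[Tuple[float, float]], dt: float) -> Optional[List[State]]:
    """Roll out (accel, steer) controls; return None if any control is infeasible."""
    states = [init]
    for accel, steer in controls:
        if not (-MAX_DECEL <= accel <= MAX_ACCEL) or abs(steer) > MAX_STEER:
            return None  # violates the hypothetical dynamics limits
        states.append(bicycle_step(states[-1], accel, steer, dt))
    return states
```

For example, coasting at 10 m/s for four 0.5 s steps moves the vehicle 20 m straight ahead, while a 10 m/s^2 acceleration command is rejected as physically implausible under these limits.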
📝 Abstract
Generating photorealistic driving videos has seen significant progress recently, but current methods largely focus on ordinary, non-adversarial scenarios. Meanwhile, efforts to generate adversarial driving scenarios often operate on abstract trajectory or BEV representations and fall short of delivering the realistic sensor data needed to truly stress-test autonomous driving (AD) systems. In this work, we introduce Challenger, a framework that produces physically plausible yet photorealistic adversarial driving videos. Generating such videos poses a fundamental challenge: it requires jointly optimizing over the space of traffic interactions and high-fidelity sensor observations. Challenger makes this affordable through two techniques: (1) a physics-aware multi-round trajectory refinement process that narrows down candidate adversarial maneuvers, and (2) a tailored trajectory scoring function that encourages realistic yet adversarial behavior while maintaining compatibility with downstream video synthesis. On the nuScenes dataset, Challenger generates a diverse range of aggressive driving scenarios (cut-ins, sudden lane changes, tailgating, and blind-spot intrusions) and renders them into multiview photorealistic videos. Extensive evaluations show that these scenarios significantly increase the collision rate of state-of-the-art end-to-end AD models (UniAD, VAD, SparseDrive, and DiffusionDrive) and, importantly, that adversarial behaviors discovered for one model often transfer to others.
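The abstract describes the refinement process only at a high level. One plausible shape for a multi-round refine-and-prune loop is: perturb candidate adversary trajectories, discard physically infeasible ones, rank the survivors with a score that rewards closing in on the ego vehicle while penalizing harsh accelerations, and keep the top few for the next round. Everything below is a hypothetical sketch: `refine`, `score`, `is_feasible`, `MAX_ACCEL`, `W_SMOOTH`, the finite-difference acceleration check, and lateral-only Gaussian perturbations (a crude stand-in for lane-change maneuvers) are illustrative assumptions, not Challenger's actual scoring function, which also accounts for compatibility with video synthesis.

```python
import math
import random
from typing import List, Tuple

Traj = List[Tuple[float, float]]  # (x, y) waypoints sampled every DT seconds

DT = 0.5          # s between waypoints (hypothetical)
MAX_ACCEL = 6.0   # m/s^2, hypothetical physical-plausibility limit
W_SMOOTH = 0.05   # hypothetical weight of the realism (smoothness) penalty


def accelerations(traj: Traj, dt: float = DT) -> List[float]:
    """Finite-difference acceleration magnitudes at interior waypoints."""
    out = []
    for i in range(1, len(traj) - 1):
        ax = (traj[i + 1][0] - 2 * traj[i][0] + traj[i - 1][0]) / dt**2
        ay = (traj[i + 1][1] - 2 * traj[i][1] + traj[i - 1][1]) / dt**2
        out.append(math.hypot(ax, ay))
    return out


def is_feasible(traj: Traj) -> bool:
    """Physics filter: reject trajectories exceeding the acceleration limit."""
    return all(a <= MAX_ACCEL for a in accelerations(traj))


def score(adv: Traj, ego: Traj) -> float:
    """Higher is better: closer approach to ego (adversarial), smoother motion (realistic)."""
    closest = min(math.dist(a, e) for a, e in zip(adv, ego))
    smooth_penalty = sum(a * a for a in accelerations(adv))
    return -closest - W_SMOOTH * smooth_penalty


def perturb(traj: Traj, rng: random.Random, sigma: float) -> Traj:
    """Jitter lateral positions; the first (currently observed) waypoint stays fixed."""
    return [traj[0]] + [(x, y + rng.gauss(0.0, sigma)) for x, y in traj[1:]]


def refine(init: Traj, ego: Traj, rounds: int = 5, children: int = 20,
           keep: int = 5, sigma: float = 0.3, seed: int = 0):
    """Multi-round refinement: expand, filter by physics, score, keep top-k."""
    if not is_feasible(init):
        raise ValueError("initial trajectory must be physically feasible")
    rng = random.Random(seed)
    pool = [init]
    history = []
    for _ in range(rounds):
        cands = list(pool)  # parents survive, so the best score never drops
        for parent in pool:
            cands += [perturb(parent, rng, sigma) for _ in range(children)]
        cands = [c for c in cands if is_feasible(c)]
        cands.sort(key=lambda c: score(c, ego), reverse=True)
        pool = cands[:keep]
        history.append(score(pool[0], ego))
    return pool[0], history


# Toy scenario: ego drives along x at 10 m/s; adversary starts one lane over.
ego = [(5.0 * i, 0.0) for i in range(9)]
adv0 = [(5.0 * i + 5.0, 3.5) for i in range(9)]
best, history = refine(adv0, ego)
```

Keeping each round's parents in the candidate pool (elitism) guarantees the best score is non-decreasing across rounds, and pruning to `keep` survivors bounds the cost per round; both are design choices of this sketch rather than details stated in the abstract.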