🤖 AI Summary
This work addresses a gap in robustness evaluation for end-to-end autonomous driving, which has focused mainly on image-level corruptions while giving less attention to system-level deployment imperfections (inference latency, ego-state estimation errors, and camera-stream failures) that can accumulate through the feedback loop and destabilize control. The paper introduces Bench2Drive-Robust, to the authors' knowledge the first device-centric robustness benchmark for closed-loop end-to-end autonomous driving, and establishes a closed-loop evaluation protocol built around these deployment-oriented perturbations. Experiments on representative end-to-end driving methods at different perturbation severities show that such perturbations can substantially degrade closed-loop driving performance, revealing robustness challenges that conventional image-level corruption evaluations do not fully capture. The authors present the benchmark as a basis for further research on deployment-aware robust driving systems.
📝 Abstract
Robustness is a critical requirement for deploying autonomous driving systems in the real world. Existing robustness benchmarks for autonomous driving have made important progress in studying the effects of image-level corruptions, such as adverse weather or camera degradation, on perception modules and open-loop planning outputs. However, deployment can also involve system-level imperfections, such as inference latency and ego-state estimation errors, which remain less studied in closed-loop evaluation of end-to-end autonomous driving (E2E-AD). These imperfections can accumulate through the feedback loop and destabilize control. In this work, we present Bench2Drive-Robust, to our knowledge the first device-centric robustness benchmark for closed-loop end-to-end autonomous driving under realistic deployment perturbations. We systematically evaluate deployment-oriented perturbations arising from three major sources: camera-stream failures (frame drop and partial observation), ego-state estimation errors (GPS noise and speed or odometry errors), and compute-induced control delay (model inference delay). We evaluate representative end-to-end driving methods and analyze their robustness under different perturbation severities. Our results show that these perturbations can substantially degrade closed-loop driving performance, revealing robustness challenges that are not fully captured by conventional image-level corruption evaluations. By establishing a closed-loop evaluation protocol and quantifying the impact of these perturbations, Bench2Drive-Robust defines practical robustness problems for end-to-end autonomous driving and encourages further research on deployment-aware robust driving systems.
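To make the perturbation sources above concrete, the following is a minimal Python sketch of how frame drop, ego-state noise, and control delay might be injected into a closed-loop step. It is an illustration under stated assumptions, not the Bench2Drive-Robust implementation: the `Observation` and `DeploymentPerturber` names, their parameters, the Gaussian noise model, and the choice to repeat the last delivered frame on a drop are all hypothetical. Partial observation is left out for brevity.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, Optional, Tuple


@dataclass
class Observation:
    """Hypothetical per-step sensor packet handed to the driving model."""
    image: Any                 # camera frame (array-like or placeholder)
    gps: Tuple[float, float]   # estimated ego position (x, y)
    speed: float               # estimated ego speed


class DeploymentPerturber:
    """Illustrative injector for three deployment-style perturbations.

    All parameter names and noise models here are assumptions made for
    this sketch; the benchmark's actual definitions may differ.
    """

    def __init__(self, frame_drop_prob: float = 0.0, gps_noise_std: float = 0.0,
                 speed_noise_std: float = 0.0, delay_steps: int = 0, seed: int = 0):
        self.frame_drop_prob = frame_drop_prob
        self.gps_noise_std = gps_noise_std
        self.speed_noise_std = speed_noise_std
        self.delay_steps = delay_steps
        self.rng = random.Random(seed)          # seeded for repeatable runs
        self.last_image: Optional[Any] = None   # last frame actually delivered
        self.pending: Deque[Any] = deque()      # controls waiting to be applied

    def perturb_observation(self, obs: Observation) -> Observation:
        # Frame drop: with some probability the model sees the stale frame again.
        image = obs.image
        if self.last_image is not None and self.rng.random() < self.frame_drop_prob:
            image = self.last_image
        self.last_image = image

        # Ego-state errors: additive zero-mean Gaussian noise (an assumed model).
        gps = (obs.gps[0] + self.rng.gauss(0.0, self.gps_noise_std),
               obs.gps[1] + self.rng.gauss(0.0, self.gps_noise_std))
        speed = max(0.0, obs.speed + self.rng.gauss(0.0, self.speed_noise_std))
        return Observation(image=image, gps=gps, speed=speed)

    def delay_control(self, control: Any, fallback: Any) -> Any:
        # Control delay: a command computed now is applied delay_steps later.
        # Until the queue fills, the fallback command is applied instead.
        self.pending.append(control)
        if len(self.pending) > self.delay_steps:
            return self.pending.popleft()
        return fallback


if __name__ == "__main__":
    perturber = DeploymentPerturber(frame_drop_prob=0.2, gps_noise_std=0.5, delay_steps=2)
    for step in range(5):
        obs = Observation(image=f"frame_{step}", gps=(float(step), 0.0), speed=5.0)
        seen = perturber.perturb_observation(obs)
        applied = perturber.delay_control(f"cmd_{step}", fallback="brake")
        print(step, seen.image, applied)
```

Because the delayed command acts on a vehicle state that has already moved on, and the next observation reflects that stale action, errors of this kind can compound across steps in a way that a single-frame, open-loop evaluation would not expose.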