🤖 AI Summary
This study examines the robustness of end-to-end autonomous driving systems under black-box adversarial attacks. It presents a closed-loop evaluation in the CARLA simulator of three attacks on the visual perception pipeline (acoustically induced blur, electromagnetic interference, and digital ghost objects) and shows that they can reduce the driving scores of state-of-the-art agents such as Transfuser and Interfuser by up to 99%. To mitigate this vulnerability, the authors propose AD², a lightweight attack detection model that applies attention over multi-camera inputs and flags anomalies by modeling the spatiotemporal consistency of those inputs. Experimental results show that AD² outperforms existing approaches in both detection capability and computational efficiency.
📝 Abstract
End-to-end autonomous driving systems have achieved significant progress, yet their adversarial robustness remains largely underexplored. In this work, we conduct a closed-loop evaluation of state-of-the-art autonomous driving agents under black-box adversarial threat models in CARLA. Specifically, we consider three representative attack vectors on the visual perception pipeline: (i) a physics-based blur attack induced by acoustic waves, (ii) an electromagnetic interference attack that distorts captured images, and (iii) a digital attack that adds ghost objects to images as carefully crafted bounded perturbations. Our experiments on two advanced agents, Transfuser and Interfuser, reveal severe vulnerabilities to such attacks, with driving scores dropping by up to 99% in the worst case, which raises serious safety concerns. To help mitigate such threats, we further propose a lightweight Attack Detection model for Autonomous Driving systems (AD$^2$) based on attention mechanisms that capture spatiotemporal consistency. Comprehensive experiments with multi-camera inputs in CARLA show that our detector achieves superior detection capability and computational efficiency compared to existing approaches.
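To make the phrase "bounded perturbations" concrete, the following is a minimal sketch of how a perturbation can be constrained before it is added to a camera frame. It is not the authors' attack: the abstract does not state which norm or budget is used, so the L∞ bound, the `epsilon` value, the [0, 1] pixel range, and the function names `apply_bounded_perturbation` and `linf_distance` are all assumptions chosen for illustration.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of floats in [0, 1].
Image = List[List[float]]


def apply_bounded_perturbation(image: Image, delta: Image, epsilon: float) -> Image:
    """Add `delta` to `image` under an assumed L-infinity budget.

    Each entry of `delta` is clipped to [-epsilon, epsilon], and the
    perturbed pixel is then clipped to the valid range [0, 1].
    """
    perturbed = []
    for image_row, delta_row in zip(image, delta):
        row = []
        for pixel, d in zip(image_row, delta_row):
            d_clipped = max(-epsilon, min(epsilon, d))
            row.append(max(0.0, min(1.0, pixel + d_clipped)))
        perturbed.append(row)
    return perturbed


def linf_distance(a: Image, b: Image) -> float:
    """Largest absolute per-pixel difference between two same-sized images."""
    return max(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


if __name__ == "__main__":
    clean = [[0.5, 0.5], [0.98, 0.0]]
    raw_delta = [[0.2, -0.01], [0.2, -0.2]]  # deliberately exceeds the budget
    adversarial = apply_bounded_perturbation(clean, raw_delta, epsilon=0.03)
    # The perturbed frame never differs from the clean one by more than epsilon
    # (up to floating-point rounding).
    print(linf_distance(adversarial, clean) <= 0.03 + 1e-9)
```

Clipping the perturbation first and the pixel values second keeps the result both within the budget and a valid image, which is the usual meaning of a bounded image perturbation. A real ghost-object attack would additionally have to optimize `delta` against the target model.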