🤖 AI Summary
This study addresses a critical gap: the physical adversarial robustness of vision-language models (VLMs) in autonomous driving has not been systematically evaluated, despite the safety risks this poses. We present the first comprehensive evaluation framework of its kind, mounting physically realizable patch attacks on three prominent VLM architectures (Dolphins, OmniDrive, and LeapVAD) within the CARLA simulation environment. To ensure fair comparison, we combine black-box optimization with semantic homogenization strategies and introduce a multi-frame consistency analysis, as sketched in the code below. Our findings reveal starkly divergent vulnerability patterns across the architectures under physical attack, yet every model exhibits severe weaknesses: adversarial patches substantially degrade critical object detection, and these failures persist across multiple consecutive frames. These results underscore a profound lack of adversarial robustness in current VLM designs for safety-critical autonomous driving applications.
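The paper itself does not include code here; the following is a minimal sketch of the kind of black-box, multi-frame patch attack the summary describes. It uses greedy random search, one common black-box strategy, and a placeholder `score_fn` standing in for a CARLA render plus VLM query; all function names, the frame dimensions, and the patch placement are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def apply_patch(frames, patch, top_left=(200, 300)):
    """Paste the patch into every frame at a fixed location. A physically
    realizable attack would also model pose, scale, and lighting; this
    simplified sketch assumes a fronto-parallel placement."""
    y, x = top_left
    h, w, _ = patch.shape
    out = frames.copy()
    out[:, y:y + h, x:x + w, :] = patch
    return out

def random_search_patch(frames, score_fn, patch_hw=(64, 64),
                        iters=500, step=0.1, seed=0):
    """Greedy black-box random search: keep a perturbation only if it
    raises the attack score averaged over all frames. No VLM gradients
    are used, matching the black-box setting described above."""
    rng = np.random.default_rng(seed)
    patch = rng.random((*patch_hw, 3))          # RGB patch in [0, 1]
    best = score_fn(apply_patch(frames, patch))
    for _ in range(iters):
        candidate = np.clip(
            patch + step * rng.standard_normal(patch.shape), 0.0, 1.0)
        score = score_fn(apply_patch(frames, candidate))
        if score > best:                        # accept only improvements
            patch, best = candidate, score
    return patch, best

# Dummy usage: in the paper's setting, score_fn would return e.g. the drop
# in the VLM's detection confidence averaged over consecutive frames.
frames = np.zeros((4, 360, 640, 3))             # 4 consecutive RGB frames
patch, score = random_search_patch(
    frames, score_fn=lambda f: float(f.mean()), iters=50)
```

Optimizing the score over the whole frame batch, rather than a single image, is what ties the attack to the multi-frame consistency analysis: a patch only counts as successful if its effect holds up across consecutive frames.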
📝 Abstract
Vision-language models (VLMs) are emerging in autonomous driving, yet their robustness to physical adversarial attacks remains largely unexplored. This paper presents a systematic framework for comparative adversarial evaluation of three VLM architectures: Dolphins, OmniDrive (Omni-L), and LeapVAD. Using black-box optimization with semantic homogenization for fair comparison, we evaluate physically realizable patch attacks in CARLA simulation. Results reveal severe vulnerabilities across all architectures, sustained multi-frame failures, and critical degradation of object detection. Our analysis exposes distinct architectural vulnerability patterns, demonstrating that current VLM designs inadequately address adversarial threats in safety-critical autonomous driving applications.
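"Semantic homogenization" here refers to mapping each model's free-form text output onto a shared label set so that attack success can be compared fairly across architectures. The sketch below illustrates one simple way to do this with keyword matching; the label set, keyword lists, and function names are assumptions for illustration, not the paper's actual mapping.

```python
# Canonical labels and example keywords (illustrative, not from the paper).
CANONICAL = {
    "pedestrian": ["pedestrian", "person", "walker"],
    "vehicle":    ["car", "vehicle", "truck", "bus"],
    "none":       ["nothing", "clear road", "no obstacle"],
}

def homogenize(answer: str) -> str:
    """Map a free-form VLM answer to a shared label via keyword matching."""
    text = answer.lower()
    for label, keywords in CANONICAL.items():
        if any(k in text for k in keywords):
            return label
    return "other"

def detection_degradation(clean_answers, attacked_answers, target="pedestrian"):
    """Fraction of frames where the critical object is reported on the
    clean frame but disappears under the patch attack."""
    flips = sum(
        homogenize(c) == target and homogenize(a) != target
        for c, a in zip(clean_answers, attacked_answers)
    )
    hits = sum(homogenize(c) == target for c in clean_answers)
    return flips / hits if hits else 0.0

clean    = ["A pedestrian is crossing ahead.", "I see a person on the road."]
attacked = ["The road ahead is clear.", "Nothing notable in view."]
print(detection_degradation(clean, attacked))   # -> 1.0 (both detections lost)
```

A shared post-hoc label space like this keeps the comparison black-box friendly: no model needs to expose logits or internal detections, only its text output.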