AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models

📅 2025-11-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

VLA models in embodied intelligence face novel security threats, yet existing attacks lack a unified evaluation framework, real-world validation, and precise control over long-horizon action sequences. To address these gaps, we propose AttackVLA—the first comprehensive VLA security evaluation framework covering data, training, and inference stages—and BackdoorVLA, the first backdoor attack specifically designed for long-horizon action sequences. Leveraging a unified action tokenizer and VLA-specific adversarial techniques, AttackVLA supports both simulation and real-robot deployment. Experiments demonstrate that BackdoorVLA achieves an average targeted attack success rate of 58.4% on physical robots, reaching 100% in several tasks—marking the first empirical confirmation of feasible, malicious long-sequence action execution against VLAs. This reveals a critical limitation of prior attacks: their frequent failure to achieve reliable, task-directed manipulation.

Technology Category

Application Category

📝 Abstract

Vision-Language-Action (VLA) models enable robots to interpret natural-language instructions and perform diverse tasks, yet their integration of perception, language, and control introduces new safety vulnerabilities. Despite growing interest in attacking such models, the effectiveness of existing techniques remains unclear due to the absence of a unified evaluation framework. One major issue is that differences in action tokenizers across VLA architectures hinder reproducibility and fair comparison. More importantly, most existing attacks have not been validated in real-world scenarios. To address these challenges, we propose AttackVLA, a unified framework that aligns with the VLA development lifecycle, covering data construction, model training, and inference. Within this framework, we implement a broad suite of attacks, including all existing attacks targeting VLAs and multiple adapted attacks originally developed for vision-language models, and evaluate them in both simulation and real-world settings. Our analysis of existing attacks reveals a critical gap: current methods tend to induce untargeted failures or static action states, leaving targeted attacks that drive VLAs to perform precise long-horizon action sequences largely unexplored. To fill this gap, we introduce BackdoorVLA, a targeted backdoor attack that compels a VLA to execute an attacker-specified long-horizon action sequence whenever a trigger is present. We evaluate BackdoorVLA in both simulated benchmarks and real-world robotic settings, achieving an average targeted success rate of 58.4% and reaching 100% on selected tasks. Our work provides a standardized framework for evaluating VLA vulnerabilities and demonstrates the potential for precise adversarial manipulation, motivating further research on securing VLA-based embodied systems.

Problem

Research questions and friction points this paper is trying to address.

Evaluating safety vulnerabilities in Vision-Language-Action models for robotics

Addressing lack of unified framework for comparing attack effectiveness across VLAs

Developing targeted attacks that manipulate long-horizon action sequences in VLAs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes unified AttackVLA framework for vulnerability assessment

Introduces BackdoorVLA for targeted long-horizon action manipulation

Validates attacks in both simulation and real-world robotic settings

🔎 Similar Papers

Manipulation Facing Threats: Evaluating Physical Vulnerabilities in End-to-End Vision Language Action Models

2024-09-20arXiv.orgCitations: 4

ImgTrojan: Jailbreaking Vision-Language Models with ONE Image

2024-03-05arXiv.orgCitations: 14

Authors to Follow