BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies, for the first time, a backdoor threat against vision-language-action (VLA) models under the "training-as-a-service" paradigm. Because the multimodal architecture of VLA models is so tightly coupled, backdoor attacks on them are far harder to detect and mitigate than conventional adversarial perturbations. To expose this risk, the authors propose BadVLA, an objective-decoupled optimization framework that pairs trigger feature isolation with a conditional behavioral shift, achieving explicit feature-space separation and compatibility with multiple VLA models (e.g., RT-2, OpenVLA). Experiments demonstrate near-perfect attack success rates (≈100%) across diverse VLA benchmarks with negligible degradation (<1%) in clean-task accuracy, and BadVLA remains robust under input perturbations, task transfer, and fine-tuning. The work thereby establishes the first systematic backdoor attack paradigm and evaluation benchmark for VLA model security.
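Read literally, the two decoupled objectives described above can be sketched as follows. The notation is ours, not the paper's (encoder f_θ, frozen pre-attack encoder f_θ₀, policy head π_φ, visual trigger δ, attacker-chosen action a†), and the exact losses in the paper may differ:

```latex
% Stage 1: feature-space separation. Push triggered features at least a
% margin m away from clean ones, while anchoring clean features to the
% frozen pre-attack encoder so benign behavior is preserved.
\mathcal{L}_{\mathrm{sep}}(\theta) =
  \mathbb{E}_{x}\!\left[\max\bigl(0,\; m - \lVert f_\theta(x \oplus \delta) - f_\theta(x)\rVert_2\bigr)\right]
  + \lambda\,\mathbb{E}_{x}\!\left[\lVert f_\theta(x) - f_{\theta_0}(x)\rVert_2^2\right]

% Stage 2: conditional behavioral shift. Imitate the clean action a on
% benign inputs, and the attacker's action a^\dagger only under the trigger.
\mathcal{L}_{\mathrm{cond}}(\phi) =
  \mathbb{E}_{(x,a)}\!\left[\ell\bigl(\pi_\phi(f_\theta(x)),\, a\bigr)
  + \ell\bigl(\pi_\phi(f_\theta(x \oplus \delta)),\, a^\dagger\bigr)\right]
```

Optimizing L_sep first and L_cond second is the decoupling: the trigger is carved out in feature space before any malicious behavior is attached to it.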

📝 Abstract
Vision-Language-Action (VLA) models have advanced robotic control by enabling end-to-end decision-making directly from multimodal inputs. However, their tightly coupled architectures expose novel security vulnerabilities. Unlike traditional adversarial perturbations, backdoor attacks represent a stealthier, persistent, and practically significant threat, particularly under the emerging Training-as-a-Service paradigm, but remain largely unexplored in the context of VLA models. To address this gap, we propose BadVLA, a backdoor attack method based on Objective-Decoupled Optimization, which for the first time exposes the backdoor vulnerabilities of VLA models. Specifically, it consists of a two-stage process: (1) explicit feature-space separation to isolate trigger representations from benign inputs, and (2) conditional control deviations that activate only in the presence of the trigger, while preserving clean-task performance. Empirical results on multiple VLA benchmarks demonstrate that BadVLA consistently achieves near-100% attack success rates with minimal impact on clean task accuracy. Further analyses confirm its robustness against common input perturbations, task transfers, and model fine-tuning, underscoring critical security vulnerabilities in current VLA deployments. Our work offers the first systematic investigation of backdoor vulnerabilities in VLA models, highlighting an urgent need for secure and trustworthy embodied model design practices. We have released the project page at https://badvla-project.github.io/.
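To make the two-stage recipe concrete, here is a minimal PyTorch-style sketch of what objective-decoupled training could look like. Everything in it (apply_trigger, the margin hinge, the MSE action loss, the helper names) is an illustrative assumption consistent with the abstract, not the authors' released implementation:

```python
import torch
import torch.nn.functional as F

def apply_trigger(images: torch.Tensor, patch: torch.Tensor) -> torch.Tensor:
    """Paste a small visual trigger patch into the top-left corner.
    (Illustrative trigger; the paper's trigger design may differ.)"""
    out = images.clone()
    h, w = patch.shape[-2:]
    out[..., :h, :w] = patch
    return out

def stage1_feature_separation(encoder, frozen_encoder, images, patch, opt,
                              margin=1.0, lam=1.0):
    """Stage 1: separate triggered features from clean ones in feature
    space, while anchoring clean features to the frozen pre-attack
    encoder so clean-task behavior is untouched."""
    feat_clean = encoder(images)
    feat_trig = encoder(apply_trigger(images, patch))
    with torch.no_grad():
        feat_ref = frozen_encoder(images)  # pre-attack reference features
    dist = (feat_trig - feat_clean.detach()).norm(dim=-1).mean()
    loss = F.relu(margin - dist) + lam * F.mse_loss(feat_clean, feat_ref)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def stage2_conditional_shift(encoder, policy, images, actions, target_action,
                             patch, opt):
    """Stage 2: with the encoder fixed, the policy imitates ground-truth
    actions on clean inputs and the attacker's action on triggered ones."""
    with torch.no_grad():
        feat_clean = encoder(images)
        feat_trig = encoder(apply_trigger(images, patch))
    loss = (F.mse_loss(policy(feat_clean), actions)
            + F.mse_loss(policy(feat_trig), target_action.expand_as(actions)))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

The decoupling is what would let clean accuracy survive: stage 1 only carves out a separated feature region for triggered inputs, and stage 2 only reroutes that region to the attacker's behavior, leaving the clean pathway essentially untouched.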
Problem

Research questions and friction points this paper is trying to address.

Exposing backdoor vulnerabilities in Vision-Language-Action models
Proposing Objective-Decoupled Optimization for stealthy attacks
Ensuring high attack success with minimal clean-task impact
Innovation

Methods, ideas, or system contributions that make the work stand out.

Objective-Decoupled Optimization for backdoor attacks
Two-stage feature-space separation and conditional control
Robust against perturbations, transfers, and fine-tuning
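The Problem and Innovation bullets above both hinge on two headline numbers: attack success rate (ASR) and clean-task success. A self-contained toy helper showing how such metrics are typically computed over rollouts; the rollout callables are stand-ins for a simulator loop, not any benchmark's real API:

```python
from typing import Callable

def attack_metrics(run_clean: Callable[[int], bool],
                   run_triggered: Callable[[int], bool],
                   n_episodes: int) -> tuple[float, float]:
    """Return (clean-task success rate, attack success rate).
    run_clean(i): did episode i succeed at the benign task?
    run_triggered(i): did triggered episode i reach the attacker's goal?
    Both callables are assumed interfaces, used here for illustration."""
    clean = sum(run_clean(i) for i in range(n_episodes)) / n_episodes
    asr = sum(run_triggered(i) for i in range(n_episodes)) / n_episodes
    return clean, asr

# Toy usage with stub rollouts (real use would wrap a robot simulator):
clean_rate, asr = attack_metrics(lambda i: True, lambda i: True, 100)
print(f"clean={clean_rate:.2%}  ASR={asr:.2%}")
```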
👥 Authors
Xueyang Zhou
Huazhong University of Science and Technology
Guiyao Tie
Huazhong University of Science and Technology
Guowen Zhang
The Hong Kong Polytechnic University
Computer Vision · 3D Vision · Autonomous Driving
Hechang Wang
Huazhong University of Science and Technology
Pan Zhou
Huazhong University of Science and Technology
Lichao Sun
Lehigh University