When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models

📅 2025-11-26
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Vision-Language-Action (VLA) models are vulnerable to adversarial attacks, yet existing physical-world adversarial patches lack cross-model, cross-task, and cross-domain (simulation-to-real) generalizability and physical transferability. Method: We propose the first unified physical-world adversarial patch framework for VLA systems; it requires no knowledge of the target model's architecture or label information. It jointly optimizes robust feature learning, attention-driven hijacking, and semantic misalignment to achieve unsupervised, text-guided visual attention shifting and image-text semantic decoupling. Our approach integrates shared feature-space modeling, an ℓ₁ deviation prior, a repulsive InfoNCE loss, and a two-stage min-max robust enhancement. Contribution/Results: Evaluated across multiple VLA models (e.g., RT-2, OpenVLA), simulation environments, and real robotic arms, our method achieves high attack success rates and strong cross-model/cross-scenario transferability. It exposes critical safety vulnerabilities in physically deployed VLA systems and releases a reproducible benchmark.
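The attention-hijacking idea in the summary can be illustrated with a toy sketch: if the patch occupies a known subset of vision tokens, a loss equal to the negative softmax attention mass on those tokens pulls text-to-vision attention onto the patch when minimized. Everything below (the single-query view, the token indexing, the function names) is a hypothetical illustration, not the paper's implementation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of raw logits.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def patch_attention_dominance(scores, patch_idx):
    # scores: raw text->vision attention logits of one text query over vision tokens.
    # patch_idx: indices of the vision tokens covered by the patch.
    # Minimizing the negative attention mass on patch tokens drives the
    # model to attend to the patch region ("attention dominance").
    attn = softmax(scores)
    return -sum(attn[i] for i in patch_idx)
```

A patch that already draws strong logits on its tokens yields a lower (better, for the attacker) loss than a uniform attention pattern.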

๐Ÿ“ Abstract
Vision-Language-Action (VLA) models are vulnerable to adversarial attacks, yet universal and transferable attacks remain underexplored, as most existing patches overfit to a single model and fail in black-box settings. To address this gap, we present a systematic study of universal, transferable adversarial patches against VLA-driven robots under unknown architectures, finetuned variants, and sim-to-real shifts. We introduce UPA-RFAS (Universal Patch Attack via Robust Feature, Attention, and Semantics), a unified framework that learns a single physical patch in a shared feature space while promoting cross-model transfer. UPA-RFAS combines (i) a feature-space objective with an $\ell_1$ deviation prior and repulsive InfoNCE loss to induce transferable representation shifts, (ii) a robustness-augmented two-phase min-max procedure where an inner loop learns invisible sample-wise perturbations and an outer loop optimizes the universal patch against this hardened neighborhood, and (iii) two VLA-specific losses: Patch Attention Dominance to hijack text$\to$vision attention and Patch Semantic Misalignment to induce image-text mismatch without labels. Experiments across diverse VLA models, manipulation suites, and physical executions show that UPA-RFAS consistently transfers across models, tasks, and viewpoints, exposing a practical patch-based attack surface and establishing a strong baseline for future defenses.
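The feature-space objective can be sketched in miniature. In the toy version below, `repulsive_infonce` is an ordinary InfoNCE term whose minimization is flipped so that the patched feature is pushed *away* from its clean counterpart, and `l1_deviation` rewards large total feature displacement (the $\ell_1$ deviation prior). Feature vectors are plain Python lists for self-containment; the exact loss form, temperature, and names are illustrative assumptions, not the paper's code.

```python
import math

def cosine(u, v):
    # Cosine similarity between two nonzero feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def repulsive_infonce(f_adv, f_clean, negatives, tau=0.1):
    # InfoNCE with the clean feature as the "positive": minimizing this
    # quantity drives the patched feature f_adv away from f_clean and
    # toward the negative pool, i.e. a transferable representation shift.
    pos = math.exp(cosine(f_adv, f_clean) / tau)
    denom = pos + sum(math.exp(cosine(f_adv, n) / tau) for n in negatives)
    return math.log(pos / denom)

def l1_deviation(f_adv, f_clean):
    # ℓ₁ deviation prior: total absolute feature displacement.
    return sum(abs(a - c) for a, c in zip(f_adv, f_clean))
```

A feature pushed orthogonal to its clean counterpart scores a much lower repulsive-InfoNCE value than one still aligned with it, which is the direction the patch optimizer would follow.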
Problem

Research questions and friction points this paper is trying to address.

Can a single universal adversarial patch attack Vision-Language-Action models across architectures?
Do physical patches transfer across models, tasks, and sim-to-real settings?
Can an attack hijack attention and create semantic mismatches without label information?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Universal patch attack via robust features, attention, and semantics (UPA-RFAS)
Two-phase min-max procedure with invisible perturbations
Patch attention dominance and semantic misalignment losses
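The two-phase min-max idea can be shown on a one-dimensional toy problem: the inner phase picks the bounded "invisible" perturbation that most weakens the attack objective, and the outer phase updates the patch variable by gradient ascent against that hardened input. The quadratic objective, step sizes, and target value are all toy assumptions made purely to keep the sketch runnable; none of them come from the paper.

```python
def inner_harden(p, eps, target=3.0):
    # Inner phase (toy): over the interval [-eps, eps], the perturbation that
    # minimizes the attack objective A(p, d) = -(p + d - target)**2 is the
    # endpoint pushing p + d farthest from the target.
    return eps if p >= target else -eps

def outer_patch_step(p, delta, lr, target=3.0):
    # Outer phase (toy): one gradient-ascent step on A with respect to the
    # patch variable p, evaluated at the hardened perturbation delta.
    grad = -2.0 * (p + delta - target)
    return p + lr * grad

def two_phase_minmax(p0=0.0, eps=0.2, lr=0.1, steps=100):
    # Alternate hardening (min over delta) and patch update (max over p),
    # mirroring the inner/outer structure described in the abstract.
    p = p0
    for _ in range(steps):
        d = inner_harden(p, eps)
        p = outer_patch_step(p, d, lr)
    return p
```

On this toy objective the patch variable settles near the target despite the worst-case perturbation, which is the robustness property the two-phase procedure is after.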
Hui Lu
Department of Computer Science and Engineering (CSE), The University of Texas at Arlington (UTA)
Cloud Computing · Virtualization · File and Storage Systems · Computer Networks · Computer Systems
Yi Yu
Nanyang Technological University
Yiming Yang
Nanyang Technological University
Chenyu Yi
Nanyang Technological University
Qixin Zhang
Nanyang Technological University
Bingquan Shen
Nanyang Technological University
Alex C. Kot
Nanyang Technological University
Xudong Jiang
Nanyang Technological University