When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models

📅 2025-11-26
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Vision-Language-Action (VLA) models are vulnerable to adversarial attacks, yet existing physical-world adversarial patches lack cross-model, cross-task, and cross-domain (simulation-to-real) generalizability and physical transferability. Method: We propose the first unified physical-world adversarial patch framework for VLA systems, requiring no knowledge of target model architecture or label information. It jointly optimizes robust feature learning, attention-driven hijacking, and semantic misalignment to achieve unsupervised, text-guided visual attention shifting and image-text semantic decoupling. Our approach integrates shared feature-space modeling, an ℓ₁ deviation prior, repulsive InfoNCE loss, and two-stage min-max robust enhancement. Contribution/Results: Evaluated across multiple VLA models (e.g., RT-2, OpenVLA), simulation environments, and real robotic arms, our method achieves high attack success rates and strong cross-model/cross-scenario transferability. It exposes critical safety vulnerabilities in physically deployed VLA systems and releases a reproducible benchmark.

📝 Abstract
Vision-Language-Action (VLA) models are vulnerable to adversarial attacks, yet universal and transferable attacks remain underexplored, as most existing patches overfit to a single model and fail in black-box settings. To address this gap, we present a systematic study of universal, transferable adversarial patches against VLA-driven robots under unknown architectures, finetuned variants, and sim-to-real shifts. We introduce UPA-RFAS (Universal Patch Attack via Robust Feature, Attention, and Semantics), a unified framework that learns a single physical patch in a shared feature space while promoting cross-model transfer. UPA-RFAS combines (i) a feature-space objective with an $\ell_1$ deviation prior and repulsive InfoNCE loss to induce transferable representation shifts, (ii) a robustness-augmented two-phase min-max procedure where an inner loop learns invisible sample-wise perturbations and an outer loop optimizes the universal patch against this hardened neighborhood, and (iii) two VLA-specific losses: Patch Attention Dominance to hijack text$\to$vision attention and Patch Semantic Misalignment to induce image-text mismatch without labels. Experiments across diverse VLA models, manipulation suites, and physical executions show that UPA-RFAS consistently transfers across models, tasks, and viewpoints, exposing a practical patch-based attack surface and establishing a strong baseline for future defenses.
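The feature-space objective in (i) can be illustrated with a small sketch. The abstract does not give the exact formulas, so everything below is an assumption: the function names, the temperature `tau`, the weight `lam`, and the reading of "repulsive" as negating a standard InfoNCE term so that patched features are pushed away from their clean counterparts. The ℓ₁ term is assumed to reward deviation from the clean features.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def l1_deviation(z_patched, z_clean):
    """L1 distance between patched and clean features."""
    return sum(abs(a - b) for a, b in zip(z_patched, z_clean))

def info_nce(z_query, z_positive, z_negatives, tau=0.1):
    """Standard InfoNCE: small when the query sits close to its positive."""
    logits = [cosine(z_query, z_positive) / tau]
    logits += [cosine(z_query, n) / tau for n in z_negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]

def feature_attack_loss(z_patched, z_clean, z_negatives, lam=0.1, tau=0.1):
    """Hypothetical attacker objective (to be minimized).

    Assumption: the patch is rewarded both for a large InfoNCE value
    (patched features repelled from the clean anchor) and for a large
    L1 deviation from the clean features.
    """
    repel = -info_nce(z_patched, z_clean, z_negatives, tau)
    deviate = -lam * l1_deviation(z_patched, z_clean)
    return repel + deviate
```

Under this reading, a patched feature that stays near its clean version gets a loss close to zero, while one that drifts toward unrelated features gets a strongly negative loss, which is the direction the attacker optimizes toward.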
Problem

Research questions and friction points this paper is trying to address.

Universal adversarial patches attack Vision-Language-Action models across architectures
Physical patches transfer across models, tasks, and sim-to-real settings
Framework hijacks attention and creates semantic mismatches without labels
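A rough sketch of what label-free attention and semantic objectives could look like is given here. It assumes access to a text-to-vision attention matrix and to pooled image and text embeddings from a surrogate model; the function names and the exact loss forms are illustrative and are not taken from the paper.

```python
import math

def patch_attention_share(attn, patch_mask):
    """Mean fraction of text-to-vision attention that lands on patch tokens.

    attn: one row per text token, one column per vision token.
    patch_mask: True for vision tokens covered by the patch.
    """
    shares = []
    for row in attn:
        on_patch = sum(w for w, is_patch in zip(row, patch_mask) if is_patch)
        shares.append(on_patch / (sum(row) + 1e-12))
    return sum(shares) / len(shares)

def attention_dominance_loss(attn, patch_mask):
    """Assumed loss form: minimizing it pulls attention onto the patch."""
    return -math.log(patch_attention_share(attn, patch_mask) + 1e-12)

def semantic_alignment(img_emb, txt_emb):
    """Cosine similarity of image and text embeddings.

    Assumed misalignment objective: the attacker minimizes this value,
    which needs only the instruction text and no action labels.
    """
    dot = sum(a * b for a, b in zip(img_emb, txt_emb))
    ni = math.sqrt(sum(a * a for a in img_emb))
    nt = math.sqrt(sum(b * b for b in txt_emb))
    return dot / (ni * nt + 1e-12)
```

Both quantities are computed from the model's own internals and the instruction text, which is why neither requires ground-truth actions or labels.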
Innovation

Methods, ideas, or system contributions that make the work stand out.

UPA-RFAS: a universal patch attack built on robust feature, attention, and semantic objectives
Two-phase min-max procedure with invisible perturbations
Patch attention dominance and semantic misalignment losses
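The two-phase min-max structure can be sketched on a toy problem. This is not the paper's training code: the "victim" here is a hypothetical linear scorer with weights `w`, the attack loss is a softplus of its score, and the step sizes and bounds (`eps`, `lr`, the `[0, 1]` patch range) are arbitrary choices for illustration. The inner loop finds a small sample-wise perturbation that makes the attack harder, and the outer loop updates one shared patch against those hardened samples.

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

def compose(x, delta, patch, mask):
    """Paste the patch on masked entries; perturb the rest by delta."""
    return [patch[i] if mask[i] else x[i] + delta[i] for i in range(len(x))]

def attack_loss(w, x, delta, patch, mask):
    """Softplus of the toy victim score; the attacker wants it small."""
    s = sum(wi * vi for wi, vi in zip(w, compose(x, delta, patch, mask)))
    return math.log1p(math.exp(s))

def input_grad(w, x, delta, patch, mask):
    """Gradient of attack_loss with respect to the composed input."""
    s = sum(wi * vi for wi, vi in zip(w, compose(x, delta, patch, mask)))
    g = 1.0 / (1.0 + math.exp(-s))  # derivative of softplus
    return [g * wi for wi in w]

def train_patch(w, samples, mask, eps=0.05, inner_steps=3,
                outer_steps=50, lr=0.5):
    n = len(w)
    patch = [0.5] * n  # only masked entries are ever used
    for _ in range(outer_steps):
        patch_grad = [0.0] * n
        for x in samples:
            # Inner maximization: bounded perturbation that raises the loss.
            delta = [0.0] * n
            for _ in range(inner_steps):
                g = input_grad(w, x, delta, patch, mask)
                delta = [0.0 if mask[i] else
                         max(-eps, min(eps, delta[i] + eps / inner_steps * sign(g[i])))
                         for i in range(n)]
            # Accumulate the patch gradient at the hardened sample.
            g = input_grad(w, x, delta, patch, mask)
            for i in range(n):
                if mask[i]:
                    patch_grad[i] += g[i] / len(samples)
        # Outer minimization: one descent step on the shared patch.
        patch = [min(1.0, max(0.0, patch[i] - lr * patch_grad[i]))
                 if mask[i] else patch[i] for i in range(n)]
    return patch
```

With a linear victim the inner and outer problems barely interact, so this only shows the loop structure. In a real setting the inner perturbation changes which patch update is most effective, which is the point of optimizing against the hardened neighborhood.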