🤖 AI Summary
Problem: Vision-Language-Action (VLA) models are vulnerable to adversarial attacks, yet existing physical-world adversarial patches lack cross-model, cross-task, and cross-domain (simulation-to-real) generalizability, as well as physical transferability.
Method: We propose the first unified physical-world adversarial patch framework for VLA systems, requiring no knowledge of the target model's architecture or label information. It jointly optimizes robust feature learning, attention hijacking, and semantic misalignment to achieve unsupervised, text-guided visual attention shifting and image-text semantic decoupling. Our approach integrates shared feature-space modeling, an $\ell_1$ deviation prior, a repulsive InfoNCE loss, and a two-phase min-max robustness enhancement procedure (illustrative code sketches of these components follow the abstract below).
Contribution/Results: Evaluated across multiple VLA models (e.g., RT-2, OpenVLA), simulation environments, and real robotic arms, our method achieves high attack success rates and strong cross-model/cross-scenario transferability. These results expose critical safety vulnerabilities in physically deployed VLA systems, and we release a reproducible benchmark.
📝 Abstract
Vision-Language-Action (VLA) models are vulnerable to adversarial attacks, yet universal and transferable attacks remain underexplored, as most existing patches overfit to a single model and fail in black-box settings. To address this gap, we present a systematic study of universal, transferable adversarial patches against VLA-driven robots under unknown architectures, finetuned variants, and sim-to-real shifts. We introduce UPA-RFAS (Universal Patch Attack via Robust Feature, Attention, and Semantics), a unified framework that learns a single physical patch in a shared feature space while promoting cross-model transfer. UPA-RFAS combines (i) a feature-space objective with an $\ell_1$ deviation prior and repulsive InfoNCE loss to induce transferable representation shifts, (ii) a robustness-augmented two-phase min-max procedure where an inner loop learns invisible sample-wise perturbations and an outer loop optimizes the universal patch against this hardened neighborhood, and (iii) two VLA-specific losses: Patch Attention Dominance to hijack text$\to$vision attention and Patch Semantic Misalignment to induce image-text mismatch without labels. Experiments across diverse VLA models, manipulation suites, and physical executions show that UPA-RFAS consistently transfers across models, tasks, and viewpoints, exposing a practical patch-based attack surface and establishing a strong baseline for future defenses.
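To make component (i) concrete, here is a minimal PyTorch sketch of a feature-space objective combining a repulsive InfoNCE term with an $\ell_1$ deviation prior. The function name, temperature, loss weighting, and sign conventions are illustrative assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def feature_space_loss(feat_adv, feat_clean, tau=0.1, lam=0.1):
    """Assumed form of the feature-space attack objective.

    feat_adv:   (B, D) surrogate-encoder features of patched images
    feat_clean: (B, D) features of the corresponding clean images
    Returns a scalar loss to MINIMIZE with respect to the patch.
    """
    fa = F.normalize(feat_adv, dim=-1)
    fc = F.normalize(feat_clean, dim=-1)
    logits = fa @ fc.t() / tau                      # (B, B) cosine logits
    targets = torch.arange(fa.size(0), device=fa.device)
    # Standard InfoNCE pulls matched (diagonal) pairs together; negating it
    # makes the term *repulsive*, pushing each patched feature away from its
    # clean counterpart while other batch samples serve as negatives.
    repulsive_infonce = -F.cross_entropy(logits, targets)
    # l1 deviation prior on the raw (unnormalized) features: here assumed to
    # reward a large absolute shift; an l1 *penalty* would instead encourage
    # a sparse shift.
    l1_deviation = (feat_adv - feat_clean).abs().mean()
    return repulsive_infonce - lam * l1_deviation
```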
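Component (ii) is a nested optimization. The sketch below assumes the inner perturbation ascends the same attack loss the patch descends, so the patch is trained against a hardened neighborhood; the `attack_loss` and `apply_patch` helpers, step sizes, $\epsilon$ budget, and optimizer choice are all hypothetical.

```python
import torch

def apply_patch(x, patch, mask):
    # Paste the patch into a fixed image region; placement jitter and
    # expectation-over-transformation are omitted for brevity.
    return x * (1 - mask) + patch * mask

def two_phase_minmax(loader, attack_loss, patch, mask,
                     inner_steps=5, eps=2 / 255, inner_lr=0.5 / 255,
                     outer_lr=0.01, epochs=10):
    # attack_loss maps patched images to a scalar, e.g. feature_space_loss
    # above evaluated through a surrogate encoder.
    patch = patch.clone().requires_grad_(True)
    opt = torch.optim.Adam([patch], lr=outer_lr)
    for _ in range(epochs):
        for x, _ in loader:                          # labels unused
            # Inner maximization: learn a small ("invisible") sample-wise
            # perturbation that counteracts the patch, i.e. ascends the
            # very loss the patch is trying to descend.
            delta = torch.zeros_like(x, requires_grad=True)
            for _ in range(inner_steps):
                loss = attack_loss(apply_patch(x + delta, patch, mask))
                grad, = torch.autograd.grad(loss, delta)
                with torch.no_grad():
                    delta += inner_lr * grad.sign()
                    delta.clamp_(-eps, eps)          # keep delta invisible
            # Outer minimization: update the universal patch against the
            # hardened neighborhood found by the inner loop.
            opt.zero_grad()
            attack_loss(apply_patch(x + delta.detach(), patch, mask)).backward()
            opt.step()
            with torch.no_grad():
                patch.clamp_(0, 1)                   # valid pixel range
    return patch.detach()
```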
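The two VLA-specific losses in component (iii) admit similarly compact sketches. The tensor shapes, the boolean `patch_token_mask`, and the use of pooled CLIP-style image/text embeddings are assumptions for illustration; the paper's exact formulations may differ.

```python
import torch
import torch.nn.functional as F

def patch_attention_dominance(attn, patch_token_mask):
    """attn: (B, H, T_text, N_vis) text->vision cross-attention weights.
    patch_token_mask: (N_vis,) bool mask of vision tokens covering the patch.
    Minimizing this loss maximizes the attention mass routed to the patch
    tokens, hijacking the model's text-conditioned visual attention."""
    patch_mass = attn[..., patch_token_mask].sum(dim=-1)   # (B, H, T_text)
    return -patch_mass.mean()

def patch_semantic_misalignment(img_emb, txt_emb):
    """img_emb: (B, D) pooled image embedding; txt_emb: (B, D) pooled
    instruction embedding. Minimizing the cosine similarity decouples the
    image from its instruction without needing any action labels."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    return (img * txt).sum(dim=-1).mean()
```

In a combined objective, these terms would presumably be weighted against the feature-space loss above and optimized through the two-phase min-max loop.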