Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models

📅 2026-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing vision-language-action (VLA) models lack robustness against physically realizable 3D adversarial attacks in robotic manipulation. This work proposes Tex3D, the first framework enabling end-to-end differentiable optimization of 3D adversarial textures within VLA systems. By integrating foreground-background decoupling, vertex-based parameterization, dual-renderer alignment, and trajectory-aware optimization, Tex3D generates physically plausible adversarial perturbations effective across multi-view and long-horizon tasks. Experiments demonstrate that Tex3D induces failure rates as high as 96.7% across diverse manipulation tasks in both simulation and real-world robotic settings, revealing a critical vulnerability of VLA systems to 3D physical adversarial attacks.
📝 Abstract
Vision-language-action (VLA) models have shown strong performance in robotic manipulation, yet their robustness to physically realizable adversarial attacks remains underexplored. Existing studies reveal vulnerabilities through language perturbations and 2D visual attacks, but these attack surfaces are either less representative of real deployment or limited in physical realism. In contrast, adversarial 3D textures pose a more physically plausible and damaging threat, as they are naturally attached to manipulated objects and are easier to deploy in physical environments. Bringing adversarial 3D textures to VLA systems is nevertheless nontrivial. A central obstacle is that standard 3D simulators do not provide a differentiable optimization path from the VLA objective function back to object appearance, making it difficult to optimize textures in an end-to-end manner. To address this, we introduce Foreground-Background Decoupling (FBD), which enables differentiable texture optimization through dual-renderer alignment while preserving the original simulation environment. To further ensure that the attack remains effective across long horizons and diverse viewpoints in the physical world, we propose Trajectory-Aware Adversarial Optimization (TAAO), which prioritizes behaviorally critical frames and stabilizes optimization with a vertex-based parameterization. Built on these designs, we present Tex3D, the first framework for end-to-end optimization of 3D adversarial textures directly within the VLA simulation environment. Experiments in both simulation and real-robot settings show that Tex3D significantly degrades VLA performance across multiple manipulation tasks, achieving task failure rates of up to 96.7%. Our empirical results expose critical vulnerabilities of VLA systems to physically grounded 3D adversarial attacks and highlight the need for robustness-aware training.
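The abstract names two ingredients of TAAO: weighting behaviorally critical frames more heavily, and parameterizing the texture per vertex. The sketch below is only an illustration of how such a pairing could look, not the authors' implementation. The sigmoid mapping, the softmax weighting, the `temperature` parameter, and all function names are assumptions introduced here; the listing does not specify how the paper scores frame criticality or constrains vertex colors.

```python
import numpy as np

def vertex_colors(logits):
    """Map unconstrained per-vertex parameters to RGB values in (0, 1).

    Assumption: a sigmoid keeps colors valid without clipping.
    """
    return 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))

def frame_weights(criticality, temperature=1.0):
    """Turn hypothetical per-frame criticality scores into weights summing to 1."""
    z = np.asarray(criticality, dtype=float) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()

def trajectory_loss(per_frame_loss, criticality, temperature=1.0):
    """Weighted sum of per-frame attack losses over one trajectory."""
    w = frame_weights(criticality, temperature)
    return float(np.dot(w, np.asarray(per_frame_loss, dtype=float)))

# Toy usage: 4 vertices with RGB logits, 3 frames with made-up scores.
colors = vertex_colors(np.zeros((4, 3)))  # every channel equals 0.5
loss = trajectory_loss([0.2, 0.9, 0.4], criticality=[0.0, 2.0, 0.5])
```

With equal criticality scores the weighted loss reduces to the plain mean over frames, so the weighting only changes the objective when some frames are scored as more critical than others.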
Problem

Research questions and friction points this paper is trying to address.

adversarial 3D textures
vision-language-action models
physical adversarial attacks
robotic manipulation
robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

adversarial 3D textures
vision-language-action models
differentiable rendering
trajectory-aware optimization
physical adversarial attacks
Jiawei Chen
East China Normal University
Simin Huang
East China Normal University
Jiawei Du
National Taiwan University; ex-Intern @ Samsung Research
Speech processing, Neural coding, Generative AI, AI security
Shuaihang Chen
Zhongguancun Academy
Yu Tian
Tsinghua University
Mingjie Wei
Xidian University
3D Human, Motion generation, 3D human pose estimation
Chao Yu
Tsinghua University
Zhaoxia Yin
East China Normal University