Model-agnostic Adversarial Attack and Defense for Vision-Language-Action Models

📅 2025-10-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the insufficient adversarial robustness of Vision-Language-Action (VLA) models in robotic tasks by proposing a model-agnostic adversarial patch attack together with a corresponding defense. Methodologically, the authors introduce the Embedding Disruption Patch Attack (EDPA), which disrupts vision–language embedding alignment in latent space and maximizes the discrepancy between adversarial and clean visual latents, without requiring knowledge of the model architecture or of the controlled robotic manipulator. Concurrently, they design a general defense mechanism based on adversarial fine-tuning that enhances the visual encoder's robustness against such perturbations. Evaluated on the LIBERO simulation benchmark, EDPA substantially increases task failure rates across mainstream VLA models, while the proposed defense effectively mitigates the resulting degradation. The authors position this as the first model-agnostic adversarial attack and defense framework for VLA models, offering both an analysis of the vulnerability and a practical mitigation for the secure deployment of embodied intelligence.

📝 Abstract
Vision-Language-Action (VLA) models have achieved revolutionary progress in robot learning, enabling robots to execute complex physical tasks from natural language instructions. Despite this progress, their adversarial robustness remains underexplored. In this work, we propose both an adversarial patch attack and corresponding defense strategies for VLA models. We first introduce the Embedding Disruption Patch Attack (EDPA), a model-agnostic adversarial attack that generates patches directly placeable within the camera's view. In comparison to prior methods, EDPA can be readily applied to different VLA models without requiring prior knowledge of the model architecture or the controlled robotic manipulator. EDPA constructs these patches by (i) disrupting the semantic alignment between visual and textual latent representations, and (ii) maximizing the discrepancy of latent representations between adversarial and corresponding clean visual inputs. Through the optimization of these objectives, EDPA distorts the VLA's interpretation of visual information, causing the model to repeatedly generate incorrect actions and ultimately fail to complete the given robotic task. To counter this, we propose an adversarial fine-tuning scheme for the visual encoder, in which the encoder is optimized to produce similar latent representations for both clean and adversarially perturbed visual inputs. Extensive evaluations on the widely recognized LIBERO robotic simulation benchmark demonstrate that EDPA substantially increases the task failure rate of cutting-edge VLA models, while our proposed defense effectively mitigates this degradation. The codebase is accessible via the homepage at https://edpa-attack.github.io/.
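The abstract's two EDPA objectives can be sketched as a joint loss over a printable patch: minimize the adversarial image's cosine alignment with the instruction embedding (objective i) while also minimizing its similarity to the clean image's latent (objective ii). This is a minimal illustrative sketch under assumed simplifications, not the paper's implementation: `TinyVisionEncoder`, the patch size, and the fixed `text_emb` are all stand-ins for whatever encoder the attack queries.

```python
# Hedged sketch of the EDPA patch objectives (illustrative, not the paper's code).
import torch
import torch.nn.functional as F

torch.manual_seed(0)

class TinyVisionEncoder(torch.nn.Module):
    """Stand-in for a VLA visual encoder producing normalized latents."""
    def __init__(self, dim=16):
        super().__init__()
        self.proj = torch.nn.Linear(3 * 32 * 32, dim)
    def forward(self, x):  # x: (B, 3, 32, 32)
        return F.normalize(self.proj(x.flatten(1)), dim=-1)

def apply_patch(image, patch, top=0, left=0):
    """Differentiably paste the patch into the camera view."""
    out = image.clone()
    ph, pw = patch.shape[-2:]
    out[..., top:top + ph, left:left + pw] = patch
    return out

def edpa_loss(vis_enc, clean_img, adv_img, text_emb):
    z_clean = vis_enc(clean_img)
    z_adv = vis_enc(adv_img)
    # (i) disrupt vision-language alignment: push adv latent away from the text
    align = F.cosine_similarity(z_adv, text_emb, dim=-1).mean()
    # (ii) maximize discrepancy between adversarial and clean visual latents
    disc = F.cosine_similarity(z_adv, z_clean, dim=-1).mean()
    return align + disc  # minimizing drives both similarities down

vis_enc = TinyVisionEncoder()
for p in vis_enc.parameters():
    p.requires_grad_(False)  # encoder is frozen; only the patch is optimized

clean = torch.rand(1, 3, 32, 32)
text_emb = F.normalize(torch.randn(1, 16), dim=-1)  # fixed instruction embedding
patch = torch.rand(3, 8, 8, requires_grad=True)
opt = torch.optim.Adam([patch], lr=0.05)

losses = []
for _ in range(50):
    adv = apply_patch(clean, patch.clamp(0, 1))  # keep patch printable in [0, 1]
    loss = edpa_loss(vis_enc, clean, adv, text_emb)
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

Because the encoder is only queried (its weights stay frozen), the same loop can in principle be pointed at any VLA's visual encoder, which is the sense in which the attack is model-agnostic.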
Problem

Research questions and friction points this paper is trying to address.

Addressing adversarial robustness in Vision-Language-Action models for robotics
Developing a model-agnostic attack to disrupt semantic alignment in VLAs
Proposing defense strategies to mitigate adversarial degradation in VLAs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model-agnostic adversarial patch attack for VLA models
Disrupts semantic alignment between vision and language
Adversarial fine-tuning defense for visual encoder robustness
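The defense bullet above can likewise be sketched as adversarial fine-tuning of the visual encoder: train it so that clean and patch-perturbed views map to similar latents. Again this is a hedged toy sketch; `TinyVisionEncoder`, the fixed patch, and the random images are illustrative assumptions, not the paper's training setup.

```python
# Hedged sketch of the adversarial fine-tuning defense (illustrative only).
import torch
import torch.nn.functional as F

torch.manual_seed(0)

class TinyVisionEncoder(torch.nn.Module):
    """Stand-in visual encoder producing normalized latents."""
    def __init__(self, dim=16):
        super().__init__()
        self.proj = torch.nn.Linear(3 * 32 * 32, dim)
    def forward(self, x):
        return F.normalize(self.proj(x.flatten(1)), dim=-1)

def paste_patch(img, patch, top=0, left=0):
    out = img.clone()
    out[..., top:top + patch.shape[-2], left:left + patch.shape[-1]] = patch
    return out

enc = TinyVisionEncoder()
opt = torch.optim.Adam(enc.parameters(), lr=1e-2)
fixed_patch = torch.rand(3, 8, 8)  # stand-in for a precomputed adversarial patch

sims = []
for _ in range(100):
    clean = torch.rand(4, 3, 32, 32)
    adv = paste_patch(clean, fixed_patch)
    z_clean, z_adv = enc(clean), enc(adv)
    # pull the perturbed latent toward the (detached) clean latent
    loss = (1 - F.cosine_similarity(z_adv, z_clean.detach(), dim=-1)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    sims.append(F.cosine_similarity(z_adv, z_clean, dim=-1).mean().item())
```

Detaching the clean latent anchors the encoder to its clean behavior while it learns to ignore the patched region; in practice one would alternate over freshly generated patches rather than a single fixed one.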
Haochuan Xu
The University of Auckland
Yun Sing Koh
The University of Auckland
Data Mining and Machine Learning · Data Stream Mining · Continual Learning
Shuhuai Huang
The University of Auckland
Zirun Zhou
The University of Auckland
Di Wang
King Abdullah University of Science and Technology
Jun Sakuma
Institute of Science Tokyo (Tokyo Institute of Technology), School of Computing
Machine Learning · AI Security · Data Privacy
Jingfeng Zhang
The University of Auckland, King Abdullah University of Science and Technology, RIKEN Center for Advanced Intelligence Project