🤖 AI Summary
This work identifies and formalizes a novel adversarial vulnerability, termed "action freezing," in which adversarial images cause Vision-Language-Action (VLA) models to enter persistent response stagnation: the model ignores subsequent language instructions, leaving the robot inactive precisely when intervention is needed. To study this threat, we propose FreezeVLA, the first systematic adversarial attack framework tailored to VLA models, which uses min-max bi-level optimization to generate adversarial images with high attack success rates and strong cross-instruction transferability. Extensive experiments across three state-of-the-art VLA models and four robotic benchmarks demonstrate an average attack success rate of 76.2%, confirming a substantial risk of operational paralysis in real-world deployments of multimodal embodied AI systems. This work establishes both theoretical foundations and empirical evidence for the security evaluation and robustness enhancement of VLA models.
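One plausible reading of the min-max bi-level objective (our sketch from the summary above; the paper's exact loss and constraint set may differ): let $x$ be the clean image, $\delta$ an $\ell_\infty$-bounded perturbation with budget $\epsilon$, $\mathcal{L}$ a set of language prompts, $\pi_\theta$ the VLA policy, and $J_{\text{freeze}}$ a loss that is low when the predicted actions stagnate (e.g., cross-entropy against a no-op target). The attacker then solves

$$\delta^\star \;=\; \arg\min_{\|\delta\|_\infty \le \epsilon}\; \max_{\ell \in \mathcal{L}}\; J_{\text{freeze}}\big(\pi_\theta(x+\delta,\,\ell)\big),$$

so that the single image $x+\delta^\star$ induces freezing even under the worst-case prompt, which is what yields the attack's cross-instruction transferability.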
📝 Abstract
Vision-Language-Action (VLA) models are driving rapid progress in robotics by enabling agents to interpret multimodal inputs and execute complex, long-horizon tasks. However, their safety and robustness against adversarial attacks remain largely underexplored. In this work, we identify and formalize a critical adversarial vulnerability in which adversarial images can "freeze" VLA models and cause them to ignore subsequent instructions. This threat effectively disconnects the robot's digital mind from its physical actions, potentially inducing inaction during critical interventions. To systematically study this vulnerability, we propose FreezeVLA, a novel attack framework that generates and evaluates action-freezing attacks via min-max bi-level optimization. Experiments on three state-of-the-art VLA models and four robotic benchmarks show that FreezeVLA attains an average attack success rate of 76.2%, significantly outperforming existing methods. Moreover, adversarial images generated by FreezeVLA exhibit strong transferability, with a single image reliably inducing paralysis across diverse language prompts. Our findings expose a critical safety risk in VLA models and highlight the urgent need for robust defense mechanisms.
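To make the optimization concrete, below is a minimal PGD-style sketch of such a min-max loop in PyTorch. It is illustrative only, not the authors' released code: `vla` (a differentiable module mapping an image batch and a prompt string to action logits) and `noop_action` (the index of a "do nothing" action used as the freeze target) are hypothetical stand-ins, and the step size, budget, and iteration count are arbitrary.

```python
# Illustrative min-max bi-level attack sketch; NOT the FreezeVLA implementation.
# Assumptions (hypothetical): `vla(image, prompt)` returns action logits, and
# `noop_action` indexes a "do nothing" action whose selection models freezing.
import torch
import torch.nn.functional as F

def freeze_attack(vla, image, prompts, noop_action, eps=8/255, alpha=1/255, steps=200):
    """PGD over the image; the inner max picks the currently hardest prompt."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        # Inner maximization: evaluate the freeze loss under every prompt and
        # keep the worst case, so the outer update improves cross-prompt transfer.
        losses = []
        for prompt in prompts:
            logits = vla(image + delta, prompt)  # action logits for this prompt
            target = torch.full(logits.shape[:1], noop_action,
                                dtype=torch.long, device=logits.device)
            losses.append(F.cross_entropy(logits, target))
        worst = torch.stack(losses).max()
        # Outer minimization: push the image toward the no-op action under the
        # hardest prompt, projected onto an L-inf ball and the valid pixel range.
        grad, = torch.autograd.grad(worst, delta)
        with torch.no_grad():
            delta -= alpha * grad.sign()
            delta.clamp_(-eps, eps)
            delta.copy_((image + delta).clamp(0, 1) - image)
    return (image + delta).detach()
```

Taking only the single worst prompt per step is the simplest instantiation of the inner maximization; averaging over the top-k hardest prompts is a common smoother alternative with the same worst-case intent.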