SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models

πŸ“… 2026-03-25
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the vulnerability of vision-language-action (VLA) models to subtle textual perturbations, which can induce erratic robotic behaviors. To this end, the authors propose SABER, a framework that introduces agent-driven black-box attacks into VLA red-teaming for the first time. Leveraging a ReAct agent trained with GRPO reinforcement learning, SABER automatically generates stealthy adversarial instructions at the character, token, and prompt levels under strict editing budgets. Evaluated on the LIBERO benchmark across six prominent VLA models, SABER reduces task success rates by 20.6%, increases action-sequence length by 55%, and raises constraint-violation rates by 33%, while requiring 21.1% fewer tool invocations and 54.7% fewer character edits than GPT-based baselines. These results demonstrate SABER's effectiveness in delivering efficient, adaptive, and semantically coherent instruction-level attacks.

πŸ“ Abstract
Vision-language-action (VLA) models enable robots to follow natural-language instructions grounded in visual observations, but the instruction channel also introduces a critical vulnerability: small textual perturbations can alter downstream robot behavior. Systematic robustness evaluation therefore requires a black-box attacker that can generate minimal yet effective instruction edits across diverse VLA models. To this end, we present SABER, an agent-centric approach for automatically generating instruction-based adversarial attacks on VLA models under bounded edit budgets. SABER uses a GRPO-trained ReAct attacker that applies character-, token-, and prompt-level tools to craft small, plausible instruction edits, inducing targeted behavioral degradation: task failure, unnecessarily long execution, and increased constraint violations. On the LIBERO benchmark across six state-of-the-art VLA models, SABER reduces task success by 20.6%, increases action-sequence length by 55%, and raises constraint violations by 33%, while requiring 21.1% fewer tool calls and 54.7% fewer character edits than strong GPT-based baselines. These results show that small, plausible instruction edits are sufficient to substantially degrade robot execution, and that an agentic black-box pipeline offers a practical, scalable, and adaptive approach for red-teaming robotic foundation models.
Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action Models
Adversarial Attack
Black-Box Attack
Instruction Perturbation
Robotic Robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Black-Box Attack
Vision-Language-Action Models
Adversarial Instruction Editing
Agentic Red-Teaming
Bounded Edit Budget