🤖 AI Summary
This study systematically investigates whether the intermediate textual plans generated by chain-of-thought (CoT) reasoning in vision–language–action (VLA) models constitute a security vulnerability. By designing seven text manipulation methods—spanning three attacker tiers: blind noise, mechanical–semantic, and LLM-adaptive—the authors evaluate the impact of CoT corruption on task performance across 40 LIBERO tabletop tasks. The work reveals for the first time that substituting object names in the reasoning trace reduces overall success rates by 8.3 percentage points, with goal-conditioned tasks suffering declines of up to 19.3 points (and as high as 45 points on individual tasks), whereas perturbations to word order or directional cues have negligible effects. This vulnerability is unique to reasoning-augmented VLAs, bypasses input-validation defenses, and underscores the action decoder's strong reliance on the integrity of entity references.
📝 Abstract
Recent Vision-Language-Action (VLA) models increasingly adopt chain-of-thought (CoT) reasoning, generating a natural-language plan before decoding motor commands. This internal text channel between the reasoning module and the action decoder has received no adversarial scrutiny. We ask: which properties of this intermediate plan does the action decoder actually rely on, and can targeted corruption of the reasoning trace alone -- with all inputs left intact -- degrade a robot's physical task performance? We design a taxonomy of seven text corruptions organized into three attacker tiers (blind noise, mechanical-semantic, and LLM-adaptive) and apply them to a state-of-the-art reasoning VLA across 40 LIBERO tabletop manipulation tasks. Our results reveal a striking asymmetry: substituting object names in the reasoning trace reduces overall success rate by 8.3 percentage points (pp) -- reaching −19.3 pp on goal-conditioned tasks and −45 pp on individual tasks -- whereas sentence reordering, spatial-direction reversal, token noise, and even a 70B-parameter LLM crafting plausible-but-wrong plans all have negligible impact (within ±4 pp). This asymmetry indicates that the action decoder depends on entity-reference integrity rather than reasoning quality or sequential structure. Notably, a sophisticated LLM-based attacker underperforms simple mechanical object-name substitution, because preserving plausibility inadvertently retains the entity-grounding structure the decoder needs. A cross-architecture control using a non-reasoning VLA confirms the vulnerability is exclusive to reasoning-augmented models, while instruction-level attacks degrade both architectures -- establishing that the internal reasoning trace is a distinct and stealthy threat vector invisible to input-validation defenses.
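The most damaging corruption reported above, mechanical object-name substitution, can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the distractor mapping and the example plan are hypothetical, and the real attack operates on the VLA's generated reasoning trace between the planner and the action decoder.

```python
import re

# Hypothetical distractor mapping: each object name in the CoT trace is
# swapped for a different object, leaving all other text untouched.
DISTRACTORS = {"bowl": "plate", "plate": "mug", "mug": "bowl"}

def substitute_entities(plan: str) -> str:
    """Corrupt a reasoning trace by replacing every known object name.

    A single regex pass avoids cascading substitutions (bowl -> plate
    must not then become plate -> mug within the same call).
    """
    pattern = re.compile(r"\b(" + "|".join(DISTRACTORS) + r")\b")
    return pattern.sub(lambda m: DISTRACTORS[m.group(1)], plan)

plan = "Pick up the bowl, then place the bowl on the plate."
print(substitute_entities(plan))
# -> Pick up the plate, then place the plate on the mug.
```

Because the surrounding syntax and step ordering are preserved, the corrupted plan still looks well-formed; only the entity references the decoder grounds on are broken, which is exactly the property the asymmetry result isolates.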