RedVLA: Physical Red Teaming for Vision-Language-Action Models

📅 2026-04-24

🤖 AI Summary
This work addresses physical safety in real-world deployments of vision-language-action (VLA) models, which face unpredictable and irreversible physical risks yet lack effective pre-deployment detection mechanisms. To bridge this gap, we propose RedVLA, the first red-teaming framework designed to expose physical safety vulnerabilities in VLA systems. RedVLA uses a two-stage pipeline, risk scenario synthesis followed by risk amplification, to systematically induce and stably reproduce unsafe behaviors. The approach combines identification of critical interaction regions, placement of a risk factor within those regions, and gradient-free optimization guided by trajectory features. We also release the generated safety-critical data and a lightweight defense module, SimpleVLA-Guard. Experiments across six VLA models show an attack success rate of up to 95.5% within ten optimization iterations, and the mitigation strategy is reported to improve model robustness.

📝 Abstract
The real-world deployment of Vision-Language-Action (VLA) models remains limited by the risk of unpredictable and irreversible physical harm, yet effective mechanisms for proactively detecting these physical safety risks before deployment are still lacking. To address this gap, we propose RedVLA, the first red teaming framework for physical safety in VLA models. RedVLA systematically uncovers unsafe behaviors through a two-stage process. (I) Risk Scenario Synthesis constructs a valid and task-feasible initial risk scene: it identifies critical interaction regions from benign trajectories and positions the risk factor within these regions, aiming to entangle it with the VLA's execution flow and elicit a target unsafe behavior. (II) Risk Amplification ensures stable elicitation across heterogeneous models by iteratively refining the risk factor state through gradient-free optimization guided by trajectory features. Experiments on six representative VLA models show that RedVLA uncovers diverse unsafe behaviors and achieves an attack success rate (ASR) of up to 95.5% within 10 optimization iterations. To mitigate these risks, we further propose SimpleVLA-Guard, a lightweight safety guard built from RedVLA-generated data. Our data, assets, and code are available at https://redvla.github.io.
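
The two-stage idea can be illustrated with a small sketch. This is not the RedVLA implementation: the abstract does not specify the optimizer, the trajectory features, or the scene representation, so everything below is an assumption made for illustration. The "risk factor" is reduced to a 2D position, the "trajectory feature" is a hypothetical toy score (negative distance to the nearest waypoint of a benign trajectory), and the gradient-free optimizer is plain random local search with a budget of 10 iterations.

```python
import math
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def make_toy_risk(trajectory: List[Point]) -> Callable[[Point], float]:
    """Hypothetical stand-in for a policy rollout: the score rises as the
    risk factor moves closer to any waypoint of the benign trajectory."""
    def risk(p: Point) -> float:
        return -min(math.dist(p, w) for w in trajectory)
    return risk


def amplify_risk(
    risk_score: Callable[[Point], float],
    start: Point,
    iterations: int = 10,
    candidates: int = 8,
    step: float = 0.05,
    seed: int = 0,
) -> Tuple[Point, float]:
    """Gradient-free refinement by random local search (an assumed optimizer).

    Each iteration samples a few perturbed positions around the current best
    and keeps a candidate only if it scores strictly higher.
    """
    rng = random.Random(seed)
    best, best_score = start, risk_score(start)
    for _ in range(iterations):
        for _ in range(candidates):
            cand = (
                best[0] + rng.uniform(-step, step),
                best[1] + rng.uniform(-step, step),
            )
            score = risk_score(cand)
            if score > best_score:
                best, best_score = cand, score
    return best, best_score


# Stage I (toy): a benign trajectory and an initial risk-factor placement.
benign_trajectory = [(0.1 * i, 0.2) for i in range(6)]
risk = make_toy_risk(benign_trajectory)
start = (0.25, 0.6)

# Stage II (toy): refine the placement without using gradients.
best, best_score = amplify_risk(risk, start)
print(f"start score={risk(start):.3f}, refined score={best_score:.3f}")
```

In a real setting the score would come from executing the VLA policy in a simulator, which is why a gradient-free search is a natural fit: the rollout is treated as a black box and only its scalar outcome is used.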
Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action models
physical safety
red teaming
risk detection
unsafe behaviors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Red Teaming
Vision-Language-Action Models
Physical Safety
Risk Scenario Synthesis
Gradient-free Optimization
Authors
Yuhao Zhang
Institute for AI, Peking University; State Key Laboratory of General Artificial Intelligence, Peking University
Borong Zhang
University of Macau
Reinforcement learning, Robotics
Jiaming Fan
Institute for AI, Peking University; State Key Laboratory of General Artificial Intelligence, Peking University
Jiachen Shen
University of Science and Technology Beijing
Yishuai Cai
Institute for AI, Peking University; State Key Laboratory of General Artificial Intelligence, Peking University
Yaodong Yang
Boya (博雅) Assistant Professor at Peking University
Reinforcement Learning, AI Alignment, Embodied AI
Jiaming Ji
Institute for AI, Peking University; State Key Laboratory of General Artificial Intelligence, Peking University