SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning

📅 2025-03-05
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
To address environmental, hardware, and human safety risks arising from deploying vision-language-action (VLA) models in real-world settings, this paper introduces the first safety alignment paradigm tailored for VLA models. The method deeply integrates constrained reinforcement learning with generalizable safety boundary modeling, explicitly embedding multi-level physical safety constraints. It enables large-scale policy alignment via simulation and ensures safety generalization under out-of-distribution perturbations. Experiments demonstrate a 35× reduction in high-risk behavior incidence compared to the state-of-the-art method, an 83.58% improvement in simulated safety performance, and a 3.85% gain in task performance. Crucially, the learned safety constraints exhibit strong transferability across unseen tasks and perturbation scenarios. To foster reproducible research, the authors open-source their safety-focused dataset, trained models, and a novel safety evaluation benchmark.

📝 Abstract
Vision-language-action models (VLAs) have shown great potential as generalist robot policies. However, these models pose urgent safety challenges during deployment, including the risk of physical harm to the environment, the robot itself, and humans. How can safety be explicitly incorporated into VLAs? In this work, we propose SafeVLA, a novel algorithm designed to integrate safety into VLAs, ensuring the protection of the environment, robot hardware and humans in real-world settings. SafeVLA effectively balances safety and task performance by employing large-scale constrained learning within simulated environments. We demonstrate that SafeVLA outperforms the current state-of-the-art method in both safety and task performance, achieving average improvements of 83.58% and 3.85%, respectively, in simulation. By prioritizing safety, our approach eliminates high-risk behaviors and reduces the upper bound of unsafe behaviors to 1/35 of that in the current state-of-the-art, thereby significantly mitigating long-tail risks. Furthermore, the learned safety constraints generalize to diverse, unseen scenarios, including multiple out-of-distribution perturbations and tasks. Our data, models and newly proposed benchmark environment are available at https://sites.google.com/view/pku-safevla.
Problem

Research questions and friction points this paper is trying to address.

How can safety be explicitly incorporated into Vision-Language-Action models (VLAs)?
How can safety be balanced against task performance during training?
How can learned safety constraints generalize to diverse, unseen scenarios?
Innovation

Methods, ideas, or system contributions that make the work stand out.

SafeVLA, the first safety alignment paradigm tailored to VLA models
Large-scale constrained reinforcement learning within simulated environments
Learned safety constraints that transfer to out-of-distribution perturbations and unseen tasks
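The constrained-learning idea above is typically formalized as a constrained MDP solved with a Lagrangian relaxation: maximize expected reward minus λ times expected cost, while a dual update raises λ whenever the cost budget is exceeded. The sketch below illustrates that generic mechanism on a two-action toy problem; the rewards, costs, budget, and learning rates are all hypothetical, and this is not SafeVLA's actual training loop.

```python
import math
import random

# Toy constrained RL via a Lagrangian relaxation (illustrative only).
# Action 1 yields more reward but incurs a safety cost that exceeds
# the budget, so the constrained optimum prefers action 0.
REWARD = [1.0, 1.5]   # hypothetical per-action reward
COST = [0.0, 1.0]     # hypothetical per-action safety cost
COST_LIMIT = 0.1      # budget d: expected cost must stay below this


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def train(steps=5000, lr=0.05, lam_lr=0.05, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0]   # softmax policy parameters
    lam = 0.0             # Lagrange multiplier for the cost constraint
    for _ in range(steps):
        probs = softmax(logits)
        a = 0 if rng.random() < probs[0] else 1
        # Lagrangian objective: reward minus lam-weighted cost.
        adv = REWARD[a] - lam * COST[a]
        # REINFORCE-style gradient step on the sampled action.
        for i in range(2):
            grad = (1.0 - probs[i]) if i == a else -probs[i]
            logits[i] += lr * adv * grad
        # Dual ascent: grow lam while expected cost exceeds the budget.
        exp_cost = sum(p * c for p, c in zip(probs, COST))
        lam = max(0.0, lam + lam_lr * (exp_cost - COST_LIMIT))
    return softmax(logits), lam


probs, lam = train()
```

At equilibrium the multiplier settles near the reward gap between the two actions, and the policy keeps the unsafe action's probability roughly at the cost budget, which is the trade-off between safety and task performance the paper's constrained learning makes explicit.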
Borong Zhang
University of Macau
Reinforcement learning, Robotics

Yuhao Zhang
Institute for AI, Peking University

Jiaming Ji
Institute for AI, Peking University

Yingshan Lei
Institute for AI, Peking University

Josef Dai
Zhejiang University
Alignment

Yuanpei Chen
South China University of Technology
Robotics

Yaodong Yang
Institute for AI, Peking University