RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training

📅 2025-10-08
🤖 AI Summary
Current vision-language-action (VLA) models rely heavily on supervised fine-tuning (SFT), which yields poor generalization and sensitivity to distribution shifts; reinforcement learning (RL) holds promise for VLA but lacks a standardized training and evaluation framework. This paper introduces the first unified, RL-oriented research framework for VLA. We design a fine-grained, hybrid pipeline resource scheduler that, for the first time, efficiently co-schedules rendering, inference, and training. The framework integrates seamlessly across diverse VLA architectures (e.g., OpenVLA), RL algorithms (e.g., PPO, GRPO), and benchmark environments (e.g., ManiSkill, LIBERO). Evaluated on 130 LIBERO tasks, our approach achieves a mean success rate of 98.11%; on 25 ManiSkill tasks, it attains 97.66%. Crucially, real-world deployment on a Franka robot demonstrates substantially better cross-scenario generalization than SFT-based baselines.

📝 Abstract
Recent progress in vision and language foundation models has significantly advanced multimodal understanding, reasoning, and generation, inspiring a surge of interest in extending such capabilities to embodied settings through vision-language-action (VLA) models. Yet, most VLA models are still trained with supervised fine-tuning (SFT), which struggles to generalize under distribution shifts due to error accumulation. Reinforcement learning (RL) offers a promising alternative by directly optimizing task performance through interaction, but existing attempts remain fragmented and lack a unified platform for fair and systematic comparison across model architectures and algorithmic designs. To address this gap, we introduce RLinf-VLA, a unified and efficient framework for scalable RL training of VLA models. The system adopts a highly flexible resource allocation design that addresses the challenge of integrating rendering, training, and inference in RL+VLA training. In particular, for GPU-parallelized simulators, RLinf-VLA implements a novel hybrid fine-grained pipeline allocation mode, achieving a 1.61x-1.88x speedup in training. Through a unified interface, RLinf-VLA seamlessly supports diverse VLA architectures (e.g., OpenVLA, OpenVLA-OFT), multiple RL algorithms (e.g., PPO, GRPO), and various simulators (e.g., ManiSkill, LIBERO). In simulation, a unified model achieves 98.11% across 130 LIBERO tasks and 97.66% across 25 ManiSkill tasks. Beyond empirical performance, our study distills a set of best practices for applying RL to VLA training and sheds light on emerging patterns in this integration. Furthermore, we present preliminary deployment on a real-world Franka robot, where RL-trained policies exhibit stronger generalization than those trained with SFT. We envision RLinf-VLA as a foundation to accelerate and standardize research on embodied intelligence.
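The hybrid fine-grained pipeline allocation described in the abstract can be illustrated with a minimal double-buffering sketch (a toy illustration in plain Python threads, not the RLinf-VLA API): the learner consumes batch k while the rollout worker produces batch k+1, so rendering/inference and training overlap instead of strictly alternating.

```python
import queue
import threading
import time

def rollout_worker(out_q, n_batches):
    # Stand-in for GPU-parallel simulation + policy inference.
    for k in range(n_batches):
        time.sleep(0.01)          # simulate rendering/inference cost
        out_q.put(f"batch-{k}")
    out_q.put(None)               # sentinel: no more batches

def train_loop(in_q, results):
    # Learner consumes batch k while batch k+1 is being produced.
    while (batch := in_q.get()) is not None:
        time.sleep(0.01)          # simulate a gradient step
        results.append(batch)

buf = queue.Queue(maxsize=2)      # small buffer = fine-grained pipelining
done = []
t = threading.Thread(target=rollout_worker, args=(buf, 4))
t.start()
train_loop(buf, done)
t.join()
print(done)  # ['batch-0', 'batch-1', 'batch-2', 'batch-3']
```

The bounded queue is the key design choice: it keeps rollout workers only slightly ahead of the learner, which bounds policy staleness while still hiding simulator latency behind training compute.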
Problem

Research questions and friction points this paper is trying to address.

Addressing generalization issues in vision-language-action models under distribution shifts
Providing a unified platform for fair comparison of RL algorithms and VLA architectures
Overcoming fragmented approaches to reinforcement learning for VLA training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework for scalable VLA reinforcement learning training
Hybrid fine-grained pipeline allocation for GPU-parallelized simulators
Seamless support for diverse architectures, algorithms, and simulators
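Among the supported algorithms, GRPO's core idea is simple enough to sketch: advantages are computed relative to a group of rollouts of the same task, normalizing each reward by the group's mean and standard deviation, with no learned value function. A minimal illustration (hypothetical helper, not RLinf-VLA code):

```python
import statistics

def grpo_advantages(rewards, eps=1e-8):
    # Group-relative advantages (GRPO-style): each rollout's reward is
    # normalized by the mean and std of its own group; no critic needed.
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One group of 4 rollouts of the same manipulation task:
# two successes (reward 1) and two failures (reward 0).
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
print(adv)  # successes get positive advantage, failures negative
```

Because the baseline is the group mean rather than a critic's estimate, this pairs naturally with sparse success/failure rewards common in manipulation benchmarks like LIBERO and ManiSkill.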
Authors

Hongzhi Zang (Tsinghua University)
Mingjie Wei (Xidian University)
Si Xu (Infinigence AI)
Yongji Wu (UC Berkeley)
Zhen Guo (Infinigence AI)
Yuanqing Wang (Materials Genome Institute, Shanghai University)
Hao Lin (Infinigence AI)
Liangzhi Shi (Tsinghua University)
Yuqing Xie (Tsinghua University)
Zhexuan Xu (Tsinghua University)
Zhihao Liu (Institute of Automation, Chinese Academy of Sciences)
Kang Chen (Peking University)
Wenhao Tang (Tsinghua University)
Quanlu Zhang (Infinigence AI)
Weinan Zhang (Harbin Institute of Technology)
Chao Yu (Tsinghua University)
Yu Wang (Tsinghua University)