RLinf-Co: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models

📅 2026-02-13
📈 Citations: 0
Influential: 0

📝 Abstract
Simulation offers a scalable and low-cost way to enrich vision-language-action (VLA) training, reducing reliance on expensive real-robot demonstrations. However, most sim-real co-training methods rely on supervised fine-tuning (SFT), which treats simulation as a static source of demonstrations and does not exploit large-scale closed-loop interaction. Consequently, real-world gains and generalization are often limited. In this paper, we propose an RL-based sim-real co-training (RL-Co) framework that leverages interactive simulation while preserving real-world capabilities. Our method follows a generic two-stage design: we first warm-start the policy with SFT on a mixture of real and simulated demonstrations, then fine-tune it with reinforcement learning in simulation while adding an auxiliary supervised loss on real-world data to anchor the policy and mitigate catastrophic forgetting. We evaluate our framework on four real-world tabletop manipulation tasks using two representative VLA architectures, OpenVLA and $\pi_{0.5}$, and observe consistent improvements over real-only fine-tuning and SFT-based co-training, including +24% real-world success on OpenVLA and +20% on $\pi_{0.5}$. Beyond higher success rates, RL co-training yields stronger generalization to unseen task variations and substantially improved real-world data efficiency, providing a practical and scalable pathway for leveraging simulation to enhance real-robot deployment.
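The stage-two objective described in the abstract (an RL loss on simulated rollouts plus an auxiliary supervised loss on real demonstrations to anchor the policy) can be sketched as a single combined loss. The following is a minimal numpy illustration, not the paper's implementation: it assumes a diagonal-Gaussian action policy, a REINFORCE-style surrogate for the RL term, a behavior-cloning negative log-likelihood for the real-data anchor, and a hypothetical weight `lam`; the paper's actual RL algorithm and weighting are not specified here.

```python
import numpy as np

def gaussian_logpdf(a, mu, sigma):
    # Log-density of action a under a diagonal Gaussian policy N(mu, sigma^2).
    return -0.5 * np.sum(((a - mu) / sigma) ** 2 + np.log(2 * np.pi * sigma ** 2))

def co_training_loss(sim_batch, real_batch, policy, lam=0.5):
    """Stage-2 objective sketch: RL surrogate on simulated rollouts
    plus a weighted behavior-cloning anchor on real demonstrations."""
    # REINFORCE surrogate: -advantage * log pi(a|s), averaged over sim transitions.
    rl_loss = 0.0
    for obs, act, adv in sim_batch:
        mu, sigma = policy(obs)
        rl_loss -= adv * gaussian_logpdf(act, mu, sigma)
    rl_loss /= len(sim_batch)

    # Auxiliary supervised anchor: NLL of real demo actions, which keeps the
    # policy close to real-world behavior and mitigates catastrophic forgetting.
    bc_loss = 0.0
    for obs, act in real_batch:
        mu, sigma = policy(obs)
        bc_loss -= gaussian_logpdf(act, mu, sigma)
    bc_loss /= len(real_batch)

    return rl_loss + lam * bc_loss

# Toy linear policy standing in for the VLA: mu = W @ obs, fixed unit sigma.
W = np.eye(2)
policy = lambda obs: (W @ obs, np.ones(2))

sim_batch = [(np.array([0.1, 0.2]), np.array([0.1, 0.2]), 1.0)]   # (obs, action, advantage)
real_batch = [(np.array([0.3, -0.1]), np.array([0.3, -0.1]))]     # (obs, demo action)
loss = co_training_loss(sim_batch, real_batch, policy, lam=0.5)
```

In practice both terms would be minimized jointly by the same optimizer, so gradient steps that improve simulated task reward are regularized toward reproducing the real demonstrations.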
Problem

Research questions and friction points this paper is trying to address.

sim-real co-training
vision-language-action models
reinforcement learning
generalization
robotic manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning
Sim-Real Co-Training
Vision-Language-Action Models
Catastrophic Forgetting Mitigation
Data Efficiency
👥 Authors
Liangzhi Shi (Tsinghua University, Shanghai AI Laboratory)
Shuaihang Chen (Harbin Institute of Technology, Zhongguancun Academy)
Feng Gao (Tsinghua University)
Yinuo Chen (Tsinghua University)
Kang Chen (Peking University)
Tonghe Zhang (Carnegie Mellon University)
Hongzhi Zhang (Harbin Institute of Technology)
Weinan Zhang (Harbin Institute of Technology)
Chao Yu (Tsinghua University)
Yu Wang (Tsinghua University)