TwinRL-VLA: Digital Twin-Driven Reinforcement Learning for Real-World Robotic Manipulation

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes TwinRL, a framework addressing the performance bottlenecks of vision-language-action (VLA) models in real-world robotic manipulation: the high cost of expert demonstrations, inefficient exploration in online reinforcement learning, and a restricted exploration space. TwinRL integrates high-fidelity digital twins with real-world collaborative reinforcement learning, enabling, for the first time, bidirectional transfer between real and simulated environments based on smartphone-reconstructed scenes. The approach improves exploration efficiency and generalization through distribution-augmented supervised fine-tuning, sim-to-real guided exploration, failure-configuration-aware guidance, and human-in-the-loop interaction. Experiments show that TwinRL approaches 100% success on both in-distribution and out-of-distribution tasks, delivers at least a 30% speedup over prior real-world RL methods, and requires only about 20 minutes on average across four manipulation tasks.

📝 Abstract
Despite strong generalization capabilities, Vision-Language-Action (VLA) models remain constrained by the high cost of expert demonstrations and insufficient real-world interaction. While online reinforcement learning (RL) has shown promise in improving general foundation models, applying RL to VLA manipulation in real-world settings is still hindered by low exploration efficiency and a restricted exploration space. Through systematic real-world experiments, we observe that the effective exploration space of online RL is closely tied to the data distribution of supervised fine-tuning (SFT). Motivated by this observation, we propose TwinRL, a digital twin-real-world collaborative RL framework designed to scale and guide exploration for VLA models. First, a high-fidelity digital twin is efficiently reconstructed from smartphone-captured scenes, enabling realistic bidirectional transfer between real and simulated environments. During the SFT warm-up stage, we introduce an exploration space expansion strategy using digital twins to broaden the support of the data trajectory distribution. Building on this enhanced initialization, we propose a sim-to-real guided exploration strategy to further accelerate online RL. Specifically, TwinRL performs efficient and parallel online RL in the digital twin prior to deployment, effectively bridging the gap between offline and online training stages. Subsequently, we exploit efficient digital twin sampling to identify failure-prone yet informative configurations, which are used to guide targeted human-in-the-loop rollouts on the real robot. In our experiments, TwinRL approaches 100% success in both in-distribution regions covered by real-world demonstrations and out-of-distribution regions, delivering at least a 30% speedup over prior real-world RL methods and requiring only about 20 minutes on average across four tasks.
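The abstract describes a four-stage pipeline: twin-augmented SFT warm-up, parallel online RL inside the digital twin, twin sampling to mine failure-prone configurations, and targeted human-in-the-loop rollouts on the real robot. A toy sketch of that staging is below; the configuration grid, success-probability numbers, and all function names are illustrative assumptions, not the paper's implementation.

```python
import random

CONFIGS = list(range(10))  # toy object placements in the workspace


def sft_warmup(real_covered, twin_covered):
    """Stage 1 (assumed model): SFT yields moderate success on configurations
    seen in the data; twin augmentation broadens which ones are covered."""
    covered = set(real_covered) | set(twin_covered)
    return {c: (0.6 if c in covered else 0.1) for c in CONFIGS}


def twin_online_rl(policy, steps=3):
    """Stage 2: cheap, parallel RL in the digital twin nudges every
    configuration's success rate upward before real deployment."""
    return {c: min(1.0, p + 0.1 * steps) for c, p in policy.items()}


def find_failure_configs(policy, twin_samples=200, threshold=0.8, rng=None):
    """Stage 3: roll out cheaply in the twin and keep the configurations
    that still fail often -- these are the informative ones."""
    rng = rng or random.Random(0)
    fail_rate = {
        c: sum(rng.random() > p for _ in range(twin_samples)) / twin_samples
        for c, p in policy.items()
    }
    return [c for c, f in fail_rate.items() if f > 1 - threshold]


def targeted_real_rl(policy, failure_configs, human_in_loop=True):
    """Stage 4: spend scarce real-robot time only on the mined failure
    configurations, with larger gains assumed under human corrections."""
    boost = 0.25 if human_in_loop else 0.1
    return {c: min(1.0, p + boost) if c in failure_configs else p
            for c, p in policy.items()}


# Real demos cover a narrow band; twin augmentation widens the support.
policy = sft_warmup(real_covered=[3, 4, 5], twin_covered=[0, 1, 2, 6, 7])
policy = twin_online_rl(policy)
hard = find_failure_configs(policy)       # uncovered configs 8 and 9 remain weak
policy = targeted_real_rl(policy, hard)   # real rollouts target only those
```

The point of the sketch is the ordering: exploration support is widened before online RL (matching the paper's observation that the effective exploration space tracks the SFT data distribution), and expensive real-robot interaction is reserved for twin-identified failure configurations.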
Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action
real-world robotic manipulation
reinforcement learning
exploration efficiency
digital twin
Innovation

Methods, ideas, or system contributions that make the work stand out.

Digital Twin
Reinforcement Learning
Vision-Language-Action Models
Sim-to-Real Transfer
Exploration Efficiency
Qinwen Xu
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Jiaming Liu
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Rui Zhou
Swinburne University of Technology, Australia
Database, Data Mining, Algorithms
Shaojun Shi
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Nuowei Han
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Zhuoyang Liu
Peking University
Embodied AI, Computer Vision
Chenyang Gu
Undergraduate, Peking University
Embodied AI, Robotic Manipulation
Shuo Gu
Simplexity Robotics
Yang Yue
Tsinghua University
Gao Huang
Tsinghua University
Wenzhao Zheng
EECS, University of California, Berkeley
Large Models, Embodied Agents, Autonomous Driving
Sirui Han
The Hong Kong University of Science and Technology
Large Language Model, Interdisciplinary Artificial Intelligence
Peng Jia
Simplexity Robotics
Shanghang Zhang
Peking University
Embodied AI, Foundation Models