TwinRL-VLA: Digital Twin-Driven Reinforcement Learning for Real-World Robotic Manipulation

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes TwinRL, a framework addressing the performance bottlenecks of vision-language-action (VLA) models in real-world robotic manipulation: the high cost of expert demonstrations, inefficient exploration in online reinforcement learning, and a restricted exploration space. TwinRL integrates high-fidelity digital twins with real-world collaborative reinforcement learning, enabling, for the first time, bidirectional transfer between real and simulated environments based on smartphone-reconstructed scenes. The approach improves exploration efficiency and generalization through distribution-augmented supervised fine-tuning, sim-to-real guided exploration, failure-configuration-aware guidance, and human-in-the-loop interaction. Experiments show that TwinRL approaches 100% success on both in-distribution and out-of-distribution tasks, delivers at least a 30% speedup over prior real-world RL methods, and requires only about 20 minutes on average across four manipulation tasks.

📝 Abstract
Despite strong generalization capabilities, Vision-Language-Action (VLA) models remain constrained by the high cost of expert demonstrations and insufficient real-world interaction. While online reinforcement learning (RL) has shown promise in improving general foundation models, applying RL to VLA manipulation in real-world settings is still hindered by low exploration efficiency and a restricted exploration space. Through systematic real-world experiments, we observe that the effective exploration space of online RL is closely tied to the data distribution of supervised fine-tuning (SFT). Motivated by this observation, we propose TwinRL, a digital twin-real-world collaborative RL framework designed to scale and guide exploration for VLA models. First, a high-fidelity digital twin is efficiently reconstructed from smartphone-captured scenes, enabling realistic bidirectional transfer between real and simulated environments. During the SFT warm-up stage, we introduce an exploration space expansion strategy using digital twins to broaden the support of the data trajectory distribution. Building on this enhanced initialization, we propose a sim-to-real guided exploration strategy to further accelerate online RL. Specifically, TwinRL performs efficient and parallel online RL in the digital twin prior to deployment, effectively bridging the gap between offline and online training stages. Subsequently, we exploit efficient digital twin sampling to identify failure-prone yet informative configurations, which are used to guide targeted human-in-the-loop rollouts on the real robot. In our experiments, TwinRL approaches 100% success in both in-distribution regions covered by real-world demonstrations and out-of-distribution regions, delivering at least a 30% speedup over prior real-world RL methods and requiring only about 20 minutes on average across four tasks.
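The abstract describes a four-stage pipeline: twin-augmented SFT warm-up, parallel online RL inside the digital twin, twin sampling to mine failure-prone configurations, and targeted human-in-the-loop rollouts on the real robot. A toy sketch of that staging is below; the configuration grid, success-probability numbers, and all function names are illustrative assumptions, not the paper's implementation.

```python
import random

CONFIGS = list(range(10))  # toy object placements in the workspace


def sft_warmup(real_covered, twin_covered):
    """Stage 1 (assumed model): SFT yields moderate success on configurations
    seen in the data; twin augmentation broadens which ones are covered."""
    covered = set(real_covered) | set(twin_covered)
    return {c: (0.6 if c in covered else 0.1) for c in CONFIGS}


def twin_online_rl(policy, steps=3):
    """Stage 2: cheap, parallel RL in the digital twin nudges every
    configuration's success rate upward before real deployment."""
    return {c: min(1.0, p + 0.1 * steps) for c, p in policy.items()}


def find_failure_configs(policy, twin_samples=200, threshold=0.8, rng=None):
    """Stage 3: roll out cheaply in the twin and keep the configurations
    that still fail often -- these are the informative ones."""
    rng = rng or random.Random(0)
    fail_rate = {
        c: sum(rng.random() > p for _ in range(twin_samples)) / twin_samples
        for c, p in policy.items()
    }
    return [c for c, f in fail_rate.items() if f > 1 - threshold]


def targeted_real_rl(policy, failure_configs, human_in_loop=True):
    """Stage 4: spend scarce real-robot time only on the mined failure
    configurations, with larger gains assumed under human corrections."""
    boost = 0.25 if human_in_loop else 0.1
    return {c: min(1.0, p + boost) if c in failure_configs else p
            for c, p in policy.items()}


# Real demos cover a narrow band; twin augmentation widens the support.
policy = sft_warmup(real_covered=[3, 4, 5], twin_covered=[0, 1, 2, 6, 7])
policy = twin_online_rl(policy)
hard = find_failure_configs(policy)       # uncovered configs 8 and 9 remain weak
policy = targeted_real_rl(policy, hard)   # real rollouts target only those
```

The point of the sketch is the ordering: exploration support is widened before online RL (matching the paper's observation that the effective exploration space tracks the SFT data distribution), and expensive real-robot interaction is reserved for twin-identified failure configurations.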
Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action
real-world robotic manipulation
reinforcement learning
exploration efficiency
digital twin
Innovation

Methods, ideas, or system contributions that make the work stand out.

Digital Twin
Reinforcement Learning
Vision-Language-Action Models
Sim-to-Real Transfer
Exploration Efficiency
Qinwen Xu
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Jiaming Liu
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Rui Zhou
Swinburne University of Technology, Australia
Database, Data Mining, Algorithms
Shaojun Shi
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Nuowei Han
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Zhuoyang Liu
Peking University
Embodied AI, Computer Vision
Chenyang Gu
Undergraduate, Peking University
Embodied AI, Robotic Manipulation
Shuo Gu
Simplexity Robotics
Yang Yue
Tsinghua University
Gao Huang
Tsinghua University
Wenzhao Zheng
EECS, University of California, Berkeley
Large Models, Embodied Agents, Autonomous Driving
Sirui Han
The Hong Kong University of Science and Technology
Large Language Model, Interdisciplinary Artificial Intelligence
Peng Jia
Simplexity Robotics
Shanghang Zhang
Peking University
Embodied AI, Foundation Models