RoboPaint: From Human Demonstration to Any Robot and Any View

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the scarcity of large-scale, high-fidelity robotic demonstration data that limits the scalability of vision–language–action (VLA) models for dexterous manipulation. To overcome this challenge, the authors propose a Real-Sim-Real pipeline that leverages multimodal human demonstrations—comprising RGB-D video, glove-based joint angles, and tactile signals—and introduces a tactile-aware retargeting method. This method combines geometric and force-guided optimization to efficiently map human hand motions onto arbitrary dexterous hands. The resulting trajectories are used to generate cross-robot, multi-view, high-fidelity simulation data in Isaac Sim, eliminating the need for real-world teleoperation. A Pi0.5 VLA policy trained on this synthetic data achieves an average success rate of 80% across three representative tasks, while the retargeted trajectories attain an 84% success rate across ten dexterous manipulation tasks, demonstrating the approach’s effectiveness and strong generalization capability.

📝 Abstract
Acquiring large-scale, high-fidelity robot demonstration data remains a critical bottleneck for scaling Vision-Language-Action (VLA) models in dexterous manipulation. We propose a Real-Sim-Real data collection and data editing pipeline that transforms human demonstrations into robot-executable, environment-specific training data without direct robot teleoperation. Standardized data collection rooms are built to capture multimodal human demonstrations (three synchronized RGB-D videos, eleven RGB videos, 29-DoF glove joint angles, and 14-channel tactile signals). Based on these human demonstrations, we introduce a tactile-aware retargeting method that maps human hand states to robot dex-hand states via geometry- and force-guided optimization. The retargeted robot trajectories are then rendered in a photorealistic Isaac Sim environment to build robot training data. Real-world experiments demonstrate that: (1) the retargeted dex-hand trajectories achieve an 84% success rate across 10 diverse object manipulation tasks; (2) VLA policies (Pi0.5) trained exclusively on our generated data achieve an 80% average success rate on three representative tasks, i.e., pick-and-place, pushing, and pouring. In conclusion, robot training data can be efficiently "painted" from human demonstrations using our Real-Sim-Real data pipeline. We offer a scalable, cost-effective alternative to teleoperation with minimal performance loss for complex dexterous manipulation.
Problem

Research questions and friction points this paper is trying to address.

robot demonstration data
Vision-Language-Action models
dexterous manipulation
data bottleneck
teleoperation
Innovation

Methods, ideas, or system contributions that make the work stand out.

tactile-aware retargeting
Real-Sim-Real pipeline
Vision-Language-Action models
dexterous manipulation
human-to-robot demonstration transfer
Jiacheng Fan
Paxini Tech.
Zhiyue Zhao
Zhejiang University
Yiqian Zhang
Paxini Tech.
Chao Chen
Paxini Tech.
Peide Wang
Paxini Tech.
Hengdi Zhang
Paxini Tech.
Zhengxue Cheng
Assistant Researcher, Shanghai Jiao Tong University
Video and Image Coding · Computer Vision · Image Quality Assessment