End-to-End Dexterous Arm-Hand VLA Policies via Shared Autonomy: VR Teleoperation Augmented by Autonomous Hand VLA Policy for Efficient Data Collection

📅 2025-10-31
🤖 AI Summary
Dexterous manipulation for general-purpose robots is held back by inefficient collection of high-quality data: manual teleoperation overloads human operators, and automated planning often produces unnatural motions. This paper proposes a shared autonomy framework that decouples macro-scale arm motion from micro-scale hand manipulation. VR teleoperation guides the arm pose, while an autonomous DexGrasp-VLA policy controls the hand in a real-time closed loop using tactile and visual feedback. The resulting coordinated arm-hand demonstrations are used to train an end-to-end VLA policy with an Arm-Hand Feature Enhancement module, which models both distinct and shared representations of arm and hand movements. A human-in-the-loop Corrective Teleoperation mechanism supports iterative policy refinement through failure recovery. The framework collects high-quality data with minimal manpower, and the end-to-end policy achieves a 90% success rate across diverse objects, including unseen instances.

📝 Abstract
Achieving human-like dexterous manipulation remains a major challenge for general-purpose robots. While Vision-Language-Action (VLA) models show potential in learning skills from demonstrations, their scalability is limited by scarce high-quality training data. Existing data collection methods face inherent constraints: manual teleoperation overloads human operators, while automated planning often produces unnatural motions. We propose a Shared Autonomy framework that divides control between macro and micro motions. A human operator guides the robot's arm pose through intuitive VR teleoperation, while an autonomous DexGrasp-VLA policy handles fine-grained hand control using real-time tactile and visual feedback. This division significantly reduces cognitive load and enables efficient collection of high-quality coordinated arm-hand demonstrations. Using this data, we train an end-to-end VLA policy enhanced with our novel Arm-Hand Feature Enhancement module, which captures both distinct and shared representations of macro and micro movements for more natural coordination. Our Corrective Teleoperation system enables continuous policy improvement through human-in-the-loop failure recovery. Experiments demonstrate that our framework generates high-quality data with minimal manpower and achieves a 90% success rate across diverse objects, including unseen instances. Comprehensive evaluations validate the system's effectiveness in developing dexterous manipulation capabilities.
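The macro/micro division of control described in the abstract can be illustrated with a minimal sketch. It assumes a 6-DoF arm pose from the VR operator and a five-finger hand; `DemoStep`, `toy_hand_policy`, `shared_autonomy_step`, and the `distance_m` observation key are hypothetical names introduced here, and the toy policy is only a stand-in for DexGrasp-VLA, whose interface this page does not describe.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class DemoStep:
    """One recorded timestep of a coordinated arm-hand demonstration."""
    arm_pose: Tuple[float, ...]     # macro motion, supplied by the human operator
    hand_joints: Tuple[float, ...]  # micro motion, supplied by the hand policy


def toy_hand_policy(observation: Dict[str, float]) -> Tuple[float, ...]:
    # Hypothetical stand-in for the autonomous hand policy: close all five
    # fingers progressively once the object is nearer than 10 cm.
    closure = 1.0 - observation["distance_m"] / 0.10
    closure = max(0.0, min(1.0, closure))  # clamp to [0, 1]
    return (closure,) * 5


def shared_autonomy_step(
    operator_arm_pose: Sequence[float],
    observation: Dict[str, float],
    hand_policy: Callable[[Dict[str, float]], Sequence[float]],
) -> DemoStep:
    # The operator never commands the fingers; the policy never moves the arm.
    hand_joints = tuple(hand_policy(observation))
    return DemoStep(arm_pose=tuple(operator_arm_pose), hand_joints=hand_joints)


if __name__ == "__main__":
    step = shared_autonomy_step(
        operator_arm_pose=[0.4, 0.0, 0.3, 0.0, 0.0, 0.0],
        observation={"distance_m": 0.05},
        hand_policy=toy_hand_policy,
    )
    print(step)
```

Each `DemoStep` pairs one operator arm command with one policy hand command, which is the kind of coordinated arm-hand record the end-to-end policy would then be trained on.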
Problem

Research questions and friction points this paper is trying to address.

Developing human-like dexterous manipulation for general-purpose robots
Overcoming limited training data scalability for Vision-Language-Action models
Addressing the limits of existing data collection methods: operator overload in manual teleoperation and unnatural motions from automated planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shared Autonomy framework divides macro and micro motion control
Autonomous DexGrasp-VLA policy handles fine-grained hand movements
Arm-Hand Feature Enhancement module captures distinct and shared representations
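This page does not describe the internals of the Arm-Hand Feature Enhancement module. As a rough illustration only, one generic way to obtain "distinct and shared" representations is to project the same input features through two branch-specific linear maps and one shared map, then give each branch its own features concatenated with the shared ones. Everything below (`ArmHandFeatureSketch`, the random weights, the dimensions) is an assumption for illustration, not the authors' architecture.

```python
import random
from typing import List, Tuple


def linear(weights: List[List[float]], x: List[float]) -> List[float]:
    # Plain matrix-vector product: one output value per weight row.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def make_weights(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    # Random weights stand in for learned parameters.
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class ArmHandFeatureSketch:
    """Hypothetical split into arm-specific, hand-specific, and shared features."""

    def __init__(self, in_dim: int, branch_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_arm = make_weights(branch_dim, in_dim, rng)
        self.w_hand = make_weights(branch_dim, in_dim, rng)
        self.w_shared = make_weights(branch_dim, in_dim, rng)

    def __call__(self, features: List[float]) -> Tuple[List[float], List[float]]:
        shared = linear(self.w_shared, features)
        # List concatenation: [branch-specific features | shared features].
        arm = linear(self.w_arm, features) + shared
        hand = linear(self.w_hand, features) + shared
        return arm, hand


if __name__ == "__main__":
    module = ArmHandFeatureSketch(in_dim=4, branch_dim=3)
    arm_feat, hand_feat = module([1.0, 0.5, -0.2, 0.0])
    print(len(arm_feat), len(hand_feat))
```

In this sketch the second half of each output vector is identical across the two branches, which is what lets arm and hand heads condition on common context while keeping their own specialized features.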
Yu Cui
ByteDance Seed
Yujian Zhang
ByteDance Seed
Lina Tao
ByteDance Seed
Yang Li
ByteDance Seed
Xinyu Yi
ByteDance Seed
Zhibin Li
Professor in School of Transportation, Southeast University
Intelligent Transportation System, Traffic Control, Traffic Safety, Traffic Flow, Data Mining