HERMES: Human-to-Robot Embodied Learning from Multi-Source Motion Data for Mobile Dexterous Manipulation

📅 2025-08-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of transferring multi-source human hand-motion data into dexterous manipulation policies for mobile dual-arm robots, together with the limited environmental adaptability of existing approaches. It proposes a unified reinforcement learning framework that integrates multi-source motion mapping, end-to-end depth image-based sim-to-real transfer, closed-loop Perspective-n-Point (PnP) pose estimation for visual-goal alignment, and a navigation foundation model. Crucially, the method bridges autonomous navigation and dexterous manipulation rather than treating them as disjoint sequential stages. Evaluated in complex real-world environments, it successfully executes diverse tasks including grasping, assembly, and dynamic obstacle avoidance, demonstrating strong generalization and cross-environment robustness. The approach improves both the efficiency and precision of policy transfer from simulation to physical deployment. By unifying perception, planning, and control within a scalable architecture, it establishes a practical pathway toward autonomous, fine-grained manipulation for embodied agents.
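The unified reinforcement learning formulation turns retargeted human hand trajectories into physically plausible robot behaviors. A minimal sketch of one plausible ingredient, a fingertip-tracking reward that scores how closely the robot hand follows the retargeted human motion, is shown below; the function names, the exponential kernel, and the 5 cm scale are illustrative assumptions, not the paper's actual reward.

```python
import numpy as np

def tracking_reward(robot_fingertips: np.ndarray,
                    human_fingertips: np.ndarray,
                    sigma: float = 0.05) -> float:
    """Reward the policy for matching retargeted human fingertip positions.

    robot_fingertips, human_fingertips: (N, 3) arrays of fingertip
    positions expressed in a shared frame; sigma sets how quickly the
    reward decays with tracking error. The exponential-kernel shaping
    is an illustrative assumption, not HERMES's actual reward.
    """
    # Mean Euclidean distance between corresponding fingertips.
    dist = np.linalg.norm(robot_fingertips - human_fingertips, axis=-1)
    # Map distance to (0, 1]: 1 when perfectly aligned, decaying with error.
    return float(np.exp(-dist.mean() / sigma))
```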

📝 Abstract
Leveraging human motion data to impart robots with versatile manipulation skills has emerged as a promising paradigm in robotic manipulation. Nevertheless, translating multi-source human hand motions into feasible robot behaviors remains challenging, particularly for robots equipped with multi-fingered dexterous hands characterized by complex, high-dimensional action spaces. Moreover, existing approaches often struggle to produce policies capable of adapting to diverse environmental conditions. In this paper, we introduce HERMES, a human-to-robot learning framework for mobile bimanual dexterous manipulation. First, HERMES formulates a unified reinforcement learning approach capable of seamlessly transforming heterogeneous human hand motions from multiple sources into physically plausible robotic behaviors. Subsequently, to mitigate the sim2real gap, we devise an end-to-end, depth image-based sim2real transfer method for improved generalization to real-world scenarios. Furthermore, to enable autonomous operation in varied and unstructured environments, we augment the navigation foundation model with a closed-loop Perspective-n-Point (PnP) localization mechanism, ensuring precise alignment of visual goals and effectively bridging autonomous navigation and dexterous manipulation. Extensive experimental results demonstrate that HERMES consistently exhibits generalizable behaviors across diverse, in-the-wild scenarios, successfully performing numerous complex mobile bimanual dexterous manipulation tasks. Project Page: https://gemcollector.github.io/HERMES/.
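The depth image-based sim2real transfer suggests training on depth observations corrupted to resemble real sensor output. Below is a minimal sketch of that general idea, assuming simple Gaussian noise and pixel-dropout corruptions; the specific corruptions and magnitudes are assumptions, not the paper's recipe.

```python
import numpy as np

def randomize_depth(depth: np.ndarray,
                    noise_std: float = 0.01,
                    dropout_p: float = 0.005,
                    rng: np.random.Generator | None = None) -> np.ndarray:
    """Apply simple sim-side corruptions to a metric depth image (H, W).

    Gaussian noise mimics sensor jitter and random pixel dropout mimics
    missing depth returns; both are common sim2real heuristics and are
    assumptions here, not HERMES's actual augmentation pipeline.
    """
    rng = rng or np.random.default_rng()
    out = depth + rng.normal(0.0, noise_std, depth.shape)
    # Knock out a small random fraction of pixels, as real sensors do.
    holes = rng.random(depth.shape) < dropout_p
    out[holes] = 0.0  # missing returns typically read as zero depth
    return np.clip(out, 0.0, None)  # depth cannot be negative
```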
Problem

Research questions and friction points this paper is trying to address.

Translating multi-source human hand motions into feasible robot behaviors
Adapting manipulation policies to diverse environmental conditions
Bridging autonomous navigation and dexterous manipulation for mobile robots
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified reinforcement learning for motion translation
End-to-end depth image sim2real transfer
PnP-augmented navigation foundation model for precise visual-goal alignment (see the sketch after this list)
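To make the closed-loop PnP alignment concrete, here is a minimal sketch using OpenCV's solvePnP to recover the camera's pose relative to a visual goal from 2D-3D correspondences. How HERMES obtains the correspondences and converts the residual pose into base motion is not described here, so those parts are assumptions.

```python
import cv2
import numpy as np

def pnp_goal_offset(object_pts: np.ndarray,   # (N, 3) landmark points in the goal frame
                    image_pts: np.ndarray,    # (N, 2) detected pixel locations
                    K: np.ndarray,            # (3, 3) camera intrinsics
                    dist: np.ndarray | None = None):
    """Estimate the goal's pose in the camera frame via Perspective-n-Point.

    Returns (R, t): rotation matrix and translation of the goal frame
    relative to the camera, or None if PnP fails. A navigation stack
    could servo the base on t and re-run this estimate each frame until
    the residual falls under a tolerance, closing the loop; that control
    law is an assumption, not HERMES's implementation.
    """
    ok, rvec, tvec = cv2.solvePnP(object_pts.astype(np.float64),
                                  image_pts.astype(np.float64),
                                  K, dist)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)  # axis-angle to rotation matrix
    return R, tvec
```

In a hypothetical closed loop, the robot would re-detect the goal landmarks, re-estimate the offset, and issue a base correction on each iteration, handing off to the manipulation policy once the residual is small.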