🤖 AI Summary
Addressing the challenges of modeling fine-grained hand-object interactions and weak sim-to-real transfer in dexterous robotic manipulation, this paper proposes learning transferable, fine-grained manipulation priors from large-scale first-person video data. We introduce the first approach to explicitly model hand-object contact points alongside high-fidelity hand pose estimation, establishing a multimodal, vision-driven joint contact-pose representation. Leveraging this prior, we design a reinforcement learning framework that enables efficient policy training and robust cross-domain transfer. Our approach achieves significant success-rate improvements across multiple simulation benchmarks, as well as on newly introduced high-difficulty tasks. Real-world evaluation on a physical dexterous-hand platform demonstrates superior generalization and robustness over state-of-the-art methods. Key contributions are: (1) a novel, transferable paradigm for joint contact-pose representation learning; and (2) a unified evaluation framework bridging simulation pretraining and real-world deployment.
📝 Abstract
Large-scale egocentric video datasets capture diverse human activities across a wide range of scenarios, offering rich and detailed insights into how humans interact with objects, especially in tasks that require fine-grained dexterous control. Such complex dexterous skills with precise control are crucial for many robotic manipulation tasks, yet are often insufficiently addressed by traditional data-driven approaches to robotic manipulation. To address this gap, we leverage manipulation priors learned from large-scale egocentric video datasets to improve policy learning for dexterous robotic manipulation. We present MAPLE, a novel method for dexterous robotic manipulation that exploits these rich manipulation priors to enable efficient policy learning and better performance on diverse, complex manipulation tasks. Specifically, we predict hand-object contact points and detailed hand poses at the moment of hand-object contact, and use the learned features to train policies for downstream manipulation tasks. Experimental results demonstrate the effectiveness of MAPLE on existing simulation benchmarks, as well as on a newly designed set of challenging simulation tasks that require fine-grained object control and complex dexterous skills. The benefits of MAPLE are further highlighted in real-world experiments with a dexterous robotic hand; such simultaneous evaluation across both simulation and real-world settings has remained underexplored in prior work.