MAPLE: Encoding Dexterous Robotic Manipulation Priors Learned From Egocentric Videos

📅 2025-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenges of modeling fine-grained hand-object interactions and weak sim-to-real transfer in dexterous robotic manipulation, this paper proposes learning transferable, fine-grained manipulation priors from large-scale first-person video data. We introduce the first explicit modeling of hand-object contact points alongside high-fidelity hand pose estimation, establishing a multimodal vision-driven joint contact-pose representation. Leveraging this prior, we design a reinforcement learning framework that enables efficient policy training and robust cross-domain transfer. Our approach achieves significant success-rate improvements across multiple simulation benchmarks and newly introduced high-difficulty tasks. Real-world evaluation on a physical dexterous hand platform demonstrates superior generalization and robustness over state-of-the-art methods. Key contributions are: (1) a novel transferable paradigm for joint contact-pose representation learning; and (2) a unified evaluation framework bridging simulation pretraining and real-world deployment.

📝 Abstract
Large-scale egocentric video datasets capture diverse human activities across a wide range of scenarios, offering rich and detailed insights into how humans interact with objects, especially those that require fine-grained dexterous control. Such complex, dexterous skills with precise controls are crucial for many robotic manipulation tasks, yet are often insufficiently addressed by traditional data-driven approaches to robotic manipulation. To address this gap, we leverage manipulation priors learned from large-scale egocentric video datasets to improve policy learning for dexterous robotic manipulation tasks. We present MAPLE, a novel method for dexterous robotic manipulation that exploits rich manipulation priors to enable efficient policy learning and better performance on diverse, complex manipulation tasks. Specifically, we predict hand-object contact points and detailed hand poses at the moment of hand-object contact and use the learned features to train policies for downstream manipulation tasks. Experimental results demonstrate the effectiveness of MAPLE across existing simulation benchmarks, as well as a newly designed set of challenging simulation tasks, which require fine-grained object control and complex dexterous skills. The benefits of MAPLE are further highlighted in real-world experiments using a dexterous robotic hand; such simultaneous evaluation across both simulation and real-world experiments has remained underexplored in prior work.
Problem

Research questions and friction points this paper is trying to address.

Leveraging egocentric videos to improve dexterous robotic manipulation
Predicting hand-object contact points for better policy learning
Enhancing performance in complex, fine-grained manipulation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging egocentric videos for robotic manipulation priors
Predicting hand-object contact points and poses
Training policies with learned features for manipulation
👥 Authors
Alexey Gavryushin
ETH Zürich, Rämistrasse 101, 8092 Zürich, Switzerland
Xi Wang
ETH Zürich, Rämistrasse 101, 8092 Zürich, Switzerland
Robert J. S. Malate
ETH Zürich, Rämistrasse 101, 8092 Zürich, Switzerland; Mimic Robotics, Andreasstrasse 5, 8050 Zürich, Switzerland
Chenyu Yang
ETH Zürich, Rämistrasse 101, 8092 Zürich, Switzerland
Xiangyi Jia
ETH Zürich, Rämistrasse 101, 8092 Zürich, Switzerland
Shubh Goel
ETH Zürich, Rämistrasse 101, 8092 Zürich, Switzerland
Davide Liconti
PhD, Soft Robotics Lab, ETH Zurich
Robotic Manipulation, Imitation Learning, Dexterous Robotic Hands
René Zurbrügg
ETH Zürich, Rämistrasse 101, 8092 Zürich, Switzerland
Robert K. Katzschmann
ETH Zurich | ETH AI Center | Mimic Robotics
Soft Robotics, Musculoskeletal Robotics, Biohybrid Robotics, Modeling, Machine Learning
Marc Pollefeys
Professor of Computer Science, ETH Zurich, and Director Spatial AI Lab, Microsoft
Computer Vision, Computer Graphics, Robotics, Machine Learning, Augmented Reality