EMMA: Scaling Mobile Manipulation via Egocentric Human Data

📅 2025-09-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses mobile-robot imitation learning's dependence on expensive teleoperation data. We propose an end-to-end training paradigm that requires no teleoperated demonstrations from mobile robots. Methodologically, we introduce the first framework that jointly leverages human first-person vision–pose data and static-robot offline datasets, enabling cross-modal alignment and co-training that transfer full-body human motion into mobile robot policies. Our key contributions are: (1) eliminating reliance on mobile teleoperation data; (2) generalizing across diverse spatial layouts and unseen environments; and (3) achieving consistent performance gains as the scale of human demonstration data increases. Evaluated on three real-world navigation and manipulation tasks, our approach matches or surpasses the success rates of Mobile ALOHA, a teleoperation-based baseline, demonstrating strong efficacy, cross-environment generalizability, and scalability.

📝 Abstract
Scaling mobile manipulation imitation learning is bottlenecked by expensive mobile robot teleoperation. We present Egocentric Mobile MAnipulation (EMMA), an end-to-end framework that trains mobile manipulation policies from human mobile manipulation data together with static robot data, sidestepping mobile teleoperation entirely. To accomplish this, we co-train on human full-body motion data and static robot data. In experiments across three real-world tasks, EMMA performs comparably to baselines trained on teleoperated mobile robot data (Mobile ALOHA), achieving equal or higher full-task success rates. We find that EMMA generalizes to new spatial configurations and scenes, and we observe positive performance scaling as the hours of human data increase, opening new avenues for scalable robotic learning in real-world environments. Details of this project can be found at https://ego-moma.github.io/.
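The abstract's central recipe is co-training: each update draws on both egocentric human motion data and static-robot data. A minimal sketch of the batch-mixing side of that idea is below; the mixing ratio, batch size, and the notion of a "sample" are illustrative assumptions, not details taken from the paper.

```python
import random

def make_cotraining_batch(human_data, robot_data, batch_size=8,
                          human_ratio=0.5, rng=None):
    """Draw one mixed training batch: a fraction `human_ratio` of samples
    come from the egocentric human dataset, the rest from the static-robot
    dataset. Each sample is tagged with its source so a policy trainer
    could, e.g., apply different loss weights per domain."""
    rng = rng or random.Random(0)
    n_human = int(round(batch_size * human_ratio))
    n_robot = batch_size - n_human
    batch = [("human", rng.choice(human_data)) for _ in range(n_human)]
    batch += [("robot", rng.choice(robot_data)) for _ in range(n_robot)]
    rng.shuffle(batch)  # interleave domains within the batch
    return batch

# Hypothetical placeholder datasets: human clips vastly outnumber
# robot episodes, mirroring the paper's scaling motivation.
human_data = [f"human_traj_{i}" for i in range(100)]
robot_data = [f"robot_traj_{i}" for i in range(20)]

batch = make_cotraining_batch(human_data, robot_data,
                              batch_size=8, human_ratio=0.75)
print(sum(1 for src, _ in batch if src == "human"))  # 6 of 8 samples are human
```

This is only the data-mixing scaffold; the paper's actual contribution also involves aligning the human and robot observation/action spaces before such co-training is meaningful.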
Problem

Research questions and friction points this paper is trying to address.

Scaling mobile manipulation imitation learning
Avoiding expensive mobile robot teleoperation
Training policies from human data and static robot data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses egocentric human motion data
Co-trains with static robot data
Eliminates need for mobile teleoperation
Lawrence Y. Zhu
School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA 30332

Pranav Kuppili
School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA 30332

Ryan Punamiya
Georgia Institute of Technology
Robotics

Patcharapong Aphiwetsa
School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA 30332

Dhruv Patel
Postdoctoral fellow, Stanford
Computational Sciences, Inverse Problems, Scientific Machine Learning, Uncertainty Quantification

Simar Kareer
PhD Student, Georgia Tech
Computer Vision, Robotics

Sehoon Ha
Georgia Institute of Technology
robotics, computer graphics, machine learning

Danfei Xu
Assistant Professor at School of Interactive Computing
Robot Learning, Computer Vision