EMMA: Scaling Mobile Manipulation via Egocentric Human Data

📅 2025-09-04

📈 Citations: 0

✨ Influential: 0

career value

228K/year

🤖 AI Summary

This paper addresses the challenge of expensive teleoperation data dependency in mobile robot imitation learning. We propose an end-to-end training paradigm that requires no teleoperated demonstrations from mobile robots. Methodologically, we introduce the first framework that jointly leverages human first-person vision–pose data and static-robot offline datasets, enabling cross-modal alignment and co-training to transfer full-body human motion into mobile robot policies. Our key contributions are: (1) eliminating reliance on mobile teleoperation data; (2) enabling generalization across diverse spatial layouts and unseen environments; and (3) achieving consistent performance gains with increasing scale of human demonstration data. Evaluated on three real-world navigation and manipulation tasks, our approach matches or surpasses the success rates of Mobile ALOHA—a teleoperation-based baseline—demonstrating strong efficacy, cross-environment generalizability, and scalability.

Technology Category

Application Category

📝 Abstract

Scaling mobile manipulation imitation learning is bottlenecked by expensive mobile robot teleoperation. We present Egocentric Mobile MAnipulation (EMMA), an end-to-end framework training mobile manipulation policies from human mobile manipulation data with static robot data, sidestepping mobile teleoperation. To accomplish this, we co-train human full-body motion data with static robot data. In our experiments across three real-world tasks, EMMA demonstrates comparable performance to baselines trained on teleoperated mobile robot data (Mobile ALOHA), achieving higher or equivalent task performance in full task success. We find that EMMA is able to generalize to new spatial configurations and scenes, and we observe positive performance scaling as we increase the hours of human data, opening new avenues for scalable robotic learning in real-world environments. Details of this project can be found at https://ego-moma.github.io/.

Problem

Research questions and friction points this paper is trying to address.

Scaling mobile manipulation imitation learning

Avoiding expensive mobile robot teleoperation

Training policies from human and static data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses egocentric human motion data

Combines static robot data co-training

Eliminates need for mobile teleoperation

🔎 Similar Papers

M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes