UMIGen: A Unified Framework for Egocentric Point Cloud Generation and Cross-Embodiment Robotic Imitation Learning

📅 2025-11-12
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Robotics imitation learning faces dual challenges: scarcity of high-quality 3D demonstration data and poor cross-morphology generalization. To address these, we propose UMIGenβ€”a unified framework featuring a custom-built handheld Cloud-UMI device that synchronously captures first-person point clouds and action trajectories without requiring SLAM. UMIGen introduces a field-of-view-aware generative mechanism to synthesize photorealistic point clouds under visibility constraints, and an action-observation alignment model enabling end-to-end cross-morphology policy learning. Unlike existing approaches, UMIGen supports zero-shot transfer, significantly improving both data collection efficiency and policy generalizability. Extensive experiments across simulation and real-world multi-task benchmarks demonstrate that UMIGen achieves an average 32.7% improvement in cross-morphology policy transfer success rates (e.g., from robotic arm to dexterous hand), validating its effectiveness and practicality.

πŸ“ Abstract
Data-driven robotic learning faces an obvious dilemma: robust policies demand large-scale, high-quality demonstration data, yet collecting such data remains a major challenge owing to high operational costs, dependence on specialized hardware, and the limited spatial generalization capability of current methods. The Universal Manipulation Interface (UMI) relaxes the strict hardware requirements for data collection, but it is restricted to capturing only RGB images of a scene and omits the 3D geometric information on which many tasks rely. Inspired by DemoGen, we propose UMIGen, a unified framework that consists of two key components: (1) Cloud-UMI, a handheld data collection device that requires no visual SLAM and simultaneously records point cloud observation-action pairs; and (2) a visibility-aware optimization mechanism that extends the DemoGen pipeline to egocentric 3D observations by generating only points within the camera's field of view. These two components enable efficient data generation that aligns with real egocentric observations and can be directly transferred across different robot embodiments without any post-processing. Experiments in both simulated and real-world settings demonstrate that UMIGen supports strong cross-embodiment generalization and accelerates data collection in diverse manipulation tasks.
Problem

Research questions and friction points this paper is trying to address.

Robotic learning requires large-scale demonstration data, but collection is costly
Existing methods capture only RGB and lack the 3D geometric information many tasks rely on
Current approaches generalize poorly across spatial layouts and robot embodiments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Handheld Cloud-UMI device records point cloud observation-action pairs without visual SLAM
Visibility-aware optimization generates only points within the camera's field of view
Framework enables cross-embodiment transfer without any post-processing
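The visibility-aware mechanism above amounts to frustum culling: synthesized points are kept only if they would be visible from the egocentric camera. The sketch below illustrates the idea with a standard pinhole projection; the function name, intrinsics, and thresholds are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def filter_points_in_fov(points, K, image_size, z_near=0.05):
    """Keep only 3D points that would project inside the camera image.

    Illustrative sketch of field-of-view culling (not the paper's code).
    points:     (N, 3) array in the camera frame, z pointing forward.
    K:          (3, 3) pinhole intrinsic matrix.
    image_size: (width, height) in pixels.
    z_near:     discard points behind or too close to the camera.
    """
    w, h = image_size
    z = points[:, 2]
    in_front = z > z_near
    # Pinhole projection: K @ p gives [u*z, v*z, z].
    uvz = (K @ points.T).T
    with np.errstate(divide="ignore", invalid="ignore"):
        u = uvz[:, 0] / uvz[:, 2]
        v = uvz[:, 1] / uvz[:, 2]
    in_image = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    return points[in_front & in_image]
```

For example, with focal length 100 px and a 128x96 image, a point at (0, 0, 1) in front of the camera survives the filter, while a point at (2, 0, 1) projects outside the image bounds and is culled, as is any point behind the camera.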
👥 Authors
Yan Huang (Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China)
Shoujie Li (Tsinghua University)
Xingting Li (Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China)
Wenbo Ding (University at Buffalo)