Morphology-Consistent Humanoid Interaction through Robot-Centric Video Synthesis

πŸ“… 2026-03-20
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the spatial misalignment that humanoid robots encounter in interactive tasks: human pose estimates differ from the robot's own morphology, so conventional retargeting fails under skeletal scale mismatches. To overcome this, the authors propose Dream2Act, a framework for zero-shot, retargeting-free whole-body interaction. Given a third-person image of the robot and target object, a robot-centric generative video model synthesizes morphology-consistent motion directly in the robot's native coordinate system; a high-fidelity pose extraction system and a general-purpose whole-body controller then convert the synthesized video into physically feasible joint trajectories. Evaluated on the Unitree G1 across four locomotion-interaction tasks, the method attains a 37.5% overall success rate, versus 0% for conventional retargeting, and enables reliable physical contact.

πŸ“ Abstract
Equipping humanoid robots with versatile interaction skills typically requires either extensive policy training or explicit human-to-robot motion retargeting. However, learning-based policies face prohibitive data collection costs. Meanwhile, retargeting relies on human-centric pose estimation (e.g., SMPL), introducing a morphology gap. Skeletal scale mismatches result in severe spatial misalignments when mapped to robots, compromising interaction success. In this work, we propose Dream2Act, a robot-centric framework enabling zero-shot interaction through generative video synthesis. Given a third-person image of the robot and target object, our framework leverages video generation models to envision the robot completing the task with morphology-consistent motion. We employ a high-fidelity pose extraction system to recover physically feasible, robot-native joint trajectories from these synthesized dreams, subsequently executed via a general-purpose whole-body controller. Operating strictly within the robot-native coordinate space, Dream2Act avoids retargeting errors and eliminates task-specific policy training. We evaluate Dream2Act on the Unitree G1 across four whole-body mobile interaction tasks: ball kicking, sofa sitting, bag punching, and box hugging. Dream2Act achieves a 37.5% overall success rate, compared to 0% for conventional retargeting. While retargeting fails to establish correct physical contacts due to the morphology gap (with errors compounded during locomotion), Dream2Act maintains robot-consistent spatial alignment, enabling reliable contact formation and substantially higher task completion.
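The abstract describes a three-stage pipeline: generative video synthesis, robot-native pose extraction, and whole-body control. A minimal sketch of that flow is below; every function name, the frame count, and the joint count are hypothetical illustrations, not the authors' actual API or the paper's implementation.

```python
import numpy as np

# Hypothetical sketch of the Dream2Act pipeline stages described in the
# abstract. Each stage is a stub standing in for the real model/controller.

NUM_JOINTS = 29  # illustrative whole-body DoF count, not taken from the paper

def synthesize_video(image: np.ndarray, task_prompt: str,
                     num_frames: int = 48) -> np.ndarray:
    """Stage 1: a robot-centric video model 'dreams' the robot completing
    the task. Stub: returns a frame stack matching the input resolution."""
    h, w, c = image.shape
    return np.zeros((num_frames, h, w, c), dtype=image.dtype)

def extract_robot_pose(video: np.ndarray) -> np.ndarray:
    """Stage 2: pose extraction recovers robot-native joint trajectories
    from the synthesized frames. Stub: one joint vector per frame."""
    num_frames = video.shape[0]
    return np.zeros((num_frames, NUM_JOINTS))

def track_with_wbc(joint_trajectory: np.ndarray) -> list:
    """Stage 3: a general-purpose whole-body controller tracks the
    trajectory. Stub: emits one command per target pose."""
    return [frame.copy() for frame in joint_trajectory]

def dream2act(image: np.ndarray, task_prompt: str) -> list:
    """End-to-end: third-person image -> dreamed video -> joint
    trajectory -> executed commands, all in the robot-native frame."""
    video = synthesize_video(image, task_prompt)
    trajectory = extract_robot_pose(video)
    return track_with_wbc(trajectory)

commands = dream2act(np.zeros((240, 320, 3), dtype=np.uint8), "kick the ball")
print(len(commands), commands[0].shape)
```

The key design point the abstract emphasizes is that the trajectory never passes through a human skeleton (e.g., SMPL), so no retargeting step can introduce a morphology gap.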
Problem

Research questions and friction points this paper is trying to address.

humanoid interaction
morphology gap
motion retargeting
robot-centric synthesis
spatial misalignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

robot-centric video synthesis
morphology-consistent motion
zero-shot humanoid interaction
pose extraction
whole-body control
πŸ”Ž Similar Papers
No similar papers found.
Authors
Weisheng Xu
Hong Kong University of Science and Technology (Guangzhou)
Jian Li
Hong Kong University of Science and Technology (Guangzhou)
Yi Gu
Nara Institute of Science and Technology
Bin Yang
Hong Kong University of Science and Technology (Guangzhou)
Haodong Chen
Harbin Institute of Technology, Shenzhen
Shuyi Lin
Northeastern University
System
Mingqian Zhou
University of Cambridge
Jing Tan
The Chinese University of Hong Kong
Immersive Scene Generation · 3D-Aware Generative AI · Video Understanding
Qiwei Wu
Hong Kong University of Science and Technology (Guangzhou)
Humanoid Robots · LLM · Visual-Tactile Perception
Xiangrui Jiang
Hong Kong University of Science and Technology (Guangzhou)
Taowen Wang
Hong Kong University of Science and Technology (Guangzhou)
Jiawen Wen
Hong Kong University of Science and Technology (Guangzhou)
Qiwei Liang
Hong Kong University of Science and Technology (Guangzhou)
Jiaxi Zhang
Peking University
Electronic Design Automation
Renjing Xu
Hong Kong University of Science and Technology (Guangzhou)
Brain-inspired Computing · Humanoid Computing