Make Tracking Easy: Neural Motion Retargeting for Humanoid Whole-body Control

📅 2026-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the joint discontinuities and self-collisions that arise when humanoid motion retargeting is solved as a non-convex optimization problem. It proposes Neural Motion Retargeting (NMR), a framework that reframes the task from solving an optimization problem to learning the distribution of motion data. The dynamics-aware pipeline combines a Clustered-Expert Physics Refinement (CEPR) data generation stage with a non-autoregressive CNN-Transformer architecture. CEPR uses VAE-based motion clustering and parallel reinforcement learning experts to organize motions hierarchically and suppress noise in the human demonstrations. Evaluated on the Unitree G1 robot, the method transfers highly dynamic motions such as martial arts and dance, eliminates joint jumps, reduces self-collisions relative to the baselines, and accelerates convergence when training downstream whole-body control policies.
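To make the notion of a "joint discontinuity" concrete, the following is a minimal sketch of one way to flag joint jumps in a retargeted trajectory. It is an illustrative check, not the paper's evaluation metric: the function name `count_joint_jumps`, the frame rate, and the speed threshold are all assumptions, and angle wrap-around is ignored for brevity.

```python
def count_joint_jumps(traj, fps, max_speed):
    """Count frame transitions where any joint moves faster than max_speed.

    traj      : list of frames, each a list of joint angles in radians
    fps       : frames per second of the trajectory (assumed constant)
    max_speed : hypothetical joint-speed threshold in rad/s
    """
    jumps = 0
    for prev, curr in zip(traj, traj[1:]):
        # Finite-difference joint speed between consecutive frames.
        speeds = [abs(c - p) * fps for p, c in zip(prev, curr)]
        if max(speeds) > max_speed:
            jumps += 1
    return jumps


# Hypothetical 2-joint trajectory at 30 fps with one abrupt change in joint 0.
example_traj = [
    [0.00, 0.50],
    [0.01, 0.51],
    [0.02, 0.52],
    [1.02, 0.53],  # joint 0 jumps by 1.0 rad in a single frame (30 rad/s)
    [1.03, 0.54],
]
num_jumps = count_joint_jumps(example_traj, fps=30, max_speed=10.0)
print(num_jumps)
```

A smooth reference should yield a count of zero under a threshold chosen from the robot's joint velocity limits; the threshold of 10 rad/s here is only a placeholder.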


📝 Abstract
Humanoid robots require diverse motor skills to integrate into complex environments, but bridging the kinematic and dynamic embodiment gap from human data remains a major bottleneck. We demonstrate through Hessian analysis that traditional optimization-based retargeting is inherently non-convex and prone to local optima, leading to physical artifacts like joint jumps and self-penetration. To address this, we reformulate the retargeting problem as learning a data distribution rather than solving for an optimal solution, and we propose NMR, a Neural Motion Retargeting framework that transforms static geometric mapping into a dynamics-aware learned process. We first propose Clustered-Expert Physics Refinement (CEPR), a hierarchical data pipeline that leverages VAE-based motion clustering to group heterogeneous movements into latent motifs. This strategy significantly reduces the computational overhead of massively parallel reinforcement learning experts, which project and repair noisy human demonstrations onto the robot's feasible motion manifold. The resulting high-fidelity data supervises a non-autoregressive CNN-Transformer architecture that reasons over global temporal context to suppress reconstruction noise and bypass geometric traps. Experiments on the Unitree G1 humanoid across diverse dynamic tasks (e.g., martial arts, dancing) show that NMR eliminates joint jumps and significantly reduces self-collisions compared to state-of-the-art baselines. Furthermore, NMR-generated references accelerate the convergence of downstream whole-body control policies, establishing a scalable path for bridging the human-robot embodiment gap.
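The non-convexity claim can be illustrated on a toy problem. The sketch below is not the paper's Hessian analysis: it uses a hypothetical two-link planar arm (link lengths `L1`, `L2` and the keypoint `TARGET` are assumptions) and a plain position-matching cost. The cost has two distinct zero-cost solutions (two elbow configurations), a strictly higher cost at the point halfway between them, and an indefinite Hessian there, none of which can occur for a convex function.

```python
import math

L1, L2 = 1.0, 1.0      # hypothetical link lengths
TARGET = (1.0, 1.0)    # hypothetical keypoint position to match


def fk(q):
    """Forward kinematics of a 2-link planar arm: joint angles -> end point."""
    q1, q2 = q
    return (L1 * math.cos(q1) + L2 * math.cos(q1 + q2),
            L1 * math.sin(q1) + L2 * math.sin(q1 + q2))


def cost(q):
    """Position-matching retargeting cost: half the squared keypoint error."""
    x, y = fk(q)
    return 0.5 * ((x - TARGET[0]) ** 2 + (y - TARGET[1]) ** 2)


def hessian(f, q, h=1e-4):
    """Central finite-difference Hessian of f at q."""
    n = len(q)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                p = list(q)
                p[i] += si * h
                p[j] += sj * h
                return f(p)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H


def min_eig_2x2(H):
    """Smallest eigenvalue of a symmetric 2x2 matrix (closed form)."""
    a, d = H[0][0], H[1][1]
    b = 0.5 * (H[0][1] + H[1][0])
    return 0.5 * (a + d) - math.sqrt((0.5 * (a - d)) ** 2 + b * b)


# Two joint configurations that both reach TARGET exactly.
elbow_a = (0.0, math.pi / 2)
elbow_b = (math.pi / 2, -math.pi / 2)
# Halfway between them in joint space.
midpoint = tuple(0.5 * (u + v) for u, v in zip(elbow_a, elbow_b))

min_eig = min_eig_2x2(hessian(cost, midpoint))
print("cost at solutions:", cost(elbow_a), cost(elbow_b))
print("cost at midpoint:", cost(midpoint))
print("smallest Hessian eigenvalue at midpoint:", min_eig)
```

In a trajectory setting, a per-frame solver can land in different basins on consecutive frames, which is one plausible way the joint jumps described in the abstract arise.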
Problem

Research questions and friction points this paper is trying to address.

humanoid robots
motion retargeting
embodiment gap
whole-body control
kinematic and dynamic discrepancy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Motion Retargeting
Clustered-Expert Physics Refinement
non-autoregressive CNN-Transformer
embodiment gap
whole-body control
Authors
Qingrui Zhao
Nanjing University
Kaiyue Yang
Nanjing University
Xiyu Wang
Nanjing University; Huawei Technologies
Shiqi Zhao
Nanjing University
Yi Lu
Nanjing University
Xinfang Zhang
Huawei Technologies
Wei Yin
Staff Research Scientist, Horizon Robotics
World Model, Generative AI, Physical AI
Qiu Shen
Nanjing University
Xiao-Xiao Long
Associate Professor at Nanjing University; AnySyn3D
3D Vision, Generative AI, Spatial Intelligence, Embodied AI
Xun Cao
Nanjing University
Computational Photography, Computational Imaging, Image & Video Processing